Role Summary
We’re looking for a hands-on Data Engineer with strong AWS ELT experience (S3, Glue, Athena) to design, build, and operate production data pipelines and data lakes that support analytics and GenAI use cases. You’ll work across the stack — ingestion, transformation, cataloging, querying, monitoring, and cost optimization — and collaborate with product, ML, and cloud teams.
Key Responsibilities
- Design, implement, and maintain ELT pipelines using Amazon S3, AWS Glue (ETL jobs, crawlers, and the Glue Data Catalog), and Amazon Athena.
- Ingest data from relational databases, event streams, APIs, and files; implement schema evolution and partitioning strategies.
- Build reliable, testable, and observable data workflows (Glue Workflows, AWS Step Functions, or orchestration tools).
- Develop and maintain data catalogs, schemas, and metadata (Glue Data Catalog, AWS Glue Crawlers).
- Optimize Athena queries and S3 layout (partitioning, columnar file formats such as Parquet/ORC, compaction) for performance and cost.
- Implement data quality checks, monitoring, and alerting (unit tests, data validation, metrics).
- Define and enforce data access patterns, security, IAM roles, encryption, and governance.
- Collaborate with ML engineers, analysts, and product teams to translate business requirements into data solutions.
- Contribute to architecture decisions, best practices, and documentation; mentor junior engineers.
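As a flavor of the partitioning work above: Athena and Glue crawlers treat Hive-style `key=value` path segments as partition columns, so pipelines typically land files under date-partitioned S3 keys. A minimal sketch (the bucket prefix and filename below are illustrative, not part of this role's actual stack):

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (year=/month=/day=) that
    Glue crawlers and Athena recognize as partition columns."""
    return (
        f"{prefix}/year={event_time.year:04d}"
        f"/month={event_time.month:02d}"
        f"/day={event_time.day:02d}"
        f"/{filename}"
    )


# Land a Parquet file under a date partition (names are hypothetical)
key = partitioned_key(
    "datalake/events",
    datetime(2024, 5, 12, tzinfo=timezone.utc),
    "part-0001.parquet",
)
# key == "datalake/events/year=2024/month=05/day=12/part-0001.parquet"
```

Partition pruning on layouts like this is what keeps Athena scan costs down, since queries filtered on year/month/day read only the matching prefixes.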
Required Skills
- 2+ years of professional data engineering experience; demonstrable experience building ELT pipelines on AWS.
- Hands-on expertise with S3, AWS Glue (jobs, crawlers, catalog), and Athena in production.
- Strong SQL skills and experience optimizing analytical queries.
- Experience with columnar file formats (Parquet/ORC), partitioning, and data lake design patterns.
- Familiarity with orchestration and CI/CD for data pipelines (Airflow, Step Functions, Glue Workflows, GitHub Actions, etc.).
- Knowledge of data modeling for analytics and ML (star schemas, wide and narrow fact tables).
- Solid understanding of AWS IAM, encryption (KMS), and security best practices for data.
- Experience with monitoring/logging tools (CloudWatch, Prometheus, Grafana, or similar).
- Excellent communication, problem solving, and collaboration skills.
Preferred Skills
- Experience with streaming ingestion (Kinesis, Kafka) and CDC tools (Debezium, DMS).
- Familiarity with other AWS analytics services (Redshift, EMR, Lake Formation) or cloud providers.
- Experience supporting ML pipelines and feature stores.
- Python or Scala programming for ETL and transformations.
- Experience working at a consulting or product engineering firm.
Personal Qualities
- Strong problem-solving and analytical skills
- Excellent communication and teamwork abilities
- Self-motivated and able to work independently when required
- Passionate about learning new technologies and keeping up with industry trends
- Detail-oriented with a focus on writing clean, efficient, and maintainable code
What We Offer
- Opportunity to work on GenAI, cloud-first projects for diverse clients.
- Collaborative engineering culture with mentoring and career growth.
- Competitive salary and benefits (location-adjusted).
- Flexible work arrangements.