No C2C
We're looking for a hands-on Data Engineer to help build, scale, and fine-tune real-time data systems using Kafka, AWS, and a modern data stack. In this role, you'll work deeply with streaming data, ETL, distributed systems, and PostgreSQL to power analytics, product innovation, and AI-driven use cases. You'll also work with AI/ML frameworks, automation, and MLOps tools to support advanced modeling and a highly responsive data platform.
What You'll Do
- Design and build real-time streaming pipelines using Kafka, Confluent Schema Registry, and ZooKeeper
- Build and manage cloud-based data workflows using AWS services like Glue, EMR, EC2, and S3
- Optimize and maintain PostgreSQL and other databases with strong schema design, advanced SQL, and performance tuning
- Integrate AI and ML frameworks (TensorFlow, PyTorch, Hugging Face) into data pipelines for training and inference
- Automate data quality checks, feature generation, and anomaly detection using AI-powered monitoring and observability tools
- Partner with ML engineers to deploy, monitor, and continuously improve machine learning models in both batch and real-time pipelines using tools like MLflow, SageMaker, Airflow, and Kubeflow
- Experiment with vector databases and retrieval-augmented generation (RAG) pipelines to support GenAI and LLM initiatives
- Build scalable, cloud-native, event-driven architectures that power AI-driven data products
What You Bring
- Bachelor's degree in Computer Science, Engineering, Math, or a related technical field
- 3+ years of hands-on data engineering experience with Kafka (Confluent or open-source) and AWS
- Experience with automated data quality, monitoring, and observability tools
- Strong SQL skills and solid database fundamentals with PostgreSQL and both traditional and NoSQL databases
- Proficiency in Python, Scala, or Java for pipeline development and AI integrations
- Experience with synthetic data generation, vector databases, or GenAI-powered data products
- Hands-on experience integrating ML models into production data pipelines using frameworks like PyTorch or TensorFlow and MLOps tools such as Airflow, MLflow, SageMaker, or Kubeflow