Data Engineer
Primary Skills: Data Engineering, Databricks, PySpark, Apache Airflow, Apache Spark
Location: San Jose, CA (Hybrid, 3 days/week onsite)
Duration: 5 months
Contract Type: W2 only
Pay Rate: $99.13/hour
Responsibilities
- Design, develop, and maintain scalable and reliable data pipelines to support large-scale data processing.
- Build and optimize data workflows, orchestrating scheduled and event-driven Spark ETL/ELT processes with tools such as Apache Airflow (a minimal sketch follows this list).
- Implement complex parsing, cleansing, and transformation logic to normalize data from a variety of structured and unstructured sources.
- Collaborate with data scientists, analysts, and application teams to integrate, test, and validate data products and pipelines.
- Operate and maintain pipelines running on cloud platforms (AWS) and distributed compute environments (e.g., Databricks).
- Monitor pipeline performance, perform root cause analysis, and troubleshoot failures to ensure high data quality and uptime.
- Ensure proper security, compliance, and governance of data across systems and environments.
- Contribute to the automation and standardization of data engineering processes to improve development velocity and operational efficiency.
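
To give a flavor of the orchestration work described above, here is a minimal sketch of a scheduled Airflow DAG that submits a PySpark job. It assumes Airflow 2.4+; the DAG id, schedule, owner, and spark-submit paths are hypothetical placeholders, not details from this posting.

```python
# Hypothetical illustration only: DAG id, script path, and schedule
# are placeholders, not requirements from this posting.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",        # placeholder owner
    "retries": 2,                        # retry failed tasks twice
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_events_etl",           # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # cron-style scheduled trigger
    catchup=False,                       # skip backfilling missed runs
    default_args=default_args,
) as dag:
    # Submit a PySpark job; the spark-submit target is a placeholder.
    run_spark_etl = BashOperator(
        task_id="run_spark_etl",
        bash_command=(
            "spark-submit --master yarn "
            "/opt/jobs/transform_events.py --date {{ ds }}"
        ),
    )
```

An event-driven variant would typically swap the cron schedule for a sensor or dataset trigger; the scheduled form above is just the simplest case.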
Required Skills
- 9-12 years of experience (YOE).
- Proficient in Python and PySpark for data processing and scripting (see the sketch below).
- Strong experience with SQL for data manipulation and performance tuning.
- Deep understanding of distributed data processing with Apache Spark.
- Hands-on experience with Airflow or similar orchestration tools.
- Experience with cloud services and data tools in AWS (e.g., S3, Lambda, SQS, Gateway, Networking).
- Expertise with Databricks for collaborative data engineering and analytics.
- Solid understanding of data modeling, data warehousing, and best practices in data pipeline architecture.
- Strong problem-solving skills with the ability to work independently on complex tasks.
- Familiarity with CI/CD practices and version control (e.g., Git) in data engineering workflows.
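
As an illustrative example of the PySpark parsing, cleansing, and normalization work this role involves, here is a minimal sketch. The source path, column names, and quality rules are hypothetical placeholders, not details from this posting.

```python
# Hypothetical illustration only: paths, column names, and rules
# are placeholders, not requirements from this posting.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleanse_events").getOrCreate()

# Read a semi-structured source (placeholder S3 path).
raw = spark.read.json("s3://example-bucket/raw/events/")

cleansed = (
    raw
    # Parse string timestamps into a typed timestamp column.
    .withColumn("event_ts", F.to_timestamp("event_time", "yyyy-MM-dd HH:mm:ss"))
    # Normalize inconsistent casing and whitespace in a categorical field.
    .withColumn("country", F.upper(F.trim(F.col("country"))))
    # Drop records that fail basic quality checks.
    .filter(F.col("event_ts").isNotNull() & F.col("user_id").isNotNull())
    # Deduplicate on the natural key, keeping one row per event.
    .dropDuplicates(["event_id"])
)

# Write the curated output partitioned for downstream query performance.
cleansed.write.mode("overwrite").partitionBy("country").parquet(
    "s3://example-bucket/curated/events/"
)
```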