Job Summary:
We are looking for a skilled and motivated Hadoop Data Lake Automation Engineer with 4-5 years of experience automating data workflows and processes within Hadoop-based data lake environments. The ideal candidate will build scalable automation solutions, optimize data pipelines, and ensure efficient data movement and transformation across platforms.
Key Responsibilities:
- Design and implement automation solutions for data ingestion, transformation, and processing in Hadoop data lake environments.
- Develop and maintain scalable data pipelines using tools such as Apache NiFi, Spark, Hive, and Sqoop (see the pipeline sketch after this list).
- Collaborate with data engineers, analysts, and business stakeholders to understand data requirements and deliver automation solutions.
- Monitor and troubleshoot data workflows, ensuring reliability and performance.
- Implement best practices for data governance, security, and metadata management.
- Maintain documentation for data flows, automation scripts, and operational procedures.
- Support production environments and participate in on-call rotations as needed.
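For illustration, a typical pipeline in this role might look like the following minimal PySpark sketch, which ingests raw files from HDFS and writes a curated Hive table. The HDFS path, table name, and columns are hypothetical placeholders, not a reference to any actual system.

```python
# A minimal PySpark sketch: ingest raw files from HDFS, apply a simple
# transformation, and write the result to a partitioned Hive table.
# Paths, table names, and columns are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("datalake-ingest-example")
    .enableHiveSupport()  # required to write managed Hive tables
    .getOrCreate()
)

# Read raw landing-zone data (hypothetical HDFS path).
raw = spark.read.option("header", "true").csv("hdfs:///landing/orders/")

# Basic cleansing: deduplicate on the key and stamp each row with a load date.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("load_date", F.current_date())
)

# Persist to a curated Hive table, partitioned by load date.
(
    cleaned.write
    .mode("append")
    .partitionBy("load_date")
    .saveAsTable("curated.orders")
)

spark.stop()
```

Partitioning by load date keeps incremental loads cheap to query and straightforward to backfill.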
Required Skills & Qualifications:
- 3-5 years of hands-on experience with the Hadoop ecosystem (HDFS, Hive, Spark, Sqoop, Oozie, etc.).
- Strong experience automating data lake workflows and ETL processes.
- Proficiency in scripting languages such as Python, Shell, or Scala.
- Experience with scheduling and orchestration tools (e.g., Apache Airflow, Control-M, AutoSys); see the orchestration sketch below.
- Solid understanding of data modelling, data quality, and performance optimization.
- Familiarity with cloud platforms (AWS, Azure, GCP) and their big data services.
- Excellent problem-solving and communication skills.
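Orchestration experience in this context usually means wiring jobs like the one above into a scheduler. The following minimal Apache Airflow sketch shows one assumed setup; the DAG id, script path, and table name are illustrative only.

```python
# A minimal Apache Airflow sketch: schedule a daily Spark ingestion job and a
# downstream Hive partition refresh. All names here are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="datalake_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    # Submit the Spark ingestion job (spark-submit path is illustrative).
    ingest = BashOperator(
        task_id="spark_ingest",
        bash_command="spark-submit /opt/jobs/ingest_orders.py",
    )

    # Refresh Hive partition metadata after new data lands.
    repair_partitions = BashOperator(
        task_id="hive_msck_repair",
        bash_command='hive -e "MSCK REPAIR TABLE curated.orders;"',
    )

    ingest >> repair_partitions
```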
Preferred Qualifications:
- Experience with Apache NiFi or similar data flow tools.
- Exposure to CI/CD pipelines and DevOps practices.
- Knowledge of data cataloguing and lineage tools.