Data Engineer – Hadoop Administrator
HIGHLIGHTS
Location: Chicago, IL / New York, NY / Phoenix, AZ (Hybrid)
Position Type: Direct Hire
Compensation: BOE
Overview
We are seeking a Data Engineer to support Newton, our Data Science R&D compute cluster. This role functions as a Hadoop Administrator embedded within the ML Ops organization, providing hands-on operational support for the platform while partnering directly with data scientists, DevOps, and infrastructure teams. This individual will ensure the health, stability, performance, and usability of the Newton cluster, acting as the primary point of contact for platform support, troubleshooting, and environment optimization.
This is a highly collaborative and technical role with room for long-term career progression.
Key Responsibilities
- Serve as the primary administrator for the Newton Hadoop / Cloudera cluster.
- Provide direct support to data scientists experiencing issues with jobs, workloads, dependencies, cluster resources, or environment performance.
- Troubleshoot complex Hadoop, Spark, Python, and OS-level issues; drive root cause analysis and implement permanent fixes.
- Coordinate closely with DevOps to ensure patching, upgrades, infrastructure changes, and system reliability activities are completed on schedule.
- Monitor cluster performance, capacity, and resource utilization; tune and optimize for efficiency and cost.
- Manage Hadoop and Cloudera configurations, services, security, policies, and operational health.
- Implement automation and scripting to improve operational workflows and reduce manual intervention.
- Validate vendor patches, updates, and upgrades and coordinate deployments with DevOps and infrastructure teams.
- Maintain documentation, operational runbooks, troubleshooting guides, and environment standards.
- Serve as a liaison between Data Science, ML Ops, Infrastructure, and DevOps teams to ensure seamless platform operations.
- Support the organization’s commitment to protecting the integrity, availability, and confidentiality of systems and data.
Required Technical Skills
- Strong hands-on experience with Hadoop administration, ideally within Cloudera environments.
- Proficiency with Python, particularly for automation and data workflows.
- Experience with Apache Spark (supporting jobs, tuning performance, understanding resource usage).
- Solid understanding of Linux/Unix systems administration, shell scripting, permissions, networking basics, and OS-level troubleshooting.
- Experience supporting distributed compute environments or large-scale data platforms.
- Familiarity with DevOps collaboration (patching, upgrades, deployments, incident response, etc.).
Required Soft Skills & Competencies
- Excellent communication skills with the ability to work directly with data scientists and technical end users.
- Ability to coordinate with multiple technical teams (DevOps, Infrastructure, ML Ops).
- Strong troubleshooting and problem-solving capabilities.
- Ability to manage multiple priorities in a fast-moving environment.
Preferred Skills (Nice to Have)
- Experience with ML Ops environments or supporting machine learning workflows.
- Experience with cluster performance optimization and capacity planning.
- Background in distributed systems or data engineering.