Job Title : Machine Learning Lead Engineer
Work Location : Pittsburgh, PA
Duration : 6-month Contract-to-Hire
Education / Experience Required : Experience leading ML / DS projects in an industrial setting and working with time-series data and SCADA applications.
Job Description & Responsibilities :
- Lead end-to-end delivery of data engineering and machine learning projects that improve operational efficiency and reduce emissions in natural gas development, production, and midstream operations.
- Own the full lifecycle of ML solutions, from ideation and scoping through data discovery, modeling, validation, deployment, and long-term monitoring.
- Provide technical leadership for a small but high-impact Data Engineering & Machine Learning team, including mentoring, code reviews, and setting best practices for quality and reliability.
- Work on-site with engineering, operations, and SCADA teams to understand real-world constraints, define use cases, and translate business problems into data-driven solutions.
- Lead the effort to productionize existing models and scale multiple proof-of-concept solutions into robust, maintainable, and observable production systems.
- Architect, design, and implement data pipelines that ingest, clean, transform, and store large volumes of time-series and event data from SCADA systems, field sensors, and other industrial data sources.
- Develop and refine machine learning models focused on time-series problems such as forecasting, anomaly detection, remaining useful life estimation, and performance optimization of assets.
- Contribute to and help lead Digital Twin Analytics initiatives, integrating physics-based and data-driven models to simulate and optimize complex systems in the field.
- Collaborate on the design and implementation of anomaly detection frameworks that surface early warning signals for equipment failure, process instability, and emissions events.
- Partner with software and DevOps engineering to design and implement MLOps practices, including CI / CD for ML, feature stores, model versioning, model performance tracking, and automated retraining pipelines.
- Establish standards for model validation, A / B testing, and offline / online evaluation to ensure robustness and reliability in operational environments.
- Communicate technical concepts, tradeoffs, and results clearly to both technical stakeholders and non-technical operations leaders, including presenting insights and recommendations to leadership.
- Evaluate new tools, libraries, and cloud-native services in the data and ML ecosystem and make recommendations to improve performance, scalability, and developer productivity.
- Champion a culture of experimentation, data quality, and continuous improvement while keeping a strong focus on safety, regulatory compliance, and environmental impact.
Skills & Qualifications :
- Bachelor's degree in Computer Science, Data Science, Engineering, Applied Mathematics, or a related field; a graduate degree is preferred but not required with sufficient experience.
- 7+ years of hands-on experience in Data Science, Machine Learning, or Data Engineering, with at least 2 years in a lead or senior role driving projects from concept to production.
- Strong proficiency in Python for data engineering and machine learning, including experience with common data and ML libraries (for example pandas, NumPy, scikit-learn, PyTorch or TensorFlow, statsmodels).
- Strong SQL skills and experience working with relational databases, time-series databases, or data warehouses for large-scale analytics.
- Proven experience working with time-series data in industrial or operational settings, including feature engineering, resampling, handling missing data, and building time-series forecasting or anomaly detection models.
- Experience integrating and analyzing SCADA data or similar industrial control and telemetry systems in energy, utilities, manufacturing, or related heavy industry.
- Solid understanding of data engineering concepts including ETL / ELT pipelines, batch and streaming architectures, and data modeling for analytical workloads.
- Hands-on experience with cloud platforms (such as AWS, Azure, or GCP) and modern data tooling, for example cloud storage, managed databases, containerization, and workflow orchestration tools.
- Practical exposure to MLOps practices, including deployment of models to production, monitoring model performance in real time, and maintaining models over their lifecycle.
- Familiarity with concepts and techniques related to Digital Twin Analytics, physics-informed models, or asset performance management in an industrial context is a strong plus.
- Strong software engineering fundamentals including version control (Git), code review practices, testing, and documentation.
- Demonstrated ability to lead cross-functional initiatives, prioritize a portfolio of projects, and manage stakeholder expectations in a fast-moving environment.
- Excellent communication skills, with the ability to explain complex technical concepts clearly and to influence decision making among operations leaders, field engineers, and executives.
- Willingness to work on-site in the Pittsburgh, PA area and periodically visit field or plant locations as needed; relocation support is available for exceptional candidates.