Talent.com
Principal DevOps Engineer - ML/AI Algorithms

Roche Holdings Inc. • Santa Clara, California, United States
Job type: Full-time
Job description

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

Principal DevOps Engineer – ML / AI Algorithms

As Principal DevOps Engineer – ML / AI Algorithms, you will work on products that help people with the most precious thing they have – their health. You will be part of the RIS Research & Development team contributing to digital health products touching Imaging, ML / AI, and computational science.

The Opportunity

As Principal DevOps Engineer, you will collaborate with important stakeholders on the development of the build, release, and deploy toolchain for DevOps, paving the way for seamless and efficient software delivery processes.

Locations

This role can be based in Santa Clara (primary location) or in secondary locations (Mississauga, Canada or Basel, Switzerland).

Key responsibilities

Lead the initiative to set up, manage, and meticulously maintain parity across development, staging, and production application environments in cutting‑edge cloud infrastructure, ensuring a robust and consistent deployment pipeline.

Champion the implementation of advanced monitoring infrastructure, empowering the team with real‑time insights and ensuring the highest levels of system reliability and performance.

Provide dedicated on‑call support for production operations, ensuring the uninterrupted delivery of critical services and swift resolution of any operational issues.

Interface with software developers, product managers, test engineers and administrators on projects to design and develop the build, release, and deploy toolchain for DevOps while providing on‑call support.

Identify, troubleshoot and resolve issues quickly and effectively, sometimes under pressure.

Participate actively in planning, high-availability engineering, performance tuning, and automation / tools development.

Manage multiple releases with focus on system reliability, scalability, and efficiency.

Implement and manage the full lifecycle of machine learning models, including versioning, deployment strategies (e.g., canary, A / B testing), monitoring for drift and performance, and decommissioning.

Bring leadership to improve DevOps technology and processes, and mentor other DevOps engineers on the team.

Who You Are

Bachelor's degree in Computer Science, Engineering, or a related field with a minimum of 8 years of experience in a DevOps role, or an equivalent combination of education and experience to perform at this level.

8+ years of experience with container technology, including Kubernetes, AWS EKS, Helm Charts, Splunk, and Docker, along with provisioning infrastructure through IaC using Terraform and cloud automation principles.

Proficiency in Unix / Linux administration, shell scripting, and internals, with a preference for Ubuntu.

Deep working experience and extensive knowledge in building and deploying infrastructure using IaC frameworks such as Terraform and AWS CloudFormation / SAM.

Experience building and automating scalable data pipelines for ingestion, transformation, distributed computation, and versioning of large-scale image datasets.

Familiarity with DevOps practices and proficiency in log analysis and monitoring tools are essential for effective troubleshooting and system optimization.

Proficiency in Python for automating production systems, experience with Git, GitLab, GitHub Actions, and GitHub CI / CD, and familiarity with common ML libraries such as TensorFlow, PyTorch, and scikit‑learn to understand the engineering needs of the ML models you will be deploying.

Strong working knowledge of AWS Cloud infrastructure, including EC2, S3, API Gateway, Kubernetes, RDS, VPC peering, Route 53, IAM, Batch, Lambda, AWS Config, and Auto Scaling.

Preferred

MLOps experience with demonstrated experience supporting machine learning or computer vision teams.

Deep experience with container orchestration for ML workloads using Kubernetes, including frameworks like Kubeflow or KubeRay to manage distributed training jobs.

Familiarity with data versioning tools like DVC.

Familiarity with common ML libraries such as TensorFlow, PyTorch, and scikit‑learn to understand the engineering needs of the ML models.

Familiarity with other languages such as Java, R, and C / C++.

Experience with AWS services for machine learning, such as Amazon SageMaker, and experience managing GPU‑accelerated compute instances (e.g., EC2 P and G series) for model training and inference.

Compensation

The expected salary range for this position based on the primary location of Santa Clara, CA is between $162,600 and $302,000. Actual pay will be determined based on experience, qualifications, geographic location, and other job‑related factors permitted by law. A discretionary annual bonus may be available based on individual and Company performance. This position also qualifies for the benefits detailed at the link provided below.

Benefits

Relocation benefits are not available for this position.

About Roche

A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life‑changing healthcare solutions that make a global impact.

Equal Opportunity

Roche is an equal opportunity employer. It is our policy and practice to employ, promote, and otherwise treat any and all employees and applicants on the basis of merit, qualifications, and competence. The company’s policy prohibits unlawful discrimination, including but not limited to discrimination on the basis of protected veteran status and disability status, consistent with all federal, state, or local laws.

If you have a disability and need an accommodation in relation to the online application process, please contact us by completing this form: Accommodations for Applicants.
