Talent.com
Senior Research Engineer, Foundation Model Training Infrastructure
This job is no longer accepting applications.
NVIDIA • Santa Clara, CA, United States
30 days ago
Job type
  • Full-time
Job description

NVIDIA's Generalist Embodied Agent Research (GEAR) group is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training. Our team is leading Project GR00T, NVIDIA’s moonshot initiative to build foundation models and full-stack technology for humanoid robots.

You will work with a collaborative research team that consistently produces influential work on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka, VIMA, Voyager, MineDojo, MimicPlay, Prismer, and more. Your contributions will have a significant impact on our research projects and product roadmaps.

What you will be doing:

Design and maintain large-scale distributed training systems to support multimodal foundation models for robotics.

Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets.

Implement scalable data loaders and preprocessors tailored for multimodal datasets, including videos, text, and sensor data.

Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters.

Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines.

What we need to see:

Bachelor's degree in Computer Science, Robotics, Engineering, or a related field.

10+ years of full-time industry experience in large-scale MLOps and AI infrastructure.

Proven experience designing and optimizing distributed training systems with frameworks like PyTorch, JAX, or TensorFlow.

Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes.

Strong programming skills in Python and a high-performance language such as C++ for efficient system development.

Strong experience with large-scale GPU clusters, HPC environments, and job scheduling / orchestration tools (e.g., SLURM, Kubernetes).

Ways to stand out from the crowd:

Master’s or PhD degree in Computer Science, Robotics, Engineering, or a related field.

Demonstrated Tech Lead experience, coordinating a team of engineers and driving projects from conception to deployment.

Strong experience building large-scale LLM and multimodal LLM training infrastructure.

Contributions to popular open-source AI frameworks or research publications in top-tier AI conferences, such as NeurIPS, ICRA, ICLR, CoRL.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and productive people in the world. Please join us and work at the forefront of developing general-purpose robots and large-scale foundation models!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 29, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


Related jobs
Sr ML / Deep Learning Architect (Fremont)
CitiusTech • Fremont, CA, US
Part-time
Machine Learning / Deep Learning Architect (Dicpm / Medical Imaging). With over 8,500 healthcare technology professionals worldwide, CitiusTech powers healthcare digital innovation, business transform...

Senior / Staff Machine Learning Research Scientist: Generative Modeling for Planning
Nuro • Mountain View, CA, United States
Full-time
Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world's most scalable driver, combining cutting-edge AI with automoti...

Senior Biological ML Research Engineer
Second Renaissance • Palo Alto, CA, United States
Full-time
A leading scientific institution in Palo Alto is seeking an experienced machine learning research engineer to advance biological modeling capabilities. The ideal candidate will have a strong backgro...

Senior UX Mixed Method Researcher (NetSec)
Palo Alto Networks • Santa Clara, CA, United States
Full-time
At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and m...

Machine Learning Infrastructure Simulation Engineer, Optimus
Tesla Motors, Inc. • Palo Alto, CA, United States
Full-time
The Optimus Simulation team is at the forefront of advancing humanoid robotics by building a high-fidelity virtual world where Optimus can safely learn, adapt, and improve. Our mission is to recreat...

Software Engineer - Model Training Infrastructure - USDS
Tik Tok • San Jose, CA, United States
Full-time
About the team: The mission of our AML team is to push the next-generation AI infrastructure and recommendation platform for the ads ranking, search ranking, live & ecom ranking in our company. We al...

Research Intern - Training Methods for LLM Efficiency
Microsoft Corporation • Mountain View, CA, United States
Full-time
Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue inno...

Research Scientist / Engineer – Training Infrastructure
IntelliPro Group Inc. • Palo Alto, CA, US
Full-time
Research Scientist / Engineer – Training Infrastructure. Position Type: Full time. Location: Palo Alto, CA • Remote - US • Remote - International. Salary Range: $220,000 - $300...

Machine Learning Infrastructure Engineer
Institute Of Foundation Models • Sunnyvale, California, United States
Full-time
About the Institute of Foundation Models. We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next...

Senior ML Engineer, Applied Research - Remote
Pinterest • Palo Alto, CA, US
Remote
Full-time
A leading social media platform is seeking a Machine Learning Engineer to build personalized experiences. Applicants should have over 4 years of experience in machine learning methods and big data t...

Foundational ML Researcher / Engineer — Remote, Equity
Pathway Genomics Corporation • Palo Alto, CA, United States
Remote
Full-time
A cutting-edge AI startup is seeking R&D Engineers for groundbreaking work in attention-based machine learning models. Candidates should have a strong research background and experience in model tra...

Senior Research Engineer
Harrison Clarke • Sunnyvale, CA, United States
Full-time
A fast-growing, deeply technical AI company is looking for a. This is an opportunity to work at the frontier of AI, helping design and evaluate models that can understand, write, and reason about co...

Senior Foundation Model ML Engineer – Scalable Inference
Apple Inc. • Santa Clara, CA, United States
Full-time
A leading technology company in California is seeking a Machine Learning Engineer to optimize AI infrastructures and support cutting-edge models. Candidates should have over 5 years of relevant expe...

Senior Reinforcement Learning Engineer, Helix
Figure • San Jose, CA, United States
Full-time
Figure is an AI robotics company developing a general purpose humanoid. Our Humanoid is designed for corporate tasks targeting labor shortages and jobs that are undesirable or unsafe. We are based in...

Senior Research Engineer Manager
Cisco Systems, Inc. • San Jose, CA, United States
Full-time
Splunk, a Cisco company, is building a safer, more resilient digital world with an end-to-end, full-stack platform designed for hybrid, multicloud environments. Join the Foundational Modeling team at S...

Senior ML Engineer, Applied Research — Remote
Pinterest • Palo Alto, California, United States
Remote
Full-time
A leading social media platform is seeking a Machine Learning Engineer to build personalized experiences. Applicants should have over 4 years of experience in machine learning methods and big data t...

Foundational ML Researcher / Engineer - Remote, Equity
Pathway Genomics Corporation • Palo Alto, CA, US
Remote
Full-time
A cutting-edge AI startup is seeking R&D Engineers for groundbreaking work in attention-based machine learning models. Candidates should have a strong research background and experience in model...

Reinforcement Learning Research Engineer
Strativ Group • Hayward, CA, United States
Full-time
Reinforcement Learning Research Engineer. A scaling, SOTA Generative AI Startup operating with a world class team (Founders have multiple prior exits) with talent from OpenAI, IBM, MIT and several ...