Talent.com
Cluster Infrastructure Engineer
Cartesia • San Francisco, California, United States
Posted 30 days ago
Job type: Full-time

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text (1B text tokens, 10B audio tokens and 1T video tokens), let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. We combine deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

We’re looking for a Cluster Infrastructure Engineer to help build and scale the compute backbone that powers Cartesia’s research on real-time, multimodal intelligence. In this role, you’ll work at the intersection of distributed systems and infrastructure engineering, designing and operating the large-scale GPU clusters that train and serve Cartesia’s foundation models. You’ll own systems that need to be fast, reliable, and highly automated — ensuring our researchers and product teams can move at the speed of innovation. You’ll build the tooling, automation, and monitoring needed to keep clusters resilient under load, quickly diagnose and resolve issues, and continuously push the boundaries of scalability and efficiency.

Your Impact

Design and build large-scale GPU clusters for model training and low-latency inference

Develop automation for provisioning, scaling, and monitoring to ensure clusters are fast, resilient, and self-healing

Collaborate closely with research and product teams to enable distributed training at scale, optimizing for speed, reliability, and utilization

Implement robust observability and alerting systems to monitor GPU health, node stability, and job performance

Diagnose and triage hardware, networking, and distributed training issues across environments, coordinating with provider support as needed

Continuously improve cluster reliability, developer ergonomics, and overall system efficiency across Cartesia’s research and production workloads

What You Bring

Strong engineering fundamentals and experience building and operating large-scale distributed systems

Deep familiarity with HPC & GPU cluster management using Kubernetes and Slurm

A blend of developer empathy and raw performance engineering, designing systems and tools that are intuitive to use and fast

Ability to balance principled engineering with the urgency of keeping mission-critical systems alive

Proficiency with Infrastructure-as-Code tools (Terraform, Ansible, etc.) and observability tools (Prometheus, Grafana, etc.)

Strong debugging skills: comfortable diagnosing NCCL issues, CUDA errors, and network or driver-level faults

What Sets You Apart

Experience optimizing large-scale distributed training frameworks such as DeepSpeed, Megatron-LM, or similar

Familiarity with advanced parallelization techniques such as FSDP, context parallelism, or tensor parallelism

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
