Performance engineer • Berkeley, CA

Machine Learning Engineer - Model Performance

Inference • San Francisco, California, United States

Full-time

Machine Learning Engineer to join our team, focusing on optimizing the performance of our cutting-edge AI inference systems. This role involves working with state-of-the-art large language models an...

Performance Test Engineer - Senior Manager

PwC • San Francisco, CA

Full-time

Summary: At PwC, our people in software and product innovation focus on developing cutting-edge software solutions and driving product innovation to meet the evolving needs of clients. These individua...

Performance Engineer - Java / LoadRunner

Staffing • San Francisco, CA, US

Full-time

Performance Engineer - Java / LoadRunner. Job Location: San Francisco, CA or San Jose, CA. Key Skillset: 5+ years of Performance Center or LoadRunner experience. Java performance tuning and optimization ...

Senior AI Performance Engineer

Genmo • San Francisco, California, United States

Full-time

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the bo...

AI Agent Software Engineer - Agent Performance Engineering

Assembled • San Francisco, CA, US

Full-time

Agent Performance Engineering Role. As part of the Agent Performance Engineering team, you'll be working on the core systems that make our AI agents smarter, more accurate, and more capable of handl...

Performance Engineer

Anthropic • San Francisco, CA, US

Full-time

Research Engineer, Frontier Red Team (RSP Evaluations). San Francisco, CA | Seattle, WA. Research Scientist, Frontier Red Team (Autonomy). Remote-Friendly (Travel-Required) | San Francisco, CA | Seatt...

Software Engineer, ML Performance

OpenAI • San Francisco, CA, United States

Full-time

Our Inference team brings OpenAI’s most capable research and technology to the world t...

Senior HPC Performance Engineer

NVIDIA • Remote, CA, US

Remote

Full-time

As a member of our team in NVIDIA's NVHPC compilers & tools group, you will analyze and run High Performance Computing (HPC) applications on HPC servers and systems to gain insight into the per...

Sr Building Performance Engineer

HGA • San Francisco, CA, US

Full-time

HGA is an award-winning architectural, engineering and planning firm with a full-time opportunity for a talented, ambitious. HGA's Building Performance presence in the West Coast. We define a buildin...

Principal AI Performance Engineer

Epoch Biodesign • San Francisco, CA, United States

Full-time

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to p...

QE Lead Performance Engineer

US012 Marsh & McLennan Agency LLC • California, San Francisco

Full-time

Award-winning, inclusive, Top Workplace culture doesn’t happen overnight. It’s a result of hard work by extraordinary people. The industry’s brightest talent drive our efforts to deliver purposeful w...

Lead Performance Tester / Engineer

Diverse Lynx • San Francisco, CA, US

Full-time

Design and implement comprehensive performance testing strategies for web, mobile, and backend systems. Develop, maintain, and execute performance test scripts using LoadRunner and NeoLoad and famil...

Sr. Performance Engineer San Francisco, California

Databricks Inc. • San Francisco, CA, US

Full-time

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems. We do this by building and running the world's best data and AI infrastructure platform so our cust...

Sr. Software Engineer - Performance

Databricks • San Francisco, California

Full-time

Principal AI Performance Engineer

Crusoe • San Francisco, CA, US

Full-time

Crusoe is building the World's Favorite AI-first Cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to p...

Performance engineer

writer.com • San Francisco, CA, US

Full-time

Writer is seeking a highly skilled and motivated Principal performance engineer to lead the performance optimization of our cutting-edge Generative AI technology stack. This role is critical in ensu...

performance Engineer / tester

Omega Solutions Inc. • San Francisco, CA, US

Full-time

This is Ashok from Omega solutions. This is regarding an immediate opening for Performance Engineer. Please find the below description and let me know your interest. Designs, configures and runs perfo...

Machine Learning Engineer - Model Performance

SOLANA FOUNDATION • San Francisco, CA, US

Full-time

Senior Embedded Software Engineer, Performance

Square • San Francisco, CA, US

Full-time

Since we opened our doors in 2009, the world of commerce has evolved immensely, and so has Square. After enabling anyone to take payments and never miss a sale, we saw sellers stymied by disparate, ...

Senior Site Reliability Engineer, Performance

Cisco Meraki • US; San Francisco, CA, United States

Remote

Full-time

Application window is open until further notice. The Meraki cloud supports millions of customer devices from 10 data centers and numerous public cloud regions from around the world. Meraki’s customer...

Machine Learning Engineer - Model Performance

Inference • San Francisco, California, United States

Job type: Full-time

Job description

Inference.net is seeking a Machine Learning Engineer to join our team, focusing on optimizing the performance of our cutting-edge AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. You will be responsible for deploying state-of-the-art models at scale and performing optimizations to increase throughput and enable new features. This position offers the chance to collaborate closely with our engineering team and make significant contributions to open source projects, like SGLang and vLLM.

About Inference.net

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute that can be used for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in-person from our office in downtown San Francisco. Our investors include a16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard, and deeply enjoy the work that we do.

Responsibilities

Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models

Deploy and maintain large language models at scale in production environments

Deploy new models as they are released by frontier labs

Implement techniques like quantization, speculative decoding, and KV cache reuse

Contribute regularly to open source projects such as SGLang and vLLM

Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues

Collaborate with the engineering team to bring new features and capabilities to our inference platform

Develop robust and scalable infrastructure for AI model serving

Create and maintain technical documentation for inference systems

Requirements

3+ years of experience writing high-performance, production-quality code

Strong proficiency with Python and deep learning frameworks, particularly PyTorch

Demonstrated experience with LLM inference optimization techniques

Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred

Familiarity with Docker and Kubernetes for containerized deployments

Experience with CUDA programming and GPU optimization

Strong understanding of distributed systems and scalability challenges

Proven track record of optimizing AI models for production environments

Nice to Have

Familiarity with TensorRT and TensorRT-LLM

Knowledge of vision models and multimodal AI systems

Experience implementing techniques like quantization and speculative decoding

Contributions to open source machine learning projects

Experience with large-scale distributed computing

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, plus competitive equity and benefits including:

Full healthcare coverage

Quarterly offsites

Flexible PTO

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!