Reliability engineer • Oakland, CA

Site Reliability Engineer

Alembic • San Francisco, California, United States

Full-time

Alembic is where top engineers are solving marketing's hardest problem: proving what actually works. If you're looking for frontier technical challenges at an applied science company, this is the pl...

Site Reliability Engineer, Founding

Limohealth • San Francisco, CA, United States

Full-time

At Charta, we're pioneering a transformative approach to healthcare billing through the power of generative AI. Our mission is to revolutionize this critical yet often cumbersome aspect of healthcar...

Reliability Test Engineer

Rondo Energy • Alameda, California, United States

Full-time

Rondo Energy's mission is to eliminate 15% of global CO2 emissions in 15 years. To accomplish this mission, Rondo is deploying low-cost, zero-carbon Rondo Heat Batteries to accelerate the deployment...

Site Reliability Engineer

HappyRobot • San Francisco, California, United States

Full-time

HappyRobot is a platform to build and deploy AI workers that automate communication. Our AI workers connect to any system or data source to handle phone calls, email, messages… We target the logisti...

Software Engineer, Reliability

OpenAI • San Francisco, CA, US

Full-time

Join the engineering teams that bring OpenAI's ideas safely to the world! The Applied Engineering team works across research, engineering, product, and design to bring OpenAI's technology to consume...

Site Reliability Engineer

VirtualVocations • San Francisco, California, United States

Full-time

A company is looking for a Site Reliability Engineer to join a dynamic Cloud Services team in a fully remote role. Key Responsibilities: Act as a subject matter expert in cloud technologies, guidin...

Reliability Engineer

Memphis Meats • Emeryville, CA

Full-time

UPSIDE is growing and entering an exciting new period of its history around scale up. The chosen candidate for this Process Reliability Engineer position will execute and initiate optimization of th...

Site Reliability Engineer

Global • Berkeley, California, USA

Full-time

We are seeking a Site Reliability Engineer to join our Operations Group. This role plays a key part in advancing scientific discovery by supporting high-performance computing (HPC) and data analysis...

Site Reliability Engineer

Grammarly • San Francisco, California, United States

Full-time

Grammarly offers a dynamic hybrid working model for this role. This flexible approach gives team members the best of both worlds: plenty of focus time along with in-person collaboration that helps f...

Site Reliability Engineer

Unstructured Technologies • San Francisco, California, United States

Full-time

Unstructured builds open-source and commercial tools that enable developers to preprocess and transform unstructured data (PDFs, HTML, Word docs, images, and more) for AI/ML pipelines. Our solutio...

Software Engineer - Reliability

Pantera Capital • San Francisco, CA, United States

Full-time

AI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excelle...

Site Reliability Engineer

Baseten • San Francisco, California, United States

Full-time

We’re a growing team of builders backed by top-tier investors, including.ML teams at enterprises and category-defining AI-native companies like. Baseten to power their core production workloads with...

Software Engineer (Site Reliability Engineer)

Anyscale • San Francisco, California, United States

Full-time

Ray in their tech stacks to accelerate the progress of AI applications out into the real world. With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can s...

Site Reliability Engineer

WorkOS • San Francisco, California, United States

Remote

Full-time

WorkOS builds tools and services for developers to help them implement authentication, identity, authorization, and overall enterprise readiness. We’re a fully distributed team with employees across...

Senior Reliability Engineer

Eight Sleep • San Francisco, CA, United States

Full-time

Join the Sleep Fitness Movement. At Eight Sleep, we’re on a mission to fuel human potential through optimal sleep. As the world’s first sleep fitness company, we’re redefining what it means to be wel...

Senior Site Reliability Engineer - Fleet Reliability

Lambda • San Francisco, California, United States

Full-time

In 2012, Lambda started with a crew of AI engineers publishing research at top machine-learning conferences. We began as an AI company built by AI engineers. Today, we're on a mission to be the world...

Site Reliability Engineer

Writer • San Francisco, California, United States

Full-time

Writer is the full-stack generative AI platform delivering transformative ROI for the world’s leading enterprises. Named one of the top 50 companies in AI by Forbes and one of the best places to wor...

Site Reliability Engineer

Checkr • San Francisco, California, United States

Full-time

Checkr is building the data platform to power safe and fair decisions. Established in 2014, Checkr’s innovative technology and robust data platform help customers assess risk and ensure safety and c...

Software Engineer, Reliability

OpenAI • San Francisco, California, United States

Full-time

Join the engineering teams that bring OpenAI’s ideas safely to the world! The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consu...

Site Reliability Engineer

Latent • San Francisco, California, United States

Full-time

Latent is building the intelligence infrastructure for American healthcare. Our products are already helping hospitals and clinics dramatically increase workflow output, speed up patient access to m...

Site Reliability Engineer

Alembic • San Francisco, California, United States

30 days ago

Job type: Full-time

Job description

About Us

Alembic is where top engineers are solving marketing's hardest problem: proving what actually works. If you're looking for frontier technical challenges at an applied science company, this is the place.

At Alembic, we're not just building software, we're decoding the chaos of modern marketing. Join Alembic to build trusted systems that Fortune 100 companies use to make multimillion-dollar decisions.

We're backed by leading tech luminaries including WndrCo (founded by DreamWorks founder Jeffrey Katzenberg), Jensen Huang, Joe Montana, and many more.

About the Role

We’re looking for an SRE to help evolve and scale the systems that power Alembic. This is a high-impact, foundational role where you’ll drive platform scalability from the ground up. This role is particularly well-suited for seasoned platform, cloud, or DevOps engineers who are ready to dive into AI infrastructure. You'll leverage your proven expertise in scalable systems while learning to deploy and manage cutting-edge ML workloads, making this an ideal transition role for infrastructure veterans looking to specialize in the AI space.

What You’ll Do

Design, build, integrate, and operate the foundational infrastructure that powers Alembic’s platform, including core services, data pipelines, and distributed AI/ML workloads, across both cloud (primarily AWS) and on-prem environments.

Leverage Infrastructure as Code (IaC) tools such as Terraform for cloud resource provisioning and Ansible for configuration management, enabling repeatable, auditable, and environment-agnostic infrastructure deployments.

Develop and maintain CI/CD pipelines that enable reliable, low-risk, and rapid deployments using modern tools like GitHub Actions, ArgoCD, Bazel, or equivalent, with automated testing, rollback, and deployment workflows.

Establish and operate robust observability systems, including metrics, logging, and distributed tracing, using tools like Prometheus, Grafana, Datadog, and OpenTelemetry to ensure proactive incident detection and diagnosis.

Collaborate closely with the AI Research team to deploy and manage novel ML algorithms and drive next-generation work on GPU-based development efforts.

Serve as a technical mentor and thought leader, promoting best practices in system design, infrastructure reliability, and code quality across the engineering organization.

What Will Help You Succeed

15–20 years of engineering experience, including significant time spent on platform, infrastructure, or DevOps/SRE teams.

Deep experience with AWS (or GCP/Azure), container orchestration with Kubernetes, and service discovery at scale.

Strong grasp of DevOps principles, infrastructure as code (Terraform, Ansible), and immutable infrastructure.

Experience deploying and operating production systems in fast-paced environments, ideally early- or growth-stage startups.

Proficiency in a systems or scripting language (e.g., Python, Bash).

Experience with secure networking, secrets management, and managing systems in compliance-heavy environments.

A bias for simplicity, automation, and building tools that empower developers.

A hands-on, in-the-weeds approach and a collaborative mindset. You’re as comfortable fixing a broken pipeline as designing the future of our platform.

This role is right for you if:

You're an experienced platform/DevOps engineer ready to apply your infrastructure expertise to the cutting edge of AI. This role offers the perfect bridge between traditional platform engineering and the emerging world of ML/AI systems at scale.

You want to build something that is both technologically challenging and solves a real customer need. You want a role with major upside that tackles a massive market opportunity.

Why You Might Be Excited About Alembic

Hard problems with real impact: You'll tackle the hardest challenges in marketing analytics while building systems that influence multimillion-dollar decisions at Fortune 100 companies

Technical autonomy: You want ownership over technical decisions and the freedom to solve complex problems your way

Cutting-edge technology: Work with advanced AI/ML algorithms, composite AI solutions, private NVIDIA DGX clusters, and the latest in data processing at scale

Elite team: Join top engineers who thrive on challenging problems and high-impact work

Startup upside: Early-stage equity opportunity with experienced leadership and proven product-market fit

Why You Might Not Be Excited

If you only want to tell people what to build instead of building and coding alongside them, we're not the environment for you

You prefer company practices with 100% built-out process for every detail

You prefer static over dynamic. Projects, priorities, and roles will adapt to your skill set and goals. Though we have real paying customers and a playbook for growth, we proudly remain an early-stage startup