Senior+ Site Reliability Engineer

Crusoe Energy Systems LLC • San Francisco, CA, United States

Posted 1 day ago

Job type: Full-time

Job description

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role:

Crusoe is building the most reliable, energy-efficient, AI-optimized cloud platform — and operational excellence is at the heart of that mission. As a Site Reliability Engineer focused on Operational Excellence, you will help ensure the stability, resilience, and performance of Crusoe’s GPU cloud.

This role is ideal for engineers who thrive in fast-paced environments, enjoy solving operational problems, and want to grow their technical career while supporting incident response, reliability, and continuous improvement across a large-scale distributed platform.

You’ll partner closely with senior SREs, infrastructure engineers, and platform teams to improve reliability, reduce operational toil, and strengthen Crusoe’s incident management practices.

What You’ll Be Working On:

Collaborate with cross-functional teams to define and refine availability metrics for Crusoe’s cloud infrastructure, including establishing, tracking, and improving SLIs and SLOs.

Assist in incident response by identifying, diagnosing, and resolving service disruptions, and support post-incident processes through RCA documentation and participation in post-incident reviews.

Build and operate infrastructure, and monitor its health using Crusoe’s observability stack (Prometheus, Grafana, Alertmanager, OpenTelemetry).

Identify and communicate reliability risks, performance bottlenecks, and early indicators of potential incidents that could impact service availability.

Develop automation and tooling to reduce operational toil, minimize manual intervention, and enhance service recovery and self‑healing capabilities.

Partner with compute, network, storage, and platform teams to improve service resilience and strengthen disaster recovery readiness.

Contribute to knowledge sharing, process improvements, and the development of operational best practices across the organization.

Participate in ongoing training, mentorship, and professional development to grow into advanced SRE responsibilities.

What You’ll Bring to the Team:

5+ years of experience in cloud operations, SRE, or related roles

Understanding of cloud platforms and infrastructure fundamentals (Kubernetes, AWS / GCP, virtualization, distributed systems)

Familiarity with incident management practices and operational frameworks (e.g., SRE, ITIL)

Experience with monitoring and alerting tools (Prometheus, Grafana) or a strong willingness to learn

Familiarity with infrastructure-as-code and configuration management tools such as Terraform and Ansible

Basic scripting and automation experience (Go, Python, C, C++, or similar)

Strong communication skills, with the ability to clearly articulate technical issues to diverse stakeholders

Ability to stay calm, focused, and effective in fast-moving or high-pressure situations

A growth mindset with enthusiasm for operational excellence, reliability engineering, and continuous improvement

Bonus Points:

Experience with Kubernetes, container orchestration, or large-scale distributed systems

Exposure to change management, operational readiness reviews, or structured RCAs

Familiarity with self‑healing systems, automated remediation, or event‑driven operations

Interest in scaling AI / HPC infrastructure and solving reliability challenges in GPU‑heavy environments

Passion for learning, mentorship, and developing deeper SRE capabilities over time

Benefits:

Industry competitive pay

Restricted Stock Units in a fast-growing, well-funded technology company

Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

Employer contributions to Health Savings Accounts (HSAs)

Paid parental leave

Paid life insurance, short-term and long-term disability

Teladoc

401(k) with a 100% match up to 4% of salary

Generous paid time off and holiday schedule

Cell phone reimbursement

Tuition reimbursement

Subscription to the Calm app

MetLife Legal

Company-paid commuter benefit of $300 per month

Compensation:

Compensation will be paid in the range of $172,000–$209,000 plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex / gender, sexual preference / orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

