Talent.com
Site Reliability Engineer

PsiQuantum • Palo Alto, CA, United States
Posted: 30 days ago
Job type: Full-time
Job description

PsiQuantum's mission is to build the first useful quantum computers: machines capable of delivering the breakthroughs the field has long promised. Since our founding in 2016, our singular focus has been to build and deploy million-qubit, fault-tolerant quantum systems.

Quantum computers harness the laws of quantum mechanics to solve problems that even the most advanced supercomputers or AI systems will never reach. Their impact will span energy, pharmaceuticals, finance, agriculture, transportation, materials, and other foundational industries.

Our architecture and approach is based on silicon photonics. By leveraging the advanced semiconductor manufacturing industry, including partners like GlobalFoundries, we use the same high-volume processes that already produce billions of chips for telecom and consumer electronics. Photonics offers natural advantages for scale: photons don't feel heat, are immune to electromagnetic interference, and integrate with existing cryogenic cooling and standard fiber-optic infrastructure.

In 2024, PsiQuantum announced government-funded projects to support the build-out of our first utility-scale quantum computers in Brisbane, Australia, and Chicago, Illinois. These initiatives reflect a growing recognition that quantum computing will be strategically and economically defining, and that now is the time to scale.

PsiQuantum also develops the algorithms and software needed to make these systems commercially valuable. Our application, software, and industry teams work directly with leading Fortune 500 companies, including Lockheed Martin, Mercedes-Benz, Boehringer Ingelheim, and Mitsubishi Chemical, to prepare quantum solutions for real-world impact.

Quantum computing is not an extension of classical computing. It represents a fundamental shift, and a path to mastering challenges that cannot be solved any other way. The potential is enormous, and we have a clear path to make it real.

Come join us.

Job Summary:

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you'll own the day-to-day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real-time insight. You'll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world-class uptime across both on-prem and AWS environments.

Responsibilities:

  • Define, implement, and iterate on Service Level Indicators and Service Level Objectives (SLIs/SLOs) and error budgets for critical services, with a focus on network reliability and data centre interconnects.
  • Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation), extending coverage to network telemetry such as packet loss, jitter, bandwidth utilization, and BGP/EVPN stability.
  • Operate and tune the observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low-latency telemetry ingestion and alerting for networking as well as compute layers.
  • Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions, particularly for network-related outages, congestion, or misconfigurations.
  • Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks, including network monitoring and diagnostics.
  • Collaborate with Platform, Product, and Networking teams on capacity planning, performance testing, traffic engineering, and change management.
  • Improve CI/CD health checks and release safety nets within GitLab, with attention to network dependencies in deployments.
  • Contribute to Infrastructure as Code (Terraform, Ansible) for monitoring stack deployments and upgrades, including network observability tooling and configuration.

Experience / Qualifications:

  • Bachelor's Degree or higher in Computer Science, Engineering, or related technical field.
  • 5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production.
  • Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent).
  • Proven track record designing dashboards and alerts around golden signals and USE/RED methodologies, extended to network utilization, saturation, and error metrics.
  • Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines.
  • Operational experience with Kubernetes and containerized workloads.
  • Strong working knowledge of AWS services, data centre networking fundamentals, routing protocols, load balancing, and network overlays (e.g., VXLAN/EVPN).
  • Experience running incident response and writing actionable post-mortems, including for network-related events.
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management.
  • Exposure to regulated environments, multi-region networking architectures, and hybrid on-prem/cloud topologies is a plus.
  • Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, networking, application, and data layers.

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to recruiting@psiquantum.com.

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target ranges for a new hire base salary. One is for the Bay Area (within 50 miles of HQ, Palo Alto); the second (if applicable) is for elsewhere in the US (beyond 50 miles of HQ, Palo Alto). If there is only one range, it is for the specific location where the position will be based. Actual compensation may fall outside these ranges and depends on various factors, including but not limited to a candidate's qualifications (relevant education and training, competencies, and experience), geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range

$120,000 – $140,000 USD

Bay Area Pay Range

$145,000 – $165,000 USD
