Talent.com
Senior Site Reliability Engineer, Healthcare Cloud Infrastructure and Networking

Collective Health • San Francisco, California, USA

At Collective Health, we're transforming how employers and their people engage with their health benefits by seamlessly integrating cutting-edge technology, compassionate service, and world-class user experience design.

As a Sr. Site Reliability Engineer, you will be a key player in designing, building, and maintaining the cloud infrastructure that powers our healthcare applications. You will blend software engineering with systems and network administration expertise to solve complex multi-cloud connectivity and operational challenges. Your work will directly impact patient care by ensuring our services are always available and performant. You will be responsible for the availability, latency, performance, efficiency, monitoring, and emergency response of our production environment, with a special focus on meeting stringent healthcare compliance standards like HIPAA, SOC 2, and HITRUST.

What you'll do:

Cloud Infrastructure Management:

  • Design, deploy, and manage scalable, secure, and highly available infrastructure on multi-cloud platforms: AWS and GCP.
  • Implement and manage Infrastructure as Code (IaC) using tools like Terraform and Ansible to automate provisioning and configuration.
  • Manage containerized applications and orchestration platforms, primarily Kubernetes and Docker.

Cloud Networking Engineering:

  • Lead the architecture, implementation, and maintenance of secure cloud connectivity solutions to ensure compliant and high-throughput data exchange with external healthcare partners.
  • Design, implement, and maintain highly available and secure cloud network topologies (e.g., VPCs, subnets, routing tables, and peering) across multiple regions and multiple cloud technologies.
  • Expertly configure and manage cloud load balancing (e.g., ALB, NLB, GCLB) and DNS services (e.g., Route 53, Cloud DNS) for optimal traffic distribution and low latency.
  • Own the end-to-end lifecycle of TLS/SSL termination, key rotation, and certificate management across all load balancers to enforce stringent security postures.
  • Design and enforce network segmentation and Zero Trust Architecture principles at the network layer to secure Protected Health Information (PHI).
  • Perform network performance analysis and troubleshooting for latency, throughput, and connectivity issues, specifically within the cloud provider's network infrastructure.

Site Reliability & Automation:

  • Develop and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to maintain and improve system reliability.
  • Automate manual operational tasks, from deployments and scaling to incident response and recovery.
  • Conduct blameless post-mortems and root cause analyses to prevent recurrence of incidents.
  • Participate in an on-call rotation to respond to production issues and drive them to resolution.

Monitoring & Application Support:

  • Build and maintain robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, or the ELK stack.
  • Work closely with software development teams to improve the reliability and performance of applications.
  • Manage CI/CD pipelines to ensure safe, automated, and efficient software releases.
  • Ensure all systems and processes are compliant with healthcare regulations and security best practices (HIPAA, SOC 2).

To be successful in this role, you'll need:

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 8 years of experience in a Network Engineering or Cloud Engineering role.
  • Strong proficiency with AWS and GCP cloud providers.
  • Deep expertise in network observability tools and designing systems for continuous network compliance auditing against HIPAA/HITRUST standards.
  • Expert-level proficiency in VPC design, advanced routing protocols, IP Address Management (IPAM), and Container Network Interface (CNI) configuration.
  • Hands-on experience with containerization and orchestration technologies such as Kubernetes and Docker.
  • Solid experience with Infrastructure as Code tools such as Terraform and Ansible.
  • Proficiency in scripting and/or programming languages such as Python, Go, and Bash.
  • Experience in capacity planning, cost analysis, and justification for the architecture and design proposals.
  • Experience working in a regulated industry (e.g., healthcare, finance) with a strong understanding of security and compliance requirements.

Preferred Qualifications

  • Multi-cloud & multi-site networking experience with AWS and GCP.
  • Experience with Service Mesh technologies (e.g., Istio, Linkerd) to manage and secure inter-service communication.
  • Relevant certifications: AWS Certified Advanced Networking or GCP Professional Cloud Network Engineer.

Pay Transparency Statement

This is a hybrid position based out of one of our offices: San Francisco, CA; Lehi, UT; or Plano, TX. Hybrid employees are expected to be in the office two days per week.

The actual pay rate offered within the range will depend on factors including geographic location, qualifications, experience, and internal equity. In addition to the salary, you will be eligible for stock options and benefits like health insurance, 401k, and paid time off. Learn more about our benefits at .

San Francisco, CA Pay Range: $168,000 to $210,000 USD

Lehi, UT Pay Range: $134,500 to $168,000 USD

Plano, TX Pay Range: $147,800 to $185,500 USD

Why Join Us

  • Mission-driven culture that values innovation, collaboration, and a commitment to excellence in healthcare
  • Impactful projects that shape the future of our organization
  • Opportunities for professional development through internal mobility opportunities, mentorship programs, and courses tailored to your interests
  • Flexible work arrangements and a supportive work-life balance

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Collective Health is committed to providing support to candidates who require reasonable accommodation during the interview process. If you need assistance, please contact .

Privacy Notice

For more information about why we need your data and how we use it, please see our privacy policy:

Experience: Senior IC

Key Skills

Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting

Employment Type: Full Time

Experience: years

Vacancy: 1
