Talent.com
Senior Site Reliability Engineer

GOVX • California, United States
Posted 30 days ago
Job type
  • Full-time
  • Remote
Job description

GOVX is seeking an experienced Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems through automation, observability, and operational excellence. This position is remote but must be located in one of the following states: California, Washington, Texas, Tennessee, Florida, Colorado, or New York.

The Senior Site Reliability Engineer (SRE) plays a key role in maintaining resilient infrastructure, monitoring critical services, and improving deployment and recovery processes across environments. The Senior Site Reliability Engineer works under the direction of the Director of Engineering and collaborates closely with Site Reliability Engineers, Automation Engineers, and other members of the engineering organization.

This position will report to the Director of Engineering.

Responsibilities

  • Maintain scalable, secure, and reliable cloud services that operate within Service Level Objectives.
  • Implement and manage monitoring, alerting, and observability systems using Prometheus, Grafana, and Azure Monitor to proactively identify and resolve issues.
  • Develop and maintain automation scripts and tools in PowerShell, Bash, and C# to improve deployment efficiency, system reliability, and developer productivity.
  • Create, refine, and maintain detailed runbooks for production systems to ensure consistent operational procedures and effective incident response.
  • Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to measure and maintain system reliability.
  • Collaborate with software engineers and automation engineers to integrate reliability practices into CI/CD pipelines using Azure DevOps.
  • Design and implement intelligent alerting strategies that ensure high signal-to-noise ratios and enable rapid triage of critical issues.
  • Participate in incident response, post-incident reviews, and blameless root cause analysis to drive continuous improvement of system reliability and uptime.
  • Contribute to deployment strategy evolution, including blue-green and canary deployments, to minimize downtime and release risk.
  • Collaborate closely with Automation Engineers to enhance automated validation and testing of production environments.
  • Monitor system health, capacity, and performance, providing data-driven insights and recommendations for optimization.
  • Conduct chaos engineering experiments and resilience testing to proactively identify and address system weaknesses.
  • Develop and maintain disaster recovery and business continuity plans, including regular failover testing.
  • Participate in the on-call rotation for platform services, ensuring high availability and rapid incident resolution.
  • Proactively monitor and respond to production support tickets and alerts within established SLA timeframes, delivering first-level diagnosis, troubleshooting, and escalation as needed to maintain system reliability.
  • Continuously improve incident response playbooks and reduce Mean Time to Recovery (MTTR).
  • Participate in sprint planning, stand-ups, and retrospectives to ensure alignment with development and operational objectives.
  • Identify opportunities to improve resiliency, reduce toil, and strengthen the reliability culture across the engineering organization.
  • Collaborate with security and compliance teams to ensure infrastructure meets regulatory and security standards.
  • Support cost optimization efforts by monitoring cloud resource usage and recommending efficiency improvements.
  • Explore and integrate AI/ML-based observability tools for predictive monitoring and anomaly detection.

Required Education and Experience

  • 8+ years of professional experience in site reliability, infrastructure, or systems engineering roles.
  • Proficiency with Azure cloud infrastructure, services, and resource management.
  • Experience with operating systems, network concepts, protocols, and architecture, including Microsoft and Linux operating systems, Active Directory, and the OSI model.
  • Technical ability in Node.js and .NET/C#, and knowledge of both current and legacy architecture, software development practices, and conventions.
  • Strong experience with REST APIs.
  • Hands-on experience with containerization and orchestration using Kubernetes and microservices architecture.
  • Strong automation and scripting skills in PowerShell and Bash.
  • Experience with Infrastructure as Code tools for provisioning and configuration management.
  • Deep understanding of CI/CD processes and tools, preferably using Azure DevOps.
  • Experience implementing and managing observability solutions, including Azure Monitor, Application Insights, Log Analytics workspaces, Prometheus, and Grafana.
  • Strong problem-solving, analytical, and troubleshooting abilities in distributed systems and cloud environments.
  • Ability to write, maintain, and execute operational runbooks and automation for incident management and recovery.
  • Ability to work in a self-directed manner and to plan and execute projects involving multiple technical resources and stakeholders.
  • Excellent communication and collaboration skills, with the ability to work across software development, infrastructure, and operations teams.

Preferred Education and Experience

  • Bachelor’s degree in Computer Science, Engineering, or related technical field.
  • Experience working in Agile/Scrum delivery environments.
  • Experience supporting .NET applications and microservices in a production environment.
  • Experience supporting SQL Server and Cosmos DB applications in production environments.
  • Knowledge of network fundamentals, load balancing, and high-availability architectures.

Supervisory Responsibility

This position does not include supervisory responsibilities but provides mentorship and technical guidance to the Site Reliability team members.

Travel Requirements

Yearly travel to the San Diego office headquarters is expected for this position.

Work Environment

This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, photocopiers, filing cabinets, and fax machines. This role occasionally must lift and carry office equipment.

Physical / Mental Demands

  • Physical – This is largely a sedentary role.
  • Mental – Problem-solving, making decisions, interpreting data, organizing, reading / writing.
  • Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.

Work Location

Due to state law and tax implications, remote work candidates must live and work in one of the following states: California, Washington, Texas, Tennessee, Florida, Colorado, or New York.

Benefits

  • Paid Time Off, Paid Sick Leave, Paid Holidays
  • Competitive Medical, Dental, Vision, and Life Insurance
  • 401(k) plan with discretionary match available
  • Flexible Spending Account (FSA), Health Savings Account (HSA)
  • Voluntary benefits including Critical Illness, Group Accident, and Voluntary Life
  • Employee Referral Program
  • Exposure to a growing ecommerce company
  • Discounts on the GOVX website

Salary Range

$165,000 - $175,000 annually

AAP / EEO Statement

EOE. Veterans / Disabled. Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.

Position will require successful completion of a background check and drug testing prior to starting employment.

About GOVX, Inc.

Savings for Those Who Serve

GOVX was founded in 2011 to offer exclusive benefits to those who serve our country. The GOVX membership is comprised of current and former members of the United States military, law enforcement, firefighting, medical services, and government personnel. We are dedicated to supporting these communities and to offering unique value to our members, while delivering an authentic platform for brands to reach our growing customer base. As the largest and fastest-growing digital platform serving this deserving audience, we are committed to stretching the limits of ecommerce to deliver the best assortment for our members’ on-duty and off-duty needs.
