Talent.com
Principal Site Reliability Engineer
QGenda • Atlanta, Georgia, United States
Posted 30 days ago
Job type: Full-time, Permanent

Who We Are

QGenda is redefining healthcare workforce management everywhere care is delivered. We're on a mission to empower the healthcare industry to better onboard, deploy, and manage its workforce. Over 4,500 healthcare organizations have trusted us to help them make strategic workforce decisions through our unified software platform. With more than 700 employees across the US, we are united in our vision and culture to make a difference for our customers, while enjoying the day-to-day.

At QGenda, we value our employees and their contributions toward the success of the business. We strive to create a dynamic work environment that fosters growth, innovation, and collaboration, where employees can be proud of the work they do and the impact it has on the healthcare industry.

QGenda is headquartered in Atlanta.

To learn more about QGenda, visit us at qgenda.com or follow us on Instagram or LinkedIn.

About Your Role

As a Principal Site Reliability Engineer, you will work with our Infrastructure and Product Development Teams to design, operate, and scale highly available services on AWS. You’ll lead automation and infrastructure-as-code efforts to eliminate toil, standardize configuration, and expand observability across metrics, logs, and traces. You will evaluate and introduce AWS services and tooling that improve reliability, performance, and developer velocity. This role offers the opportunity to shape our reliability roadmap and make a measurable impact on the resilience and evolution of our technology stack.

How You’ll Make an Impact

System Reliability and Performance:

  • Design, implement, and manage scalable systems that ensure high availability, fault tolerance, and optimal performance.
  • Continuously monitor and enhance system health and performance through data analysis and metrics.
  • Embed observability (metrics, logs, traces, alerts) with actionable thresholds and up-to-date runbooks.

Automation and Tooling:

  • Eliminate toil by building automation and self-service tools for common operational workflows.
  • Own CI/CD pipelines (build, test, security scans) and enable progressive delivery (blue/green, canary).
  • Manage infrastructure as code via Terraform and configuration management with Git-backed workflows.

Incident Management and Troubleshooting:

  • Participate in on-call; triage, mitigate, and resolve incidents within defined SLAs.
  • Lead incident response and blameless post-incident reviews; document RCAs and drive corrective actions to closure.
  • Maintain runbooks/playbooks and regularly perform disaster recovery scenarios.

Infrastructure Management:

  • Operate and secure AWS environments (IAM, VPC, EC2/ECS, RDS, S3, Lambda, etc.) with a focus on resilience and compliance.
  • Optimize cost, performance, and reliability (rightsizing, autoscaling, reservations/savings plans, tagging, spend monitoring, etc.).

Collaboration & Culture:

  • Serve as a technical advisor to engineering teams on infrastructure and operations best practices.
  • Mentor peers on SRE practices; promote observability, continuous improvement, and a blameless culture.
  • Contribute to roadmaps and capacity planning to align reliability goals with product objectives.

Who You Are

  • Availability for off-hours deployments and upgrades of production systems during release and maintenance windows. This is a rotation in which you would be on for two weeks at a time.
  • Strong problem-solving skills and ability to work effectively under pressure.
  • Excellent communication skills for cross-functional collaboration as well as documentation creation.

Experience You Bring

  • B.S. in Computer Science, Computer Information Systems, or Computer Engineering from a major U.S. university or equivalent industry experience
  • 8+ years of experience as a DevOps, SRE or Systems Engineer
  • Advanced proficiency with at least one scripting or programming language
  • Experience with Docker and container orchestration tools such as AWS ECS
  • Hands-on experience building infrastructure and supporting applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and ElastiCache
  • Experience with logging, creating dashboards, and alerts using observability tools such as Datadog and Amazon CloudWatch
  • Strong understanding of networking and DNS
  • Familiarity with configuration management and infrastructure as code (IaC) tools such as Terraform
  • Firm understanding and experience with Agile and Scrum SDLC processes
  • Experience using a distributed version control system (Git preferred) for code check-ins, branching, merging, pull requests, code review, etc.
  • Knowledge of CI/CD best practices and tools such as AWS CodeBuild, Jenkins, and/or TeamCity
  • Experience designing and delivering secure, high-performance, and highly available cloud services

Not Required, But Nice to Have

  • Experience with automation tools related to MLOps or AIOps such as AWS Bedrock and/or SageMaker.

Applicants for this position must be authorized to work for any employer in the United States (U.S.) and must be located in the U.S. We are unable to sponsor, take over sponsorship of, or hire candidates with an employment visa at this time.

What’s In It For You

We offer a comprehensive total rewards package to support our full-time employees and their families’ day-to-day needs, well-being, and major life events, which includes:

  • Fully company-paid options for medical (both in-person and virtual), dental and vision insurance
  • Generous paid time off (PTO) policy to enjoy periods of uninterrupted rest and relaxation for a healthy work/life balance
  • Paid parental leave for birth, adoption or permanent placement
  • 401(k) with company match
  • Options to work in a hybrid-working model or remotely from home, depending on the position
  • Annual Costco membership, cell phone stipend, commuter benefits, in-office perks and more

QGenda delivers technology solutions to improve how healthcare is delivered and increase access for everyone. We can only succeed by bringing together diverse minds, thoughts, ideas, and team members to create better solutions for our customers and make us a better company as a whole. We are committed to creating a culture that embraces diversity, inclusion, and equity for all.

QGenda is an Equal Employment Opportunity employer and makes all employment decisions without regard to race, color, religion, creed, gender, sex (including pregnancy), sexual orientation, gender identity or expression, national origin, ancestry, age, marital status, disability or genetic information, military status, status as a disabled or protected veteran, or any other protected status under applicable law.

If you require accommodations or assistance to complete the online application process, please contact recruiting@qgenda.com and identify the type of accommodation or assistance you are requesting. Do not include any medical or health information in this email. We will respond to your email promptly.
