Talent.com

Senior Site Reliability Engineer, Arlington

Onebrief • Remote, Remote, United States
Job type: Full-time

About Onebrief

Onebrief is collaboration and AI-powered workflow software designed specifically for military staffs. By transforming this work, Onebrief makes the staff as a whole superhuman - meaning faster, smarter, and more efficient.

We take ownership, seek excellence, and play to win with the seriousness and camaraderie of an Olympic team. Onebrief operates as an all-remote company, though many of our employees work alongside our customers at military commands around the world.

Founded in 2019 by a group of experienced planners, Onebrief today has a team that spans veterans from all forces and global organizations, and technologists from leading-edge software companies. We’ve raised $123M+ from top-tier investors, including Battery Ventures, General Catalyst, Insight Partners, and Human Capital, and Onebrief is now valued at $1.1B. With this continued growth, Onebrief is able to make an impact where it matters most.

Security Clearance, Location, and Onsite Notice:

This role requires regularly working on-site at customer locations in Arlington, VA.

If you are not currently within commuting distance, you must be willing to relocate (note that Onebrief will provide relocation assistance).

Active Top Secret Clearance required with the ability to obtain SCI eligibility.

About The Role

We are hiring a Site Reliability Engineer to join our Infrastructure & Security team. You’ll work closely with fellow SREs, security, and customer success.

You will be the first line of support for our mission-critical deployments and responsible for ensuring best-in-class service quality and issue resolution. You will work in both on-premises DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation.

In addition to working on-site with customers, you will contribute directly to solutions that increase the stability, performance, and security of our deployments, and improve the overall experience of deploying and managing Onebrief on premises.

About You

You are a force multiplier who views reliability as the most critical feature of any application and/or platform and believes that "reliability beats novelty." You see infrastructure and operability as a product to be automated, documented, and continuously improved, always leaving systems easier to operate than you found them.

You are equally comfortable leading a post-incident review, designing SLOs in a system design session, or diving into a kubectl shell to triage a complex production issue. You don't just fix problems; you translate constraints and failure modes into clear, automated guardrails and scalable, resilient architecture. For you, robust monitoring, actionable alerting, and insightful runbooks are core parts of the engineering process, not afterthoughts.

You mentor others, fostering a culture of blameless postmortems and proactive reliability. You collaborate naturally with application and platform teams, helping them move quickly but safely by building the tools, processes, and observability that make "fast recovery" a reality.

What You'll Do

You'll own the reliability, scalability, and security of the production application and/or platform. You will do this by:

Building a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana). You won't just track metrics; you'll create the actionable insights and automated alerting that allow teams to identify and resolve issues before they impact users.

Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Objectives (SLOs) and increases trust internally and externally. You will be the organization's expert on what it means for our systems to be reliable and how to measure it.

Leading Incident Response: Act as the incident responder and potentially incident commander during critical incidents. You will lead blameless post-mortems/After Action Reviews (AARs) that identify true root causes and drive automated, long-term solutions to prevent recurrence.

Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code (Terraform, Ansible). You will embed security and compliance controls (RMF, STIGs) directly into this automation.

Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation. You will act as a force multiplier by advising other teams on best practices in air-gapped environments and production readiness.

What We Look For

3 years of experience in Site Reliability Engineering or a related field, with firsthand experience managing mission-critical systems within DoD’s air-gapped environments.

An active Top Secret security clearance. U.S. citizenship required.

Experience automating software delivery, deployment, and providing documentation and self-service tools for engineering teams and customers.

A strong understanding of Linux, containerization and orchestration, and virtual machines.

Experience with centralized logging, metrics, and observability using tools such as Prometheus, Loki, Grafana, ELK stack, or Datadog.

Networking fundamentals: core protocols and secure configurations.

A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement.

Clear, concise writing; strong documentation habits and async communication.

Core skills and technologies: VMware, Kubernetes, Docker, Helm, Ansible, Terraform, Linux, AWS, DoD compliance, Monitoring and Observability tools.

Bonus points (nice to have)

Experience with compliance frameworks (RMF, STIGs/SRGs, ICD 503).

Security‑minded design for air-gapped environments.

Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain a valid credential within 3 months of employment.

Notice to Third Party Recruitment Agencies

Please note that Onebrief does not accept unsolicited resumes from recruiters or employment agencies. In the absence of an executed Recruitment Services Agreement, there will be no obligation to pay any referral compensation or recruiter fee. In the event a recruiter or agency submits a resume or candidate without an agreement, Onebrief explicitly reserves the right to pursue and hire those candidate(s) without any financial obligation to the recruiter or agency. Any unsolicited resumes, including those submitted to hiring managers, shall be deemed the property of Onebrief.
