Site Reliability Engineer

Unstructured Technologies

San Francisco, California, United States

Posted 30 days ago

Job type: Full-time

Job description

Unstructured builds open-source and commercial tools that enable developers to preprocess and transform unstructured data (PDFs, HTML, Word docs, images, and more) for AI/ML pipelines. Our solutions power production-grade, scalable generative AI use cases at leading enterprises.

We’re a team of builders obsessed with performance, simplicity, and reliability. If you’re excited by complex systems, cutting-edge ML infrastructure, and high-impact problems, we’d love to meet you.

We’re looking for a Site Reliability Engineer to help us scale our infrastructure, automate deployments, and ensure the reliability and performance of our systems as we grow. This role is critical to the health of our platform and will work closely with Engineering, Product, and Customer teams to deliver resilient and efficient software systems for enterprise deployments.

You'll have the opportunity to work across a modern stack (Python, Kubernetes, Helm, CI/CD with GitHub Actions, etc.), influence infrastructure decisions from day one, and help shape reliability culture across the company.

This role is hybrid in San Francisco: join us in the office three days a week for deep collaboration, whiteboard sessions, and hands-on impact.

What You'll Do

Design and implement highly available, scalable, and observable systems across our platform

Automate infrastructure with tools like Terraform and Pulumi, and build reusable CI/CD pipelines

Maintain and optimize Kubernetes clusters, container orchestration, and service mesh configurations

Set up and manage monitoring and alerting for performance, reliability, and uptime (e.g., Elastic, Prometheus, Grafana, Datadog)

Improve developer velocity through tooling, automation, and infrastructure improvements

Lead or support incident response, root cause analysis, and blameless postmortems

Partner with engineering teams on production readiness, capacity planning, and rollout strategies

What We're Looking For

4+ years of experience in an SRE, DevOps, or Infrastructure Engineering role

Deep expertise in cloud platforms (AWS, GCP, or Azure)

Hands-on experience with Kubernetes, Docker, and container orchestration

Strong skills in Linux systems, networking, and scripting (e.g., Bash, Python, Go)

Proficiency with Infrastructure-as-Code (Terraform, CloudFormation, Ansible, etc.)

Familiarity with monitoring, logging, and observability practices and tools

Experience supporting production systems and operating in high-scale environments

Bonus Points

Experience with machine learning infrastructure or data pipeline systems

Exposure to serverless or event-driven architectures

Contributions to open source projects or DevOps communities

Familiarity with security best practices for cloud-native environments

Why Join Us?

Remote-first team with flexible work style and async collaboration

Opportunity to own critical infrastructure at a fast-growing company

Work on impactful problems at the intersection of data and AI

Competitive salary, equity, and benefits package

Supportive, high-performance team culture

$190,000 - $250,000 a year

This role's salary is benchmarked against San Francisco market rates to remain competitive with top-tier talent in high-cost-of-living regions. Final compensation may vary based on experience, skill set, and location.
