Unstructured builds open-source and commercial tools that enable developers to preprocess and transform unstructured data (PDFs, HTML, Word docs, images, and more) for AI/ML pipelines. Our solutions power production-grade, scalable generative AI use cases at leading enterprises.
We’re a team of builders obsessed with performance, simplicity, and reliability. If you’re excited by complex systems, cutting-edge ML infrastructure, and high-impact problems, we’d love to meet you.
We’re looking for a Site Reliability Engineer to help us scale our infrastructure, automate deployments, and ensure the reliability and performance of our systems as we grow. This role is critical to the health of our platform, and you will work closely with Engineering, Product, and Customer teams to deliver resilient and efficient software systems for enterprise deployments.
You'll have the opportunity to work across a modern stack (Python, Kubernetes, Helm, CI/CD with GitHub Actions, etc.), influence infrastructure decisions from day one, and help shape reliability culture across the company.
This role is hybrid in San Francisco—join us in-office 3x a week for deep collaboration, whiteboard sessions, and hands-on impact.
What You'll Do
Design and implement highly available, scalable, and observable systems across our platform
Automate infrastructure with tools like Terraform and Pulumi, and build reusable CI/CD pipelines
Maintain and optimize Kubernetes clusters, container orchestration, and service mesh configurations
Set up and manage monitoring and alerting for performance, reliability, and uptime (e.g., Elastic, Prometheus, Grafana, Datadog)
Improve developer velocity through tooling, automation, and infrastructure improvements
Lead or support incident response, root cause analysis, and blameless postmortems
Partner with engineering teams on production readiness, capacity planning, and rollout strategies
What We're Looking For
4+ years of experience in an SRE, DevOps, or Infrastructure Engineering role
Deep expertise in cloud platforms (AWS, GCP, or Azure)
Hands-on experience with Kubernetes, Docker, and container orchestration
Strong skills in Linux systems, networking, and scripting (e.g., Bash, Python, Go)
Proficiency with Infrastructure-as-Code (Terraform, CloudFormation, Ansible, etc.)
Familiarity with monitoring, logging, and observability practices and tools
Experience supporting production systems and operating in high-scale environments
Bonus Points
Experience with machine learning infrastructure or data pipeline systems
Exposure to serverless or event-driven architectures
Contributions to open source projects or DevOps communities
Familiarity with security best practices for cloud-native environments
Why Join Us?
Remote-first team with flexible work style and async collaboration
Opportunity to own critical infrastructure at a fast-growing company
Work on impactful problems at the intersection of data and AI
Competitive salary, equity, and benefits package
Supportive, high-performance team culture
$190,000 - $250,000 a year
This role's salary is benchmarked against San Francisco market rates to remain competitive with top-tier talent in high-cost-of-living regions. Final compensation may vary based on experience, skill set, and location.
Site Reliability Engineer • San Francisco, California, United States