Talent.com
AI Engineer - Site Reliability Researcher
Traversal • New York, NY, United States
Posted 30 days ago
Job type
  • Full-time
Job description

About Traversal

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise, already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work.

Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is assembling the most talented yet nicest group of individuals, from researchers at MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Cerebras Systems, Glean, Nuro, Perplexity, Pinecone, and more), to take on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.

The Role

As an AI Site Reliability Researcher, you’ll play a central role in ensuring the scalability, reliability, and observability of our AI platform. This is a high-impact, cross-functional role where you’ll design systems and processes that keep our AI-driven infrastructure healthy and performant.

We’re entering a phase of rapid growth and scale, driven by the needs of large enterprise customers. That means pressure on everything from deployments to developer workflows. We’re building our own distributed systems, maturing our CI/CD pipelines, and managing complex hybrid environments (SaaS and on-prem). You’ll play a foundational role in establishing the SRE practices that allow us to scale thoughtfully and reliably.

In this role, you’ll define how we do change management across diverse deployment environments, build internal observability from the ground up, and help bring structure to systems that are evolving quickly. You’ll also be a hands-on user of Traversal — your feedback will shape the product directly. And while your focus will be reliability, you’ll collaborate closely with our infra and AI agent teams, with opportunities to influence how AI integrates with real-world production environments.

Responsibilities

  • Brains of the Product: Distill SRE knowledge into agentic workflows.
  • System Design & Architecture: Build scalable and resilient infrastructure to support AI observability agents in both cloud and on-prem environments.
  • Observability: Build systems to monitor logs, metrics, and traces tied to deployments and developer activity. Be a power user of observability tools.
  • Incident Management: Define and lead our on-call and incident response processes, including alerting, debugging, and postmortems.
  • CI/CD & Deployment: Design and scale our in-house CI/CD systems to support safe, efficient rollouts across hybrid environments.
  • Infrastructure Automation: Own our infrastructure-as-code stack and improve automation across deployment and provisioning workflows.

Requirements

  • Experience as an SRE, infrastructure engineer or similar role in fast-paced environments.
  • Exceptional debugging skills across complex, distributed systems — proven ability to get to root cause quickly across varied tech stacks.
  • Strong systems design intuition — understands how observability tools fit into architecture and how to leverage them effectively in incident response.
  • Experience with observability tools (e.g., Datadog, Grafana, Prometheus, OpenTelemetry) and incident response.
  • Deep understanding of infrastructure automation and CI/CD systems.
  • Hands-on experience with Terraform, Kubernetes, and cloud environments (AWS or GCP).
  • Ability to debug distributed systems and drive system-level improvements.
  • Experience supporting hybrid cloud/on-prem deployments and complex change management.

Nice to Have

  • Familiarity with AI infrastructure or supporting ML/LLM workloads in production.
  • Background in developer productivity tooling or internal platform teams.
  • Prior experience building systems that connect infra events to developer workflows.
  • Exposure to agentic systems or AI observability platforms.

Compensation

We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.

Why You Should Join Us

We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and we give thoughtful consideration to every hire on our small, high-impact team.

Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.

Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.
