Talent.com
Infrastructure Engineering - Traffic

xAI • San Francisco, CA, United States
Posted: 30 days ago
Job type: Full-time

Job description

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

In this role, you will be a key contributor to xAI’s Supercomputing team, focusing on building and optimizing scalable, high-performance traffic platforms that power our production inference engines. You will work on critical systems that manage traffic flow, service discovery, and network reliability across both on-premise and cloud-based Kubernetes clusters. Collaborating closely with Network Fabric Engineers and other technical teams, you will drive projects that enhance the stability and efficiency of our AI infrastructure, including support for large-scale training runs for advanced models like Grok 4 and beyond. This role demands deep technical expertise in Kubernetes, L4/L7 proxies such as Envoy, and service discovery systems, along with a proactive approach to debugging and optimizing complex network performance issues from L3 to L7.

What you’ll do

  • Build and optimize traffic platforms that automate and simplify the lifecycle of production inference engines across dozens of on-premise and cloud clusters, managing core traffic primitives such as load balancing, routing, overload control, authentication/authorization, and encryption in transit.
  • Manage, extend, and optimize xAI’s production inference capabilities with L4/L7 proxies such as Envoy and NGINX.
  • Manage and extend xAI’s Service Discovery systems, both in and outside of Kubernetes (DNS, xDS control planes).
  • Collaborate with Network Fabric Engineers to improve host networking and fabric stability for large-scale training runs (e.g., Grok 4 and beyond).
  • Work with a fast, small technical team to execute projects in the critical path of xAI.

What we’d like to see

  • 2+ years of experience operating Kubernetes clusters, or experience writing and deploying controllers.
  • 2+ years of experience configuring and deploying Envoy, NGINX, HAProxy, or some other L7 software load balancer.
  • 1+ years of experience deploying and configuring Kubernetes CNI plugins (Calico, Cilium, Flannel) or experience with IPAM.
  • 1+ years of experience with DNS systems (e.g., CoreDNS, Unbound) or service discovery control planes (xDS).
  • 1+ years of experience with cloud networking primitives (VPC Route Tables, Cloud NAT, Peering/Transit Gateways, CDN, Cloudflare Workers, or equivalent).
  • Experience with host level network proxies (iptables, nftables, IPVS, eBPF programs) is a plus.
  • Deep experience with gRPC client libraries (grpcio, grpc-go, grpc-java) is a plus.
  • Experience with service mesh (Istio, Linkerd) is a plus.
  • Demonstrated experience working with Kubernetes and Envoy internals. For example: can you tell us how k8s cached clients work? Can you tell us how Envoy scales and manages state?
  • Demonstrated experience debugging performance and reliability issues that span L3 to L7. For example: how would a gRPC client in a cloud environment call a gRPC server running on an on-prem server? Describe the entire network path and any issues to watch out for, including service discovery/DNS, gRPC channel management, egress proxies, VPC routing, peering/PNI, edge caching/CDN, L4 load-balancing devices, host networking and virtualization, k8s networking, L7 routing, TLS/authnz, and TCP/IP.

Location

This role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.

Envoy/xDS

Golang and Rust

Interview Process

Application Review: Submit your CV and a statement of exceptional work. Our team will review your application to assess fit.

Phone Interview (45 minutes): A brief conversation with a team member to discuss your background, key accomplishments, and motivation.

Main Interview Process

  • 2 Coding Assessments: Solve problems in a language of your choice.
  • Systems Hands-On: Demonstrate practical skills in a live problem-solving session.
  • Project Deep-Dive: Present your past exceptional work to a small audience.

Annual Salary Range

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

Note

We welcome a variety of formats, such as public writings, presentations, or publications. Submission is optional but highly encouraged.
