Supercomputing Engineer

The San Francisco Compute Company • San Francisco, CA, United States
Job type: Full-time

About

Compute is a commodity. We think people should buy it like one.

Startups shouldn’t be forced to buy a year’s worth of compute time just to get the market rate, and compute providers shouldn’t go bankrupt because they can’t fully book their clusters.

At SF Compute, our goal is to solve this problem the same way it was solved for every other commodity: by building a venue where compute contracts are traded in real time and by bringing a new kind of participant, traders, into the supply chain.

If we succeed, buyers will be able to get a good price for any order, whether it’s 32 H100s for a month or 8,000 H100s for an hour, and sellers will instantly book out their clusters because traders will speculatively buy them for a spread. Every FLOP will flow through us somewhere in the supply chain. What Brent is for oil, we will be for compute.

About the Role

ML training clusters are some of the highest-performance computers on the planet. Even relatively small clusters would have been in the TOP500 five years ago. Our supercomputing team is responsible for keeping our compute clusters running smoothly, monitoring hardware health, participating in the on-call rotation, and fixing things when they go wrong. We believe strongly in automation: code is the only reliable way to manage hardware at scale. As we scale, this will become a more data-driven role focused on predicting failures before they happen. We’re a small team, so you’ll spend time talking to customers as well.

About You

  • You’ve managed at least one GPU training cluster in the past (ideally a cluster with >1k GPUs, but not required)

  • You deeply understand Linux, networking fundamentals, CUDA, NCCL, and InfiniBand
  • You enjoy automating hardware deployments, leveraging IaC wherever possible
  • You appreciate and value good documentation

Some Nice to Haves

  • Experience with Rust (our bare metal tooling is written in Rust)
  • Experience with Linux virtualization (KVM, QEMU, libvirt, etc.)
  • Experience with Kubernetes implementation, including CRDs and CNIs
  • Experience with HPC network architectures (eBGP, fat-tree, VXLAN, MCLAG, etc.)

Compensation

US: $170k - $300k + equity

Benefits

  • Generous equity grant: Team members are offered a competitive salary along with equity in the company.
  • Visa sponsorships: Yes, we sponsor visas and work permits.
  • Retirement matching: We match 401(k) plans up to 4%.
  • Medical, dental & vision: We offer competitive medical, dental, and vision insurance for employees and dependents and cover 100% of premiums.
  • Time off: We offer unlimited paid time off as well as 10+ observed holidays.
  • Parental leave: We offer biological, adoptive, and foster parents paid time off to spend quality time with family.
  • Daily lunch: We cover lunch daily for employees.
  • Unlimited office book budget: You can buy as many books for the office as you want.

The San Francisco Compute Company is committed to maintaining a workplace free from discrimination and harassment.

We make employment decisions based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, belief, national origin, social or ethnic origin, age, physical, mental, or sensory disability, sexual orientation, gender identity or expression, marital status, civil union or domestic partnership status, past or present military service, HIV status, family medical history or genetic information, family or parental status including pregnancy, or any other status protected by law.

We welcome the opportunity to consider qualified applicants with prior arrest or conviction records. Our commitment to diversity includes hiring talented individuals regardless of their criminal history, in accordance with local, state, and federal laws, including San Francisco’s Fair Chance Ordinance and California’s ban-the-box laws.

If you require reasonable accommodation for any reason, please reach out to us at team@sfcompute.com.
