Talent.com
Platform Engineer, Model Shaping
Together AI • San Francisco, CA, United States
Job type: Full-time

About Model Shaping

The Model Shaping team at Together AI works on products and research for tailoring open foundation models to downstream applications. We build services that allow machine learning developers to choose the best models for their tasks and further improve these models using domain-specific data. We also develop new methods for more efficient model training and evaluation, drawing inspiration from a broad spectrum of ideas across machine learning, natural language processing, and ML systems.

About the Role

As a Platform Engineer at Model Shaping, you will work on the foundational layers of Together's platform for model customization and evaluation. You will design the infrastructure and backend services that will allow us to sustainably and reliably scale the systems powering production workflows launched by our users, as well as internal research experiments.

You will operate in a cross-functional environment, collaborating with other engineers and researchers on the team to improve the infrastructure based on the needs of their projects. You will also work with other engineering teams at Together (such as Commerce, Data Engineering, and Cloud Infrastructure) to integrate the services developed by Model Shaping with the systems those teams build.

Responsibilities

  • Design and build Together's systems and infrastructure for model customization, including user-facing features and internal improvements
  • Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
  • Create and improve internal tooling for deployment, continuous integration, and observability
  • Build a job orchestration platform spanning multiple data centers, supporting a highly heterogeneous hardware landscape
  • Partner with teams developing internal services, co-designing these services and incorporating them in systems built by Model Shaping

Requirements

  • 3+ years of experience in building infrastructure or backend components of production services
  • Comfortable with the fundamentals of Linux environments and modern container / orchestration stacks (e.g., Docker and Kubernetes)
  • Strong software engineering background in Python or Go
  • Experienced with infrastructure automation tools (Terraform, Ansible), monitoring / observability stacks (Prometheus, Grafana), and CI / CD pipelines (GitHub Actions, ArgoCD)
  • Skilled at analyzing non-trivial issues in complex software systems and documenting your findings
  • Experience administering cloud environments (e.g., AWS / GCP / Azure), preferably in a hybrid bare-metal / cloud setup
  • Strong communication skills, willing to document systems and processes and collaborate with peers of varying technical expertise

Stand-out experience

  • Developing large-scale production systems with high reliability requirements
  • Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
  • Managing GPU workloads on HPC clusters, ideally with hands-on experience in operating NVIDIA's networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
  • Deployment of services for AI training or inference
  • Maintaining or contributing to open-source projects

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, RedPajama, SWARM Parallelism, and SpecExec. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $200,000 - $290,000. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at