Talent.com
Software Engineer - Pretraining Data
Magic Ai • San Francisco, California, United States
Posted: 30 days ago
Job type: Full-time
Job description

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.

About the role:

As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.

What you might work on:

Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data

Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.

Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data

Identify new data sources for inclusion in pre-training and post-training datasets

What we’re looking for:

Strong proficiency in distributed computing and parallel processing techniques

Obsession with details, reliability, and good testing to ensure data quality and integrity

Experience with designing and maintaining high-performance, scalable data architectures

Ability to design, develop and operate an LLM data pipeline from web scraping to data loading

Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.

Our culture:

Integrity. Words and actions should be aligned

Hands-on. At Magic, everyone is building

Teamwork. We move as one team, not N individuals

Focus. Safely deploy AGI. Everything else is noise

Quality. Magic should feel like magic

Compensation, benefits and perks (US):

Annual salary range: $100K - $550K

Equity is a significant part of total compensation, in addition to salary

401(k) plan with 6% salary matching

Generous health, dental and vision insurance for you and your dependents

Unlimited paid time off

Visa sponsorship and relocation stipend to bring you to SF, if possible

A small, fast-paced, highly focused team
