Talent.com
Senior Data Engineer, Data Curation
Formation Bio • San Francisco, California, United States
Posted 30 days ago
Job type: Full-time
Job description

About Formation Bio

Formation Bio is a tech- and AI-driven pharma company differentiated by radically more efficient drug development.

Advancements in AI and drug discovery are creating more candidate drugs than the industry can progress because of the high cost and time of clinical trials. Recognizing that this development bottleneck may ultimately limit the number of new medicines that can reach patients, Formation Bio, founded in 2016 as TrialSpark Inc., has built technology platforms, processes, and capabilities to accelerate all aspects of drug development and clinical trials. Formation Bio partners, acquires, or in-licenses drugs from pharma companies, research organizations, and biotechs to develop programs past clinical proof of concept and beyond, ultimately helping to bring new medicines to patients. The company is backed by investors across pharma and tech, including a16z, Sequoia, Sanofi, Thrive Capital, Sam Altman, John Doerr, Spark Capital, SV Angel Growth, and others.

You can read more at the following links:

  • Our Vision for AI in Pharma
  • Our Current Drug Portfolio
  • Our Technology & Platform

At Formation Bio, our values are the driving force behind our mission to revolutionize the pharma industry. Every team and individual at the company shares these same values, and every team and individual plays a key part in our mission to bring new treatments to patients faster and more efficiently.

About the Position

As a Senior Data Engineer at Formation Bio, you will focus on building the semantic layer that makes diverse data pillars interoperable, consistent, and actionable. You’ll work across healthcare (EHR, claims, real-world data), commercial/pharma (pricing, formulary, market data), biomedical (scientific and trial data), and finance (operational and business datasets) to design models that unify disparate sources into a common language for analytics, decision-making, and AI applications.

While ingestion pipelines are part of the work, your primary responsibility will be transforming both structured and unstructured data into scalable, ontology-driven data models that teams can trust and reuse. This includes everything from traditional relational datasets to text-heavy unstructured sources that feed NLP, embeddings, and semantic search.

This role requires partnering closely with engineers, analysts, data scientists, and business stakeholders to ensure every data pillar is represented in a robust semantic foundation that supports today’s needs and tomorrow’s AI-native platforms.

Responsibilities

  • Semantic Modeling & Ontologies: Build and maintain SQL/dbt models that unify datasets across healthcare, commercial/pharma, biomedical, and finance domains, leveraging ontologies (e.g., SNOMED CT, ICD, RxNorm, HL7 FHIR, OMOP).
  • Structured + Unstructured Data Integration: Design models that handle not only structured datasets but also unstructured data sources (e.g., documents, free text, biomedical literature), preparing them for AI-driven applications.
  • Data Layer Architecture: Own and evolve the semantic layer that transforms raw data into consistent, reusable models powering analytics and advanced AI.
  • Ingestion & Integration: Contribute to pipelines that bring in data from APIs, partner feeds, flat files, and unstructured text, ensuring inputs are reliable, well-documented, and metadata-rich.
  • Data Quality & FAIR Principles: Apply FAIR principles to ensure data is traceable, interoperable, and reusable across structured and unstructured domains.
  • Cross-functional Collaboration: Partner with commercial, scientific, finance, and healthcare stakeholders to align semantic models with real-world use cases.
  • Enablement & Documentation: Document data standards and reusable modeling patterns to empower downstream teams and reduce cognitive load.
  • Future-Proofing: Anticipate how today’s semantic modeling will support tomorrow’s AI workflows such as NLP, embeddings, knowledge graphs, and retrieval-augmented generation.

About You

Required Experience:

  • 5+ years of experience as a Data Engineer, Analytics Engineer, or similar role in healthcare, pharma, biotech, finance, or other highly regulated industries.
  • Deep expertise in at least one data domain (e.g., healthcare/EHR/claims, commercial/pharma, biomedical/scientific, or finance), with a track record of translating complex, domain-specific datasets into consistent and usable models.
  • Strong SQL and data modeling skills, with proven experience designing semantic or analytical layers.
  • Exposure to additional domains beyond your core area of expertise, and the ability to learn and adapt to new datasets quickly.
  • Experience working with both structured data (e.g., relational tables, APIs) and unstructured data (e.g., documents, free text, biomedical literature, healthcare notes).
  • Familiarity with healthcare/life sciences ontologies (SNOMED CT, ICD, RxNorm, LOINC, HL7 FHIR, OMOP, Mondo) and/or financial/commercial taxonomies.

Preferred Experience (Valued but Not Required):

  • Hands-on experience with Snowflake, dbt, Dagster, and modern data stacks.
  • Experience with unstructured data workflows (NLP, embeddings, semantic search, knowledge graphs).
  • Understanding of regulatory and compliance considerations in healthcare, pharma, or finance.
  • Practical use of metadata management and data catalog platforms.
  • Hands-on experience structuring dbt projects with testing, quality checks, and reusable design patterns.

Key Attributes:

  • Curious & Investigative – Always looking deeper into how and why datasets work the way they do.
  • Structured & Methodical – Brings rigor to semantic modeling, ontology mapping, and data quality management.
  • Collaborative Partner – Works seamlessly across pillars, enabling others while owning core responsibilities.
  • Adaptable – Leverages deep domain expertise while learning quickly in unfamiliar data areas.
  • Enablement-Minded – Strives to reduce complexity for downstream users by standardizing and documenting.
  • Future-Oriented – Builds today’s models with tomorrow’s AI-native and data-driven applications in mind.

Formation Bio is prioritizing hiring in key hubs, primarily the New York City and Boston metro areas, with additional growth in the Research Triangle (NC) and San Francisco Bay Area. Please only apply if you reside in these locations or are willing to relocate.

Compensation:

The target salary range for this role is $180,000 - $230,000.

Salary ranges are informed by a number of factors including geographic location. The range provided includes base salary only. In addition to base salary, we offer equity, comprehensive benefits, generous perks, hybrid flexibility, and more. If this range doesn't match your expectations, please still apply because we may have something else for you.

You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

