Talent.com
Data Engineer - Scientific Data Ingestion (San Francisco)
Mithrl • San Francisco, CA, US
Job type: Part-time

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world's first commercially available AI Co-Scientist, a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks.

We are the fastest-growing tech-bio startup in the Bay Area, with over 12X YoY revenue growth. Our platform is already being used by teams at some of the largest biotechs and big pharma across three continents to accelerate and uncover breakthroughs, from target discovery to mechanism of action.

WHAT YOU WILL DO

Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources: unprocessed Excel / CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: unit normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure semi-structured or messy tabular data: extracting metadata, inferring column roles / types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion so downstream analytics / the AI Co-Scientist always works with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have

  • 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.
  • Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).
  • Extensive experience dealing with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.
  • Comfort designing and maintaining robust ETL / ELT pipelines, ideally for scientific or lab-derived data.
  • Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.
  • Strong desire and ability to own the ingestion & normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.
  • Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

  • Familiarity with scientific data types and modalities (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).
  • Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
  • Experience with cloud infrastructure and data storage (AWS S3, data lakes / warehouses, database schemas) to support multi-tenant ingestion.
  • Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.
  • Any background in computational biology / lab-data / bioinformatics is a bonus, though not required.

WHAT YOU WILL LOVE AT MITHRL

  • Mission-driven impact : you'll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.
  • High ownership & autonomy : this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You'll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.
  • Team : Join a tight-knit, talent-dense team of engineers, scientists, and builders
  • Culture : We value consistency, clarity, and hard work. We solve hard problems through focused daily execution
  • Speed : We ship fast (2x / week) and improve continuously based on real user feedback
  • Location : Beautiful SF office with a high-energy, in-person culture
  • Benefits : Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans