Data Engineer

Institute Of Foundation Models • Sunnyvale, California, United States
Posted 30 days ago
Job type: Full-time

Job description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

As a Data Engineer specializing in Natural Language Processing (NLP) and large-scale data processing, you will quickly and effectively gather, curate, and prepare high-quality datasets to support cutting-edge NLP research. Your role will be instrumental in enabling researchers by delivering essential data through efficient and scalable engineering practices, including web crawling, LLM-generated content refinement, and robust data pipelines, primarily leveraging Python and related technologies.

Key Responsibilities

  • Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines (typically within 1-2 days).
  • Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
  • Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
  • Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
  • Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
  • Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
  • Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
  • Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.
  • Perform all other duties as reasonably directed by the line manager commensurate with these functional objectives.

Academic Qualifications

  • Bachelor's degree in Computer Science, Data Science, Engineering, or a related technical field required.
  • Master’s degree or equivalent experience in Computer Science, Data Engineering, or related technical fields preferred.

Professional Experience - Required

  • Extensive experience in data engineering, data processing, and automation using Python.
  • Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
  • Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
  • Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes).
  • Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
  • Strong communication and collaboration skills with cross-functional teams.

Professional Experience - Preferred

  • Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
  • Experience with refining outputs from large-scale AI models, such as LLM-generated data.
  • Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
  • Familiarity with the latest advancements in NLP data processing and large language model technologies.

$100,000 - $500,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability