Talent.com
AI Model Evaluation Specialist
Inizio Partners • New York, NY, United States
This job is no longer accepting applications.
Posted 30 days ago
Job type: Full-time

Job description

About the job AI Model Evaluation Specialist

Key Responsibilities:

  • Perform scoring and qualitative evaluations of LLM-generated responses across multiple use cases.
  • Develop and maintain scoring guidelines and rubrics to ensure consistency and objectivity.
  • Collaborate with data scientists, product managers, and engineering teams to align scoring with project goals.
  • Assist in the creation and labeling of high-quality evaluation datasets for prompt tuning or model fine-tuning.
  • Utilize NLP-based metrics and tools (e.g., ROUGE, BLEU, cosine similarity) for automated scoring support.
  • Document scoring patterns, common model errors, and improvement opportunities.
  • Contribute to prompt experimentation and help compare the effectiveness of different prompt strategies.

Qualifications:

  • Prior experience with LLMs (e.g., GPT, Claude, LLaMA, etc.) or AI / NLP projects is highly preferred.
  • Strong analytical skills and attention to detail, especially in assessing language quality.
  • Familiarity with prompt engineering, generative AI, or conversational AI tools is a plus.
  • Hands-on experience with Python, Jupyter, or evaluation libraries (optional but desirable).
  • Experience working with evaluation frameworks or annotation tools (Label Studio, Prodigy, etc.) is a bonus.
  • Excellent written and verbal communication skills.

    Related jobs

    Senior Software Engineer, AI Evaluation Infra
    nTop • New York, NY, United States
    Full-time
    nTop is pioneering the future of engineering design with advanced software that pushes the boundaries of p...

    AI Agent Evaluation Analyst (Freelance)
    Mindrift • New York, NY, US
    Remote
    Part-time +1
    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of En...

    Director Data Modeling, Measurement Innovation
    People Inc • New York City, New York, USA
    Full-time
    Dotdash Meredith is seeking a technically strong Director of Data Modeling to lead advanced analytical initiatives within our Measurement team. You will...

    AI Project Lead: Creative Benchmark & Evaluation
    Contra • New York, NY, United States
    Full-time
    A leading software development company in New York is seeking an AI Project Lead to design and manage the Human Creativity Benchmark. This mid-senior level role focuses on AI evaluation and requires...

    AI Innovation Content Lead
    Latham & Watkins LLP • New York City, New York, USA
    Full-time
    Latham & Watkins is a global law firm consistently ranked among the top firms in the world. The success of our firm is largely determined by our commitment to hire and develop the very best and ...

    Machine Learning Engineer - Model Evaluations, Public Sector
    Scale AI, Inc. • New York, NY, United States
    Full-time
    The Public Sector ML team at Scale deploys advanced AI systems, including LLMs, agentic models, and multimodal pipelines, into mission-cri...

    Senior Software Engineer, AI Evaluation Infra
    nTopology inc. • New York, NY, United States
    Full-time
    With a focus on Aerospace & Defense where programs face an impossible reality: deliver next-gen aircraft faster, with fewer experts, and zero tolerance for failure. nTop changes how aircraft get desi...

    AIML - ML Researcher, Foundation Models
    Apple • New York, NY, United States
    Full-time
    We are a group of engineers and researchers responsible for building foundation models at Apple. We build infrastructure, datasets, and models with fundamental general capabilities such as understan...

    Remote Cinematic Video Evaluator - AI Trainer ($45-$45 per hour)
    Mercor • Clifton, New Jersey, US
    Remote
    Full-time
    Overview: Mercor is seeking highly discerning video evaluators. Specifically: artistic professionals such as video editors, motion graphics designers, producers, animators, cinematographer a...

    Artificial Intelligence (AI) Engagement Lead
    JPMorganChase • Jersey, New Jersey, USA
    Full-time
    If you're passionate about translating complex ideas into engaging content and driving the future of Artificial Intelligence (AI) this is your opportunity to shine. As the Artificial Intelligence (AI...

    Senior Specialist, Technical Evaluations & Proposals
    Resilience • New York, NY, United States
    Full-time
    A career at Resilience is more than just a job - it's an opportunity to change the future. Resilience is a technology-focused biomanufacturing company that's. We're building a sustainable network of ...

    Applied AI Research Engineer
    Norm Ai • New York, New York, United States
    Full-time
    Norm Ai is the Compliance AI Platform for legal standards-based reasoning & workflow automation. We developed the first Domain Specific Language (DSL) for fully representing regulatory requirements ...

    Data Modeling Specialist
    Morgan Stanley • New York City, New York, USA
    Full-time
    We're seeking someone to join our team as a Data Modeling Specialist in NFR Data & Analytics to help execute on our data centric strategy. In the Legal & Compliance division we assist the Fir...

    UX Researcher, AI & LLM (Drug Discovery)
    SQA Solution • New York, NY, United States
    Full-time
    New York, NY (Manhattan, onsite). Please note that at this time we are unable to sponsor employment authorization (both new and transfer). A lea...

    Machine Learning Systems Engineer - Data & Evaluation, Horizons
    Anthropic • New York, New York, United States
    Full-time
    Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group ...

    Machine Learning Engineer - Model Evaluations, Public Sector
    Scale AI • New York, NY, United States
    Full-time
    St. Louis, MO; New York, NY; Washington, DC. The Public Sector ML team at Scale deploys advanced AI systems, including LLMs, agentic models, a...

    Product and Transformation – Applied AI ML Lead
    J.P. Morgan • New York, New York, US
    Full-time
    Join a team of analytics professionals focused on applied AI and quantitative modeling within Consumer and Community Banking that answers complex and unique questions, utilizing cutting edge quanti...

    Community ML Research Engineer, non-AI scientific fields - US Remote
    Hugging Face • New York, New York, United States
    Remote
    Full-time
    At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest growing platform for AI builders with over 5 million users & 100k organizations who collectively shared over 1...