A company is looking for a Member of Technical Staff, Model Efficiency.
Key Responsibilities
Improve core performance metrics of ML systems by analyzing model execution and identifying bottlenecks
Collaborate with modeling and systems teams to design experiments, measure results, and implement optimizations that improve inference efficiency
Develop advanced performance techniques, including GPU / CUDA optimizations and model execution strategies for large-scale architectures
Required Qualifications
5+ years of experience writing high-performance, production-quality code
Strong programming skills in C++ or Python (Rust / Go also welcome)
Experience with large language models and the LLM inference ecosystem
Ability to diagnose and resolve performance bottlenecks across the model execution stack
A strong bias for action: shipping quickly, measuring impact, and iterating
Engineer • Saint Paul, Minnesota, United States