Senior AI Engineer – Computer Vision & Foundation Model Training (US)
About the Role
Location: Orlando, Florida, United States of America
Remote vs. Office: Hybrid (Remote / Office)
Company: Siemens Energy, Inc.
Organization: Grid Technologies
Business Unit: Digital Grid
Full / Part time: Full-time
Experience Level: Experienced Professional
A Snapshot of Your Day
We are seeking a highly skilled and driven Senior AI Engineer to join our team as a founding member, developing the critical data and AI infrastructure for training vision models and other foundation models for power grid applications. You will be instrumental in building and optimizing the end-to-end systems, data pipelines, and training processes that will power our AI research. Working closely with research scientists, you will translate cutting-edge research into robust, scalable, and efficient implementations, enabling the rapid development and deployment of transformational AI solutions. This role requires deep hands-on expertise in distributed training and data engineering, some MLOps experience, and a proven track record of building scalable AI infrastructure.
How You’ll Make an Impact
- Design, build, and optimize everything necessary for large-scale training and/or fine-tuning across different model architectures. This covers the full training stack, from data ingestion and preprocessing to model training and inference pipelines, with a focus on maximizing Model FLOPs Utilization (MFU) across multi-node GPU clusters.
- Collaborate closely and proactively with research scientists, translating research ideas and algorithms into high-performance, production-ready code on our infrastructure. Rapidly implement, iterate on, and test ideas from research publications or open-source codebases.
- Relentlessly profile and resolve training performance bottlenecks, optimizing every layer of the training stack from data loading to model inference for speed and efficiency.
- Contribute to technology evaluations and selection of hardware, software, and cloud services that will define our AI infrastructure platform.
- Use MLOps frameworks (MLflow, Weights & Biases, etc.) to implement best practices across the model lifecycle – development, training, validation, and monitoring – ensuring reproducibility, reliability, and continuous improvement.
- Create thorough documentation for infrastructure, data pipelines, and training procedures, ensuring maintainability and knowledge transfer within the growing AI lab.
- Stay at the forefront of advancements in large-scale AI training methods and data engineering, and proactively drive improvements and innovation in our workflows and infrastructure.
- Act as a high-agency individual, demonstrating initiative, problem-solving, and a commitment to delivering robust, high-quality code.
What You Bring
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- 3 or more years of hands-on experience in AI Engineering / Machine Learning Engineering.
- Deep practical expertise with AI frameworks (PyTorch, PyTorch Lightning, TorchTitan, etc.).
- Hands-on experience with large-scale multi-node GPU training, and other optimization strategies for developing computer vision models / other foundation models.
- Ability to scale solutions involving large datasets and complex models on distributed compute infrastructure.
- Proven history and background working with Computer Vision related tasks and projects.
- Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.
- Strong communication and teamwork skills, with a collaborative approach to working with research scientists and other engineers.
- Experience with MLOps best practices for model tracking, evaluation, and deployment.
- A track record of open-source contributions to relevant projects is a BIG PLUS.
Bonus Points
- Experience writing CUDA / Triton / CUTLASS kernels.
- Experience with performance monitoring and profiling tools for distributed training and data pipelines.
- Experience with vision foundation models or multimodal architectures.
- Publications or presentations in top-tier AI conferences (NeurIPS, CVPR, ICML, etc.) are a strong plus.