NVIDIA DGX Infrastructure Architect

SyllogisTeks, Indianapolis, IN

Posted 30 days ago

Job type: Full-time

Job Description

Candidates must be local to Indianapolis and available to work in the office 2-3 days per week.

Summary

We're seeking an experienced Infrastructure Architect to design, implement, and optimize NVIDIA DGX environments with a specialized focus on Run:ai orchestration. This role requires deep expertise in GPU-accelerated infrastructure and AI workload management to maximize resource efficiency and scalability.

Key Responsibilities

Architect DGX Solutions: Design and deploy NVIDIA DGX infrastructure. This role focuses primarily on solutions built around the DGX B300 platform, but strong experience with earlier generations, such as the DGX H100 and H200, is highly relevant and valued. A key aspect of the role is integrating these DGX solutions with Run:ai for dynamic GPU orchestration.

Run:ai Implementation: Configure and manage Run:ai’s AI-native scheduling, resource pooling, and policy engine to optimize GPU utilization across hybrid environments (on-premises, cloud, and edge).

Lifecycle Management: Oversee end-to-end AI workflows, from data preparation and model training to deployment, using Run:ai’s unified platform.

Access Control: Implement and maintain role-based access control (RBAC) using Run:ai’s predefined roles (e.g., System Admin, Department Admin) and scope-based permissions.

Performance Optimization: Monitor and tune cluster performance using Run:ai’s observability tools to maximize GPU throughput and minimize idle time.

Cross-functional Collaboration: Partner with data science and IT teams to align infrastructure capabilities with AI project requirements.

Required Qualifications

Technical Expertise:

10+ years of experience in Linux advanced compute environments.

Proficiency in NVIDIA DGX systems and Kubernetes-based orchestration.

Hands-on experience with Run:ai’s dynamic scheduling, policy engine, and KAI Scheduler.

Familiarity with hybrid and multi-cloud GPU resource management (AWS, GCP, Azure).

Operational Skills:

Ability to configure RBAC scopes (departments, projects) and workload prioritization in Run:ai.

Experience optimizing distributed AI training and inference workloads.

Proactive Outreach: Initiate and maintain contact with NVIDIA technical teams on an ongoing basis.

Clear Communication: Maintain clear, consistent communication channels for discussing bugs, technical updates, and other issues.

Certifications: NVIDIA DGX System or Run:ai certification preferred.

Preferred Experience

Deploying Run:ai in large-scale AI factories with 100+ GPUs.

Managing NVIDIA AI Enterprise software stacks.

Integrating Run:ai with MLOps pipelines for automated resource provisioning.

Familiarity with the NVIDIA Mission Control AI factory management platform, which includes NVIDIA Base Command Manager, Run:ai, and software features such as Autonomous Job Recovery, On-Demand Health Checks, and customizable dashboards.

Familiarity with SLURM for bare-metal or containerized access to the compute infrastructure.

Experience with NVIDIA Spectrum-X networking is a plus.
