GPGPU Performance Tooling Engineer

Initio Capital • Santa Clara, CA, US

Job Description

Location: Hybrid – Santa Clara, CA or New York, NY

Type: Full-Time | Salary: $150K–$300K + Competitive Equity

Visa Sponsorship: H-1B, O-1, OPT Available

About the Opportunity

Initio Capital is hiring a Performance Tooling Engineer on behalf of a stealth-stage systems company building custom RISC-V infrastructure with AI acceleration at its core. The company is led by silicon and systems veterans and backed by tier-1 investors. Their vision: deliver ultra-efficient, secure, and high-performance compute across ML, analytics, and next-gen workloads.

This role focuses on performance visibility at the lowest levels: instrumenting how deep learning workloads actually perform across simulators, FPGAs, and physical hardware.

About the Role

As a GPGPU Performance Tooling Engineer, you’ll own and extend the company’s profiling infrastructure, building low-overhead instrumentation to track performance bottlenecks and throughput gaps on GPU-like accelerators.

You’ll work hands-on with frameworks like Perfetto, contribute to open-source tooling, and collaborate closely with hardware and compiler teams to align insights with optimization strategies.

What You’ll Do

Build and extend internal performance tooling, with a focus on Perfetto-based profiling

Develop instrumentation layers for real-time and post-run analysis across simulators, emulation, FPGAs, and silicon

Analyze bottlenecks in memory bandwidth, latency, and compute throughput on custom GPGPU-like architectures

Collaborate with software, compiler, and silicon design teams to prioritize optimizations

Automate collection and visualization of performance signals for hardware bring-up and AI inference workflows

Contribute back to open-source projects where appropriate

What We’re Looking For

2–5+ years of experience in low-level systems profiling or performance tooling

Deep fluency in Perfetto, Protobuf, and systems programming (C or C++)

Strong understanding of computer architecture, memory systems, and runtime behavior

Experience building and interpreting GPGPU performance traces

Ability to work independently and collaboratively across deep technical domains

Bonus Points

Experience profiling GPGPU execution and optimizing ML workloads

Familiarity with deep learning frameworks like PyTorch or TensorFlow

Knowledge of memory subsystem bottlenecks (e.g., DRAM bandwidth, shared memory stalls)

Working proficiency in Rust or scripting languages used in performance tooling

Contributions to open-source observability, tracing, or instrumentation frameworks

Compensation & Perks

Salary: $150K–$300K

Equity: Competitive early-stage grant

Hybrid in Santa Clara, CA or New York, NY

Visa sponsorship available (H-1B, O-1, OPT)

Join a founding engineering team at the edge of silicon and software

Shape the performance visibility layer that powers next-gen AI acceleration

If you want to build the tools that uncover what truly limits performance in modern compute systems, this is the role.

Apply now to join a deeply technical, mission-driven team.
