Backend Software Engineer – HPC – Large Distributed Systems
Our client is expanding and looking for a backend engineer with a strong interest in distributed systems, open-source technologies, and large-scale challenges. You’ll play a key role in advancing high-performance computing, helping to run some of the most complex workloads on Kubernetes.
You’ll contribute to cutting-edge software that makes large-scale computation faster, smarter, and more reliable, and you’ll tackle multi-cluster batch job scheduling for HPC and machine learning workloads.
What you’ll be working on:
- Designing and developing backend systems in Go, Python, or C++ with a strong emphasis on scalability, reliability, and performance
- Enhancing our Kubernetes-based compute platform, focusing on batch scheduling, orchestration, and workload optimisation
- Building and operating globally distributed systems that handle thousands of jobs across clusters
- Debugging and improving platform performance across Linux systems, networking layers, and containerised environments
- Contributing to open-source projects, collaborating with world-class engineers, and shaping the direction of HPC on Kubernetes
About you:
- Strong software engineering background, ideally with experience in Go or another systems-oriented language
- Hands-on knowledge of Kubernetes internals (controllers, operators, workload scheduling)
- Experience with distributed systems and event-driven architectures
- Familiarity with HPC environments, DAG workflows, or large-scale batch scheduling
- Comfortable navigating Linux and debugging complex issues across the stack