Talent.com
HPC System Software Engineer
Lawrence Berkeley National Laboratory • Berkeley, CA, United States

Lawrence Berkeley National Laboratory is hiring an HPC System Software Engineer within the National Energy Research Scientific Computing Center (NERSC) division.

In this exciting role, you will be pivotal in architecting, developing, deploying, and supporting the software that forms the backbone of NERSC's world-class supercomputing infrastructure. Your primary role will be to engineer robust, scalable, dynamic, and automated solutions for high-performance computing (HPC) system management and large-scale monitoring, directly enabling the operation of NERSC's flagship systems, including the current Perlmutter supercomputer and the upcoming Doudna system.

You will join a collaborative environment, working with engineers at NERSC, other national laboratories, leading HPC vendors, and vibrant open-source communities. This is a unique opportunity to build the foundational software that powers world-class scientific research and to define the future of programmable, data-driven HPC data centers, as well as the American Science Cloud.

The selected candidate(s) will be hired at the Computer Systems Engineer 3 or 4 (CSE3 or CSE4) level, depending on their skills and experience.

You Will (Level 3):

Develop and maintain software for automated provisioning, configuration management, and orchestration across thousands of servers, with a focus on the OpenCHAMI system management software stack.

Contribute to the development and operation of NERSC's large-scale data center monitoring framework.

Analyze system telemetry and logs to debug complex, system-wide issues and identify performance bottlenecks.

Develop and maintain plugins for the Slurm workload manager.

Identify and automate operational tasks and system management processes to improve the efficiency, reliability, and scalability of HPC systems.

Participate in the full lifecycle of HPC systems, including installation, configuration, testing, operation, and maintenance.

Contribute to a shared on-call rotation to provide 24x7 support for critical HPC systems and infrastructure.

Take ownership of new technical assignments, determine appropriate methods and procedures, and coordinate the activities of other personnel on smaller projects or focused technical efforts.

Collaborate with vendors to troubleshoot bugs, provide feedback on technical requirements, and track the resolution of issues affecting NERSC HPC and monitoring systems.

Evaluate and test new technologies, software, and system architectures to inform future designs.

Contribute code and engage with open-source communities that are critical to the HPC ecosystem, representing NERSC's technical interests.

Work on and resolve complex issues where analysis of situations or data requires an in-depth evaluation of variable factors.

In Addition to Above, You Will (Level 4):

Design major software components for system management and monitoring, creating long-term roadmaps to ensure scalability, reliability, and future-readiness.

Lead the technical vision for key areas of the system software stack, making critical design decisions that impact the entire HPC ecosystem.

Proactively identify, evaluate, and champion emerging technologies and architectural patterns that can significantly enhance NERSC's capabilities, performance, and operational efficiency.

Solve the most significant and ambiguous technical issues, often requiring cross-functional team collaboration and an in-depth, multi-faceted analysis of complex systems.

Lead the implementation and deployment of critical system improvements, taking full ownership of projects from conception and requirements gathering through to production operation and support.

Provide technical leadership and mentorship to team members and colleagues across NERSC, guiding best practices in software design, development, security, and operations.

Act as a primary technical liaison with HPC vendors and partner institutions, driving the co-development of features and solutions that meet NERSC's strategic needs.

Represent NERSC in national and international forums, technical working groups, and open-source communities, influencing the direction of future HPC technologies to benefit the scientific community.

Determine methods and procedures on new or complex assignments, and formally coordinate the activities of other engineers to achieve project goals.

Work on and resolve significant and unique issues where analysis of situations or data requires an evaluation of intangibles.

We Are Looking For (Level 3):

Typically requires a minimum of 8 years of related experience with a Bachelor's degree; or 6 years and a Master's degree; or equivalent experience.

Minimum of 4 years of experience with systems programming in Linux environments or management of large-scale Linux-based systems in a high-performance computing, cloud computing, or hyper-scale environment.

Experience with some or all of our key technologies:

containers (such as Docker or Kubernetes)

configuration management (such as Ansible or Puppet)

monitoring and observability (such as VictoriaMetrics, Prometheus, or Nagios)

virtualization (such as Proxmox or Harvester)

Git-based CI/CD pipelines (such as GitLab runners or GitHub Actions)

continuous delivery tools (such as Argo CD or Flux)

modern programming languages (such as Go or Rust)

complex scripting with tools such as Python 3 or bash

Familiarity with provisioning tools (such as Chef, Foreman, or Terraform)

Working knowledge of software engineering best practices for performance and security.

Demonstrated experience in resolving complex issues in creative and effective ways.

Excellent oral and written communication skills.

Demonstrated ability to work effectively as part of a cross-disciplinary team.

In Addition to Above, We Are Looking For (Level 4):

Typically requires a minimum of 12 years of related experience with a Bachelor's degree; or 8 years and a Master's degree; or equivalent experience.

Experience leading and coordinating complex software projects.

Experience with software lifecycle management, from planning through retirement.

Strong Linux systems programming skills and knowledge of Linux system internals.

Demonstrated experience in working on and resolving significant and unique issues where analysis of situations or data requires an evaluation of intangibles.

Ability to exercise independent judgment in methods, techniques and evaluation criteria for obtaining results.

We're here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:

Exceptional health and retirement benefits, including pension or 401K-style plans

Opportunities to grow in your career - check out our Tuition Assistance Program

A culture where you'll belong - we are invested in our teams!

In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.

Parental bonding leave (for both mothers and fathers)

Pet insurance

Additional Information:

Application Date: Priority consideration will be given to candidates who apply by November 14, 2025. Applications will be accepted until the job posting is removed.

Appointment Type: This is a full-time, career appointment, exempt (monthly paid) from overtime pay.

Salary Range:

Level 3: The expected salary for this position is $156,864 - $191,724, which fits into the full salary range of $139,440 - $235,308 depending upon the candidate's skills, knowledge, and abilities, including education, certifications, and years of experience.

Level 4: The expected salary for this position is $178,644 - $218,364, which fits into the full salary range of $158,808 - $267,996 depending upon the candidate's skills, knowledge, and abilities, including education, certifications, and years of experience.

Background Check: This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.

Work Modality: This position requires substantial on-site presence, but is eligible for a flexible work mode, and hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. In rare cases, full-time telework or remote work modes may be considered. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.

Multi-level Posting: This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.
