HPC Storage Systems Group Leader

Lawrence Berkeley National Laboratory • Berkeley, CA, United States
Job type
  • Full-time
  • Part-time
  • Permanent
Job description

The National Energy Research Scientific Computing Center (NERSC) is inviting applications for the position of Storage Systems Group (SSG) Lead. NERSC's mission is to accelerate scientific discovery through high performance computing and data analysis for the Department of Energy's (DOE) Office of Science programs. NERSC is searching for a knowledgeable and inspired group leader for the Storage Systems Group who will be responsible for developing NERSC's storage strategy based on NERSC's systems roadmap, science workflows and user needs. They will provide vision and guidance to design, operate and simplify the storage environment for NERSC's 11,000+ users.

The SSG is responsible for NERSC's storage portfolio, including large scale high capacity parallel file systems and archival storage systems with an eye towards balancing performance, stability, and usability for NERSC's users who operate in a wide variety of DOE mission areas and scientific domains. The SSG Lead provides technical leadership to a group of highly skilled storage engineers who collaborate with other teams at NERSC to deliver innovative solutions to complex problems and a technical vision for the future of NERSC storage platforms.

The NERSC storage environment that SSG is responsible for today is composed of multiple tiers:

The NERSC hierarchical storage management system (presently High Performance Storage System (HPSS)) stores more than 450 PB of data for the scientific community and puts NERSC in the top 10 largest HPSS deployments globally.

NERSC provides a large-scale parallel community file system (presently Storage Scale) with more than 150 PB of online storage to the user community on an RDMA over Converged Ethernet (RoCE) fabric.

Home and common storage mounted via Storage Scale on several thousand nodes across NERSC.

In addition to the current environment, SSG will be responsible for the scratch and new quality of service storage systems in NERSC's latest GPU-based supercomputer, named Doudna, to be operationalized in 2027. Doudna will deliver a tenfold increase in computing power to NERSC users along with new capabilities. The new Doudna environment will support larger and higher-resolution data sets coming from new sensors, detectors, sequencers and telescopes from the scientific community, and these data sets will need to be managed, shared and stored.

The Storage Systems Group Lead is responsible for understanding existing and emerging requirements, and deploying storage solutions in collaboration with other NERSC teams to support NERSC's broad user base of today and tomorrow. In doing so, the SSG Lead will drive the development and implementation of a holistic storage strategy to support changing scientific workflows and new technologies as part of Doudna and future NERSC system roadmaps. To accomplish this, the SSG Lead will be responsible for investigating new storage technologies and engaging with the vendor community on future roadmaps. The SSG Lead will work with the Data Center Department Head to provide guidance and priorities for the group based on NERSC's strategic plan and its goals.

You will:

Develop NERSC's storage strategy based on NERSC's systems roadmap, science workflows and user needs.

Lead a team that procures, installs, manages, supports and monitors NERSC's large scale storage systems, including providing 24x7 support.

Ensure NERSC's storage systems meet the needs of NERSC's 11,000 users by providing high performing, available, and usable systems.

Work independently and as part of the Storage Systems Group to diagnose and fix storage problems, help analyze storage system issues, and develop and implement workarounds and/or patches for software bugs.

Provide effective line management to a group of approximately 10 Computer Systems Engineers by hiring excellent staff and working closely with SSG staff members. Ensure staff are meeting goals, provide both positive and constructive feedback to staff and ensure all staff have career growth opportunities.

Provide technical leadership for implementation and deployment efforts for storage system improvements that enhance task automation, reliability, stability, usability, performance, and security.

Continuously evaluate new storage technologies and make recommendations on future storage strategy and directions for the center, including both parallel and hierarchical storage, that would create new capabilities and enhance storage and HPC system performance and usability.

Work closely with other teams at NERSC to enable large-scale simulation, data analysis and AI applications to run on NERSC supercomputing and storage systems.

Provide budgetary input and oversight for NERSC's storage systems.

Lead or collaborate on efforts with other Department of Energy (DOE) Labs on future storage technologies, multi-lab storage efforts and other related topics.

Present at conferences and talks to promote NERSC to other national labs and HPC sites.

Create and develop a vision and strategy for the group and be a key part of NERSC's management team.

We are looking for:

Bachelor's degree in Computer Science, Engineering, Applied Mathematics, Computational Science (or related fields) and current applicable systems support and engineering experience, plus a minimum of 3 years of experience in a managerial role in a complex computer systems, storage or networking unit.

Experience with storage technologies in a Linux environment, such as InfiniBand, RoCE, SAN/NAS, NFS, pNFS, hierarchical storage management systems (such as HPSS), Lustre, Storage Scale, VAST, and object stores.

Prior experience with HPC applications, workflows and computational and storage systems.

Experience in managing and supporting a 24/7 IT environment.

Ability to mentor staff to increase their knowledge and skills.

Deep and broad knowledge of storage technologies such as parallel filesystems (e.g., Storage Scale), hierarchical storage management (e.g., HPSS), distributed storage systems (e.g., VAST), and storage networking (e.g., InfiniBand or RoCE).

Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active intellectual environment.

Ability to gather requirements from the scientific user community and turn requirements into system characteristics.

Strong technical and collaboration skills needed to create and deploy innovative ways of allowing our diverse user base to effectively utilize the unique resources that NERSC provides.

Ability to balance technical solutions with user needs and to show initiative, tact and good judgment in developing solutions to problems.

Excellent written and verbal communication skills.

Desired skills/knowledge:

A Master's or PhD degree in related fields.

Knowledge of object storage and non-volatile storage technologies.

Experience administering and deploying storage systems of tens of petabytes (or greater) scale in an HPC environment.

We're here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:

Exceptional health and retirement benefits, including pension or 401K-style plans

Opportunities to grow in your career - check out our Tuition Assistance Program

A culture where you'll belong - we are invested in our teams!

In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.

Parental bonding leave (for both mothers and fathers)

Pet insurance

Additional information:

Application date: Priority consideration will be given to candidates who apply by December 15, 2025. Applications will be accepted until the job posting is removed.

Appointment type: This is a (full-time/part-time) career appointment, exempt (monthly paid) from overtime pay.

Salary range: The expected salary for this position is $203,496 - $248,736, which fits into the full salary range of $180,876 - $305,268, depending upon the candidate's skills, knowledge, and abilities. This includes education, certifications, and years of experience.

Background check: This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.

Work modality: This position requires substantial on-site presence, but is eligible for a flexible work mode, and hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.

Export Control: This position will involve access to hardware, commodities, and technical information subject to export control regulations including, but not limited to, the Export Administration Regulations ("EAR") and/or International Traffic in Arms Regulations ("ITAR"). Accordingly, any hiring decision may depend in part on Berkeley Lab's ability to obtain or rely on federal government authorizations as required, if you are not a U.S. citizen, lawful permanent resident of the U.S. ("green card holder"), asylee, refugee, or other qualifying protected individual as defined by 8 U.S.C. 1324b(a)(3).

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.
