About the Team
We are the Online Storage team powering ChatGPT, Sora, and the OpenAI APIs. We’re a growing team set up to own the databases and online‑storage infrastructure that serve all our products.
About the Role
As OpenAI scales, we’re seeking experienced, problem‑solving engineers to build robust, high‑performance, and scalable database systems. Our ability to rapidly iterate on products while ensuring reliability and speed is key to our success.
You’ll work in a fast‑paced, collaborative environment, building systems that serve hundreds of millions of users globally, with a strong emphasis on safety, reliability, and performance.
We’re hiring skilled software engineers to join the Online Storage team. You’ll help design and build a large‑scale database, collaborate with various product teams to scale it to meet their needs, and own operational excellence by defining SLAs and KPIs that align directly with stakeholder expectations. This is a critical role for engineers who thrive on solving complex, large‑scale challenges and are passionate about building resilient systems that perform under load.
In this role, you will:
Design and build a highly scalable, reliable, and performant database
Design and build simple, intuitive APIs for the underlying database
Analyze and resolve performance and scalability bottlenecks to improve overall system efficiency
Debug, instrument, and fix system issues — from pinpointing root causes to delivering long-term solutions
Define technical strategy and guide the development of robust infrastructure that supports high-scale production systems and evolving business needs
Collaborate closely with product teams to deeply understand requirements and deliver impactful solutions
Boost engineering productivity by building intuitive tools and systems that empower fellow developers
Own the reliability of the systems you build, including participating in an on-call rotation to address critical incidents
You might thrive in this role if you:
Have experience building (and rebuilding) production systems to support new product capabilities and growing scale
Care deeply about the end-user experience and take pride in solving real customer needs
Embrace a humble, collaborative mindset and go the extra mile to support your teammates and the broader mission
Own problems end-to-end — you're comfortable learning on the fly to fill gaps and get things done
Build internal tools that improve workflows when off-the-shelf solutions fall short
Have hands-on experience with distributed systems such as data storage, caching, search, or other backend infrastructure components
Prioritize the reliability, scalability, and performance of large-scale systems
Thrive in ambiguous, fast-paced environments and enjoy iterating rapidly on product and research initiatives
Qualifications:
4+ years of industry experience, including 2+ years leading large-scale, complex projects or technical initiatives as an engineer or tech lead
Strong passion for building distributed systems at scale, with a focus on reliability, scalability, security, and continuous improvement
Expertise in systems programming, with hands-on experience in multi-threading and concurrency; proficiency in C++ and/or Python is highly preferred
Preferably, domain experience in areas such as databases, large-scale data systems, storage, caching, search, or other core components of distributed infrastructure
Excellent communication skills, with the ability to build consensus across diverse technical and non-technical stakeholders
Software Engineer • San Francisco