We’re seeking a Senior Platform Engineer with deep expertise in event-driven architectures, particularly Apache Kafka and Apache Flink, to help design, build, and scale our next-generation streaming platform. You will be a technical leader responsible for driving the architecture and reliability of real-time data pipelines that power mission-critical services across the organization.
In this role, you’ll collaborate with software engineers, data scientists, and infrastructure teams to deliver robust, observable, and scalable streaming systems. You’ll also bring strong hands-on experience in Kubernetes and public cloud environments (AWS, GCP, or Azure) to optimize deployment, orchestration, and resilience.
Key Responsibilities:
Design and implement scalable, fault-tolerant streaming data platforms using Apache Kafka and Apache Flink.
Lead architectural decisions and define best practices for real-time data processing and delivery.
Develop and maintain self-service infrastructure patterns and tools to enable internal teams to consume, process, and produce streaming data effectively.
Optimize system performance, reliability, and observability in a Kubernetes-based environment.
Drive infrastructure-as-code practices and automate deployment workflows using tools like Terraform, Helm, and CI/CD pipelines.
Collaborate with data and engineering teams to support use cases across analytics, ML, and operational systems.
Champion platform reliability, scalability, and cost-efficiency across public cloud platforms (AWS, GCP, or Azure).
Mentor junior engineers and help shape the technical roadmap for the platform.
Required Qualifications:
Deep expertise in Apache Kafka (including Kafka Streams and Kafka Connect) and Apache Flink (DataStream API, state management, CEP, etc.).
Hands-on experience running and managing workloads in Kubernetes.
Solid experience with cloud-native technologies and services in AWS, Google Cloud, or Azure.
Strong programming skills in Java, Scala, or Python.
Proficiency with observability stacks (e.g., Prometheus, Grafana, OpenTelemetry) and debugging distributed systems.
Familiarity with infrastructure-as-code tools like Terraform, Pulumi, or similar.
Strong communication skills and the ability to drive technical initiatives across teams.
Data Platform Engineer • Concord, CA