years of production-level experience with distributed applications at scale in public and/or private cloud
Experience architecting and implementing large-scale Observability platforms
degree in Computer Science or a related technical field
Work in a diverse and distributed team environment!
Must Have
Programming experience with languages like Go, Python, or Java; experience building integrations and applications for large-scale environments.
Experience with UI technologies like JavaScript, React, Backstage, etc.
Experience with internally hosted Observability and Tooling systems like Splunk, Prometheus, GitHub, Jenkins, and Artifactory, including assisting clients and improving environment performance and stability
Experience with container platforms like Kubernetes
Experience designing and implementing systems for fault tolerance, scalability and stability.
Experience developing, deploying and running distributed applications on cloud platforms. Experience with container and orchestration technologies (Docker, Kubernetes).
Ensure the highest level of uptime and Quality of Service (QoS) for Client’s customers through operational excellence
Knowledge of defining service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality
Knowledge of public and/or private cloud
Collaborate with SRE and Engineering / Product teams in driving critical initiatives.
Experience in solving performance and stability issues using a wide variety of tools
Exceptional communicator within and across teams, able to drive projects to completion
Impacts the organization through contributions to technical direction and strategic decisions.
Good to Have
Experience with other Observability tooling like Grafana, Cortex, Tempo, or Jaeger
Experience with open-source products and communities like OpenTelemetry
Familiarity with a variety of cloud security and automation concepts, practices and procedures.