We are looking for a Tech Lead, reporting to the SRE Manager, to be a major contributor in building a brand-new, top-tier SRE team. As a founding member, you will shape the future of Auth0 by helping define what reliability and incident response mean. You may be a Team Lead, a Manager looking to return to hands-on work, or a Senior SRE/Platform/Software Engineer seeking the next step on the technical track. You are excited by the opportunity to support the migration, availability, security, and releases of large-scale, highly available distributed systems running on multiple global cloud products.

Skills
- Participate in regular on-call rotations to ensure 24/7 coverage of all critical systems
- Deep understanding of the Software Development Lifecycle (SDLC) and implementing observability as a first-class citizen
- Experience designing and operating large-scale distributed systems (preference given to those with experience in multi-region, multi-tenant microservice systems)
- Strong understanding of the Agile software development methodology, including leading sprints or previous experience as a Scrum Master or Coach
- Team members naturally gravitate to you for leadership and mentorship

Responsibilities
- Own the day-to-day operations of the Auth0 SRE Team
- Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews
- Implement automation and champion the evolution of systems by facilitating changes that improve reliability and velocity
- Triaging and troubleshooting complex production issues to ensure reliability and performance
- Developing and maintaining technical documentation, runbooks, and procedures

Experience
- 2+ years on an SRE team
- 2+ years team leadership experience or 5+ years experience as a senior technical resource
- 5+ years of professional software development experience (golang, python)
- 5+ years of professional experience in cloud infrastructure automation/orchestration (Terraform, Kubernetes)
- Experience with observability practices such as time series, tracing, logging and alerting (Datadog, Kibana, Lightstep, PagerDuty, CloudWatch)
- Experience defining and implementing incident response processes
- Experience with Jira, GitHub and similar development tools

Okta is an Equal Opportunity Employer.