The Site Reliability Engineer Lead (SRE Lead) at Screening Eagle will lead a team of SREs to ensure the stability, resilience, and scalability of our services through automation, testing, and engineering. This role draws on expertise in product systems and operations, cloud infrastructure (AWS), build and release engineering, software development, and stress/load testing to ensure our services are available, cost-efficient, and fit for purpose from the early stages of development. The role requires 5+ years of experience developing AWS cloud infrastructure and 7+ years of experience leading teams.

What will you do

Cloud Infrastructure Management and Networking
- Design, develop, and implement cloud infrastructure using Terraform.
- Optimize resources for cost-efficiency and performance.
- Ensure infrastructure security and implement service control policies (e.g., Control Tower).
- Configure AWS VPC flow logs, load balancer logging, Direct Connect, AWS VPN, TGX, etc.

Monitoring, Support, and Prototyping
- Implement robust monitoring and alerting systems.
- Set up and monitor CI/CD pipelines both on-premises and in the cloud.
- Enhance monitoring, logging, and alerting practices.
- Create prototypes and lead development teams in implementing solutions.

Team Leadership, Collaboration, and Documentation
- Lead the SRE team, ensuring technical quality and best practices.
- Guide the team through the software development lifecycle.
- Collaborate with developers and operations to integrate infrastructure changes.
- Document DevOps changes, technical partnerships, design, integration, testing, and deployment.

Innovation, Quality Assurance, and Process Improvement
- Evaluate risks, customize applications, and lead quality practices.
- Focus on agile methodologies, test automation, and continuous integration.
- Simplify and automate complex processes to ensure quality and operational excellence.
- Improve the DevOps toolchain and streamline software delivery processes.
- Stop projects/products if solutions are not technically acceptable.

What do we expect

- Extensive experience in implementing and evolving DevOps practices across multi-disciplinary teams and business frameworks.
- Strong background in leading technology change programs and managing projects.
- In-depth knowledge and experience with AWS services (EC2, S3, VPC, IAM, etc.).
- Expert-level proficiency in Terraform, including writing reusable modules and applying best practices.
- Highly skilled with Kubernetes, Terraform, serverless, and AWS in general.
- Proficient in non-functional testing, including performance, security, and cost optimization.
- Experience working with advanced architectures such as ARM and AWS Graviton, optimizing for performance, cost-efficiency, and scalability.
- Knowledge of Kubernetes (K8s) operator programming, including operators related to GPU-based architectures.
- Competent in using build tools and practices for different architectures.
- Expertise in Git and the GitOps philosophy.