Roche fosters diversity, equity, and inclusion, representing the communities we serve.
When dealing with healthcare on a global scale, diversity is an essential ingredient to success.
We believe that inclusion is key to understanding people's varied healthcare needs.
Together, we embrace individuality and share a passion for exceptional care.
Join Roche, where every voice matters.
The Position
This role requires availability for on-call duty: responding promptly to urgent issues and emergencies outside of regular working hours so that critical situations are addressed in a timely and effective manner.
Your Mission
Design and maintain cutting-edge tools, scripts, and frameworks that automate repetitive tasks, streamline software deployment, and manage expansive systems with unparalleled efficiency.
Partner closely with forward-thinking development teams to architect and implement high-performance solutions that elevate system efficiency, optimize resource utilization, and enhance deployment processes for superior uptime and user satisfaction.
Your Core Responsibilities
Reliability Mastery: Proactively monitor and maintain system reliability using advanced tools like DataDog, VictorOps, ELK, Grafana, and Prometheus. Become a key player in ensuring system stability and performance.
Uptime Guardian: Ensure optimal uptime and performance by swiftly identifying issues and responding to alerts with precision.
Technical Troubleshooter: Apply a basic understanding of system architecture and design to dive deep into complex technical issues, then troubleshoot, investigate, and resolve them. Collaborate seamlessly with engineering teams to enable timely and effective resolutions.
Service Excellence: Maintain and achieve defined SLAs, SLIs, and SLOs, ensuring service levels are consistently met or exceeded.
Automation Innovator: Develop and deploy automation scripts (using Python or other scripting languages) to streamline operations, enhance system efficiencies, and reduce manual tasks.
Cloud Steward: Manage and maintain robust infrastructure across AWS and Azure environments, implementing best practices to ensure peak performance and reliability of cloud-based applications.
Cross-functional Collaborator: Work closely with engineering, DevOps, security, and operations teams to drive continuous improvement and foster a culture of reliability and inclusion.
Incident Responder: Handle requests and incidents through JIRA and ServiceNow, documenting troubleshooting procedures, solutions, and lessons learned to fuel ongoing improvements.
Flexible Scheduling: Work on-call outside of normal working hours and on weekends as scheduled to ensure continuous support.
Team Builder: Actively contribute to the growth and development of the SRE team's capabilities, nurturing a stronger, more inclusive, and resilient team.