SRE Tooling & Observability Platform Expert
Spain / Barcelona
About the job
At Sanofi CHC, we're committed to providing the next-gen healthcare that patients and customers need. Join our team as SRE Tooling & Observability Platform Expert and you can help make it happen.
Your job? The SRE Tooling & Observability Platform Expert at CHC is a specialized role designed to enhance the reliability, scalability, and efficiency of our platforms through expert implementation and management of SRE tooling. This role focuses on logging and analyzing, alerting, monitoring, configuration and Infrastructure as Code (IaC), and incident management to ensure high availability and performance across all systems. The SRE Tooling Expert will work closely with the platform engineering and site reliability teams to develop and maintain a robust tooling ecosystem that supports CHC's operational and business goals.
Main responsibilities:
Develop and maintain a comprehensive suite of SRE tools for logging, analyzing, alerting, and monitoring to ensure system reliability and performance.
Implement and manage configuration and Infrastructure as Code (IaC) solutions to automate and streamline infrastructure provisioning and management processes.
Design and implement effective incident management strategies and tools to quickly identify, respond to, and resolve system issues.
Collaborate with engineering teams to integrate SRE tooling into the development and operational lifecycle, enhancing system observability and reliability.
Continuously evaluate and introduce improvements to the SRE tooling ecosystem, staying ahead of industry trends and best practices.
Participate in the planning and execution of system scalability and reliability initiatives, ensuring the infrastructure can support growing workloads and traffic.
Partner with the Operations team, ensuring they have all the tools they need to be best in class.
About you
Experience: The ideal candidate for the SRE Tooling Expert position at CHC has deep technical proficiency across a broad spectrum of SRE tooling and a proven track record of applying these skills in a dynamic environment. This individual will have extensive experience in logging, analyzing, alerting, monitoring, and incident management, demonstrating the ability to ensure system reliability and performance.
Soft skills:
Strong analytical and critical thinking skills, with the ability to develop creative solutions to complex problems.
Excellent communication skills, ensuring clear and effective technical information exchange among various stakeholders.
Technical skills:
Proficiency with logging tools such as ELK (Elasticsearch, Logstash, Kibana), Splunk, Dynatrace or Datadog.
Experience with alerting tools like Prometheus Alertmanager, Grafana, or PagerDuty.
Expertise in implementing monitoring solutions using tools such as Prometheus, Grafana, Nagios, or Zabbix.
Strong background in using IaC tools like Terraform, Ansible, or CloudFormation.
Knowledge of incident management processes and tools (such as JIRA Service Desk, ServiceNow, or Opsgenie).
Familiarity with cloud services (AWS, Azure, Google Cloud Platform) and their respective management and monitoring tools.
Proficiency in scripting languages (such as Python, Bash, or PowerShell).
Awareness of security best practices and tools for monitoring security events.
Education:
A relevant degree in Computer Science, Information Technology, or related fields.
Certifications in relevant SRE tooling and methodologies are highly desirable.
Languages: Fluency in written and spoken English.
Sanofi is dedicated to supporting people through their health challenges. We are a global biopharmaceutical company focused on human health.