Platform Reliability Engineer

Job details

Job Title: Platform Reliability Engineer
Career Level: E

Introduction to role:
Join us as a Platform Reliability Engineer in our Commercial IT – SSD, Data, Analytics and AI Platform Success Team. Your primary focus will be to ensure the stability, performance, and reliability of our Data, Analytics, and AI systems. You will bridge the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities. This role offers the opportunity to integrate Agile, Lean, and SAFe practices within monitoring and observability initiatives and to continuously improve delivery cycle times.

Accountabilities:
As a Platform Reliability Engineer, you will be responsible for the evaluation, selection, and deployment of monitoring & observability technologies. You will manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices. You will collaborate with DevOps, CriticalOps, and IT leadership teams to understand system requirements and design effective monitoring strategies. You will also develop and implement monitoring solutions for infrastructure, applications, and services.

Responsibilities:
Ensuring the stability, performance, and reliability of Data, Analytics, and AI systems by implementing and maintaining robust monitoring and observability solutions.
Designing, deploying, and managing monitoring tools and practices that provide insights into the health and performance of our data infrastructure and analytics processes.
Bridging the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities.
Maintaining working knowledge of platform architecture and business acumen.
Integrating Agile, Lean, and SAFe practices within monitoring and observability initiatives to continuously improve delivery cycle times.
Exploring and implementing new ways to automate systems, designing and testing automation processes, identifying quality issues, and supporting IT platform teams to eliminate defects and errors in product and platform development.
Leveraging AIOps capabilities to uplift existing production operations.

Technology/Tool Management
Evaluate, select, and deploy monitoring & observability technologies suitable for the organization's needs.
Manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices.

Monitoring & Observability Practice Management
Collaborate with DevOps, CriticalOps, and IT leadership teams to understand system requirements and design effective monitoring strategies.
Establish key metrics and KPIs that enable insights and analytics to achieve data-driven continuous improvement.
Provide training and support to other teams on using monitoring tools effectively.
Create and maintain documentation for monitoring and observability practices, including standard operating procedures and best practices.
Stay abreast of industry trends, emerging technologies, and best practices related to monitoring and observability platforms.

Monitoring & Observability Implementation & Operations
Develop and implement monitoring solutions for infrastructure, applications, and services.
Design and configure alerting mechanisms to proactively respond to potential issues.
Use monitoring tools to identify and troubleshoot issues in real time.
Collaborate with other teams to resolve incidents promptly and prevent recurrence.
Analyze monitoring data to identify performance bottlenecks and areas for improvement.
Work with development and operations teams to optimize system performance based on monitoring insights.
Implement automation scripts and workflows to streamline monitoring processes.
Integrate monitoring solutions with existing frameworks for seamless operation.
Identify and evaluate "self-healing" opportunities based on production issue trend analysis to inform the AIOps roadmap.

Essential Qualifications:
Degree-level education in computer science, information technology, or a related field.
Proven experience as a monitoring and observability engineer or in a similar role.
Proficiency in developing monitoring capabilities and configuring integration with tools such as Prometheus, Grafana, Splunk, SumoLogic, DataDog, DynaTrace, etc.
Strong scripting skills (e.g., Python) for automation in data environments.
Familiarity with logging, tracing, and APM (Application Performance Monitoring) solutions.

Desirable Qualifications:
Customer engagement experience.
Knowledge of data processing frameworks (e.g., Apache Spark) and data storage solutions (e.g., data lakes, warehouses).
Experience with data orchestration tools (e.g., Apache Airflow).
Understanding of data lineage and metadata management.

Ready to make a difference? Apply today and be part of a team that has the backing to innovate, disrupt an industry, and change lives.



Nominal salary: Negotiable

Source: Jobleads

