Ebury is a hyper-growth FinTech firm, named as one of the top 15 European Fintechs to work for by AltFi. We offer a range of products including FX risk management, trade finance, currency accounts, international payments, and API integration.

Position: Senior Site Reliability Engineer
Location: Ebury Madrid Office - Hybrid (4 days in the office, 1 day working from home)

In this role, you will be working within one of our platform engineering teams to ensure that our platform meets the needs of our customers and business objectives.

What we offer:
- Variety of meaningful and competitive benefits to meet your needs
- Competitive salary
- Continuous professional growth through our career progression framework with regular reviews
- Equity process through a performance bonus
- Paid time off and local public holidays
- Continued personal development through training and certification
- Being part of a diverse technology team that cares deeply about culture and best practices, and believes in agile principles
- Contribute to our technical design through our open and collaborative Request For Comments (RFC) process
- We are Open Source friendly, following Open Source principles in our internal projects and encouraging contributions to external projects

Why should I join Ebury?
Want to work in a high-growth environment? We are always growing. Want to build a better world? We believe in inclusion. We stand against discrimination in all forms and have no tolerance for intolerance. At Ebury, you will find an internal group dedicated to discussing how we can build a more diverse and inclusive workplace.

If you're excited about this job opportunity but your background doesn't exactly match the requirements, we strongly encourage you to apply anyway.

What you will do:
- Work within a team of SREs to ensure high availability and reliability of our systems.
- Develop and maintain monitoring, incident management, and troubleshooting of infrastructure and applications.
- Use observability tools to gain insights into system performance and health, and decide on improvements.
- Design and implement automation tools and processes to improve efficiency and reduce downtime.
- Perform on-call duties on a rotating basis to address high-severity incidents and ensure system availability.
- Work closely with development teams to ensure that their applications are designed for scalability and reliability.
- Participate in the design and implementation of new systems and services to ensure they meet our reliability and scalability requirements.
- Keep up to date with emerging technologies, tools, and practices related to SRE and infrastructure.

What we expect from you:
- Several years of relevant industry experience building large-scale distributed systems.
- Solid understanding of cloud architecture and application deployment patterns on GCP or AWS.
- Experience operating web-scale deployments of containerised systems on Kubernetes and Amazon Container Services