Company Description:
Hi, we're Nexthink. We're not just the leader in the digital employee experience category, we invented the category. Our solutions combine real-time analytics, automation, and employee feedback across all endpoints to help IT teams delight people at work. Our cloud-native platform pinpoints issues and solutions, automates response, and helps companies continuously improve their employees' experience, making them more productive, efficient, and happy at work. We have millions of endpoints deployed, we've surpassed $100M in ARR, and we've recently secured $180M in Series D financing for a company valuation of $1.1B, but we're just getting started.

Job Description:
You will be responsible for ensuring the smooth operation of our cloud infrastructure, with a primary focus on Kubernetes (k8s) and the deployment of the cloud infrastructure and services. Your expertise will be crucial in maintaining our services' high availability, performance, and efficiency, ensuring our customers enjoy a seamless user experience.

Responsibilities:
- Manage and maintain our Kubernetes clusters, including deployment, configuration, and upgrades. Ensure the stability and scalability of the clusters to accommodate increasing demands.
- Use your hands-on knowledge to automate routine tasks and streamline operations. Implement infrastructure as code (IaC) practices to facilitate rapid and reliable deployments, ensuring efficient resource provisioning and management.
- Participate in an on-call rotation, providing prompt responses and resolution to critical incidents. Your commitment to keeping the cloud infrastructure up and running will be crucial to maintaining high availability.
- Proactively identify potential issues and troubleshoot system anomalies.
- Collaborate with other teams to address incidents and implement preventive measures to reduce downtime.
- Set up and maintain comprehensive monitoring and alerting systems to detect anomalies, capacity constraints, and potential performance bottlenecks. Ensure timely responses to alerts and alarms.
- Maintain accurate and up-to-date documentation of processes, procedures, and troubleshooting guides to facilitate knowledge sharing and standardization.

Qualifications:
- Strong hands-on experience in managing Kubernetes clusters in a production environment.
- Knowledge of configuration automation (Ansible), CI/CD (Jenkins), and IaC (Terraform, Crossplane) for infrastructure management, plus proficiency in at least one scripting language (Bash, Python).
- Familiarity with source code management solutions (GitHub, Bitbucket) and the Atlassian suite (Jira, Confluence).
- Experience working in an on-call rotation environment and running operations.
- Proven problem-solving skills and the ability to troubleshoot complex technical issues.
- Deep commitment to maintaining high system reliability and availability.
- Familiarity with the AWS cloud computing platform and related services.