We are seeking an experienced Head of Machine Learning Infrastructure to lead our ML infrastructure and operations. In this role, you will build, maintain, and optimize the infrastructure that supports our machine learning models and systems, with a focus on velocity and cost efficiency. You will collaborate with data scientists, engineers, and other stakeholders to ensure the scalability, reliability, and efficiency of our ML operations.
Key Responsibilities:
Develop and implement a strategic vision for the ML infrastructure that supports the company's data science and machine learning initiatives.
Lead and manage a team of MLOps engineers and infrastructure specialists, fostering a culture of innovation, collaboration, and excellence.
Design, build, and maintain robust ML infrastructure for training, deploying, executing, and monitoring machine learning models.
Ensure the scalability, reliability, and security of the ML infrastructure to support high-performance and real-time ML applications.
Collaborate with data scientists to optimize model performance and ensure seamless integration of models into production environments.
Automate ML workflows, including data ingestion, preprocessing, feature engineering, model training, evaluation, and deployment.
Implement best practices for version control, testing, and continuous integration/continuous deployment (CI/CD) of ML models.
Develop and maintain monitoring and alerting systems to ensure the health and performance of ML models in production.
Stay up to date with the latest advancements in ML infrastructure, tools, and technologies, and apply them to improve our systems.
Provide technical leadership and guidance to cross-functional teams on ML infrastructure and operations.
Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field; PhD preferred.
6+ years of experience in ML infrastructure, DevOps, or a related role, with a strong focus on deploying and managing machine learning models in production.
Experience with ML frameworks and libraries such as TensorFlow or PyTorch.
Strong knowledge of cloud platforms (e.g., Databricks, AWS) and containerization technologies (e.g., Docker, Kubernetes).
Familiarity with data engineering concepts and tools (e.g., Apache Spark, Apache Kafka, Airflow).
Excellent problem-solving skills and the ability to work in a fast-paced, dynamic environment.
Strong communication skills and the ability to collaborate effectively with cross-functional teams.
Knowledge of the AdTech industry and advertising technologies is highly desirable.