Company Description
At Dubme.io, we are committed to breaking down language barriers worldwide with state-of-the-art technologies such as AI, LLMs, and cloud computing.
Our mission is to make audiovisual content accessible to all 5 billion people globally, no matter their language.
Role Description
We are seeking a highly skilled AI Platform Engineer with a PhD in a relevant field to lead the development, optimization, and maintenance of the AI infrastructure powering our platform.
This role focuses on designing and implementing scalable, efficient, and robust AI solutions, primarily using Python and Google Cloud Platform (GCP).
The ideal candidate will collaborate with researchers to ensure seamless integration of cutting-edge models into production environments.
Key Responsibilities
Infrastructure Design and Maintenance
Architect and maintain scalable AI infrastructure on GCP to support model training, deployment, and monitoring.
Develop pipelines for data preprocessing, model training, and inference.
Optimize GPU usage and manage resource allocation for AI workloads.
Technical Implementation
Implement and integrate machine learning models into production using Python-based frameworks and libraries.
Ensure high performance and reliability of AI systems in a live environment.
Collaboration and Support
Work closely with researchers to transition models from experimentation to production.
Collaborate with the rest of the team, whose backend is written mainly in TypeScript.
Provide technical guidance on best practices for AI development and deployment.
Decision-Making and Leadership
Evaluate and recommend tools, frameworks, and architectures that align with business goals.
Stay updated with the latest trends in AI and cloud-based solutions to drive continuous improvement.
Monitoring and Troubleshooting
Develop monitoring and alerting solutions to ensure system reliability and performance.
Troubleshoot and resolve issues in the AI pipeline promptly.
Qualifications
Proficiency in Python and its AI/ML ecosystem (e.g., TensorFlow, PyTorch, scikit-learn).
Strong experience with Google Cloud Platform (GCP) services such as Vertex AI, BigQuery, Cloud Functions, and Google Kubernetes Engine (GKE).
Solid understanding of MLOps practices, including CI/CD for machine learning models.
Experience with distributed computing and scaling AI workloads.
Knowledge of GPU/TPU optimization and orchestration.
Ability to work with cross-functional teams, including researchers, data scientists, and product engineers.
Strong written and verbal communication skills.
Demonstrated ability to diagnose and resolve complex system issues effectively.