We are currently seeking a Senior Data Engineer with 5-7 years of experience.
The ideal candidate will be able to work independently in an Agile environment and will have experience with cloud infrastructure, leveraging tools such as Apache Airflow, Databricks, dbt, and Snowflake.
Familiarity with real-time data processing and AI implementation, including generative AI, is highly advantageous.
Responsibilities:
- Design, build, and maintain scalable, robust data pipelines to support analytics and machine learning models, ensuring high data quality and reliability for both batch and real-time use cases.
- Design, maintain, and optimize data models and data structures in tools such as Snowflake and Databricks.
- Leverage Databricks and cloud-native solutions for big data processing, ensuring efficient management of Spark jobs and seamless integration with other data services.
- Use PySpark and/or Ray to build and scale distributed computing tasks, improving the performance of machine learning model training and inference.
- Monitor, troubleshoot, and resolve issues within data pipelines and infrastructure, applying data engineering best practices and continuous improvement.
- Integrate generative AI capabilities into data pipelines and workflows to support advanced use cases such as data enrichment, automated content generation, and natural language processing.
- Collaborate with machine learning engineers to optimize generative AI workflows, ensuring seamless deployment and scalability in production environments.
- Develop APIs and tools that enable internal teams to consume generative AI models and services efficiently.
- Stay informed about advancements in generative AI technologies and recommend their adoption to improve business processes and analytics capabilities.
- Document data engineering workflows and generative AI integrations with diagrams.
- Collaborate with other Data Engineers, Product Owners, Software Developers, and Machine Learning Engineers to implement new product features by understanding their needs and delivering on time.
Qualifications:
- Minimum of 5 years of experience deploying enterprise-level, scalable data engineering solutions.
- Strong examples of independently developed end-to-end data pipelines, from problem formulation and raw data through implementation, optimization, and results.
- Proven track record of building and managing scalable cloud-based infrastructure on AWS (including S3, DynamoDB, and EMR).
- Experience implementing and managing AI model lifecycles in production, including generative AI models.
- Familiarity with tools such as the OpenAI API, Hugging Face Transformers, or equivalent generative AI platforms.
- Strong experience with Apache Airflow (or equivalent), Snowflake, and Lucene-based search engines.
- Advanced SQL and Python knowledge with associated coding experience.
- Experience with Databricks (Delta format, Unity Catalog).