About the Role
Dow Jones is seeking an experienced Data Engineer to join our AI Engineering Team.
You will be responsible for designing, developing, and maintaining robust data pipelines for data scraping, processing, extraction, transformation, loading, and storage.
You will collaborate within our team to ensure the efficient and reliable retrieval of data, enabling seamless integration with downstream systems for analysis and decision-making. As a key team member, you will play a crucial role in operationalizing data solutions to meet our organization's needs and deliver tangible value.
You will leverage your strong data engineering skills to develop robust, secure, and scalable data pipelines, utilizing your expertise in data retrieval and processing techniques.

You Will:
Collaborate with data scientists and ML engineers to design, develop, and maintain end-to-end data pipelines for extraction, transformation, loading (ETL), and storage.
Clean, transform, and structure data using industry-standard techniques, ensuring quality and consistency.
Work with APIs to retrieve data from external sources or integrate with third-party services, adhering to best practices.
Manage and optimize SQL and NoSQL database systems for data storage, ensuring integrity and performance.
Automate data fetching, processing, and storage by implementing data pipelines, leveraging ETL principles.
Identify and troubleshoot issues related to data quality and pipeline performance, applying problem-solving skills.
Communicate effectively with stakeholders and data providers to gather requirements and ensure project alignment.
Stay updated with industry trends, emerging technologies, and best practices in data engineering and ETL processes.

You Have:
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related STEM field.
At least 3 years of industrial experience in a data engineering role.
Experience with web scraping techniques and data extraction methods.
Solid understanding of data processing techniques, including extraction, transformation, and loading (ETL).
Ability to work with APIs to retrieve data from external sources or integrate with third-party services.
Experience with cloud-based infrastructure and services (e.g., AWS, GCP, etc.).
Familiarity with database systems, including SQL and NoSQL databases.
Familiarity with NLP and Machine Learning frameworks and libraries (e.g., PyTorch, HuggingFace, LangChain, spaCy, NLTK, scikit-learn, etc.).
Experience in designing and implementing end-to-end data pipelines for web content retrieval and processing.
Excellent problem-solving skills and attention to detail in identifying and troubleshooting issues.
Strong communication and collaboration skills to work effectively with team members and stakeholders.
Continuous learning mindset with a willingness to stay updated with industry trends and best practices.

Our Benefits:
Comprehensive Healthcare Plans
Paid Time Off
Retirement Plans
Comprehensive Insurance Plans
Lifestyle programs & Wellness Resources
Education Benefits
Family Care Benefits & Caregiving Support
Commuter Transit Program
Subscription Discounts
Employee Referral Program