Job Description:
Seeking a skilled Data Engineer with a strong background in PySpark and extensive experience with AWS services, including Athena and EMR. The ideal candidate will design, develop, and optimize large-scale data processing systems, ensuring efficient and reliable data flow and transformation.

Key Responsibilities:
· Data Pipeline Development: Design, develop, and maintain scalable data pipelines using PySpark to process and transform large datasets.
· AWS Integration: Use AWS services, including Athena and EMR, to manage and optimize data workflows and storage solutions.
· Data Management: Implement data quality, governance, and security best practices to ensure the integrity and confidentiality of data.
· Performance Optimization: Optimize and troubleshoot data processing workflows for performance, reliability, and scalability.
· Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
· Documentation: Create and maintain comprehensive documentation of data pipelines, ETL processes, and data architecture.

Required Skills and Qualifications:
· Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
· Experience: 5 years of experience as a Data Engineer or in a similar role, with a strong emphasis on PySpark.
· Technical Expertise:
  o Proficient in PySpark for data processing and transformation.
  o Extensive experience with AWS services, specifically Athena and EMR.
  o Strong knowledge of SQL and database technologies.
  o Experience with Apache Airflow is a plus.
  o Familiarity with other AWS services such as S3, Lambda, and Redshift.
· Programming: Proficiency in Python; experience with other programming languages is a plus.
· Problem-Solving: Excellent analytical and problem-solving skills with attention to detail.
· Communication: Strong verbal and written communication skills to collaborate effectively with team members and stakeholders.
· Agility: Ability to work in a fast-paced, dynamic environment and adapt to changing priorities.

Preferred Qualifications:
· Experience with data warehousing solutions and BI tools.
· Knowledge of other big data technologies such as Hadoop, Hive, and Kafka.
· Understanding of data modeling, ETL processes, and data warehousing concepts.
· Experience with DevOps practices and tools for CI/CD.