This job is with PageGroup, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community.
Please do not contact the recruiter directly.
Data Profiling and Monitoring: Develop and enhance data profiling engines to assess the completeness, validity, and integrity of datasets.
Collaborate on AI-driven cataloguing projects, including a Purview Proof of Concept (POC).
Monitor data lake quality and performance using tools such as Databricks or other cloud-based data platforms.
Machine Learning for Data Quality: Apply ML techniques (e.g., Random Forest) to improve duplicate identification and matching.
Build models to validate taxonomy mapping using LLMs, for example inferring roles or departments from job titles.
Automation and API Integration: Automate data update processes using APIs, reducing reliance on manual scripts, extracts, and CSVs.
Design scalable solutions for automated data reconciliation and integrity checks.
Data Quality Analysis: Conduct detailed data quality assessments, measuring completeness, validity, and consistency across datasets.
Identify gaps in data pipelines and propose actionable solutions.
Expertise
ML Model Development and Evaluation: Understanding statistical distributions and probabilities is key to choosing the right features, algorithms, and evaluation metrics for ML tasks (e.g., precision, recall, F1-score).
Advanced tasks like enhancing duplicate detection or inferring roles with LLMs may involve probabilistic approaches.
Data Quality Analysis: Quantifying and diagnosing data completeness, validity, and integrity often require statistical tests and descriptive analytics.
General Problem-Solving: Statistical reasoning aids in diagnosing anomalies, reconciling datasets, and creating predictive models.
Must-Have: Proficiency in Python (essential for ML and automation tasks).
Strong understanding of statistics and probability, including hypothesis testing, regression, and probabilistic reasoning.
Experience with machine learning techniques (e.g., Random Forest, clustering, or NLP-based models).
Solid grasp of data quality concepts: completeness, validity, reconciliation, and profiling.
Strong problem-solving skills and the ability to design scalable solutions.
Should-Have: Hands-on experience with Databricks (for data lake monitoring and ML implementation).
Familiarity with data cataloguing tools like Purview or similar platforms.
Working knowledge of SQL and large datasets.
Could-Have: Experience with R for statistical analysis or visualization.
Knowledge of LLMs for advanced text or taxonomy-related projects.
Familiarity with data governance frameworks or compliance requirements.
Meal vouchers
Bonus
Remote working (2 days per week)
Medical insurance (after 6 months)
Life insurance
Private pension (after 2 years)
Flexible compensation (after 6 months)
36 hours per week in July and August
25 days of holiday per year
20 working days per year to work from abroad
EAP (from day one)