Location: Hybrid / Office in Barcelona (Spain).
Date of Publication: 01/11/2024

About Us: CNAG (Centro Nacional de Análisis Genómico)
The Centro Nacional de Análisis Genómico (CNAG) is one of the largest Genome Sequencing Centers in Europe.
The CNAG Consortium aims to carry out large-scale projects in DNA/RNA analysis for the improvement of quality of life in collaboration with the Spanish, European and International Research Community.
CNAG researchers participate in major International Genome Initiatives such as the Human Cell Atlas (HCA), the International Cancer Genome Consortium (ICGC), the International Human Epigenome Consortium (IHEC), the International Rare Diseases Research Consortium (IRDiRC), the European Reference Genome Atlas (ERGA) and the European Infrastructure for life-science information (ELIXIR), as well as in several EU-funded projects.

The Role
We have an opening for a Data Engineer to play a key role in several projects related to cancer and rare diseases, such as Genomed4all (https://genomed4all.eu/) and EJP-RD (https://www.ejprarediseases.org/). For Genomed4all we are developing a platform for federated learning based on Flower (https://flower.ai/) and MLflow (https://mlflow.org/).
In EJP-RD we are further developing the RD-Connect GPAP (https://platform.rd-connect.eu/) and contributing to the EJP-RD Virtual Platform of data and resources.
Under the supervision of the lead of the Data Platforms and Tools Development team, and in collaboration with cancer specialists, bioinformaticians and software engineers, the successful candidate will implement the data infrastructure and product back-end for the federated platform and the cancer platform.

The Team
The successful candidate will join the Data Platforms and Tools Development team, coordinated by Dr. Davide Piscia (https://www.cnag.crg.eu/teams/bioinformatics-unit/data-platforms-and-tools-development).
The team is part of the CNAG Bioinformatics Unit (led by Dr. Sergi Beltran), which has over 30 members and offers continuous growth and support on a professional level. The team works in a stimulating scientific environment, applying state-of-the-art technologies to breakthrough research projects in Genomics that have an impact on people's health.

Responsibilities
- Implement pipelines in Apache Spark
- Integrate machine learning models into a federated learning platform
- Integrate pipelines into Jenkins Pipeline or Nextflow workflow managers
- Collaborate with back-end developers and bioinformaticians to integrate data into platforms
- Benchmark, develop and implement services and queries on SQL (Postgres) and NoSQL databases (ClickHouse, Elasticsearch, MongoDB, etc.)
- Gather and address technical and design requirements
- Follow emerging technologies

Requirements
- Bachelor's or Master's degree in Computer Science or a related field
- A minimum of 2 years of experience in a related software development position, preferably as a Data Engineer
- Hands-on experience with programming languages such as Python, Scala, Rust or similar
- Understanding of pipeline orchestration
- Knowledge of distributed computing (Apache Spark, Apache Flink or similar)
- Good organisational, prioritising, communication and interpersonal skills
- Good spoken and written English

Nice to have
- Experience with genomics and clinical data
- Experience with federated learning frameworks (Flower, PySyft, etc.)
- Experience with workflow orchestrators (Jenkins Pipeline, Nextflow, Airflow, Prefect, Snakemake, etc.)
- Experience with databases (Postgres, ClickHouse, Elasticsearch, Cassandra, etc.)
- Experience with MLOps (MLflow)
- Experience with data pipeline testing

The Offer
- Contract duration: Open-ended contract
- Estimated annual gross salary: Salary is commensurate with qualifications and consistent with our pay scales.
- Target start date: As soon as possible

Benefits
- Highly stimulating environment with state-of-the-art infrastructures, and unique Professional Career Plan and development opportunities.
- We offer and promote a diverse and inclusive environment and welcome applicants regardless of age, disability, gender, nationality, race, religion or sexual orientation, in a collaborative and supportive environment.
- We are committed to reconciling work and family life for our employees and offer annual leave, full health and dental insurance, a flexible schedule, and the possibility of remote work.

We look forward to receiving your application and discovering how you can contribute to CNAG's success!

How to Apply:
All applications must include:
- A complete CV including contact details.
- Contact details of two referees.
- Cover letter.

All applications must be addressed to People Department – ******:
Please submit your application by 31/01/2025.

Interview: Shortlisted candidates will be invited for interview at CNAG on 01/02/2025.

See the CNAG Career site at our website: https://www.cnag.eu/jobs

# sequencing for a better life