EReMiD – Enhancing reuse of microbiome data

Enhancing reuse of microbiome data (EReMiD): AI-assisted data type categorization and ontology alignment across different disciplines
The Sequence Read Archive (SRA) holds most of the microbiome sequencing data, yet over 75% of these data are not FAIR-compliant (Findable, Accessible, Interoperable, Reusable), which impedes advances in health and environmental sciences.
This project aims to connect the Helmholtz HUBs "Earth and Environment" and "Health" through two complementary approaches: (1) AI-supported data type identification and correction to make orphaned data Findable and Reusable, and (2) interconnecting ontologies from the human (HMGU) and terrestrial (UFZ) research fields by applying dictionaries to unify up to 8.2 million SRA records. By aligning these ontologies, we will enhance data Accessibility and Interoperability across research fields in three Centres. Additionally, we will conduct workshops to train the next generation of scientists through the Centres' graduate schools. These workshops will foster a Research Object Crate bridging the Health and Earth and Environment HUBs, promoting robust metadata standards within the Helmholtz Metadata Collaboration.
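As a rough illustration of the dictionary-based unification named in approach (2), the following minimal Python sketch maps free-text sample labels onto shared terms. Everything in it is an assumption made for illustration: the labels, the target terms, the `isolation_source` field name, and the helper names `normalize` and `unify` are hypothetical and are not taken from the project or from any real ontology.

```python
from typing import Optional

# Hypothetical dictionary: free-text labels -> made-up shared terms.
# Neither the labels nor the target terms come from the EReMiD project.
SYNONYMS = {
    "human gut": "host-associated:gut",
    "gut": "host-associated:gut",
    "feces": "host-associated:gut",
    "soil": "terrestrial:soil",
    "forest soil": "terrestrial:soil",
}


def normalize(label: str) -> str:
    """Lowercase the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def unify(label: str) -> Optional[str]:
    """Return the shared term for a label, or None if it is not in the dictionary."""
    return SYNONYMS.get(normalize(label))


# Hypothetical records; the field name "isolation_source" is an assumption.
records = [
    {"run": "run-1", "isolation_source": "Human  Gut"},
    {"run": "run-2", "isolation_source": " Forest SOIL "},
    {"run": "run-3", "isolation_source": "seawater"},
]

# Attach the unified term; unmatched labels stay None for manual review.
unified = [{**r, "unified_term": unify(r["isolation_source"])} for r in records]
```

Labels that are missing from the dictionary are kept with `None` rather than guessed, so they can be reviewed or passed to a different method.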
EReMiD is a joint project between the Helmholtz Centres UFZ and HMGU, funded within the HMC Project Cohort 2024.