MicroMacS – Microbiome Machine for Synthesis

Microbiome Machine for Synthesis: Domain-specific application of FAIR data assessments for automated metadata harvesting
Sequencing has revolutionized the life sciences and generated massive amounts of uniform, publicly available sequence data. A major hurdle to reusing these data is the historical lack of curation of both the sequence data and their metadata, together with the missing links between sequence data and the peer-reviewed scientific articles that describe them and provide their context. Enriching the metadata of previously archived sequence data is therefore an urgent challenge. We will hire a researcher to develop MicroMacS, a publicly available, modular pipeline that will link articles to sequence data by text parsing, use AI to extract experimental metadata, and perform basic sequence quality checks. The pipeline will enable users to easily assess dataset reusability, increase interoperability, and enrich existing metadata. MicroMacS will become the update protocol for MiCoDa, the largest curated prokaryotic metabarcoding database in the world. MicroMacS will foster sequence data reuse and accelerate Open Science research in the life sciences.
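As an illustration of the text-parsing step, linking an article to its sequence data could begin by scanning the article text for archive accession numbers. The following is a minimal sketch and not the MicroMacS implementation: the function name `find_accessions`, the selection of accession types, and the example sentence are assumptions made here, and the patterns follow the accession formats published by the sequence archives.

```python
import re

# Illustrative accession patterns (an assumption, not the MicroMacS code).
# They follow the accession formats published by the sequence archives.
ACCESSION_PATTERNS = {
    "bioproject": r"PRJ[EDN][A-Z]\d+",   # e.g. PRJNA..., PRJEB...
    "study": r"[EDS]RP\d{6,}",           # e.g. SRP..., ERP...
    "run": r"[EDS]RR\d{6,}",             # e.g. SRR..., ERR...
    "biosample": r"SAM[EDN][A-Z]?\d+",   # e.g. SAMN..., SAMEA...
}


def find_accessions(text):
    """Return a dict mapping accession type to the sorted, unique
    accession strings found in the given article text."""
    found = {}
    for kind, pattern in ACCESSION_PATTERNS.items():
        # Word boundaries avoid matching inside longer tokens.
        matches = sorted(set(re.findall(rf"\b{pattern}\b", text)))
        if matches:
            found[kind] = matches
    return found


# Hypothetical data availability statement, made up for this example.
example = (
    "Raw reads were deposited in the SRA under BioProject PRJNA123456 "
    "(runs SRR1234567 and SRR1234568)."
)
accessions = find_accessions(example)
```

A real pipeline would also need to handle accessions split across line breaks, accession ranges, and identifiers that appear only in supplementary material, all of which this sketch ignores.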
MicroMacS is a joint project between the Helmholtz Centres UFZ and HMGU funded within the HMC Project Cohort 2024.