MIRO-LLMs – LLMs-Enhanced Knowledge Graphs for Integrative Radiooncology and Microbiome Bioinformatics

MIRO-LLMs is a project funded by the Helmholtz Metadata Collaboration (HMC) and carried out jointly by the Max Delbrück Center (MDC) and the German Cancer Research Center (DKFZ). The project aims to advance FAIR data practices and semantic interoperability across radiation oncology and microbiome research.

These domains generate highly heterogeneous data, ranging from clinical and preclinical metadata to molecular profiles and imaging, that are rarely interoperable or jointly exploitable. MIRO-LLMs addresses this challenge by developing a cross-domain, ontology-driven knowledge graph (KG) that enables structured integration, reuse, and analysis of these data.

The project builds on existing radiation oncology datasets and harmonizes them using standards and established biomedical ontologies. It introduces new domain-specific concepts for radiation oncology and microbiome research and aligns them with upper-level ontologies and/or schemas to ensure compatibility with the Helmholtz Knowledge Graph. This alignment supports sustainable data reuse and facilitates integration with other Helmholtz research infrastructures.

A key innovation of MIRO-LLMs is the use of large language models (LLMs) to make semantic technologies more accessible. The project explores LLM-based methods for translating natural-language research questions into SPARQL queries, enabling researchers to interact intuitively with knowledge graphs without requiring expertise in semantic query languages. This approach lowers technical barriers and enhances the usability of FAIR data infrastructures.
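
To make the idea of translating a research question into SPARQL more concrete, the following is a minimal Python sketch, not the project's actual implementation. It assumes a simple pipeline: a short schema summary and the question are combined into a prompt, a model returns a query, and a cheap structural check is applied before the query would be sent to an endpoint. The `ex:` prefix, all class and property names, and the functions `build_prompt`, `fake_llm`, `looks_like_select`, and `translate` are hypothetical and chosen only for illustration; `fake_llm` returns a canned answer in place of a real model call.

```python
# Hypothetical schema summary. The prefix and all terms are invented for
# illustration and are NOT taken from the MIRO-LLMs ontology.
SCHEMA_HINT = """\
PREFIX ex: <http://example.org/miro#>
Classes: ex:Patient, ex:RadiotherapyCourse, ex:MicrobiomeSample
Properties: ex:receivedCourse, ex:hasSample, ex:totalDoseGy
"""


def build_prompt(question: str, schema_hint: str = SCHEMA_HINT) -> str:
    """Combine the schema summary and the user's question into one prompt."""
    return (
        "Translate the question into a SPARQL SELECT query.\n"
        "Use only the terms listed in the schema.\n\n"
        f"Schema:\n{schema_hint}\n"
        f"Question: {question}\n"
        "SPARQL:"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): returns a canned query."""
    return (
        "PREFIX ex: <http://example.org/miro#>\n"
        "SELECT ?patient ?dose WHERE {\n"
        "  ?patient ex:receivedCourse ?course ;\n"
        "           ex:hasSample ?sample .\n"
        "  ?course ex:totalDoseGy ?dose .\n"
        "  FILTER(?dose > 50)\n"
        "}"
    )


def looks_like_select(query: str) -> bool:
    """Cheap structural check; it does not replace a real SPARQL parser."""
    upper = query.upper()
    balanced = query.count("{") == query.count("}")
    return "SELECT" in upper and "WHERE" in upper and balanced


def translate(question: str, llm=fake_llm) -> str:
    """Turn a natural-language question into a SPARQL query string."""
    query = llm(build_prompt(question))
    if not looks_like_select(query):
        raise ValueError("generated text is not a well-formed SELECT query")
    return query


if __name__ == "__main__":
    print(translate("Which patients with a microbiome sample received more than 50 Gy?"))
```

Passing the model as the `llm` parameter keeps the sketch independent of any particular LLM provider, and the structural check illustrates one way a generated query might be screened before execution. A production setup would more likely validate the query with a full SPARQL parser.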

MIRO-LLMs follows a collaborative, user-centered approach, closely involving domain experts, data stewards, and infrastructure partners throughout the project. Beyond the technical implementation, it prioritizes clear and transparent documentation of workflows to support reproducibility. In doing so, MIRO-LLMs contributes to the HMC mission by strengthening semantic interoperability, enabling cross-domain data discovery, and demonstrating how LLMs can improve access to FAIR biomedical data.

Primary Contact: Olga Ximena Giraldo Pasmin
Project Partners: DKFZ, MDC
Research Fields: Health
Project Duration: 01.01.2026 - 31.12.2027