SIC – open Scientific Image Curation Framework

Imaging data is increasingly central to scientific discovery, but as repositories grow in size and complexity, systematic curation, quality control, and metadata management become major bottlenecks for reproducibility, reuse, and FAIRness. A core challenge is fragmented, heterogeneous metadata across instruments, labs, and archival systems, which makes it hard to align descriptions, run repeatable cleaning workflows, and apply collection-wide consistency checks.

In parallel, recent progress in self-supervised learning enables content-based enrichment through semantic embeddings, supporting outlier detection, similarity search, and content-aware filtering.

To address these challenges and build on these advances, we propose SIC (Scientific Image Curation), a modular, open-source framework that acts as an analytic layer on top of existing image storage such as OMERO or file systems. SIC will ingest image-associated metadata from diverse sources, harmonize it into a structured and searchable database, and enrich it with computed image statistics for quality assessment (e.g., histograms, Laplacian variance, noise estimation) and visual descriptors from pretrained deep learning models.
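As an illustration of the quality statistics named above, the following is a minimal sketch in Python using only NumPy. It is not SIC's implementation: the function names (`laplacian_variance`, `intensity_histogram`, `estimate_noise_sigma`) are hypothetical, images are assumed to be 2D grayscale arrays, and the noise estimate uses Immerkaer's fast estimator, which is one common choice and not necessarily the method the project will adopt.

```python
import numpy as np


def laplacian_variance(image: np.ndarray) -> float:
    """Variance of the 4-neighbour discrete Laplacian; low values suggest blur."""
    img = image.astype(np.float64)
    # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied to interior pixels
    # via shifted slices (no padding, so the result is 2 pixels smaller per axis).
    lap = (
        img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
        - 4.0 * img[1:-1, 1:-1]
    )
    return float(lap.var())


def intensity_histogram(image: np.ndarray, bins: int = 16,
                        value_range: tuple = (0.0, 1.0)) -> np.ndarray:
    """Normalized intensity histogram (fractions of pixels per bin)."""
    counts, _ = np.histogram(image, bins=bins, range=value_range)
    return counts / max(counts.sum(), 1)


def estimate_noise_sigma(image: np.ndarray) -> float:
    """Noise standard deviation estimate after Immerkaer (assumes Gaussian noise)."""
    img = image.astype(np.float64)
    # Kernel [[1, -2, 1], [-2, 4, -2], [1, -2, 1]] applied to interior pixels.
    response = (
        img[:-2, :-2] - 2.0 * img[:-2, 1:-1] + img[:-2, 2:]
        - 2.0 * img[1:-1, :-2] + 4.0 * img[1:-1, 1:-1] - 2.0 * img[1:-1, 2:]
        + img[2:, :-2] - 2.0 * img[2:, 1:-1] + img[2:, 2:]
    )
    return float(np.sqrt(np.pi / 2.0) * np.abs(response).mean() / 6.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = 0.5 + rng.normal(0.0, 0.1, size=(128, 128))  # synthetic test image
    print("Laplacian variance:", laplacian_variance(demo))
    print("Estimated noise sigma:", estimate_noise_sigma(demo))
    print("Histogram:", intensity_histogram(demo))
```

Per-image values like these could be stored next to the harmonized metadata, so that collection-wide filters (for example, "Laplacian variance below some threshold") become simple database queries.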

These visual fingerprints will enable scalable quality control, outlier detection, dataset shift analyses, and content-aware similarity search. A browser-based interface will let researchers interactively explore local or remote collections, inspect metadata and quality signals, filter and annotate subsets, and export curated data for downstream tasks such as model training or inference. Downstream results can then be re-imported and assessed, keeping information consistent end to end.
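To make the idea of similarity search and outlier detection on visual fingerprints concrete, here is a minimal sketch in Python with NumPy. It assumes the embeddings have already been computed by some pretrained model and are available as an array with one row per image. The function names and the k-nearest-neighbour outlier score are illustrative assumptions, not SIC's actual design.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def most_similar(embeddings: np.ndarray, query_index: int, k: int = 5) -> np.ndarray:
    """Indices of the k images most similar to the query image (cosine similarity)."""
    unit = l2_normalize(embeddings)
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf  # exclude the query itself
    return np.argsort(-sims)[:k]


def knn_outlier_scores(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean cosine distance to the k nearest neighbours; higher means more isolated."""
    unit = l2_normalize(embeddings)
    dist = 1.0 - unit @ unit.T
    np.fill_diagonal(dist, np.inf)  # ignore self-distance
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings: 20 similar images plus one that points elsewhere.
    base = np.zeros(8)
    base[0] = 1.0
    cluster = base + 0.05 * rng.normal(size=(20, 8))
    odd_one = np.zeros((1, 8))
    odd_one[0, 1] = 1.0
    embeddings = np.vstack([cluster, odd_one])

    print("Neighbours of image 0:", most_similar(embeddings, 0, k=3))
    print("Most isolated image:", int(np.argmax(knn_outlier_scores(embeddings, k=3))))
```

The full pairwise distance matrix used here only scales to modest collections. For large repositories, an approximate nearest-neighbour index would likely be needed instead.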

Interoperability is central: SIC will align with standards such as OME-XML and iFDOs and collaborate with relevant initiatives to ensure broad reuse. The framework will be validated through pilot use cases in bioimaging, remote sensing, and further domains, refined through iterative stakeholder feedback, and disseminated via documentation, workshops, and open-source community contributions.

SIC is a joint project between the Helmholtz Centres MDC and DKFZ funded within the HMC Project Cohort 2025.

Primary Contact: Deborah Schmidt
Project Partners: MDC, DKFZ
Research Fields: Health
Project Duration: 01.01.2026 - 31.12.2027