Frequently Asked Questions
Here you can find common questions about metadata and the HMC.
If you have additional questions, please contact our HMC Office or the relevant metadata unit. If you have questions concerning the project call, please check its FAQs.
Metadata are data about data.
Metadata are generated and used when documenting your research and the resulting data. They contain information that establishes the findability, accessibility, interoperability and re-usability of data following the FAIR data principles.
For a detailed definition of the term metadata and other relevant terms please refer to our Glossary.
Metadata are one part of documenting your research process: they describe your data and how you obtained them. Recording metadata, such as experimental protocols, makes it easier for you and your collaborators to design appropriate analyses. And in the future, you will already have all the details of what you did when it is time to publish or perform a new analysis.
When you share your data by uploading it to an online repository (which is often required by funding agencies and journals), you will probably need to submit associated metadata. Considering metadata management in your project from the beginning will therefore save you time later.
Regardless of how you publish or share your data, including metadata makes your research outputs easier for others to find and to use. Your work becomes more visible and is more likely to be cited.
One of the biggest hurdles for recording metadata – their availability – is connected to when metadata are recorded.
The effort to generate metadata retrospectively and attach them to a set of data is often very high. Therefore, metadata should always be recorded alongside or close to the generation of the research data itself.
In doing so, information is readily available and can easily be recorded, automatically or semi-automatically, and saved alongside the research data. The researcher's knowledge about the generation, processing and analysis of the data can be documented in a structured manner and will consequently not be lost.
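To illustrate recording metadata at the moment the data are generated, here is a minimal sketch in Python. It assumes a simple "sidecar" convention, a JSON file written next to the data file. The function name `save_with_metadata`, the `.meta.json` suffix and all field names and values are hypothetical and not an HMC recommendation.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(data_path: Path, lines: list[str], metadata: dict) -> Path:
    """Write the data file and a JSON sidecar describing it in one step."""
    data_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    record = dict(metadata)  # entries supplied by the researcher
    # Entries that can be captured automatically when the file is created.
    record["file_name"] = data_path.name
    record["created_utc"] = datetime.now(timezone.utc).isoformat()
    record["line_count"] = len(lines)

    # Hypothetical naming convention: <data file name>.meta.json
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Example run in a temporary directory; all values are made up.
with tempfile.TemporaryDirectory() as tmp:
    sidecar_path = save_with_metadata(
        Path(tmp) / "measurement.csv",
        ["time_s,temperature_c", "0,21.4", "1,21.6"],
        {"creator": "Jane Doe", "instrument": "thermometer-01"},
    )
    saved = json.loads(sidecar_path.read_text(encoding="utf-8"))
```

Because the sidecar is written by the same call that writes the data, the automatically available details (file name, timestamp, size) never have to be reconstructed later.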
Which metadata are appropriate for a given dataset depends on both the scientific context and the dataset itself. The following aspects are essential to consider:
Vocabularies: Where possible, metadata used to describe a dataset should be aligned with an established, controlled vocabulary. This ensures that the captured descriptions remain generally understandable and interoperable with other datasets.
PID: Assigning a persistent identifier (PID), such as a DOI, to a research dataset makes it findable and citable, which increases the reputation and visibility of the author(s). PIDs also support the machine readability of metadata.
Repositories: To publish data, a suitable repository needs to be identified. Requirements for quality control of the gathered data and for long-term preservation in a given repository may differ depending on the scientific domain. Re3Data or the databases of NIH and NHS are good starting points for a search.
Licensing: To clarify access and usage rights and to tailor them, e.g. for potential re-use of your data, an open content license can be assigned. For example, the licenses developed by Open Data Commons for open data and open databases can be used for research data.
Data privacy protection: Before publishing, you should check whether the data contain trade secrets or whether privacy rules and laws apply to your data. Furthermore, funding decisions and employment or service contracts might contain provisions that prevent the data from being published or allow publication only under certain conditions. Contact the data protection/privacy officer responsible for your research institute or search for applicable data policies. An overview of institutional policies can be found here.
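As a small illustration of the PID aspect above: a DOI can be turned into a resolvable link by prefixing it with the public resolver address https://doi.org/. The following Python sketch shows this under simple assumptions. The helper name `doi_to_url` and the DOI used are hypothetical, and the check is only a rough plausibility test, not a full validation of DOI syntax.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI into a link that the doi.org resolver can follow."""
    doi = doi.strip()
    # Accept the common "doi:" prefix as well as a bare DOI.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Rough plausibility check only: DOIs start with "10." and contain "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keep "/".
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, used for illustration only.
example_url = doi_to_url("doi:10.1234/example-dataset")
```

Storing the full link alongside the dataset metadata lets both people and software follow the identifier directly.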
The specific implementation of metadata best practices might differ depending on scientific context and application background, and can vary in scale from a single lab project to the institutional level.
HMC supports various use cases so that you can find an example to use as a scaffold for your own project. Further information can be found on the information pages of the Metadata Hubs.
All the information required to understand and interpret your research data properly should be gathered as metadata.
To identify which metadata are explicitly required in your project, it can be helpful to set up a data management plan (DMP). Assistance for this is available from various sources.
A minimal set of metadata should serve to answer questions like the following:
Who gathered the data?
When and where was the data gathered?
Why/for what purpose was the data recorded?
Which type of data was recorded?
How was the data gathered?
How was the data stored (file format/structure)?
How was the data pre-processed and analyzed (raw vs. analyzed data, pre-processing such as filtering, selection, or other steps)?
How was data quality assessed and guaranteed?
How can the data be accessed?
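The questions above can be turned into a simple checklist for a metadata record. The following Python sketch is one possible way to do this. The field names, the example values and the helper `missing_fields` are hypothetical and do not represent an HMC schema.

```python
# Hypothetical field names, each mapped to the question it answers.
REQUIRED_FIELDS = {
    "creator": "Who gathered the data?",
    "collected_on": "When was the data gathered?",
    "location": "Where was the data gathered?",
    "purpose": "Why/for what purpose was the data recorded?",
    "data_type": "Which type of data was recorded?",
    "method": "How was the data gathered?",
    "file_format": "How was the data stored?",
    "processing": "How was the data pre-processed and analyzed?",
    "quality_control": "How was data quality assessed and guaranteed?",
    "access": "How can the data be accessed?",
}


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example record; all values are made up.
record = {
    "creator": "Jane Doe",
    "collected_on": "2023-05-04",
    "location": "Field station A",
    "purpose": "Calibration of a temperature sensor",
    "data_type": "time series",
    "method": "automatic logging every second",
    "file_format": "CSV, one row per reading",
    "processing": "raw data, no filtering applied",
    "quality_control": "",
    "access": "on request",
}

for name in missing_fields(record):
    print(f"Missing: {name} ({REQUIRED_FIELDS[name]})")
```

A check like this can run whenever a dataset is saved, so that gaps are noticed while the information is still easy to obtain.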
The HMC Office and the Metadata Hubs can be approached with further questions and for support. We want to help you to arrive at the optimal set of metadata for your project.
If you have questions regarding metadata you can always approach the HMC. You can find all relevant contact details here. We work together with and for researchers and data managers of the Helmholtz Association. Our work is integrated with national and international initiatives and we also work with interested people from outside the Helmholtz Association.
The “Nationale Forschungsdateninfrastruktur” (NFDI, the German National Research Data Infrastructure) consists of domain-specific consortia that work within their research domain and the German research community.
The “European Open Science Cloud” (EOSC) is an initiative of the European Commission that builds an Open Science infrastructure for Europe.
HMC works closely with the NFDI, EOSC and other initiatives in the field to make sure that our work contributes to the global research community.
Re3data can help you to identify domain-specific repositories that host datasets.
The non-profit site DataCite can be used to search for DOI-referenced datasets.
Domain-agnostic repositories can also be searched directly, e.g. Zenodo, Data Dryad or Figshare.
For domain-specific questions, please do not hesitate to contact your Metadata Hub.
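Besides the DataCite website, searches for DOI-referenced datasets can also be scripted. The Python sketch below only builds a search address for the public DataCite REST API and does not send a request. The endpoint and the parameter names `query`, `resource-type-id` and `page[size]` are given here as assumptions and should be checked against the current DataCite API documentation. The helper name `build_search_url` and the search terms are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the DataCite REST API for DOI records.
DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_search_url(keywords: str, page_size: int = 5) -> str:
    """Build a search address for dataset records matching the keywords."""
    params = {
        "query": keywords,              # free-text search terms
        "resource-type-id": "dataset",  # restrict results to datasets
        "page[size]": page_size,        # number of records per page
    }
    return f"{DATACITE_DOIS_ENDPOINT}?{urlencode(params)}"


# Hypothetical search terms.
search_url = build_search_url("soil moisture")
print(search_url)
```

The resulting address can be opened in a browser or passed to any HTTP client. The response is expected to be JSON.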
Triplestores are database systems optimised for storing RDF-style statements composed of triples. A triple describes information in terms of a subject, a predicate, and an object, for example a statement whose object is the value "blue". Graph-like entities (e.g. OWL ontologies, knowledge graphs) can be stored in triplestores, as well as in other systems such as graph databases.
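To make the idea of triples more concrete, here is a minimal sketch in plain Python. It is not a real triplestore: it only mimics how statements are stored as (subject, predicate, object) and queried by pattern. The `ex:` names, the statements themselves and the helper `match` are hypothetical and chosen purely for illustration.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# A tiny in-memory collection of statements; all names are made up.
triples: set[Triple] = {
    ("ex:sky", "ex:hasColour", "ex:blue"),
    ("ex:sea", "ex:hasColour", "ex:blue"),
    ("ex:grass", "ex:hasColour", "ex:green"),
}


def match(
    store: set[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which subjects are linked to the object ex:blue?"
blue_things = [s for s, _, _ in match(triples, obj="ex:blue")]
print(blue_things)
```

Real triplestores answer the same kind of pattern query, typically through the SPARQL query language, and scale to much larger collections of statements.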