Science clusters

ENVRI - Environmental Sciences
ESCAPE - Astronomy, Nuclear and Particle Physics
LS RI - Life Sciences
PANOSC - Photon and Neutron Science
SSHOC - Social Sciences and Humanities
Other

Summary

Research Software (RS) has become a key asset for supporting, reusing and reproducing the research outputs described in scientific publications. The growing acknowledgment of its importance is reflected in initiatives such as the Software Citation Principles, which advocate for the recognition of software contributions in academic curricula. However, Research Software metadata remains fragmented and inconsistent, which hinders effective usage and compliance with the FAIR principles. The CODEMETASOFT project aims to develop a comprehensive framework to streamline the management, enrichment, and propagation of RS metadata, thereby enhancing interoperability and supporting the scientific community's efforts to use software effectively in research.

Research domains:
The project approach may be applied to many domains (Open Science and Research Software Development)
Partner(s):
Universidad Politécnica de Madrid, Laboratoire d'Annecy de Physique des Particules (LAPP), CNRS
Project team member(s):
Daniel Garijo (co-PI, UPM), Thomas Vuillaume (co-PI, LAPP), Anas El Hounsrti, Oscar Corcho

Challenge

RS metadata is crucial for assessing RS FAIRness, and its significance in academia is increasingly recognised, but navigating the existing metadata landscape presents multiple challenges:
1) Software metadata is currently scattered across heterogeneous files and documentation, and no framework integrates them. As a result, developers usually have to duplicate effort by hand when creating metadata records.
2) Software metadata consistency checks and curation are performed by hand.
3) There is a lack of automated tools to suggest enhancements and fill gaps in RS metadata.

Solution

CODEMETASOFT will develop a framework based on the CodeMeta standard. The framework will feature an Autocomplete CodeMeta Wizard that adopts CodeMeta as a common metadata interchange format, in order to simplify the creation of metadata records and automate the management, enrichment and propagation of RS metadata across RS project files. The project will also introduce methodologies to compare similar (or complementary) metadata records, detect gaps in RS metadata, and suggest enrichments that improve the quality and completeness of metadata records and README files.
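
As an illustration of what gap detection and enrichment suggestions over CodeMeta records could look like, here is a minimal Python sketch. It is not the project's implementation: the function names (`find_gaps`, `suggest_enrichments`), the list of expected properties and both example records are hypothetical and were chosen only for illustration. The property names themselves (`name`, `description`, `license`, `author`, `codeRepository`, `version`) are CodeMeta terms.

```python
import json

# Assumption: an illustrative list of properties a record is expected to fill.
# The project does not define this list here; the names are CodeMeta terms.
EXPECTED_PROPERTIES = [
    "name", "description", "license", "author", "codeRepository", "version",
]


def find_gaps(record, expected=EXPECTED_PROPERTIES):
    """Return the expected properties that are absent or empty in the record."""
    return [prop for prop in expected if not record.get(prop)]


def suggest_enrichments(record, other):
    """Propose values from a similar record for properties missing in `record`."""
    return {prop: other[prop] for prop in find_gaps(record) if other.get(prop)}


# Hypothetical codemeta.json content with an empty description and no licence.
codemeta_record = json.loads("""
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-tool",
  "description": "",
  "version": "0.1.0"
}
""")

# Hypothetical record for the same software taken from another source,
# for example metadata extracted from a README or a package manifest.
similar_record = {
    "name": "example-tool",
    "description": "An example analysis tool.",
    "license": "https://spdx.org/licenses/MIT",
}

gaps = find_gaps(codemeta_record)
suggestions = suggest_enrichments(codemeta_record, similar_record)

print("Missing or empty:", gaps)
print("Suggested values:", suggestions)
```

In this sketch a property counts as a gap when it is absent or empty, and a suggestion is made only when the second record actually provides a value, so gaps that neither source can fill remain visible to the developer.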

Scientific Impact

CODEMETASOFT is expected to enhance the quality and consistency of Research Software metadata across Europe, promoting better compliance with the FAIR principles. By reducing the manual workload and providing automated solutions for metadata management, the project aims to foster a more interoperable and reliable ecosystem for research outputs. The resources developed will be made publicly available under an Open Source licence, and the tools will be integrated with both the ESCAPE Open Source Software and Service Repository (OSSR), which uses CodeMeta as a source of metadata, and the EOSC. With the increasing adoption of CodeMeta by European research infrastructures, the project has the potential to make a critical impact by improving the quality of metadata at its core without disrupting the workflows followed by research software engineers (RSEs).


Keywords
Research Software, Software Citation Principles, Research Software metadata, CodeMeta
Project start date:
Project duration:
24 months

Principal investigator

Daniel Garijo and Thomas Vuillaume - PIs CODEMETASOFT project
Daniel Garijo and Thomas Vuillaume
Universidad Politécnica de Madrid and LAPP
BIO

Daniel Garijo is a researcher at the Ontology Engineering Group of the Universidad Politécnica de Madrid. His research lies at the intersection of Knowledge Capture, e-Science and Semantics, and focuses in particular on capturing the context and metadata of research software and computational experiments to promote their (re)usability.

Thomas Vuillaume is a data scientist and research software engineer working at LAPP, CNRS. His line of research focuses on developing data analysis pipelines, including machine and deep learning methods, to extract information from the Cherenkov Telescope Array (CTA) currently under construction.

QUOTE
"This project aims to increase the availability of software metadata records in Research Software repositories by incorporating metadata enrichment pipelines in the software development practices used by scientists."