CODEMETASOFT | OSCARS

Software, AI & tools

Summary

Research Software (RS) has become a key asset to support, reuse and reproduce research outputs described in scientific publications. The growing acknowledgment of RS’s importance is reflected in initiatives, such as the Software Citation Principles, which advocate for the recognition of software contributions in academic curricula. However, the current state of Research Software metadata remains fragmented and inconsistent, hindering effective usage and compliance with FAIR principles. The CODEMETASOFT project aims to develop a comprehensive framework to streamline the management, enrichment, and propagation of RS metadata, thereby enhancing interoperability and supporting the scientific community's efforts to leverage software in research effectively.

Research domains:

The project approach may be applied to many domains (Open Science and Research Software Development)

Partner(s):

Universidad Politécnica de Madrid, Laboratoire d'Annecy de Physique des Particules (LAPP), CNRS

Project team member(s):

Daniel Garijo (co-PI, UPM), Thomas Vuillaume (co-PI, LAPP), Anas El Hounsri, Oscar Corcho

Challenge

RS metadata is crucial for assessing RS FAIRness, and its significance in academia is increasingly recognised. Navigating the existing metadata landscape nevertheless presents several challenges: 1) Software metadata is currently scattered across heterogeneous files and documentation, with no framework to integrate them. As a result, developers usually have to duplicate effort by hand when creating metadata records. 2) Consistency checks and curation of software metadata are performed by hand. 3) There is a lack of automated tools to suggest enhancements and fill gaps in RS metadata.

Solution

CODEMETASOFT will develop an innovative framework based on the CodeMeta standard. This framework will feature an Autocomplete CodeMeta Wizard that adopts CodeMeta as a common metadata interchange format, to simplify the creation of metadata records and automate the management, enrichment and propagation of RS metadata across RS project files. The project will also introduce methodologies to compare similar (or complementary) metadata records, detect RS metadata gaps, and propose enrichment suggestions that improve the quality and completeness of metadata records and README files.
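To make the comparison and gap-detection idea concrete, the following is a minimal sketch and not the project's implementation. It assumes two metadata records that have already been loaded as Python dictionaries keyed by CodeMeta property names. The function names (`find_gaps`, `find_mismatches`, `suggest_enrichment`) and the checklist in `EXPECTED_FIELDS` are hypothetical and chosen only for illustration.

```python
from typing import Any

# Illustrative subset of CodeMeta properties to check. This checklist is an
# assumption; the project description does not specify one.
EXPECTED_FIELDS = ("name", "description", "version", "license", "codeRepository", "author")


def find_gaps(codemeta: dict[str, Any]) -> list[str]:
    """Return the expected properties that are absent or empty in the record."""
    return [field for field in EXPECTED_FIELDS if not codemeta.get(field)]


def find_mismatches(codemeta: dict[str, Any], other: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return properties that are filled in both records but hold different values."""
    shared = codemeta.keys() & other.keys()
    return {
        key: (codemeta[key], other[key])
        for key in sorted(shared)
        if codemeta[key] and other[key] and codemeta[key] != other[key]
    }


def suggest_enrichment(codemeta: dict[str, Any], other: dict[str, Any]) -> dict[str, Any]:
    """Propose values from the other record for properties missing in the CodeMeta record."""
    return {field: other[field] for field in find_gaps(codemeta) if other.get(field)}


# Hypothetical example records (not taken from any real repository).
codemeta_record = {"name": "example-tool", "version": "1.0.0", "license": ""}
package_record = {
    "name": "example-tool",
    "version": "1.2.0",
    "license": "https://spdx.org/licenses/MIT",
    "description": "An example.",
}

print(find_gaps(codemeta_record))
print(find_mismatches(codemeta_record, package_record))
print(suggest_enrichment(codemeta_record, package_record))
```

In this sketch an empty value counts as a gap rather than a mismatch, so a blank `license` is reported as missing and filled from the second record instead of being flagged as a conflict.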

Scientific Impact

The implementation of CODEMETASOFT is poised to enhance the quality and consistency of Research Software metadata across Europe, promoting better compliance with FAIR principles. By reducing the manual workload and providing automated solutions for metadata management, the project aims to foster a more interoperable and reliable ecosystem for research outputs. The resources developed will be made publicly available under an Open Source licence, and the tools will be integrated with both the ESCAPE Open Source Software and Service Repository - OSSR (which uses CodeMeta as a source of metadata) and the EOSC. With the increasing adoption of CodeMeta by European research infrastructures, the project has the potential to make a critical impact by improving the quality of metadata at its core without disrupting the workflows followed by RSEs.

Results

Gap analysis of the CodeMeta files in the ESCAPE OSSR: Analysis of metadata mismatches between CodeMeta files and other metadata sources in code repositories belonging to the ESCAPE OSSR.
Gap analysis of research software best practices in the European Science Clusters: A published report indicating the gaps in metadata adoption, as well as recommendations on how to address them.
Software metadata pitfall analysis dashboard: A first overview of the common metadata pitfalls that researchers run into when describing software tools (https://anas-elhounsri.github.io/).
Auto-codemeta generator: A form that will automatically gather metadata from a GitHub or GitLab repository and guide users when generating CodeMeta files | Demo
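As an illustration of what gathering repository metadata into a CodeMeta record can look like, here is a minimal sketch. It is not the Auto-codemeta generator itself. It assumes the repository description has already been fetched as a dictionary shaped like a GitHub REST API repository response (fields `name`, `description`, `html_url`, `language`, `license.spdx_id`), and the function name `repo_to_codemeta` is hypothetical.

```python
import json
from typing import Any


def repo_to_codemeta(repo: dict[str, Any]) -> dict[str, Any]:
    """Map a GitHub-style repository description to a minimal CodeMeta record."""
    license_info = repo.get("license") or {}
    spdx_id = license_info.get("spdx_id")
    # GitHub reports "NOASSERTION" when it cannot identify the licence.
    has_license = bool(spdx_id) and spdx_id != "NOASSERTION"

    record = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
        "name": repo.get("name"),
        "description": repo.get("description"),
        "codeRepository": repo.get("html_url"),
        "programmingLanguage": repo.get("language"),
        "license": f"https://spdx.org/licenses/{spdx_id}" if has_license else None,
    }
    # Drop empty properties so that missing metadata stays visible as a gap
    # instead of being written out as null values.
    return {key: value for key, value in record.items() if value}


# Hypothetical repository description (not a real project).
example_repo = {
    "name": "example-tool",
    "description": None,
    "html_url": "https://github.com/example-org/example-tool",
    "language": "Python",
    "license": {"spdx_id": "MIT"},
}

print(json.dumps(repo_to_codemeta(example_repo), indent=2))
```

Properties that cannot be derived automatically, such as `description` in this example, are left out of the record, which is where a guided form can prompt the user to complete them.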

Publications

El Hounsri, A., & Garijo, D. (2025, April). Good practice versus reality: A landscape analysis of Research Software metadata adoption in European Open Science Clusters. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) (pp. 116-128). IEEE. | DOI

Events

28-29 April 2025 | Ottawa, Canada - International Conference on Mining Software Repositories - Presentation of the peer-reviewed publication “Good practice versus reality: A landscape analysis of Research Software metadata adoption in European Open Science Clusters” | Presentation

Keywords

Research Software, Software Citation Principles, Research Software metadata, CodeMeta

Project start date:

1 November 2024

Project duration:

24 months

Principal investigator

Daniel Garijo and Thomas Vuillaume

Universidad Politécnica de Madrid and LAPP

BIO

Daniel Garijo is a researcher at the Ontology Engineering Group of the Universidad Politécnica de Madrid. His line of research is at the intersection of Knowledge Capture, e-Science and Semantics, in particular on capturing the context and metadata of research software and computational experiments to promote their (re)usability.

Thomas Vuillaume is a data scientist and research software engineer working at LAPP, CNRS. His line of research focuses on developing data analysis pipelines, including machine and deep learning methods, to extract information from the Cherenkov Telescope Array (CTA) currently under construction.

QUOTE

"This project aims to increase the availability of software metadata records in Research Software repositories by incorporating metadata enrichment pipelines in the software development practices used by scientists."

Resources