Science clusters
Summary
Research Software (RS) has become a key asset for supporting, reusing and reproducing the research outputs described in scientific publications. Its growing importance is reflected in initiatives such as the Software Citation Principles, which advocate for recognising software contributions in academic curricula. However, RS metadata remains fragmented and inconsistent, which hinders effective use and compliance with the FAIR principles. The CODEMETASOFT project aims to develop a comprehensive framework to streamline the management, enrichment, and propagation of RS metadata, thereby improving interoperability and helping the scientific community use software effectively in research.
Challenge
RS metadata is crucial for assessing RS FAIRness, and its significance in academia is increasingly recognised. However, navigating the existing metadata landscape presents several challenges: 1) Software metadata is currently scattered across heterogeneous files and documentation, with no framework integrating them, so developers usually have to duplicate effort by hand when creating metadata records. 2) Consistency checking and curation of software metadata are performed by hand. 3) There is a lack of automated tools to suggest enhancements and fill gaps in RS metadata.
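As a minimal illustration of the first two challenges, the sketch below holds the same facts about a hypothetical package as they might appear in two common files, the `[project]` table of a `pyproject.toml` and a `CITATION.cff`, and maps both onto CodeMeta property names to surface disagreements. The package, its values, and the helper names `to_codemeta` and `find_conflicts` are assumptions made for illustration; the key mappings are a small hand-written subset, not the official CodeMeta crosswalks.

```python
# Hypothetical metadata for one package, as already parsed from two files.
pyproject = {"name": "mytool", "version": "1.2.0", "description": "A demo tool"}
citation_cff = {"title": "mytool", "version": "1.1.0", "abstract": "A demo tool"}

# Hand-written subset of key mappings onto CodeMeta property names (assumption).
PYPROJECT_TO_CODEMETA = {"name": "name", "version": "version", "description": "description"}
CFF_TO_CODEMETA = {"title": "name", "version": "version", "abstract": "description"}


def to_codemeta(record, mapping):
    """Rename the keys of `record` to CodeMeta properties, dropping unmapped keys."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def find_conflicts(a, b):
    """Return properties present in both records whose values differ."""
    return {prop: (a[prop], b[prop]) for prop in a.keys() & b.keys() if a[prop] != b[prop]}


conflicts = find_conflicts(
    to_codemeta(pyproject, PYPROJECT_TO_CODEMETA),
    to_codemeta(citation_cff, CFF_TO_CODEMETA),
)
print(conflicts)  # {'version': ('1.2.0', '1.1.0')}
```

Here the version was updated in one file but not the other, which is the kind of drift that manual curation tends to miss.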
Solution
CODEMETASOFT will develop an innovative framework based on the CodeMeta standard. The framework will feature an Autocomplete CodeMeta Wizard that adopts CodeMeta as a common metadata interchange format, in order to simplify the creation of metadata records and automate the management, enrichment and propagation of RS metadata across RS project files. The project will also introduce methodologies to compare similar (or complementary) metadata records, detect gaps in RS metadata, and suggest enrichments that improve the quality and completeness of metadata records and README files.
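Gap detection and record comparison of this kind could look roughly like the following sketch. It is not the CODEMETASOFT implementation: the `RECOMMENDED` list is an assumed, project-specific selection of CodeMeta properties, the records are invented, and `find_gaps` and `suggest_enrichments` are hypothetical helper names.

```python
import json

# Assumed set of properties a project might want filled in; not mandated by CodeMeta.
RECOMMENDED = ["name", "description", "version", "license", "codeRepository", "author"]

# Invented example of a sparse codemeta.json record.
codemeta_json = """
{
  "@context": "https://w3id.org/codemeta/3.0",
  "@type": "SoftwareSourceCode",
  "name": "mytool",
  "version": "1.2.0",
  "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Example"}]
}
"""


def find_gaps(record, required=RECOMMENDED):
    """List required properties that are missing or empty in `record`."""
    return [prop for prop in required if not record.get(prop)]


def suggest_enrichments(record, other):
    """Propose values from a similar record `other` for properties `record` lacks."""
    return {prop: other[prop] for prop in find_gaps(record) if other.get(prop)}


record = json.loads(codemeta_json)
# Hypothetical second record describing the same software, e.g. from a registry.
similar = {
    "name": "mytool",
    "license": "https://spdx.org/licenses/MIT",
    "description": "A demo tool",
}

print(find_gaps(record))  # ['description', 'license', 'codeRepository']
print(suggest_enrichments(record, similar))
# {'description': 'A demo tool', 'license': 'https://spdx.org/licenses/MIT'}
```

The suggestions are returned rather than applied, so a maintainer can review each proposed value before it is written back to the project files.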
Scientific Impact
The implementation of CODEMETASOFT is expected to enhance the quality and consistency of Research Software metadata across Europe, promoting better compliance with the FAIR principles. By reducing the manual workload and providing automated solutions for metadata management, the project aims to foster a more interoperable and reliable ecosystem for research outputs. The resources developed will be made publicly available under an Open Source licence, and the tools will be integrated with both the ESCAPE Open Source Software and Service Repository (OSSR), which uses CodeMeta as a source of metadata, and the EOSC. With the increasing adoption of CodeMeta by European research infrastructures, the project has the potential to make a critical impact by improving the quality of metadata at its core without disrupting the workflows followed by research software engineers (RSEs).
Principal investigators
Daniel Garijo is a researcher at the Ontology Engineering Group of the Universidad Politécnica de Madrid. His line of research is at the intersection of Knowledge Capture, e-Science and Semantics, in particular on capturing the context and metadata of research software and computational experiments to promote their (re)usability.
Thomas Vuillaume is a data scientist and research software engineer working at LAPP, CNRS. His line of research focuses on developing data analysis pipelines, including machine learning and deep learning methods, to extract information from the Cherenkov Telescope Array (CTA), which is currently under construction.