Science cluster

ESCAPE - Astronomy, Nuclear and Particle Physics

Summary

The MADDEN project is designed to tackle a key challenge in modern scientific research: facilitating seamless data access and sharing across multiple Research Infrastructures (RIs) in international collaborations. Scientific experiments, especially in fields such as gravitational wave (GW) research, often produce vast amounts of data stored in isolated Data Lakes. This fragmentation creates barriers to collaboration and data access. The MADDEN project aims to build a multi-RI Data Lake managed with Rucio, a robust open-source framework for data management, distribution and access, initially developed to meet the requirements of the ATLAS experiment in High Energy Physics (HEP), and now widely adopted across various scientific communities. The project enhances Rucio to create a unified Multi-RI Data Lake, supporting international efforts to share and analyse experimental data more effectively, all within the EOSC framework.

Research domains:
Astrophysics, Cosmology, Particle or Nuclear Physics
Partner(s):
Istituto Nazionale di Fisica Nucleare - INFN (COORDINATOR), Institut de Recherche en Mathématique et Physique - Université catholique de Louvain - IRMP - UCLouvain

Challenge


Modern scientific research, especially in fields such as GW research, requires international collaboration. Each RI typically manages its own data in isolated Data Lakes, hindering easy access and shared analysis. The emerging need is to establish a common data infrastructure that allows scientists to securely access data across multiple RIs for more efficient collaboration and discovery, especially as projects such as the Einstein Telescope (ET) and Cosmic Explorer (CE) prepare for the future of GW research.

Solution

The MADDEN project aims to build a multi-RI Data Lake managed with Rucio. It will deploy the relevant services for the ET Data Lake based on Rucio, and build a multi-RI Data Lake by deploying a mock-up of a CE Data Lake, managed by an independent Rucio instance. Rucio authentication features will be extended to allow ET users to seamlessly access data from both ET and CE Rucio instances. 
The project will also advance and test RucioFS, a tool originally developed by INFN-Torino to provide a POSIX-like view of the Rucio catalogue in a multi-RI environment, and implement the functionality it still lacks for production use. Additionally, the project proposes to leverage Rucio's metadata capabilities so that RucioFS can run rich metadata queries, filtering search results according to metadata conditions.
Finally, the project aims to set up a demonstrator for the Virgo collaboration to test the technology.
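
The idea of a POSIX-like view over several catalogues, filtered by metadata conditions, can be illustrated with a minimal sketch. It is a toy in-memory model only: it does not use the Rucio or RucioFS APIs, and every name in it (Entry, matches, list_paths, the path layout /RI/scope/name, the file names and the metadata keys) is a hypothetical choice made for illustration, not something defined by the project.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One catalogue entry (hypothetical model): scope, name, free-form metadata."""
    scope: str
    name: str
    metadata: dict = field(default_factory=dict)


def matches(entry, conditions):
    """Return True if the entry's metadata satisfies every key == value condition."""
    return all(entry.metadata.get(key) == value for key, value in conditions.items())


def list_paths(catalogues, conditions=None):
    """Present entries from several catalogues as POSIX-like paths.

    `catalogues` maps an RI label (e.g. "ET") to the entries of that RI's
    independent catalogue. Only entries matching `conditions` are listed.
    """
    conditions = conditions or {}
    paths = []
    for ri, entries in catalogues.items():
        for entry in entries:
            if matches(entry, conditions):
                paths.append(f"/{ri}/{entry.scope}/{entry.name}")
    return sorted(paths)


# Made-up example data: two independent catalogues, one per RI.
catalogues = {
    "ET": [
        Entry("mock", "strain_001.h5", {"run": "test1"}),
        Entry("mock", "strain_002.h5", {"run": "test2"}),
    ],
    "CE": [
        Entry("mock", "strain_101.h5", {"run": "test1"}),
    ],
}

if __name__ == "__main__":
    # List everything tagged run == "test1", across both RIs.
    for path in list_paths(catalogues, {"run": "test1"}):
        print(path)
```

In this sketch the user sees a single directory tree even though each RI keeps its own catalogue, which is the behaviour the multi-RI Data Lake is meant to offer. Authentication across instances, the part the project extends in Rucio itself, is deliberately left out.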

Scientific Impact

All developments proposed in this project (a multi-RI capable Rucio, a production-ready RucioFS and scalable metadata queries in Rucio) are part of the ESCAPE distributed data management (DDM) roadmap and are also applicable beyond the GW community.
As Rucio is the DDM solution adopted by the ESCAPE Science Cluster within EOSC, this project is the first necessary architectural step towards consistent support of Open Science and the FAIR data principles in the GW physics domain, going beyond the existing GW Open Science Center (GWOSC). Common frameworks such as Rucio and RucioFS would allow less experienced data analysts to easily access and browse experimental data, and would bridge the gap for multi-messenger analyses, which are typically performed by astronomers and scientists from different astrophysics domains.


Keywords
Distributed data management (DDM), gravitational waves, gravitational wave research, data lake, multi-RI data lake, Einstein Telescope, Cosmic Explorer, Rucio, RucioFS
Project start date:
Project duration:
24 months

Principal investigator

Federica Legger PI MADDEN - Multi-RI Access and Discovery of Data for Experiment Networking
Federica Legger
INFN (National Institute for Nuclear Physics)
BIO

Dr. Federica Legger is a staff scientist at INFN (National Institute for Nuclear Physics). She studied Physics at the University of Turin in Italy, and graduated from EPFL (École Polytechnique Fédérale de Lausanne) in Switzerland with a thesis on the data acquisition electronics of the LHCb experiment at CERN. She is currently participating in distributed computing activities for the future gravitational wave interferometer Einstein Telescope, for the Virgo experiment at EGO (European Gravitational Observatory), and for the CMS experiment at the LHC (Large Hadron Collider). 

QUOTE
"The MADDEN project will create a unified, multi-RI Data Lake to foster collaboration across scientific communities. By enhancing Rucio for gravitational wave research and beyond, MADDEN accelerates discovery, ensuring transparent, accessible, and reproducible science for all."