50-446 Rucio Data Management platform

Science cluster

ESCAPE - Astronomy, Nuclear and Particle Physics

Summary

Rucio is an open-source data management solution used by nuclear, particle and astrophysics experiments and observatories, such as the LHC experiments ATLAS and CMS at CERN, AMS, DUNE, Belle II, ICARUS, LIGO/Virgo, CTA, MAGIC and Rubin LSST, and is a key component of the ESCAPE Science Cluster. Exabytes of scientific (raw) data are managed with this technology and transferred worldwide so that data sets reach their end users efficiently. However, existing workflows often require duplicating data to third-party systems to make it openly accessible. The project "Streamlining open data policies in Rucio data management platform" aims to enhance Rucio by embedding open data policies directly into its core, enabling seamless data sharing without duplication and supporting interdisciplinary research.

Research domains:
Astrophysics, Cosmology, Particle or Nuclear Physics
Partner(s):
CERN
Project team member(s):
Hugo Gonzalez Labrador, Martin Barisits, Luis Obis Aparicio

Challenge

Open Science project, Open Science Service, Citizen science, Cross-domain/Cross-RI

The growing need for interdisciplinary research, particularly in multi-messenger science, has highlighted the importance of sharing scientific data across institutions performing complementary research (infrared, gravitational waves, etc.). However, institutions often rely on archaic or error-prone methods for data sharing. Moreover, because Rucio does not recognise open data as a main data type, experiments need to copy this data to other systems to make it FAIR, incurring extra storage costs for the copy. The main objective of this project is thus to introduce native support in Rucio for managing open data, where embargo and public access policies can be defined, so that no copy is needed to make data open.

Solution

The project proposes a paradigm shift: instead of duplicating data, Rucio will link open data directly at the source, incorporating FAIR principles natively into its architecture. By managing embargoes and public access policies within Rucio itself, scientific data can remain within its custodial storage while being accessible as open data. This approach not only prevents unnecessary duplication but also ensures long-term preservation using Rucio’s robust replication mechanisms.
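As a rough illustration of what an embargo and public access policy could look like, the sketch below decides whether a data set may be served as open data directly from its custodial storage. It is only a minimal sketch under stated assumptions: the names `OpenDataPolicy`, `embargo_until`, `public` and `is_openly_accessible`, as well as the example data set identifier, are hypothetical and are not part of Rucio's actual API or of the project's design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OpenDataPolicy:
    """Hypothetical policy record attached to a data set (not a Rucio type)."""
    dataset: str                   # illustrative data set identifier
    public: bool                   # whether public access is permitted at all
    embargo_until: Optional[date]  # first day of open access; None = no embargo


def is_openly_accessible(policy: OpenDataPolicy, today: date) -> bool:
    """Return True if the data set may be exposed as open data on `today`.

    The data itself never moves: only the access decision changes.
    """
    if not policy.public:
        return False               # never released publicly
    if policy.embargo_until is None:
        return True                # public and not embargoed
    return today >= policy.embargo_until  # public once the embargo has ended


# Assumed example: a data set under embargo until 1 January 2026.
example = OpenDataPolicy(
    dataset="experiment:run-0001",  # hypothetical identifier
    public=True,
    embargo_until=date(2026, 1, 1),
)
during_embargo = is_openly_accessible(example, date(2025, 6, 1))
after_embargo = is_openly_accessible(example, date(2026, 1, 1))
```

In this sketch the same stored data set moves from restricted to open purely through a policy evaluation, which is the idea behind linking open data at the source rather than copying it to a separate open data system.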

Scientific Impact

The integration of open data support within Rucio will benefit a wide range of research infrastructures, from particle physics to astronomy and beyond. The project will reduce costs, improve resource efficiency, and lower the environmental footprint by minimising data duplication. Moreover, it will foster collaboration across institutions and disciplines, empowering the broader scientific community, including citizen scientists, to access and exploit data more easily, further advancing scientific progress. 


Keywords
Rucio, multi-messenger science, open data policy
Project start date:
Project duration:
24 months

Principal investigator

Hugo Gonzalez Labrador
CERN
BIO

Hugo Gonzalez is a Storage Engineer working in Storage and Data Management at CERN. He has more than 10 years of experience with diverse software-defined storage platforms.

His role includes operating and managing vast distributed storage systems that store over an exabyte of data.

He coordinates the Rucio project activities inside the CERN IT department and is a core member of the Rucio project. 

Prior to joining CERN, he worked as a Software Engineer on ESA's HUMSAT project.

He is an open-source enthusiast, and his preferred programming language is Go.

QUOTE
“This project aims at reducing the barrier to export scientific data to the citizens of the world by relying on the de facto scientific data management solution Rucio. Reducing the steps needed to release data as open is our main objective.”