Science cluster
Summary
Macromolecular crystallography (MX) is a key technique in structural biology, enabling the determination of atomic structures and an understanding of macromolecular function and physiological role. However, data from unsolved cases, poorly diffracting crystals, or problematic experiments are typically abandoned or discarded, limiting the potential for information extraction and methods improvement. The Fail2Fair project seeks to recover value from these discarded crystallographic datasets by developing a pipeline to annotate, classify, and integrate them into the SciCat metadata catalogue, an Open Research Data infrastructure established at the Paul Scherrer Institut (PSI). By enriching metadata with detailed descriptions of crystallographic failure modes, the project will make such datasets FAIR while supporting the development of new tools, including those based on AI.

Challenge
Current repositories such as the Protein Data Bank (PDB) include only successful, processed experiments (fewer than one in a thousand MX experiments), excluding vast amounts of potentially informative discarded or unsolved crystallographic data. Moreover, while SciCat supports the deposition of experimental data, it lacks the capacity to document why certain datasets failed or were discarded. This gap limits opportunities to learn from experimental shortcomings, impedes methodological improvement, and restricts the use of these datasets for tool development, including AI-based approaches.
Solution
To make discarded crystallographic datasets accessible to the scientific community, Fail2Fair will extend the SciCat metadata framework with a new schema for recording crystallographic failure modes. Using an AI-powered pipeline, the project will automatically annotate and classify crystallographic datasets that have been discarded for various reasons. These include datasets with pathologies (e.g., twinning, anisotropy, or translational non-crystallographic symmetry), low-quality datasets (e.g., poorly diffracting crystal forms or lattice defects), and other crystallographic pitfalls.
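As a rough illustration of what a failure-mode record might look like, the following Python sketch defines an annotation for one discarded dataset and serialises it to a JSON-ready dictionary. Everything in it is an assumption made for illustration: the class names, the field names, the category list (taken from the examples above), and the `failure_annotation` key are hypothetical and do not represent the project's actual schema or any SciCat API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List
import json


class FailureMode(Enum):
    """Hypothetical failure categories, taken from the examples in the text."""
    TWINNING = "twinning"
    ANISOTROPY = "anisotropy"
    TRANSLATIONAL_NCS = "translational_ncs"
    POOR_DIFFRACTION = "poor_diffraction"
    LATTICE_DEFECTS = "lattice_defects"
    OTHER = "other"


@dataclass
class FailureAnnotation:
    """Illustrative annotation attached to one discarded dataset."""
    dataset_id: str                   # identifier of the dataset in the catalogue
    failure_modes: List[FailureMode]  # one or more assigned categories
    confidence: float                 # classifier score between 0 and 1
    annotated_by: str = "automatic"   # e.g. "automatic" or a curator's name
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject records that carry no label or an out-of-range score.
        if not self.failure_modes:
            raise ValueError("at least one failure mode is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")

    def to_metadata(self) -> dict:
        """Return a plain dictionary that can be serialised as JSON."""
        return {
            "failure_annotation": {
                "dataset_id": self.dataset_id,
                "failure_modes": [m.value for m in self.failure_modes],
                "confidence": self.confidence,
                "annotated_by": self.annotated_by,
                "notes": self.notes,
            }
        }


if __name__ == "__main__":
    annotation = FailureAnnotation(
        dataset_id="example-dataset-001",  # made-up identifier
        failure_modes=[FailureMode.TWINNING, FailureMode.ANISOTROPY],
        confidence=0.8,
        notes="Illustrative record only.",
    )
    print(json.dumps(annotation.to_metadata(), indent=2))
```

A closed set of categories keeps annotations searchable and comparable across datasets, while the free-text `notes` field leaves room for pitfalls that do not fit a predefined category.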
The initiative will actively engage the structural biology community through outreach at conferences and meetings, online documentation, and dedicated workshops at PSI, both to present the project outcomes and to gather feedback from the community to steer the annotation effort. The scientific questions proposed by the project are intended to exemplify how the annotated data can be used and to encourage similar uses.
Scientific Impact
By making previously discarded crystallographic data FAIR, Fail2Fair will improve data sharing and reuse among researchers. Reanalysis of such data will deepen our understanding of common crystallographic problems, help identify systematic experimental errors, and support troubleshooting in future experiments, ultimately reducing experimental effort.
Furthermore, the project will provide labeled training datasets for AI methods, supporting the development of new tools for data processing, phasing, or model refinement for challenging crystallographic data. These datasets could also serve as benchmarks for testing new algorithms under real-world conditions.
Fail2Fair is also committed to community engagement through targeted training and outreach activities designed to foster collaboration and promote Open Science practices.
More broadly, this project can serve as a model for other fields, encouraging the reporting of negative results and enabling the reuse of “failed” data.
Principal investigator
Isabel Usón is a developer of computational methods in structural biology and chemistry. Crystallography provides an atomic, three-dimensional view of molecules, allowing us to understand processes in health and biotechnology. Isabel is an ICREA Research Professor at the IBMB-CSIC in Barcelona and the author of distributed software for structure solution and interpretation.