Science cluster

LS RI - Life Sciences

Summary

(Standardised) content identification is becoming an essential requirement for bioimaging data reuse and the seamless integration of bioimaging data into AI models. Transparency about the origin of data, including the distinction between conventionally generated and AI-generated data, is essential to maintain data integrity and facilitate informed analysis. In this context, the International Standard Content Code (ISCC) is a new, open-source identification system and global ISO standard, designed to support transparency, accessibility, and widespread adoption. The BIO-CODES project seeks to enhance the AI-readiness of bioimaging data by developing and implementing content-based identifiers using the ISCC. The project addresses the increasing complexity of bioimaging data, helps it adhere to FAIR principles, and prepares datasets for reliable use in AI-driven analyses. By integrating globally unique references into key platforms such as OMERO, BIO-CODES aims to advance data integrity and streamline bioimage certification processes, supporting transparency and reproducibility.

Research domains:
Life sciences
Partner(s):
Leiden University, ISCC Foundation

Challenge

Open Science project, Industry cooperation, Main RI concerned, Cross-domain/Cross-RI

As bioimaging data grows in complexity, much of it remains non-compliant with FAIR principles, which complicates data reuse and AI integration. Current methods for identifying and certifying bioimaging data lack the robustness needed for generative AI models, putting data integrity and reliability, and therefore scientific reproducibility, at risk. Addressing these gaps is essential to preserving the value of bioimaging data in Life Sciences research and mitigating biases in AI-driven analyses.

Solution

BIO-CODES will evaluate the ISO 24138 International Standard Content Code (ISCC) and apply it to bioimaging data in Life Sciences research. Generated from digital content using cryptographic and similarity hash algorithms, the ISCC supports data integrity checks and use cases such as deduplication, database synchronisation, and data provenance. The project will test the proprietary formats routinely produced by imaging core facilities and engage with vendors. If the ISCC proves useful, recommendations will be made for integrating it into existing workflows and software applications used for authentication and certification of bioimages. The main goal is to assess the ISCC's applicability and create a proof-of-concept, with integration into platforms like OMERO to support FAIR compliance.
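
To make the idea of combining cryptographic and similarity hashes more concrete, here is a minimal Python sketch. It is not the ISCC algorithm defined in ISO 24138 and not part of BIO-CODES: SHA-256 stands in for the cryptographic component, a toy SimHash over byte shingles stands in for the similarity component, and the function names (instance_hash, similarity_hash, hamming_distance) as well as the random bytes used in place of pixel data are assumptions made purely for illustration.

```python
import hashlib
import random


def instance_hash(data: bytes) -> str:
    """Cryptographic digest: any single-byte change yields a different value."""
    return hashlib.sha256(data).hexdigest()


def similarity_hash(data: bytes, shingle: int = 4, bits: int = 64) -> int:
    """Toy SimHash over overlapping byte shingles (illustrative, not ISCC)."""
    votes = [0] * bits
    for i in range(len(data) - shingle + 1):
        # Hash each shingle to a fixed-width integer.
        digest = hashlib.blake2b(data[i:i + shingle], digest_size=bits // 8).digest()
        value = int.from_bytes(digest, "big")
        # Each shingle votes +1 or -1 on every bit position.
        for bit in range(bits):
            votes[bit] += 1 if (value >> bit) & 1 else -1
    # A bit is set in the result when the majority of shingles voted for it.
    return sum(1 << bit for bit, vote in enumerate(votes) if vote > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two similarity hashes."""
    return bin(a ^ b).count("1")


# Hypothetical stand-ins for image payloads: deterministic pseudo-random bytes.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

# A near-duplicate: the same payload with one byte altered.
edited_buffer = bytearray(original)
edited_buffer[100] ^= 0xFF
edited = bytes(edited_buffer)

if __name__ == "__main__":
    print("exact match after edit:", instance_hash(original) == instance_hash(edited))
    print("distance to edited copy:",
          hamming_distance(similarity_hash(original), similarity_hash(edited)))
    print("distance to unrelated data:",
          hamming_distance(similarity_hash(original), similarity_hash(unrelated)))
```

In this sketch the cryptographic digest detects that the edited copy is no longer byte-identical to the original, while the similarity hash is expected to stay close to the original's and far from that of unrelated data. That pairing is what makes integrity checking and deduplication possible from the content alone.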

*OMERO is one of the most versatile and widely used platforms for managing bioimaging data, offering tools for storage, organisation, and analysis. Integrating globally unique identifiers into OMERO will streamline AI-driven bioimaging data preparation while supporting data integrity and FAIR compliance. Standardised authentication processes will enable researchers to share datasets confidently, enhancing transparency and reproducibility. This approach also addresses ethical concerns in AI by mitigating biases from unverified data and promoting reliable AI models in biomedical research.

Scientific Impact

BIO-CODES will enhance transparency and collaboration among researchers by introducing standardised content identifiers in bioimaging workflows. This facilitates data reuse, mitigates ethical concerns in AI applications, and improves the reliability of AI models in Life Sciences. Because the ISCC generates unique identifiers directly from digital files, it is easy to use and removes the need for manually managed identifiers such as DOIs. Its open-source nature supports decentralised digital content identification and encourages continuous improvement through collaboration among researchers, developers, and institutions.


Keywords
bioimaging, bioimaging data, International Standard Content Code, ISCC, ISO 24138 ISCC standard, AI applications, AI models, big data, data integrity, standardised content identifier
Project start date:
Project duration:
18 months

Principal investigator

Sylvia Le Dévédec
Leiden University
BIO

Dr. Sylvia Le Dévédec is an assistant professor and researcher specialising in cell biology, bioimaging, and cancer research. With a PhD in the life sciences, she has developed extensive expertise in high-content bioimaging and advanced data analysis, focusing on the integration of cutting-edge technologies in biomedical research. Her work emphasises data integrity and reproducibility, which aligns with her leadership of the BIO-CODES project, where she aims to enhance the AI-readiness of bioimaging data. Her research contributes to Open Science initiatives, fostering collaboration across academia and industry. Through BIO-CODES, she is pioneering efforts to standardise bioimaging data, promoting transparency, FAIR compliance, and the ethical application of AI in life sciences.

QUOTE
"BIO-CODES began as a bold idea to apply unique content identifiers to bioimaging data. In the future, we all might be using ISCC, the DNA of digital content, to ensure transparency, integrity, and ethical AI applications in life sciences—advancing Open Science and global collaboration."