FAIRFUN4Biodiversity project image

Science cluster

ENVRI - Environmental Sciences

Summary

FAIRFUN4Biodiversity aims to enhance functional annotation of genomic resources generated by the biodiversity genomics community, particularly for non-model organisms. By leveraging AI-driven methodologies, this project seeks to generate publicly accessible functional data, promote Open Science practices, and improve cross-domain interoperability. The project will ensure compliance with FAIR principles, and will expand the portfolio of open access tools readily available for the biodiversity genomics community, ultimately contributing to a deeper understanding of the functional landscape of non-model organisms.

Research domains:
Earth and environmental sciences
Partner(s):
Institute of Evolutionary Biology (CSIC-UPF), Andalusian Center of Developmental Biology (Ana Rojas, CABD-CSIC), Institute of Plant Molecular and Cell Biology (Aureliano Bombarely, IBMCP-CSIC)
Project team member(s):
Rosa Fernández, Ana Rojas, Aureliano Bombarely

Challenge

Open Science project, Cross-domain/Cross-RI

Understanding the evolution of coding genes and their functions is crucial in evolutionary biology, yet many protein-coding genes remain poorly characterised, particularly in non-model organisms. This lack of functional annotation, especially in what is termed the 'dark proteome' (genes within a proteome that have no functional annotation), leads to incomplete models of evolutionary change and limits the identification of conserved or lineage-specific features. Traditional homology-based methods often fail to adequately transfer functional annotations. With the advent of initiatives such as the European Reference Genome Atlas or ATLASEA, where new genomes from non-model organisms are being sequenced and released daily, we need faster and more scalable sequence-based functional prediction methods. To this end, embracing concepts from orthogonal disciplines, such as computer science and Artificial Intelligence (AI), could alleviate the problem.

Solution

FAIRFUN4Biodiversity addresses these challenges with FANTASIA (Functional ANnotation based on embedding space SImilArity), a novel pipeline that leverages AI models from natural language processing. It overcomes the current limitations of homology-based methods and recovers informative functional annotations for virtually all genes in a proteome. The tool is currently available as an open access Singularity container.
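
As a rough illustration of what annotation by embedding space similarity can look like, the sketch below transfers the annotation of the closest reference protein to a query protein using cosine similarity. It is not FANTASIA's implementation: the function names, the similarity threshold, and the three-dimensional toy vectors are all assumptions made for this example, and a real pipeline would obtain embeddings from a protein language model rather than hard-coding them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def transfer_annotation(query_embedding, reference, min_similarity=0.8):
    """Return (terms, similarity) for the most similar reference protein.

    `reference` maps a protein id to (embedding, list of annotation terms).
    If the best match falls below `min_similarity` (a hypothetical cut-off),
    no annotation is transferred and an empty list is returned.
    """
    best_terms, best_sim = [], -1.0
    for embedding, terms in reference.values():
        sim = cosine_similarity(query_embedding, embedding)
        if sim > best_sim:
            best_terms, best_sim = terms, sim
    if best_sim < min_similarity:
        return [], best_sim
    return best_terms, best_sim


# Toy reference set: hand-made vectors standing in for real protein embeddings,
# with illustrative Gene Ontology identifiers as annotations.
reference = {
    "ref_kinase": ([0.9, 0.1, 0.0], ["GO:0004672"]),
    "ref_transporter": ([0.0, 0.2, 0.9], ["GO:0022857"]),
}

query = [0.8, 0.2, 0.1]  # hypothetical embedding of an unannotated protein
terms, similarity = transfer_annotation(query, reference)
print(terms, round(similarity, 3))
```

The threshold keeps the sketch from assigning a function when no reference protein is close in embedding space; how such a cut-off should be chosen in practice is outside the scope of this example.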

To make the tool fully FAIR-compliant, the project will: (i) generate functional annotation data for the genomic resources produced by the biodiversity genomics community and make all data publicly available to the research community; (ii) provide publicly available functional annotation of genomes generated by current biodiversity genomics consortia, such as the European Reference Genome Atlas (ERGA) or ATLASEA; (iii) engage with the biodiversity genomics community for knowledge transfer and connect to RIs within the Science Clusters, such as ELIXIR or LifeWatch; and (iv) improve the tool by leveraging bilingual models that combine natural language processing algorithms with protein structure inference, such as ProstT5.

 


Scientific Impact

This project significantly advances biodiversity genomics by addressing functional annotation for non-model organisms. By expanding publicly available functional annotation of the generated genomic resources, FAIRFUN4Biodiversity enhances downstream biological analyses and fosters collaboration among European RIs, such as ELIXIR and LifeWatch. Ultimately, it aims to deepen our understanding of the functional landscape of non-model organisms, stimulate Open Science practices in functional annotation of genomes of non-model organisms, improve cross-domain interoperability, and strengthen long-term coordination within the biodiversity genomics community at the European level.


Keywords
biodiversity genomics, biodiversity information, technology, data science, European Reference Genome Atlas, functional annotation, FANTASIA
Project start date:
Project duration:
24 months

Principal investigator

Rosa Fernández
Institute of Evolutionary Biology, Spanish National Research Council (CSIC)
BIO

Rosa Fernández leverages phylogenomics and comparative genomics to understand how animals colonised land environments from marine ancestors (i.e., the origin of animal terrestrial biodiversity) and how they adapted to life in caves. Ana Rojas investigates protein function and evolution and their relationship with structure/function through AI-based methods, in particular protein language models. Aureliano Bombarely is interested both in the development of bioinformatic tools and pipelines for genomic analysis, and in using genomic tools to study how genomic information evolves in association with plant domestication and diversification.

QUOTE
"FAIRFUN4Biodiversity aims to revolutionise biodiversity genomics by using AI to decode gene function in non-model organisms, with emphasis on illuminating the 'dark proteome'. Building on collaboration across consortia, it promises major discoveries and progress in Open Science across Europe."