FAIRification of IsoSeq Evidence-driven annotation of the biodiversity

Science cluster

LS RI - Life Sciences

Summary

The European Reference Genome Atlas (ERGA) and the European Nucleotide Archive (ENA) play vital roles in preserving genomic diversity and contribute to the Earth BioGenome Project (EBP), which aims to sequence all eukaryotic species. As the need to understand species' resilience to climate change intensifies, effective genome annotation becomes crucial. The project team's research group has developed SQANTI3, a quality control tool for long-read transcriptomic data that filters out inaccurate transcript models to improve genome annotations. The team proposes combining SQANTI3 with the FAIR principles to create an open-source, standardised IsoSeq evidence-driven annotation pipeline. This initiative will refine genome annotation practices, support broader European and global genomic projects, and facilitate the characterisation of genomes for all species on Earth, addressing existing challenges in data integration and accuracy.

Research domains:
Life Sciences
Partner(s):
Spanish National Research Council-Institute for Integrative Systems Biology
Project team member(s):
Ana Conesa, Alejandro Paniagua

Challenge

Open Science project, Open Science Service

The project tackles the urgent need to understand species' resilience to climate change, which affects ecosystems and conservation efforts. A major challenge lies in improving genome annotation, which is essential for organising and preserving genomic data. While advances in long-read RNA sequencing (lrRNA-seq) support large-scale projects like the Earth BioGenome Project, integrating these data into annotation pipelines remains problematic because of noise and inaccuracies. Current guidelines lack specific instructions for using full-length RNA sequences, which highlights the need for better data integration and reuse in future annotation efforts.

Solution

The project addresses a critical gap in the current genome annotation landscape for biodiversity by developing a thoroughly benchmarked pipeline for long-read-driven genome annotation within the ERGA pilot project. This pipeline will be designed for seamless integration with existing genome annotation efforts, significantly enhancing the quality of annotations. Additionally, it will effectively utilise existing IsoSeq data and promote the generation of new data to further improve annotation accuracy.

Scientific Impact

This project is expected to significantly enhance the quality of both current and future genome annotations through the development of an improved annotation pipeline and the effective utilisation of existing sequencing data in open-access databases.


Keywords
biodiversity, Earth BioGenome Project, genome annotation, genome characterisation, IsoSeq, IsoSeq evidence-driven annotation pipeline
Project start date:
Project duration:
24 months

Principal investigator

Ana Conesa
Spanish National Research Council-Institute for Integrative Systems Biology
BIO

Ana Conesa is a Research Professor at the Institute for Integrative Systems Biology, Spain, interested in understanding genome-wide functional aspects of gene expression. She has pioneered the development of tools for genome annotation, multi-omics integration, and long-read transcriptomics, including Blast2GO and SQANTI. A strong drive in her research is helping the genomics community to bridge the gap between data and knowledge by creating bioinformatics tools that everybody can use. She is co-founder of BioBam Bioinformatics, a start-up that provides bioinformatics tools for biologists.

QUOTE
"Earth Biodiversity Genome projects demand new methods that leverage the newest sequencing technologies for genome annotation. We join the power of long-read sequencing and the open-source software SQANTI to develop a standardised genome annotation pipeline for the European Reference Genome Atlas."