Science cluster
Summary
The European Reference Genome Atlas (ERGA) and the European Nucleotide Archive (ENA) play vital roles in preserving genomic diversity and contribute to the Earth BioGenome Project (EBP), which aims to sequence all eukaryotic species. As the need to understand species' resilience to climate change intensifies, effective genome annotation becomes crucial. Our research group has developed SQANTI3, a quality control tool for long-read transcriptomic data that filters out inaccurate transcript models to improve genome annotations. We propose combining SQANTI3 with the FAIR (Findable, Accessible, Interoperable, Reusable) principles to create an open-source, standardised, IsoSeq evidence-driven annotation pipeline. This initiative will refine genome annotation practices, support broader European and global genomic projects, and facilitate the characterisation of genomes for all species on Earth by addressing existing challenges in data integration and accuracy.
Challenge
Open Science project, Open Science Service
The project tackles the urgent need to understand species' resilience to climate change, which affects ecosystems and conservation efforts. A major challenge lies in improving genome annotation, which is essential for organising and preserving genomic data. While advances in long-read RNA sequencing (lrRNA-seq) support large-scale projects like the Earth BioGenome Project, integrating these data into annotation pipelines remains problematic because of noise and inaccuracies. Current guidelines lack specific instructions for using full-length RNA sequences, highlighting the need for better data integration and reuse in future annotation efforts.
Solution
The project addresses a critical gap in the current genome annotation landscape for biodiversity by developing a thoroughly benchmarked pipeline for long-read-driven genome annotation within the ERGA pilot project. This pipeline will be designed for seamless integration with existing genome annotation efforts, significantly enhancing the quality of annotations. Additionally, it will effectively utilise existing IsoSeq data and promote the generation of new data to further improve annotation accuracy.
Scientific Impact
This project is expected to significantly enhance the quality of both current and future genome annotations through the development of an improved annotation pipeline and the effective utilisation of existing sequencing data in open-access databases.
Principal investigator
Ana Conesa is a Research Professor at the Institute for Integrative Systems Biology, Spain, interested in understanding genome-wide functional aspects of gene expression. She has pioneered the development of tools for genome annotation, multi-omics integration, and long-read transcriptomics, including Blast2GO and SQANTI. A strong driver of her research is helping the genomics community bridge the gap between data and knowledge by creating bioinformatics tools that everybody can use. She is co-founder of BioBam Bioinformatics, a start-up that provides bioinformatics tools for biologists.