gVolante provides an online interface for assessing the completeness of users' own or publicly available sequence datasets, as well as for browsing the results of completeness assessments performed on publicly available genome and transcriptome assemblies.
Preparing high-quality genome or transcriptome sequence datasets for a study system of one's interest is a crucial step in modern biology and can strongly affect downstream analyses. Commonly used metrics for assessing the quality of genome and transcriptome assemblies, such as 'N50 length', are based on sequence lengths. However, such length-based metrics are superficial: they do not take into account the composition of the sequences, namely the coverage of genes and the accuracy of the reconstructed gene sequences, both of which matter in many biological analyses. In contrast, an assessment that refers to a set of pre-selected conserved genes provides a complementary measure of completeness that does take the composition of the given sequences into account.
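To make the length-based metric concrete, the following minimal sketch computes N50 using its common definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The function name `n50` is illustrative and not part of gVolante.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Note that the function sees only lengths: two assemblies can share the same N50 while containing very different sets of genes, which is exactly the blind spot that gene-based assessment addresses.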
On gVolante, you can run a completeness assessment on a set of sequences of your interest by computing the coverage of pre-selected conserved genes, in addition to sequence length-based metrics.
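As an illustration of how per-gene search results can be summarised as a coverage score, here is a minimal sketch. It assumes, hypothetically, that each reference gene has already been classified as "complete" or "partial" (or left out if not found); the function name, the status labels, and the output keys are illustrative assumptions, not gVolante's actual output format.

```python
from collections import Counter


def completeness(statuses, reference_size):
    """Summarise per-gene search results as percentages of a reference set.

    statuses: hypothetical mapping of reference gene ID -> "complete" or
    "partial"; reference genes absent from the mapping count as missing.
    reference_size: number of genes in the reference set.
    """
    counts = Counter(statuses.values())
    unknown = set(counts) - {"complete", "partial"}
    if unknown:
        raise ValueError(f"unexpected status labels: {unknown}")
    if len(statuses) > reference_size:
        raise ValueError("more genes reported than the reference set contains")
    complete = counts["complete"]
    partial = counts["partial"]
    return {
        "complete_pct": 100 * complete / reference_size,
        "complete_or_partial_pct": 100 * (complete + partial) / reference_size,
        "missing": reference_size - complete - partial,
    }


# Hypothetical run against a 233-gene reference set such as CVG:
# 200 genes found complete and 20 found partial leave 13 missing.
statuses = {f"gene{i}": "complete" for i in range(200)}
statuses.update({f"gene{i}": "partial" for i in range(200, 220)})
print(completeness(statuses, 233)["missing"])  # prints 13
```

Reporting "complete" and "complete or partial" separately keeps fragmented gene models from being counted as fully recovered.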
CVG: Core Vertebrate Genes
gVolante allows you to choose Core Vertebrate Genes (CVG), a new reference gene set of 233 ortholog groups tailored to the completeness assessment of vertebrate genomes and transcriptomes in particular (Hara et al., 2015). Each CVG group consists of one-to-one orthologs present as a single copy (with neither paralogs generated in the vertebrate lineage nor gene loss) in every vertebrate genome selected for screening, including those of Chondrichthyes and Cyclostomata. Our pilot assessments of genome assemblies of diverse vertebrates and of embryonic transcriptome assemblies of the Madagascar ground gecko (Paroedura picta) showed that evaluations referring to CVG achieved higher accuracy and resolution than those referring to other reference gene sets. The CVG dataset is available on our laboratory's website.
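The single-copy criterion described above can be expressed as a simple filter. The sketch below assumes, hypothetically, that gene copy numbers per ortholog group and per species are already known; it illustrates only the selection rule (exactly one copy in every screened genome), not the actual orthology-inference procedure used to build CVG. The species and group names are placeholders.

```python
def single_copy_groups(copy_counts, species):
    """Return ortholog groups present as exactly one copy in every species.

    copy_counts: hypothetical mapping of group ID -> {species: copy number}.
    A group absent from a species (gene loss) or present in more than one
    copy (paralogs) is excluded.
    """
    return sorted(
        group
        for group, per_species in copy_counts.items()
        if all(per_species.get(sp, 0) == 1 for sp in species)
    )


species = ["human", "shark", "lamprey"]
copy_counts = {
    "A": {"human": 1, "shark": 1, "lamprey": 1},  # kept
    "B": {"human": 2, "shark": 1, "lamprey": 1},  # paralog in human
    "C": {"human": 1, "shark": 1},                # lost in lamprey
}
print(single_copy_groups(copy_counts, species))  # prints ['A']
```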
Browse Completeness of Published Assemblies
The database section of gVolante allows you to browse completeness scores for publicly available genome and transcriptome assemblies of dozens of vertebrate species. These data can be searched by keyword or filtered by score, and each record links to an individual page with further details, including assembly information, completeness scores, and N50 sequence length. The 'Ortholog details' page lists candidate orthologs that were retrieved or found missing in the analysis. To confirm whether these genes are truly missing, you can proceed to further analysis on the aLeaves web server.