Tutorial for completeness assessment of genome/transcriptome sequences
Prepare a single multi-fasta file containing your genome or transcriptome assembly. You can also use a compressed multi-fasta file (.gz, .tgz, .bz2, .tbz, .tar or .zip). For this tutorial, we have prepared a test file for you to try.
STEP 1. Prepare your file
Information about the test file:
- File: test_data.zip
- File format: zipped multi-fasta
- Species: human
- Sequence Type: peptide (amino acid) - selected from a comprehensive sequence set
- Results: analysis results of the test_data
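Before uploading, it can help to confirm that your own file really is a multi-fasta file and to see how many sequences it holds. The following is a minimal sketch for doing that locally, not part of gVolante itself. The function names `open_text` and `summarize_fasta` are hypothetical, and the sketch assumes a plain or gzip-compressed file only (not .zip, .bz2, or tar archives).

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (assumption: .gz means gzip)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_fasta(path):
    """Return (number of records, list of sequence lengths) for a FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is None:
                raise ValueError("sequence data found before the first '>' header")
            else:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return len(lengths), lengths
```

A file with more than one record is a multi-fasta file. The list of lengths can be reused for simple statistics before submission.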
We recommend compressing your fasta file before submitting it, because a slow upload often causes the submission to fail. After selecting a compressed fasta file, push the [UPLOAD FILE] button. Do NOT press the button twice: doing so restarts the upload and can cause problems.
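If you have no compression tool at hand, the compression step can be sketched with Python's standard library as follows. This is only one possible way to produce a .gz file. The function name `gzip_fasta` and the file name in the usage note are hypothetical.

```python
import gzip
import shutil


def gzip_fasta(src_path, dest_path=None):
    """Compress src_path with gzip and return the path of the compressed file."""
    if dest_path is None:
        dest_path = str(src_path) + ".gz"
    # Stream the data in chunks so that large assemblies are not loaded into memory.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path
```

For example, `gzip_fasta("assembly.fa")` would write `assembly.fa.gz` next to the original file, which is left unchanged.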
STEP 2. Upload the file
The 'E-mail address' field is optional, but we recommend filling it in so that you can receive the results by e-mail even when the analysis takes a long time. The cut-off length is used in computing the N50 statistics. If you want to compute them using all the sequences in the given file, enter '1'.
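To illustrate how a cut-off length can affect N50, here is a minimal sketch using the common definition of N50: the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the total. It assumes that sequences shorter than the cut-off are excluded before the calculation, which is consistent with '1' meaning "use all sequences", but the exact rule used by the server is not stated here. The function name `n50` is hypothetical.

```python
def n50(lengths, cutoff=1):
    """Return the N50 of the sequence lengths that are at least `cutoff` long.

    Assumption: sequences with length >= cutoff are kept; shorter ones are ignored.
    Returns 0 when no sequence passes the cut-off.
    """
    kept = sorted((n for n in lengths if n >= cutoff), reverse=True)
    if not kept:
        return 0
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


lengths = [10, 8, 6, 4, 2, 2, 2, 2, 2, 2]
print(n50(lengths))            # all sequences: 6
print(n50(lengths, cutoff=4))  # only sequences of length >= 4: 8
```

Raising the cut-off removes short sequences from the total, so the reported N50 can only stay the same or increase.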
STEP 3. Enter your project information
Push the [START YOUR ANALYSIS] button. After the submitted file has been validated, the job information page is shown and the server starts the analysis. If you did not enter an e-mail address, save the Job_ID or the hyperlink to the results before you leave the page.
STEP 5. Start the analysis
Approximate time required for an analysis when no other jobs are queued:
- CEGMA on genome: 2 to 3 days
- BUSCO v2 on genome: 1 to 2 days
- BUSCO v2 on transcriptome: 1 to 2 hours
- BUSCO v1 on genome: 2 hours
- BUSCO v1 on transcriptome: 30 minutes
You can access the analysis results through the e-mail message or by checking the results page. There, gVolante reports the project information, completeness scores, and N50 sequence statistics. Completeness assessment results are classified into 'Complete', 'Duplication', 'Partial', and 'Missing'. For the definitions of these categories, please refer to the original articles introducing the individual programs. The 'Ortholog detail' page lists which genes in the given reference gene set were retrieved and which are missing. To further analyze the absence of a particular gene from a phylogenetic viewpoint, you can proceed to the aLeaves web server.
STEP 6. Check the results
Core Vertebrate Genes (CVG)
Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.
Hara Y, Tatsumi K, Yoshida M, Kajikawa E, Kiyonari H, Kuraku S.
BMC Genomics. 2015. 16: 977.
Assessing the gene space in draft genomes.
Parra G, Bradnam K, Ning Z, Keane T, Korf I.
Nucleic Acids Res. 2009. 37: 289-97.
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM.
Bioinformatics. 2015. 31: 3210-2.
BUSCO applications from quality assessments to gene prediction and phylogenomics.
Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, Kriventseva EV, Zdobnov EM.
Mol Biol Evol. 2017. doi: 10.1093/molbev/msx319.