All tools are accessible as Apps in either the CyVerse Discovery Environment or in KBase's App Catalog. We plan to extend the list of tools for viruses as long as we continue to receive funding (and sometimes beyond). We’ve also included more generalized apps for metagenomics and microbial ecology available through the iMicrobe Project.

Below is a list of apps we've used in an iVirus protocol or used successfully with viral data. We'll do our best to keep this list up to date as time allows; feel free to contact us if there are any mistakes or omissions.

Quality Control
Assembly
Gene Calling / Annotation
Sequence Search
Viral Identification
Viral Analysis
Read-Based Analysis

Quality Control (QC)

Quality control (QC) is most commonly applied to raw read data to ensure that the data going into assembly (the usual next step) are of high quality. Poor read quality can result in misassembled sequences. Most frequently, read QC involves trimming reads according to their quality scores and removing barcode sequences (if applicable). Although some assemblers do not require QC’d reads, we highly recommend it!
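To make quality-based trimming concrete, here is a minimal Python sketch of a sliding-window trim on a single read. It is an illustration only, not the algorithm used by Sickle, Trimmomatic, or any other tool listed here; the function names, the window size of 4, the mean-quality threshold of 20, and the Phred+33 encoding are all assumptions made for the example.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, quality: str, window: int = 4,
                min_mean_q: float = 20.0) -> tuple[str, str]:
    """Cut the read at the first window whose mean quality drops below min_mean_q.

    Hypothetical helper for illustration: real trimmers apply more refined
    rules (for example trimming both ends or enforcing a minimum length).
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    scores = phred33_scores(quality)
    if not scores:
        return seq, quality
    window = min(window, len(scores))  # short reads are treated as one window
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_q:
            # Keep everything before the first low-quality window.
            return seq[:start], quality[:start]
    return seq, quality


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTACGTAC", "IIIIII####"))
```

In practice the apps below also handle paired reads and whole FASTQ files, which this sketch does not attempt.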

Sickle

Reference:

Joshi NA, Fass JN. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. Available at https://github.com/najoshi/sickle.

Scythe

Reference:

Buffalo V. Scythe - A Bayesian adapter trimmer (version 0.994 BETA) [Software]. Available at https://github.com/vsbuffalo/scythe

Btrim

Reference:

Kong, Y. (2011) Btrim: a fast, lightweight adapter and quality trimming program for next-generation sequencing technologies. Genomics. DOI: 10.1016/j.ygeno.2011.05.009

Trimmomatic

Reference:

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.

FastQC

Reference:

Andrews, S. (2010). FastQC:  A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Assembly

Once reads have passed quality control and are 'cleaned', the usual next step is to assemble them. Since reads are fragments of a longer DNA template, assembly attempts to piece the original DNA sequence back together from the short reads. The results are commonly called 'contigs' (contiguous sequences), each representing a larger piece of DNA from the original DNA library. Multiple assemblers are available, each using different methods and algorithms to build contigs. The choice of assembler can also depend on the complexity of the genome and the type of organism. For viruses, SPAdes or MetaSPAdes have yielded good results.
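The idea of piecing reads back together can be sketched with a toy greedy overlap merge in Python. This is deliberately not how the assemblers listed below work (they are de Bruijn graph assemblers and handle sequencing errors, repeats, and uneven coverage); the function names and the minimum overlap of 3 bases are assumptions chosen for the example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy illustration only: assumes error-free reads from a single strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Three overlapping reads collapse into a single contig here; with real data, errors and repeats are what make the choice of assembler matter.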

MetaSPAdes

Reference:

Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).

SPAdes

Reference:

Bankevich, A. et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 19, 455–477 (2012).

IDBA-UD

Reference:

Peng, Y., et al. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth, Bioinformatics, 28, 1420-1428.

SOAPdenovo2

Reference:

Luo et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 2012 1:18.

Gene Calling and Annotation

After assembly, gene prediction is the next step. Genes identified by a gene prediction tool can be fed into numerous sequence analysis tools.
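As a rough picture of what gene prediction produces, the Python sketch below scans the three forward reading frames of a contig for open reading frames (ATG to the next in-frame stop codon). It is a naive illustration, not the method used by Prodigal or Prokka, which rely on far more sophisticated models; the function name, the coordinate convention, and the minimum length of 3 codons are assumptions for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    Hypothetical helper: ignores the reverse strand and alternative starts.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    contig = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(contig):
        print(frame, contig[start:end])
```

The coordinates returned by such a step are what downstream tools use to extract gene or protein sequences for searching and clustering.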

Prodigal

Reference:

Hyatt, D. Prodigal (2.6.3) [Software]. Available at https://github.com/hyattpd/Prodiga

Prokka

Reference:

Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. PMID:24642063

Sequence Search

Once genes are called (and sometimes that's not required), the real "fun" of analyzing viral sequence data begins. The tools featured here aren't virus-specific, but they're often used with viral data.

Diamond

Reference:

Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60 (2015).

Viral Identification

Analyzing viral data remains a major challenge in the field of viral ecology. A variety of approaches have been proposed, each dependent on the source of data and the underlying biological question. A relatively recent approach to analyzing complex viral data is to organize viral sequence space, often through protein clustering techniques. Protein clusters can be used as a diversity metric, as units for ecological studies when compared against other datasets, or for functional profiling of the community.

VirSorter2

Reference:

Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).

VirSorter

Reference:

Roux S, Enault F, Hurwitz BL, Sullivan MB. (2015) VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985

VIBRANT

Reference:

Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020).

MARVEL

Reference:

Amgarten, D., Braga, L. P. P., da Silva, A. M. & Setubal, J. C. MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins. Front. Genet. 9, 1–8 (2018).

MArVD

Reference:

Vik, D. R. et al. Putative archaeal viruses from the mesopelagic ocean. PeerJ 5, e3428 (2017).

DeepVirFinder

Reference:

Ren, J. et al. Identifying viruses from metagenomic data by deep learning. (2018).

Viral Analysis

Analyzing viral data remains a major challenge in the field of viral ecology. A variety of approaches have been proposed, each dependent on the source of data and the underlying biological question. A relatively recent approach to analyzing complex viral data is to organize viral sequence space, often through protein clustering techniques. Protein clusters can be used as a diversity metric, as units for ecological studies when compared against other datasets, or for functional profiling of the community. For taxonomic classification of phages, we recommend vConTACT-related software. For viral genome annotation, DRAM-v and Cenote-Taker2 can generate files containing annotation data.
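One simple way to see how protein clusters (PCs) can act as comparison units is to compute the fraction of PCs two genomes or datasets share. The Python sketch below uses a plain Jaccard index; it is an assumption made for illustration and is not the gene-sharing network method implemented in vConTACT or vConTACT2. The PC identifiers and the function name are hypothetical.

```python
def pc_jaccard(pcs_a: set[str], pcs_b: set[str]) -> float:
    """Jaccard similarity between two sets of protein-cluster identifiers."""
    union = pcs_a | pcs_b
    if not union:
        return 0.0  # two empty profiles share nothing by convention
    return len(pcs_a & pcs_b) / len(union)


if __name__ == "__main__":
    # Hypothetical PC memberships for two viral genomes.
    genome_a = {"PC_1", "PC_2", "PC_3"}
    genome_b = {"PC_2", "PC_3", "PC_4"}
    print(pc_jaccard(genome_a, genome_b))
```

The same set-based view extends to diversity estimates, for instance counting the distinct PCs observed in a sample.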

vConTACT2-Gene2Genome

Reference:

Bin Jang, H., Bolduc, B., Zablocki, O., Kuhn, J. H., Roux, S., Adriaenssens, E. M., … Sullivan, M. B. (2019). Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nature Biotechnology.

vConTACT2

Reference:

Bin Jang, H., Bolduc, B., Zablocki, O., Kuhn, J. H., Roux, S., Adriaenssens, E. M., … Sullivan, M. B. (2019). Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nature Biotechnology.

vConTACT-PCs

Reference:

Bolduc, B. et al. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243 (2017).

vConTACT

Reference:

Bolduc, B. et al. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243 (2017).

DRAM-v

Reference:

Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48, 8883–8900 (2020).

Cenote-Taker2

Reference:

Tisza, M. J., Belford, A. K., Domínguez-Huerta, G., Bolduc, B. & Buck, C. B. Cenote-Taker 2 democratizes virus discovery and sequence annotation. Virus Evol. 7, 1–12 (2021).

Cenote-Taker

Reference:

Tisza, M. J. et al. Discovery of several thousand highly diverse circular DNA viruses. Elife 9, 1–26 (2020).

Read-Based Analysis

Read-based analyses serve a variety of purposes. Principal among them is estimating genome or population abundance.
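A common way to turn mapped read counts into comparable abundance estimates is to normalize by contig length and sequencing depth. The Python sketch below computes reads per kilobase per million reads; it is a generic illustration and is not claimed to be the calculation performed by Read2RefMapper or BowtieBatch. The function name, contig names, and numbers are hypothetical.

```python
def normalized_abundance(read_counts: dict[str, int],
                         contig_lengths: dict[str, int],
                         total_reads: int) -> dict[str, float]:
    """Reads per kilobase of contig per million reads in the library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    per_million = total_reads / 1_000_000
    abundance = {}
    for contig, count in read_counts.items():
        length_kb = contig_lengths[contig] / 1_000
        abundance[contig] = count / length_kb / per_million
    return abundance


if __name__ == "__main__":
    # Hypothetical mapping results: equal read counts, different contig lengths.
    counts = {"contig_1": 500, "contig_2": 500}
    lengths = {"contig_1": 5_000, "contig_2": 50_000}
    print(normalized_abundance(counts, lengths, total_reads=2_000_000))
```

Both contigs recruit the same number of reads, yet the shorter one comes out ten times more abundant once length is taken into account.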

Read2RefMapper

Reference:

Bolduc, B., Youens-Clark, K., Roux, S., Hurwitz, B. L. & Sullivan, M. B. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure. ISME J. 11, 7–14 (2017).

BowtieBatch

Reference:

Bolduc, B., Youens-Clark, K., Roux, S., Hurwitz, B. L. & Sullivan, M. B. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure. ISME J. 11, 7–14 (2017).
