Introduction
A common preprocessing step in metagenomics data analysis, for example in host-associated studies, is to remove host-derived DNA from the sequencing data. This is often referred to as contamination removal. A common approach to isolating host-related reads is to map the reads against the host genome. Bowtie 2 is the reference tool for this task.
Since OmicsBox 2.0, this feature is available in the Metagenomics Module.
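The read-mapping idea can be illustrated with a minimal Python sketch. It is not how OmicsBox implements the feature; it only assumes that the aligner output is available as SAM text lines, and it uses the standard SAM FLAG bit 0x4 ("segment unmapped") to decide which reads hit the host genome. The function name `split_read_names` and the example records are hypothetical.

```python
from typing import Iterable, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: this segment did not align to the reference


def split_read_names(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split read names into (host_like, host_free) from SAM text lines.

    Assumption for this sketch: a read name counts as host-like as soon as
    any of its records (e.g. either mate of a pair) is mapped.
    """
    host_like: Set[str] = set()
    seen: Set[str] = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            host_like.add(name)
    return host_like, seen - host_like


# Two made-up records: r1 aligns to the host, r2 does not (FLAG 4).
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
host, clean = split_read_names(example)
print(sorted(host), sorted(clean))  # ['r1'] ['r2']
```

Treating a pair as host-like when either mate maps is the stricter of the possible choices; a more permissive filter would require both mates to map.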
Contaminant Removal with Bowtie 2 in OmicsBox
- The Contaminant Removal tool accepts NGS reads in FASTA, single-end FASTQ, and paired-end FASTQ format and separates the dataset into two parts: contaminant reads and contaminant-free reads. The output consists of files containing the separated reads.
- OmicsBox allows contaminant screening against a few preconfigured genomes (Human, Mouse, PhiX, etc.). Custom target databases for any organism can be uploaded in FASTA format.
- The output can be used directly in further analysis steps such as taxonomic quantification with Kraken.
Note: When used as a preprocessing step before taxonomic classification, contaminant removal helps to reduce the number of unclassified and misclassified reads.
Note: The contaminant removal process can be repeated multiple times for further refinement (e.g. with different phylogenetically close target genomes).
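For readers who want to see what repeated screening looks like outside the OmicsBox interface, the following Python sketch builds one Bowtie 2 command per target genome and feeds the contaminant-free output of each pass into the next. It is an illustration under stated assumptions, not the OmicsBox implementation: it covers single-end FASTQ input only, the index prefixes are expected to have been built beforehand with `bowtie2-build`, and the function names and `passN.*` file names are hypothetical. The Bowtie 2 options used are `-x` (index prefix), `-U` (unpaired reads), `--un` (reads that failed to align), `--al` (reads that aligned), `-p` (threads), and `-S` (SAM output).

```python
import os
from typing import List


def bowtie2_screen_cmd(index_prefix: str, reads_fastq: str,
                       clean_out: str, contaminant_out: str,
                       threads: int = 4) -> List[str]:
    """Build a Bowtie 2 command that splits single-end reads by alignment."""
    return [
        "bowtie2",
        "-x", index_prefix,       # index built earlier with bowtie2-build
        "-U", reads_fastq,        # unpaired input reads
        "--un", clean_out,        # reads that did not align: contaminant-free
        "--al", contaminant_out,  # reads that aligned: contaminant
        "-p", str(threads),
        "-S", os.devnull,         # the SAM output is not needed in this sketch
    ]


def chained_screen_cmds(reads_fastq: str,
                        index_prefixes: List[str]) -> List[List[str]]:
    """Build one command per target; each pass screens the previous clean set."""
    cmds: List[List[str]] = []
    current = reads_fastq
    for i, prefix in enumerate(index_prefixes, start=1):
        clean = f"pass{i}.clean.fastq"              # hypothetical file name
        contaminant = f"pass{i}.contaminant.fastq"  # hypothetical file name
        cmds.append(bowtie2_screen_cmd(prefix, current, clean, contaminant))
        current = clean
    return cmds


cmds = chained_screen_cmds("reads.fastq", ["human_idx", "mouse_idx"])
for cmd in cmds:
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on a machine where Bowtie 2 is installed. Paired-end data would use `-1`/`-2` for input and `--un-conc`/`--al-conc` for output instead.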
References
- Langmead B. and Salzberg S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357-9.
- Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G. and Durbin R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16), 2078-9.
- Okonechnikov K., Conesa A. and Garcia-Alcalde F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics (Oxford, England), 32(2), 292-4.