Amplicon Sequencing Pipeline

Version 4.6.0

About

Amplicon sequencing (ribosomal RNA gene amplification analysis) is a metagenomic pipeline. It is based on the established Quantitative Insights into Microbial Ecology (QIIME) procedure for amplicon-based metagenomics. It assembles read pairs using Fast Length Adjustment of Short Reads (FLASH), detects chimeras with UCHIME, and picks operational taxonomic units using VSEARCH. Operational taxonomic units are then aligned using PyNAST and clustered with FastTree. Standard diversity indices, taxonomical assignments, and ordinations are then calculated and reported graphically.

Besides QIIME Pipeline GenPipes also supports the DADA2 protocol, a relatively new processing workflow for recovering single-nucleotide resolved Amplicon Sequence Variants (ASVs) from amplicon data.

See Schema tab for the pipeline workflow.

Usage

Schema

Steps

	QIIME sequencing Steps	DADA2 sequencing Steps
	Trimmomatic16S Step	Trimmomatic16S Step
	Merge Trimmomatic Stats	Merge Trimmomatic Stats
	Flash Pass 1	Flash Pass 1
	Amplicon Length Parser	Amplicon Length Parser
	Flash Pass 2	Flash Pass 2
	Merge Flash Stats	Merge Flash Stats
	Catenate	ASVA
	UCHIME Step
	Merge UCHIME Stats
	OTU Picking
	OTU Rep Picking
	OTU Assigning
	OTU Table
	OTU Alignment
	Filter Alignment
	Phylogeny
	QIIME Report
	Multiple Rarefaction
	Alpha Diversity
	Collate Alpha
	Sample Rarefaction Plot
	QIIME Report 2
	Single Rarefaction
	CSS Normalization
	Rarefaction Plot
	Summarize Taxonomy
	Plot Taxonomy
	Plot Heatmap
	Krona
	Plot to Alpha
	Beta Diversity
	Principal Coordinate Analysis
	PCoA Plot
	Plot to Beta

Step Details

Following are the various steps that are part of GenPipes Amplicon Sequencing genomic analysis pipeline:

Trimmomatic16S

MiSeq raw reads adapter & primers trimming and basic QC is performed using Trimmomatic. If an adapter FASTA file is specified in the config file (section ‘trimmomatic’, param ‘adapter_fasta’), it is used first. Else, Adapter1, Adapter2, Primer1 and Primer2 columns from the readset file are used to create an adapter FASTA file, given then to Trimmomatic. Sequences are reversed-complemented and swapped.

This step takes as input files:

MiSeq paired-End FASTQ files from the readset file.

Merge Trimmomatic Stats

The trim statistics per readset are merged in this step.

Flash Pass 1

Perform first pass of FLASH. FLASH is a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short.

Flash Pass 2

Perform second pass of FLASH to find the correct overlap between paired-end reads and extend the reads by stitching them together.

Amplicon Length Parser

In this step, we look at FLASH output to set amplicon lengths input for DADA2. As minimum eligible length, a given length needs to have at least 1% of the total number of amplicons.

Merge Flash Stats

The paired end merge statistics per readset are merged in this step.

Catenate

This step catenates all the reads in one file for further analysis. As input, it takes the merged FASTQ files from the previous FLASH step.

UCHIME

This step takes catenated FASTA file from the previous Catenate step and uses it for reference based chimera detection using VSearch tool.

Merge UCHIME Stats

The chimeric sequences filtered out statistics per readset are merged at this step.

Operational Taxonomic Unit (OTU) Picking

This step takes catenated FASTA file from the previous Catenate step as input. The OTU picking step (de novo & close_ref) assigns similar sequences to operational taxonomic units (OTUs) by clustering sequences based on a user-defined similarity threshold. Method per default uses the following tools:

OTU Representative Picking

After picking OTUs, this step pick a representative sequence for each OTU. This step takes as input files:

OTU file from previous step
Catenated and filtered FASTA file from filter_chimeras step.

OTU Assigning

Given a set of OTUs, this step attempts to assign the taxonomy of each OTU using Uclust algorithm. As input, it takes OTU representative sequence file from the previous step.

OTU Table

This step make a consensus OTU table in biom format. It tabulates the number of times an OTU is found in each sample, and adds the taxonomic predictions for each OTU.

This step takes as input files:

OTU picking file.
Taxonomy assignment for each OTU from the previous step.

OTU Alignment

OTU Alignment uses OTU representative sequence file as input and aligns the OTU representative sequence using PyNAST tool.

Filter Alignment

This step takes the alignment sequence file as input and filters it by removing positions that are gaps in every sequence.

Phylogeny

This step uses filtered alignment sequence file from previous step and builds a phylogenetic tree from a multiple sequence alignment using FastTree algorithm.

QIIME Report

In this step, the first part of the report for taxonomic affiliation is created.

Multiple Rarefaction

This is the first of the four steps (1/4) for rarefaction plot. It takes OTU non-rarefied table in biom format as input. It rarefies OTU table by random sampling (without replacement) at different depth in order to perform rarefaction analysis. You need to provide the minimum/maximum number of sequences per samples and the size of each steps between the min/max of seqs/sample.

Alpha Diversity

This is the second of the four steps (2/4) for rarefaction plot. It takes as input, multiple OUT rarefied table in biom format from the previous step and calculates alpha diversity on each sample using a variety of alpha diversity metrics (chao1, shannon, observed OTUs).

Collate Alpha

This is the third of the four steps (3/4) for rarefaction plot. It merges all the alpha diversity computed in the previous step.

Sample Rarefaction Plot

This is the fourth and last of the four steps (4/4) for rarefaction plot. Here, rarefaction curve for each sample is plotted.

QIIME Report 2

In this step, the second part of the report for taxonomic affiliation is created.

Single Rarefaction

Single rarefaction takes OTU table in biom format as input. This step is recommended. It subsamples (rarefy) all the samples to an equal number of sequences for further comparison. You have to provide the number of sequences to subsample per sample in the configuration file (single_rarefaction_depth).

CSS Normalization

CSS Normalization takes OTU table in biom format as input. This is the alternative method for normalization to rarefaction. The CSS Matrix normalization step is recommended.

Rarefaction Plot

In this step, rarefaction curve for each sample is plotted on the same plot.

Summarize Taxonomy

This is the first of three (1/3) steps for taxonomic affiliation plot. If available, it takes OTU rarefied table in biom format as input. Otherwise, it takes OTU non-rarefied table in biom format. It summarizes information of taxonomic groups within each sample at different taxonomic level.

Plot Taxonomy

This is the second of three (2/3) steps for taxonomic affiliation plot. It takes summarization information from the previous step as input and makes taxonomy summary bar plot based on taxonomy assignment.

Plot Heatmap

This is the third and last of three (3/3) steps for taxonomic affiliation plot. It takes summarized information from previous step and makes heatmap at phylum level.

Krona

This step plots the Krona chart for taxonomic affiliation.

Plot to Alpha

This step generates the first part of the final report for Amplicon sequencing pipeline. The report displays results related to taxonomy, heatmap and alpha diversity.

Beta Diversity

This is the first of three (1/3) steps for 2D PCoA analysis plot.

It takes the following data as input files:

OTU rarefied table in biom format
Tree file

This step calculates beta diversity (pairwise sample dissimilarity) on OTU table. The OTU table has to be normalized. Only works with >= 4 samples

Principal Coordinate Analysis (PCoA analysis) is the second step of three (2/3) for PCoA plot. It takes the matrix produced in the previous step and computes coordinates for PCoA.

PCoA Plot

This is the last of three (3/3) steps for 2D PCoA analysis plot.

Plot to Beta

Here the last part of the final report for Amplicon sequencing pipeline is generated. The report displays results related to beta diversity PCoA plots.

Asva

This step checks for the design file (required for PCoA Plot)

Details

Amplicon sequencing is a highly targeted gene sequencing approach used to analyze genetic variation in specific genomic regions. Amplicons are Polymerase Chain Reaction (PCR) products and the ultra-deep sequencing allows for efficient variant identification and characterization. Amplicon sequencing uses oligonucleotide probes that target and capture genomic regions of interest and then uses next-generation sequencing techniques.

Uses of Amplicon sequencing

Diagnostic microbiology utilizes amplicon-based profiling that allows to sequence selected amplicons such as regions encoding 16S rRNA that are used for species identification.
Discovery of rare somatic mutations in complex samples such as tumors mixed with germline DNA.

See More Information section below for details.

More information

For the latest implementation and usage details refer to Amplicon Sequencing implementation README.md file.