Amplicon Sequencing Pipeline

Version 6.1.0

Warning

Amplicon supports only the dada2 protocol by default. The Amplicon QIIME protocol is deprecated from GenPipes v5.x onward. To use the QIIME protocol, try an older version of GenPipes.

Usage

Schema

Steps

Trimmomatic16S

Adapter and primer trimming of MiSeq raw reads, along with basic QC, is performed using Trimmomatic. If an adapter FASTA file is specified in the config file (section ‘trimmomatic’, param ‘adapter_fasta’), it is used first. Otherwise, the Adapter1, Adapter2, Primer1 and Primer2 columns from the readset file are used to create an adapter FASTA file, which is then passed to Trimmomatic. Sequences are reverse-complemented and swapped.

This step takes as input files:

MiSeq paired-end FASTQ files from the readset file.
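
As a rough illustration of the fallback described above, the following Python sketch builds adapter FASTA text from the four readset columns. It is not the GenPipes implementation: the function names, the FASTA record names, and the exact pairing used for the "swap" (each mate receives the reverse complement of the opposite mate's sequence) are assumptions made for this example.

```python
# Complement table covering IUPAC degenerate bases, which are common in 16S primers.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (IUPAC codes allowed)."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def adapter_fasta_from_readset(row):
    """Build adapter FASTA text from one readset row.

    `row` is a dict holding the Adapter1, Adapter2, Primer1 and Primer2 columns.
    ASSUMPTION: "swapped" is read here as "the record used for one mate is the
    reverse complement of the other mate's sequence". The record names are
    hypothetical and chosen only for readability.
    """
    records = [
        ("Adapter_for_read1", reverse_complement(row["Adapter2"])),
        ("Adapter_for_read2", reverse_complement(row["Adapter1"])),
        ("Primer_for_read1", reverse_complement(row["Primer2"])),
        ("Primer_for_read2", reverse_complement(row["Primer1"])),
    ]
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The returned text could be written to a file and supplied through the ‘adapter_fasta’ parameter; short toy sequences are enough to check that degenerate bases such as R and Y are complemented correctly.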

Merge Trimmomatic Stats

The trim statistics per readset are merged in this step.

Flash Pass 1

Perform first pass of FLASH. FLASH is a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short.

Flash Pass 2

Perform second pass of FLASH to find the correct overlap between paired-end reads and extend the reads by stitching them together.
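
To make the idea of overlap-based stitching concrete, here is a toy Python sketch that merges a read pair by finding the overlap with the lowest mismatch ratio. It is a simplified illustration, not FLASH's actual algorithm: the function name, the parameter names, the default thresholds, and the choice to keep read 1 bases in the overlap are all assumptions, and base qualities are ignored.

```python
def merge_pair(read1, read2_rc, min_overlap=10, max_mismatch_ratio=0.25):
    """Stitch read1 and the reverse-complemented read2 into one sequence.

    Tries every overlap length between the end of read1 and the start of
    read2_rc, keeps the one with the lowest mismatch ratio (longest overlap
    wins ties), and returns None when no overlap passes the threshold.
    """
    best = None  # (overlap_length, mismatch_ratio)
    longest = min(len(read1), len(read2_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = read2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        ratio = mismatches / overlap
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[1]):
            best = (overlap, ratio)
    if best is None:
        return None
    # Keep read1 in full, then append the part of read2_rc beyond the overlap.
    return read1 + read2_rc[best[0]:]
```

For example, `merge_pair("AAAACCCCGGGGTTTT", "GGGGTTTTACGTACGT", min_overlap=4)` finds the 8-base overlap and returns a single 24-base sequence, while a pair with no acceptable overlap yields `None`.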

Amplicon Length Parser

In this step, the FLASH output is examined to set the amplicon length inputs for DADA2. To be eligible as the minimum length, a given length must account for at least 1% of the total number of amplicons.
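
The 1% rule can be sketched in Python as follows. This is a minimal illustration rather than the pipeline's own parser: the function names are hypothetical, and the input is assumed to be a two-column "length count" histogram of merged read lengths, one pair per line.

```python
def parse_length_histogram(text):
    """Parse whitespace-separated 'length count' lines into a dict.

    ASSUMPTION: the histogram has exactly two integer columns per line.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        length, count = line.split()
        counts[int(length)] = int(count)
    return counts


def eligible_lengths(length_counts, min_fraction=0.01):
    """Return the sorted lengths that hold at least `min_fraction` of all amplicons."""
    total = sum(length_counts.values())
    if total == 0:
        return []
    return sorted(
        length
        for length, count in length_counts.items()
        if count / total >= min_fraction
    )


def minimum_eligible_length(length_counts, min_fraction=0.01):
    """Return the shortest eligible length, or None if no length qualifies."""
    lengths = eligible_lengths(length_counts, min_fraction)
    return lengths[0] if lengths else None
```

With a histogram of 1000 amplicons in which lengths 250 and 253 have 5 reads each and lengths 251 and 252 have 400 and 590, only 251 and 252 reach the 1% threshold, so 251 would be the minimum eligible length.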

Merge Flash Stats

The paired-end merge statistics per readset are merged in this step.

Asva

This step checks for the design file required for the principal component analysis (PCA) based on amplicon sequence variant (ASV).

Run MultiQC

A quality control report for all samples is generated. See MultiQC documentation for details.

About

Amplicon sequencing (ribosomal RNA gene amplification analysis) is a highly targeted metagenomic pipeline used to analyze genetic variation in specific genomic regions. Amplicons are Polymerase Chain Reaction (PCR) products, and ultra-deep sequencing of these products allows for efficient variant identification and characterization.

Uses of Amplicon sequencing

Diagnostic microbiology uses amplicon-based profiling to sequence selected amplicons, such as regions encoding 16S rRNA, which are used for species identification.
Discovery of rare somatic mutations in complex samples, such as tumors mixed with germline DNA.

GenPipes supports the DADA2 Amplicon sequencing protocol for recovering single-nucleotide resolved Amplicon Sequence Variants (ASVs) from the Amplicon data.

See the Schema tab for the pipeline workflow. Check the README.md file for implementation details.

References