Nanopore CoVSeQ Pipeline

Version 6.1.0

Usage

Example

user@machine:~$ genpipes nanopore_covseq -c $GENPIPES_INIS/nanopore/nanopore.base.ini \
                                            $GENPIPES_INIS/common_ini/rorqual.ini \
                                            $GENPIPES_INIS/nanopore_covseq/ARTIC_v4.1.ini \
                                         -r readset.default.nanopore_covseq.txt \
                                         -g nanopore_covseq_cmd.sh

user@machine:~$ bash nanopore_covseq_cmd.sh

Tip

Depending upon the cluster where you are executing the pipeline, substitute the file name rorqual.ini in the command with the appropriate <DRAC server cluster name>.ini file located in the $GENPIPES_INIS/common_ini folder.

For example: rorqual.ini, fir.ini, or narval.ini.
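The substitution described in this tip can be scripted. The following is a minimal sketch, not part of GenPipes: `cluster_ini` is a hypothetical helper name, and it accepts only the three cluster names listed above, although other `.ini` files may exist in `$GENPIPES_INIS/common_ini`.

```shell
# Hypothetical helper (not a GenPipes command): print the path of the
# cluster-specific ini file for a given cluster name.
# Assumes GENPIPES_INIS is already set in the environment.
cluster_ini() {
    case "$1" in
        rorqual|fir|narval)
            # Only the cluster names mentioned in the tip are accepted here.
            printf '%s/common_ini/%s.ini\n' "$GENPIPES_INIS" "$1" ;;
        *)
            echo "unknown cluster: $1" >&2
            return 1 ;;
    esac
}

# Possible use inside the pipeline command (shown as a comment only):
#   genpipes nanopore_covseq -c ... "$(cluster_ini narval)" ... -g nanopore_covseq_cmd.sh
```

Rejecting unknown names up front means a typo in the cluster name fails immediately rather than when the pipeline tries to read a missing file.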

Caution

It is recommended that you use the -g GENPIPES_CMD.sh option instead of redirecting the output of the pipeline command to a file via > GENPIPES_CMD.sh.

Recommended:

user@machine:~$ genpipes [pipeline] [options] -g genpipes_cmd.sh

user@machine:~$ bash genpipes_cmd.sh

Alternative, using output redirection:

user@machine:~$ genpipes [pipeline] [options] > genpipes_cmd.sh

user@machine:~$ bash genpipes_cmd.sh

The > scriptfile method is supported but will be deprecated in a future GenPipes release.

Test Dataset

You can download the test dataset for this pipeline here.

Test Datasets

Schema

Steps

Guppy Basecall

This step uses the Oxford Nanopore basecaller, Guppy, to basecall raw FAST5 files and produce FASTQ files. The basecalling model dna_r9.4.1_450bps_hac.cfg is used by default.

Guppy Demultiplex

This step uses the Oxford Nanopore basecaller Guppy to demultiplex FASTQ files based on their barcode. The barcode arrangement barcode_arrs_nb96.cfg is used by default.

Note

In the Guppy Demultiplex call, the `--require_barcodes_both_ends` parameter is set by default.

pycoQC

In this step, pycoQC is used to produce an interactive quality report based on the summary file and alignment outputs. pycoQC relies on the sequencing_summary.txt file generated by Guppy. If needed, it can also generate a summary file from basecalled FAST5 files. pycoQC computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data.

Host Reads Removal

This step uses a mapping approach with a hybrid GRCh38 + SARS-CoV2 genome. Reads that map to the human genome are removed from the analysis, and a “de-hosted” FASTQ is produced.

Kraken Analysis

Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads. It does this by examining the k-mers within a read and querying a database with those k-mers.

Additionally, Kraken2 is used to produce a report on the raw data, which can be used to detect additional host contamination.
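To make the k-mer idea concrete, here is a minimal sketch that only enumerates the k-mers of a read. It is not Kraken's implementation: the `kmers` function name and the toy sequence are illustrative assumptions, and the database lookup that Kraken performs for each k-mer is not shown.

```shell
# Illustrative only: print every k-mer (substring of length k) of a
# sequence, one per line. Kraken would then query its database with
# each of these k-mers; that lookup is not part of this sketch.
kmers() {
    awk -v s="$1" -v k="$2" 'BEGIN {
        # awk strings are 1-indexed; stop when fewer than k bases remain
        for (i = 1; i + k - 1 <= length(s); i++)
            print substr(s, i, k)
    }'
}

# Toy example: a 6-base sequence has three 4-mers.
kmers ACGTAC 4
```

A read of length L yields L - k + 1 overlapping k-mers, so the toy call prints ACGT, CGTA, and GTAC.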

ARTIC Nanopolish

The ARTIC Nanopolish pipeline is used to produce consensus sequences and VCFs. Since Nanopolish is used, this step requires both FAST5 and FASTQ files.

Wub Metrics

In this step, the wub package is used to calculate alignment metrics.

CoVSeQ Metrics

Using all metrics calculated so far, a table is produced that summarizes the metrics for each individual sample.

SnpEff Annotate

The VCF produced by the ARTIC Nanopolish step is annotated using SnpEff.

Quast Consensus Metrics

Consensus metrics are calculated using the tool QUAST.

Rename Consensus Header

A final consensus sequence is produced, with the appropriate header and naming convention based on genome completeness.
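Genome completeness can be estimated as the share of bases in the consensus that are not `N`. The sketch below shows that calculation under this assumption. `completeness_pct` is a hypothetical name, and how the pipeline itself computes completeness or maps it to a header is not specified here.

```shell
# Illustrative only: percentage of non-N bases in a single-record FASTA
# read from standard input. This is an assumed definition of genome
# completeness, not the pipeline's own code.
completeness_pct() {
    awk '!/^>/ { seq = seq toupper($0) }    # concatenate sequence lines, skip the header
         END {
             total = length(seq)
             if (total == 0) { print "0.0"; exit }
             n = gsub(/N/, "", seq)          # gsub returns the number of N bases removed
             printf "%.1f\n", 100 * (total - n) / total
         }'
}

# Toy example: 4 of 10 bases are N.
printf '>sample1\nACGTNNNNAC\n' | completeness_pct
```

The toy record prints 60.0, since 6 of its 10 bases are called.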

Prepare Report

Using the ncov-tools package and additional R scripts, final reports are produced for all samples in the run. The reports include basic QC plots and a preliminary lineage assignment made by ncov-tools.

About

The Nanopore CoVSeQ pipeline is used to analyze long reads produced by the Oxford Nanopore Technologies (ONT) sequencers.

The SOP for Nanopore data is based on the ARTIC SARS-CoV2 protocol, Version 4 / 4.1 (V4.1), using nanopolish. The GenPipes Nanopore sequencing pipeline follows this protocol closely. Most of the changes are technical adaptations that allow the protocol to run in a High Performance Computing (HPC) environment, where Conda is not advisable.

Key steps in this pipeline include basecalling with Guppy, demultiplexing, read filtering and consensus sequencing. Basecalling with Guppy happens only if the `-t basecalling` option is selected.

If the basecalling protocol is selected through the -t command line option, the Nanopore CoVSeQ pipeline performs basecalling with Guppy (GPU) and demultiplexing. After basecalling, the pipeline de-hosts all samples and then runs the ARTIC-Nanopolish wrapper, which performs alignment to the SARS-CoV2 reference (using minimap2) and variant calling (using Nanopolish). Nanopolish performs signal-level analysis of Oxford Nanopore sequencing data. After Nanopolish processing, the pipeline generates the consensus through the artic_mask and bcftools consensus steps. Lastly, custom scripts and ncov_tools are run to report quality metrics for the pipeline.

Details of structure and contents of the Nanopore readset file are available here.

See the Schema tab for the pipeline workflow. For more details, refer to the README file.

References

nCoV-2019 novel coronavirus bioinformatics protocol
Phylogenetic Analysis of nCoV-2019 genome using publicly shared genome sequences with datasets from NCBI or GISAID.
Tiling Amplicon sequencing and downstream bioinformatics analysis

Sample	Readset	Run	Flowcell	Library	Summary	FASTQ	FAST5
sampleA	readset1	PAE00001_abcd123	FLO-PRO002	SQK-LSK109	path/to/readset1_sequencing_summary.txt	path/to/readset1/fastq_pass	path/to/readset1/fast5_pass
sampleA	readset2	PAE00002_abcd456	FLO-PRO002	SQK-LSK109	path/to/readset2_sequencing_summary.txt	path/to/readset2/fastq_pass	path/to/readset2/fast5_pass
sampleA	readset3	PAE00003_abcd789	FLO-PRO002	SQK-LSK109	path/to/readset3_sequencing_summary.txt	path/to/readset3/fastq_pass	path/to/readset3/fast5_pass
sampleA	readset4	PAE00004_abcd246	FLO-PRO002	SQK-LSK109	path/to/readset4_sequencing_summary.txt	path/to/readset4/fastq_pass	path/to/readset4/fast5_pass
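A readset file in the tab-separated layout shown above can be sanity-checked before the pipeline is launched. The following is a minimal sketch, not a GenPipes tool: `check_readset` is a hypothetical helper, and it assumes all eight columns from the example header are present in exactly that order.

```shell
# Hypothetical helper (not part of GenPipes): verify that a readset file
# has the eight-column header from the example and that every data row
# has eight tab-separated fields. Returns non-zero if a problem is found.
check_readset() {
    awk -F '\t' '
        /^$/ { next }                       # ignore blank lines
        !seen_header {
            seen_header = 1
            if ($0 != "Sample\tReadset\tRun\tFlowcell\tLibrary\tSummary\tFASTQ\tFAST5") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        NF != 8 {
            printf "line %d: expected 8 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Possible use (shown as a comment only):
#   check_readset readset.default.nanopore_covseq.txt && echo "readset looks well-formed"
```

The check covers structure only. It does not test whether the summary, FASTQ, or FAST5 paths exist.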