RNA Sequencing (Light) Pipeline

Version 6.1.0

Usage

Schema

Figure: Schema of RNA Sequencing (Light) pipeline

Steps

Picard SAM to FASTQ

Convert SAM/BAM files from the input readset file into FASTQ format if FASTQ files are not already specified in the readset file. Do nothing otherwise.
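The decision this step makes can be sketched in a few lines of Python. This is a minimal illustration, not the GenPipes implementation: the readset is modelled as a plain dictionary keyed by the readset-file column names (`FASTQ1`, `FASTQ2`, `BAM`, `RunType`), and the function name and the output file naming are hypothetical.

```python
import os


def plan_fastq_inputs(readset, fastq_dir):
    """Decide whether a readset needs SAM/BAM to FASTQ conversion.

    `readset` is a dict keyed by readset-file column names (assumed here).
    The converted file names below are hypothetical, for illustration only.
    """
    fastq1 = readset.get("FASTQ1") or None
    fastq2 = readset.get("FASTQ2") or None
    if fastq1:
        # FASTQ files already specified: nothing to convert.
        return {"convert": False, "fastq1": fastq1, "fastq2": fastq2}

    bam = readset.get("BAM") or None
    if not bam:
        raise ValueError("readset has neither FASTQ nor BAM input")

    # Derive output names from the BAM file name (hypothetical convention).
    stem = os.path.splitext(os.path.basename(bam))[0]
    paired = readset.get("RunType") == "PAIRED_END"
    return {
        "convert": True,
        "fastq1": f"{fastq_dir}/{stem}.pair1.fastq.gz" if paired
        else f"{fastq_dir}/{stem}.single.fastq.gz",
        "fastq2": f"{fastq_dir}/{stem}.pair2.fastq.gz" if paired else None,
    }
```

A readset with `FASTQ1` set is returned unchanged with `convert` set to `False`; one with only `BAM` set is flagged for conversion.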

Trimmomatic

Quality trimming of raw reads and removal of Illumina adapters are performed using Trimmomatic. If an adapter FASTA file is specified in the config file (section ‘trimmomatic’, param ‘adapter_fasta’), it is used. Otherwise, the ‘Adapter1’ and ‘Adapter2’ columns from the readset file are used to create an adapter FASTA file, which is then passed to Trimmomatic. For PAIRED_END readsets, the readset adapters are reverse-complemented and swapped to match the Trimmomatic Palindrome strategy. For SINGLE_END readsets, only Adapter1 is used, and it is left unchanged.

This step takes as input files:

FASTQ files from the readset file if available
Else, FASTQ output files from previous picard_sam_to_fastq conversion of BAM files
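The adapter handling described for this step can be sketched as follows. This is an illustrative sketch rather than the pipeline's actual code: the function names are hypothetical, and the FASTA record names (`Prefix/1`, `Prefix/2`, `Single`) are assumptions based on the naming Trimmomatic expects for palindrome-mode adapter pairs.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_fasta(run_type, adapter1, adapter2=None):
    """Build adapter FASTA text from the readset Adapter1/Adapter2 values."""
    if run_type == "PAIRED_END":
        if not adapter2:
            raise ValueError("PAIRED_END readsets need both adapters")
        # Reverse-complement both adapters and swap them.
        records = [
            ("Prefix/1", reverse_complement(adapter2)),
            ("Prefix/2", reverse_complement(adapter1)),
        ]
    elif run_type == "SINGLE_END":
        # Only Adapter1 is used, unchanged.
        records = [("Single", adapter1)]
    else:
        raise ValueError(f"unknown run type: {run_type}")
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The returned text would be written to a file and supplied to Trimmomatic when no ‘adapter_fasta’ is set in the config file.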

Merge Trimmomatic Stats

The trim statistics per readset are merged at this step.

Kallisto

Run Kallisto on the FASTQ files for a fast estimate of transcript abundance.
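As a rough sketch of how one `kallisto quant` call per sample could be assembled from several paired-end readsets, consider the Python below. The function name, the tuple layout of `readsets`, and the output directory layout are hypothetical; only the `-i`, `-o`, `-t`, and `-b` options of `kallisto quant` are relied on. Single-end data would additionally need `--single` with fragment length parameters, which is not covered here.

```python
from collections import defaultdict


def kallisto_quant_commands(readsets, index, out_root, threads=1, bootstraps=0):
    """Group paired-end readsets by sample and build one command per sample.

    `readsets` is an iterable of (sample, fastq1, fastq2) tuples (assumed layout).
    Returns a dict mapping sample name to an argv list.
    """
    fastqs_by_sample = defaultdict(list)
    for sample, fastq1, fastq2 in readsets:
        # kallisto quant accepts several FASTQ pairs in a single call.
        fastqs_by_sample[sample].extend([fastq1, fastq2])

    commands = {}
    for sample, fastqs in fastqs_by_sample.items():
        cmd = ["kallisto", "quant", "-i", index,
               "-o", f"{out_root}/{sample}", "-t", str(threads)]
        if bootstraps:
            cmd += ["-b", str(bootstraps)]
        commands[sample] = cmd + fastqs
    return commands
```

Each argv list could then be passed to `subprocess.run` or written into a job script.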

Kallisto Count Matrix

Use the output from Kallisto to create a transcript count matrix.
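A transcript count matrix can be assembled from the per-sample `abundance.tsv` files that `kallisto quant` writes (tab-separated, with `target_id` and `est_counts` columns among others). The sketch below is one possible approach using only the standard library; the function names and the choice to fill missing transcripts with 0.0 are assumptions, not the pipeline's actual behaviour.

```python
import csv


def read_est_counts(lines):
    """Parse abundance.tsv content (an open file or any iterable of lines)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["target_id"]: float(row["est_counts"]) for row in reader}


def build_count_matrix(abundance_by_sample):
    """Combine per-sample estimated counts into a header and matrix rows.

    `abundance_by_sample` maps a sample name to the lines of its abundance.tsv.
    """
    samples = sorted(abundance_by_sample)
    counts = {s: read_est_counts(abundance_by_sample[s]) for s in samples}
    # Union of all transcript IDs seen in any sample.
    transcripts = sorted(set().union(*counts.values()))
    header = ["transcript"] + samples
    rows = [[t] + [counts[s].get(t, 0.0) for s in samples] for t in transcripts]
    return header, rows
```

In practice each value of `abundance_by_sample` would be an open file handle, for example `open("S1/abundance.tsv", newline="")`, where the path is hypothetical.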

GQ Seq Utils Exploratory

Exploratory analysis using the gqSeqUtils R package adapted for RnaSeqLight.

Sleuth Differential Expression

Performs differential expression analysis using Sleuth. Analyses are performed at both the transcript and gene levels, using two different tests: the likelihood ratio test (LRT) and the Wald test (WT).

MultiQC

Aggregate results from bioinformatics analyses across many samples into a single report. MultiQC searches a given directory for analysis logs and compiles an HTML report. It is a general-purpose tool, well suited to summarizing the output from numerous bioinformatics tools. For details, refer to MultiQC Info.

About

This is a lightweight RNA sequencing expression analysis pipeline based on the Kallisto technique. It is used for quick Quality Control (QC) in gene sequencing studies.

The central computational problem in RNA-seq remains the efficient and accurate assignment of short sequencing reads to the transcripts they originated from, and the use of this information to infer gene expression. Conventionally, read assignment is carried out by aligning sequencing reads to a reference genome, so that relative gene expression can be inferred from the alignments at annotated gene loci. These alignment-based methods are conceptually simple, but the read-alignment step can be time-consuming and computationally intensive.

Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. Alignment-free pipelines are orders of magnitude faster than alignment-based pipelines. They work by breaking sequencing reads into k-mers and then performing fast matches against pre-indexed transcript databases. To achieve fast transcript quantification without compromising accuracy, more sophisticated algorithms have been implemented on top of k-mer counting, such as pseudo-alignment in Kallisto and quasi-mapping with GC and sequence-bias corrections in Salmon.

RNA Sequencing Light is a lightweight pipeline that performs quick QC and removes a major computational bottleneck in RNA-seq analysis. Kallisto is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudo-aligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding the alignment of individual bases. In the latest release of GenPipes, calls to kallisto quant are aggregated by sample instead of by readset for better performance.
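The idea of finding the transcripts compatible with a read, without aligning individual bases, can be illustrated with a toy Python sketch. This is not Kallisto's algorithm or data structure: it is a simplified k-mer lookup with hypothetical function names, it ignores the reverse strand and sequencing errors, and it is meant only to show how intersecting per-k-mer transcript sets yields a compatibility list.

```python
def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of every k-mer in the read."""
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer, set())
        compatible = hits if compatible is None else compatible & hits
        if not compatible:
            # A k-mer with no shared transcript rules out every candidate.
            return set()
    # Reads shorter than k produce no k-mers and no assignment.
    return compatible or set()
```

With transcripts `{"t1": "ACGTACGTTA", "t2": "ACGTACGGGA"}` and `k=4`, a read drawn from the shared prefix is compatible with both transcripts, while a read covering the diverging suffix of `t1` is compatible with `t1` only.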

See Schema tab for pipeline workflow. Check the README.md file for implementation details.

References

Kallisto, a new ultra-fast RNA Sequencing technique
Limitations of alignment-free tools in RNA sequencing quantification