GenPipes Test Datasets

You can run GenPipes pipelines on either of the following types of data:

  • Real data generated by your genomic analysis instruments, then measured, sampled, and read into one of the specified bioinformatics data formats.

  • Test datasets that are available in the absence of real genomic analysis data.

In the context of GenPipes, a dataset is the data to be analyzed by one of the GenPipes pipelines; it can be either real data or sample data. A test dataset is a sample dataset that users can run through a pipeline to get hands-on experience with it. It is typically small (for example, a single chromosome and only a few samples), so that test runs of the pipeline complete quickly, for instance for demonstration purposes.

A test dataset is different from a readset file, which is an input to the pipeline. For other kinds of inputs required by GenPipes pipelines, see here.

In contrast to the test dataset, a readset file describes the dataset (test or real) so that the pipeline can understand the type of data and process it. A readset file is provided as input to almost all GenPipes pipelines. It contains information about the data to analyze: the paths of the raw files, the type of sequencing, the names of the samples, and so on.
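To make the distinction concrete, here is a minimal Python sketch of reading a readset-like file. It assumes the file is tab-separated with one row per readset. The column names used here (Sample, Readset, RunType, FASTQ1, FASTQ2), the sample names, and the file paths are illustrative assumptions, not taken from this page; check the readset file documentation for the exact columns your pipeline expects.

```python
import csv
import io

# Hypothetical readset content. The column names, sample names, and paths
# are assumptions for illustration only, not the official GenPipes format.
READSET_TSV = (
    "Sample\tReadset\tRunType\tFASTQ1\tFASTQ2\n"
    "sampleA\tsampleA_rs1\tPAIRED_END\traw/sampleA_R1.fastq.gz\traw/sampleA_R2.fastq.gz\n"
    "sampleB\tsampleB_rs1\tSINGLE_END\traw/sampleB_R1.fastq.gz\t\n"
)


def parse_readset(text):
    """Return one dict per readset row, keyed by the column headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def summarize(rows):
    """Map each sample name to the raw files declared for it."""
    summary = {}
    for row in rows:
        # Skip empty cells, e.g. FASTQ2 for a single-end readset.
        files = [row[col] for col in ("FASTQ1", "FASTQ2") if row.get(col)]
        summary.setdefault(row["Sample"], []).extend(files)
    return summary


rows = parse_readset(READSET_TSV)
summary = summarize(rows)
for sample, files in summary.items():
    print(sample, files)
```

The point of the sketch is that the readset file only carries metadata (sample names, sequencing type, file paths); the raw files it points to are the actual dataset, whether test or real.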

Note

Please remember to use the correct dataset for each GenPipes pipeline. The table below lists the test dataset download link for each pipeline. Do not use a test dataset specified for a different pipeline.

GenPipes Pipeline    Test Dataset
HiC Pipeline         Download HiC Pipeline Dataset
Amplicon Seq         Download Amplicon Seq Dataset
ChIP Seq             Download ChIP Seq Dataset
CoV Seq              [Not available due to privacy]
DNA Seq              Download DNA Seq Dataset
epiQC                Download epiQC Dataset
Nanopore             Download Nanopore Dataset
Nanopore Covseq      Download Nanopore Covseq Dataset
RNA Seq              Download RNA Seq Dataset
Methyl Seq           Download Methyl Seq Dataset
PacBio Seq           Download PacBio Seq Dataset
TumorPair Seq        Download TumorPair Seq Dataset

Warning

The PacBio Sequencing Pipeline is no longer available in GenPipes Release 3.2.0 and later.

Test Dataset Usage Examples

For usage examples and the commands used to submit pipeline jobs with various options, refer to the individual reference guide for each pipeline listed above, or to the short summary here.

Bioinformatic resources

If you are looking for bioinformatics resources, such as the available genomes with FASTA sequences, aligner indices, and annotation files listed on the Bioinformatics resources page of the C3G website, you can download them from public repositories using the scripts provided in the GenPipes repository.

You can also download the latest test datasets from the download page of the Computational Genomics website.