Tutorial: GenPipes in the Cloud (GCP)

Usage Change Effective v5.x onward

Using v4.x?

We recommend that you:

  • Review the changes in v5.x and v6.x (See tabs)

  • Migrate to using the latest GenPipes v6.x

What has Changed?

The following changes are effective from GenPipes release v6.x onward:

  • Requires Python v3.12.0 or higher.

  • A new environment variable ‘GENPIPES_INIS’ is introduced for streamlining access to the config files in the genpipes commands. In the future, ‘MUGQIC_PIPELINES_HOME’ will be deprecated. This is applicable when using GenPipes deployed on the DRAC servers such as Rorqual.

    • Before

      $MUGQIC_PIPELINES_HOME/pipelines/<pipeline>/<pipeline>.base.ini
      
    • Now

      $GENPIPES_INIS/<pipeline>/<pipeline>.base.ini
      

    Note

    The old variable, ‘MUGQIC_PIPELINES_HOME’, remains available and is still used in the instructions for deploying GenPipes locally, in the cloud, or in a container.

  • A new Long Read DNA Sequencing pipeline is now available in v6.0 that supports three protocols:

    • Nanopore

    • Nanopore Paired Somatic

    • Revio
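The move from ‘MUGQIC_PIPELINES_HOME’ to ‘GENPIPES_INIS’ described in the v6.x list above can be absorbed in wrapper scripts with a small fallback. The following is a minimal sketch and not part of GenPipes: the function name base_ini_path is hypothetical, and it assumes only the two directory layouts shown under Before and Now.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GenPipes): print the path of a pipeline's
# base ini, preferring GENPIPES_INIS and falling back to MUGQIC_PIPELINES_HOME.
base_ini_path() {
    local pipeline="$1"
    if [ -n "${GENPIPES_INIS:-}" ]; then
        # Layout shown under "Now"
        printf '%s/%s/%s.base.ini\n' "$GENPIPES_INIS" "$pipeline" "$pipeline"
    elif [ -n "${MUGQIC_PIPELINES_HOME:-}" ]; then
        # Layout shown under "Before"
        printf '%s/pipelines/%s/%s.base.ini\n' "$MUGQIC_PIPELINES_HOME" "$pipeline" "$pipeline"
    else
        echo "Neither GENPIPES_INIS nor MUGQIC_PIPELINES_HOME is set" >&2
        return 1
    fi
}

# Example: base_ini_path chipseq
```

A script written this way keeps working on deployments that define only the older variable.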

The following changes are effective from GenPipes release v5.x onward:

  • Starting with v5.0, GenPipes uses Python packaging and no longer relies on Python modules.

  • If you were using Python v2.7, you must upgrade to Python v3.11.1.

  • To run any GenPipes pipelines, use the new command syntax:

    • Old Format

      user@rorqual% <pipeline name>.py [options] -g genpipes_cmd.sh
      user@rorqual% bash genpipes_cmd.sh

    • New Format

      user@rorqual% genpipes <pipeline name> [options] -g genpipes_cmd.sh
      user@rorqual% bash genpipes_cmd.sh

  • RNA Sequencing (De Novo) pipeline has been updated in v5.0 release.

  • The EpiQC pipeline, the HiC-Seq pipeline, and the AmpliconSeq qiime protocol have been deprecated as of v5.0.

  • The DNA-Seq high coverage pipeline and the TumorPair pipeline have been merged into a single workflow, DNA-Seq.

  • The Methylseq pipeline has a new protocol option using the gemBS aligner in addition to Bismark.

  • Genome build GRCh38 (human) is now the default reference genome for all pipelines, but other versions or species can be selected via config files, as before.

    Danger

    When using the mouse genome, please note that the annotation files for GRCm38 do not work with the Homer analysis. Use mm10 instead of GRCm38.

  • Markdown-style reports have been deprecated for all pipelines as of v5.0 and replaced entirely with MultiQC reports.
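The Python requirements listed for each release can be checked before installing. The sketch below compares the interpreter version against a minimum. version_ge is a hypothetical helper that relies on the -V (version sort) option of GNU sort, and treating 3.11.1 as a minimum for v5.x, rather than an exact pin, is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum for GenPipes v5.x (use 3.12.0 for v6.x).
required="3.11.1"
# Falls back to "0" when python3 is not installed.
current="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || echo 0)"

if version_ge "$current" "$required"; then
    echo "Python $current satisfies >= $required"
else
    echo "Python $current is older than $required" >&2
fi
```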

GenPipes Wizard

Introducing the GenPipes Wizard, our latest tool that helps beginners dive right into genomic analysis using GenPipes. This intuitive wizard walks new users through picking the best deployment option, pipeline, and protocol, while automatically assembling the complete command for running GenPipes.

v6.x Support for Cloud

GenPipes support for GCP and other cloud deployments has not yet been verified or released for v6.x. The following tutorial works for GenPipes v5.x only.

GenPipes bioinformatics pipelines are developed as part of the GenAP project at the Canadian Centre for Computational Genomics (C3G).

This tutorial shows you how to run GenPipes in Google Cloud (GCP). It uses the “Try GCP for free” account to run a pipeline in the cloud.


Prerequisites

  1. Create an account on GCP. Learn more….

  2. Get acquainted with using the Google Cloud Shell. Learn more….

  3. Create a new project. Learn how to create a cloud project….

Step 1: Deploy GenPipes in GCP cloud

Set up GenPipes in your cloud server instance. Run the following in the Google Cloud Shell:

user@machine:~$ git clone https://bitbucket.org/mugqic/cloud_deplyoment.git

user@machine:~$ cd cloud_deplyoment/gcp/

user@machine:~$ gcloud deployment-manager deployments create slurm --config slurm-cluster.yaml

For more details on how to set up GenPipes in the cloud, see GenPipes Cloud Deployment Guide.

Cloud billing

Please note that from this point on, your GenPipes cloud deployment is running and Google is billing your account. Remember to shut down the cloud server cluster when the analysis is done to avoid unintended charges.

Step 2: Verify Slurm Deployment

Once the gcloud command has finished, a configuration script starts installing Slurm on the cluster running in the cloud. You can monitor the installation after you run the next command.

Use the Google Cloud Shell to log into the login node of the Slurm cluster:

user@machine:~$ gcloud compute ssh login1 --zone=northamerica-northeast1-a

You are now on your cloud deployment login node.

The installation may still be running. If so, you will see the following message when you log in:

Slurm is currently being installed/configured in the background.

A terminal broadcast will announce when installation and configuration is
complete.

Wait for the terminal broadcast. It can take up to 10 minutes.
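Rather than watching the terminal, the wait can be scripted. The sketch below retries a command until it succeeds. wait_until is a hypothetical helper, and using sinfo as the readiness probe is an assumption: a successful sinfo call shows only that the Slurm client can reach the controller, not that every installation step has finished, so the terminal broadcast remains the authoritative signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while ! "$@" >/dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1   # give up after the last allowed attempt
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example on the login node: poll sinfo every 30 seconds, up to 40 attempts
# (roughly 20 minutes).
#   wait_until 40 30 sinfo && echo "Slurm is responding"
```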

Step 3: Run GenPipes Pipeline

In this tutorial, we will run the chipseq pipeline in the cloud.

First, create a test folder as shown below:

user@machine:~$ mkdir -p chipseq_test
user@machine:~$ cd chipseq_test

Then, download the ChIP-Seq test dataset and extract it:

user@machine:~$ wget https://m-f39e09.071823.8540.data.globus.org/genpipes-test-datasets/chipseq.chr19.new.tar.gz
user@machine:~$ tar -xzf chipseq.chr19.new.tar.gz

Next, download the chipseq configuration file for use in the cloud:

user@machine:~$ wget https://bitbucket.org/mugqic/cloud_deplyoment/raw/master/quick_start.ini

Then construct the chipseq pipeline launch command:

user@machine:~$ genpipes chipseq -c $MUGQIC_PIPELINES_HOME/pipelines/chipseq/chipseq.base.ini \
                    $MUGQIC_PIPELINES_HOME/pipelines/common_ini/rorqual.ini \
                    quick_start.ini \
                -j slurm \
                -r readsets.chipseqTest.chr22.tsv \
                -d designfile_chipseq.chr22.txt \
                -s 1-18 \
                -g chipseqScript.sh
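Before generating the script, it can help to confirm that every file named in the command exists in the working directory, since a missing readset or design file is an easy mistake at this step. A minimal sketch follows. require_files is a hypothetical helper, and the file names are the ones used in the launch command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every argument that is not an existing regular
# file. Returns non-zero if at least one file is missing.
require_files() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "Missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# File names taken from the launch command above.
if require_files quick_start.ini readsets.chipseqTest.chr22.tsv designfile_chipseq.chr22.txt; then
    echo "All inputs found"
fi
```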

Finally, launch the pipeline using the command:

user@machine:~$ bash chipseqScript.sh

Step 4: Monitor Pipeline Status

Use the squeue command to monitor the GenPipes analysis run through the Slurm scheduler. For details on how to monitor scheduler jobs, refer to the job monitoring step in the tutorial GenPipes on DRAC.

For more details on viewing log files and generating reports, refer to the section Monitor Job Status in the Tutorial: GenPipes on DRAC servers.
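As a quick complement to the references above, the job states reported by squeue can be tallied. In this sketch, summarize_states is a hypothetical helper that counts the lines it reads on standard input. The squeue options in the usage comment (-u for the user, -h to drop the header, -o "%T" to print only the job state) are standard Slurm options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one job state per line on stdin and print
# "<STATE> <count>" for each distinct state, sorted by state name.
summarize_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on the login node (prints nothing when no jobs are queued):
#   squeue -u "$USER" -h -o "%T" | summarize_states
```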

Note

Shut down your GenPipes Cloud setup once you are done to ensure you are not billed for unintentional cloud usage.

After the jobs have run, you can exit the login node:

user@machine:~$ exit

You are now back on your cloud shell administrative machine. You can shut down your GenPipes cloud cluster:

user@machine:~$ gcloud deployment-manager deployments delete slurm

Once the deployment has been deleted, you are no longer billed for it.
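To confirm that nothing is left running, the remaining deployments can be listed. The sketch below checks a list of names for an exact match. deployment_listed is a hypothetical helper, and the usage comment assumes that gcloud deployment-manager deployments list with --format="value(name)" prints one deployment name per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read deployment names (one per line) on stdin and
# succeed if the given name is among them (exact, whole-line match).
deployment_listed() {
    grep -qxF -- "$1"
}

# Usage from the Cloud Shell, after the delete command has finished:
#   if gcloud deployment-manager deployments list --format="value(name)" \
#         | deployment_listed slurm; then
#       echo "Deployment 'slurm' still exists" >&2
#   else
#       echo "Deployment 'slurm' is gone"
#   fi
```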

Note

You need to enable the “deployment manager” API on your project. See this page. You also need to make sure that billing is enabled (even for a free trial). For more detailed information, check out our Bitbucket repo.
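The API requirement can also be met from the Cloud Shell instead of the console. This is a sketch under assumptions: PROJECT_ID is a placeholder for your own project, the run wrapper and the DRY_RUN variable are hypothetical conveniences for previewing a command before executing it, and deploymentmanager.googleapis.com is taken to be the service name of the Deployment Manager API.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable the Deployment Manager API for the given project.
enable_deployment_manager() {
    local project="$1"
    run gcloud services enable deploymentmanager.googleapis.com --project "$project"
}

# Preview the command for a placeholder project ID:
#   DRY_RUN=1 enable_deployment_manager PROJECT_ID
```

This step belongs before Step 1, since the deployment command depends on the API being enabled.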