Deploying GenPipes in a container

This document explains how to deploy GenPipes locally on your own infrastructure using containers. For other options to deploy and access GenPipes, refer to the GenPipes Deployment Options page.

You can deploy GenPipes locally by creating a container that hosts all the software and configuration needed to get started with running GenPipes within the container. Deploying and using GenPipes this way requires only regular user privileges; no root privileges are needed.

GenPipes genomic analysis tools are designed to run on supercomputing infrastructure or HPC data centres such as Compute Canada servers. However, with container technology you can also generate the pipeline scripts and run smaller experiments on your own server. This is useful if you contribute to the GenPipes code or want to add a feature of your own: containers make it easy to develop and debug GenPipes on your local machine when you do not have access to GenPipes deployed on Compute Canada servers.

Step 1: Install a compatible container technology on your local server

GenPipes can be deployed using either Docker or Singularity containers. Refer to the tutorials and user manuals of your chosen container technology to install it and to check that your container setup works locally.
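As a quick check before moving on, the following bash sketch reports which of the two runtimes is on your PATH and prints its version. The function name detect_container_runtime is illustrative and not part of GenPipes; it only checks that a binary exists and does not confirm that the Docker daemon is running.

```shell
# Report which supported container runtime is on PATH.
# Prints "docker", "singularity" or "none"; Docker is checked first (arbitrary order).
detect_container_runtime() {
    if command -v docker >/dev/null 2>&1; then
        echo docker
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo none
    fi
}

runtime=$(detect_container_runtime)
echo "Detected container runtime: ${runtime}"
case "$runtime" in
    docker)      docker --version ;;
    singularity) singularity --version ;;
    *)           echo "Install Docker or Singularity before continuing." >&2 ;;
esac
```

For an end-to-end Docker test, docker run --rm hello-world pulls and runs Docker's standard test image.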

Step 2: Set up a GenPipes development environment

Once your container environment and the required software are set up and working, clone GenPipes into a directory under $HOME using the following command:

git clone https://bitbucket.org/mugqic/genpipes $HOME/some/dir/genpipes

Add the following line to your .bashrc file:

export GENPIPES_DEV_DIR=$HOME/some/dir/genpipes
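If you script this step, it helps to guard against adding the line twice. The bash sketch below defines a hypothetical helper, add_genpipes_dev_dir, which appends the export only when that exact line is not already in the file (~/.bashrc by default). Unlike the line above, it writes the expanded path rather than the literal $HOME.

```shell
# Hypothetical helper: append "export GENPIPES_DEV_DIR=<dir>" to a startup file
# unless that exact line is already present, so re-running setup adds no duplicates.
# Usage: add_genpipes_dev_dir DIR [RC_FILE]   (RC_FILE defaults to ~/.bashrc)
add_genpipes_dev_dir() {
    if [ -z "${1:-}" ]; then
        echo "usage: add_genpipes_dev_dir DIR [RC_FILE]" >&2
        return 1
    fi
    local dev_dir="$1"
    local rc_file="${2:-$HOME/.bashrc}"
    local line
    # %q shell-quotes the path, so directories containing spaces stay valid
    printf -v line 'export GENPIPES_DEV_DIR=%q' "$dev_dir"
    touch "$rc_file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

For example, run add_genpipes_dev_dir "$HOME/some/dir/genpipes", then open a new shell (or source ~/.bashrc) so that GENPIPES_DEV_DIR is set.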

Next, follow the instructions below to start your GenPipes container.

Step 3: Set up GenPipes in the container

For Docker, use the following command:

docker run --privileged -v /tmp:/tmp --network host -it -w $PWD -v $HOME:$HOME --user $UID:$GROUPS -v /etc/group:/etc/group -v /etc/passwd:/etc/passwd [ -v <CACHE_ON_HOST>:/cvmfs-cache/ ] c3genomics/genpipes:<TAG>

For Singularity, use the following command:

singularity run [ -B </HOST/CACHE/>:/cvmfs-cache/ ] docker://c3genomics/genpipes:<TAG>

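Please note, <TAG> refers to one of the tagged GenPipes sources listed in the GenPipes Bitbucket repository. Click the ‘master’ branch selector and, in the dropdown, choose ‘Tags’ to select the GenPipes version you wish to use. The options in square brackets are optional: they mount a host directory at /cvmfs-cache/ inside the container so that the container's CVMFS cache is kept on the host. Type them without the brackets, or leave them out entirely.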
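The bash sketch below wraps both commands in one hypothetical helper, start_genpipes_container, so that the optional cache mount is added only when you pass a host cache directory. The flags are copied from the commands above; the function name, its argument order and the DRY_RUN variable are assumptions made for this example.

```shell
# Hypothetical helper: build the Docker or Singularity command for a GenPipes image,
# adding the optional /cvmfs-cache/ mount only when a host cache directory is given.
# Set DRY_RUN=1 to print the command instead of running it.
# Usage: start_genpipes_container docker|singularity TAG [HOST_CACHE_DIR]
start_genpipes_container() {
    local runtime="$1" tag="$2" cache_dir="${3:-}"
    local image="c3genomics/genpipes:${tag}"
    local -a cmd
    case "$runtime" in
        docker)
            cmd=(docker run --privileged -v /tmp:/tmp --network host -it
                 -w "$PWD" -v "$HOME:$HOME" --user "$UID:$GROUPS"
                 -v /etc/group:/etc/group -v /etc/passwd:/etc/passwd)
            if [ -n "$cache_dir" ]; then
                cmd+=(-v "${cache_dir}:/cvmfs-cache/")
            fi
            cmd+=("$image")
            ;;
        singularity)
            cmd=(singularity run)
            if [ -n "$cache_dir" ]; then
                cmd+=(-B "${cache_dir}:/cvmfs-cache/")
            fi
            cmd+=("docker://${image}")
            ;;
        *)
            echo "unknown runtime: $runtime" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"   # show the assembled command only
    else
        "${cmd[@]}"                 # run it with each argument kept intact
    fi
}
```

For example, DRY_RUN=1 start_genpipes_container singularity <TAG> "$HOME/cvmfs-cache" prints the Singularity command for a hypothetical cache directory without running it; drop DRY_RUN=1 to start the container.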

Step 4: Load GenPipes dependency modules in the container

Once the container is running, as shown in the previous step, load the GenPipes module using the following command:

module load dev_genpipes

With this module loaded, GenPipes uses whichever branch or commit is currently checked out in the $HOME/some/dir/genpipes directory.
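To confirm which version GenPipes will pick up, you can query the clone with standard git commands. This bash sketch (the function name is hypothetical) prints the checked-out branch, the short commit hash and the nearest tag. It assumes GENPIPES_DEV_DIR is set as in Step 2 and that git is available where you run it.

```shell
# Hypothetical helper: show which branch/commit of the GenPipes clone is checked out.
# Usage: genpipes_checkout_info [REPO_DIR]   (defaults to $GENPIPES_DEV_DIR)
genpipes_checkout_info() {
    local repo="${1:-$GENPIPES_DEV_DIR}"
    # Prints the branch name, or "HEAD" when a tag or bare commit is checked out
    echo "branch: $(git -C "$repo" rev-parse --abbrev-ref HEAD)"
    echo "commit: $(git -C "$repo" rev-parse --short HEAD)"
    # Nearest tag; --always falls back to the abbreviated hash if no tag exists
    echo "describe: $(git -C "$repo" describe --tags --always)"
}
```

To switch versions, list the tags with git -C "$GENPIPES_DEV_DIR" tag --list and check one out with git -C "$GENPIPES_DEV_DIR" checkout <tag>; GenPipes then runs from that checkout, as described above.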

You can now use GenPipes inside the container just as you would on a local server or on Compute Canada servers.

To get help on the usage of a pipeline, run it with the --help option:

$MUGQIC_PIPELINES_HOME/pipelines/<pipeline_name>/<pipeline_name>.py --help
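If you are not sure which pipelines your checkout contains, this bash sketch lists them, assuming the directory layout implied by the help command above ($MUGQIC_PIPELINES_HOME/pipelines/<pipeline_name>/<pipeline_name>.py). The function name is hypothetical.

```shell
# Hypothetical helper: list pipeline names under <home>/pipelines, keeping only
# directories that contain a script named after the directory (<name>/<name>.py).
# Usage: list_genpipes_pipelines [PIPELINES_HOME]   (defaults to $MUGQIC_PIPELINES_HOME)
list_genpipes_pipelines() {
    local home="${1:-$MUGQIC_PIPELINES_HOME}"
    local dir name
    for dir in "$home"/pipelines/*/; do
        name=$(basename "$dir")
        if [ -f "${dir}${name}.py" ]; then
            echo "$name"
        fi
    done
}
```

For example, for p in $(list_genpipes_pipelines); do "$MUGQIC_PIPELINES_HOME/pipelines/$p/$p.py" --help; done prints the help text of every pipeline found.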

Step 5: Running GenPipes Pipelines in a container

Running a pipeline requires additional inputs such as a configuration file, a readset file and a design file. For details on running individual pipelines, see Running GenPipes or the GenPipes User Guide.

Note that GenPipes pipelines submit their genomic analysis compute jobs through scheduler calls (qsub, sbatch). If you use GenPipes inside a container on your own infrastructure and do not have access to a local scheduler such as SLURM or PBS, run the GenPipes pipeline Python scripts in “batch mode”. This is the preferred way of running pipelines with a local containerized GenPipes.

For example, to run the DNA Sequencing Pipeline in batch mode, generate the job script and then execute it with bash:

dnaseq.py -c dnaseq.base.ini dnaseq.batch.ini -j batch -r your-readsets.tsv -d your-design.tsv -s 1-34 -t mugqic > run-in-container-dnaseq-script.sh

bash run-in-container-dnaseq-script.sh
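The two commands above can be wrapped so that the job script is executed only if generation succeeded. The bash sketch below is a hypothetical helper, not part of GenPipes: it saves the generator's output as the job script, stops if the generator failed or wrote nothing, and then runs the script with bash while copying its output to a log file.

```shell
# Hypothetical helper: generate a job script from a command's stdout, then run it.
# Usage: generate_and_run SCRIPT LOG -- GENERATOR_COMMAND [ARGS...]
generate_and_run() {
    local script="$1" log="$2"
    shift 2
    if [ "${1:-}" = "--" ]; then
        shift
    fi
    # Step 1: write the generator's output to the job script
    if ! "$@" > "$script"; then
        echo "generator failed: $*" >&2
        return 1
    fi
    # Step 2: refuse to run an empty script
    if [ ! -s "$script" ]; then
        echo "generated script is empty: $script" >&2
        return 1
    fi
    # Step 3: run the jobs; tee copies stdout/stderr to the log,
    # and PIPESTATUS[0] returns bash's exit status rather than tee's
    bash "$script" 2>&1 | tee "$log"
    return "${PIPESTATUS[0]}"
}
```

For example: generate_and_run run-in-container-dnaseq-script.sh dnaseq-run.log -- dnaseq.py -c dnaseq.base.ini dnaseq.batch.ini -j batch -r your-readsets.tsv -d your-design.tsv -s 1-34 -t mugqic (the log file name is illustrative).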

Please note that running GenPipes pipelines without a scheduler has a drawback. In batch mode, which is selected with the “-j batch” option, all jobs run one after another on a single node. If your server is powerful enough, this may be an acceptable option. Otherwise, to take advantage of GenPipes’ job scheduling capabilities, install a job scheduler on your infrastructure so that GenPipes can work effectively. We recommend the SLURM scheduler for GenPipes.
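If the same scripts run on machines with and without a scheduler, you could pick the -j value automatically. This bash sketch (hypothetical function name) maps the scheduler commands mentioned above to a -j value: sbatch to slurm, qsub to pbs, and batch otherwise. It assumes your GenPipes version accepts slurm and pbs for -j (check with the pipeline's --help). Note also that qsub is provided by other schedulers too, and that a scheduler run may need a matching configuration file instead of dnaseq.batch.ini.

```shell
# Hypothetical helper: choose a value for the pipeline's -j option based on
# which scheduler submission command is on PATH.
# Assumption: "slurm" and "pbs" are valid -j values in your GenPipes version.
choose_job_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    elif command -v qsub >/dev/null 2>&1; then
        echo pbs
    else
        echo batch   # no scheduler found: run jobs sequentially on this node
    fi
}
```

For example, pass -j "$(choose_job_scheduler)" in place of -j batch when generating the job script.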

Note

If you run into any issues, contact GenPipes Support or check the other communication channels for the latest community discussions about using GenPipes.

You may also want to check the latest GenPipes deployment and setup instructions in the GenPipes README.md file.