:orphan:

.. _docs_config_ini_file:

Configuration File
==================

.. spelling:word-list::

    extention
    ini

GenPipes pipelines are multi-step pipelines that run several tools, each with its own input parameters. All of these parameters are stored in configuration files with the ``.ini`` extension. These files have a structure similar to Microsoft Windows INI files, where parameters are grouped into sections.

.. note:: **Why does GenPipes need a configuration file?**

    An ini file contains the parameters needed to run a pipeline. Our genome alignment pipeline contains over 20 steps, each involving over 5 parameters. Imagine having to type more than 100 parameters to run a pipeline! For simplicity, all the parameters are stored in an “ini” file (a file with the ``.ini`` extension) that accompanies the pipeline. Try opening an ini file in a text editor and looking at its content!

Configuration File Format
-------------------------

Pipeline command parameters and cluster settings can be customized using configuration files (``.ini`` extension). These files have a structure similar to Microsoft Windows INI files, e.g.:

.. code-block:: ini

    [DEFAULT]
    module_trimmomatic=mugqic/trimmomatic/0.36

    [trimmomatic]
    min_length=50

A parameter value is first searched for in its specific section and then, if not found, in the special ``DEFAULT`` section. In the example above, looking up ``module_trimmomatic`` in section ``trimmomatic`` falls back to ``DEFAULT`` and resolves to ``mugqic/trimmomatic/0.36``.

Configuration files support interpolation: a value can refer to another parameter with the ``%(name)s`` syntax. For example:

.. code-block:: ini

    scientific_name=Homo_sapiens
    assembly=GRCh37
    assembly_dir=$MUGQIC_INSTALL_HOME/genomes/species/%(scientific_name)s.%(assembly)s
    genome_fasta=%(assembly_dir)s/genome/%(scientific_name)s.%(assembly)s.fa

Here, ``genome_fasta`` would resolve to ``$MUGQIC_INSTALL_HOME/genomes/species/Homo_sapiens.GRCh37/genome/Homo_sapiens.GRCh37.fa``.

Each pipeline has several configuration files in:

.. code-block:: bash

    $GENPIPES_INIS//.*.ini

The default configuration file (``.base.ini`` extension) is set up for running on the abacus cluster with the Homo sapiens reference genome, and it must always be passed first to the ``--config`` option.

You can also pass a list of other configuration files to ``--config``. Files are read in the order listed, and a parameter redefined in a later file overrides the value from an earlier one. This is useful for customizing settings for a specific server cluster deployment or for a specific genome.

Each pipeline has a special configuration file for clusters in the same directory, for example, |key_ccdb_server_ini_name|. Various genome settings are available in ``$MUGQIC_INSTALL_HOME/genomes/species/``.

For example, to run the DNA-Seq pipeline on the |key_ccdb_server_cmd_name| cluster with the Mus musculus reference genome:

.. parsed-literal::

    user@machine:~$ genpipes $GENPIPES_INIS/dnaseq/dnaseq --config $GENPIPES_INIS/dnaseq/dnaseq.base.ini \\
        $GENPIPES_INIS/common_ini/\ |key_ccdb_server_cmd_name|\.ini \\
        $MUGQIC_INSTALL_HOME/genomes/species/Mus_musculus.GRCm38//Mus_musculus.GRCm38.ini [other options] \\
        -g genpipes_command_list.sh
    user@machine:~$ bash genpipes_command_list.sh
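
The lookup rules described on this page (section first, then ``DEFAULT``; ``%(name)s`` interpolation; later files overriding earlier ones) can be tried out with Python's standard ``configparser`` module, which follows the same conventions. The sketch below is only an illustration and not GenPipes' own loading code: the helper ``load_config``, the placement of the genome keys under ``[DEFAULT]``, and the contents of the second (mouse) file are assumptions made for the example.

.. code-block:: python

    import configparser

    # Base settings, combining the two INI examples from this page.
    # Putting the genome keys under [DEFAULT] is an assumption of this sketch.
    BASE_INI = """\
    [DEFAULT]
    module_trimmomatic=mugqic/trimmomatic/0.36
    scientific_name=Homo_sapiens
    assembly=GRCh37
    assembly_dir=$MUGQIC_INSTALL_HOME/genomes/species/%(scientific_name)s.%(assembly)s
    genome_fasta=%(assembly_dir)s/genome/%(scientific_name)s.%(assembly)s.fa

    [trimmomatic]
    min_length=50
    """

    # Hypothetical genome-specific file that redefines two parameters.
    MOUSE_INI = """\
    [DEFAULT]
    scientific_name=Mus_musculus
    assembly=GRCm38
    """


    def load_config(*ini_texts):
        """Read INI texts in order; later texts override earlier values."""
        config = configparser.ConfigParser()
        for text in ini_texts:
            config.read_string(text)
        return config


    base = load_config(BASE_INI)
    # Found directly in the [trimmomatic] section.
    print(base.get("trimmomatic", "min_length"))
    # Not in [trimmomatic], so the value comes from [DEFAULT].
    print(base.get("trimmomatic", "module_trimmomatic"))
    # Interpolation expands %(assembly_dir)s, %(scientific_name)s and %(assembly)s.
    print(base.get("trimmomatic", "genome_fasta"))

    mouse = load_config(BASE_INI, MOUSE_INI)
    # The second file redefines scientific_name and assembly, so the
    # interpolated path now points at the mouse genome.
    print(mouse.get("trimmomatic", "genome_fasta"))

Because interpolation is applied when a value is read, not when the file is parsed, overriding ``scientific_name`` and ``assembly`` in a later file is enough to change every path derived from them. The ``$MUGQIC_INSTALL_HOME`` prefix is left untouched by ``configparser``; expanding it is left to the shell.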