10.2 Using snakemake for workflow management

Snakemake is a workflow management system that helps automate the execution of computational workflows. It is designed to handle complex dependencies between the input files, output files, and the software tools used to process the data. Snakemake is based on the Python programming language and provides a simple and intuitive syntax for defining rules and dependencies.

Here is a brief overview of how Snakemake works and its basic usage:

  1. Define the input and output files: In Snakemake, you define the input and output files for each step in your workflow. This allows Snakemake to determine when a step needs to be executed based on the availability of its inputs and the freshness of its outputs.
  2. Write rules: Next, you write rules that describe the software tools and commands needed to process the input files into the output files. A rule consists of a name, input and output files, and a command to run.
  3. Create a workflow: Once you have defined the rules, you create a workflow by specifying the order in which the rules should be executed. Snakemake automatically resolves the dependencies between the rules based on the input and output files.
  4. Run the workflow: Finally, you run the workflow using the snakemake command. Snakemake analyzes the input and output files and executes the rules in the correct order to generate the desired output files.
rule count_reads:
    input:
        "reads/sample1.fastq.gz",
        "reads/sample2.fastq.gz"
    output:
        "counts.txt"
    shell:
        "fastqc {input} -o {output} -f fastq"

rule trim_reads:
    input:
        "reads/sample1.fastq.gz",
        "reads/sample2.fastq.gz"
    output:
        "trimmed/sample1.trimmed.fastq.gz",
        "trimmed/sample2.trimmed.fastq.gz"
    shell:
        "trimmomatic SE {input} {output} -threads 4"

rule align_reads:
    input:
        "trimmed/sample1.trimmed.fastq.gz",
        "trimmed/sample2.trimmed.fastq.gz"
    output:
        "aligned.bam"
    shell:
        "bwa mem -t 4 genome.fa {input} | samtools view -Sb - > {output}"

rule call_variants:
    input:
        "aligned.bam"
    output:
        "variants.vcf"
    shell:
        "freebayes -f genome.fa {input} > {output}"

workflow:
    rule count_reads
    rule trim_reads
    rule align_reads
    rule call_variants

To run this workflow, save the code to a file named Snakefile and execute the following command in your terminal:

snakemake