Snakemake Basics

Make files are a great way to get things done efficiently, if you know what they are and they make sense to you. My entry into makefiles started when I took C programming in the late 80’s. You would write code in C, then use a Makefile to turn that code into an executable program. Since it takes several intermediate steps and creation of different kinds of files along the way, a Makefile manages all these steps using recipes laid out in blocks. The blocks are designed to create output files from input files. You specify the names of the output files, and the names of the input files, and a set of rules for how to use one to create the other.

Snakemake is this same idea, implemented through python. Here’s a simple example.

all:
    alignment.bam         # We have to populate this part with the files we want to create

align:
    input:
        input.fastq       # [FileSystem] Does this file exist?
                          # Is there some rule to create it?
    output:
        alignment.bam     # Does this part match any part of "all"?
                          # [FileSystem] Does this file already exist?
    shell:
        bowtie input.fastq | samtools > alignment.bam

Generate Random Integers in Python

Getting random integers in Python

# import the function
from random import randint

# set your parameters
howmany = 10
min = 0
max = 100

# use a list comprehension to fill an array
rand_ints = [randint(min,max) for _ in range(howmany)]

Seems kinds of bulky, but you specify how many numbers you would like, what range to draw them from, and then call the radint() function over and over again until you have the numbers you need, using a for loop with a throaway variable “_”.

But there’s a few ways to do this! If you have NumPy installed, it has a similar function for generating random integers, and the lines of code above, can be reduced to just two: import the library, call the function.

from numpy.random import randint

randint(0,100,10)

The result is as follows:

array([42, 99, 30, 94, 60, 90,  7, 31, 91, 11])

Both of these methods would usually be preceded by a call to “seed” the random number generator, so that you can set a reproducible starting point for random number generation. The function has the same name in each library, and calling seed for one does not set the seed for the other. But that’s more than you need to know for now.