UnixSpring2013_Part2

Installing and running software on the command line: an example using Bowtie
Assuming you are working on an organism whose genome has already been sequenced, the first step in most experiments that involve next-generation sequencing is to align your sequence reads to a reference genome.

One popular, fast, and easy-to-use short-read aligner is Bowtie.

Unfortunately, there is not enough time in this module to get into the nitty-gritty of aligning short reads to a genome assembly. If you are familiar with BLAST, though, Bowtie does a similar job (finding where query sequences match a reference) with a few tweaks thrown in: speed-ups that make it possible to align millions of query sequences in a reasonable amount of time, an alignment process optimized specifically for short query sequences, and use of the quality scores that accompany the sequence reads. For more information, check out the Bowtie paper.

Downloading and installing software
Before we can run Bowtie, we need to install it, and before we can install it, we need to download it. To download Bowtie, follow the link on the Bowtie homepage, which takes you to SourceForge.

It looks like there are several options depending on which operating system you are running.

Digression: source code versus pre-compiled binary
Software is generally available for download in two formats: source code and binaries. The source code is the human-readable code that the developer(s) wrote; it needs to be compiled into a binary before your machine can run it. In many cases developers will have already compiled their source code for a handful of operating systems and will make those "pre-compiled binaries" available alongside the source code. This saves you from compiling the source code yourself, a more advanced procedure that we won't be able to cover in this workshop. Usually the source code will be labeled "source" or "src", while the pre-compiled binaries will have the name of the operating system they are intended for in their label.

So it looks like the developers of Bowtie have already compiled it for both macOS and Linux, making it easy for us. However, we still have one more decision to make: there are two binaries for each of macOS and Linux, one labeled i386 and the other x86_64.

Another digression: chip architecture
When downloading software, you will often find binaries labeled either i386 or x86_64. Without going too deep into details, these labels refer to the processor architecture the binaries were compiled for: i386 for 32-bit x86 processors and x86_64 for 64-bit ones. Luckily, there is an easy way to figure out which type of processor is on the machine you are installing to. If you are installing to your own local machine (Mac or Linux), open a new terminal window; if you are installing to a remote server, log on to the server. Then type:

$ **uname -a**

//Linux poset.cgrl.berkeley.edu 2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:22:04 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux//

Look through the output for either i386 or x86_64 (32-bit machines often report i686, which also calls for the i386 build). In this case, the server we are all logged into has the x86_64 architecture, so the binary with that label is the one we should download.
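If all you need is the architecture, `uname -m` prints just the machine hardware name. Below is a minimal sketch that maps that name onto the two labels used for the Bowtie downloads; the variable names `arch` and `build` are made up for illustration, and the `i?86` pattern is there because 32-bit Intel machines usually report something like i686 rather than i386.

```shell
# uname -m prints only the machine hardware name, e.g. x86_64 or i686
arch=$(uname -m)

# Map it onto the labels used for the Bowtie downloads
case "$arch" in
  x86_64) build="x86_64" ;;   # 64-bit x86 processor
  i?86)   build="i386" ;;     # 32-bit x86 (i386, i486, i586, i686)
  *)      build="unknown" ;;  # anything else (e.g. ARM): check the download page
esac

echo "Machine reports: $arch -> download the $build binary"
```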

Ok, now that we know which binary we need, what is the easiest way to download the file onto the server? One option is wget, which downloads the file at a given URL:

$ **wget -O bowtie-0.12.9-linux-x86_64.zip http://sourceforge.net/projects/bowtie-bio/files/bowtie/0.12.9/bowtie-0.12.9-linux-x86_64.zip/download**

//--2013-02-11 06:23:59-- http://sourceforge.net/projects/bowtie-bio/files/bowtie/0.12.9/bowtie-0.12.9-linux-x86_64.zip/download//
//Resolving sourceforge.net... 216.34.181.60//
//Connecting to sourceforge.net|216.34.181.60|:80... connected.//
//HTTP request sent, awaiting response... 302 Found//
//Location: http://downloads.sourceforge.net/project/bowtie-bio/bowtie/0.12.9/bowtie-0.12.9-linux-x86_64.zip?r=&ts=1360592640&use_mirror=voxel [following]//
//--2013-02-11 06:24:00-- http://downloads.sourceforge.net/project/bowtie-bio/bowtie/0.12.9/bowtie-0.12.9-linux-x86_64.zip?r=&ts=1360592640&use_mirror=voxel//
//Resolving downloads.sourceforge.net... 216.34.181.59//
//Connecting to downloads.sourceforge.net|216.34.181.59|:80... connected.//
//HTTP request sent, awaiting response... 302 Found//
//Location: http://voxel.dl.sourceforge.net/project/bowtie-bio/bowtie/0.12.9/bowtie-0.12.9-linux-x86_64.zip [following]//
//--2013-02-11 06:24:00-- http://voxel.dl.sourceforge.net/project/bowtie-bio/bowtie/0.12.9/bowtie-0.12.9-linux-x86_64.zip//
//Resolving voxel.dl.sourceforge.net... 107.6.88.167, 107.6.92.102, 107.6.92.101//
//Connecting to voxel.dl.sourceforge.net|107.6.88.167|:80... connected.//
//HTTP request sent, awaiting response... 200 OK//
//Length: 10432225 (9.9M) [application/zip]//
//Saving to: `bowtie-0.12.9-linux-x86_64.zip'//
//100%[======================================>] 10,432,225 3.42M/s in 2.9s//
//2013-02-11 06:24:03 (3.42 MB/s) - `bowtie-0.12.9-linux-x86_64.zip' saved [10432225/10432225]//

wget reports the download progress and other statistics about the download. Type 'ls' and you should now see a file named 'bowtie-0.12.9-linux-x86_64.zip' in your directory.
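Typing the version number by hand in several places is an easy way to end up with mismatched file names. A minimal sketch that builds the file name and URL from two variables is below; the URL pattern is copied from the wget command above, the variable names are illustrative, and it only prints the command so you can check it before running it. The -O option is worth keeping: this URL ends in /download, so without it wget would typically save the file under the name download.

```shell
# Version and architecture of the Bowtie build you want
version="0.12.9"
arch="x86_64"

# Build the file name and the SourceForge URL from those two variables
file="bowtie-${version}-linux-${arch}.zip"
url="http://sourceforge.net/projects/bowtie-bio/files/bowtie/${version}/${file}/download"

# Print the command instead of running it, so you can check it first
echo "wget -O $file $url"
```

To actually download, run the printed line, or change the echo line to `wget -O "$file" "$url"`.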

Unzipping the downloaded file
The file we downloaded ends in .zip, which tells us that this file (or group of files) has been compressed using a utility called zip. To uncompress it, type:

$ **unzip bowtie-0.12.9-linux-x86_64.zip**

//Archive: bowtie-0.12.9-linux-x86_64.zip//
//creating: bowtie-0.12.9///
//creating: bowtie-0.12.9/scripts///
//inflating: bowtie-0.12.9/scripts/build_test.sh//
//inflating: bowtie-0.12.9/scripts/make_a_thaliana_tair.sh//
//...//
//...//

Let's take a look inside the directory:

$ **ls -l bowtie-0.12.9**

//total 12004//
//-rw-r--r-- 1 mganesh cgrl 703 Dec 12 2009 AUTHORS//
//-rw-r--r-- 1 mganesh cgrl 5207 Aug 13 2008 COPYING//
//-rw-r--r-- 1 mganesh cgrl 69556 Dec 15 19:11 MANUAL//
//-rw-r--r-- 1 mganesh cgrl 80863 Dec 15 19:11 MANUAL.markdown//
//-rw-r--r-- 1 mganesh cgrl 30715 Dec 15 19:11 NEWS//
//-rw-r--r-- 1 mganesh cgrl 6258 Oct 5 2009 TUTORIAL//
//-rw-r--r-- 1 mganesh cgrl 6 Dec 15 19:11 VERSION//
//-rwxr-xr-x 1 mganesh cgrl 744331 Dec 16 11:37 bowtie//
//-rwxr-xr-x 1 mganesh cgrl 327131 Dec 16 11:36 bowtie-build//
//-rwxr-xr-x 1 mganesh cgrl 2661665 Dec 16 11:36 bowtie-build-debug//
//-rwxr-xr-x 1 mganesh cgrl 6570169 Dec 16 11:37 bowtie-debug//
//-rwxr-xr-x 1 mganesh cgrl 238154 Dec 16 11:36 bowtie-inspect//
//-rwxr-xr-x 1 mganesh cgrl 1520824 Dec 16 11:36 bowtie-inspect-debug//
//drwxr-xr-x 2 mganesh cgrl 53 Dec 16 11:37 doc//
//drwxr-xr-x 2 mganesh cgrl 26 Dec 16 11:37 genomes//
//drwxr-xr-x 2 mganesh cgrl 154 Dec 16 11:37 indexes//
//drwxr-xr-x 2 mganesh cgrl 4096 Dec 16 11:37 reads//
//drwxr-xr-x 3 mganesh cgrl 4096 Dec 16 11:37 scripts//

**Digression: Cheat sheet of methods for compressing/uncompressing files and packaging/unpackaging directories**
Although the Bowtie binary happens to come in a zip archive, it's actually more common to find downloads (data such as genome sequences, or other software) that have been packaged with a utility called tar and then compressed, usually with gzip or bzip2. Consult the chart below for the commands to create and unpack such files.


||~ Goal ||~ Command name ||~ Syntax ||~ Extension ||
|| compress a file with zip ||< zip || zip output-filename.zip input-filename || .zip ||
|| uncompress a .zip file ||< unzip || unzip filename.zip || .zip ||
|| compress a file with gzip (fast; replaces the original file) ||< gzip || gzip filename || .gz ||
|| uncompress a .gz file ||< gunzip || gunzip filename.gz || .gz ||
|| compress a file with bzip2 (slower, usually smaller files than gzip) ||< bzip2 || bzip2 filename || .bz2 ||
|| uncompress a .bz2 file ||< bunzip2 || bunzip2 filename.bz2 || .bz2 ||
|| archive a directory of files and compress it with gzip ||< tar || tar -czf output-filename.tar.gz input-directory || .tar.gz ||
|| unpack a gzip-compressed archive of files ||< tar || tar -xzf filename.tar.gz || .tar.gz ||
|| archive a directory of files and compress it with bzip2 ||< tar || tar -cjf output-filename.tar.bz2 input-directory || .tar.bz2 ||
|| unpack a bzip2-compressed archive of files ||< tar || tar -xjf filename.tar.bz2 || .tar.bz2 ||
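To see the tar and gzip rows of the chart in action without touching real data, here is a minimal sketch that builds a throwaway directory in a temporary location, packs it with tar -czf, lists the archive with tar -tzf (t lists the contents without unpacking), and unpacks it with tar -xzf into a separate folder. The names demo_dir, reads.txt and single.txt are placeholders. The last part shows that gzip replaces the original file with a .gz version rather than keeping both.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small directory to package (placeholder names)
mkdir demo_dir
printf 'ACGT\n' > demo_dir/reads.txt

# Archive and compress with gzip, then list the archive's contents
tar -czf demo_dir.tar.gz demo_dir
tar -tzf demo_dir.tar.gz

# Unpack into a separate folder (-C changes into that folder first)
mkdir unpacked
tar -xzf demo_dir.tar.gz -C unpacked
cat unpacked/demo_dir/reads.txt

# gzip on a single file replaces it with a .gz file; gunzip reverses this
printf 'TTTT\n' > single.txt
gzip single.txt
ls single.txt*        # only single.txt.gz is left
gunzip single.txt.gz
ls single.txt*        # single.txt is back
```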

Okay, we downloaded and uncompressed bowtie. Will it work now?

$ **bowtie**

//bowtie: command not found//

Why doesn't it work?

Modifying PATH
The bowtie binary is located inside the directory bowtie-0.12.9/. Is this directory in our PATH? Recall the command from this morning:

$ **env | grep PATH**

//PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/dell/srvadmin/bin:/global/home/mganesh/bin//
//MODULEPATH=...// (ignore this line for now)
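A long PATH is easier to scan with one directory per line. The sketch below uses tr to replace each colon with a newline, then counts the entries that mention bowtie; the search word is just an example, so substitute whatever directory you are looking for.

```shell
# Print each directory in PATH on its own line
echo "$PATH" | tr ':' '\n'

# Count how many PATH entries mention bowtie (0 means none)
matches=$(echo "$PATH" | tr ':' '\n' | grep -c bowtie)
echo "PATH entries containing 'bowtie': $matches"
```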

So the Bowtie directory is not in your PATH. That explains it: the bowtie binary is now on the server, but the shell doesn't know where to find it. We need to add the Bowtie directory to PATH. A special file in your home directory called .bash_profile is read by bash when you log in, and it is where PATH is set for your account. To add to PATH, we can edit this file with the text editor emacs.
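To see why adding a directory to PATH fixes "command not found", here is a minimal sketch that creates a tiny program with the hypothetical name hello_tool in a temporary directory, shows that the shell cannot find it by name, then appends that directory to PATH for the current shell session only. Editing .bash_profile, as described above, is what makes such a change stick for future logins.

```shell
# Make a temporary directory holding a tiny executable script (hypothetical name)
tooldir=$(mktemp -d)
cat > "$tooldir/hello_tool" <<'EOF'
#!/bin/sh
echo "hello from hello_tool"
EOF
chmod +x "$tooldir/hello_tool"

# The shell only searches the directories listed in PATH, so this fails
command -v hello_tool || echo "hello_tool: not found in PATH"

# Append the directory to PATH for this shell session only
PATH="$PATH:$tooldir"
export PATH

# Now the program can be run by name
hello_tool
```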

$ **emacs ~/.bash_profile**

//# .bash_profile//

//# Get the aliases and functions//
//if [ -f ~/.bashrc ]; then//
//. ~/.bashrc//
//fi//

//# User specific environment and startup programs//
//PATH=$PATH:$HOME/bin//
//export PATH//

Scroll down to the line PATH=$PATH:$HOME/bin and modify it so that it looks like this (this assumes you unzipped Bowtie in your home directory):

//PATH=$PATH:$HOME/bin:$HOME/bowtie-0.12.9//
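If you would rather not open an editor, a line like this can be appended from the command line with >>, which adds to the end of a file (a single > would overwrite it). The single quotes matter: they keep $PATH and $HOME from being expanded now, so they are expanded each time the profile is read. To keep this sketch safe to run, it writes to a temporary stand-in file instead of your real ~/.bash_profile, and it assumes Bowtie was unzipped in your home directory.

```shell
# Stand-in for ~/.bash_profile so this sketch does not touch your real file
profile=$(mktemp)
printf '%s\n' 'PATH=$PATH:$HOME/bin' 'export PATH' > "$profile"

# Append the Bowtie directory; single quotes keep $PATH and $HOME literal.
# export is included so the line works wherever it lands in the file.
echo 'export PATH=$PATH:$HOME/bowtie-0.12.9' >> "$profile"

# Show the result
cat "$profile"
```

For the real file, replace "$profile" with ~/.bash_profile and run the echo line only once; running it twice adds the line twice.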

Do you remember from this morning's session how to save and exit from emacs?

After exiting emacs, reload the modified profile:

$ **source ~/.bash_profile**

Now we should be able to run bowtie.
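A quick way to confirm that the change took effect is command -v (or which), which prints the full path of the program the shell will run. The sketch below reports either outcome; the variable names bowtie_path and status are only for illustration.

```shell
# command -v prints the path the shell would run, or nothing if not found
if bowtie_path=$(command -v bowtie); then
  status="found"
  echo "bowtie will run from: $bowtie_path"
else
  status="missing"
  echo "bowtie is not on PATH yet; check the PATH line in ~/.bash_profile"
fi
```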

Running a program on the command line
When you used grep this morning, you had to type three things into the command: the name of the executable itself, what you wanted to search for, and the name of the file you wanted to search.
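Here is that three-part structure in a self-contained form; the temporary file and the two sequences in it are made up for illustration.

```shell
# Create a small temporary file to search (hypothetical contents)
demo_file=$(mktemp)
printf 'ACGTACGT\nTTTTGGGG\n' > "$demo_file"

# executable (grep), what to search for (ACGT), file to search ("$demo_file")
grep ACGT "$demo_file"   # prints ACGTACGT
```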

How do you know how to structure your command for a program you've never used before, like, say, Bowtie?

You have a couple of options. You could look for documentation on the website or inside the package that you downloaded, but it's often easiest to start by simply typing the name of the executable:

$ **bowtie**

//No index, query, or output file specified!//

//Usage://

//bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]//

//<m1> Comma-separated list of files containing upstream mates (or the//

//sequences themselves, if -c is set) paired with mates in <m2>//

//<m2> Comma-separated list of files containing downstream mates (or the//

//sequences themselves if -c is set) paired with mates in <m1>//

//<r> Comma-separated list of files containing Crossbow-style reads. Can be//

//a mixture of paired and unpaired. Specify "-" for stdin.//

//<s> Comma-separated list of files containing unpaired reads, or the//

//sequences themselves, if -c is set. Specify "-" for stdin.//

//<hit> File to write hits to (default: stdout)//

//...//

//...//

//...//

Bowtie gets angry because we tried to run it without giving it all the information it needs to run, but it also conveniently tells us how to structure our command and lists a whole slew of options we can use to control how it will go about aligning our reads.
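Usage messages like this one are often longer than a screen. Piping them into a pager such as less helps, but some programs write their usage text to standard error, which a plain pipe does not carry; adding 2>&1 sends it through the pipe as well. The sketch below uses a hypothetical function, fake_tool, that writes its usage line to standard error; for Bowtie itself the equivalent would be `bowtie 2>&1 | less`, which works whichever stream Bowtie uses.

```shell
# Hypothetical stand-in for a program that prints its usage to standard error
fake_tool() {
  echo "Usage: fake_tool [options]* <input> [<output>]" >&2
  return 1
}

# Without 2>&1 the message bypasses the pipe (it still shows on the terminal),
# so head receives nothing
fake_tool | head -n 1

# With 2>&1 standard error is merged into the pipe and head sees the message
fake_tool 2>&1 | head -n 1
```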