Welcome to Unix Solutions

Solutions

1) Yeast Genome

Unzip all the files with a wildcard match.

$ **cd fasta_files**

$ **gunzip *.fa.gz**

Create the file from all the chromosomes with another wildcard match.

$ **cat *.fa > cerevisiae_genome.fasta**

Count the number of chromosomes. Each FASTA record starts with a > header line, so counting those lines counts the sequences.

$ **grep -c '>' cerevisiae_genome.fasta**
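The three steps above can be tried end to end on toy data first. This is only a sketch: the temporary directory, the two tiny "chromosome" files, and the name toy_genome.fasta are made up for illustration and are not part of the exercise data.

```shell
# Toy stand-in for the exercise: two tiny compressed FASTA files.
# All file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '>chr01\nACGTACGT\nACGT\n' > chr01.fa
printf '>chr02\nGGCC\n' > chr02.fa
gzip chr01.fa chr02.fa            # leaves chr01.fa.gz and chr02.fa.gz

gunzip *.fa.gz                    # the wildcard expands to both archives
cat *.fa > toy_genome.fasta       # concatenate every .fa file into one
n_seqs=$(grep -c '>' toy_genome.fasta)
echo "sequences: $n_seqs"
```

Note that the output file ends in .fasta, not .fa, so the *.fa wildcard does not pick it up if you rerun the cat step.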


**Watch out for searching for just > without the quotes... the shell treats an unquoted > as output redirection, so you may overwrite your genome with a blank file.**
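If you want to see the problem without risking real data, here is a small sketch on a scratch file (the file is created by mktemp and is purely illustrative).

```shell
# Scratch file only; never try this on real data.
scratch=$(mktemp)
printf '>chr01\nACGT\n' > "$scratch"

# The unquoted > is parsed by the shell as output redirection, so the file
# is emptied before grep even starts. grep then fails because it was never
# given a pattern; its usage message is hidden here.
grep -c > "$scratch" 2>/dev/null

size_after=$(wc -c < "$scratch" | tr -d ' ')
echo "bytes left after the unquoted search: $size_after"
```

Quoting the pattern, as in grep -c '>', keeps the shell from interpreting it.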

Look up the command **wc**. How do we use it?

$ **man wc**

Use **wc** to count the length of the genome.

$ **wc cerevisiae_genome.fasta**

What does each column mean?
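With no flags, wc prints three columns: lines, words, and bytes, in that order, followed by the file name. A quick way to convince yourself is to ask for each one separately on a small sample (the sample file below is made up for illustration).

```shell
sample=$(mktemp)
printf '>chr01\nACGT ACGT\nAC\n' > "$sample"

wc "$sample"                              # all three columns at once

# tr -d ' ' strips the padding some wc implementations add.
lines=$(wc -l < "$sample" | tr -d ' ')    # first column: newline count
words=$(wc -w < "$sample" | tr -d ' ')    # second column: whitespace-separated words
bytes=$(wc -c < "$sample" | tr -d ' ')    # third column: bytes
echo "$lines $words $bytes"
```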

There are other characters besides nucleotides, so how do we get rid of them?

$ **grep -v '>' cerevisiae_genome.fasta | wc**
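One caveat: the character count from the pipeline above still includes the newline at the end of every sequence line. A possible refinement, shown here on a made-up sample file, is to delete the newlines with tr before counting.

```shell
sample=$(mktemp)
printf '>chr01\nACGT\nAC\n>chr02\nGG\n' > "$sample"

# Headers removed, newlines still counted.
with_newlines=$(grep -v '>' "$sample" | wc -c | tr -d ' ')

# Headers removed and newlines deleted, leaving only sequence characters.
bases_only=$(grep -v '>' "$sample" | tr -d '\n' | wc -c | tr -d ' ')

echo "with newlines: $with_newlines, bases only: $bases_only"
```

The difference between the two numbers is simply the number of sequence lines.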

2) SGD Features

$ **wget ftp://genome-ftp.stanford.edu/pub/yeast/chromosomal_feature/SGD_features.tab**

The -O flag of wget allows you to specify the filename of the downloaded file.

The number of ORFs in the file:

$ **grep -c ORF SGD_features.tab**

And the Verified ones:

$ **grep ORF SGD_features.tab | grep -c Verified**

And the Dubious ones:

$ **grep ORF SGD_features.tab | grep -c Dubious**

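Now the real number of listed ORFs. The searches above match ORF anywhere on a line, so restrict the match to the feature type column (column 2):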

$ **cut -f 2 SGD_features.tab | grep -c ORF**
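To see why the two counts can differ, here is a sketch on a three-row toy table. The rows and the column layout (ID, feature type, qualifier, description) are invented for illustration and are not taken from SGD_features.tab.

```shell
toy=$(mktemp)
# Tab-separated, made-up rows: ID, feature type, qualifier, description.
printf 'S1\tORF\tVerified\tkinase\n'              >  "$toy"
printf 'S2\tORF\tDubious\toverlaps another ORF\n' >> "$toy"
printf 'S3\ttRNA_gene\t\tnext to ORF S1\n'        >> "$toy"

any_column=$(grep -c ORF "$toy")                # ORF anywhere on the line
type_column=$(cut -f 2 "$toy" | grep -c ORF)    # ORF in the second field only
echo "anywhere: $any_column, feature type only: $type_column"
```

The third row is a tRNA gene whose description happens to mention an ORF, so the whole-line search overcounts by one.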

List the distinct genomic feature types:

$ **cut -f 2 SGD_features.tab | sort | uniq**
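A small extension you might find handy: uniq -c also reports how many times each value occurs. The sketch below uses a made-up two-column table rather than the real file.

```shell
toy=$(mktemp)
printf 'S1\tORF\nS2\ttRNA_gene\nS3\tORF\nS4\tORF\nS5\tCDS\n' > "$toy"

# uniq only collapses adjacent duplicates, so sort first.
# -c prefixes each value with its count; the last sort puts the largest first.
cut -f 2 "$toy" | sort | uniq -c | sort -rn

n_types=$(cut -f 2 "$toy" | sort | uniq | wc -l | tr -d ' ')
orf_count=$(cut -f 2 "$toy" | grep -c '^ORF$')
echo "distinct types: $n_types, ORF rows: $orf_count"
```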

3) Disk Space

I tried Googling "disk space free unix command", which points to the df command.

$ **man df**

$ **df**

Try with the -m flag, which reports sizes in megabytes

$ **df -m**

And then with the "human readable" flag

$ **df -h**
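df also accepts a path, which limits the report to the filesystem holding that path. The sketch below assumes a df that supports the POSIX -P and -k options and a filesystem name without spaces; the variable name avail_kb is just for illustration.

```shell
# Human-readable report for the filesystem holding the current directory.
df -h .

# -P requests the portable one-line-per-filesystem layout and -k uses
# 1024-byte blocks; in that layout the fourth column is the available space.
avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
echo "available: ${avail_kb} KB"
```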