Solutions


1) Yeast Genome

Unzip all the files with a wildcard match.

$ cd fasta_files
$ gunzip *.fa.gz

Concatenate all the chromosome files into a single genome file with another wildcard match.

$ cat *.fa > cerevisiae_genome.fasta

Count the number of chromosomes by counting the FASTA header lines, which start with >.

$ grep -c '>' cerevisiae_genome.fasta

Watch out: if you search for > without the quotes, the shell treats it as output redirection and overwrites your genome file with an empty one.
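A slightly stricter pattern anchors > to the start of the line, which is where FASTA headers begin. The sketch below wraps this in two small shell functions; the names count_records and list_headers are hypothetical, chosen only for illustration.

```shell
# Count FASTA records: '^>' matches ">" only as the first character of a line.
# The single quotes stop the shell from reading ">" as a redirection.
count_records() {
  grep -c '^>' "$1"
}

# Print the header lines themselves, to see which sequences are in the file.
list_headers() {
  grep '^>' "$1"
}
```

For a well-formed FASTA file, count_records cerevisiae_genome.fasta gives the same number as the unanchored search, and list_headers cerevisiae_genome.fasta shows the chromosome names.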

Look up the wc command. How do we use it?

$ man wc

Use wc to count the length of the genome.

$ wc cerevisiae_genome.fasta

What does each column mean? By default, wc prints the number of lines, words, and bytes, followed by the filename.
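Each column can also be printed on its own with wc's single-column flags: -l for lines, -w for words, and -c for bytes. Redirecting the file into wc with < leaves the filename out of the output. wc_columns is a hypothetical helper name used only for this sketch.

```shell
# Print wc's three default counts one per line, using its single-column flags.
# Reading from stdin (<) makes wc print only the number, not the filename;
# the $(( )) arithmetic expansion strips the padding some wc versions add.
wc_columns() {
  echo "lines: $(( $(wc -l < "$1") ))"
  echo "words: $(( $(wc -w < "$1") ))"
  echo "bytes: $(( $(wc -c < "$1") ))"
}
```

Usage: wc_columns cerevisiae_genome.fasta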

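The byte count includes more than nucleotides: the header lines and the newline at the end of every line. Removing the header lines with grep -v (which prints only the lines that do not match) is a first step; the newlines are still counted, so subtract the line count from the byte count to get the number of nucleotides.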

$ grep -v '>' cerevisiae_genome.fasta | wc
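The command above still counts the newline at the end of every sequence line. One way to remove them is to delete them with tr before counting, so that wc -c sees only sequence characters. This is a minimal sketch assuming Unix (\n) line endings; genome_length is a hypothetical name.

```shell
# Number of sequence characters in a FASTA file:
# grep -v drops the header lines, tr -d deletes every newline,
# and wc -c counts the bytes that remain.
genome_length() {
  grep -v '>' "$1" | tr -d '\n' | wc -c
}
```

Usage: genome_length cerevisiae_genome.fasta. If the files have Windows line endings, tr -d '\r\n' would remove the carriage returns as well.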

2) SGD Features


$ wget ftp://genome-ftp.stanford.edu/pub/yeast/chromosomal_feature/SGD_features.tab

The -O flag lets you choose a different filename for the download; without it, wget keeps the name from the URL.

The number of ORFs in the file:

$ grep -c ORF SGD_features.tab

And the Verified ones:

$ grep ORF SGD_features.tab | grep -c Verified

And the Dubious ones:

$ grep ORF SGD_features.tab | grep -c Dubious

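These counts include every line that mentions "ORF" anywhere, for example in a description. To count only the rows whose feature type (column 2) is ORF: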

$ cut -f 2 SGD_features.tab | grep -c ORF
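The Verified and Dubious counts above can also over-count, because grep matches "ORF" and the qualifier word anywhere on the line. A stricter sketch with awk checks column 2 for ORF and the feature qualifier column for the given word. It assumes the qualifier is in column 3 of SGD_features.tab (check the file's documentation to confirm), and count_orfs is a hypothetical name.

```shell
# Count rows whose feature type (column 2) is exactly "ORF" and whose
# qualifier column (assumed here to be column 3) contains the given word.
# -F'\t' splits on tabs, so empty fields keep their positions.
count_orfs() {
  awk -F'\t' -v q="$2" '$2 == "ORF" && index($3, q) > 0 { n++ } END { print n + 0 }' "$1"
}
```

Usage: count_orfs SGD_features.tab Verified, then count_orfs SGD_features.tab Dubious. Substring matching with index() also counts qualifiers that combine several values in one field.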

List the distinct genomic feature types (column 2):

$ cut -f 2 SGD_features.tab | sort | uniq
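Adding uniq -c also shows how many rows there are of each feature type, and a final numeric reverse sort puts the most common types first. feature_type_counts is a hypothetical name for this sketch.

```shell
# Count rows per feature type (column 2), most frequent first.
# uniq -c only merges adjacent duplicates, which is why the first sort is needed.
feature_type_counts() {
  cut -f 2 "$1" | sort | uniq -c | sort -rn
}
```

Usage: feature_type_counts SGD_features.tab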

3) Disk Space


I tried Googling "disk space free unix command", which points to the df command:

$ man df

$ df

Try the -m flag, which reports sizes in 1 MB (1,048,576-byte) blocks:

$ df -m

And then the -h ("human-readable") flag, which picks a suitable unit (K, M, G, ...) for each size:

$ df -h
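df also accepts a path, in which case it reports only the filesystem that holds that path. A short sketch for the current directory:

```shell
# Human-readable free space for the filesystem containing the current directory.
# Replace "." with any other path, such as the fasta_files directory.
df -h .
```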