Welcome to Unix Solutions
Solutions
1) Yeast Genome
Unzip all the files with a wildcard match.
$ cd fasta_files
$ gunzip *.fa.gz
Concatenate all the chromosome files into a single genome file with another wildcard match.
$ cat *.fa > cerevisiae_genome.fasta
Count the number of chromosomes. Each FASTA record starts with a > header line, so counting those lines counts the sequences.
$ grep -c '>' cerevisiae_genome.fasta
Watch out when searching for just > without the quotes: the shell treats an unquoted > as output redirection, so you may overwrite your genome with a blank file.
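The quoting pitfall can be tried safely on a throwaway file. The sketch below is only an illustration: the file names toy.fasta and copy.fasta and the two short sequences are made up, and nothing in it touches the real genome file.

```shell
# Scratch directory with a tiny, made-up two-record FASTA file.
workdir=$(mktemp -d)
printf '>chrA\nACGT\nAC\n>chrB\nGGTT\n' > "$workdir/toy.fasta"
cp "$workdir/toy.fasta" "$workdir/copy.fasta"

# Quoted: grep receives '>' as its pattern and counts the header lines.
n_headers=$(grep -c '>' "$workdir/toy.fasta")
echo "header lines: $n_headers"

# Unquoted: the shell treats > as a redirect and truncates copy.fasta
# before grep runs; grep then fails because it has no pattern left.
grep -c > "$workdir/copy.fasta" 2>/dev/null || true
size_after=$(wc -c < "$workdir/copy.fasta" | tr -d ' ')
echo "bytes left in copy.fasta: $size_after"

rm -rf "$workdir"
```

The quoted search reports 2 header lines, while the unquoted one leaves copy.fasta with 0 bytes.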
Look up the command wc. How do we use it?
$ man wc
Use wc to count the length of the genome.
$ wc cerevisiae_genome.fasta
What does each column mean? From left to right: the number of lines, words, and bytes, followed by the filename.
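To see the three wc counts one at a time, each can be requested with its own flag. This is a small sketch on a made-up file (toy.fasta is a hypothetical name, not part of the exercise data):

```shell
# Tiny, made-up FASTA file: 5 lines, 25 bytes in total.
workdir=$(mktemp -d)
printf '>chrA\nACGT\nAC\n>chrB\nGGTT\n' > "$workdir/toy.fasta"

# Reading from stdin keeps the filename out of the output;
# tr strips the padding that some wc implementations add.
n_lines=$(wc -l < "$workdir/toy.fasta" | tr -d ' ')
n_words=$(wc -w < "$workdir/toy.fasta" | tr -d ' ')
n_bytes=$(wc -c < "$workdir/toy.fasta" | tr -d ' ')
echo "lines=$n_lines words=$n_words bytes=$n_bytes"

rm -rf "$workdir"
```

Here every line is a single whitespace-free token, so the line and word counts are both 5, and the byte count of 25 includes the five newline characters.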
The file contains other characters besides nucleotides, such as the header lines, so how do we get rid of them?
$ grep -v '>' cerevisiae_genome.fasta | wc
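Dropping the header lines still leaves one newline per sequence line in the character count. One possible refinement, shown here on a made-up file (toy.fasta is hypothetical), is to delete the newlines with tr before counting:

```shell
# Tiny, made-up FASTA file holding 10 nucleotides in 3 sequence lines.
workdir=$(mktemp -d)
printf '>chrA\nACGT\nAC\n>chrB\nGGTT\n' > "$workdir/toy.fasta"

# Header lines removed, newlines still counted: 10 bases + 3 newlines.
with_newlines=$(grep -v '>' "$workdir/toy.fasta" | wc -c | tr -d ' ')

# Header lines removed and newlines deleted: nucleotides only.
bases=$(grep -v '>' "$workdir/toy.fasta" | tr -d '\n' | wc -c | tr -d ' ')

echo "with newlines: $with_newlines, bases only: $bases"

rm -rf "$workdir"
```

The first count is 13 and the second is 10. The same pipeline should apply to cerevisiae_genome.fasta, assuming its only non-nucleotide content is header lines and newlines.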
2) SGD Features
$ wget ftp://genome-ftp.stanford.edu/pub/yeast/chromosomal_feature/SGD_features.tab
The optional -O flag lets you specify the filename of the downloaded file; without it, wget saves the file under the name in the URL, here SGD_features.tab.
The number of lines that mention ORF anywhere in the file:
$ grep -c ORF SGD_features.tab
And the Verified ones:
$ grep ORF SGD_features.tab | grep -c Verified
And the Dubious ones:
$ grep ORF SGD_features.tab | grep -c Dubious
Now the real number of listed ORFs, counting only rows whose feature type (column 2) matches ORF:
$ cut -f 2 SGD_features.tab | grep -c ORF
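The difference between searching whole lines and searching only column 2 is easy to see on a small table. The rows below are invented for illustration and only mimic a tab-separated layout with the feature type in the second column; toy_features.tab is a hypothetical file name.

```shell
# Made-up four-row table: ID, feature type, qualifier, description.
workdir=$(mktemp -d)
{
  printf 'S001\tORF\tVerified\tgene one\n'
  printf 'S002\tORF\tDubious\tgene two\n'
  printf 'S003\tCDS\t\tcoding part of ORF S001\n'
  printf 'S004\ttRNA_gene\t\ttransfer RNA\n'
} > "$workdir/toy_features.tab"

# Whole-line search: also counts the CDS row, whose description mentions ORF.
any_mention=$(grep -c ORF "$workdir/toy_features.tab")

# Column 2 only: counts just the rows whose feature type contains ORF.
in_column_2=$(cut -f 2 "$workdir/toy_features.tab" | grep -c ORF)

# Stricter variant: column 2 must equal ORF exactly.
exact=$(awk -F '\t' '$2 == "ORF"' "$workdir/toy_features.tab" | wc -l | tr -d ' ')

echo "any mention: $any_mention, column 2: $in_column_2, exact: $exact"

rm -rf "$workdir"
```

On this toy table the whole-line search reports 3, while both column-based counts report 2. The awk form is an optional alternative to cut and grep for cases where a substring match on the column would be too loose.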
The distinct genomic feature types listed in column 2:
$ cut -f 2 SGD_features.tab | sort | uniq
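A small extension of the same pipeline is to let uniq count how often each feature type occurs. The sketch below uses an invented table (toy_features.tab is a hypothetical name) rather than the real download:

```shell
# Made-up four-row table with the feature type in column 2.
workdir=$(mktemp -d)
{
  printf 'S001\tORF\tVerified\tgene one\n'
  printf 'S002\tORF\tDubious\tgene two\n'
  printf 'S003\tCDS\t\tcoding part of ORF S001\n'
  printf 'S004\ttRNA_gene\t\ttransfer RNA\n'
} > "$workdir/toy_features.tab"

# uniq only merges adjacent duplicates, hence the sort first.
# uniq -c prefixes each type with its count; sort -rn puts the largest first.
cut -f 2 "$workdir/toy_features.tab" | sort | uniq -c | sort -rn

# Number of distinct feature types, and the most frequent one.
n_types=$(cut -f 2 "$workdir/toy_features.tab" | sort -u | wc -l | tr -d ' ')
top_type=$(cut -f 2 "$workdir/toy_features.tab" | sort | uniq -c | sort -rn | awk 'NR == 1 {print $2}')
echo "distinct types: $n_types, most common: $top_type"

rm -rf "$workdir"
```

The toy table has 3 distinct types, with ORF the most common.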
3) Disk Space
I tried Googling "disk space free unix command", which led to df.
$ man df
$ df
Try with the -m flag, which reports sizes in megabyte blocks
$ df -m
And then with the "human readable" flag, which picks units such as M or G automatically
$ df -h
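When the free space is needed inside a script rather than on screen, one option is to ask df for its POSIX output format and pick out a single column. This is a sketch under the assumption that the filesystem name contains no spaces, since awk splits the line on whitespace; the variable name avail_kb is arbitrary.

```shell
# -P requests the POSIX format (one line per filesystem), -k uses 1024-byte units.
# The trailing dot limits the report to the filesystem holding the current directory.
df -Pk .

# Line 2 is the data row; column 4 is the available space in kilobytes.
avail_kb=$(df -Pk . | awk 'NR == 2 {print $4}')
echo "available: $((avail_kb / 1024)) MB"
```

The number printed depends on the machine, so no fixed output is shown here.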