Gene Ontology (GO) Projecthttp://www.geneontology.org/
"Major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data."
Sets of functional terms separated by broad category: Cellular Component, Molecular Function, or Biological Process
Example:
[Term]
id: GO:0000009
name: alpha-1,6-mannosyltransferase activity
namespace: molecular_function
def: "Catalysis of the transfer of a mannose residue from GDP-mannose to an oligosaccharide, forming an alpha-1,6-linkage." [GOC:mcc, PMID:2644248]
xref: EC:2.4.1.-
xref: Reactome:REACT_22295 "Addition of a third mannose to the N-glycan precursor by Alg2, Saccharomyces cerevisiae"
xref: Reactome:REACT_22383 "Addition of a third mannose to the N-glycan precursor by ALG2, Homo sapiens"
is_a: GO:0000030 ! mannosyltransferase activity
If the Pfam hmm isn't accessible, you will need to download the Pfam alignment files here: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
And then run the following to decompress and create an hmm database using the program hmmbuild (HMMER3 package). These files are big, delete or compress when finished running hmmscan.
$ gunzip Pfam-A.seed.gz
$ hmmbuild Pfam-A.hmm Pfam-A.seed
$ ls Pfam-A.* # this gives you:
Pfam-A.hmm Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p Pfam-A.seed
The wrapper script run_hmmscan.py will run hmmscan for you, but it's not the most efficient and it takes quite a while for a full genome set of proteins (10,000 genes = appx. 1.5hr). This test set only has 8 sequences:
The output files from run_hmmscan.py (association, description and population) should be correctly formatted for running the following functional enrichment program...
There are 3 flags in the code (use just 1 at a time...)-s (basic summary statistics will print)
-u (GO terms with uncorrected p-vals > 0.05 will print)
-c (GO terms with bonferroni corrected p-vals > 0.05 will print
Functional Enrichment
Are there any biological functions associated with your differentially expressed gene sets?Online tools
There are MANY, here's an example...
- Neurospora crassa FUNCAT (functional categories)
http://mips.helmholtz-muenchen.de/cgi-bin/proj/funcatDB/search_advanced.pl?gene=2Gene Ontology
Gene Ontology (GO) Project http://www.geneontology.org/"Major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data."
- Sets of functional terms separated by broad category: Cellular Component, Molecular Function, or Biological Process
- GO Ontology File (all terms and descriptions): OBO v1.2 available http://www.geneontology.org/GO.downloads.ontology.shtml
- Example:
Species with defined GO terms[Term]
id: GO:0000009
name: alpha-1,6-mannosyltransferase activity
namespace: molecular_function
def: "Catalysis of the transfer of a mannose residue from GDP-mannose to an oligosaccharide, forming an alpha-1,6-linkage." [GOC:mcc, PMID:2644248]
xref: EC:2.4.1.-
xref: Reactome:REACT_22295 "Addition of a third mannose to the N-glycan precursor by Alg2, Saccharomyces cerevisiae"
xref: Reactome:REACT_22383 "Addition of a third mannose to the N-glycan precursor by ALG2, Homo sapiens"
is_a: GO:0000030 ! mannosyltransferase activity
http://www.geneontology.org/GO.annotation.species_db.shtml
Example GO term file (Candida)
Transferring GO terms
Pfam Domains
http://pfam.sanger.ac.uk/Alternatively, functional enrichment can be done using functional domain predictions...
HMMER
http://hmmer.janelia.org/
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
And then run the following to decompress and create an hmm database using the program hmmbuild (HMMER3 package). These files are big, delete or compress when finished running hmmscan.
$ gunzip Pfam-A.seed.gz
$ hmmbuild Pfam-A.hmm Pfam-A.seed
$ ls Pfam-A.* # this gives you:
Pfam-A.hmm Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p Pfam-A.seed
$ python test.fa test_pfam # this gives you 5 files (the tmp files can be deleted):
GeneMerge: Term enrichment test
http://genemerge.cbcb.umd.edu/Castillo-Davis, C.I. and D.L. Hartl 2003. Bioinformatics 19(7):891-892 (If you use output from GeneMerge, please cite this paper)
-u (GO terms with uncorrected p-vals > 0.05 will print)
-c (GO terms with bonferroni corrected p-vals > 0.05 will print
$ wget http://cgrlucb.wikispaces.com/file/view/Cocci_example.tar.gz
$ tar -zxvf Cocci_example.tar.gz
$ cd Cocci_example
$ ls
Cimmitis_pfam.assoc Cimmitis_pfam.pop geneMerge_use.txt upH_allpop.txt
Cimmitis_pfam.desc GeneMerge1.2.pl sort_geneMerge_output.py upS_allpop.txt
$ perl GeneMerge1.2.pl Cimmitis_pfam.assoc Cimmitis_pfam.desc Cimmitis_pfam.pop upH_allpop.txt upH.out
$ python sort_geneMerge_output.py upH.out -c
Pfam_ID Pop_freq Pop_frac Study_frac raw_Pval corr_Pval Description Genes_with_term
Kinesin 0.00112485939257593 11/9779 8/1299 1.07985654931451e-05 0.00555046266347657 Kinesin motor domain CIMG_01713 CIMG_04554 CIMG_05502 CIMG_07460 CIMG_09273 CIMG_11124 CIMG_12417 CIMG_13268