Functional+Enrichment

= = =__**Functional Enrichment**__= Are there any biological functions associated with your differentially expressed gene sets?

There are MANY, here's an example... [| http://mips.helmholtz-muenchen.de/cgi-bin/proj/funcatDB/search_advanced.pl?gene=2]
 * Online tools**
 * Neurospora crassa FUNCAT (functional categories)
 * Set of functionally related genes:
 * Input sample gene set:
 * Output: 43 functional terms enriched[[image:Nc_output.jpg width="800" height="515"]]

=__**Gene Ontology**__= "Major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data." > [Term] > id: GO:0000009 > name: alpha-1,6-mannosyltransferase activity > namespace: molecular_function > def: "Catalysis of the transfer of a mannose residue from GDP-mannose to an oligosaccharide, forming an alpha-1,6-linkage." [GOC:mcc, PMID:2644248] > xref: EC:2.4.1.- > xref: Reactome:REACT_22295 "Addition of a third mannose to the N-glycan precursor by Alg2, Saccharomyces cerevisiae" > xref: Reactome:REACT_22383 "Addition of a third mannose to the N-glycan precursor by ALG2, Homo sapiens" > is_a: GO:0000030 ! mannosyltransferase activity > > []
 * Gene Ontology (GO) Project** []
 * Sets of functional terms separated by broad category: Cellular Component, Molecular Function, or Biological Process
 * GO Ontology File (all terms and descriptions): OBO v1.2 available []
 * Example:
 * Species with defined GO terms**


 * Example GO term file (Candida)**
 * Candida (CGD) GO term file: [[file:gene_association.cgd.gz]]
 * READ ME: []


 * Transferring GO terms**
 * Reciprocal best hits (BLAST) for most closely-related species with associated GO terms
 * Transfer GO terms from RBH

=**__Pfam Domains__**= []

Alternatively, functional enrichment can be done using functional domain predictions...

[] >> [] >> And then run the following to decompress and create an hmm database using the program hmmbuild (HMMER3 package). These files are big, delete or compress when finished running hmmscan. >> $ gunzip Pfam-A.seed.gz >> $ hmmbuild Pfam-A.hmm Pfam-A.seed >> $ ls Pfam-A.* # this gives you: >> Pfam-A.hmm Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p Pfam-A.seed
 * HMMER**
 * hmmscan program from HMMER3 package allows you to "search protein sequences against collections of profiles, //e.g.// Pfam".
 * Output is long and detailed... use the python script included here:
 * If the Pfam hmm isn't accessible, you will need to download the Pfam alignment files here:

$ python test.fa test_pfam # this gives you 5 files (the tmp files can be deleted): >> $ ls ../get_Pfam_predictions >> run_hmmscan.py test.fa test_pfam.assoc test_pfam.desc test_pfam.pop tmp.fa tmp.out >> $ head test_pfam.* >> > test_pfam.assoc < >> Bcer_06612-RA Ldh_2; >> Bcer_06601-RA >> Bcer_06600-RA >> Bcer_06610-RA >> Bcer_06603-RA Hus1; >> Bcer_06606-RA Hydrolase;E1-E2_ATPase; >> Bcer_06622-RA MMtag; >> Bcer_06616-RA APH; >> >> > test_pfam.desc < >> Ldh_2 Malate/L-lactate dehydrogenase >> Hus1 Hus1-like protein >> E1-E2_ATPase E1-E2 ATPase >> APH Phosphotransferase enzyme family >> Hydrolase haloacid dehalogenase-like hydrolase >> MMtag Kinase phosphorylation protein >> >> > test_pfam.pop < >> Bcer_06612-RA >> Bcer_06601-RA >> Bcer_06600-RA >> Bcer_06610-RA >> Bcer_06603-RA >> Bcer_06606-RA >> Bcer_06622-RA >> Bcer_06616-RA
 * The wrapper script run_hmmscan.py will run hmmscan for you, but it's not the most efficient and it takes quite a while for a full genome set of proteins (10,000 genes = appx. 1.5hr). This test set only has 8 sequences:


 * The output files from run_hmmscan.py (association, description and population) should be correctly formatted for running the following functional enrichment program...

=__**GeneMerge: Term enrichment test**__= [] [|Castillo-Davis, C.I. and D.L. Hartl 2003. Bioinformatics 19(7):891-892] (If you use output from GeneMerge, please cite this paper) >> -u (GO terms with uncorrected p-vals > 0.05 will print) >> -c (GO terms with bonferroni corrected p-vals > 0.05 will print >> $ wget http://cgrlucb.wikispaces.com/file/view/Cocci_example.tar.gz >> $ tar -zxvf Cocci_example.tar.gz >> $ cd Cocci_example >> $ ls >> Cimmitis_pfam.assoc Cimmitis_pfam.pop geneMerge_use.txt upH_allpop.txt >> Cimmitis_pfam.desc GeneMerge1.2.pl sort_geneMerge_output.py upS_allpop.txt >> $ perl GeneMerge1.2.pl Cimmitis_pfam.assoc Cimmitis_pfam.desc Cimmitis_pfam.pop upH_allpop.txt upH.out >> $ python sort_geneMerge_output.py upH.out -c >> Pfam_ID Pop_freq Pop_frac Study_frac raw_Pval corr_Pval Description Genes_with_term >> Kinesin 0.00112485939257593 11/9779 8/1299 1.07985654931451e-05 0.00555046266347657 Kinesin motor domain CIMG_01713 CIMG_04554 CIMG_05502 CIMG_07460 CIMG_09273 CIMG_11124 CIMG_12417 CIMG_13268 >>
 * Hypergeometric distribution: probability of successes in //n// draws __without__ replacement
 * Here is the GeneMerge code
 * $ perl GeneMerge1.2.pl ___.assoc__ _.desc ___.pop list_of_interest.txt output.out
 * Those first 3 files (assoc, desc, pop) are the output from above
 * The list of interest is simply your list of genes of interest (1 gene ID per line, obviously must match the gene ID's in the assoc and pop files)
 * And... it writes to the output file
 * Here is a very old script I wrote to make the output file slightly more user-friendly
 * $ python sort_geneMerge_output.py output.out -(flag)
 * There are 3 flags in the code (use just 1 at a time...)-s (basic summary statistics will print)
 * The programs and an example are in this file: [[file:Cocci_example.tar.gz]]
 * To download, unpack the directory and run the example: