
Evaluation methods for semantic similarity measures

SS: semantic similarity between genes or gene sets

LordPW :

Semantic similarity measures as tools for exploring the gene ontology.
Lord PW, Stevens RD, Brass A, Goble CA
Pac Symp Biocomput p601-12 (2003)

SS vs Sequence Similarity (BLAST result)




Correlation coefficients between BLAST bit scores and semantic similarity.

Aspect              Resnik  Lin    Jiang
Molecular Function  0.577   0.541  -0.483
Biological Process  0.280   0.303  -0.312
Cellular Component  0.368   0.452  -0.414

Correlation coefficients for semantic similarity scores over different aspects of GO.

Aspect                                   Resnik  Lin    Jiang
Molecular Function - Cellular Component  0.290   0.318  0.087
Molecular Function - Biological Process  0.219   0.244  0.269
Biological Process - Cellular Component  0.202   0.175  0.166
For molecular function, the Resnik measure shows the highest correlation with BLAST bit scores,
and it has the lowest correlation (in absolute value) for the other two aspects, so it may be
the most discriminatory.

RubioA :

Correlation between gene expression and GO semantic similarity.
Sevilla JL, Segura V, Podhorski A, Guruceaga E, Mato JM, Martínez-Cruz LA, Corrales FJ, Rubio A
IEEE/ACM Trans Comput Biol Bioinform 2(4) p330-8 (2005 Oct-Dec) 10.1109/TCBB.2005.50

SS vs Gene Expression

  1. Marsha dataset (5 samples, 907600 expression levels in total)
  2. RAD dataset (89 samples, 893827 expression levels in total)



Correlation coefficients between Gene Expression Correlation and Semantic Similarity.

Dataset  Aspect  Resnik  Jiang  Lin
Marsha   MF      0.04    -0.05  0.04
Marsha   CC      0.05    -0.06  0.05
Marsha   BP      0.06    -0.03  0.05
RAD      MF      0.12    0.00   0.10
RAD      CC      0.14    -0.06  0.10
RAD      BP      0.14    -0.05  0.12

Correlation Coefficients between Gene Expression Correlation and Semantic Similarity When Average Correlations Are Computed over 100 Semantic Similarity Intervals.

Dataset  Aspect  Resnik  Jiang  Lin
Marsha   MF      0.63    -0.59  0.24
Marsha   CC      0.72    -0.32  0.12
Marsha   BP      0.77    -0.22  0.39
RAD      MF      0.47    0.16   0.28
RAD      CC      0.51    -0.23  0.34
RAD      BP      0.59    -0.14  0.41
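
The jump from pairwise correlations near 0.1 to interval-averaged correlations near 0.6 is what one would expect when per-pair noise is averaged out within each interval. The sketch below illustrates the idea on synthetic data only. Equal-width intervals, the helper names `pearson` and `binned_correlation`, and the toy noise model are assumptions made for illustration, not details taken from the paper.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def binned_correlation(ss, expr_corr, n_bins=100):
    """Average expr_corr within equal-width SS intervals (an assumption),
    then correlate the interval centers with the interval means."""
    lo, hi = min(ss), max(ss)
    if hi == lo:
        raise ValueError("semantic similarity values are all identical")
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for s, e in zip(ss, expr_corr):
        i = min(int((s - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        sums[i] += e
        counts[i] += 1
    # Empty intervals are skipped rather than treated as zero.
    centers = [lo + (i + 0.5) * width for i in range(n_bins) if counts[i]]
    means = [sums[i] / counts[i] for i in range(n_bins) if counts[i]]
    return pearson(centers, means)


# Synthetic gene pairs: a weak linear trend buried in noise (hypothetical values).
rng = random.Random(0)
ss = [rng.uniform(0.0, 10.0) for _ in range(20000)]
expr_corr = [0.02 * s + rng.gauss(0.0, 0.5) for s in ss]
raw = pearson(ss, expr_corr)
binned = binned_correlation(ss, expr_corr)
print(f"pairwise r = {raw:.2f}, interval-averaged r = {binned:.2f}")
```

The interval-averaged coefficient describes the trend of the mean, so it should not be read as the strength of the relationship for individual gene pairs.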



SS vs (Random permutation of GO annotation, Gene Expression)




Correlation between Gene Expression Correlation and Semantic Similarity for Resnik Distance
in Two Randomized Experiments.

Dataset  Aspect  Resnik  GO Random  Exp Random
Marsha   MF      0.63    -0.13      0.10
Marsha   CC      0.72    0.09       0.05
Marsha   BP      0.77    -0.08      0.20
RAD      MF      0.47    0.16       -0.03
RAD      CC      0.53    -0.23      -0.15
RAD      BP      0.61    -0.14      -0.16
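
One plausible reading of the "GO Random" control is that the GO annotation sets are shuffled among the genes, so each gene keeps an annotation set of realistic size but loses its own. The sketch below shows only that shuffling step. The function name `permute_annotations`, the toy gene names, and the GO identifiers are hypothetical, and the paper's exact randomization procedure may differ.

```python
import random


def permute_annotations(annotations, seed=None):
    """Return a copy of a gene -> set-of-GO-terms mapping in which the
    annotation sets have been randomly reassigned among the same genes."""
    rng = random.Random(seed)
    genes = list(annotations)
    term_sets = [annotations[g] for g in genes]
    rng.shuffle(term_sets)  # in-place shuffle of the list of sets
    return dict(zip(genes, term_sets))


# Toy annotation table (hypothetical genes and terms).
annotations = {
    "geneA": {"GO:0000001"},
    "geneB": {"GO:0000002", "GO:0000003"},
    "geneC": {"GO:0000004"},
    "geneD": {"GO:0000005", "GO:0000006"},
}
shuffled = permute_annotations(annotations, seed=1)
for gene in annotations:
    print(gene, sorted(shuffled[gene]))
```

Semantic similarity would then be recomputed on the shuffled table and correlated with the unchanged expression data. The "Exp Random" control could presumably be built the same way by shuffling expression profiles instead.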


These results suggest that there is an underlying relationship between gene expression and GO
annotation. They also validate the use of Resnik semantic similarity as a measure that is well
correlated with gene expression and can be used to augment the biological knowledge obtained
from other sources. For instance, in the same way that we have tools that characterize genes
according to their expression profiles or similar criteria, tools could be developed that take
advantage of semantic similarity to enhance existing information. Semantic similarity could
also be used to improve current clustering algorithms as well as in the development of
a "semantic search" tool.

LiebmanMN :

Assessing semantic similarity measures for the characterization of human regulatory pathways.
Guo X, Liu R, Shriver CD, Hu H, Liebman MN
Bioinformatics 22(8) p967-73 (2006 Apr 15) 10.1093/bioinformatics/btl042

ROC curve analysis

The set of interacting pairs comprises pairwise interactions among proteins of the same complex
and interactions of neighboring proteins within KEGG human regulatory pathways. After discarding
proteins with indirect interaction effects, the interaction types of neighboring proteins include
activation, inhibition, binding/association, dissociation, state change, phosphorylation,
dephosphorylation, glycosylation, ubiquitination and methylation.
For the non-interacting set, we randomly choose two distinct human proteins from the Entrez Gene
database as a non-interacting protein pair. This is valid since the chance of identifying
protein–protein interactions at random is very small (0.024% based on the two-hybrid data by
Uetz et al., 2000).
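
Given semantic similarity scores for the interacting pairs and for the random pairs, the area under the ROC curve equals the probability that a randomly drawn interacting pair scores higher than a randomly drawn non-interacting pair, with ties counted as one half. A minimal sketch of that computation follows. The function name `roc_auc` and the example scores are hypothetical and are not taken from the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    pos_scores: similarity scores of interacting protein pairs
    neg_scores: similarity scores of randomly chosen (non-interacting) pairs
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for illustration only.
interacting = [0.9, 0.7, 0.4]
random_pairs = [0.5, 0.2, 0.1]
print(f"AUC = {roc_auc(interacting, random_pairs):.3f}")
```

The double loop is quadratic in the number of pairs. For large pair sets a rank-based computation would be the usual choice.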

funsim :

A new measure for functional similarity of gene products based on Gene Ontology.
Schlicker A, Domingues FS, Rahnenführer J, Lengauer T
BMC Bioinformatics 7 p302 (2006 Jun 15) 10.1186/1471-2105-7-302

SS vs Sequence similarity

In summary, these results confirm that functionally related proteins tend to have higher 
sequence similarity. This is more evident for the MFscore. Nevertheless, a considerable
percentage of protein pairs that are orthologous and that have a high sequence similarity show 
no functional similarity. The comparison with Lord's approach to combine semantic similarity 
scores shows significantly different results. In particular, the proposed approach is expected 
to provide a better discrimination between nonhomologous and orthologous proteins.


Finding functionally related proteins



MDS for yeast-yeast comparison



<latex>NS = {{\sum_{ij}{({d_{ij}}\prime - d_{ij})}^2}\over{\sum_{ij}{d_{ij}}^2}}</latex>


<latex>{d_{ij}}\prime</latex> is the distance of proteins i and j in the low dimensional space.


<latex>d_{ij}</latex> is the respective distance in the original space.


<latex>CR_k = {{(NS_k - NS_{k-1})}\over{(NS_{k+1} - NS_k )}}</latex>


<latex>k</latex> is the number of dimensions.

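The two quantities defined above can be evaluated directly from the pairwise distances. The sketch below assumes NS is the sum of squared differences between low-dimensional and original distances divided by the sum of squared original distances, and CR_k is the ratio of successive NS differences as written above. The function names and the example values are hypothetical.

```python
def normalized_stress(d_orig, d_low):
    """NS: squared distance error of the embedding, relative to the
    squared original distances. Both arguments list d_ij over the same pairs."""
    if len(d_orig) != len(d_low):
        raise ValueError("distance lists must have the same length")
    num = sum((dl - do) ** 2 for do, dl in zip(d_orig, d_low))
    den = sum(do ** 2 for do in d_orig)
    return num / den


def convergence_rate(ns_by_dim, k):
    """CR_k = (NS_k - NS_{k-1}) / (NS_{k+1} - NS_k), with NS keyed by dimension."""
    return (ns_by_dim[k] - ns_by_dim[k - 1]) / (ns_by_dim[k + 1] - ns_by_dim[k])


# Hypothetical NS values for embeddings in 1, 2 and 3 dimensions.
ns_by_dim = {1: 0.5, 2: 0.2, 3: 0.1}
print(f"CR_2 = {convergence_rate(ns_by_dim, 2):.2f}")
```

A large CR_k indicates that going from k-1 to k dimensions reduced the stress much more than going from k to k+1 does, which is one way to pick the number of dimensions.
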
LussierYA :

Evaluation of high-throughput functional categorization of human disease genes.
Chen JL, Liu Y, Sam LT, Li J, Lussier YA
BMC Bioinformatics 8 Suppl 3 pS7 (2007 May 9) 10.1186/1471-2105-8-S3-S7

Classification of human disease genes (HDG) using GO categories

G-SESAME :

A new method to measure the semantic similarity of GO terms.
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF
Bioinformatics 23(10) p1274-81 (2007 May 15) 10.1093/bioinformatics/btm087

SS vs pathway

LussierYA :

BurgunA :

SS vs pathway

# of transversal networks  Description
6  4  KEGG annotations are identical or correspond to sibling KEGG pathways
   1  KEGG annotations correspond to closely related two-level terms
   1  Annotations are different but reflect the composition of the networks into subnetworks
4     Heterogeneous. However, from a biological point of view, the KEGG annotations are complementary