Set-wise semantic similarity measures
  - LordPW :
  - FuSSiMeG :
  - Resnik, Jiang, and Lin measure :
  - RubioA :
  - LiebmanMN :
  - LinK :
  - funsim :
  - OlssonB :
  - GORank :
  - LussierYA :
  - G-SESAME :
  - LussierYA :
  - BurgunA :
  - DongXu :
  - ZhangA :
  - simGIC :
  - GOSAP :

IC(t) : Information content of term t

S(g1,g2) : semantic similarity between gene1 and gene2
S(t1,t2) : semantic similarity between term1 and term2

LordPW :

Semantic similarity measures as tools for exploring the gene ontology.
Lord PW, Stevens RD, Brass A, Goble CA
Pac Symp Biocomput p601-12 (2003)

$S(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\, t_j \in T_2} \mathrm{sim}(t_i, t_j)$

For this paper only those terms with evidence codes of “Traceable Author Statement” have been used.
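
The average above can be sketched in a few lines of Python. This is an illustration only: it assumes m and n are the sizes of the term sets T_1 and T_2, and the function name `average_pairwise`, the zero score for unannotated genes, and the toy similarity values are hypothetical, not taken from the paper.

```python
from itertools import product

def average_pairwise(terms1, terms2, sim):
    """Mean of sim(t_i, t_j) over all pairs with t_i in T_1 and t_j in T_2."""
    if not terms1 or not terms2:
        return 0.0  # assumption: a gene without annotations scores 0
    total = sum(sim(ti, tj) for ti, tj in product(terms1, terms2))
    return total / (len(terms1) * len(terms2))

# Hypothetical term-pair similarities, for illustration only.
toy = {("a", "x"): 0.2, ("a", "y"): 0.6, ("b", "x"): 1.0, ("b", "y"): 0.2}
lord_score = average_pairwise(["a", "b"], ["x", "y"], lambda ti, tj: toy[(ti, tj)])
```

Any term-level measure (Resnik, Lin, Jiang) can be passed in as `sim`; the set-wise step does not depend on which one is used.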

FuSSiMeG :

Implementation of a functional semantic similarity measure between gene-products

DI/FCUL TR 03--29 (2003 November)

$S(g1,g2) = \max \{ S(t1,t2) \cdot \mathrm{IC}(t1) \cdot \mathrm{IC}(t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$
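
A minimal Python sketch of this IC-weighted maximum, next to the unweighted maximum for comparison. The function names and all numeric values are hypothetical and only serve to show that weighting by IC can change which term pair wins.

```python
def weighted_max(terms1, terms2, sim, ic):
    """Largest S(t1,t2) * IC(t1) * IC(t2) over all term pairs."""
    return max((sim(t1, t2) * ic[t1] * ic[t2]
                for t1 in terms1 for t2 in terms2), default=0.0)

def plain_max(terms1, terms2, sim):
    """Largest S(t1,t2) over all term pairs, without IC weighting."""
    return max((sim(t1, t2) for t1 in terms1 for t2 in terms2), default=0.0)

# Hypothetical values: the pair (a, x) is most similar, but both terms are uninformative.
toy = {("a", "x"): 0.9, ("a", "y"): 0.1, ("b", "x"): 0.3, ("b", "y"): 0.8}
toy_ic = {"a": 0.2, "b": 1.0, "x": 0.5, "y": 1.0}
toy_sim = lambda t1, t2: toy[(t1, t2)]

unweighted = plain_max(["a", "b"], ["x", "y"], toy_sim)           # pair (a, x) wins
weighted = weighted_max(["a", "b"], ["x", "y"], toy_sim, toy_ic)  # pair (b, y) wins
```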

Resnik, Jiang, and Lin measure :

Correlation between gene expression and GO semantic similarity.
Sevilla JL, Segura V, Podhorski A, Guruceaga E, Mato JM, Martínez-Cruz LA, Corrales FJ, Rubio A
IEEE/ACM Trans Comput Biol Bioinform 2(4) p330-8 (2005 Oct-Dec) 10.1109/TCBB.2005.50

$S(g1,g2) = \max \{ S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$

RubioA :

$S(g1,g2) = \max \{ S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$

LiebmanMN :

Assessing semantic similarity measures for the characterization of human regulatory pathways.
Guo X, Liu R, Shriver CD, Hu H, Liebman MN
Bioinformatics 22(8) p967-73 (2006 Apr 15) 10.1093/bioinformatics/btl042

$S(g1,g2) = \max \{ S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$

LinK :

Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations.
Wu X, Zhu L, Guo J, Zhang DY, Lin K
Nucleic Acids Res 34(7) p2137-50 (2006) 10.1093/nar/gkl219

$S(g1,g2) = \max \{ S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$

funsim :

A new measure for functional similarity of gene products based on Gene Ontology.
Schlicker A, Domingues FS, Rahnenführer J, Lengauer T
BMC Bioinformatics 7 p302 (2006 Jun 15) 10.1186/1471-2105-7-302

$S(g1,g2) = \max \{ \mathrm{columnScore}, \mathrm{rowScore} \}$

$\mathrm{rowScore} = \frac{1}{N} \sum_{i=1}^{N} \max_{1 \le j \le M} S_{ij}$ ,

$\mathrm{columnScore} = \frac{1}{M} \sum_{j=1}^{M} \max_{1 \le i \le N} S_{ij}$

$S_{ij} = \mathrm{sim}(GO_i^A, GO_j^B), \; \forall i \in \{1,\dots,N\}, \; \forall j \in \{1,\dots,M\}$
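
The row and column scores can be sketched as follows, assuming the term-pair similarities are already available as an N x M matrix stored as a list of rows. The function name `row_column_score` and the matrix values are hypothetical.

```python
def row_column_score(S):
    """max(rowScore, columnScore) for a non-empty N x M matrix S[i][j]."""
    n_rows, n_cols = len(S), len(S[0])
    # rowScore: average over the terms of A of their best match in B
    row_score = sum(max(row) for row in S) / n_rows
    # columnScore: average over the terms of B of their best match in A
    column_score = sum(max(S[i][j] for i in range(n_rows))
                       for j in range(n_cols)) / n_cols
    return max(row_score, column_score)

# Hypothetical similarities: 2 terms for gene A (rows), 3 terms for gene B (columns).
S_toy = [[0.9, 0.1, 0.3],
         [0.2, 0.8, 0.4]]
funsim_toy = row_column_score(S_toy)  # rowScore = 0.85, columnScore = 0.7
```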

OlssonB :

Combining functional and topological properties to identify core modules in protein interaction networks.
Lubovac Z, Gamalielsson J, Olsson B
Proteins 64(4) p948-59 (2006 Sep 1) 10.1002/prot.21071

$S(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\, t_j \in T_2} \mathrm{sim}(t_i, t_j)$

GORank :

GORank: Semantic similarity search for gene products using the Gene Ontology

김기성, 유상원, 김형주

Journal of KIISE (정보과학회논문지) vol. 33, no. 7, p682-692 (2006 December)

For gene A and gene B,

$\mathrm{IC}(A) = \sum_{t \in Ann(A)} AW(A,t) \, \mathrm{IC}(t)$

AW(A,t) : annotation weight of term t, set by its evidence code, e.g. TAS : 1.0, IEA : 0.1

$S_{max}(A,t) = \max_{k \in Ann(A)} S(k,t)$

$\mathrm{SharedIC}(A,B) = \sum_{t \in Ann(A)} \{ S_{max}(B,t) \, AW(A,t) \, \mathrm{IC}(t) \}$

In general, $\mathrm{SharedIC}(A,B) \ne \mathrm{SharedIC}(B,A)$

$S(A,B) = \frac{\mathrm{SharedIC}(A,B) + \mathrm{SharedIC}(B,A)}{\mathrm{IC}(A) + \mathrm{IC}(B)}$
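
A small Python sketch of the GORank score as written in these notes. The function name, the term names, and all numeric values are hypothetical; the annotation weights follow the TAS : 1.0 and IEA : 0.1 example above.

```python
def gorank_similarity(ann_a, ann_b, aw_a, aw_b, ic, sim):
    """S(A,B) = (SharedIC(A,B) + SharedIC(B,A)) / (IC(A) + IC(B))."""

    def gene_ic(ann, aw):
        # IC(A): evidence-weighted sum of term information content
        return sum(aw[t] * ic[t] for t in ann)

    def shared_ic(ann_x, aw_x, ann_y):
        # SharedIC(X,Y): each term t of X contributes its weighted IC,
        # scaled by its best match among the terms of Y
        return sum(max(sim(k, t) for k in ann_y) * aw_x[t] * ic[t]
                   for t in ann_x)

    numerator = shared_ic(ann_a, aw_a, ann_b) + shared_ic(ann_b, aw_b, ann_a)
    return numerator / (gene_ic(ann_a, aw_a) + gene_ic(ann_b, aw_b))

# Hypothetical data: gene A has one TAS and one IEA annotation, gene B one TAS annotation.
toy_ic = {"a1": 2.0, "a2": 1.0, "b1": 2.0}
aw_a = {"a1": 1.0, "a2": 0.1}
aw_b = {"b1": 1.0}
pair_sim = {frozenset(("a1", "b1")): 1.0, frozenset(("a2", "b1")): 0.5}
toy_sim = lambda s, t: pair_sim[frozenset((s, t))]

gorank_score = gorank_similarity(["a1", "a2"], ["b1"], aw_a, aw_b, toy_ic, toy_sim)
```

Here SharedIC(A,B) = 2.05 and SharedIC(B,A) = 2.0, which illustrates the asymmetry; the final score is 4.05 / 4.1.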

LussierYA :

Evaluation of high-throughput functional categorization of human disease genes.
Chen JL, Liu Y, Sam LT, Li J, Lussier YA
BMC Bioinformatics 8 Suppl 3 pS7 (2007 May 9) 10.1186/1471-2105-8-S3-S7

$S(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\, t_j \in T_2} \mathrm{sim}(t_i, t_j)$

G-SESAME :

A new method to measure the semantic similarity of GO terms.
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF
Bioinformatics 23(10) p1274-81 (2007 May 15) 10.1093/bioinformatics/btm087

For one term go and a GO term set $GO = \{ go_1, go_2, \dots, go_k \}$,

$S(go, GO) = \max_{1 \le i \le k} S_{GO}(go, go_i)$

For two given genes G1 and G2 and their term sets

$GO_1 = \{ go_{11}, go_{12}, \dots, go_{1m} \}, \; GO_2 = \{ go_{21}, go_{22}, \dots, go_{2n} \},$

$S(G1,G2) = \frac{\sum_{1 \le i \le m} S(go_{1i}, GO_2) + \sum_{1 \le j \le n} S(go_{2j}, GO_1)}{m+n}$
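
The two steps above (term against set, then gene against gene) can be sketched as below. The term-level function `s_go` is passed in as a parameter; the toy lookup table stands in for it and is not Wang's term measure. All names and values are hypothetical.

```python
def term_to_set(go, term_set, s_go):
    """S(go, GO): best match of one term against a set of terms."""
    return max(s_go(go, other) for other in term_set)

def gene_similarity(go1, go2, s_go):
    """S(G1, G2): best matches in both directions, divided by m + n."""
    total = (sum(term_to_set(g, go2, s_go) for g in go1)
             + sum(term_to_set(g, go1, s_go) for g in go2))
    return total / (len(go1) + len(go2))

# Hypothetical symmetric term similarities (placeholder for S_GO).
toy = {frozenset(("a1", "b1")): 0.9, frozenset(("a1", "b2")): 0.1,
       frozenset(("a1", "b3")): 0.3, frozenset(("a2", "b1")): 0.2,
       frozenset(("a2", "b2")): 0.8, frozenset(("a2", "b3")): 0.4}
toy_s_go = lambda s, t: toy[frozenset((s, t))]

gsesame_toy = gene_similarity(["a1", "a2"], ["b1", "b2", "b3"], toy_s_go)
```

With these values the best matches sum to 1.7 for G1 and 2.1 for G2, so the score is 3.8 / 5 = 0.76. Dividing by m + n weights every term equally, unlike averaging the two directions separately.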

LussierYA :

Information theory applied to the sparse gene ontology annotation network to predict novel gene function.
Tao Y, Sam L, Li J, Friedman C, Lussier YA
Bioinformatics 23(13) pi529-38 (2007 Jul 1) 10.1093/bioinformatics/btm195

$S(A,B) = \frac{2 \sum_{(a_i, b_i) \in P,\; S(a_i, b_i) \ge t} S(a_i, b_i)}{|A| + |B|}$
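
A hedged sketch of the thresholded sum. These notes do not say how the pair set P is built, so the sketch simply takes P as a given list of term pairs; the function name, the pairs, and the threshold value are all assumptions made for illustration.

```python
def thresholded_similarity(pairs, size_a, size_b, sim, threshold):
    """2 * (sum of S(a_i, b_i) over pairs in P scoring >= t) / (|A| + |B|)."""
    kept = [sim(a, b) for a, b in pairs if sim(a, b) >= threshold]
    return 2 * sum(kept) / (size_a + size_b)

# Hypothetical pair set P and similarities; only the first pair passes t = 0.5.
toy = {("a1", "b1"): 0.9, ("a2", "b2"): 0.3}
tao_toy = thresholded_similarity(list(toy), 2, 2, lambda a, b: toy[(a, b)], 0.5)
```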

BurgunA :

A transversal approach to predict gene product networks from ontology-based similarity.
Chabalier J, Mosser J, Burgun A
BMC Bioinformatics 8 p235 (2007 Jul 2) 10.1186/1471-2105-8-235

Gene g is represented by the IC vector of its annotated GO terms $\{1, 2, \dots, n\}$,

$g = (IC_1, IC_2, \dots, IC_n)$

$S(g1,g2) = \frac{\vec{g1} \cdot \vec{g2}}{|g1||g2|} = \frac{\sum_{i=1}^{n} IC1_i \cdot IC2_i}{\sqrt{\sum_{i=1}^{n} (IC1_i)^2} \, \sqrt{\sum_{i=1}^{n} (IC2_i)^2}}$
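
The cosine of two IC vectors can be sketched as below. It assumes both genes are indexed over the same n GO terms, with 0 for terms a gene is not annotated with; that convention, the function name, and the vectors are hypothetical.

```python
import math

def cosine_similarity(g1, g2):
    """Cosine of the angle between two equal-length IC vectors."""
    dot = sum(x * y for x, y in zip(g1, g2))
    norm = math.sqrt(sum(x * x for x in g1)) * math.sqrt(sum(y * y for y in g2))
    return dot / norm if norm else 0.0  # assumption: an all-zero vector scores 0

# Hypothetical IC vectors over three GO terms; the genes share only the first term.
cosine_toy = cosine_similarity([3.0, 0.0, 4.0], [3.0, 4.0, 0.0])  # 9 / (5 * 5)
```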

DongXu :

Quantitative assessment of relationship between sequence similarity and function similarity.
Joshi T, Xu D
BMC Genomics 8 p222 (2007 Jul 9) 10.1186/1471-2164-8-222

Example of GO index and the corresponding GO ID and functional category

Index level	GO Index	Functional category and GO ID
Index1	1-2	cellular process (GO:0009987)
Index2	1-2-1	cell communication (GO:0007154)
Index3	1-2-1-8	signal transduction (GO:0007165)
Index4	1-2-1-8-1	cell surface receptor linked signal transduction (GO:0007166)
Index5	1-2-1-8-1-4	G-protein coupled receptor protein signaling pathway (GO:0030454)

S(g1,g2) is the largest number of index levels at which the GO indices of the two genes match.

Example: GO index of gene1 = 1-1-3-3-4, GO index of gene2 = 1-1-3-2

S(g1,g2) = 2, because the indices match at level 1 (1-1) and level 2 (1-1-3) and differ at level 3.
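
The level count in the example can be reproduced with a short Python sketch. It assumes that index level k corresponds to the first k + 1 components of the GO index, as in the table above (Index1 = 1-2); the function name is hypothetical.

```python
def index_overlap(index1, index2):
    """Number of index levels shared by two GO indices such as '1-1-3-2'."""
    common = 0
    for p, q in zip(index1.split("-"), index2.split("-")):
        if p != q:
            break
        common += 1
    # Level k spans k + 1 components, so a shared first component alone is level 0.
    return max(common - 1, 0)

overlap = index_overlap("1-1-3-3-4", "1-1-3-2")  # shared prefix 1-1-3, i.e. 2 levels
```

When a gene has several GO indices, the score would be the maximum of `index_overlap` over all pairs of indices, matching the "largest number" wording above.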

ZhangA :

Semantic integration to identify overlapping functional modules in protein interaction networks.
Cho YR, Hwang W, Ramanathan M, Zhang A
BMC Bioinformatics 8 p265 (2007 Jul 24) 10.1186/1471-2105-8-265

Suppose a protein x is annotated on m different GO terms.

$S_i(x)$ : the set of proteins annotated on the GO term $g_i$, whose annotation includes x, where $1 \le i \le m$

Suppose both x and y are annotated on n different GO terms, where $n \le m$

$S_j(x,y)$ : the set of proteins annotated on the GO term $g_j$, whose annotation includes both x and y, where $1 \le j \le n$

The annotation size of a GO term is the number of proteins annotated on it.

Using the annotation size of the most specific GO term on which both proteins x and y are annotated,

$S(x,y) = -z \log \left( \frac{\min_j |S_j(x,y)|}{|S_{root}|} \right)$, where $z = \frac{1}{\log |S_{root}| - \log |S_{min}|}$

z is a normalization term that uses the maximum annotation size, $|S_{root}|$, and the minimum annotation size, $|S_{min}|$, among all GO terms in the DAG structure.

If two proteins x and y are annotated on a more specific GO term than x and a third protein w, then x is semantically more similar to y than to w. The semantic similarity S(x, y) can be assigned as a weight to the edge between x and y.
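
The normalized log ratio can be checked numerically with a minimal sketch. The function name and the annotation sizes are hypothetical, and the sizes of the shared GO terms are assumed to be given.

```python
import math

def annotation_size_similarity(shared_sizes, root_size, min_size):
    """-z * log(min_j |S_j(x,y)| / |S_root|), z = 1 / (log|S_root| - log|S_min|)."""
    z = 1.0 / (math.log(root_size) - math.log(min_size))
    return -z * math.log(min(shared_sizes) / root_size)

# Hypothetical sizes: 1000 proteins at the root, 10 on the smallest GO term,
# and x, y share two terms annotated with 500 and 100 proteins.
zhang_toy = annotation_size_similarity([500, 100], 1000, 10)
```

With these numbers the score is 0.5. It reaches 1 when the most specific shared term is as small as the smallest term in the DAG, and 0 when the only shared term is the root.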

simGIC :

Metrics for GO based protein semantic similarity: a systematic evaluation.
Pesquita C, Faria D, Bastos H, Ferreira AE, Falcão AO, Couto FM
BMC Bioinformatics 9 Suppl 5 pS4 (2008 Apr 29) 10.1186/1471-2105-9-S5-S4

$S_{max}(g1,g2) = \max \{ S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$

$S_{avg}(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\, t_j \in T_2} \mathrm{sim}(t_i, t_j)$

$S_{bma}(g1,g2) = \frac{\mathrm{AVG}_{t1} \left( \max_{t2} S(t1,t2) \right) + \mathrm{AVG}_{t2} \left( \max_{t1} S(t1,t2) \right)}{2}, \; t1 \in GO(A), \; t2 \in GO(B)$

GO(X) : implicitly annotated GO terms for gene X

$\mathrm{simUI}(A,B) = \frac{\mathrm{COUNT}_{t \in \{ GO(A) \cap GO(B) \}}}{\mathrm{COUNT}_{t \in \{ GO(A) \cup GO(B) \}}}$

$\mathrm{simGIC}(A,B) = \frac{\sum_{t \in \{ GO(A) \cap GO(B) \}} \mathrm{IC}(t)}{\sum_{t \in \{ GO(A) \cup GO(B) \}} \mathrm{IC}(t)}$
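
Both graph-based scores reduce to set operations. The sketch below assumes GO(A) and GO(B) are Python sets that already include the implicit (ancestor) terms; the term names and IC values are hypothetical.

```python
def sim_ui(go_a, go_b):
    """Number of shared terms divided by number of terms in the union."""
    return len(go_a & go_b) / len(go_a | go_b)

def sim_gic(go_a, go_b, ic):
    """Same ratio as sim_ui, but each term is weighted by its IC."""
    return sum(ic[t] for t in go_a & go_b) / sum(ic[t] for t in go_a | go_b)

# Hypothetical annotation sets and IC values; the root carries no information.
go_a = {"root", "t1", "t2"}
go_b = {"root", "t1", "t3"}
toy_ic = {"root": 0.0, "t1": 1.0, "t2": 3.0, "t3": 4.0}

ui_toy = sim_ui(go_a, go_b)            # 2 shared terms out of 4
gic_toy = sim_gic(go_a, go_b, toy_ic)  # IC 1.0 shared out of 8.0 in total
```

The example shows the effect of the weighting: the shared terms are half of the union by count, but because they are the least informative ones, simGIC is much lower than simUI.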

GOSAP :

GOSAP: Gene Ontology-Based Semantic Alignment of Biological Pathways.
Gamalielsson J, Olsson B
Int J Bioinform Res Appl 4(3) p274-94 (2008)

$S_{max}(g1,g2) = \max \{ S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \}$