
Set-wise semantic similarity measures

IC(t) : Information content of term t

S(g1,g2) : semantic similarity between gene1 and gene2
S(t1,t2) : semantic similarity between term1 and term2

LordPW :

Semantic similarity measures as tools for exploring the gene ontology.
Lord PW, Stevens RD, Brass A, Goble CA
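Pac Symp Biocomput p601-12 (2003)

$$S(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\ t_j \in T_2} \mathrm{sim}(t_i, t_j)$$

where $T_1$ and $T_2$ are the GO term sets of g1 and g2, with $m = |T_1|$ and $n = |T_2|$.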


In this paper, only terms annotated with the evidence code “Traceable Author Statement” (TAS) were used.
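The all-pairs average above can be sketched in a few lines of Python. This is only an illustration: the function name `average_pairwise_similarity`, the callable `sim`, and the toy similarity table are assumptions, not part of the cited paper.

```python
from itertools import product


def average_pairwise_similarity(terms1, terms2, sim):
    """Mean of sim(t_i, t_j) over all m * n term pairs of two genes."""
    terms1, terms2 = list(terms1), list(terms2)
    if not terms1 or not terms2:
        raise ValueError("each gene needs at least one annotated term")
    total = sum(sim(ti, tj) for ti, tj in product(terms1, terms2))
    return total / (len(terms1) * len(terms2))


# Hypothetical term-to-term similarities, for illustration only.
TOY_SIM = {("a", "x"): 0.2, ("a", "y"): 0.4, ("b", "x"): 0.6, ("b", "y"): 0.8}


def toy_sim(t1, t2):
    return TOY_SIM[(t1, t2)]


score = average_pairwise_similarity(["a", "b"], ["x", "y"], toy_sim)
print(round(score, 3))  # mean of 0.2, 0.4, 0.6 and 0.8
```

Because every pair contributes equally, a single well-matched term pair is diluted when the genes carry many unrelated annotations.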

FuSSiMeG :

Implementation of a functional semantic similarity measure between gene-products

Couto F, Silva M, Coutinho P
DI/FCUL TR 03--29 (2003 November)

$$S(g1,g2) = \max \{\, S(t1,t2) \cdot IC(t1) \cdot IC(t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$
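A minimal sketch of this IC-weighted maximum, assuming a term similarity callable `sim` and a dictionary `ic` of information content values. The names and numbers are hypothetical and not taken from the technical report.

```python
def ic_weighted_max(terms1, terms2, sim, ic):
    """max over term pairs of S(t1, t2) * IC(t1) * IC(t2)."""
    return max(
        sim(t1, t2) * ic[t1] * ic[t2]
        for t1 in terms1
        for t2 in terms2
    )


# Hypothetical similarities and IC values, for illustration only.
TOY_SIM = {("a", "x"): 0.2, ("a", "y"): 0.4, ("b", "x"): 0.6, ("b", "y"): 0.8}
TOY_IC = {"a": 0.5, "b": 1.0, "x": 1.0, "y": 0.5}


def toy_sim(t1, t2):
    return TOY_SIM[(t1, t2)]


weighted = ic_weighted_max(["a", "b"], ["x", "y"], toy_sim, TOY_IC)
unweighted = max(TOY_SIM.values())
print(weighted, unweighted)
```

In this toy data the unweighted maximum comes from the pair (b, y), but the IC weighting moves the maximum to (b, x), whose terms are both more informative.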

Resnik, Jiang, and Lin measure :

Correlation between gene expression and GO semantic similarity.
Sevilla JL, Segura V, Podhorski A, Guruceaga E, Mato JM, Martínez-Cruz LA, Corrales FJ, Rubio A
IEEE/ACM Trans Comput Biol Bioinform 2(4) p330-8 (2005 Oct-Dec) 10.1109/TCBB.2005.50

$$S(g1,g2) = \max \{\, S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$

RubioA :

Correlation between gene expression and GO semantic similarity.
Sevilla JL, Segura V, Podhorski A, Guruceaga E, Mato JM, Martínez-Cruz LA, Corrales FJ, Rubio A
IEEE/ACM Trans Comput Biol Bioinform 2(4) p330-8 (2005 Oct-Dec) 10.1109/TCBB.2005.50

$$S(g1,g2) = \max \{\, S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$

LiebmanMN :

Assessing semantic similarity measures for the characterization of human regulatory pathways.
Guo X, Liu R, Shriver CD, Hu H, Liebman MN
Bioinformatics 22(8) p967-73 (2006 Apr 15) 10.1093/bioinformatics/btl042

$$S(g1,g2) = \max \{\, S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$

LinK :

$$S(g1,g2) = \max \{\, S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$

funsim :

A new measure for functional similarity of gene products based on Gene Ontology.
Schlicker A, Domingues FS, Rahnenführer J, Lengauer T
BMC Bioinformatics 7 p302 (2006 Jun 15) 10.1186/1471-2105-7-302

$$S(g1,g2) = \max \{\, \mathrm{columnScore},\ \mathrm{rowScore} \,\}$$

$$\mathrm{rowScore} = \frac{1}{N} \sum_{i=1}^{N} \max_{1 \le j \le M} S_{ij}$$

$$\mathrm{columnScore} = \frac{1}{M} \sum_{j=1}^{M} \max_{1 \le i \le N} S_{ij}$$

$$S_{ij} = \mathrm{sim}(GO_i^A, GO_j^B), \quad \forall i \in \{1,\ldots,N\},\ \forall j \in \{1,\ldots,M\}$$
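The row and column scores can be illustrated on a small similarity matrix. This is a sketch under the assumption that the N x M matrix S has already been filled by some term similarity measure. The function names and matrix values are hypothetical.

```python
def row_score(matrix):
    """Average over rows i of the best match max_j S_ij."""
    return sum(max(row) for row in matrix) / len(matrix)


def column_score(matrix):
    """Average over columns j of the best match max_i S_ij."""
    columns = list(zip(*matrix))
    return sum(max(col) for col in columns) / len(columns)


def max_of_row_and_column_score(matrix):
    """S(g1, g2) = max{columnScore, rowScore}."""
    return max(column_score(matrix), row_score(matrix))


# Hypothetical 3 x 2 matrix: gene A has N = 3 terms, gene B has M = 2 terms.
S = [
    [0.9, 0.1],
    [0.2, 0.3],
    [0.4, 0.8],
]

print(row_score(S), column_score(S), max_of_row_and_column_score(S))
```

Here the second term of gene A has no good partner in gene B, which lowers rowScore, while both terms of gene B find a good partner, so columnScore determines the result.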




OlssonB :

$$S(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\ t_j \in T_2} \mathrm{sim}(t_i, t_j)$$

GORank :

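GORank: Semantic Similarity Search for Gene Products Using Gene Ontology (original title: GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

김기성, 유상원, 김형주
Journal of the Korean Information Science Society (정보과학회논문지) Vol. 33 No. 7 p682-692 (2006 December)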

For gene A and gene B,

$$IC(A) = \sum_{t \in Ann(A)} AW(A,t) \cdot IC(t)$$
AW(A,t) : annotation weight of term t, set by its evidence code (e.g. TAS : 1.0, IEA : 0.1)
$$S_{max}(A,t) = \max_{k \in Ann(A)} S(k,t)$$

$$SharedIC(A,B) = \sum_{t \in Ann(A)} S_{max}(B,t) \cdot AW(A,t) \cdot IC(t)$$

SharedIC is not symmetric: in general $SharedIC(A,B) \neq SharedIC(B,A)$.

$$S(A,B) = \frac{SharedIC(A,B) + SharedIC(B,A)}{IC(A) + IC(B)}$$
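The definitions above can be traced with a short Python sketch. It assumes each gene is given as a dictionary mapping annotated terms to their annotation weights, and that the best match for a term t of gene A is searched among the annotations of gene B. All names, IC values, and similarities below are hypothetical.

```python
def gene_ic(ann, ic):
    """IC(A): sum of AW(A, t) * IC(t) over the annotations of a gene."""
    return sum(weight * ic[t] for t, weight in ann.items())


def s_max(ann, t, sim):
    """Best similarity between term t and any term annotated to the gene."""
    return max(sim(k, t) for k in ann)


def shared_ic(ann_a, ann_b, ic, sim):
    """SharedIC(A, B): weighted IC of A's terms, scaled by their best match in B."""
    return sum(s_max(ann_b, t, sim) * w * ic[t] for t, w in ann_a.items())


def weighted_gene_similarity(ann_a, ann_b, ic, sim):
    shared = shared_ic(ann_a, ann_b, ic, sim) + shared_ic(ann_b, ann_a, ic, sim)
    return shared / (gene_ic(ann_a, ic) + gene_ic(ann_b, ic))


# Hypothetical data: weights follow the TAS = 1.0, IEA = 0.1 convention.
IC = {"t1": 2.0, "t2": 4.0, "t3": 1.0}
PAIR_SIM = {
    frozenset(("t1", "t2")): 0.5,
    frozenset(("t1", "t3")): 0.0,
    frozenset(("t2", "t3")): 0.25,
}


def toy_sim(a, b):
    return 1.0 if a == b else PAIR_SIM[frozenset((a, b))]


ann_a = {"t1": 1.0, "t2": 0.1}
ann_b = {"t2": 1.0, "t3": 1.0}

shared_ab = shared_ic(ann_a, ann_b, IC, toy_sim)
shared_ba = shared_ic(ann_b, ann_a, IC, toy_sim)
score = weighted_gene_similarity(ann_a, ann_b, IC, toy_sim)
print(shared_ab, shared_ba, score)
```

The two SharedIC values differ in this example, which is why the final score sums both directions before normalizing.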

LussierYA :

Evaluation of high-throughput functional categorization of human disease genes.
Chen JL, Liu Y, Sam LT, Li J, Lussier YA
BMC Bioinformatics 8 Suppl 3 pS7 (2007 May 9) 10.1186/1471-2105-8-S3-S7

$$S(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\ t_j \in T_2} \mathrm{sim}(t_i, t_j)$$

G-SESAME :

A new method to measure the semantic similarity of GO terms.
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF
Bioinformatics 23(10) p1274-81 (2007 May 15) 10.1093/bioinformatics/btm087

For a single term go and a GO term set $GO = \{go_1, go_2, \ldots, go_k\}$,

$$S(go, GO) = \max_{1 \le i \le k} S_{GO}(go, go_i)$$

For two given genes G1 and G2 and their term sets

$$GO_1 = \{go_{11}, go_{12}, \ldots, go_{1m}\}, \quad GO_2 = \{go_{21}, go_{22}, \ldots, go_{2n}\},$$

$$S(G1,G2) = \frac{\sum_{1 \le i \le m} S(go_{1i}, GO_2) + \sum_{1 \le j \le n} S(go_{2j}, GO_1)}{m+n}$$
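A sketch of the gene-level step only, treating the term-level similarity S_GO as a given callable. The names `term_to_set` and `gene_similarity` and the toy similarity table are assumptions for illustration. The term-level computation of the cited paper is not reproduced here.

```python
def term_to_set(term, term_set, s_go):
    """S(go, GO): best similarity between one term and any term of a set."""
    return max(s_go(term, other) for other in term_set)


def gene_similarity(terms1, terms2, s_go):
    """Sum of best matches in both directions, divided by m + n."""
    forward = sum(term_to_set(t, terms2, s_go) for t in terms1)
    backward = sum(term_to_set(t, terms1, s_go) for t in terms2)
    return (forward + backward) / (len(terms1) + len(terms2))


# Hypothetical symmetric term similarities, for illustration only.
PAIR_SIM = {
    frozenset(("a", "x")): 0.9,
    frozenset(("a", "y")): 0.1,
    frozenset(("a", "z")): 0.2,
    frozenset(("b", "x")): 0.3,
    frozenset(("b", "y")): 0.4,
    frozenset(("b", "z")): 0.5,
}


def toy_s_go(t1, t2):
    return PAIR_SIM[frozenset((t1, t2))]


score = gene_similarity(["a", "b"], ["x", "y", "z"], toy_s_go)
print(round(score, 3))
```

Dividing by m + n weights every term equally, so the gene with more annotations contributes more to the result than it would in an average of the two directional means.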

LussierYA :

$$S(A,B) = \frac{2 \cdot \sum_{(a_i,b_i) \in P,\ S(a_i,b_i) \ge t} S(a_i,b_i)}{|A| + |B|}$$

BurgunA :

Gene g is represented by the IC vector of its annotated GO terms, indexed over the GO terms $\{1,2,\ldots,n\}$,

$$g = (IC_1, IC_2, \ldots, IC_n)$$

$$S(g1,g2) = \frac{\vec{g1} \cdot \vec{g2}}{|g1|\,|g2|} = \frac{\sum_{i=1}^{n} IC1_i \cdot IC2_i}{\sqrt{\sum_{i=1}^{n} (IC1_i)^2}\ \sqrt{\sum_{i=1}^{n} (IC2_i)^2}}$$
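The cosine form can be sketched as follows. One assumption is made explicit here: a GO term that is not annotated to a gene contributes 0 at its vector position. The helper names and IC values are hypothetical.

```python
import math


def ic_vector(annotated_terms, term_order, ic):
    """IC value at each term position; 0.0 where the gene lacks the term (assumption)."""
    return [ic[t] if t in annotated_terms else 0.0 for t in term_order]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a gene with no informative annotation matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical term order and IC values, for illustration only.
TERMS = ["t1", "t2", "t3"]
IC = {"t1": 3.0, "t2": 4.0, "t3": 1.0}

g1 = ic_vector({"t1", "t2"}, TERMS, IC)  # [3.0, 4.0, 0.0]
g2 = ic_vector({"t1"}, TERMS, IC)        # [3.0, 0.0, 0.0]
score = cosine_similarity(g1, g2)
print(round(score, 3))
```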

DongXu :

Example of GO index and the corresponding GO ID and functional category

Index level   GO index      Functional category (GO ID)
Index1        1-2           cellular process (GO:0009987)
Index2        1-2-1         cell communication (GO:0007154)
Index3        1-2-1-8       signal transduction (GO:0007165)
Index4        1-2-1-8-1     cell surface receptor linked signal transduction (GO:0007166)
Index5        1-2-1-8-1-4   G-protein coupled receptor protein signaling pathway (GO:0030454)

S(g1,g2) is the deepest index level at which the GO indices of the two genes still match.

(Example) GO index of gene1 = 1-1-3-3-4, GO index of gene2 = 1-1-3-2

S(g1,g2) = 2, because the indices match at level 1 (1-1) and level 2 (1-1-3) but differ at level 3.
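The example can be reproduced with a short prefix comparison. This sketch assumes, following the index table, that index level k corresponds to the first k + 1 components of a GO index, so sharing only the leading component counts as level 0. The function name is hypothetical.

```python
def shared_index_level(index1, index2):
    """Deepest index level at which two GO indices still agree."""
    parts1 = index1.split("-")
    parts2 = index2.split("-")
    common = 0
    for a, b in zip(parts1, parts2):
        if a != b:
            break
        common += 1
    # Level k spans k + 1 components (Index1 is written "1-2").
    return max(common - 1, 0)


print(shared_index_level("1-1-3-3-4", "1-1-3-2"))
```

Components are compared as whole strings, so an index component such as 12 is never confused with 1.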

ZhangA :

Suppose a protein x is annotated to m different GO terms.

S_i(x) : the set of proteins annotated to GO term g_i, one of the terms whose annotation includes x, where $1 \le i \le m$

Suppose both x and y are annotated to n different GO terms, where $n \le m$.

S_j(x,y) : the set of proteins annotated to GO term g_j, one of the terms whose annotation includes both x and y, where $1 \le j \le n$



Let the annotation size of a GO term be the number of proteins annotated to it.

Using the annotation size of the most specific GO term to which both proteins x and y are annotated,

$$S(x,y) = -z \log\left(\frac{\min_j |S_j(x,y)|}{|S_{root}|}\right), \quad \text{where } z = \frac{1}{\log|S_{root}| - \log|S_{min}|}$$

z is a normalization term computed from the maximum annotation size, $|S_{root}|$, and the minimum annotation size, $|S_{min}|$, among all GO terms in the DAG structure.

If two proteins x and y are annotated to a more specific GO term than x and a third protein w, then x is semantically more similar to y than to w. The semantic similarity S(x, y) can be assigned as a weight to the edge between x and y.
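A numeric sketch of the annotation-size measure, assuming the sizes of the GO terms shared by x and y are already known. The function name and all sizes are hypothetical.

```python
import math


def annotation_size_similarity(shared_sizes, root_size, min_size):
    """-z * log(min_j |S_j(x, y)| / |S_root|), normalized to the range [0, 1]."""
    z = 1.0 / (math.log(root_size) - math.log(min_size))
    smallest = min(shared_sizes)  # size of the most specific shared term
    return -z * math.log(smallest / root_size)


# Hypothetical sizes: 1000 proteins at the root, 10 at the smallest term.
ROOT_SIZE = 1000
MIN_SIZE = 10

# x and y share the root (1000 proteins) and one term with 100 proteins.
score = annotation_size_similarity([1000, 100], ROOT_SIZE, MIN_SIZE)
print(round(score, 3))
```

With this normalization a pair that shares only the root scores 0, and a pair that shares a term of the minimum annotation size scores 1.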

simGIC :

Metrics for GO based protein semantic similarity: a systematic evaluation.
Pesquita C, Faria D, Bastos H, Ferreira AE, Falcão AO, Couto FM
BMC Bioinformatics 9 Suppl 5 pS4 (2008 Apr 29) 10.1186/1471-2105-9-S5-S4

$$S_{max}(g1,g2) = \max \{\, S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$

$$S_{avg}(g1,g2) = \frac{1}{m \cdot n} \sum_{t_i \in T_1,\ t_j \in T_2} \mathrm{sim}(t_i, t_j)$$

$$S_{bma}(g1,g2) = \frac{\mathrm{AVG}_{t1}\left(\max_{t2} S(t1,t2)\right) + \mathrm{AVG}_{t2}\left(\max_{t1} S(t1,t2)\right)}{2}, \quad t1 \in GO(g1),\ t2 \in GO(g2)$$
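The three combination strategies (maximum, average, best-match average) can be compared on one matrix of term similarities. A minimal sketch, assuming the matrix is precomputed with rows for the terms of g1 and columns for the terms of g2. The function name and values are hypothetical.

```python
def set_wise_scores(matrix):
    """Return (S_max, S_avg, S_bma) for matrix[i][j] = S(t1_i, t2_j)."""
    rows, cols = len(matrix), len(matrix[0])
    values = [v for row in matrix for v in row]
    s_max = max(values)
    s_avg = sum(values) / (rows * cols)
    best_for_t1 = [max(row) for row in matrix]        # max over t2 for each t1
    best_for_t2 = [max(col) for col in zip(*matrix)]  # max over t1 for each t2
    s_bma = (sum(best_for_t1) / rows + sum(best_for_t2) / cols) / 2
    return s_max, s_avg, s_bma


# Hypothetical similarities: g1 has 2 terms, g2 has 3 terms.
S = [
    [1.0, 0.2, 0.1],
    [0.4, 0.6, 0.3],
]

s_max, s_avg, s_bma = set_wise_scores(S)
print(s_max, round(s_avg, 3), round(s_bma, 3))
```

On this matrix the maximum is driven by one identical term pair, the average is pulled down by unrelated pairs, and the best-match average falls between the two.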



GO(X) : implicitly annotated GO terms for gene X

$$simUI(A,B) = \frac{|GO(A) \cap GO(B)|}{|GO(A) \cup GO(B)|}$$

$$simGIC(A,B) = \frac{\sum_{t \in GO(A) \cap GO(B)} IC(t)}{\sum_{t \in GO(A) \cup GO(B)} IC(t)}$$
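Both measures reduce to set operations, as the following sketch shows. It assumes GO(A) and GO(B) are already available as Python sets and that `ic` maps each term to its information content. The term names and IC values are hypothetical.

```python
def sim_ui(go_a, go_b):
    """Number of shared terms divided by the number of terms in the union."""
    return len(go_a & go_b) / len(go_a | go_b)


def sim_gic(go_a, go_b, ic):
    """Same ratio as sim_ui, with each term weighted by IC(t)."""
    shared = sum(ic[t] for t in go_a & go_b)
    total = sum(ic[t] for t in go_a | go_b)
    return shared / total if total > 0 else 0.0


# Hypothetical annotation sets and IC values, for illustration only.
IC = {"root": 0.0, "t1": 1.0, "t2": 2.0, "t3": 3.0}
go_a = {"root", "t1", "t2"}
go_b = {"root", "t1", "t3"}

ui = sim_ui(go_a, go_b)
gic = sim_gic(go_a, go_b, IC)
print(ui, round(gic, 3))
```

The shared terms here are the least informative ones, so simGIC comes out well below simUI: counting terms treats the root like any other term, while IC weighting gives it no influence.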

GOSAP :

GOSAP: Gene Ontology-Based Semantic Alignment of Biological Pathways.
Gamalielsson J, Olsson B
Int J Bioinform Res Appl 4(3) p274-94 (2008)

$$S_{max}(g1,g2) = \max \{\, S(t1,t2) : t1 \in T(g1) \wedge t2 \in T(g2) \,\}$$