Discovering significant and interpretable patterns from multifactorial DNA microarray data with poor replication.

Motivation: Multivariate analyses are advantageous for the simultaneous testing of the separate and combined effects of many variables and of their interactions. In factorial designs with many factors and/or levels, however, sufficient replication is often prohibitively costly. Furthermore, complicated statements are often required for the biological interpretation of the higher-order interactions determined by standard statistical techniques like analysis of variance.

Results: Because we are usually interested in finding factor-specific effects or their interactions, we assumed that the observed expression profile of a gene is a manifestation of an underlying factor-specific generative pattern (FSGP) combined with noise. Thus, a genetic algorithm was created to find the nearest FSGP for each expression profile. We then measured the distance between each profile and the corresponding nearest FSGP. Permutation testing for the distance measures successfully identified those genes with statistically significant profiles, thus yielding straightforward biological interpretations. Association networks of genes, drugs, and cell lines were created as tripartite graphs, representing significant and interpretable relations, by using a microarray experiment of gastric-cancer cell lines with a factorial design and no replication. The proposed method may benefit the combined analysis of heterogeneous expression data from the growing public repositories.

Keywords: gene expression, DNA microarray, pattern recognition, genetic algorithm, gastric cancer

Availability: http://www.snubi.org/software/FSGP/
Contact: juhan@snu.ac.kr

Journal of Biomedical Informatics 2004;37:260-268

Situation

Problem

Assuming an underlying generative pattern:

(i.e., the values range from 0 to 9)

Defining factor-specific generative pattern and pattern distance:

Therefore, the pattern distance between a two-factor profile $C_{ij}$ and its nearest FSGP, $C'_{ij}$, equals $\sum_{i,j} |C_{ij} - C'_{ij}|$.

More generally, the N-dimensional pattern distance of $C_{k_1,\dots,k_N}$ equals $\sum_{k_1,\dots,k_N} |C_{k_1,\dots,k_N} - C'_{k_1,\dots,k_N}|$.
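As a minimal sketch of the distance defined above, the function below sums absolute cell-wise differences over nested lists of any depth, so it covers both the two-factor and the N-factor case. The function name `pattern_distance` and the small example arrays are illustrative assumptions, not taken from the FSGP software.

```python
def pattern_distance(profile, fsgp):
    """Sum of absolute cell-wise differences between two equally shaped nested lists."""
    # Base case: a single cell (a number) contributes |C - C'|.
    if isinstance(profile, (int, float)):
        return abs(profile - fsgp)
    if len(profile) != len(fsgp):
        raise ValueError("profile and FSGP must have the same shape")
    # Recursive case: add up the distances of the sub-arrays along this factor.
    return sum(pattern_distance(p, f) for p, f in zip(profile, fsgp))


# Hypothetical 2 x 3 two-factor profile and candidate pattern (illustrative values only).
example_profile = [[0, 1, 0],
                   [1, 1, 1]]
example_candidate = [[0, 1, 0],
                     [1, 1, 0]]
example_distance = pattern_distance(example_profile, example_candidate)
```

The two example arrays differ in exactly one cell, so only that cell contributes to the sum.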

Genetic Algorithm to find the nearest factor-specific generative pattern of an observation:

# An example of this 6 * 9 pattern

A case with dichotomized values (0, 1):

pattern = []
pattern.append([0,0,0,1,0,0,0,1,0])
pattern.append([0,1,0,1,0,0,0,1,0])
pattern.append([0,0,0,1,0,0,0,1,0])
pattern.append([0,0,0,1,0,0,0,1,0])
pattern.append([0,0,0,0,0,0,0,1,0])
pattern.append([1,1,1,1,1,1,1,1,1])
pattern.append([0,0,0,1,0,0,0,1,0])

A case with continuous values (0 to 9):

pattern = []
pattern.append([ 4, 0, 0, 9, 0, 5, 5, 3, 3])
pattern.append([ 4, 5, 9, 4, 0, 4, 3, 7, 4])
pattern.append([ 3, 3, 0, 9, 0, 5, 9, 5, 2])
pattern.append([ 4, 2, 0, 9, 0, 9, 4, 3, 3])
pattern.append([ 2, 4, 0, 9, 0, 8, 5, 6, 3])
pattern.append([ 5, 0, 0, 9, 0, 9, 2, 0, 2])
pattern.append([ 4, 0, 0, 9, 0, 9, 2, 0, 2])
pattern.append([ 9, 7, 6, 9, 0, 9, 8, 7, 2])
pattern.append([ 7, 0, 0, 9, 0, 9, 2, 0, 2])
pattern.append([ 4, 0, 0, 9, 0, 5, 5, 3, 3])
pattern.append([ 4, 5, 9, 4, 0, 4, 9, 7, 4])
pattern.append([ 9, 9, 9, 9, 9, 9, 9, 9, 9])
pattern.append([ 9, 9, 9, 9, 9, 9, 9, 9, 9])
pattern.append([ 9, 9, 9, 9, 9, 9, 9, 9, 9])
pattern.append([ 9, 9, 9, 9, 9, 9, 9, 9, 9])
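The following is a rough sketch of how a genetic algorithm could search for the nearest pattern of the dichotomized example. It rests on a strong assumption made only for illustration: that an FSGP of a two-factor 0/1 profile is a union of whole rows and whole columns, so a candidate can be encoded as one bit per row plus one bit per column. The paper's actual FSGP definition and the operators of the SNUBI implementation may differ; the names `decode`, `nearest_fsgp`, and all parameter values are hypothetical.

```python
import random

# The dichotomized example profile (7 rows x 9 columns).
profile = [
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
]


def decode(chromosome, n_rows, n_cols):
    """Assumed encoding: a cell is 1 if its row bit or its column bit is set."""
    row_bits, col_bits = chromosome[:n_rows], chromosome[n_rows:]
    return [[1 if (row_bits[i] or col_bits[j]) else 0 for j in range(n_cols)]
            for i in range(n_rows)]


def pattern_distance(observed, candidate):
    """Sum of absolute cell-wise differences between two 2-D lists."""
    return sum(abs(o - c) for orow, crow in zip(observed, candidate)
               for o, c in zip(orow, crow))


def nearest_fsgp(observed, pop_size=40, generations=60, mutation_rate=0.05, seed=0):
    """Return (best pattern found, its distance); a heuristic, not a guaranteed optimum."""
    rng = random.Random(seed)
    n_rows, n_cols = len(observed), len(observed[0])
    length = n_rows + n_cols

    def cost(chromosome):
        return pattern_distance(observed, decode(chromosome, n_rows, n_cols))

    # Initial population: the all-zero chromosome plus random bit strings.
    population = [[0] * length] + [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            parent_a = min(rng.sample(population, 3), key=cost)
            parent_b = min(rng.sample(population, 3), key=cost)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, length - 1)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=cost)
    return decode(best, n_rows, n_cols), cost(best)


fsgp, distance = nearest_fsgp(profile)
```

Because the best individuals are carried over unchanged, the reported distance can never be worse than that of the all-zero starting pattern. A continuous-valued profile (0 to 9) would need a different encoding, for example one level per row and column, which is not attempted here.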

Determining significance: 
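The abstract describes permutation testing on the distance measures. One way this could look is sketched below, under several assumptions that are not confirmed by the source: the null distribution is built by shuffling the cells of a profile, a profile counts as significant when few shuffled profiles lie as close to a pattern as the observed one, and a small exhaustive search stands in for the genetic algorithm. The names `nearest_distance` and `permutation_p_value`, the row/column-union pattern encoding, and the toy 4 x 5 profile are all hypothetical.

```python
import random
from itertools import product


def nearest_distance(observed):
    """Exhaustive stand-in for the genetic algorithm on small 0/1 profiles.

    Assumes, for illustration, that candidate patterns are unions of whole
    rows and whole columns, and returns the smallest distance to any of them.
    """
    n_rows, n_cols = len(observed), len(observed[0])
    best = None
    for bits in product((0, 1), repeat=n_rows + n_cols):
        rows, cols = bits[:n_rows], bits[n_rows:]
        d = sum(abs(observed[i][j] - (1 if rows[i] or cols[j] else 0))
                for i in range(n_rows) for j in range(n_cols))
        if best is None or d < best:
            best = d
    return best


def permutation_p_value(observed, n_perm=200, seed=0):
    """Return (observed distance, permutation p-value) for one profile."""
    rng = random.Random(seed)
    n_rows, n_cols = len(observed), len(observed[0])
    observed_distance = nearest_distance(observed)
    cells = [v for row in observed for v in row]
    hits = 0
    for _ in range(n_perm):
        # Shuffle all cells, destroying any row or column structure.
        rng.shuffle(cells)
        shuffled = [cells[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
        if nearest_distance(shuffled) <= observed_distance:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed_distance, (hits + 1) / (n_perm + 1)


# Toy 4 x 5 profile with one full row and two full columns (illustrative only).
toy_profile = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
toy_distance, toy_p = permutation_p_value(toy_profile)
```

A small p-value here would mean that shuffled versions of the profile rarely sit as close to a row/column pattern as the original does. For real data the exhaustive search would be replaced by the genetic algorithm, since the number of candidate patterns grows exponentially with the number of factor levels.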

Evaluation of FSGP
