<Supplemental Site>


Comprehensive evaluation of matrix factorization methods for the analysis of DNA microarray gene expression data

 

 

Mi Hyeon Kim1, Hwa Jeong Seo2, Je-Gun Joung1,3,4, Ju Han Kim1,3*

   

                                                                                                                                                        

                                                                                                                                                                               

Last Modified: June 29, 2011

 

 

ABSTRACT


Background: Clustering-based methods for gene expression analysis have proven useful in biomedical applications such as cancer subtype discovery. Among them, matrix factorization (MF) is advantageous for clustering gene expression patterns from DNA microarray experiments because it efficiently reduces the dimension of gene expression data. Although several MF methods have been proposed for clustering gene expression patterns, a systematic evaluation has not yet been reported.
Results: Here we evaluated the clustering performance of orthogonal and non-orthogonal MFs using a total of nine performance measures on four gene expression datasets and one well-known clustering dataset. Specifically, we employed a non-orthogonal MF algorithm, BSNMF (Bi-directional Sparse Non-negative Matrix Factorization), which superimposes bi-directional sparseness constraints on non-negativity constraints so that each factor comprises a few dominantly co-expressed genes and samples together. Non-orthogonal MFs tended to show better clustering-quality and prediction-accuracy indices than orthogonal MFs and than a traditional method, K-means. Moreover, BSNMF showed improved performance on these measures. Non-orthogonal MFs, including BSNMF, also performed well in the functional enrichment test using Gene Ontology terms and biological pathways.
Conclusions: The clustering performance of orthogonal and non-orthogonal MFs on microarray data was evaluated with a comprehensive set of measures.
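
As a rough illustration of how a non-negative factorization can be used for clustering, the sketch below factorizes a small toy matrix and assigns each sample to its dominant factor. It is not the code distributed on this site and it is not BSNMF: it uses plain NMF with the standard multiplicative updates and no sparseness constraints. The function names (nmf_multiplicative, assign_clusters), the toy data, and all parameter values are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    Plain NMF with multiplicative updates for the Frobenius-norm
    objective. No sparseness constraints are applied, so this is
    NOT BSNMF; it only illustrates the general idea.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps   # gene-by-factor matrix
    H = rng.random((k, n_samples)) + eps  # factor-by-sample matrix
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def assign_clusters(W, H):
    """Assign each sample to the factor with the largest coefficient.

    Columns of W are rescaled to sum to 1 (and H is rescaled to keep
    W @ H unchanged) so that coefficients are comparable across factors.
    """
    scale = W.sum(axis=0)
    H_scaled = H * scale[:, None]
    return np.argmax(H_scaled, axis=0)

# Toy data (hypothetical): 20 genes x 10 samples with two sample groups,
# each group over-expressing a different block of 10 genes.
rng = np.random.default_rng(1)
V = rng.random((20, 10)) * 0.1
V[:10, :5] += 1.0
V[10:, 5:] += 1.0

W, H = nmf_multiplicative(V, k=2)
labels = assign_clusters(W, H)
print(labels)
```

The rescaling step in assign_clusters is a design choice for this sketch: because W @ H is unchanged when a column of W is multiplied by a constant and the matching row of H is divided by it, comparing raw rows of H can depend on an arbitrary scale.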


 

Supplemental Material


1. Various clustering evaluation measures

     1.1 Leukemia dataset

     1.2 Medulloblastoma dataset

     1.3 Iris dataset

     1.4 Fibroblast dataset

     1.5 Mouse dataset


2. Class assignment in Iris dataset

    - Table of class assignments using six matrix factorization methods and K-means clustering on the Iris dataset

        

3. Weighted P-value of significantly enriched terms

     - Plots of weighted P-values from the enrichment analysis of each cluster

      

       3.1 Fibroblast dataset

       3.2 Mouse dataset


4. P-values for significantly enriched terms

     - Plots of log(P-value) using six matrix factorization methods and K-means clustering

     - One plot is shown for each Gene Ontology (GO) category, KEGG, and BioCarta


     4.1 ALL cluster for Leukemia dataset

     4.2 AML cluster for Leukemia dataset

     4.3 Cluster 1 for Medulloblastoma dataset

     4.4 Cluster 2 for Medulloblastoma dataset

     4.5 Cluster 1 for Fibroblast dataset

     4.6 Cluster 2 for Fibroblast dataset

     4.7 Cluster 3 for Fibroblast dataset

     4.8 Cluster 4 for Fibroblast dataset

     4.9 Cluster 5 for Fibroblast dataset

     4.10 Cluster 1 for Mouse dataset

     4.11 Cluster 2 for Mouse dataset

   


5. Twenty dominant genes in each subtype

    - Table of the twenty dominant genes in each subtype, identified using BSNMF


Download


   ** R code

     1.  K-means clustering 

     2.  Orthogonal matrix factorization   

     3.  Independent component analysis

     4.  Non-negative matrix factorization

     5.  Clustering evaluation measures 


  ** MATLAB code

    1.  Bi-directional sparse non-negative matrix factorization (BSNMF)
