Scientific Paper Sessions

S1. Clinical Application

Room: Grand Ballroom A
Date: Thursday, Oct. 3, 13:00 - 14:20
S1-1: Concordance of deregulated mechanisms unveiled in underpowered experiments: PTBP1 knockdown case study.

Vincent Gardeux1,2,3,§, Ahmet Dirim Arslan4,5,§, Ikbel Achour1,2,§, Tsui-Ting Ho4,6,§, William T. Beck4,10,*, Yves A. Lussier1,2,4,7,8,9,10,*

1 Institute for Translational Health Informatics, University of Illinois at Chicago, Illinois, USA.
2 Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, USA.
3 Department of Informatics, School of Engineering, EISTI (Ecole Internationale des Sciences du Traitement de l'Information), Cergy-Pontoise, France.
4 Department of Biopharmaceutical Science, College of Pharmacy, University of Illinois at Chicago, Ill., USA.
5 Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, Illinois, USA.
6 Cancer Institute, University of Mississippi Medical Center, Jackson, Mississippi, USA.
7 Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, USA.
8 Computation Inst. & Inst. For Genomics & Systems Biol, Argonne National Lab. & Un. of Chicago, Ill., USA.
9 Institute for Personalized Respiratory Medicine, University of Illinois at Chicago, Illinois, USA.
10 University of Illinois Cancer Center, Chicago, IL, USA.
§ Equal contribution


Abstract
Background: Genome-wide transcriptome profiling generated by microarray and RNA-Seq often provides deregulated genes or pathways applicable only to larger cohorts. On the other hand, individualized interpretation of transcriptomes is increasingly pursued to improve diagnosis, prognosis, and patient treatment processes. Yet, robust and accurate methods based on a single paired sample remain an unmet challenge.
Method: "N-of-1-pathways" translates gene expression data profiles into mechanism-level profiles on single pairs of samples (one p-value per geneset). It relies on three principles: i) the statistical universe is a single paired sample, which serves as its own control; ii) statistics can be derived from multiple gene expression measures. We analyzed deregulated mechanisms associated with the depletion of the alternative splicing protein, PTBP1. Using a single pair of neuronal cell line RNA-Seq transcriptomes (Gold Standard), our method predicts mechanisms that were compared to those of breast and ovarian cancer cell lines (mRNA expression microarray data).
Results: N-of-1-pathways predictions outperform those of GSEA and Differentially Expressed Genes enrichment (DEG-enrichment), within and across datasets. N-of-1-pathways uncovered concordant PTBP1-dependent mechanisms across datasets (Odds-Ratios >= 13, p-values <= 1x10^-5), such as RNA splicing and cell cycle. In addition, it unveils tissue-specific mechanisms of alternatively transcribed PTBP1-dependent genesets. Furthermore, we demonstrate that GSEA and DEG-Enrichment preclude accurate analysis on single paired samples.
Conclusion: N-of-1-pathways enables robust and biologically relevant mechanism-level classifiers with small cohorts and single paired samples, surpassing conventional methods. Further, it identifies unique sample/patient mechanisms, a requirement for precision medicine.
Software: http://Lussierlab.org/publication/N-of-1-pathways.
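
Illustration (not part of the abstract): the "one p-value per geneset" idea from the Method paragraph can be sketched as below. The abstract does not say which paired statistic N-of-1-pathways uses, so this sketch substitutes an exact two-sided sign test over the genes of one geneset. The function name, gene names, and expression values are all hypothetical.

```python
from math import comb


def geneset_sign_test(baseline, perturbed, geneset):
    """Exact two-sided sign test over the genes of one geneset.

    baseline, perturbed: dicts mapping gene name -> expression value in
    each sample of the pair (the pair serves as its own control).
    Returns one p-value for the geneset (1.0 if no gene changes).
    NOTE: the sign test is an assumption made for this sketch.
    """
    diffs = [perturbed[g] - baseline[g]
             for g in geneset if g in baseline and g in perturbed]
    ups = sum(d > 0 for d in diffs)
    downs = sum(d < 0 for d in diffs)
    n = ups + downs  # genes with no change (ties) are dropped
    if n == 0:
        return 1.0
    k = min(ups, downs)
    # Probability of k or fewer minority signs under a fair coin, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy pair: six genes of one geneset, all higher after knockdown.
geneset = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
baseline = {g: 5.0 for g in geneset}
perturbed = {g: 6.5 for g in geneset}
p_value = geneset_sign_test(baseline, perturbed, geneset)  # 2 * (1/64) = 0.03125
```

Repeating the call for every geneset would give the mechanism-level profile of a single paired sample; multiple-testing correction is omitted here.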

S1-2: Predicting different phenotypes of asthma and eczema using machine learning

Mattia C.F. Prosperi1,2,*, Susana Marinho2, Angela Simpson2, Iain Buchan1, Adnan Custovic2

1 Centre for Health Informatics, Institute of Population Health, Faculty of Medical and Human Sciences, University of Manchester, Manchester, United Kingdom
2 Centre for Respiratory Medicine and Allergy, Institute of Inflammation and Repair, University of Manchester, Manchester, United Kingdom


Abstract
Asthma is the most common chronic disease in the developed countries, with a relatively modest drug armamentarium. There is increasing recognition that asthma is a heterogeneous disease with similar clinical manifestations (phenotypes), but different underlying pathophysiological causes (endotypes).
We investigate here the predictive ability of linear/non-linear machine learning models (from logistic regression to random forests, validated via extra-sample bootstrapping) in an unselected population, with respect to different operational definitions of asthma, wheeze, and eczema, using a large heterogeneous set of attributes (demographic, clinical, laboratory features, genetic profiles, environmental exposures). The aim is to identify the extent to which such heterogeneous information contributes and combines towards specific clinical manifestations.
Our study population included 554 adults, 42% male, 38% previous or current smokers. The proportions of asthma, wheeze, and eczema diagnoses were 16.7%, 12.3%, and 21.7%, respectively. Models were fit on 223 non-genetic variables plus 215 single nucleotide polymorphisms. In general, non-linear models achieved a better sensitivity/specificity trade-off than other methods, more markedly for asthma and wheeze and less so for eczema (area under the curve 84%, 76% and 64%, respectively). Findings confirm the relevant contribution of allergen sensitisation combined with lung function markers (but not for eczema, for which new predictors like whole body impedance are found). Predictive ability of genetic markers alone is limited.
Looking forward to a longitudinal extension as well as an increase in the amount of information processed, this study lays the groundwork for a better understanding of disease mechanisms towards the development of personalized diagnostic tools.

S1-3: Comparison of warfarin therapy clinical outcomes following implementation of an automated mobile phone-based critical laboratory value text alert system

Shu-Wen Lin1,2,3, Wen-Yi Kang4, Dong-Tsamn Lin5, James Chao-Shen Lee6, Fe-Lin Lin Wu1,2,3, Chuen-Liang Chen7, Yufeng J. Tseng3,4,7,*

1 Graduate Institute of Clinical Pharmacy, College of Medicine, National Taiwan University
2 School of Pharmacy, College of Medicine, National Taiwan University
3 Department of Pharmacy, National Taiwan University Hospital
4 Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University
5 Department of Pediatrics and Laboratory Medicine, College of Medicine, National Taiwan University
6 Department of Pharmacy Practice, College of Pharmacy, University of Illinois at Chicago
7 Department of Computer Science and Information Engineering, National Taiwan University


Abstract
Objective: To evaluate clinical outcomes of patients on warfarin therapy following implementation of a Personal Handy-phone System-based (PHS) alert system capable of generating and delivering text messages to communicate critical prothrombin time (PT) / international normalized ratio (INR) laboratory results to practitioners' mobile phones in a large tertiary teaching hospital.
Design: A retrospective analysis was performed comparing patient clinical outcomes and physician prescribing behavior following conversion from a manual laboratory result alert system to an automated system.
Measurements: Clinical outcomes and practitioner responses to both alert systems were compared. Complications to warfarin therapy, warfarin utilization, and PT/INR results were evaluated for both systems, as well as clinician time to read alert messages, time to warfarin therapy modification, and monitoring frequency.
Results: No significant differences were detected in major hemorrhage and thromboembolism, warfarin prescribing patterns, PT/INR results, warfarin therapy modification, or monitoring frequency following implementation of the PHS text alert system. In both study periods, approximately 80% of critical results led to warfarin discontinuation or dose reduction. Senior physicians' follow-up response time to critical results was significantly decreased in the PHS alert study period compared to the manual notification study period (P=0.015). No difference in follow-up response time was detected for junior physicians.
Conclusions: Implementation of an automated PHS-based text alert system did not adversely impact clinical or safety outcomes of patients on warfarin therapy. Approximately 80% immediate recognition of text alerts was achieved. The potential benefits of an automated PHS alert for senior physicians were demonstrated.

S1-4: Automatic detection and resolution of measurement-unit conflicts in aggregated data

Soroush Samadian1, Bruce McManus1 and Mark Wilkinson2

1 UBC James Hogg Research Center, Institute for Heart + Lung Health, Room 166 - 1081 Burrard Street, St. Paul's Hospital Vancouver, BC, Canada, V6Z 1Y6
2 Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Madrid, España


Abstract
Motivation: Measurement-unit conflicts are a perennial problem in integrative research domains such as clinical meta-analysis. As multi-national collaborations grow, as new measurement instruments appear, and as Linked Open Data infrastructures become increasingly pervasive, the number of such conflicts will similarly increase. We propose a generic approach to the problem of (a) encoding measurement units in datasets in a machine-readable manner, (b) detecting when a dataset contains mixtures of measurement units, and (c) automatically converting any conflicting units into a desired unit, as defined for a given study.
Results: We utilized existing ontologies and standards for scientific data representation, measurement unit definition, and data manipulation to build a simple and flexible Semantic Web Service-based approach to measurement-unit harmonization. A cardiovascular patient cohort in which clinical measurements were recorded in a number of different units (e.g., mmHg and cmHg for blood pressure) was automatically classified into a number of clinical phenotypes, semantically defined using different measurement units.
Conclusion: We demonstrate that through a combination of semantic standards and frameworks, unit integration problems can be automatically detected and resolved.
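
Illustration (not part of the abstract): steps (b) and (c) of the Motivation paragraph, detecting mixed units and converting them to a study-defined unit, can be sketched with a plain lookup table. The authors describe an ontology- and Semantic Web Service-based approach. This sketch swaps in a hard-coded table instead, and the table, function names, and readings are hypothetical.

```python
# Conversion factors to a common reference unit (mmHg); 1 cmHg = 10 mmHg.
# A hard-coded table is an assumption standing in for unit ontologies.
TO_MMHG = {"mmHg": 1.0, "cmHg": 10.0}


def detect_units(records):
    """Return the set of units present in (value, unit) records."""
    return {unit for _, unit in records}


def harmonize(records, target="mmHg"):
    """Convert every (value, unit) record to the target unit."""
    if target not in TO_MMHG:
        raise ValueError(f"unknown target unit: {target}")
    converted = []
    for value, unit in records:
        if unit not in TO_MMHG:
            raise ValueError(f"unknown unit: {unit}")
        converted.append((value * TO_MMHG[unit] / TO_MMHG[target], target))
    return converted


# Hypothetical blood pressure readings recorded in two different units.
readings = [(120.0, "mmHg"), (12.0, "cmHg")]
has_conflict = len(detect_units(readings)) > 1   # True: two units present
harmonized = harmonize(readings, target="mmHg")  # both become 120.0 mmHg
```

Raising an error on an unknown unit, rather than passing the value through, keeps unconverted measurements from silently entering a downstream phenotype classification.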

S2. Cancer Bioinformatics

Room: Grand Ballroom B
Date: Thursday, Oct. 3, 13:00 - 14:20
S2-1: Integrative analysis reveals disease-associated genes and biomarkers for prostate cancer progression

Yin Li1,§, Wanwipa Vongsangnak1,§, Luonan Chen2, Bairong Shen1,*

1 Center for Systems Biology, Soochow University, Suzhou, 215006, China
2 Key Laboratory of Systems Biology, Chinese Academy of Sciences, Shanghai, 200031, China
§ Co-first authors


Abstract
Background: Prostate cancer is one of the most common complex diseases and a leading cause of death in men. Identification of prostate cancer-associated genes and biomarkers is thus essential, as it can provide insights into the mechanisms underlying disease progression and advance early diagnosis and the development of effective therapies.
Methods: In this study, we present an integrative analysis of gene expression profiling and protein interaction networks at a systematic level to reveal candidate disease-associated genes and biomarkers for prostate cancer progression. We first reconstructed the human prostate cancer protein-protein interaction network (HPC-PPIN) and then analyzed the network together with the prostate cancer gene expression data to identify modules related to different phases of prostate cancer. Finally, the candidate module biomarker was validated by its ability to predict prostate cancer progression.
Results: Phase-specific modules were identified for prostate cancer. Among these modules, transcription Androgen Receptor (AR) nuclear signaling and the Epidermal Growth Factor Receptor (EGFR) signaling pathway were shown to be the pathway targets for prostate cancer progression. The identified candidate disease-associated genes predicted prostate cancer progression better than published biomarkers. In the functional enrichment analysis, candidate disease-associated genes were, interestingly, enriched in the nucleus, and different functions were encoded for potential transcription factors, for example key players such as AR, Myc, and ESR1 and a hidden player, Sp1, which were considered potential biomarkers for prostate cancer.
Conclusions: The successful results on prostate cancer samples demonstrate that the integrative analysis is a powerful and useful approach to detect candidate disease-associated genes and modules that can be used as potential biomarkers for prostate cancer progression. The data, tools and supplementary files for this integrative analysis are deposited at http://www.ibio-cn.org/HPC-PPIN/.

S2-2: A Coupling Approach of a Predictor and a Descriptor for Breast Cancer Prognosis

Hyunjung Shin1,* and Yonghyun Nam1

1 Department of Industrial Engineering, Ajou University, Wonchun-dong, Yeongtong-gu, Suwon 443-749, South Korea

Abstract
Background: In cancer prognosis research, diverse machine learning models have been applied to the problems of cancer susceptibility (risk assessment), cancer recurrence (redevelopment of cancer after resolution), and cancer survivability, with accuracy (or the AUC, the area under the ROC curve) as the primary measure of model performance. However, to help medical specialists establish a treatment plan from the predicted output of a model, it is more pragmatic to elucidate which variables (markers) have most significantly influenced the resulting cancer outcome, or which patients show similar patterns.
Proposed Method: In this study, a coupling approach of two sub-modules, a predictor and a descriptor, is proposed. The predictor module generates the predicted output for the cancer outcome. A semi-supervised learning Co-training algorithm is employed as the predictor. The descriptor module, on the other hand, post-processes the results of the predictor module, focusing mainly on which variables are ranked more or less highly when describing the results of the prediction, and on how patients are segmented into several groups according to the common patterns among them. Decision trees are used as the descriptor.
Results: The proposed approach, 'predictor-descriptor', was tested on the breast cancer survivability problem based on the Surveillance, Epidemiology, and End Results (SEER) database for breast cancer. The results present the performance comparison among the established machine learning algorithms, the ranks of the prognosis elements for breast cancer, and patient segments.

S2-3: Identifying Potential Subtypes of Melanoma based on Pathway Activity Profiles

Sungwon Jung1, Seungchan Kim1

1 Integrated Cancer Genomics Division, Translational Genomics Research Institute, 445 North 5th Street, Phoenix, Arizona 85004, USA

Abstract
Identifying subtypes of complex diseases such as cancer is the very first step toward developing highly customized therapeutics for such diseases, as their origins vary significantly even with similar physiological characteristics. There have been many studies to recognize subtypes of various cancers based on genomic signatures, and most of them rely on approaches based on signatures or features developed from individual genes. However, the idea of network-driven activities of biological functions has gained a lot of interest, as more evidence is found that biological systems can show highly diverse activity patterns because genes can interact differentially across specific molecular contexts. In this study, we propose a method to compute the dissimilarity between two patient samples based on their pathway profiles, where pathway profiles are evaluated by computing the likelihoods of genetic networks and silenced interactions within pathways. By using the proposed dissimilarity measure between sample pathway profiles in clustering melanoma gene expression data, we identified two potential subtypes of melanoma with distinct pathway profiles, where the two groups of patients showed significantly different survival patterns. We also investigated selected pathways with distinct activity patterns between the two groups, and the result suggests hypotheses on the mechanisms driving the two potential subtypes.

S2-4: Identifying multi-biomarker to distinguish malignant from benign colorectal tumours by a mixed integer programming

Meng Zou1,§, Peng-Jun Zhang2,§, Xin-Yu Wen2, Luonan Chen3,*, Ya-Ping Tian2,* and Yong Wang1,*

1 National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China.
2 Department of Clinical Biochemistry, Chinese PLA General Hospital, Beijing, 100853, China
3 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
§ Joint first author


Abstract
Biomarkers serve as useful tools to aid in early diagnosis and ultimately to battle complex diseases. For many malignancies, multi-biomarker panels from clinical data play an important role in patient management and have been actively studied. Specifically, serum-based diagnosis to distinguish colorectal cancers (CRC) from benign colorectal tumours is very challenging.
Here, we develop a novel mixed integer programming based multi-biomarker diagnostic method. This method allows us to select the best subset of clinical markers by maximizing the accuracy to distinguish case and control samples given the number of selected biomarkers. We then generated serum profiling data for 101 CRC patients and 96 benign colorectal disease patients and analyzed 61 clinical features measured in serum, individually and in combination. Four features were identified as our optimal small multi-biomarker panel, including known colon cancer biomarkers CEA and IL-10, as well as novel biomarkers IMA and NSE. Single feature analysis shows that CEA has an area under the receiver operating characteristic (ROC) curve (AUC) of 0.6995, followed by NSE (0.6643), IMA (0.6521), and IL-10 (0.6165). The combined multi-biomarker panel greatly improved prediction, reaching a leave-one-out cross-validation (LOOCV) accuracy of 0.7857 with a nearest centroid classifier and an AUC of 0.8438 in an independent three-fold cross-validation with support vector machines (SVMs). When we extend our optimal selection to a larger multi-biomarker panel with 13 features, the LOOCV accuracy reaches 0.8673 and the AUC reaches 0.8437. In addition to accuracy, our method is efficient in computational time. When compared with the exhaustive search method to select 2, 3, and 4 markers with SVM, our method reduced the search time 1000-fold while achieving high accuracy. Furthermore, our method can efficiently select multi-biomarker panels with more than 5 features when the exhaustive methods fail.
In conclusion, we propose a novel model to select the best multi-biomarker panel. Our method takes less running time and improves the clinical interpretability, and can serve as a useful tool for other complex disease studies.
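
Illustration (not part of the abstract): the exhaustive-search baseline that the abstract compares against can be sketched as follows. This is not the authors' mixed integer programming model. It enumerates every panel of a given size and scores each by leave-one-out accuracy of a nearest centroid classifier (the abstract's exhaustive comparison uses SVMs). All function names and the toy data are hypothetical.

```python
from itertools import combinations


def centroid(rows, cols):
    """Mean of the selected feature columns over the given rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]


def sq_dist(row, center, cols):
    """Squared Euclidean distance restricted to the selected columns."""
    return sum((row[c] - m) ** 2 for c, m in zip(cols, center))


def loocv_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        centers = {}
        for lab in sorted({lab for _, lab in train}):
            centers[lab] = centroid([x for x, l in train if l == lab], cols)
        pred = min(centers, key=lambda lab: sq_dist(X[i], centers[lab], cols))
        correct += pred == y[i]
    return correct / len(X)


def best_panel(X, y, k):
    """Exhaustively score every k-feature panel and return the best one."""
    return max(combinations(range(len(X[0])), k),
               key=lambda cols: loocv_accuracy(X, y, cols))


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.9], [3.1, 1.1], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]
panel = best_panel(X, y, k=1)  # (0,)
```

With 61 features there are already 521,855 possible four-marker panels, and the count grows quickly with panel size, which is consistent with the abstract's remark that exhaustive methods fail beyond about five features.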

S3. Proteoinformatics

Room: Grand Ballroom A
Date: Thursday, Oct. 3, 15:00 - 16:15
S3-1: Integrated analysis of genome-wide DNA methylation and gene expression profiles in molecular subtypes of breast cancer

Je-Keun Rhee1, Kwangsoo Kim2, Heejoon Chae3, Jared Evans4, Pearlly Yan5, Byung-Tak Zhang1,6, Joe Gray7, Paul Spellman7, Tim Huang8, Kenneth Nephew9,10 and Sun Kim1,2,6,*

1 Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Korea
2 Bioinformatics Institute, Seoul National University, Seoul 151-744, Korea
3 School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
4 Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, USA
5 The Ohio State University Comprehensive Cancer Center Nucleic Acid Shared Resource-Illumina Core, Columbus, OH 43210, USA
6 School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
7 OHSU Knight Cancer Institute, Portland, OR 97239, USA
8 Department of Molecular Medicine/Institute of Biotechnology, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229-3900, USA
9 Medical Sciences, Indiana University School of Medicine, Bloomington, IN 47405, USA
10 Department of Cellular and Integrative Physiology, Indiana University School of Medicine, Indianapolis, IN 46202, USA


Abstract
Aberrant DNA methylation of CpG islands, CpG island shores and first exons is known to play a key role in the altered gene expression patterns in all human cancers. To date, a systematic study on the effect of DNA methylation on gene expression using high resolution data has not been reported. In this study, we conducted an integrated analysis of MethylCap-sequencing data and Affymetrix gene expression microarray data for 30 breast cancer cell lines representing different breast tumor phenotypes. As well-developed methods for the integrated analysis do not currently exist, we created a series of four different analysis methods. On the computational side, our goal is to develop methylome data analysis protocols for the integrated analysis of DNA methylation and gene expression data on the genome scale. On the cancer biology side, we present comprehensive genome-wide methylome analysis results for differentially methylated regions and their potential effect on gene expression in 30 breast cancer cell lines representing three molecular phenotypes, luminal, basal A and basal B. Our integrated analysis demonstrates that methylation status of different genomic regions may play a key role in establishing transcriptional patterns in molecular subtypes of human breast cancer.

S3-2: Prediction of C-peptide Like Family using Multiple Predictive Models and Feature Encodings

Elbashir Abbas1, Ho-Jin Choi1, Yan Zhang2, Luonan Chen2

1 Knowledge Engineering and Collective Intelligence Lab. (KECI), Dept. of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 305-701, Korea
2 Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences, Shanghai 200233, China


Abstract
Since its description in 1967, C-peptide has historically been thought to be an inert and biologically non-active peptide. That is, no physiological roles or functions were attributed to it other than connecting the A and B chains and aiding in the proper folding of mature insulin. An increasing body of experimental evidence has challenged this view and supports the notion that C-peptide is bioactive, as evidenced by signaling characteristics observed in in vitro experimental studies. The most pronounced is its ameliorating effect on diabetes-induced renal and nerve dysfunction. Accordingly, the past decade has witnessed a renewal in C-peptide research aimed at providing a complete physiological characterization of the peptide. In this paper we provide the initial steps in addressing this endeavor computationally. We performed an investigative study on C-peptide that spanned 75 organisms, of which the physicochemical properties and compositional makeup of C-peptide denoted its most pronounced aspect. This was used in developing a framework composed of different predictive models and feature encodings for predicting the C-peptide like family.

S3-3: Derivative Component Analysis for Serum Proteomics Data

Henry Han1,2

1 Department of Computer and Information Science, Fordham University, New York NY 10023 USA
2 Quantitative Proteomics Center, Columbia University, New York 10027 USA


Abstract
A new machine learning algorithm, derivative component analysis (DCA), is proposed for high-dimensional proteomics data. Unlike conventional feature selection approaches, DCA aims at capturing subtle data behaviors in addition to unveiling global data behaviors through multi-resolution analysis. Compared with classic PCA and ICA methods, which view each feature as an indecomposable information unit at a single resolution, DCA examines each feature at multiple resolutions by seeking its derivatives to capture latent data characteristics and conduct de-noising. We demonstrate DCA's advantages in disease phenotype discrimination and meaningful biomarker discovery by comparing it with state-of-the-art algorithms on benchmark data. Our results show that high-dimensional proteomics data are actually linearly separable under derivative component analysis. As a novel multi-resolution feature selection algorithm, DCA not only overcomes the weakness of traditional methods in latent data behavior discovery, but also provides new techniques and insights in translational bioinformatics and machine learning.

S4. Multi-Omic Applications

Room: Grand Ballroom B
Date: Thursday, Oct. 3, 15:00 - 16:15
S4-1: Integrated Analysis of microRNA-target Interactions with Clinical Outcomes for Cancers

Je-Gun Joung1,2,3, Dokyoon Kim1,2,4, Su-Yeon Lee1,2, Hwa Jung Kang5, Ju Han Kim1,2

1 Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110-799, Korea
2 Systems Biomedical Informatics National Core Research Center, Seoul National University College of Medicine, Seoul 110-799, Korea
3 Institute of Endemic Diseases, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul 110-799, Korea
4 Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
5 Translational Bioinformatics Lab., Samsung Genome Institute, Samsung Medical Center, Seoul Korea


Abstract
Clinical statement alone is not enough to predict the progression of disease. Instead, gene expression profiles have been widely used to forecast clinical outcomes. Many genes related to survival have been identified, and recently miRNA expression signatures predicting patient survival have also been investigated for several cancers. However, miRNAs and their target genes associated with clinical outcomes have remained largely unexplored. Here, we demonstrate a survival analysis based on the regulatory relationships of miRNAs and their target genes. Patient survival for two major cancers, ovarian cancer and glioblastoma multiforme (GBM), is investigated through the integrated analysis of miRNA-mRNA interaction pairs. We found that there is a larger survival difference between two patient groups with an inversely correlated expression profile of miRNA and mRNA. This supports the idea that signatures of miRNAs and their targets related to cancer progression can be detected via this approach, and that subsequent therapeutic targets can in turn be identified.

S4-2: "N-of-1-pathways" unveils personal deregulated mechanisms from a single pair of RNA-Seq samples: towards precision medicine

Vincent Gardeux1,2,3,§, Ikbel Achour1,2,§, Mark Maienschein-Cline1, Gurunadh Parinandi1,4, Jianrong Li1,2, Neil Bahroos1, Haiquan Li1,2, Joe G.N. Garcia2,4,6,7, Yves A. Lussier1,2,4,5,6,8,9,*

1 Institute for Translational Health Informatics, University of Illinois at Chicago, Illinois, USA.
2 Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, USA.
3 Department of Informatics, School of Engineering, EISTI (École Internationale de Sciences du Traitement de l'Information), Cergy-Pontoise, France.
4 Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, USA.
5 Computation Institute, Argonne National Laboratory & University of Chicago, Illinois, USA.
6 Inst. for Personalized Respiratory Medicine, University of Illinois at Chicago, Illinois, USA.
7 Department of Pharmacology, University of Illinois at Chicago, Chicago, Illinois, USA.
8 Dept. Biopharmaceutical Science, College of Pharmacy, Un. of Illinois at Chicago, Ill, USA.
9 Inst. For Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, USA.
§ Equal contribution


Abstract
Background: In the groundbreaking genomic era, the emergence of precision medicine ushered in the opportunity to incorporate individual molecular data into patient care. Indeed, DNA-sequencing predicts somatic mutations of individual patients. However, these genetic features are static and overlook dynamic epigenetic and phenotypic response to therapy. Meanwhile, accurate personal transcriptome interpretation remains an unmet challenge. Further, N-of-1 (single subject) efficacy trials are increasingly pursued. However, they are not powered for molecular marker discovery.
Method: "N-of-1-pathways" translates gene expression data profiles into pathway-level profiles on single patient paired samples (one p-value per geneset). Using RNA-Seq data of 55 TCGA lung adenocarcinoma patients, it predicts individually deregulated pathways. Pooling patient-level predictions together, we then compare these pathways to those of three independent lung adenocarcinoma studies (microarray gold standards).
Results: The precision-recall curves of N-of-1-pathways predictions are comparable to those of GSEA and DEG enrichment from both internal and three external evaluations. We further show that >99.7% of 362 biological processes found in cross-patient studies are predicted by N-of-1-pathways, which also unveils 89 additional mechanisms unrelated to the gold standard shared by 1 to 40 patients. Moreover, a heatmap illustrates deregulated pathways at the single-patient level and highlights both individual and shared mechanisms ranging from molecular to organ-systems levels (e.g. DNA repair, signaling, immune response, organ development, etc.).
Conclusion: N-of-1-pathways provides a statistically robust and biologically relevant interpretation of individual responses to therapy that are overlooked by cross-patient studies. Further, it enables mechanism-level classifiers with smaller cohorts as well as N-of-1 studies.
Software: https://Lussierlab.org/N-of-1-pathways

S4-3: Knowledge Boosting: A graph-based integration with multi-omics data and genomic knowledge for cancer clinical outcome prediction

Dokyoon Kim1,2, Je-Gun Joung1,3, Kyung-Ah Sohn1,4, Hyunjung Shin5, Marylyn D. Ritchie2, Ju Han Kim1,6,*

1 Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
2 Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
3 Translational Bioinformatics Lab (TBL), Samsung Genome Institute (SGI), Samsung Medical Center, Seoul, Korea
4 Department of Information and Computer Engineering, Ajou University, Suwon, Korea
5 Department of Industrial & Information Systems Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749, Suwon, Korea
6 Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea


Abstract
Cancer is a complex disease, which can be dysregulated through multiple mechanisms. Thus, no single level of genomic data fully elucidates tumor behavior since there are many genomic variations within/between levels in a biological system such as copy number alterations, DNA methylation, alternative splicing, miRNA regulation, post-translational modification, etc. Nowadays, a number of heterogeneous types of data have become more available from the Cancer Genome Atlas (TCGA), generating multiple molecular levels of omics dimensions from genome to phenome. Given multi-omics data, information from one level to another may lead to some clues that help to uncover unknown biological knowledge. Thus, integration of different levels of data can aid in extracting new knowledge by drawing an integrative conclusion from many pieces of information collected from diverse types of genomic data. Previously, we proposed a graph-based framework that integrates multi-omics data including copy number alteration, DNA methylation, gene expression, and miRNA expression, for cancer clinical outcome prediction. Genomic features do not act in isolation, but rather interact with other genomic features in complex signaling or regulatory networks since cancer is caused by the deregulation of alteration in pathways or complete processes. Thus, it would be desirable to incorporate genomic knowledge when integrating multi-omics data for cancer clinical outcome prediction. Here, we propose a new graph-based framework for integrating different levels of genomic data and genomic knowledge at hand in order to improve the predictive power and provide an enhanced global view on the interplay between levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from TCGA for the stage, grade, and survival outcome prediction.
Integrating multi-omics data with genomic knowledge to construct pre-defined features results in higher performance in clinical outcome prediction and higher stability. With integration of multi-omics data and genomic knowledge, understanding the molecular pathogenesis and underlying biology in cancer is expected to provide better guidance for improved diagnostic and prognostic indicators and effective therapies.

Top

S5. Linking Phenotypes

Room: Grand Ballroom A
Date: Friday, Oct. 4, 08:30 - 10:10
S5-1: The Multiscale Backbone of the Human Phenotype Network based on Biological Pathways

Christian Darabos1, Marquitta J. White1,2, Britney E. Graham1, Derek Leung1, Scott Williams1, and Jason H. Moore1

1 Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, USA
2 Center for Human Genetics Research, Vanderbilt University, Nashville, USA


Abstract
Networks are commonly used to represent and analyze large and complex systems of interacting elements. We built a pathway-based human phenotype network (PHPN) of over 800 physical attributes, diseases, and behavioral traits, based on about 2,300 genes and 1,200 biological pathways. Using GWAS phenotype-to-gene associations and pathway data from Reactome, we connect human traits based on the common patterns of human biological pathways, detecting more pleiotropic effects, and expanding previous studies from a gene-centric approach to that of shared cell processes. The resulting network has a heavily right-skewed degree distribution, placing it in the scale-free region of the network topologies spectrum. We extract the multi-scale information backbone of the PHPN based on the local densities of the network and discarding weak connections. Using a standard community detection algorithm, we construct phenotype modules of similar traits without applying expert biological knowledge. These modules can be assimilated to disease classes. However, we are able to classify phenotypes according to shared biology, and not arbitrary disease classes. We present examples of expected clinical connections identified by PHPN as proof of principle. Furthermore, we highlight an unexpected connection between phenotype modules and discuss potential mechanistic connections that are obvious only in retrospect. The PHPN shows tremendous potential to become a useful tool both in the unveiling of the diseases' common biology, and in the elaboration of diagnosis and treatments.

Top

S5-2: Integrative approach for modeling the association of multi-layered genomic data with gene expression traits

Kyung-Ah Sohn1,§, Dokyoon Kim2,3,§, Jaehyun Lim2,4, Ju Han Kim2,4,*

1 Department of Information and Computer Engineering, Ajou University, Suwon 443749, Korea
2 Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
3 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
4 Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
§ These authors contributed equally to this work


Abstract
Large-scale multi-layered genomic datasets have been emerging through collaborative efforts such as TCGA, providing valuable opportunities to deepen the knowledge of the molecular basis of cancer. Although many approaches have been proposed for the integrative analysis of such multi-layered data, few approaches address the problem of elucidating gene expression traits with more than two types of genomic features such as SNP, copy number alteration, methylation levels or miRNA expression. In this work, we present a statistical framework for modeling the association of multi-layered genomic data with gene expression traits. A high-dimensional integrative genomic feature vector is constructed using multiple types of genomic features and then each gene expression trait is regressed on the integrative feature vector in a sparse regression framework. As a result, a small number of significant associations between genomic features and gene expression traits can be obtained with the corresponding association strengths. This approach allows systematic investigation of the relative contribution of different types of genomic data to gene expression traits. We demonstrate our approach on real data from TCGA ovarian cancer patients. Our analysis shows that the integrative genomic features have greater predictive power for gene expression traits than any single type of genomic feature.
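The sparse regression step described above can be illustrated with a small sketch. It is not the authors' implementation: the lasso penalty, the cyclic coordinate descent solver, the function names, and the toy values are all assumptions chosen for illustration, and the inputs are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(features, trait, penalty, sweeps=100):
    """Minimise (1/2n)*||trait - X b||^2 + penalty*||b||_1 by cyclic coordinate descent.

    features: list of columns, one per genomic feature (any data type).
    trait:    expression values of one gene across the same n samples.
    """
    n = len(trait)
    coefs = [0.0] * len(features)
    residual = list(trait)  # residual = trait - X b, and b starts at zero
    for _ in range(sweeps):
        for j, column in enumerate(features):
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual that excludes its own contribution.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(column, residual)) / n
            updated = soft_threshold(rho, penalty) / scale
            delta = updated - coefs[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(column, residual)]
                coefs[j] = updated
    return coefs


# Hypothetical toy data: two feature layers concatenated into one integrative feature list.
copy_number = [[1.0, -1.0, 1.0, -1.0]]
methylation = [[1.0, 1.0, -1.0, -1.0]]
integrated = copy_number + methylation
expression = [2.0 * c + 0.1 * m for c, m in zip(copy_number[0], methylation[0])]

coefs = lasso_coordinate_descent(integrated, expression, penalty=0.5)
```

In this toy case the weak methylation effect is shrunk exactly to zero while the copy-number coefficient survives (shrunk by the penalty), which is the sense in which a sparse fit returns a small number of associations together with their strengths.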

Top

S5-3: ATHENA: Identifying interactions between different levels of genomic data associated with cancer clinical outcomes using grammatical evolution neural network

Dokyoon Kim1, Ruowang Li1, Scott M. Dudek1, Marylyn D. Ritchie1,*

1 Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA

Abstract
Gene expression profiles have been broadly used in cancer research as a diagnostic or prognostic signature for the clinical outcome prediction such as stage, grade, metastatic status, recurrence, and patient survival, as well as to potentially improve patient management. However, emerging evidence shows that gene expression-based prediction varies between independent data sets. One possible explanation of this effect is that previous studies were focused on identifying genes with large main effects associated with clinical outcomes. Thus, non-linear interactions without large individual main effects would be missed. The other possible explanation is that gene expression as a single level of genomic data is insufficient to explain the clinical outcomes of interest since cancer can be dysregulated by multiple alterations through genome, epigenome, transcriptome, and proteome levels. In order to overcome the variability of diagnostic or prognostic predictors from gene expression alone and to increase its predictive power, we need to integrate multi-levels of genomic data and identify interactions between them associated with clinical outcomes. Here, we proposed an integrative framework for identifying interactions within/between multi-levels of genomic data associated with cancer clinical outcomes using the Grammatical Evolution Neural Networks (GENN). In order to demonstrate the validity of the proposed framework, ovarian cancer data from TCGA was used as a pilot task. We found not only interactions within a single genomic level but also interactions between multi-levels of genomic data associated with survival in ovarian cancer. Notably, the integration model from different levels of genomic data achieved 72.89% balanced accuracy and outperformed the top models with any single level of genomic data. 
Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different levels of genomic data is expected to provide guidance for improved prognostic biomarkers and individualized therapies.

Top

S5-4: Topological Analysis of Statistical Epistasis Networks Reveals Pathways Associated with Alzheimer's Disease

Qinxin Pan1, Ting Hu1,2, Li Shen3, Andrew J. Saykin3 and Jason H. Moore1,2,*

1 Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
2 Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH, USA
3 Department of Radiology and Imaging Science, Center for Neuroimaging, Indiana University School of Medicine, Indianapolis, IN, USA


Abstract
Most pathway analysis approaches rely on main effects of genes and do not take gene-gene interactions into account. Gene-gene interactions, i.e., epistasis, are believed to account for a portion of the presumed missing heritability. Moreover, conventional methods treat each pathway independently whereas in reality they cooperate and work together as an intertwined system. In this study, we construct statistical epistasis networks (SEN) underlying Alzheimer's disease (AD) and infer risk-associated pathways from their topological structures. We test for pathways that possess central positions in the SENs and characterize the interactions among pathways. We find that the glycosphingolipid biosynthesis (ganglio series) pathway, which has been hypothesized to be involved in AD pathobiology, holds central positions in the SENs and actively interacts with a high number of other pathways. Other central pathways include alpha-linolenic acid metabolism, sphingolipid metabolism, peroxisome, ether lipid metabolism, and primary bile acid biosynthesis. In addition to central pathways, we identify a few pathways that frequently interact with other pathways. The pathways identified in our study should be further investigated, especially in the context of epistasis.

Top

S6. Post-GWAS

Room: Grand Ballroom A
Date: Friday, Oct. 4, 13:20 - 14:35
S6-1: Practical issues for screening and variable selection method in a Genome-Wide Association Analysis

Sungyeon Hong1, Yongkang Kim1, Taesung Park1,2,*

1 Department of Statistics, Seoul National University, Seoul 151-747, Korea
2 Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea


Abstract
Variable selection plays an important role in high-dimensional statistical modeling. Computational cost and estimation accuracy are the two main concerns for statistical inference on high-dimensional data. Recently, many high-dimensional datasets have been generated in biomedical science, such as microarray data and single nucleotide polymorphism (SNP) data. In particular, genome-wide association studies (GWAS), which focus on identifying SNPs associated with a disease of interest, have produced ultra-high-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods have adopted a two-stage approach: (1) pre-screening for dimension reduction, (2) variable selection for identification of causal SNPs. The pre-screening step selects SNPs in terms of their p-values or the absolute values of their regression coefficients in single-SNP analysis. Penalized regression methods such as Ridge, Lasso, adaptive Lasso and Elastic-net are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype via real GWA data containing 327,872 SNPs from 8,842 individuals.
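As an illustration of the first stage of the two-stage approach, the following sketch ranks SNPs by the absolute value of their marginal correlation with a quantitative phenotype and keeps the top few. The ranking statistic, the function names, and the genotype values are assumptions made for this example; the abstract itself mentions p-values or regression coefficients from single-SNP analysis as screening criteria.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def prescreen(genotypes, phenotype, keep):
    """Stage 1: rank SNPs by |marginal correlation| and keep the top `keep` names."""
    scores = {name: abs(pearson(calls, phenotype)) for name, calls in genotypes.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


# Hypothetical genotype calls (0/1/2 minor-allele counts) for six individuals.
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [2, 2, 0, 0, 1, 1],
    "snp_c": [1, 1, 1, 1, 1, 1],  # monomorphic, so it scores 0
    "snp_d": [2, 1, 0, 2, 1, 0],
}
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]

selected = prescreen(genotypes, phenotype, keep=2)
```

Only the SNPs that survive this step would be passed on to a penalized regression in stage 2, which keeps the second stage computationally tractable when the starting set contains hundreds of thousands of markers.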

Top

S6-2: IGENT: Efficient Entropy based Algorithm for Detecting Genome-wide Gene-Gene Interaction Analysis

Min-Seok Kwon1, Mira Park2 and Taesung Park1,3,*

1 Interdisciplinary program in Bioinformatics, Seoul National University, Seoul, Korea
2 Department of Preventive Medicine, Eulji University, Korea
3 Department of Statistics, Seoul National University, Seoul, Korea


Abstract
With the development of high-throughput genotyping and sequencing technology, there is growing evidence of association between genetic variants and complex traits. In spite of the thousands of genetic variants discovered, such genetic markers have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. Gene-gene interaction (GGI) analysis is expected to unveil a large portion of the unexplained heritability of complex traits. In this work, we propose IGENT, Information theory-based GEnome-wide gene-gene iNTeraction method. IGENT is an efficient stepwise algorithm for identifying genome-wide gene-gene interactions (GGI) and gene-environment interactions (GEI). For detecting significant GGIs at genome-wide scale, it is important to reduce the computational burden significantly. Our method uses information gain (IG) and evaluates its significance without resampling. Through 70 simulation data sets, the power of the proposed method is shown to be nearly equivalent to the power of the proposed method. The proposed method successfully detected GGI for age-related macular degeneration (AMD). The proposed method is implemented in C++ and available on Windows, Linux and MacOSX.
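To make the information gain (IG) measure concrete, the sketch below computes IG = H(phenotype) - H(phenotype | predictor) for single SNPs and for a SNP pair. It is a generic textbook calculation, not IGENT itself: the function names and the toy XOR-style data are assumptions, and the significance evaluation mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(predictor, phenotype):
    """IG = H(phenotype) - H(phenotype | predictor)."""
    total = len(phenotype)
    groups = {}
    for value, label in zip(predictor, phenotype):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(phenotype) - conditional


# Hypothetical toy data: disease status is the XOR of two binary-coded SNPs,
# so neither SNP is informative alone but the pair determines the status.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]
pair = list(zip(snp1, snp2))

ig_snp1 = information_gain(snp1, status)
ig_snp2 = information_gain(snp2, status)
ig_pair = information_gain(pair, status)
```

Here each single-SNP IG is 0 bits while the joint IG is 1 bit, the kind of pure interaction effect that a main-effect scan would miss.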

Top

S6-3: Identification of novel therapeutics for complex diseases from genome-wide association data

M. P. Grover1, S. Ballouz2, K. A. Mohanasundaram1, R. A. George3, C. D. H. Sherman1, T. M. Crowley4,5, M. A. Wouters1,4

1 Life and Environmental Sciences, Deakin University, Geelong, Victoria, Australia.
2 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States.
3 Victor Chang Cardiac Research Institute, 405 Liverpool St, Darlinghurst, 2010, NSW, Australia.
4 School of Medicine, Deakin University, Geelong, Victoria, Australia.
5 Australian Animal Health Laboratory, CSIRO Animal, Food and Health Sciences, Portarlington Road, Geelong, Victoria, Australia.


Abstract
Background: Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs [1] via costly clinical studies. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on phase I clinical trials.
Results: We adopted a simple approach to integrate drug data with candidate gene predictions at the systems level. We previously used Gentrepid (www.gentrepid.org) as a platform to predict 1,805 candidate genes for the seven complex diseases considered in the WTCCC genome-wide association study, namely Type 2 Diabetes (T2D), Bipolar Disorder (BD), Crohn's Disease (CD), Hypertension (HT), Type 1 Diabetes (T1D), Coronary Artery Disease (CAD) and Rheumatoid Arthritis (RA) [2, 3]. Using the publicly available drug databases Therapeutic Target Database (TTD), PharmGKB and DrugBank (DB) as sources of drug-target association data, we identified a total of 390 (22%) candidate genes as novel therapeutic targets for the phenotype of interest and 2,132 drugs feasible for repositioning against the predicted targets.
Conclusions: By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate groundbreaking results in genetics to clinical treatments.

Top

S7. Biomedical Big Data

Room: Grand Ballroom B
Date: Friday, Oct. 4, 13:20 - 14:35
S7-1: Health Monitoring System based on Lifelog Analysis

Yongjin Kwon1, Kyuchang Kang1, Changseok Bae1

1 Human Computing Section, Software Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Korea

Abstract
Health status is closely related to daily routines. To monitor a patient's health status, it is important to track and interpret routine data continuously. Despite the effectiveness of everyday activity information, however, both collecting and analyzing routine data are difficult tasks. To collect and analyze routine data in a seamless and accurate way, it is necessary to build a system that incorporates a variety of sensors, data management techniques, lifelog analysis algorithms, and summarization techniques. This paper introduces a health monitoring system based on lifelog analysis. Triaxial acceleration and angular velocity data are considered as lifelog data, which are measured by the accelerometer in smartphones. A smartphone collects lifelog data continuously and transfers them to a server in a secure and reliable way. The lifelog data are interpreted by our activity recognition engine on the server, and the results are used as routine information to help practitioners or other vendors provide enhanced services.

Top

S7-2: Differentially Private Distributed Logistic Regression using Hybrid datasets

Zhanglong Ji1, Xiaoqian Jiang1, Shuang Wang1, Li Xiong2, Lucila Ohno-Machado1

1 Division of Biomedical Informatics, University of California, San Diego, CA, USA
2 Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA


Abstract
Differential privacy is a state-of-the-art framework for data privacy research. It offers provable privacy against attackers who have auxiliary information. However, differentially private methods sometimes introduce too much noise and make outputs less useful. We hypothesized that this situation could be alleviated in an environment where public and private data sets for the same study are available for analysis. In biomedical settings, for example, some patients are willing to sign an open-consent agreement to make their data (publicly) available for research (such as the individuals who are contributing to the 1000 Genomes Project), but others are not. Therefore, hybrid models that leverage public data while rigorously protecting private data can be developed. In this paper, we propose a novel distributed logistic regression model to be built from many data sets, including public and private ones, in a differentially private way. We show that our algorithm has advantages over: (1) a logistic regression model based on only public data, and (2) differentially private distributed logistic regression models based on private data under various scenarios.

Top

S7-3: Effectively processing medical term queries on the UMLS Metathesaurus by Layered Dynamic Programming

Kaiyu Ren1,2, Albert M. Lai1, Kun Huang1, Aveek Mukhopadhyay2, Raghu Machiraju2, Yang Xiang1

1 Department of Biomedical Informatics,
2 Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA


Abstract
Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real-world applications. To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. Our empirical study shows that LDPMap has much higher accuracy in mapping inaccurate medical terms to UMLS concepts, in comparison with the UMLS Metathesaurus Browser and MetaMap.
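The abstract does not spell out the layers of LDPMap, so the sketch below only illustrates the underlying idea of tolerating inaccurate query terms with dynamic programming. It uses plain Levenshtein edit distance as a stand-in technique; the function names and the three-term vocabulary are hypothetical, and a real system would search the UMLS Metathesaurus rather than a hard-coded list.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def closest_term(query, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to the query."""
    return min(vocabulary, key=lambda term: edit_distance(query.lower(), term.lower()))


# Hypothetical mini-vocabulary standing in for concept names.
vocabulary = ["diabetes mellitus", "diabetes insipidus", "hypertension"]
match = closest_term("diabtes melitus", vocabulary)
```

The misspelled query is two single-character edits away from "diabetes mellitus", so that entry is returned even though no exact string match exists.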

Top

S8. New Technologies

Room: Grand Ballroom A
Date: Friday, Oct. 4, 14:50 - 16:05
S8-1: GAMUT: GPU Accelerated MicroRNA analysis to Uncover Target genes through CUDA-miRanda

Shuang Wang1, Jihoon Kim1, Xiaoqian Jiang1, Stefan F Brunner2, and Lucila Ohno-Machado1

1 Division of Biomedical Informatics, University of California, San Diego, CA, USA
2 Biomedical Informatics, University of Applied Sciences Upper Austria, Hagenberg, Austria


Abstract
Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally very demanding, since they need to handle not only several million query microRNA and reference RNA pairs, but also several million nucleotide comparisons within each given pair. Thus, the need to perform microRNA identification at such a large scale has increased the demand for parallel computing. Although most CMTI programs (e.g., the miRanda algorithm) are based on a modified Smith-Waterman (SW) algorithm, the existing parallel SW implementations (e.g., CUDASW++ 2.0/3.0, SWIPE) are unable to meet this demand in CMTI tasks. We present CUDA-miRanda, a fast microRNA target identification algorithm that takes advantage of massively parallel computing on Graphics Processing Units (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA-miRanda specifically focuses on the local alignment of short (i.e., < 32 nucleotides) sequences against longer reference sequences (e.g., 20K nucleotides). Moreover, the proposed algorithm is able to report multiple alignments (up to 191 top scores) and the corresponding traceback sequences for any given (query sequence, reference sequence) pair. Speeds over 5.36 Giga Cell Updates Per Second (GCUPS) are achieved on a server with 4 NVIDIA Tesla M2090 GPUs. Compared to the original miRanda algorithm, which is evaluated on an Intel Xeon E5620 @ 2.4 GHz CPU, the experimental results show up to 166 times performance gains in terms of execution time.
In addition, we have verified that the exact same targets were predicted in both CUDA-miRanda and the original miRanda implementations through multiple testing datasets. Furthermore, GPUs are inexpensive compared to high performance compute (HPC) environments in which miRanda would have to run to achieve similar performance. We offer an alternative to HPC that can be developed locally at a relatively small cost. The community of GPU developers in the biomedical research community, particularly for genome analysis, is still growing. With increasing shared resources, this community will be able to advance CMTI in a very significant manner. Our source code is available at http://dbmi-engine.ucsd.edu/cudaMiranda.
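For readers unfamiliar with the Smith-Waterman recurrence and the cell-update throughput metric, here is a minimal CPU sketch. It uses plain identity scoring with a linear gap penalty, which is an assumption for illustration and not miRanda's modified scoring scheme; it returns only the best score, with no traceback and no GPU parallelism. All names and parameter values are hypothetical.

```python
def smith_waterman_score(query, reference, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (score only, no traceback)."""
    previous = [0] * (len(reference) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, r in enumerate(reference, start=1):
            diagonal = previous[j - 1] + (match if q == r else mismatch)
            # Local alignment: a cell never drops below zero.
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


def gcups(query_length, reference_length, pairs, seconds):
    """Giga cell updates per second: DP cells filled, divided by 1e9 * elapsed time."""
    return query_length * reference_length * pairs / (seconds * 1e9)


score = smith_waterman_score("ACGT", "TTACGTTT")
```

Each (query, reference) pair fills a table of query length times reference length cells, so a short query against a 20K-nucleotide reference already costs several hundred thousand cell updates, and millions of pairs multiply that accordingly. This is the workload that the throughput figure summarizes.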

Top

S8-2: A Novel Multi-scale Visualization Software for Data-driven Biomedical Data Exploration

Gang Su1,2, Barbara Mirel3, Anuj Kumar1,4, Charles F Burant5, Brian Athey2 and Fan Meng1,2

1 The Molecular and Behavioral Neuroscience Institute,
2 Department for Computational Medicine and Bioinformatics,
3 School of Education,
4 Molecular, Cellular, Developmental biology,
5 Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor Michigan 48105, USA


Abstract
Inspired by hierarchical clustering heatmaps, CoolMap is a general-purpose, multi-scale, flexible and extensible software application for visual exploration of big biomedical datasets. To overcome the difficulty of interactive data-driven analysis of large datasets such as omics experiments and clinical trial data, CoolMap offers the capability to aggregate rows and columns in a tabular dataset at various concept levels, defined either by external data structures such as Gene Ontology (GO), KEGG pathways or experiment sample groups, or by computed groups such as clustering results. Aggregated values at different concept levels, such as the data mean or standard deviation, can then replace the individual data points at the intersection of these concept terms to reduce the size and complexity of the original data and provide a high-level overview for data-driven pattern discovery and hypothesis generation. Once a 'hotspot' is identified, the researcher may drill down by expanding a concept to its child entities for fine details, while maintaining the surrounding context in a coarser overview. Data can be visualized in a variety of ways, such as color, shape, text or summary plots, using our custom-developed, high-performance and extensible rendering engine. Many other auxiliary functions, such as data filtering, searching, multi-view linking, data row/column sorting, rearrangement, resizing, etc., were also developed to facilitate the knowledge discovery process. CoolMap was also developed using a modular design so that it can be extended for general tabular data visualization or augment other visualization software such as Cytoscape for network analysis. Compared with a variety of classic heatmap tools, CoolMap is significantly more efficient for big data exploration and analysis.
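The aggregation idea can be sketched in a few lines: rows and columns are mapped to concept-level groups, and every (row group, column group) intersection is replaced by a summary value. This is not CoolMap's API; the function name, the group labels, and the toy matrix are hypothetical, and the mean is used as the summary statistic only as an example.

```python
from collections import defaultdict
from statistics import mean


def aggregate(matrix, row_groups, column_groups):
    """Collapse a tabular dataset to one mean value per (row group, column group)."""
    buckets = defaultdict(list)
    for row_name, row in matrix.items():
        for column_name, value in row.items():
            key = (row_groups[row_name], column_groups[column_name])
            buckets[key].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical expression table: three genes measured in three samples.
matrix = {
    "g1": {"s1": 1.0, "s2": 3.0, "s3": 8.0},
    "g2": {"s1": 2.0, "s2": 2.0, "s3": 6.0},
    "g3": {"s1": 5.0, "s2": 5.0, "s3": 5.0},
}
row_groups = {"g1": "pathwayA", "g2": "pathwayA", "g3": "pathwayB"}
column_groups = {"s1": "control", "s2": "control", "s3": "treated"}

summary = aggregate(matrix, row_groups, column_groups)
```

The nine original cells collapse to four, and the jump from 2.0 to 7.0 for pathwayA between control and treated is the kind of 'hotspot' a user would then expand back to gene level.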

Top

S8-3: In Silico Cancer Cell versus Stroma cellularity index computed from species-specific human and mouse transcriptome of xenograft models: towards accurate stroma targeting therapy assessment

Xinan Yang1, Yong Huang1, Younghee Lee1, Vincent Gardeux2,3,4, Ikbel Achour2,4, Kelly Regan2,4, Ellen Rebman2,4, Haiquan Li2,4, Yves A. Lussier1,2,4,5,*,§

1 Ctr for Biomed Inform and Sect of Genetic Medicine, Dept. of Medicine, Un. of Chicago, USA.
2 Institute for Translational Health Informatics, University of Illinois at Chicago, Illinois, USA.
3 Department of Informatics, School of Engineering, EISTI, Cergy-Pontoise, France.
4 Dept. of Medicine, University of Illinois at Chicago, Chicago, IL, USA.
5 Comprehensive Cancer Ctr and Ludwig Ctr for Metastasis Research; Un. of Chicago, IL, USA.
6 Depts of Bioengineering & of Pharmaceutical Science, Un. of Illinois at Chicago, IL, USA.
7 Computation Institute & Institute For Genomics and Systems Biology, Argonne National Laboratory and The University of Chicago, IL, USA.
8 Inst. for Personalized Respiratory Medicine, Un. of Illinois at Chicago, Chicago, Illinois, USA.
§ This work was conducted in part while at The University of Chicago


Abstract
Background: The current state of the art for measuring stromal response to targeted therapy requires burdensome and rate-limiting quantitative histology. Transcriptome measures are increasingly affordable and provide an opportunity for developing a stromal versus cancer ratio in xenograft models. In these models, human cancer cells are transplanted into mouse host tissues (stroma) and together co-evolve into a tumour microenvironment. However, profiling the mouse or human component separately also remains problematic. Indeed, laser capture microdissection is labour intensive. Moreover, gene expression using commercial microarrays introduces significant and underreported cross-species hybridization errors that are commonly overlooked by biologists.
Method: We developed a customized dual-species array, the H&M array, and performed cross-species and species-specific hybridization measurements. We validate a new methodology for establishing the stroma vs cancer ratio using transcriptomic data.
Results: In the biological validation of the H&M array, cross-species hybridization of human and mouse probes was significantly reduced (4.5- and 9.4-fold reduction, respectively; p < 2x10^-16 for both, Mann-Whitney test). We confirmed the capability of the H&M array to determine the stromal-to-cancer cell ratio based on the estimation of a cellularity index of mouse/human mRNA content in vitro. This new metric enables more efficient investigation of stroma-cancer cell interactions (e.g. cellularity), bypassing the labour-intensive requirements and biases of laser capture microdissection.
Conclusion: These results provide initial evidence of improved and cost-efficient analytics for the investigation of the cancer cell microenvironment using species-specific arrays designed for xenograft models.

Top



 

Highlight Research Tracks

Highlight Research 1.

Room: Grand Ballroom C
Date: Thursday, Oct. 3, 13:00 - 14:20
H1-1: Heart Attacks: Leveraging a cardiovascular systems biology strategy to predict future outcomes

Carlo Vittorio Cannistraci1, Timothy Ravasi1 and Enrico Ammirati1

1 Integrative Systems Biology Laboratory, Division of Biological and Environmental Sciences and Engineering, Division of Applied Mathematics and Computer Science and Engineering, Computational Bioscience Research Center, King Abdullah University for Science and Technology, Thuwal, Kingdom of Saudi Arabia

Abstract
Inflammation is likely involved in ST-elevation acute myocardial infarction (STEMI), and patients with STEMI can present with high levels of circulating interleukin-6 (IL6) at the onset of symptoms. We used machine learning techniques to identify characteristic inflammatory cytokine patterns in the blood of emergency-room patients with STEMI, and observed two functional modules characterizing the reciprocal behaviours of the cytokines in patients with high IL6 levels. Next, exploiting reverse engineering techniques, we inferred which cytokines were crucial inside the respective modules. Combining them together with IL6 in a unique formula yielded a risk-index - a kind of composed-biomarker - that outperformed any single cytokine and classical prognostic factors in the prediction of cardiac dysfunction at discharge and death at six months. Our methodology was considered a translational research innovation for the definition of composed-inflammatory-markers in cardiology, while our findings have potential implications for risk-oriented patient stratification and design of immune-modulating therapies.

Top

H1-2: Bridging cancer biology with the clinic: novel personalized prognostic indicators for breast cancer

Xinan Yang1,*, Prabhakaran Vasudevan1, Vishwas Parekh1, Aleks Penev1, John M. Cunningham1

1 Section of Hematology/Oncology, Department of Pediatrics, Comer Children’s Hospital, The University of Chicago, Chicago, Illinois, United States of America

Abstract
Identification and characterization of crucial gene target(s) that will allow focused therapeutics development remains a challenge. We have interrogated the putative therapeutic targets associated with the transcription factor Grainy head-like 2 (GRHL2), a critical epithelial regulatory factor. We demonstrate the possibility to define the molecular functions of critical genes in terms of their personalized expression profiles, allowing appropriate functional conclusions to be derived. A novel methodology, relative expression analysis with gene-set pairs (RXA-GSP), is designed to explore the potential clinical utility of cancer-biology discovery. Observing that Grhl2-overexpression leads to increased metastatic potential in vitro, we established a model assuming Grhl2-induced or -inhibited genes confer poor or favorable prognosis respectively for cancer metastasis. Training on public gene expression profiles of 995 breast cancer patients, this method prioritized one gene-set pair (GRHL2, CDH2, FN1, CITED2, MKI67 versus CTNNB1 and CTNNA3) from all 2717 possible gene-set pairs (GSPs). The identified GSP significantly dichotomized 295 independent patients for metastasis-free survival (log-rank tested p = 0.002; severe empirical p=0.035). It also showed evidence of clinical prognostication in another independent 388 patients collected from three studies (log-rank tested p = 3.3e-6). This GSP is independent of most traditional prognostic indicators, and is only significantly associated with the histological grade of breast cancer (p = 0.0017), a GRHL2-associated clinical character (p = 6.8e-6, Spearman correlation), suggesting that this GSP is reflective of GRHL2-mediated events. Furthermore, a literature review indicates the therapeutic potential of the identified genes. 
This research demonstrates a novel strategy to integrate both biological experiments and clinical gene expression profiles for extracting and elucidating the genomic impact of a novel factor, GRHL2, and its associated gene-sets on the breast cancer prognosis. Importantly, the RXA-GSP method helps to individualize breast cancer treatment. It also has the potential to contribute considerably to basic biological investigation, clinical tools, and potential therapeutic targets.

Top

H1-3: Interpreting individuals' genomes: Practical applications from newborn screening, and findings from CAGI 2013--the Critical Assessment of Genome Interpretation

Steven E. Brenner1, John Moult2, CAGI Participants

1 University of California, Berkeley, CA, USA
2 IBBR, University of Maryland, Rockville, MD, USA


Abstract
The Critical Assessment of Genome Interpretation (CAGI, 'kā-jē) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. In the experiment, participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors. A long-term goal for CAGI is to improve the accuracy of phenotype and disease predictions in clinical settings.
The third CAGI experiment (concluded in July 2013) consisted of ten diverse challenges. CAGI deliberately extends challenges from previous years, with the continuity allowing measurement of progress. For example, in the second CAGI, in a challenge to predict Crohn’s disease from exomes, one group was able to identify 80% of affected individuals before the first false positive healthy person. In the third CAGI experiment, this challenge used an improved dataset, and several groups performed remarkably well, with one group achieving a ROC AUC of 0.94. The experiment also revealed important population structure to Crohn’s disease in Germany.
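To make the two evaluation measures quoted for the Crohn's disease challenge concrete, the following minimal Python sketch computes a ROC AUC and the fraction of affected individuals ranked ahead of the first false positive. The function names, labels and scores are hypothetical illustrations; they are not CAGI data and not the assessors' code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen affected
    individual (label 1) scores higher than a randomly chosen healthy
    one (label 0); ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_before_first_false_positive(labels, scores):
    """Fraction of affected individuals ranked above the first healthy
    individual when samples are sorted by decreasing score.
    (Tied scores are ordered arbitrarily in this sketch.)"""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    found = 0
    for _, label in ranked:
        if label == 0:
            break
        found += 1
    return found / sum(labels)


# Hypothetical predictions for six exomes (1 = affected, 0 = healthy).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
early_recall = recall_before_first_false_positive(labels, scores)
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is acceptable for a sketch but not for large cohorts.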
For three years, CAGI has posed a challenge with Personal Genome Project (PGP) genome data. This year, two groups were able to successfully map a significant number of complete genomes to their corresponding trait profiles submitted by PGP participants. In the expanded challenge to predict benign versus deleterious variants in DNA double-strand break repair MRN genes—Rad50 (from last year), Mre11, and Nbs1—as determined by those that appear in a breast cancer case versus healthy control, predictions show how methods differ sharply in their effectiveness even amongst proteins in the same complex.
A new challenge this year was to use exomes from families with lipid metabolism disorders. In the case of hypoalphalipoproteinemia (HA), a company made predictions which showed how understanding the problem structure and employing an extensive knowledgebase led to remarkably good results. Another related challenge revealed a twist wherein real-world data differed sharply from theoretical models.
The other challenges were to predict which variants of BRCA1 and BRCA2 are associated with increased risk of breast cancer; to predict how variants in p53 gene exons affect mRNA splicing; to predict how well variants of a p16 tumor suppressor protein inhibit cell proliferation; and to identify potential causative SNPs in disease-associated loci.
Overall, CAGI revealed that the phenotype prediction methods embody a rich and diverse representation of biological knowledge, and they are able to make predictions that are highly statistically significant. However, we also found the accuracy of prediction on the phenotypic impact of any specific variant was unsatisfactory and of questionable clinical utility. The most effective predictions came from methods honed to the precise challenge, including the specific genes of interest as well as the problem context. Prediction methods are clearly growing in sophistication, yet there are extensive opportunities for further progress.
Complete information about CAGI may be found at http://genomeinterpretation.org.

Highlight Research 2.

Room: Grand Ballroom C
Date: Thursday, Oct. 3, 15:00 - 16:15
H2-1: The SADI Personal Health Lens: A Web Browser-Based System for Identifying Personally Relevant Drug Interactions

Ben Vandervalk1, E Luke McCarthy1, José Cruz-Toledo2, Artjom Klein3, Christopher J O Baker3, Michel Dumontier2, Mark D Wilkinson4

1 James Hogg Research Centre, Heart & Lung Institute, University of British Columbia, Vancouver, BC, Canada
2 Department of Biology, Carleton University, Ottawa, ON, Canada
3 Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, NB, Canada
4 Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Pozuelo de Alarcón (Madrid), Spain


Abstract
Background: The Web provides widespread access to vast quantities of health-related information that can improve quality-of-life through better understanding of personal symptoms, medical conditions, and available treatments. Unfortunately, identifying a credible and personally relevant subset of information can be a time-consuming and challenging task for users without a medical background.
Objective: The objective of the Personal Health Lens system is to aid users when reading health-related webpages by providing warnings about personally relevant drug interactions. More broadly, we wish to present a prototype for a novel, generalizable approach to facilitating interactions between a patient, their practitioner(s), and the Web.
Methods: We utilized a distributed, Semantic Web-based architecture for recognizing personally dangerous drugs consisting of: (1) a private, local triple store of personal health information, (2) Semantic Web services, following the Semantic Automated Discovery and Integration (SADI) design pattern, for text mining and identifying substance interactions, (3) a bookmarklet to trigger analysis of a webpage and annotate it with personalized warnings, and (4) a semantic query that acts as an abstract template of the analytical workflow to be enacted by the system.
Results: A prototype implementation of the system is provided in the form of a standalone executable Java JAR file. The JAR file bundles all components of the system: the personal health database, locally running versions of the SADI services, and a JavaScript bookmarklet that triggers analysis of a webpage. In addition, the demonstration includes a hypothetical personal health profile, allowing the system to be used immediately without configuration. Usage instructions are provided.
Conclusions: The main strength of the Personal Health Lens system is its ability to organize medical information and to present it to the user in a personalized and contextually relevant manner. While this prototype was limited to a single knowledge domain (drug/drug interactions), the proposed architecture is generalizable, and could act as the foundation for much richer personalized-health-Web clients, while importantly providing a novel and personalizable mechanism for clinical experts to inject their expertise into the browsing experience of their patients in the form of customized semantic queries and ontologies.

H2-2: Correlation network-guided novel key gene identification

Feng He1,2,§, Hairong Chen1,§, Michael Probst-Kepper3, Robert Geffers4, Serge Eifes2, Antonio del Sol2, Klaus Schughart1, An-Ping Zeng5,6 and Rudi Balling2,*

1 Department of Infection Genetics, Helmholtz Centre for Infection Research (HZI), University of Veterinary Medicine Hannover, Braunschweig, Germany
2 Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
3 Institute of Microbiology, Immunology and Hospital Hygiene, Städtisches Klinikum Braunschweig GmbH, Braunschweig, Germany
4 Department of Cell Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
5 Group of Systems Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
6 Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Hamburg, Germany
§ These authors contributed equally to this work


Abstract
Human FOXP3+, CD25+, CD4+ regulatory T cells (Tregs) are essential to the maintenance of immune homeostasis. Several genes are known to be important for murine Tregs, but for human Tregs the genes and underlying molecular networks controlling the suppressor function remain largely unclear. Here, we describe a strategy to identify the key genes directly from an undirected correlation network which we reconstruct from a very high time-resolution (HTR) transcriptome during the activation of human Tregs/CD4+ T-effector cells. We show that a predicted top-ranked new key gene PLAU (the plasminogen activator urokinase) is important for the suppressor function of both human and murine Tregs. Further analysis unveils that PLAU is particularly important for memory Tregs and that PLAU mediates Treg suppressor function via STAT5 and ERK signaling pathways. Our study demonstrates the potential for identifying novel key genes for complex dynamic biological processes using a network strategy based on HTR data, and reveals a critical role for PLAU in Treg suppressor function.
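The abstract does not state how the correlation network is built or how genes are ranked within it. As a rough illustration of the general idea only, the sketch below links genes whose time-course profiles have a high absolute Pearson correlation and ranks genes by the number of links. The threshold, the degree-based ranking, the function names and the toy profiles are all assumptions made for this example, not the authors' method or data.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile has no defined correlation")
    return cov / (vx * vy) ** 0.5


def correlation_network(profiles, threshold=0.9):
    """Undirected edges between genes with |r| >= threshold.
    `profiles` maps a gene name to its expression time series."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def rank_by_degree(profiles, edges):
    """Genes sorted by decreasing number of network neighbours
    (assumed ranking criterion; ties broken alphabetically)."""
    degree = {g: 0 for g in profiles}
    for g1, g2 in edges:
        degree[g1] += 1
        degree[g2] += 1
    return sorted(degree, key=lambda g: (-degree[g], g))


# Hypothetical four-time-point profiles for four genes.
profiles = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [4, 3, 2, 1],
    "G4": [1, 3, 2, 1],
}
edges = correlation_network(profiles)
ranking = rank_by_degree(profiles, edges)
```

Taking the absolute value of the correlation keeps the network undirected and unsigned, so anti-correlated genes such as G1 and G3 are linked as well.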

H2-3: Imbalanced network biomarkers for traditional Chinese medicine Syndrome in gastritis patients

Rui Li1,§, Tao Ma1,§, Jin Gu1, Xujun Liang1 and Shao Li1,*

1 Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China.
§ Co-first authors


Abstract
Cold Syndrome and Hot Syndrome are thousand-year-old key therapeutic concepts in traditional Chinese medicine (TCM), which depict the loss of body homeostasis. However, the scientific basis of TCM Syndrome remains unclear due to limitations of current reductionist approaches. Here, we established a network balance model to evaluate the imbalanced network underlying TCM Syndrome and find potential biomarkers. By implementing this approach and investigating a group of chronic superficial gastritis (CSG) and chronic atrophic gastritis (CAG) patients, we found that, with leptin as a biomarker, Cold Syndrome patients experience low levels of energy metabolism, while the CCL2/MCP1 biomarker indicated that immune regulation is intensified in Hot Syndrome patients. Such a metabolism-immune imbalanced network is consistent during the course from CSG to CAG. This work provides a new way to understand TCM Syndrome scientifically, which in turn benefits personalized medicine by linking ancient medicine with complex biological systems.

Highlight Research 3.

Room: Grand Ballroom B
Date: Friday, Oct. 4, 08:30 - 10:10
H3-1: Computational Studies of Ubiquitin and Ubiquitin-like Conjugation

Tianshun Gao1,*, Yu Xue1

1 Huazhong University of Science and Technology, Wuhan, P. R. China, 430074

Abstract
Justification: The 2004 Nobel Prize in Chemistry was awarded for the discovery of ubiquitin and the ubiquitin-proteasome system, a highly specific, ATP-dependent pathway responsible for targeting specific proteins for degradation and regulating nearly all cellular processes. Although the field has advanced through thousands of experimental studies, an integrative and comprehensive data resource is still not available. In addition, systematic identification of ubiquitinated proteins and their modified sites has become another hot topic: at least 10 prediction programs for ubiquitination sites have been developed, but none of them can predict the ubiquitin ligases responsible for substrates' ubiquitination sites. Identifying protein ubiquitination sites together with their cognate ubiquitin ligases (E3s) is critical for understanding the complete ubiquitination process and the potential relationships between ubiquitination and other important cellular processes. We have developed the UUCD database and the GPS-PLUB predictor to overcome these two difficulties.
Methods: From the scientific literature, 26 E1s, 105 E2s, 1003 E3s and 148 deubiquitination enzymes (DUBs) were collected and classified into 1, 3, 19 and 7 families, respectively. In addition, 981 ubiquitination sites with E3 information were collected and 1154 site-E3 pairs were integrated. Furthermore, there were 965 non-redundant sites in the 8 main E3 families and 1126 sites in the 87 single E3s. To computationally characterize potential enzymes in 70 eukaryotic species, we constructed 1, 1, 15 and 6 hidden Markov model (HMM) profiles for E1s, E2s, E3s and DUBs at the family level, respectively. Moreover, ortholog searches were conducted for E3 and DUB families without HMM profiles. All experimentally identified enzymes were taken as the benchmark dataset to evaluate the prediction performance and robustness of the HMM identifications. We first classified E3-associated proteins into two classes, E3 activity and E3 adaptor. We then adopted the GPS (Group-based Prediction System) algorithm and developed a tool for predicting E3-specific ubiquitination sites for 87 E3s in a hierarchy. In particular, the approach used in the predictor estimates the theoretically maximal false positive rate (FPR). Taking the APC/C family as an example, the training sensitivity and specificity were 100% and 94.74%, respectively, with an FPR of 2%. The predictor showed good performance and accuracy.
Results: A database, UUCD (Ubiquitin and Ubiquitin-like Conjugation Database), was developed with 738 E1s, 2937 E2s, 46 631 E3s and 6647 DUBs from 70 eukaryotic species. In addition, a tool, GPS-PLUB (Prediction of ubiquitin Ligase-based Ubiquitination sites), was designed for predicting E3-specific ubiquitination sites for 87 E3s in a hierarchy.
Conclusions: Taken together, we developed a family-based database (http://uucd.biocuckoo.org) for ubiquitin and ubiquitin-like conjugation, which proceeds through a similar E1 (ubiquitin-activating enzyme)-E2 (ubiquitin-conjugating enzyme)-E3 (ubiquitin-protein ligase) enzyme thioester cascade, and a predictor for identifying E3-specific ubiquitination sites. We believe that they can give users a comprehensive view of ubiquitination and serve as a useful resource for further research.

H3-2: Mechanisms of PDGFRα promiscuity and PDGFRβ specificity in association with PDGFB

Daniel Torrente1, Ricardo Cabezas1, Marco Fidel1, Francisco Capani2, Yuly Sanchez1, Ludis Morales1, George E. Barreto1,*, Janneth González1,*

1 Departamento de Nutrición y Bioquímica, Facultad de Ciencias, Pontificia Universidad Javeriana, Bogotá D.C., Colombia
2 Laboratorio de Citoarquitectura y Plasticidad Neuronal, Instituto de Investigaciones Cardiológicas "Prof. Dr. Alberto C. Taquini", UMA-CONICET, Buenos Aires, Argentina


Abstract
Platelet-derived growth factor (PDGF) receptor α interacts with PDGFA, B, C and AB, while PDGFRβ binds only to PDGFB and D, suggesting that PDGFRα is more promiscuous than PDGFRβ. The structural analysis of PDGFRα-PDGFA and PDGFRα-PDGFB complexes and a molecular explanation for the promiscuity of PDGFRα and the specificity of PDGFRβ remain unclear. In the present study, we modeled the three extracellular domains of PDGFRα using a previous crystallographic structure of PDGFRβ as a template. Additionally, we analyzed the interacting residues of PDGFRα-PDGFA and PDGFRα-PDGFB complexes using docking simulations. The resulting complexes were validated by molecular dynamics simulations. Structural analysis revealed that changes of non-aromatic amino acids in PDGFRα to aromatic amino acids in PDGFRβ (ILE139PHE, PRO267PHE and ASN204TYR) may be involved in the promiscuity of PDGFRα. In addition, substitution of amino acids with low probability of rotamer changes in PDGFRβ (MET133ALA, ASN163GLU and ASN179SER) and energy stability due to the formation of hydrogen bonds in PDGFRβ could explain the specificity of PDGFRβ. These results could be used as an input for better and more specific drug design for diseases related to the malfunction of PDGFs and PDGFRα, such as cancer and atherosclerosis.

H3-3: DisplayHTS: an R package for visualizing high-throughput screening data results

Xiaohua Douglas Zhang1,* and Zhaozhi Zhang2

1 Early Development Statistics, BARDS, Merck Research Laboratories, West Point, PA 19486, USA
2 Central Bucks South, Warrington, PA 18976, USA


Abstract
RNA interference (RNAi) research has been used to elucidate gene function, to identify novel drug targets, and to reveal the molecular biological system. RNAi high-throughput screening (HTS) allows genome-wide loss-of-function screening. One of the major advantages of RNAi HTS is its ability to simultaneously interrogate thousands of genes. Because each experiment generates a large amount of data, RNAi HTS has led to an explosion in the rate of data generation in recent years. Consequently, one of the most fundamental challenges in RNAi HTS experiments is to glean biological significance from mounds of data, which relies on the development and adoption of suitable statistics/bioinformatics methods. Recently, we have been developing novel analytic methods specifically for quality control and hit selection in RNAi HTS experiments. We published an R package, displayHTS, in Bioinformatics in 2013. This package implements recently developed methods and figures for displaying data and hit selection results in HTS experiments. It generates useful distinctive graphics. In this presentation, I will describe the related statistical methods, elaborate on the R package and demonstrate how to use it in HTS experiments.

H3-4: Methylerythritol phosphate pathway to isoprenoids: Kinetic modeling and in silico enzyme inhibitions in P. falciparum

Vivek Kumar Singh1, Indira Ghosh1,*

1 School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, India

Abstract
The methylerythritol phosphate (MEP) pathway of Plasmodium falciparum (P. falciparum) has become an attractive target for anti-malarial drug discovery. This study describes a kinetic model of this pathway, its use in validating 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) as a drug target from the systemic perspective, and additional target identification, using metabolic control analysis and inhibition studies. In addition to DXR, 1-deoxy-D-xylulose 5-phosphate synthase (DXS) can be targeted because it is the first enzyme of the pathway and has the highest flux control coefficient, followed by that of DXR. In silico inhibition of both enzymes caused a large decrement in the pathway flux. An added advantage of targeting DXS is its influence on vitamin B1 and B6 biosynthesis. Two more potential targets, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase and 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase, were also identified. Their inhibition caused a large accumulation of their substrates, destabilizing the system.
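For readers unfamiliar with metabolic control analysis, the flux control coefficient referred to in this abstract has a standard textbook definition, reproduced here as background. The symbols J (steady-state pathway flux) and E_i (activity of enzyme i) are notation chosen for this note and do not come from the abstract.

```latex
% Flux control coefficient of enzyme i on the steady-state flux J
C^{J}_{E_i} \;=\; \frac{\partial \ln J}{\partial \ln E_i}
           \;=\; \frac{E_i}{J}\,\frac{\partial J}{\partial E_i}

% Summation theorem: control of the flux is shared among all n enzymes
\sum_{i=1}^{n} C^{J}_{E_i} \;=\; 1
```

Under this definition, the enzyme with the largest coefficient is the one whose fractional inhibition produces the largest fractional drop in flux, which is the sense in which DXS and DXR are singled out as targets.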

Highlight Research 4.

Room: Grand Ballroom C
Date: Friday, Oct. 4, 13:20 - 14:35
H4-1: A protein domain-centric approach for the comparative analysis of human and yeast phenotypically relevant mutations

Thomas A Peterson1, DoHwan Park2, Maricel G Kann1,*

1 Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA.
2 Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD, USA.


Abstract
Background: The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and the population size needed to have adequate statistical power. One way to circumvent this problem, which is critical for the study of rare diseases, is to study the molecular patterns emerging from functional studies of existing disease mutations. Current gene-centric analyses to study mutations in coding regions are limited by their inability to account for the functional modularity of the protein. Previous studies of the functional patterns of known human disease mutations have shown a significant tendency to cluster at protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to be disruptive of phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level.
Results: The results of this analysis reveal that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously revealed to be displayed by human disease mutations. We found over one hundred domain hotspots in yeast with approximately 50% in the exact same domain position as known human disease mutations.
Conclusions: We describe an analysis using protein domains as a framework for transferring functional information by studying domain hotspots in human and yeast and relating phenotypic changes in yeast to diseases in human. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.

H4-2: Yin and Yang of reciprocally scale-free biological networks between lethal genes and disease genes

Hyun Wook Han1,2,3, Jung Hun Ohn1,3, Jisook Moon2,* and Ju Han Kim1,3,*

1 Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110799, Korea
2 College of Medicine, CHA General Hospital, CHA University, Seoul 135081, Korea
3 Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea


Abstract
Biological networks often show a scale-free topology with node degree following a power-law distribution. Lethal genes tend to form functional hubs, whereas non-lethal disease genes are located at the periphery. Uni-dimensional analyses, however, are flawed. We created and investigated two distinct scale-free networks: a protein-protein interaction (PPI) network and a perturbation sensitivity network (PSN). The hubs of both networks exhibit a low molecular evolutionary rate (P < 8 × 10^-12, P < 2 × 10^-4) and a high codon adaptation index (P < 2 × 10^-16, P < 2 × 10^-8), indicating that both sets of hubs have been shaped under high evolutionary selective pressure. Moreover, the topologies of PPI and PSN are inversely proportional: hubs of PPI tend to be located at the periphery of PSN and vice versa. PPI hubs are highly enriched with lethal genes but not with disease genes, whereas PSN hubs are highly enriched with disease genes and drug targets but not with lethal genes. PPI hub genes are enriched with essential cellular processes, but PSN hub genes are enriched with environmental interaction processes, having more TATA boxes and transcription factor binding sites. It is concluded that biological systems may balance internal growth signaling and external stress signaling by unifying two scale-free networks that are seemingly opposite to each other but work in concert between death and disease.

H4-3: Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes

Heejoon Chae1, Jinwoo Park2,3, Seong-Whan Lee4, Kenneth P. Nephew5 and Sun Kim2,3,*

1 Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA
2 Department of Computer Science and Engineering, Bioinformatics Institute, Seoul National University, Seoul, Korea
3 Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
4 Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
5 Medical Sciences Program, Indiana University School of Medicine, Indiana University, Bloomington, IN, USA


Abstract
CpG islands are GC-rich regions often located at the 5′ end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techniques remains a major challenge in the epigenetics field. In this study, we conducted a comparative analysis of all CpG island sequences in 10 mammalian genomes. As sequence similarity methods and character composition techniques such as information theory are particularly difficult to conduct, we used exact patterns in CpG island sequences and single character discrepancies to identify differences in CpG island sequences. First, by calculating genome distance based on rank correlation tests, we show that k-mer and k-flank patterns around CpG sites can be used to correctly reconstruct the phylogeny of 10 mammalian genomes. Further, we used various machine learning algorithms to demonstrate that CpG island sequences can be characterized using k-mers. In addition, by testing a human model on the nine other mammalian genomes, we provide the first evidence that k-mer signatures are consistent with evolutionary history.
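The idea of a genome distance based on rank correlation of k-mer patterns can be sketched in a few lines of Python. The abstract names neither the exact rank statistic nor the definition of k-flank patterns, so the sketch below assumes Spearman correlation over plain k-mer counts and ignores k-flanks entirely. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant count vector has no defined correlation")
    return cov / (vx * vy) ** 0.5


def kmer_rank_distance(seq_a, seq_b, k=2):
    """1 - Spearman correlation of k-mer counts (assumed distance)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    ranks_a = average_ranks([counts_a[m] for m in kmers])
    ranks_b = average_ranks([counts_b[m] for m in kmers])
    return 1.0 - pearson(ranks_a, ranks_b)


# Hypothetical CpG-island-like fragments.
island_a = "ACGCGCGTTACGCGAT"
island_b = "ATATATATGGCCAT"
d_same = kmer_rank_distance(island_a, island_a)
d_ab = kmer_rank_distance(island_a, island_b)
d_ba = kmer_rank_distance(island_b, island_a)
```

A matrix of such pairwise distances between genomes could then be passed to any standard tree-building method; that step is outside the scope of this sketch.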

Highlight Research 5.

Room: Grand Ballroom B
Date: Friday, Oct. 4, 14:50 - 16:05
H5-1: Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures

Kelly Patrick Stanton1, Fabio Parisi1, Francesco Strino1, Neta Rabin2, Patrik Asp3 and Yuval Kluger1,4,*

1 Department of Pathology, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
2 Department of Exact Sciences, Afeka - Tel-Aviv Academic College of Engineering, Tel-Aviv 69107, Israel
3 Department Of Liver Transplant, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY 10467, USA
4 NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA


Abstract
Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein–chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.

H5-2: TrAp: a tree approach for fingerprinting subclonal tumor composition

Francesco Strino1, Fabio Parisi1, Mariann Micsinai1,2 and Yuval Kluger1,2,3,*

1 Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA
2 NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA
3 Yale Cancer Center, New Haven, CT 06520, USA


Abstract
Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches to de-mix a collective signal composed of the aberrations of a mixed cell population of a tumor sample into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to an Exome-Seq experiment of a renal cell carcinoma tumor sample and compared the mutational profile of the inferred subpopulations to the mutational profiles of single cells of the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.

H5-3: Variants Affecting Exon Skipping Contribute to Complex Traits

Younghee Lee1,*, Eric R. Gamazon1, Ellen Rebman1, Yeunsook Lee2, Sanghyuk Lee3, M. Eileen Dolan1, Nancy J. Cox1,*, Yves A. Lussier1,4,*

1 Department of Medicine, The University of Chicago, Chicago, Illinois, United States of America
2 Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America
3 Departments of Life Sciences, Ewha Womans University, Seoul, Korea
4 Departments of Medicine and of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America


Abstract
DNA variations affect alternative splicing, an overlooked mechanism present in 40% of complex human diseases. A commonly held hypothesis asserts that, in complex human traits, altered splicing patterns might be more important than expression changes in determining disease-associated risks. Furthermore, the therapeutic potential of using single nucleotide polymorphisms (SNPs) to cause alternative splicing of exons has been experimentally demonstrated in models of human disease. The precise mechanism by which SNPs regulate this process remains to be fully elucidated. In this study, we develop an integrative approach that utilizes sequence-based analysis and genome-wide expression profiling to identify genetic variations that may affect alternative splicing. We also provide the first proof of their enrichment among validated disease-associated variations. Our study provides insights into the functionality of these variations and emphasizes their importance for complex human traits and diseases.

Highlight Research 6.

Room: Grand Ballroom C
Date: Friday, Oct. 4, 14:50 - 16:05
H6-1: Statistical epistasis networks reduce the computational complexity of searching three-locus genetic models

Ting Hu1, Angeline S. Andrew2, Margaret R. Karagas2, Jason H. Moore3,*

1 Department of Genetics, Geisel School of Medicine Dartmouth College, Hanover, NH 03755, USA
2 Department of Community and Family Medicine, Geisel School of Medicine Dartmouth College, Hanover, NH 03755, USA
3 Institute for Quantitative Biomedical Sciences Departments of Genetics and Community and Family Medicine, Geisel School of Medicine Dartmouth College, Hanover, NH 03755, USA


Abstract
The rapid development of sequencing technologies makes thousands to millions of genetic attributes available for testing associations with various biological traits. Searching this enormous high-dimensional data space imposes a great computational challenge in genome-wide association studies. We introduce a network-based approach to supervise the search for three-locus models of disease susceptibility. Such statistical epistasis networks (SEN) are built using strong pairwise epistatic interactions and provide a global interaction map to search for higher-order interactions by prioritizing genetic attributes clustered together in the networks. Applying this approach to a population-based bladder cancer dataset, we found a high susceptibility three-way model of genetic variations in DNA repair and immune regulation pathways, which holds great potential for studying the etiology of bladder cancer with further biological validations. We demonstrate that our SEN-supervised search is able to find a small subset of three-locus models with significantly high associations at a substantially reduced computational cost.
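The computational saving described in this abstract can be illustrated with a small Python sketch. It assumes, for illustration only, that a candidate three-locus model is any triple of attributes that forms a connected subgraph of the pairwise interaction network. The abstract says only that attributes "clustered together" are prioritized, so this criterion, the function name and the toy SNP labels are hypothetical.

```python
from itertools import combinations
from math import comb


def connected_triples(edges):
    """Return all 3-node sets that are connected in an undirected network.
    A 3-node subgraph is connected exactly when one node is adjacent to
    both of the others, so each node is tried as the centre of a path."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    triples = set()
    for centre, neighbours in adjacency.items():
        for u, v in combinations(sorted(neighbours), 2):
            triples.add(tuple(sorted((centre, u, v))))
    return triples


# Hypothetical network of strong pairwise interactions among six SNPs.
strong_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
candidates = connected_triples(strong_pairs)

# Exhaustive search over the same six SNPs for comparison.
n_snps = len({s for pair in strong_pairs for s in pair})
n_exhaustive = comb(n_snps, 3)
```

In this toy network only 2 of the 20 possible triples would be evaluated; with thousands of attributes the gap between the guided and exhaustive searches grows much larger, since the exhaustive count grows cubically.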

H6-2: PhenDisco (Phenotype Discoverer): a New Information Retrieval System for the database of Genotypes and Phenotypes

Hyeoneui Kim1, Son Doan1, Lucila Ohno-Machado1

1 Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA

Abstract
The database of Genotypes and Phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) contains information on phenotypes, genotypes and study protocols from various Genome Wide Association Studies (GWAS). Although dbGaP is a critical resource that can facilitate new exploratory research or cross-study validation, the lack of standardization in the way phenotype information is presented is a major barrier to accurate and complete retrieval of studies with a phenotype of interest. As a solution to this challenge, we developed an NLP-based process to standardize phenotype variables and an information retrieval tool that processes user queries and displays the results in order of relevance. These processes were implemented in PhenDisco. In a preliminary evaluation, PhenDisco showed better retrieval performance than dbGaP, as well as superior acceptance by users, showing that it fills an important gap in this area.

H6-3: Phenome-Wide Association Study (PheWAS) for Detection of Pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network

Sarah A. Pendergrass1, Kristin Brown-Gentry2, Scott Dudek2, Alex Frase1, Eric S. Torstenson2, Robert Goodloe2, Jose Luis Ambite3, Christy L. Avery4, Steve Buyske5,6, Petra Bůžková7, Ewa Deelman3, Megan D. Fesinmeyer8, Christopher A. Haiman9, Gerardo Heiss4, Lucia A. Hindorff10, Chu-Nan Hsu3, Rebecca D. Jackson11, Charles Kooperberg8, Loic Le Marchand12, Yi Lin8, Tara C. Matise5, Kristine R. Monroe9, Larry Moreland13, Sungshim L. Park12, Alex Reiner8,14, Robert Wallace15, Lynn R. Wilkens12, Dana C. Crawford2,16, Marylyn D. Ritchie1,*

1 Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
2 Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
3 Information Sciences Institute, University of Southern California, Marina del Rey, California, United States of America
4 Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
5 Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
6 Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
7 Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
8 Division of Public Health, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
9 Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
10 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
11 Ohio State University, Columbus, Ohio, United States of America
12 Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
13 University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
14 Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
15 Departments of Epidemiology and Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
16 Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America


Abstract
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p < 0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype-phenotype associations, 26 represented phenotypes closely related to previously known genotype-phenotype associations, and 33 represented potentially novel genotype-phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA.
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
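
The cross-site criterion described in this abstract (significance in two or more study sites, consistent direction of effect, same racial/ethnic group, SNP, and phenotype-class) can be sketched as a simple filter. This is only an illustration under assumptions: the record layout, field names, placeholder SNP labels, and all numeric values below are invented for the example and do not come from the PAGE data.

```python
from collections import defaultdict


def replicated_associations(results, p_threshold=0.01, min_sites=2):
    """Return (group, snp, phenotype_class) keys that are significant in at least
    `min_sites` distinct sites with the same direction of effect."""
    grouped = defaultdict(list)
    for r in results:
        if r["p"] < p_threshold:
            grouped[(r["group"], r["snp"], r["phenotype_class"])].append(r)

    replicated = []
    for key, hits in grouped.items():
        for sign in (1, -1):
            # Distinct sites whose effect estimate points in this direction.
            sites = {h["site"] for h in hits if h["beta"] * sign > 0}
            if len(sites) >= min_sites:
                replicated.append(key)
                break
    return replicated


# Made-up toy records (hypothetical field names and values).
toy_results = [
    {"site": "site1", "group": "EA", "snp": "snpA", "phenotype_class": "lipid", "beta": 0.20, "p": 0.001},
    {"site": "site2", "group": "EA", "snp": "snpA", "phenotype_class": "lipid", "beta": 0.10, "p": 0.005},
    {"site": "site1", "group": "AA", "snp": "snpB", "phenotype_class": "glucose", "beta": 0.30, "p": 0.002},
    {"site": "site2", "group": "AA", "snp": "snpB", "phenotype_class": "glucose", "beta": -0.20, "p": 0.004},
    {"site": "site1", "group": "EA", "snp": "snpC", "phenotype_class": "bmi", "beta": 0.10, "p": 0.003},
    {"site": "site2", "group": "EA", "snp": "snpC", "phenotype_class": "bmi", "beta": 0.10, "p": 0.200},
]

print(replicated_associations(toy_results))
```

Only the first toy association passes: the second is significant in two sites but with opposite directions of effect, and the third is significant in a single site.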

ISCB Scientific Session

Room: Grand Ballroom C
Date: Friday, Oct. 4, 08:30 - 10:10
I-1: Balanced Nucleo-cytosolic Partitioning Defines a Spatial Network for Coordination of Circadian Physiology in Plants

Daehee Hwang, POSTECH, Korea

Abstract
Biological networks consist of a defined set of regulatory motifs. Subcellular compartmentalization of regulatory molecules can provide a further dimension in implementing regulatory motifs. However, spatial regulatory motifs and their roles in biological networks have rarely been explored. Here, we show, using experimentation and mathematical modeling, that spatial segregation of GIGANTEA (GI), a critical component of plant circadian systems, into nuclear and cytosolic compartments leads to differential functions as positive and negative regulators of the circadian core gene, LHY, forming an incoherent feedforward loop to regulate LHY. This regulatory motif formed by nucleo-cytoplasmic partitioning of GI confers, through the balanced operation of the nuclear and cytosolic GI, strong rhythmicity and robustness to external and internal noise on the circadian system. Our results show that spatial and functional segregation of a single molecule species into different cellular compartments provides a unique means to extend the regulatory capabilities of biological networks.

I-2: NGS sequence analysis for regulation and epigenomics

Tim Bailey, University of Queensland, Australia


I-3: Revisiting statistical significance for finding combinatorial effects.

Jun Sese, Tokyo Institute of Technology, Japan

Abstract
To understand complex associations between genotypes and phenotypes, a first step is to list statistically significant combinations of the features. However, the discovery is not only computationally non-trivial but also extremely unlikely due to multiple testing correction. The exponential growth of the number of tests forces us to set a strict limit on the maximum arity. In this talk, we introduce an efficient branch-and-bound algorithm named Limitless Arity Multiple testing Procedure (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any arity limit, while the family-wise error rate is rigorously controlled under the threshold. We applied LAMP to the discovery of combinatorial regulations of transcription factors. From human breast cancer transcriptome data, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may help uncover pathways regulated in a coordinated fashion and find hidden associations in heterogeneous data.

I-4: An integrative characterization of recurrent molecular aberrations in glioblastoma genomes

Chen-Hsiang Yeang, Academia Sinica, Taiwan

Abstract
Glioblastoma multiforme (GBM) is the most common and malignant primary brain tumor in adults. Decades of investigations and the recent effort of the Cancer Genome Atlas (TCGA) project have mapped many molecular alterations in GBM cells. Alterations on DNAs may dysregulate gene expressions and drive malignancy of tumors. It is thus important to uncover causal and statistical dependency between "effector" molecular aberrations and "target" gene expressions in GBMs. A rich collection of prior studies attempted to combine copy number variation (CNV) and mRNA expression data. However, systematic methods to integrate multiple types of cancer genomic data -- gene mutations, single nucleotide polymorphisms, copy number variations, DNA methylations, mRNA and microRNA expressions, and clinical information -- are relatively scarce.
We proposed an algorithm to build "association modules" linking effector molecular aberrations and target gene expressions and applied the module-finding algorithm to the integrated TCGA GBM datasets. The inferred association modules were validated by six tests using external information and datasets of central nervous system tumors: (1) indication of prognostic effects among patients, (2) coherence of target gene expressions, (3) retention of effector-target associations in external datasets, (4) recurrence of effector molecular aberrations in GBM, (5) functional enrichment of target genes, and (6) co-citations between effectors and targets. Modules associated with well-known molecular aberrations of GBM -- such as chromosome 7 amplifications, chromosome 10 deletions, EGFR and NF1 mutations -- passed the majority of the validation tests. Furthermore, several modules associated with less well-reported molecular aberrations -- such as chromosome 11 CNVs, CD40, PLXNB1 and GSTM1 methylations, and mir-21 expressions -- were also validated by external information. In particular, modules constituting trans-acting effects with chromosome 11 CNVs and cis-acting effects with chromosome 10 CNVs manifested strong negative and positive associations with survival times in brain tumors. By aligning the information of association modules with the established GBM subclasses based on transcription or methylation levels, we found each subclass possessed multiple concurrent molecular aberrations. Furthermore, the joint molecular characteristics derived from 16 association modules had prognostic power not explained away by the strong biomarker of CpG island methylator phenotypes. Functional and survival analyses indicated that immune/inflammatory responses and epithelial-mesenchymal transitions were among the most important determining processes of prognosis.
Finally, we demonstrated that certain molecular aberrations uniquely recurred in GBM but were relatively rare in non-GBM glioma cells. These results justify the utility of an integrative analysis on cancer genomes and provide testable characterizations of driver aberration events in GBM.
