KEYNOTE PRESENTATIONS

Nancy J. Cox

Professor and Section Chief,
Section of Genetic Medicine, Department of Medicine
and Dept. of Human Genetics,
U. of Chicago, IL, U.S.A.


New Approaches to Understanding the Genetic Component to Common Human Disease

Although genome-wide association studies (GWAS) have enabled us to identify many new loci with highly significant and reproducible associations to common diseases and related quantitative traits, these discoveries have not yet given us much new understanding of the biology underlying disease, nor enabled us to develop accurate predictive risk models. In this talk I will describe a new approach to characterizing the genetic component of common diseases with complex inheritance that promises both a more comprehensive understanding of the biological basis of disease and practical utility for predicting risk. Examples of the application of this approach to data on such disparate complex traits as bipolar disorder, schizophrenia, type 2 diabetes and autism illustrate its value well, and demonstrate that it can be applied equally well to data generated through array genotyping or next-generation sequencing.

Trey Ideker

Division Chief of Genetics
Professor, Depts. of Medicine and Bioengineering
UC San Diego, CA, U.S.A.


Turning Protein Networks into Ontologies

Ontologies have been very useful for capturing knowledge as a hierarchy of concepts and their interrelationships. In biology, a prime challenge has been to develop ontologies of gene function given only partial biological knowledge and inconsistency in how this knowledge is curated by experts. I will present a method by which large networks of gene and protein interactions, as are being mapped systematically for many species, can be transformed to assemble an ontology with coverage and power equivalent to the manually curated Gene Ontology (GO). The network-extracted ontology contains 4,123 biological concepts and 5,766 relations, capturing the majority of known cellular components as well as many additional concepts, triggering subsequent updates to GO. Using genetic interaction profiling, we provide further support for novel concepts related to protein trafficking, including a link between Nnf2 and YEL043W. This work enables a shift from using ontologies to evaluate data to using data to construct and evaluate ontologies.
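
To make the idea concrete, here is a minimal sketch (not the method presented in the talk) of how a gene interaction network can be turned into nested, term-like clusters by agglomerative clustering of interaction profiles; the gene names and edges are toy data.

```python
# Minimal sketch (not the method from the talk): derive nested, term-like
# gene clusters from an interaction network by agglomerative clustering of
# interaction profiles. Gene names and edges are toy data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
adj = np.array([                       # symmetric 0/1 interaction matrix
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=bool)

# Genes sharing interaction partners get small Jaccard distances.
tree = linkage(pdist(adj, metric="jaccard"), method="average")

# Each cut of the tree yields one level of nested "terms".
for k in (2, 3):
    labels = fcluster(tree, t=k, criterion="maxclust")
    terms = {c: [g for g, lab in zip(genes, labels) if lab == c]
             for c in sorted(set(labels))}
    print(f"{k} terms:", terms)
```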

Takashi Gojobori

Vice-Director of the National Institute of Genetics (NIG)
Professor at the Center for Information Biology and DNA Data Bank of Japan (DDBJ), NIG, Mishima, Japan



Big Data Needs Good Tools: Translational Bioinformatics in the Cell Innovation Project

Next-generation sequencing (NGS) technologies are rapidly changing the paradigm of genomic science. First, a huge amount of nucleotide sequence data now comes from medical institutions such as university medical schools and even city hospitals, rather than from basic-science laboratories. Second, obtaining appropriate DNA or RNA samples in a timely manner for a given condition has become more crucial than access to expensive sequencing machines, because sequencing itself is no longer a limiting factor of genomic research in terms of either time or cost. Third, as almost all targets become sequence-based, we must deal not only with SNPs but also with other types of variation such as CNVs, indels, and other DNA rearrangements; this may urge us to change the present course of GWAS, for example. Fourth and finally, the development of powerful and accurate bioinformatics tools, as well as the construction of appropriate databases, has become essential for analyzing so-called Big Data and producing significant outcomes. This paradigm change should be emphasized all the more when we focus on translational medical research. In Japan, we conduct the research and development of NGS-based bioinformatics tools under the name of the Cell Innovation Project in collaboration with RIKEN. I will present the current progress of this project, with special reference to translational bioinformatics.
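
As a small illustration of the third point (handling indels and other variants alongside SNPs), the following sketch tallies variant classes in a VCF file. It is not part of the Cell Innovation Project tools, and the input file name is hypothetical.

```python
# Minimal sketch: count SNVs vs. indels in a (simplified, uncompressed) VCF.
# "variants.vcf" is a hypothetical input file.
from collections import Counter

def classify(ref: str, alt: str) -> str:
    """Very rough classification by allele length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # MNVs, symbolic alleles, etc.

counts = Counter()
with open("variants.vcf") as vcf:
    for line in vcf:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            counts[classify(ref, alt)] += 1

print(dict(counts))
```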

Maricel Kann

Assistant Professor,
Depts. of Biological Sciences and Computational Sciences and Engineering
University of Maryland Baltimore County, MD, U.S.A.


A Protein-Domain Approach for the Analysis of Disease Mutations

Identifying the functional context for key molecular disruptions in complex diseases is a major goal of modern medicine that will lead to earlier diagnosis and more effective personalized therapies. Most available resources for visualization and analysis of disease mutations center on gene-level analysis and do not leverage information about the functional context of the mutation. In addition, these gene-centric approaches are confounded by the fact that gene products (proteins) may share some functional sub-units, or protein domains, but not others. I will describe a resource for domain mapping of disease mutations, DMDM, a protein domain database developed by our group in which each disease mutation is aggregated and displayed by its protein domain location. We have also developed a methodology using domain significance scores (DS-Scores) to detect statistically significant disease mutation clusters at the protein domain level. When we applied the DS-Scores to human data, we identified domain hotspots in oncogenes and tumor suppressors, as well as in genes associated with Mendelian diseases. In addition, I will describe recent work on analyzing cancer somatic mutations from individual cancer patient genomes. We found that incorporating information about the classification of proteins and protein sites leads to new hypotheses regarding the role of tumor somatic mutations in cancer. Our analysis confirms that the domain-centric approach creates a framework for leveraging structural genomics and evolution into the analysis of disease mutations.
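
The abstract does not give the DS-Score formula; the sketch below illustrates only the general idea of domain-level enrichment (are more disease mutations observed in a domain than its length alone would predict?) using a simple binomial test with toy numbers.

```python
# Minimal sketch of the general idea behind domain-level mutation enrichment
# (not the published DS-Score formula): test whether a domain accumulates
# more disease mutations than expected from its length alone.
from scipy.stats import binomtest

def domain_enrichment(muts_in_domain: int, muts_in_protein: int,
                      domain_len: int, protein_len: int) -> float:
    """P-value that the domain holds >= the observed share of mutations."""
    expected_rate = domain_len / protein_len
    return binomtest(muts_in_domain, muts_in_protein,
                     expected_rate, alternative="greater").pvalue

# Toy example: 12 of 15 mutations fall in a domain covering 20% of the protein.
print(domain_enrichment(12, 15, 100, 500))
```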

Jason Moore

Professor of Genetics, Professor of Community and Family Medicine
Director of the Institute for Quantitative Biomedical Sciences
Director of the Graduate Program in Quantitative Biomedical Sciences
Associate Director for Bioinformatics, Norris Cotton Cancer Center
Editor-in-Chief, BioData Mining


Computational Intelligence Strategies for Embracing the Complexity of Genetic Architecture

Given infinite time, a human would model complex data in a manner dependent on prior knowledge of the domain, of computer science and statistics, and on prior experience working with other data. For example, a human modeler interested in identifying genetic risk factors for type II diabetes might start by examining insulin metabolism genes. We will review extensions and enhancements to an artificial intelligence-based computational evolution system (CES) whose ultimate objective is to tinker with data as a human would. The key to CES is its ability to identify and exploit expert knowledge from biological databases or prior analytical results. Our prior studies have demonstrated that CES can efficiently navigate large and rugged fitness landscapes toward the discovery of biologically meaningful genetic models of disease predisposition.
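
As a hedged illustration of the underlying principle (not the CES implementation itself), the sketch below biases which SNPs enter candidate models using expert-knowledge weights and evolves a model with a simple mutate-and-select loop; all names, weights, and the fitness function are made up.

```python
# Minimal sketch of the principle only (not the CES implementation):
# expert-knowledge weights bias which SNPs enter candidate models,
# and a simple mutate-and-select loop evolves the model.
import random

random.seed(0)

snps = [f"snp{i}" for i in range(20)]              # hypothetical SNP names
expert_weight = {s: 5.0 if i < 5 else 1.0          # e.g. pathway membership
                 for i, s in enumerate(snps)}

def biased_pick(k=1):
    """Sample k SNPs with probability proportional to expert weight."""
    return random.choices(snps, weights=[expert_weight[s] for s in snps], k=k)

def fitness(model):
    """Placeholder: a real system would score the model against genotype and
    phenotype data (e.g. cross-validated accuracy)."""
    return sum(expert_weight[s] for s in set(model)) + random.random()

model = biased_pick(3)
best = fitness(model)
for _ in range(200):                               # simple (1+1) evolution loop
    child = model[:]
    child[random.randrange(len(child))] = biased_pick()[0]  # point mutation
    f = fitness(child)
    if f >= best:
        model, best = child, f

print("best model:", sorted(set(model)), "fitness:", round(best, 2))
```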

Jessica Tenenbaum

Associate Director for Bioinformatics
Duke Translational Medicine Institute Biomedical Informatics Core
Duke University, NC, U.S.A.



Informatics to enable precision medicine: achievements, obstacles and opportunities

The field of translational bioinformatics (TBI) is at an exciting stage of progression. The past 5-10 years have seen the establishment of TBI as a widely recognized discipline unto itself, and the launch of a number of large-scale initiatives that TBI has enabled. A recent report from the National Academies describes how the recent explosion of molecular data, coupled with clinical data on actual patients, holds the potential to define an entirely new taxonomy of disease. In this new taxonomy, disease would be classified not solely by macroscopic symptoms, many of which have been observed for centuries, but rather by underlying molecular and environmental causes. This paradigm shift, enabled by novel methods for the generation, storage, analysis, and visualization of "big data" in biology and medicine, promises to do nothing short of rewriting the textbook of medicine moving forward. It will change the way we approach biomedical research and practice across the spectrum of scale, from molecules to populations. As technology continues to advance, assay costs continue to decrease, and methods are further refined, the next decade is likely to feature increasingly pervasive examples of applied translational bioinformatics, both in healthcare and in other areas of day-to-day life. In this talk I will highlight success stories and outstanding achievements in, or enabled by, translational bioinformatics. I will describe some important caveats and obstacles we face in this rapidly advancing field, as well as some ideas on how to address those hurdles. Finally, I will explore some of the tremendous opportunities we face in the years ahead.

Olga Troyanskaya

Associate Professor, Lewis-Sigler Institute for Integrative Genomics 
and Department of Computer Science,
Princeton University, Princeton, NJ, U.S.A.


Understanding complex human disease through cell-lineage specific networks

The ongoing explosion of new technologies in functional genomics offers the promise of understanding gene function, interactions, and regulation at the systems level. This should enable us to develop comprehensive descriptions of genetic systems of cellular controls, including those whose malfunctioning becomes the basis of genetic disorders, such as cancer, and others whose failure might produce developmental defects in model systems. However, the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it on a systems level, and apply it to the study of specific pathways or genetic disorders. These challenges are further exacerbated by the biological complexity of metazoans, including diverse biological processes, individual tissue types and cell lineages, and by the increasingly large scale of data in higher organisms. I will describe how we address these challenges through the development of bioinformatics frameworks for the study of gene function and regulation in complex biological systems and through close coupling of these methods with experiments, thereby contributing to understanding of human disease. I will specifically discuss how integrated analysis of functional genomics data can be leveraged to study cell-lineage specific gene expression, to identify proteins involved in disease in a way complementary to quantitative genetics approaches, and to direct both large-scale and traditional biological experiments.
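
One common way to integrate heterogeneous functional genomics data into a single probability that a gene pair is functionally related in a given tissue is a naive Bayes combination of per-dataset likelihood ratios. The sketch below assumes that scheme and uses toy numbers; it is not necessarily the framework described in the talk.

```python
# Minimal sketch of one common integration scheme (naive Bayes over
# per-dataset likelihood ratios); not necessarily the framework in the talk.
import math

def integrate(prior_odds, likelihood_ratios):
    """Posterior probability that a gene pair is functionally related in a
    given tissue, combining independent evidence sources multiplicatively."""
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Toy example: sparse prior; coexpression weakly supportive, a tissue-specific
# physical interaction strongly supportive, a third dataset uninformative.
print(round(integrate(0.01, [2.0, 15.0, 1.0]), 3))   # ~0.231
```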

Naomichi Matsumoto

Professor, Dept. of Human Genetics, 
Yokohama City University Graduate School of Medicine, Yokohama, Japan.


Exome sequencing in Mendelian disorders

Disease-related genome analysis (DGA) has developed and become more sophisticated together with technological advances. The advent and frequent updating of next-generation sequencers (NGS) provide the accuracy required for mutation analysis and push DGA into new stages. We now use the Illumina Genome Analyzer (GA) IIx and HiSeq 2000, which can produce as much as 60 Gb and 600 Gb of sequence in one run, respectively. To focus on genes, we utilize exon capture methods such as SureSelect (Agilent). The current NGS protocol uses 100-108-bp paired-end reads and usually produces 8-9 Gb of sequence per sample, which is sufficient for analysis of the whole exome: 90% of exome bait regions are covered by 8-10 reads or more. Sequences are aligned using MAQ, BWA, Novoalign, or the commercial NextGENe software, all of which can extract nucleotide changes and small insertions/deletions. The most critical step is the priority scheme for selecting variants. We have successfully identified culprit mutations in several Mendelian diseases. I will present the procedures used in our projects, including Coffin-Siris syndrome and others.
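
As an illustration of the variant prioritization step (not the actual pipeline described in the talk), the sketch below applies a typical Mendelian-disease first-pass filter: keep rare, protein-altering variants not seen in the parents. The records and field names are hypothetical.

```python
# Minimal sketch of a typical Mendelian-exome prioritization step (not the
# actual pipeline described in the talk); records and field names are
# hypothetical.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}

def prioritize(variants, max_pop_freq=0.001):
    """First-pass filter: keep rare, protein-altering variants that are
    absent from both parents (consistent with a de novo cause)."""
    for v in variants:
        if (v["pop_freq"] <= max_pop_freq
                and v["effect"] in DAMAGING
                and not v["in_parents"]):
            yield v

variants = [  # toy records
    {"gene": "GENE1", "effect": "nonsense", "pop_freq": 0.0,  "in_parents": False},
    {"gene": "GENE2", "effect": "missense", "pop_freq": 0.02, "in_parents": True},
]
print([v["gene"] for v in prioritize(variants)])   # ['GENE1']
```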

Steven E. Brenner

Professor, Depts. of Plant and Microbial Biology and Molecular and Cell Biology
Affiliated Associate Professor, Dept. of Bioengineering
UC Berkeley, CA, U.S.A.


Ultraconserved nonsense: gene regulation by alternative splicing & RNA surveillance

Nonsense-mediated mRNA decay (NMD) is a cellular RNA surveillance system that recognizes transcripts with premature termination codons and degrades them. Using RNA-Seq, we discovered large numbers of natural alternative splice forms that appear to be targets for NMD. This coupling of alternative splicing and RNA surveillance can be used as a means of gene regulation. We found that all conserved members of the human SR family of splice regulators have an "unproductive" alternative mRNA isoform targeted for NMD degradation. Preliminary data suggest that this is used to create a network of auto- and cross-regulation of splice factors. Strikingly, the splice pattern for each SR protein is shared with mouse, and each alternative splice is associated with an ultraconserved or highly conserved region of ~100 or more nucleotides of perfect identity between human and mouse, amongst the most conserved regions in these genomes. Further, we recently discovered that the most ancient known alternative splicing event is in this family and creates an alternative transcript degraded by NMD. Despite conservation since the pre-Cambrian, when the genes duplicate they change their regulation, so that nearly every human SR gene has its own distinctive sequences for unproductive splicing. As a result, this elaborate mode of gene regulation has ancient origins and can involve exceptionally conserved sequences, yet after gene duplication it evolves swiftly and often.
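
A widely used heuristic for flagging putative NMD targets is the "50-nucleotide rule": a termination codon lying more than about 50 nt upstream of the last exon-exon junction marks the transcript for decay. The sketch below implements that rule in transcript coordinates with a made-up isoform; it is an illustration, not the detection pipeline used in this study.

```python
# Minimal sketch of the widely used "50-nucleotide rule" for flagging
# putative NMD targets; coordinates are in transcript (mRNA) space and
# the example isoform is made up.
def is_predicted_nmd_target(stop_codon_end, exon_junctions, rule_nt=50):
    """True if the stop codon lies more than `rule_nt` nucleotides
    upstream of the last exon-exon junction."""
    if not exon_junctions:
        return False                        # single-exon transcript
    last_junction = max(exon_junctions)
    return (last_junction - stop_codon_end) > rule_nt

# Toy isoform: stop codon ends at position 900; junctions at 400, 700, 1050.
print(is_predicted_nmd_target(900, [400, 700, 1050]))   # True (150 nt upstream)
```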


Yi-Xue Li

Professor and Chairman, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University
Director, Shanghai Center for Bioinformation Technology
Vice Director, Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
Director, Shanghai Society for Bioinformatics





Dynamic conservation of gene co-expression and oncogene deciphering
Liyun Yuan, Guohui Ding, Y. Eugene Chen, Zhe Chen, Yixue Li

Gene expression profiling of patients provides much biological information for deciphering oncogenes. Traditional methods, such as Student's t-test and clustering methods, identify differentially expressed genes by comparing adjacent disease stages. These methods take no account of the time-conservation features and cooperative properties of gene signatures across all disease stages, and may yield a high false-positive rate in finding disease-related genes. Some newer methods, such as multiclass ordinal analyses, were developed to identify genes involved in cancer development by extracting consistently increasing or decreasing gene expression signatures in consideration of global changes in gene expression. Because gene expression profiling data are complicated and heterogeneous, mining disease-related genes remains a challenge; indeed, applying different methods to the same gene expression data rarely gives consistent results. In light of this, we developed an algorithm that handles gene expression data in consideration of time-series conservation properties. In our method, any specifically expressed gene can be ranked with a time-conservation score that evaluates its importance in cancer progression and development. Compared with current methods, our algorithm can effectively and precisely identify functional gene sets by evaluating the global conservation properties of gene expression signatures. Using this approach, a total of 480 genes in 29 clusters were obtained, only 8 percent of which can be identified by other studies. In a case study, 2 clusters were randomly selected and 9 genes were carefully annotated. All of these genes showed strong functional links with carcinoma occurrence, and they form a small gene regulatory network mediated by P53, c-Myc, Sp1, IRF1, etc. Thus, to some extent, our evolutionary-conservation-based methodology compensates for the inherent weaknesses of current statistical methods and provides a new way to analyze dynamic gene expression profiles.
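
The exact time-conservation score is not given in the abstract; the sketch below shows one simple way to capture "consistently increasing or decreasing across stages", scoring each gene by the rank correlation of its expression with stage order, on toy data.

```python
# Minimal sketch of one way to score "time-conservative" behavior across
# disease stages (rank correlation with stage order); not the exact
# algorithm of the abstract. Expression values are toy data.
from scipy.stats import spearmanr

stages = [1, 2, 3, 4, 5]                      # ordered disease stages
expression = {
    "geneA": [1.0, 1.4, 2.1, 2.8, 3.5],       # steadily increasing
    "geneB": [2.2, 1.9, 2.4, 2.0, 2.1],       # no consistent trend
}

scores = {g: spearmanr(stages, vals)[0] for g, vals in expression.items()}
for gene, score in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{gene}\tconservation score = {score:+.2f}")
```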