KEYNOTE PRESENTATIONS

Olivier Lichtarge, MD, PhD

Cullen Chair and Professor, Molecular and Human Genetics
Baylor College of Medicine, USA


Evolution vs Disease: From Big Data and Text Mining to Personalized Genomics

Computational integration is essential to translate the buildup of biological data and publications into meaningful knowledge. But the complexity, heterogeneity, and sheer mass of information are daunting. Here, we split this long-term goal into small, tractable steps. One step integrates gene interaction networks over hundreds of species to predict gene function, including a possible new target of a leading anti-malarial drug. Another step mines the literature into a network that it then reasons over, leading in a case study to the discovery of novel p53 kinases. These examples fuse structured and unstructured data into novel networks amenable to automated hypothesis generation. However, they still lack individual patient information. As a potential solution, we introduce an analytic model of evolution. This model describes the genotype-phenotype relationship in terms of perturbations in the fitness landscape. Mutational, clinical, and population genetic data show, respectively, that this approach predicts the effect of point mutations in diverse proteins, in vivo and in vitro; that it correlates disease-causing gene mutations with morbidity and mortality; and that it determines human coding polymorphism frequencies. Altogether, these studies point to an integrative network formalism that may soon reflect structured and unstructured data personalized to the relevant mutational variations of any individual. Diverse applications in biology and precision medicine should follow.


Olufunmilayo I. Olopade, MD, FACP

Walter L. Palmer Distinguished Service Professor of Medicine and Human Genetics
The University of Chicago, USA


Deploying Genomics and Immunology for Risk Assessment and Prevention

Breast cancer is no longer defined as a single disease but rather as a heterogeneous disease composed of distinct subtypes with varied molecular, clinical, and prognostic characteristics. In the era of Precision Medicine and the Cancer Moonshot, women at risk for the most aggressive forms of breast cancer can derive more benefit from innovative interventions that personalize risk assessment for early detection and optimize the use of molecularly targeted therapies to improve clinical outcomes. We are performing whole genome sequencing of breast cancer cases on the Illumina platform, with neoplastic and non-neoplastic tissues sequenced to average depths of 90x and 30x, respectively. To handle the computational burden inherent to large-scale sequencing analyses, we have developed SwiftSeq, a modular, highly parallel workflow for fast, efficient, and robust processing of DNA sequencing data. Following the Genome Analysis Toolkit's best practices, SwiftSeq is able to completely align, process, genotype, and annotate a 30x genome in approximately 36-40 hours. By scaling with compute resources, our framework can analyze hundreds of genomes in days rather than weeks. Gathering data to inform policy interventions for diverse populations of women with breast cancer is daunting. With continued sequencing, analysis, and comparison of tumor-normal genomes from The Cancer Genome Atlas, we will elucidate the unique characteristics of young-onset breast cancer genomes to determine which of these alterations may be amenable to novel approaches for therapy and primary prevention.


Yves A. Lussier, MD

University of Arizona, USA


Integrative genomics analyses unveil downstream biological effectors of disease-specific polymorphisms buried in intergenic regions

Functionally altered biological mechanisms arising from disease-associated polymorphisms remain difficult to characterise when those variants are intergenic, that is, when they fall between genes. We sought to identify shared downstream mechanisms by which inter- and intragenic single-nucleotide polymorphisms (SNPs) contribute to a specific physiopathology. Using computational modelling of 2 million pairs of disease-associated SNPs drawn from genome-wide association studies (GWAS), integrated with expression Quantitative Trait Loci (eQTL) and Gene Ontology functional annotations, we predicted 3,870 inter-inter and inter-intra SNP pairs with convergent biological mechanisms (FDR<0.05). These prioritised SNP pairs with overlapping messenger RNA targets or similar functional annotations were more likely to be associated with the same disease than with unrelated pathologies (OR>12). We additionally confirmed synergistic and antagonistic genetic interactions for a subset of prioritised SNP pairs in independent studies of Alzheimer's disease (entropy P=0.046), bladder cancer (entropy P=0.039), and rheumatoid arthritis (PheWAS case-control P<10^-4). Using ENCODE data sets, we further statistically validated that the biological mechanisms shared within prioritised SNP pairs are frequently governed by matching transcription factor binding sites and long-range chromatin interactions. These results provide a 'roadmap' of disease mechanisms emerging from GWAS and further identify candidate therapeutic targets among downstream effectors of intergenic SNPs.
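
As an illustration of how an FDR<0.05 cutoff over many candidate SNP pairs can be applied, the following minimal sketch uses the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure was used, so the choice of Benjamini-Hochberg, the function name, and the p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if it passes the FDR cutoff.

    Assumption: Benjamini-Hochberg step-up procedure; the abstract only
    reports "FDR<0.05" and does not name the method.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    keep = set(order[:cutoff_rank])
    return [i in keep for i in range(m)]


# Hypothetical p-values for five candidate SNP pairs.
example_p = [0.001, 0.6, 0.008, 0.041, 0.039]
selected = benjamini_hochberg(example_p)
```

In this toy input only the two smallest p-values pass, because the third smallest (0.039) exceeds its rank-scaled threshold of 0.03.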


Nigam Shah, MBBS, PhD

Associate Professor of Medicine, Biomedical Informatics Research
Stanford University School of Medicine, CA, USA


Using Electronic Health Records for Translational Science and Better Patient Care

In the era of Electronic Health Records (EHRs), it is possible to examine the outcomes of decisions made by doctors during clinical practice to identify patterns of care, generating evidence from the collective experience of patients. We will discuss methods that transform unstructured EHR data into a de-identified, temporally ordered, patient-feature matrix. We will review use cases that apply the resulting de-identified data to pharmacovigilance, drug repositioning, predictive modeling, and comparative effectiveness studies in a learning health system.
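
A minimal sketch of what a temporally ordered patient-feature matrix might look like is given below. It assumes that concept extraction and de-identification have already happened upstream, so each input event is a hypothetical `(patient_id, day_offset, feature)` tuple; the function name and the example features are illustrative and are not taken from the talk.

```python
from collections import defaultdict


def build_patient_feature_matrix(events):
    """Build a matrix of first-mention days from de-identified events.

    events: iterable of (patient_id, day_offset, feature) tuples, where
    day_offset is days since the patient's first record (an assumed
    de-identification convention).
    Returns (patients, features, matrix); matrix[i][j] is the earliest
    day on which features[j] was seen for patients[i], or None.
    """
    first_seen = defaultdict(dict)
    for patient, day, feature in events:
        previous = first_seen[patient].get(feature)
        if previous is None or day < previous:
            first_seen[patient][feature] = day
    patients = sorted(first_seen)
    features = sorted({f for seen in first_seen.values() for f in seen})
    matrix = [[first_seen[p].get(f) for f in features] for p in patients]
    return patients, features, matrix


# Hypothetical events: a drug mention followed by a symptom mention.
events = [
    ("p1", 0, "metformin"),
    ("p1", 30, "nausea"),
    ("p1", 10, "nausea"),
    ("p2", 5, "nausea"),
]
patients, features, matrix = build_patient_feature_matrix(events)
```

Storing the first-mention day rather than a plain 0/1 flag keeps the temporal order, which is what allows downstream questions such as whether a drug exposure preceded an adverse event.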


Lars Juhl Jensen, PhD

Professor, University of Copenhagen, Denmark


Mini-biography of Lars Juhl Jensen

Lars Juhl Jensen started his research career in Søren Brunak's group at the Technical University of Denmark (DTU), where he received his Ph.D. degree in bioinformatics in 2002 for his work on non-homology-based protein function prediction. During this time, he also developed methods for visualization of microbial genomes, pattern recognition in promoter regions, and microarray analysis. From 2003 to 2008, he was at the European Molecular Biology Laboratory (EMBL), where he worked on literature mining, integration of large-scale experimental datasets, and analysis of biological interaction networks. Since 2009, he has continued this line of research as a professor at the Novo Nordisk Foundation Center for Protein Research at the Panum Institute in Copenhagen and as a founder, owner, and scientific advisor of Intomics A/S. He is a co-author of more than 150 scientific publications that have in total received more than 15,000 citations. He was awarded the Lundbeck Foundation Talent Prize in 2003; his work on cell-cycle research was named "Break-through of the Year" in 2006 by the magazine Ingeniøren; his work on text mining won the first prize in the "Elsevier Grand Challenge: Knowledge Enhancement in the Life Sciences" in 2009; and he was awarded the Lundbeck Foundation Prize for Young Scientists in 2010.

Medical data and text mining: Linking diseases, drugs, and adverse reactions

Clinical data describing the phenotypes and treatment of patients is an underused data source that has much greater research potential than is currently realized. Mining of electronic health records (EHRs) has the potential for revealing unknown disease correlations and for improving post-approval monitoring of drugs for adverse drug reactions. In my presentation I will introduce the centralized Danish health registries and show how we use them for identification of temporal disease correlations and discovery of common diagnosis trajectories of patients. I will also describe how we perform text mining of the clinical narrative from electronic health records and use this for identification of new adverse reactions of drugs.
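
One simple way to look for temporal disease correlations is to count, across patients, how often a first diagnosis of one disease precedes a first diagnosis of another. The sketch below illustrates only that counting step; it is not the method used on the Danish registries, and the data layout, function name, and the ICD-10 codes chosen as examples are assumptions.

```python
from collections import Counter
from itertools import combinations


def directed_pair_counts(histories):
    """Count patients in whom diagnosis A was first seen before B.

    histories: dict mapping patient_id -> list of (day, diagnosis_code).
    Returns a Counter keyed by (earlier_code, later_code).
    """
    counts = Counter()
    for visits in histories.values():
        first_day = {}
        # Sorting by day means setdefault keeps the earliest occurrence.
        for day, code in sorted(visits):
            first_day.setdefault(code, day)
        for (a, day_a), (b, day_b) in combinations(first_day.items(), 2):
            if day_a < day_b:
                counts[(a, b)] += 1
            elif day_b < day_a:
                counts[(b, a)] += 1
            # Same-day first diagnoses carry no direction and are skipped.
    return counts


# Hypothetical histories using ICD-10 codes E11 and I10 as examples.
histories = {
    "p1": [(1, "E11"), (5, "I10")],
    "p2": [(9, "I10"), (2, "E11"), (3, "E11")],
    "p3": [(4, "I10"), (8, "E11")],
}
pair_counts = directed_pair_counts(histories)
```

Directed pairs whose counts are strongly asymmetric could then be chained into candidate diagnosis trajectories, subject to a proper statistical test that this sketch does not include.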

 


Katsuya Tsuchihara, MD, PhD

Division of Translational Genomics,
Exploratory Oncology Research and Clinical Trial Center,
National Cancer Center Japan


Data storage and sharing in SCRUM-Japan: a nation-wide cancer genome screening project for drug development

SCRUM-Japan is a nation-wide cancer genome screening program comprising a lung cancer screening network, "LC-SCRUM", and a gastrointestinal cancer screening network, "GI-SCREEN". A total of 4,500 patients are planned to be enrolled from participating institutions extending from the Hokkaido to the Kyushu regions between February 2015 and March 2017. Tumor samples are analyzed with the Oncomine Cancer Research Panel (Thermo Fisher Scientific) at CLIA-certified laboratories. Clinical information and annotated genome data are centralized at the SCRUM-Japan data center. Patients and physicians obtain individual profiles of actionable mutations and corresponding therapeutic arms. In addition, the accumulated data are open to collaborating researchers in academia and industry to enhance the development of cancer therapies. As of September 2016, 3,559 cases of non-squamous non-small cell lung cancer, squamous cell lung cancer, colorectal cancer, and non-colorectal gastrointestinal cancer have been enrolled. Based on the screening system, 34 clinical trials are ongoing. Among them, the LURET study, a phase II study of vandetanib in patients with advanced RET-rearranged non-small cell lung cancer, was successfully conducted.


Koji Tsuda, PhD

Professor, Department of Computational Biology and Medical Sciences
Graduate School of Frontier Sciences, The University of Tokyo, Japan


Significant Pattern Mining for Biomedical Applications

Pattern mining techniques such as itemset mining, sequence mining, and graph mining have been applied to a wide range of datasets. To convince biomedical researchers, however, it is necessary to show the statistical significance of the obtained patterns, to prove that they are not likely to emerge from random data. The key concept of significance testing is the family-wise error rate (FWER), i.e., the probability that at least one pattern is falsely discovered under the null hypotheses. In the worst case, FWER grows linearly with the number of all possible patterns. We show that, in reality, FWER grows much more slowly than in the worst case, and that it is possible to find significant patterns in biomedical data. The following two properties are exploited to accurately bound FWER and compute small p-value correction factors. 1) Only closed patterns need to be counted. 2) Patterns of low support can be ignored, where the support threshold depends on the Tarone bound. We introduce efficient depth-first search algorithms for discovering all significant patterns and discuss parallel implementations.
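
The effect of ignoring low-support patterns can be sketched as follows. The example assumes a one-sided Fisher's exact test on a binary class label, for which a pattern occurring in x of n samples has a minimum attainable p-value that depends only on x. Tarone's correction factor is then the smallest k such that at most k patterns can ever reach a p-value of alpha/k. The function names and the toy supports are hypothetical, the input list is assumed to already contain only closed patterns, and the depth-first search itself is not shown.

```python
from math import comb


def min_p_value(support, n, n_pos):
    """Smallest one-sided Fisher p-value a pattern of this support can reach.

    Assumes the most extreme table: as many of the pattern's occurrences
    as possible fall in the positive class (n_pos of the n samples).
    """
    a_max = min(support, n_pos)
    return (comb(n_pos, a_max) * comb(n - n_pos, support - a_max)
            / comb(n, support))


def tarone_factor(supports, n, n_pos, alpha=0.05):
    """Smallest k such that at most k patterns are testable at alpha / k."""
    min_ps = [min_p_value(s, n, n_pos) for s in supports]
    k = 1
    # Terminates: the count of testable patterns never exceeds len(supports).
    while sum(p <= alpha / k for p in min_ps) > k:
        k += 1
    return k


# Toy data: 20 samples, 10 positive, supports of nine closed patterns.
supports = [1, 1, 1, 2, 2, 3, 8, 9, 10]
k = tarone_factor(supports, n=20, n_pos=10)
bonferroni_factor = len(supports)
```

In this toy case the six low-support patterns can never reach significance, so the correction factor falls from 9 under plain Bonferroni to 3, and each remaining pattern is tested at alpha/3.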


Woong Yang Park, MD, PhD

Director of Samsung Genome Institute, Samsung Medical Center, Korea
Professor of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Korea


Single cell genome analysis for precision cancer medicine

Tumor-infiltrating lymphocytes (TILs) and the immune gene signature correlate with clinical progression in breast cancer. We isolated single cells from four breast cancer patients with different molecular subtypes to analyze the whole transcriptome. Based on copy number alterations (CNAs) inferred from gene expression patterns, tumor cells could be separated from microenvironmental non-tumor cells. Although the pure populations of tumor cells from the four subtypes displayed the characteristics of each subtype, heterogeneity was observed in the gene expression of cancer-related pathways. Most non-tumor cells were infiltrating immune cells, which showed an immune-suppressive signature in the triple-negative breast cancer sample. Immune cells in the luminal type of breast cancer consisted of activated lymphocytes. In this study, we uncovered molecular characteristics of TILs in breast cancers by single-cell transcriptome analysis, especially through CNA-based separation of tumor and non-tumor cells.