Poster Abstracts

  TBC-1: A computational model for short-term response predictive of long-term response in the detection of Alzheimer's disease severity

Hyunjo Kim1,*

1 Department of Life Science, University of Gachon, Seungnam, Kyeonggido, Korea

Abstract
We have developed a computer-based prediction model to determine the severity of Alzheimer's disease (AD). To identify AD severity, we analyzed human MRI images and, based on these images and data, designed an automated system for determining AD severity. The algorithms described in this study may be used in clinical practice to validate or invalidate diagnoses. The algorithms and methods developed here may also be used to pool diagnostic knowledge for broader benefit. Here we describe a low-cost computational approach to AD diagnosis that can help psychiatrists quickly diagnose the various stages of AD. The system accepts MRI data from patients with AD and can detect pathological conditions associated with AD.

  TBC-2: An Approach to Function Prediction of Metabolites by Clustering the 3D-Chemical Structural Similarity Based Network

Md. Altaf-Ul-Amin1,*, Nobutaka Wakamatsu1, Shigehiko Kanaya1

1 Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan

Abstract
Secondary metabolites are used by humans as flavors, fragrances, medicines, biomarkers, fertilizers, and for other purposes, and they play important roles in ecological relationships between species. Nevertheless, the broad functional spectrum of secondary metabolites is still not fully understood. A number of studies have investigated the relationship between the structures and functions of metabolites, and structural similarity between two metabolites has been shown to imply a high likelihood of functional similarity. In light of this, we propose a method for predicting the functions of secondary metabolites based on the guilt-by-association principle. First, we compute structural similarity scores for all possible metabolite pairs using the COMPLIG algorithm and select the pairs whose similarity score is greater than or equal to a threshold of 0.95. To increase the chance of obtaining clusters rich in known metabolites, we then keep only those structurally similar pairs in which the functions of both metabolites, or of at least one, are known. The network of these metabolite pairs is then clustered using the DPClusO algorithm. Statistically significant cluster-function pairs are selected using the hypergeometric p-value and the false discovery rate (FDR), and functions are predicted for metabolites of unknown function on the basis of these significant cluster-function pairs.
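The pair filtering and enrichment steps above can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes COMPLIG similarity scores are already available as a dictionary keyed by metabolite pairs, it leaves out DPClusO clustering entirely, and it uses the one-sided hypergeometric upper tail with the Benjamini-Hochberg procedure as one common way to obtain p-values and FDR. All function names are hypothetical.

```python
from math import comb


def select_pairs(scores, known, threshold=0.95):
    """Keep pairs with similarity >= threshold where at least one metabolite has a known function.

    scores: {(metabolite_a, metabolite_b): similarity}  (assumed precomputed, e.g. by COMPLIG)
    known:  set of metabolites with at least one annotated function
    """
    return [
        pair for pair, score in scores.items()
        if score >= threshold and (pair[0] in known or pair[1] in known)
    ]


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric X: N annotated metabolites, K of them with the
    function, n annotated metabolites in the cluster, k of those with the function."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues
```

For example, a cluster of 3 annotated metabolites that all carry a function shared by 4 of 10 annotated metabolites gives `hypergeom_pvalue(3, 3, 4, 10)` = 4/120, about 0.033, before FDR adjustment across all cluster-function pairs.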

  TBC-3: VICTOR: a pipeline for Variant Interpretation in Clinical Testing Or Research

Bing-Jian Feng1,*, Kristina Callis Duffin, Gerald Krueger, Wendy Kohlmann, Joshua Schiffman, Marjanka Schmidt, Alfon Meindl, Ricardo Berruti, Rita Schmutzler, Eric Hahnen, Maxime Vallee, Arnaud Droit, Douglas Easton, Sean Tavtigian, Jacques Simard and David Goldgar

1 Dermatology, University of Utah, Salt Lake City, UT, USA

Abstract
We have developed a variant interpretation pipeline that starts from a raw genotype file in VCF format and performs genotype-, variant-, and sample-wise quality control of the data. The pipeline implements a novel functional consequence annotation program that annotates against the predominant transcripts whenever such information is available, chooses the most biologically relevant 5' or 3' representation for short insertions or deletions (InDels), merges multi-nucleotide polymorphisms (MNPs), labels loss-of-function (LoF) variants, supports non-coding regions, and is robust to reference sequence errors. For clinical testing, the pipeline quantitatively integrates multiple deleteriousness scores, allele frequencies in different populations, co-segregation within pedigrees, and association in case-control samples to calculate a posterior probability of pathogenicity for each variant. For gene discovery research, it prioritizes genes by integrating a region-based linkage analysis; a novel rare-variant association analysis in which variants are weighted by deleteriousness and call quality; the relatedness of each gene to known disease genes within a gene-gene association network; and differential gene expression between lesional and non-lesional tissues. In both scenarios, all components are combined quantitatively. Because it is lightweight, fast, and has low memory demands, the pipeline scales to whole-genome sequencing (WGS) of the large samples typical of complex-disease research. Components of this framework can be assembled in various ways to accommodate different study designs and analysis goals.
Using this pipeline, we re-classified a TP53 variant of unknown significance (VUS) for Li-Fraumeni Syndrome (LFS), analyzed whole-exome sequencing (WES) data from 1,368 breast cancer cases and 3,725 healthy controls in the PERSPECTIVE project (PErsonalised Risk Stratification for Prevention and Early deteCTIon of breast cancer), and analyzed WES data from 42 cases in 16 high-risk psoriasis pedigrees in the Utah Psoriasis Initiative (UPI) project. The results demonstrate the value of the VICTOR pipeline for variant classification and gene discovery.
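One common way to turn several independent lines of evidence into a posterior probability of pathogenicity is a multifactorial Bayesian model, in which each evidence source contributes a likelihood ratio (LR) that multiplies the prior odds. The abstract does not state that VICTOR uses exactly this formulation, so the Python sketch below is an illustrative assumption; the function name, prior, and LR values are hypothetical.

```python
from math import prod


def posterior_pathogenicity(prior, likelihood_ratios):
    """Combine a prior probability of pathogenicity with independent likelihood ratios.

    Assumes the evidence sources (e.g. deleteriousness, allele frequency,
    co-segregation, case-control association) are conditionally independent,
    so their LRs can simply be multiplied.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical example: prior 0.1, co-segregation LR 2.0, case-control LR 4.5
print(round(posterior_pathogenicity(0.1, [2.0, 4.5]), 3))  # 0.5
```

Working on the odds scale keeps each evidence source a single multiplicative factor, so adding or dropping a component does not require refitting the others.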

  TBC-4: Identification of a Genetic Locus for Thoracic-to-Hip Ratio in a Large Family: Genome-Wide Linkage and Targeted Re-sequencing Analyses

Seongwon Cha1,*, and Changsoo Kang2

1 Mibyeong Research Center, Korea Institute of Oriental Medicine, Daejeon 34054, Korea
2 Department of Biology and Research Institute of Basic Sciences, College of Natural Sciences, Sungshin Women's University, Seoul 01133, Korea


Abstract
The increasing prevalence of cardiometabolic risk factors, including metabolic syndrome traits, has contributed to increased morbidity and mortality. Twin studies have estimated the heritability of cardiometabolic risk factors at 31-77%. Recently, various genome-wide association studies have been performed to identify single nucleotide polymorphisms (SNPs) associated with cardiometabolic risk. However, the heritability estimated from genome-wide variants has been lower than that from twin studies, although genome-wide complex trait analysis has recovered part of the missing heritability (approximately 20-50%). Combining genome-wide linkage analysis (GWLS) with next-generation sequencing in a large family study can facilitate the identification of novel genetic loci for cardiometabolic risk. Here, we searched for quantitative trait loci for anthropometric indices (body mass index, waist-to-hip ratio, and thoracic-to-hip ratio (THR)), systolic and diastolic blood pressure, lipid traits, and fasting blood glucose in a large three-generation family of 171 individuals, using GWLS followed by targeted re-sequencing. After selecting 9,472 evenly distributed SNPs from a 500K genome-wide DNA chip, we detected significant linkage of multiple SNPs to THR in the region of chromosome 5q12.3-31 (peak LOD score = 5.1, in the 5'-UTR of EPB41L4A). To identify the causative variant associated with THR, we performed targeted sequencing of a 4.34-Mb region centered on the peak SNP in 31 individuals from the same family. Bioinformatic and statistical analyses showed that the most significant SNP for THR was located in an intron of KCNN2 (p = 0.0000678), and the second most significant signal was a SNP in an intron of EPB41L4A (p = 0.000636). In conclusion, we suggest that variants in chromosome 5q21.3-22.3 may harbor genetic factors affecting THR and, by extension, cardiometabolic risk in Koreans.
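A LOD score is the base-10 logarithm of the ratio between the likelihood of the data under linkage and under no linkage. The study used quantitative-trait linkage on a pedigree, which is considerably more involved; the Python sketch below only illustrates the LOD idea in the simplest setting, a two-point analysis of N phase-known meioses of which R are recombinant, assuming LOD(θ) = log10[θ^R (1 - θ)^(N - R) / 0.5^N]. The function names are hypothetical.

```python
from math import log10


def two_point_lod(theta, recombinants, meioses):
    """LOD(theta) = log10[theta^R * (1 - theta)^(N - R) / 0.5^N] for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    null_likelihood = 0.5 ** meioses  # free recombination, i.e. no linkage
    return log10(likelihood / null_likelihood)


def max_lod(recombinants, meioses, steps=50):
    """Maximize the LOD over the grid theta = 0.01, 0.02, ..., 0.50; returns (lod, theta)."""
    grid = [i / (2 * steps) for i in range(1, steps + 1)]
    return max((two_point_lod(t, recombinants, meioses), t) for t in grid)
```

A LOD of 3 (1,000:1 odds in favor of linkage) is the conventional significance threshold, so a peak of 5.1 corresponds to odds of roughly 10^5.1:1 under the model used.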

  TBC-5: An approach for inferring dynamic pathway interaction using cancer datasets

Shinuk Kim1,*

1 Sangmyung University

Abstract
In this paper, we introduce an approach for inferring dynamic pathway interactions by converting static datasets into dynamic datasets using patients' clinical information. One such approach uses grade- and stage-based dynamic datasets. We generated six dynamic levels based on grades and stages and obtained two pairs of positively correlated pathways among 12 enriched pathways. The first pair, LEISHMANIA INFECTION (21 overlapping genes) and ALLOGRAFT REJECTION (12 overlapping genes), shares four genes (HLA-DMB, HLA-DOA, HLA-DOB, and IFNG) and has a correlation coefficient of 0.89. The second pair, SPLICEOSOME (32 overlapping genes) and PRIMARY IMMUNODEFICIENCY (15 overlapping genes), has a correlation coefficient of 0.94 and no common genes.

  TBC-6: Serum MicroRNA Expression Profiling of Prolonged Fatigue: RNA Sequencing and Quantitative PCR

Taehyeung Kim1, Seongwon Cha1,*

1 Mibyeong Research Center, Korea Institute of Oriental Medicine, Daejeon 34054, Korea

Abstract
Background: Prolonged fatigue is defined as persistent fatigue lasting at least 1 month but no more than 6 months without an evident clinical cause, leading to a temporary inability to perform physical activity or to achieve optimal cognitive performance. However, diagnostic markers of prolonged fatigue remain unknown. In this study, we therefore aimed to identify serum microRNAs associated with prolonged fatigue.
Method: We performed small RNA sequencing on an Illumina NextSeq 500 using serum from 10 subjects with prolonged fatigue and 10 healthy controls matched for age, gender, and BMI. After alignment and pairwise differential expression analysis, we identified 12 candidate microRNAs whose expression was significantly altered in fatigue subjects compared with controls (P < 0.05, DESeq2). We are currently performing quantitative real-time PCR (RT-qPCR) in an additional 80 fatigue subjects and 80 controls to validate the differential expression of these 12 microRNAs.
Results: Of the 3 microRNAs with read counts of 1,000 or more, miR-122-5p was down-regulated (P = 0.0088) only in women, while let-7f-5p and let-7a-5p were up-regulated (P = 0.0010 and 0.040, respectively) only in men. Of the 9 microRNAs with read counts under 1,000, 4 up-regulated and 1 down-regulated microRNAs were significant (P = 0.0031-0.044) in women, whereas 1 up-regulated and 2 down-regulated microRNAs were significant (P = 0.000056-0.027) in men. The exception was miR-3605-5p, which was down-regulated (P = 0.043) in men and women combined. Using a microRNA-seq browser (MiRGator v3.0), we confirmed that the 12 serum microRNAs are mainly expressed in liver tissue or immune-related cells, including peripheral blood mononuclear cells.
Conclusion: These findings indicate that the expression of several circulating microRNAs is significantly altered in fatigued individuals in a gender-specific manner. Furthermore, the likely tissue origins of these serum microRNAs suggest that fatigue-associated microRNAs are involved in liver metabolism or immune regulation.
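Before testing for differential expression, DESeq2 normalizes library sizes with the median-of-ratios method. The Python sketch below reproduces only that normalization step, to illustrate how per-sample size factors are derived from raw microRNA read counts; it is not a substitute for DESeq2's dispersion estimation and Wald test, and the input layout and function name are assumptions.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of rows, one per microRNA, each a list of raw read counts
            (one entry per sample, same sample order in every row).
    """
    n_samples = len(counts[0])
    # Only features with a non-zero count in every sample enter the geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has non-zero counts in all samples")
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of sample j to each feature's geometric mean, on the log scale
        log_ratios = [log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors
```

Normalized counts are then the raw counts of each sample divided by that sample's size factor; using the median makes the factor robust to a few strongly differentially expressed microRNAs.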

  TBC-7: Analysis of mutation, copy number variation and DNA methylation in early breast tumorigenesis

Jong-Lyul Park1,2, Yong-Sun Lee3 and Seong-Young Kim1,2,*

1 Personalized Genomic Medicine Research Center, KRIBB, Daejeon 305-806, Korea
2 Department of Functional Genomics, University of Science and Technology, Daejeon, 305-806, Korea
3 Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555-1072, USA


Abstract
The timing and progression of mutation, copy number variation (CNV), and DNA methylation changes during carcinogenesis and metastasis are not completely understood. To trace a timeline of aberrant mutation, CNV, and DNA methylation events during carcinogenesis and metastatic progression, we analyzed normal human mammary epithelial cells (HMEC) (184D), four independent benzo[a]pyrene (BaP)-derived immortal HMEC strains (184A1, 184AA4, 184B5, and 184BE1), and four immortalized HMEC strains with anchorage-independent growth (AIG) (184AA2, 184AA3, 184B5ME, and 184FMY2) by Illumina whole-genome sequencing and the EPIC 850K BeadChip. In the carcinogenesis step coinciding with immortalization, several driver genes, including PCSK5, NFATC4, TAF1, AHNAK, CDKN2A, ASCL3, EPHB, KALRN, MED12, MTOR, ESCC5, and ESCC2, were mutated. However, the immortal HMEC strains with AIG acquired only one novel driver mutation (ADAM10 in 184FMY2) compared with the immortal HMEC strains. For CNV, 0.06% and 0.36% of the genome was amplified and deleted, respectively, in the immortal HMEC strains, compared with 8.32% and 18.08% in the immortal AIG HMEC strains. For DNA methylation, 6.91% and 5.74% of CpG sites were hypomethylated and hypermethylated, respectively, in the immortal step compared with normal HMECs, whereas a relatively small proportion of CpG sites (0.79% hypomethylated and 0.21% hypermethylated) was altered in the immortal-with-AIG step compared with the immortal HMECs. In summary, mutation and DNA methylation changes were dramatic in the carcinogenesis step, while CNV changes were dramatic in metastatic progression. These results indicate that mutation and DNA methylation changes may be important in carcinogenesis, whereas CNV changes may be important in metastatic progression.
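The abstract reports the percentage of CpG sites that became hypo- or hypermethylated between stages but does not state the calling rule. A common choice for BeadChip data is a fixed difference in beta values; the Python sketch below assumes a hypothetical |Δβ| ≥ 0.2 cutoff purely for illustration, and the input format and function name are also assumptions.

```python
def methylation_change_percentages(reference, test, threshold=0.2):
    """Percentage of shared CpG sites that lose or gain methylation.

    reference, test: {cpg_id: beta value in [0, 1]} for the earlier and later stage.
    threshold: hypothetical minimum |delta beta| for calling a change.
    Returns (percent_hypomethylated, percent_hypermethylated).
    """
    shared = reference.keys() & test.keys()
    if not shared:
        raise ValueError("no CpG sites in common")
    hypo = sum(1 for cpg in shared if test[cpg] - reference[cpg] <= -threshold)
    hyper = sum(1 for cpg in shared if test[cpg] - reference[cpg] >= threshold)
    n = len(shared)
    return 100 * hypo / n, 100 * hyper / n
```

Applied once for normal vs. immortal and once for immortal vs. immortal-with-AIG, this yields the two pairs of percentages that the abstract compares.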

  TBC-8: GENT2: a platform for exploring Gene Expression patterns across Normal and Tumor tissues - newly updated

Seung-Jin Park1,2, Seon-Kyu Kim1, and Seon-Young Kim1,2,*

1 Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
2 Department of Functional Genomics, University of Science and Technology, Daejeon, Korea


Abstract
Although expression changes of individual genes and their diagnostic or prognostic value in specific cancers have been frequently reported, exploring expression alterations of a gene across various tissues remains important for understanding its heterogeneous molecular behavior and for applying it clinically to other cancer types. Here, we substantially updated our platform for searching gene expression across normal and tumor tissues, now named GENT2. The system has several advanced features. First, we built a database using gene expression data from more than 60,000 cancer patients, a substantial increase in sample size over the previous platform. All data were obtained from the NCBI GEO repository and were generated on the Affymetrix U133A or U133plus2 platforms (accession numbers GPL96 and GPL570, respectively). Second, despite the large number of samples in the database, GENT2 returns search results quickly and provides intuitive visualization of gene expression patterns across cancer tissues using Google Web Toolkit (GWT). Finally, GENT2 also shows differences in gene expression among previously reported molecular subtypes of cancer. In conclusion, with these significant improvements and substantial user traffic (at least 2,000 visitors a month), GENT2 is a promising cancer research tool that provides simple but valuable information about genes across all cancers. GENT2 is freely available at http://mgrc.kribb.re.kr/GENT.

  TBC-9: Characterization of aging-related genes through network biology

Sang-Hun Bae1,3, Han Wool Kim1, Seo Jeong Shin1, Jae Hyun Park1, Chul Woo Lim1, Jisook Moon1,2,*

1 College of Life Science, Department of Applied Bioscience, CHA University, Seoul, Korea
2 College of Life Science, Department of Bioengineering, CHA University, Seoul, Korea
3 General Research Institute, CHA general Hospital, Seoul, Korea


Abstract
Aging is an inevitable, progressive decline in physiological function and thus serves as a driver of disease and death. Because of its complexity, the aging process needs to be understood systemically. To identify general features of aging in the context of all molecular interactions, we used current biological knowledge to partition age-associated genes in the interactome into functionally distinct and physically connected modules. The modules are involved in immune processes, metabolic processes, developmental processes, and cancer pathways, which are likely to be biological functions perturbed during aging; the immune-related sub-network shows the highest interconnectivity. To examine relationships between aging and diseases, we measured the network-based separation between the aging-related modules and disease modules in the interactome. Certain disease modules tend to lie in the neighborhood of some aging-related modules. Specifically, the aging-related immune module is associated with multiple sclerosis, autoimmune diseases of the nervous system, demyelinating disorders, and glucose metabolism disorders. Consistent with this, genes associated with immune response and myelination in the aging hippocampus show similar co-expression patterns and tend to be expressed at higher levels with advancing age in human transcriptome data, suggesting a significant role of immune processes in aging. In this study, several aging-related modules were constructed by integrating biological annotation with the interactome, leading to the identification of processes driving aging and of aging-disease relationships.
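Network-based separation between two modules is commonly computed with the measure introduced by Menche and colleagues for the human interactome: s_AB = ⟨d_AB⟩ - (⟨d_AA⟩ + ⟨d_BB⟩)/2, where ⟨d_AA⟩ is the mean shortest distance from each node of A to its nearest other node of A, and ⟨d_AB⟩ is the mean, over all nodes of A and B, of the shortest distance to the nearest node of the other module. Negative values indicate overlapping modules, positive values separated ones. The abstract does not confirm this exact definition, so the Python sketch below is an assumption; it expects an unweighted, connected interactome given as an adjacency dictionary, and each module needs at least two nodes. Function names are hypothetical.

```python
from collections import deque
from statistics import mean


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def min_distances(graph, sources, targets, exclude_self):
    """For each source, the distance to its nearest target (optionally ignoring itself)."""
    result = []
    for s in sources:
        dist = bfs_distances(graph, s)
        result.append(min(dist[t] for t in targets if t in dist and not (exclude_self and t == s)))
    return result


def separation(graph, module_a, module_b):
    """s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2; negative means the modules overlap."""
    d_aa = mean(min_distances(graph, module_a, module_a, exclude_self=True))
    d_bb = mean(min_distances(graph, module_b, module_b, exclude_self=True))
    d_ab = mean(
        min_distances(graph, module_a, module_b, exclude_self=False)
        + min_distances(graph, module_b, module_a, exclude_self=False)
    )
    return d_ab - (d_aa + d_bb) / 2
```

On a simple path 1-2-3-4-5-6, modules {1, 2} and {5, 6} give s = 2.5 (well separated), while {1, 2, 3} and {2, 3, 4} give a negative value because they share nodes.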

  TBC-10: Compliance of Korean Patients with Inflammatory Bowel Disease to Colonoscopy

Jay Choi1,2, Seungbin Oh1,2, Eugene Jeong1,2, and Hyun Wook Han1,2,3,*

1 CHA University Biomedical Informatics (CHABI),
2 Basic Medical Research Center, CHA University Graduate School of Medicine, Gyeonggi-do, Korea
3 Department of Preventive Medicine, CHA Bundang Medical Center, CHA University, Gyeonggi-do, Korea


Abstract
In the United States (US), patients with inflammatory bowel disease (IBD) are recommended to undergo surveillance colonoscopy at 2-3 year intervals beginning 8 years after IBD diagnosis. However, one prior study showed that use of surveillance colonoscopy among US Medicare patients with IBD was low. Meanwhile, it has long been believed that IBD patients in Korea have easier access to healthcare because medical fees are lower in Korea than in the US, which would allow them to undergo surveillance colonoscopy more often than US patients. In this retrospective, observational big-data study, we aimed to characterize IBD patients in Korea overall and to test whether Korean IBD patients actually show higher compliance with surveillance colonoscopy. Using statistical analysis, we then identified factors affecting colonoscopy use, including sex, age, socioeconomic status, IBD subtype (Crohn’s disease or ulcerative colitis), and the presence of colorectal cancer, for which IBD patients are at high risk. In conclusion, our research offers gastroenterologists in Korea a more accurate view of IBD patients overall and a new clinical approach to their care.

  TBC-11: A causal modeling approach to human disease using Korean claims data

Eugene Jeong1,2,#, Kyungmin Ko1,2,3,#, Seungbin Oh1,2, Sangmin Nam5, and Hyun Wook Han1,2,4,*

1 CHA University Biomedical Informatics (CHABI),
2 Basic Medical Research Center, CHA University Graduate School of Medicine, Gyeonggi-do, Korea
3 Korea Veterans Health Service Medical Center, Seoul, Korea
4 Department of Preventive Medicine, CHA Bundang Medical Center, CHA University, Gyeonggi-do, Korea
5 Department of Ophthalmology, CHA Bundang Medical Center, CHA University, Gyeonggi-do, Korea
# Equally contributed


Abstract
In recent years, many new disease risk factors have been defined, and evidence of relationships between diseases has accumulated. Many researchers in biological network science have built various disease networks from big data to elucidate disease-disease associations. However, several limitations hinder a full understanding of associations between human diseases and their application in practice: important factors contributing to many human diseases, such as age, sex, and causality, are not considered in most disease networks. To bridge the gap between research findings and clinical practice, we constructed a causal network of human disease using the National Health Insurance Service (NHIS) sample cohort data, covering approximately 2% of the total Korean population from 2002 to 2013, in which diseases are encoded according to ICD-10. The Fisher exact test with Bonferroni correction was used to reduce the risk of false-positive results. To weight the associations, a relative risk (RR, also called risk ratio) was calculated, and only combinations with p-value < 0.001 and RR > 4 were considered significant. Our network consists of 798 nodes (diseases) and 6,089 links (disease-disease associations). We find that it is a scale-free network, in which a few diseases (hubs) have many links while most diseases have few. By applying a clustering algorithm to identify highly connected local sub-networks, we show that diseases cluster not by ICD-10 disease class but by mean age at incidence, which distinguishes the causal network from other networks based on biological data. Ultimately, our network not only provides guidance for future research in many fields but also helps to turn precision medicine into an achievable target.
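The edge-selection rule above (Fisher's exact test with Bonferroni correction, p < 0.001, and RR > 4) can be sketched in Python for one disease pair summarized as a 2×2 table. This is an illustration under assumptions, not the authors' code: the table layout, the two-sided test, and applying the 0.001 cutoff to the Bonferroni-adjusted p-value are choices made here, and the function names are hypothetical.

```python
from math import comb


def relative_risk(a, b, c, d):
    """RR for a 2x2 table.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome (c must be > 0).
    """
    return (a / (a + b)) / (c / (c + d))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value with the table margins held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def significant_edge(a, b, c, d, n_tests, alpha=0.001, rr_min=4.0):
    """Keep a disease pair if the Bonferroni-adjusted p < alpha and RR > rr_min."""
    p_adjusted = min(1.0, fisher_exact_two_sided(a, b, c, d) * n_tests)
    return p_adjusted < alpha and relative_risk(a, b, c, d) > rr_min
```

Here `n_tests` would be the number of disease pairs tested; requiring both a small adjusted p-value and a large RR keeps only associations that are both statistically robust and strong in effect size.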

  TBC-12: A Disease Network representing Combinatorial Risk Ratio Calculations in a Large Sample Cohort

Kyungmin Ko1,2,3,#, Eugene Jeong1,2,#, Seungbin Oh1,2, and Hyun Wook Han1,2,3,4,*

1 CHA University Biomedical Informatics (CHABI),
2 Basic Medical Research Center, CHA University Graduate School of Medicine, Gyeonggi-do, Korea
3 Korea Veterans Health Service Medical Center, Seoul, Korea
4 Department of Preventive Medicine, CHA Bundang Medical Center, CHA University, Gyeonggi-do, Korea
# Equally contributed


Abstract
The list of medical problems a patient has had is an important and useful piece of information that is taken into consideration when forming a hypothesis to explain the patient’s current symptoms or when planning treatment. In many cases, the basis for this inference is provided by cohort studies, which are often used to test the association between an exposure and an outcome. Several disease networks, called "comorbidity networks", have been built from relative risk calculations in cross-sectional models. In contrast, risk ratios (also called relative risks) in cohort studies imply a temporal relationship between an exposure and an outcome, which is a necessary condition for causality. We present a disease network representing combinatorial, pairwise risk ratio calculations as defined in the context of a cohort study. We used sample cohort data provided by the National Health Insurance Service of South Korea, in which diagnoses are represented as ICD-10 codes. A cutoff of RR > 4 and an FDR-corrected p-value < 0.001 produced a network of 293 diseases and 3,134 risk ratio relationships. We present some interesting and potentially useful properties of this network. Specifically, the disease nodes cluster into 4 major, demographically distinct communities. In addition, node strength calculated in the complete network orders the disease nodes according to the age distribution of their patients and separates them into the aforementioned communities. The combinatorial calculation and its network representation scale easily to the level of a clinic or hospital.
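The pairwise, temporally ordered risk ratio and the node strength mentioned above can be sketched in Python. This is an assumption-laden illustration rather than the authors' code: a patient counts as exposed to disease A if A appears in their record, outcome B is counted among exposed patients only when B's first diagnosis comes after A's, the comparison group is all patients without A, and the p-value and FDR filtering are omitted. Node strength is taken as the sum of the weights of a node's edges, with the edge weight assumed to be the RR. All names are hypothetical.

```python
from itertools import permutations


def pairwise_risk_ratios(first_dx):
    """Temporally ordered risk ratio for every ordered pair of diseases.

    first_dx: {patient_id: {disease_code: year of first diagnosis}}
    Returns {(exposure, outcome): RR}; pairs without a usable comparison group are skipped.
    """
    patients = list(first_dx)
    diseases = sorted({code for dx in first_dx.values() for code in dx})
    ratios = {}
    for exp_d, out_d in permutations(diseases, 2):
        exposed = [p for p in patients if exp_d in first_dx[p]]
        unexposed = [p for p in patients if exp_d not in first_dx[p]]
        # Among the exposed, the outcome counts only if it follows the exposure.
        a = sum(1 for p in exposed
                if out_d in first_dx[p] and first_dx[p][out_d] > first_dx[p][exp_d])
        c = sum(1 for p in unexposed if out_d in first_dx[p])
        if exposed and unexposed and c > 0:
            ratios[(exp_d, out_d)] = (a / len(exposed)) / (c / len(unexposed))
    return ratios


def node_strength(edges):
    """Sum of incident edge weights per node: {(u, v): weight} -> {node: strength}."""
    strength = {}
    for (u, v), weight in edges.items():
        strength[u] = strength.get(u, 0.0) + weight
        strength[v] = strength.get(v, 0.0) + weight
    return strength
```

Because every ordered pair is evaluated independently, the calculation parallelizes trivially, which fits the abstract's point that it scales to a single clinic or hospital.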

  TBC-13: Mining Potential Inhibitors for Bcr-AblT315I Mutation from Chinese Traditional Medicine

Yali Xiao1, Xin-Yi Liang1, Ping-Ru Lai1, and Pei-Chun Chang1,*

1 Department of Bioinformatics and Medical Engineering, Asia University, Taiwan

Abstract
Cancer has been ranked as one of the leading causes of death since 1982. Recently, screening compounds from Traditional Chinese Medicine (TCM) for anticancer drugs has become a trend in drug discovery. We built a drug screening process that focuses on compounds from Chinese medicine formulas. In this study, we focused on chronic myelogenous leukemia (CML) and mined TCM herbs for candidate anticancer drugs. In CML, the Bcr-Abl gene encodes a tyrosine kinase that has lost its normal regulation, causing cells to proliferate continuously and preventing cell death. CML is currently treated by inhibiting Bcr-Abl tyrosine kinase activity, which suppresses cell proliferation and induces apoptosis. Unfortunately, because of mutations in this oncogene, drugs such as imatinib, dasatinib, nilotinib, and bosutinib are all subject to resistance caused by the Bcr-Abl T315I mutation. In addition, ponatinib, which is active against T315I, has potentially fatal side effects. To overcome these problems, we propose a filtering process to discover potential drugs among TCM compounds. The results show that salvianolic acid C, baicalin, 1,4-dicaffeoylquinic acid, and dihydroisotanshinone I may have potential for CML treatment with fewer side effects.

  TBC-14: Short isoform of DNAJB6 protects against 1-methyl-4-phenylpyridinium ion-induced apoptosis in LN18 cells via inhibiting ROS formation and mitochondrial membrane potential loss

Yeon-Mi Hong1,2, Yohan Hong1,2, Yeong-Gon Choi1,3, Sujung Yeo1,4, Hyejin Jung2, Suk-Hyun Lee2, Sae-Won Lee5, Soo Hee Jin1, and Sabina Lim1,2

1 Research Group of Pain and Neuroscience, East-West Medical Research Institute, Kyung Hee University, Seoul, Republic of Korea
2 Department of Meridian & Acupoint, College of Korean Medicine, Kyung Hee University, Seoul, Republic of Korea
3 Department of Neurodegenerative Diseases, Ilsong Institute of Life Science, Hallym University, Anyang, Republic of Korea
4 Department of Meridian & Acupoint, College of Korean Medicine, Sang Ji University, Wonju, Republic of Korea
5 Biomedical Research Institute and IRICT, Seoul National University Hospital, Seoul, Republic of Korea


Abstract
In a previous study, we found that the short isoform of DNAJB6 (DNAJB6(S)) was decreased in the striatum of a mouse model of Parkinson's disease (PD) induced by 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP). DNAJB6, a heat shock protein (HSP), has been implicated in the pathogenesis of PD. In this study, we explored the cytoprotective effect of DNAJB6(S) against apoptosis induced by MPP+ (the 1-methyl-4-phenylpyridinium ion), and the underlying molecular mechanisms, in cultured LN18 cells derived from astrocytic tumors. We observed that MPP+ significantly reduced cell viability and induced apoptosis in LN18 glioblastoma cells. DNAJB6(S) protected LN18 cells against MPP+-induced apoptosis not only by suppressing Bax cleavage but also by inhibiting a series of apoptotic events, including loss of mitochondrial membrane potential, increase in intracellular reactive oxygen species, and activation of caspase-9. These observations suggest that the cytoprotective effects of DNAJB6(S) may be mediated, at least in part, through the mitochondrial pathway of apoptosis.

  TBC-15: A study of association of genetic variants with imaging phenotypes in multiple sclerosis

Kicheol Kim1, Takuya Matsushita1, Lohith Madireddy1, Till Sprenger2, Pouya Khankhanian1, Stefano Magon2, Yvonne Naegelin3, Bruce A. Cree1, Eduardo Caverzasi1, Raija L.P. Lindberg2, Laura E. Jonkman3, Lisanne Balk3, Jeroen J.G. Geurts3, Ludwig Kappos2, Stephen L. Hauser1, Jorge R. Oksenberg1, Roland G. Henry1, Daniel Pelletier1, Ari J. Green1, Sergio E. Baranzini1,*

1 Department of Neurology, University of California, San Francisco (UCSF), San Francisco, USA
2 Department of Neurology, University Hospital of Basel, Basel, Switzerland
3 Dept. of Anatomy & Neuroscience of the VU University Medical Center, Amsterdam, Netherlands


Abstract
Multiple sclerosis (MS) is an autoimmune disorder characterized by inflammatory demyelination of the central nervous system (CNS). Genome-wide association studies (GWAS) have identified more than 140 loci that confer susceptibility to MS; however, a significant proportion of the heritability of MS remains unexplained. Relevant endophenotypes greatly empower the genetic analysis of complex diseases. In MS, brain, spinal cord, and retinal imaging provide valuable quantitative assessments of CNS integrity and function that help describe disease processes. Here we build on our experience in genetic analysis and in acquiring clinical datasets to (a) develop a set of neuroimaging-derived longitudinal endophenotypes that capture clinically relevant milestones associated with disease progression, and (b) test whether specific genetic variants are associated with these endophenotypes. We conducted association studies using a UCSF cohort (n=553) and two additional replication cohorts (Amsterdam, n=205; Basel, n=232). Different MRI and optical coherence tomography (OCT) metrics were available in each cohort, so analyses were conducted on the appropriate datasets, but in all cases at least two datasets were used. A GWAS of cortical thickness in 34 cerebral regions in the UCSF and Basel datasets did not identify any significant association. However, in the 9 of 34 regions whose thickness differed significantly between cases and controls, significant associations with genes in pathways involved in neuronal differentiation were identified. Next, a focused genetic association study was conducted using only the known significant SNPs from GWAS, with MRI and OCT metrics as outcomes. Some MS-associated variants were modestly significant when tested for association with OCT metrics. These results are intriguing and warrant further exploration, but their modest statistical significance highlights the need to acquire even larger datasets.
Efforts to merge these results with those of other groups to increase statistical power are underway.

  TBC-16: Bioinformatics-based Analysis of Sepsis and CMap public microarray data

Seoungbin Oh1,2, Jongman Yoo2,*, Hyun Wook Han1,2,3,*

1 CHA University Biomedical Informatics (CHABI),
2 Basic Medical Research Center, CHA University Graduate School of Medicine, Gyeonggi-do, Korea
3 Department of Preventive Medicine, CHA Bundang Medical Center, CHA University, Gyeonggi-do, Korea


Abstract
Sepsis is a potentially fatal systemic immune response triggered by microbial infection. Many specific single-target agents have been evaluated, but no specific agent is currently approved that effectively regulates the immune system and improves survival. To overcome the difficulties of pharmacological research on sepsis, we performed a systemic, network-level analysis of sepsis to search for anti-sepsis agents that reverse the pathological changes in patients with sepsis. Following a bioinformatics-based approach, we hypothesized that if sepsis-induced gene expression is characterized by a specific mRNA expression signature, and exposure to a drug causes the opposite effect in cell lines, then the drug might have a therapeutic effect that antagonizes the disease process. We chose drug X using the Connectivity Map (CMap) algorithm, selected the 100 most significant differentially expressed (DE) genes from GEO data using FDR < 0.05 as the criterion, and visualized their logFC values in a heatmap for both microarray datasets to investigate the relationship between sepsis and drug X-induced expression. After heatmap visualization, we selected 94 inversely correlated genes and mapped them to the STRING database to investigate their connectivity in the protein-protein interaction network. We then queried the OGEE database and summarized the ratio of interactions and essential genes. These bioinformatics methods can be used to analyze large public datasets and to find candidate drugs for repurposing, and we are developing computational, network-level methods for systematic searches based on this concept.
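The gene-selection steps above (top DE genes by FDR, then genes whose sepsis and drug logFC values point in opposite directions) can be sketched in Python. This is a hedged illustration: "most significant" is assumed to mean lowest FDR, the opposite-sign rule stands in for the visual heatmap comparison, and the STRING and OGEE queries are not shown. All names are hypothetical.

```python
def top_de_genes(results, n=100, fdr_cutoff=0.05):
    """Return up to n genes with FDR below the cutoff, most significant (lowest FDR) first.

    results: {gene: (logFC, FDR)} from a disease-vs-control comparison.
    """
    passing = [(gene, fdr) for gene, (_, fdr) in results.items() if fdr < fdr_cutoff]
    passing.sort(key=lambda item: item[1])
    return [gene for gene, _ in passing[:n]]


def inversely_regulated(genes, disease_logfc, drug_logfc):
    """Keep genes that the drug shifts in the opposite direction to the disease."""
    return [
        gene for gene in genes
        if gene in drug_logfc and disease_logfc[gene] * drug_logfc[gene] < 0
    ]
```

For example, a gene up-regulated in sepsis (logFC 2.0) and down-regulated by the drug (logFC -1.0) is kept, while a gene raised by both is dropped; the surviving list is what would be sent to a protein-protein interaction database.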

  TBC-17: Distributed computing performance adaptation for human long read sequence SNV analysis

Chang-Wei Yeh1,#, Chieh-Wei Huang1,#, Chao-Chun Chuang1, Chang-Huain Hsieh1, Yu-Tai Wang1,*, and Chih-Min Yao1

1 National Center for High-performance Computing (NCHC), National Applied Research Laboratories (NARLabs), Taiwan
# Equal co-first authors


Abstract
The computing demands of single-molecule long-read sequencing data are emerging. Single-molecule sequencing technology can be used to call single nucleotide variants (SNVs) in the human genome. Although high-quality sequencing with this technology is still expensive, long reads can resolve many known issues, for example phasing and identifying long structural variants. Based on these valuable advantages, we expect long-read sequencing technology to become more popular and its cost to continue to fall. Once the technology is widely used, what computing impact will long-read data have? No one yet knows. In this research, we used current distributed computing facilities to emulate a large surge of long-read data during SNV identification. In this emulated stress test, we recorded benchmark logs for each run and analysed memory usage, CPU time, network traffic and file system capacity. We conclude that computing and file system demands remain high, and memory requirements remain intensive. These logs and parameters can help in designing next-generation facilities for translational medicine users.
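A single benchmark record of the kind described above can be collected with the Python standard library. This is a hedged sketch only: the `benchmark` wrapper and the synthetic `toy_long_reads` workload are hypothetical stand-ins, and `tracemalloc` measures only Python heap allocations, so network traffic and file system usage, which the study also recorded, would need cluster-level monitoring tools that the abstract does not name.

```python
import random
import time
import tracemalloc

def benchmark(label, func, *args, **kwargs):
    """Run func once; return (result, log record of wall time, CPU time, peak Python heap)."""
    tracemalloc.start()
    wall0, cpu0 = time.perf_counter(), time.process_time()
    try:
        result = func(*args, **kwargs)
        wall1, cpu1 = time.perf_counter(), time.process_time()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    record = {"label": label, "wall_s": wall1 - wall0,
              "cpu_s": cpu1 - cpu0, "peak_heap_bytes": peak}
    return result, record

def toy_long_reads(n_reads, read_len, seed=0):
    """Hypothetical workload: generate random reads and return their total G+C count."""
    rng = random.Random(seed)
    reads = ["".join(rng.choice("ACGT") for _ in range(read_len)) for _ in range(n_reads)]
    return sum(r.count("G") + r.count("C") for r in reads)

if __name__ == "__main__":
    # Emulate growing read lengths and log each run, as in a stress test.
    for read_len in (1_000, 10_000, 50_000):
        _, log = benchmark(f"reads_len_{read_len}", toy_long_reads, 20, read_len)
        print(log)
```

Scaling one parameter at a time (here, read length) keeps each log comparable with the previous run.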

  TBC-18: Big data analysis for chemical and protein interaction statistics

Hsuan-Feng Tseng1,#, Chieh-Wei Huang1,#, Chang-Wei Yeh1,#, Chao-Chun Chuang1, Chang-Huain Hsieh1, Yu-Tai Wang1,* and Chih-Min Yao1

1 National Center for High-performance Computing (NCHC), National Applied Research Laboratories (NARLabs), Taiwan
# Equal co-first authors


Abstract
In the public domain, 2,660 genes and gene products have no identified chemical interaction. We merged ChEMBL, PubChem, ChEBI, DrugBank and FDA public information into a large data warehouse. Examining the data, we found 21,867 genes in total, with 13.2 interacting chemicals per gene on average. The gene with the most interacting chemicals is CASP3, with 1,053 chemicals. The top 10 genes are CASP3, TNF, CYP3A4, CXCL8, MAPK1, MAPK3, BCL2, TP53, CYP1A1 and BAX. Of these, 7 genes are related to cancer, programmed cell death and cell activation; 2 genes are related to chemical detoxification and metabolism; and 1 gene is involved in immune response. However, no interacting chemical could be found for 2,660 genes in our dataset. Among them are 135 olfactory receptors without any chemical interaction information; the others include GTP-binding proteins, genes in gene transcription regulation pathways, and genes of unknown function. Our statistics suggest that cancer is the main target of current medical research resources, and they also reveal the research preferences of life scientists. However, many interesting and important biological questions, such as the mystery of olfaction, still need to be explored.
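The merging and counting described above reduce to set operations once every source has been mapped to common gene and chemical identifiers, which this sketch simply assumes. The source lists, identifiers and function names are hypothetical. The abstract does not state whether the average was taken over all genes or only over genes with at least one interaction; the sketch averages over genes with interactions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (gene_id, chemical_id), assumed already harmonized across sources

def merge_sources(*sources: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Union gene-chemical pairs from several databases; duplicate pairs count once."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for pairs in sources:
        for gene, chem in pairs:
            merged[gene].add(chem)
    return merged

def interaction_stats(merged: Dict[str, Set[str]], all_genes: Iterable[str], top_n: int = 10):
    """Return (mean chemicals per gene with hits, top genes by count, genes with no hits)."""
    counts = {g: len(merged.get(g, ())) for g in all_genes}
    with_hits = [g for g, c in counts.items() if c > 0]
    mean = sum(counts[g] for g in with_hits) / len(with_hits) if with_hits else 0.0
    top = sorted(with_hits, key=lambda g: (-counts[g], g))[:top_n]
    missing = sorted(g for g, c in counts.items() if c == 0)
    return mean, top, missing

if __name__ == "__main__":
    source_a = [("G1", "C1"), ("G1", "C2"), ("G2", "C1")]
    source_b = [("G1", "C2"), ("G1", "C3"), ("G3", "C4")]
    merged = merge_sources(source_a, source_b)
    mean, top, missing = interaction_stats(merged, ["G1", "G2", "G3", "G4"])
    print(round(mean, 2), top, missing)  # 1.67 ['G1', 'G2', 'G3'] ['G4']
```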

  TBC-19: Exploring Deep Learning for Making Sense of Biotech Data

Mijung Kim1,2,*, Jasper Zuallaert1,2, and Wesley De Neve1,2

1 Data Science Lab, Ghent University - iMinds, Belgium
2 Center for Biotech Data Science, Ghent University Global Campus, Korea


Abstract
Deep neural networks have recently been shown to outperform other machine learning techniques in many tasks. They gained even more attention after Google DeepMind’s AlphaGo beat Lee Sedol in a five-game Go match in March 2016. In our research, we apply deep learning techniques to vast sets of noisy biotech data, targeting four use cases: (1) splice site detection in genomic data; (2) computer-aided drug discovery (CADD); (3) breast cancer detection and localization; and (4) sleep apnea detection. Each use case leverages different deep learning techniques, given the different nature of the datasets involved. For splice site detection, we apply a convolutional neural network (CNN) to raw DNA sequences to classify candidate splice sites as true or pseudo splice sites. We combine CNNs with techniques that have already been applied successfully in natural language processing, including long short-term memory networks (LSTMs) and word embeddings. For CADD, we leverage the publicly available PubChem database of chemical molecules and their activities in biological assays. In particular, we apply ligand-based virtual screening with a multi-task deep neural network to detect interactions between drugs and targets. For breast cancer detection and localization, we apply a CNN to images from a mammography dataset to find lesions. If a lesion is present in a given image, we classify it as benign or malignant, and our neural network then localizes where the lesion resides. For sleep apnea detection, we use clinical polysomnography data consisting of, for instance, electroencephalograms (EEG), electrooculograms (EOG), electromyograms (EMG), and electrocardiograms (ECG). After processing the raw data, we use a deep neural network to identify patterns in the cleansed data, with the aim of detecting sleep apnea.
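To make the splice-site setup concrete, a CNN over raw DNA typically starts from a one-hot encoding and slides filters along the sequence. The sketch below shows only that first convolutional step in plain Python. The filter is hand-set to respond to the GT dinucleotide that begins most introns, whereas in a trained network the filter weights are learned; the function names are illustrative, not taken from the authors' code.

```python
from typing import List

BASES = "ACGT"

def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as L x 4 rows in A, C, G, T order; unknown bases (e.g. N) become zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv1d_relu(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, as in CNN layers) with one filter, then ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        s = bias + sum(x[i + j][c] * kernel[j][c] for j in range(k) for c in range(4))
        out.append(max(0.0, s))
    return out

if __name__ == "__main__":
    x = one_hot("AAGTAA")
    gt_filter = one_hot("GT")  # hand-set weights; a real CNN learns these
    print(conv1d_relu(x, gt_filter, bias=-1.0))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The bias of -1 means only a full two-base match produces a positive activation, so the output peaks at the GT position.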

  TBC-20: A Cloud-Based Pathology Images Collaborative Platform for Medical Annotation, Analysis and Education

Chang-Wei Yeh1, Chieh-Wei Huang1, Chao-Chun Chuang1, and Yu-Tai Wang1,*

1 National Center for High-Performance Computing, Hsinchu 30076, Taiwan.

Abstract
The Cancer Genome Atlas (TCGA) provides high-resolution digital whole-slide images (WSIs) from which pathologists can make diagnoses directly, and which offer great opportunities to study tissue morphology and development. Because of differing file formats and the large scale of the data, integrating TCGA data with WSIs from different laboratories is a common challenge in pathology informatics. A suitable visualization and analysis platform is therefore needed to integrate these vast and disparate images from TCGA and different laboratories. Here we developed a large-scale, high-performance storage system and created a web-based virtual microscopy platform that integrates WSIs and allows users to view, search, annotate, and quantify high-resolution histology slides over the internet in real time. In particular, users can keep their annotations private or share them with other researchers through our protection system. For WSIs from different laboratories, the platform supports the major histology image formats, including Aperio, Hamamatsu, Leica, MIRAX, Philips, Sakura, Trestle, Ventana, and generic tiled TIFF. The platform is compatible with major operating systems (OS X, Windows, iOS, and Android). In conclusion, although the primary purpose of the website is to provide a resource for students studying and analyzing pathological slides, it is being made available to the general pathology community and to interested clinicians everywhere.
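Web-based virtual microscopes commonly serve a WSI as a pyramid of fixed-size tiles, so that a browser only fetches the tiles covering the current view and zoom level. The abstract does not describe the platform's tiling scheme, so the following is a generic, deep-zoom-style sketch under assumed parameters (256-pixel tiles, resolution halved per level); the function names are hypothetical.

```python
import math
from typing import List, Tuple

def pyramid_levels(width: int, height: int, tile: int = 256) -> List[Tuple[int, int, int, int, int]]:
    """Return (level, width, height, tile_cols, tile_rows) from full resolution (level 0)
    down to a 1 x 1 pixel image, halving each dimension (rounding up) per level."""
    levels = []
    w, h, level = width, height, 0
    while True:
        levels.append((level, w, h, math.ceil(w / tile), math.ceil(h / tile)))
        if w == 1 and h == 1:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
        level += 1
    return levels

def tile_for_pixel(x: int, y: int, level: int, tile: int = 256) -> Tuple[int, int]:
    """Column and row of the tile containing full-resolution pixel (x, y) at a level
    (approximate for odd dimensions because of rounding between levels)."""
    scale = 2 ** level
    return (x // scale) // tile, (y // scale) // tile

if __name__ == "__main__":
    print(pyramid_levels(1024, 512)[:2])      # [(0, 1024, 512, 4, 2), (1, 512, 256, 2, 1)]
    print(tile_for_pixel(600, 300, level=0))  # (2, 1)
```

Many vendor WSI formats already store such a multi-resolution pyramid internally, in which case a server only needs to read tiles from the existing levels.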

  TBC-21: Comparison of germline variant calling software from targeted next-generation sequencing

Minjung Kim1, Taeheon Lee1, Chae Hyun Lim1, Junnam Lee1, Guhwan Kim1, Young-Eum Kim1, Ja-Hyun Jang1, Han-Wook Yoo1, and Eun-Hae Cho1,*

1 Green Cross Genome, Yong-in 16924, Korea

Abstract
Background: Chromosomal microarray has been used as a first-tier diagnostic tool for microdeletion/microduplication syndromes in individuals with developmental delays or congenital anomalies. However, with the rapidly falling cost of whole genome sequencing (WGS), next generation sequencing (NGS) makes low-coverage WGS a possible alternative method for copy number variation (CNV) detection in clinical cytogenetics. In this study, we developed a bioinformatic pipeline for accurate CNV detection and compared the results of low-coverage WGS with those of chromosomal microarray.
Methods: We analysed clinical samples from 62 patients who had previously been tested with chromosomal microarray (Affymetrix CytoScan 750K) because of congenital anomalies or developmental delays. Libraries prepared from these samples were pooled and sequenced on a NextSeq 500 (Illumina) with 75 bp reads, producing an average of 3.6 million reads per sample. Reads were aligned and curated using principal component analysis (PCA) to eliminate GC bias, mappability effects, and higher-order artifacts. For sample quality checking, we developed a Q-score system that uses the LOESS smoothing algorithm to eliminate locally clustered noise; samples with higher Q-scores had more false segmentations.
Results: Compared with chromosomal microarray, our results showed the potential of low-coverage WGS to detect clinically relevant CNVs at lower cost and higher capacity. This study was supported by the R&D program of MOTIE/KEIT (10053626), Republic of Korea.
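Read-depth CNV detection from low-coverage WGS generally compares binned read counts against a reference profile. The sketch below illustrates that core idea only. It uses median normalization, log2 ratios and fixed thresholds that are assumptions for illustration, and it does not reproduce the study's PCA-based artifact removal or its LOESS-based Q-score; function names and threshold values are hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple

def log2_ratios(sample: List[float], reference: List[float]) -> List[float]:
    """Per-bin log2(sample/reference) after scaling each profile by its median bin count."""
    s_med, r_med = median(sample), median(reference)
    ratios = []
    for s, r in zip(sample, reference):
        if r <= 0:
            ratios.append(0.0)            # empty/unmappable reference bin: treated as neutral
        elif s <= 0:
            ratios.append(float("-inf"))  # no reads at all: possible homozygous deletion
        else:
            ratios.append(math.log2((s / s_med) / (r / r_med)))
    return ratios

def call_segments(ratios: List[float], loss: float = -0.4,
                  gain: float = 0.3) -> List[Tuple[int, int, str]]:
    """Label bins by threshold, merge runs of equal labels, return non-neutral (start, end, label)."""
    labels = ["loss" if r <= loss else "gain" if r >= gain else "neutral" for r in ratios]
    segments: List[Tuple[int, int, str]] = []
    for i, lab in enumerate(labels):
        if segments and segments[-1][2] == lab:
            segments[-1] = (segments[-1][0], i, lab)  # extend the current run
        else:
            segments.append((i, i, lab))
    return [seg for seg in segments if seg[2] != "neutral"]

if __name__ == "__main__":
    sample = [100, 100, 50, 50, 100, 150]  # hypothetical binned read counts
    reference = [100] * 6
    print(call_segments(log2_ratios(sample, reference)))  # [(2, 3, 'loss'), (5, 5, 'gain')]
```

Median scaling is used instead of total-count scaling so that a large CNV does not shift the normalization of every other bin.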

  TBC-22: Low coverage sequencing for comprehensive screening of chromosomal Copy Number Variation-related diseases

Junnam Lee1, Young Joo Jeon1, Chae Hyun Lim1, Taeheon Lee1, Minjung Kim1, Young-Eun Kim1, Ja-Hyun Jang1, and Eun-Hea Cho1,*

1 Green Cross Genome, Yong-in 16924, Korea

Abstract
Background: Next Generation Sequencing (NGS) technologies enable fast and economical genome sequencing for clinical research and diagnosis, and many analysis applications have been developed accordingly. However, insertion/deletion (INDEL) calling still has high false positive rates. The aim of this study was therefore to develop variant calling methods for targeted NGS.
Methods: To develop the analysis pipeline, we separately compared the performance of all possible combinations of SNP callers and of INDEL callers, using the NA12878 whole exome sequencing data sets and the NIST Genome in a Bottle validation call set. We then tested this pipeline on patient samples from targeted NGS panels.
Results: On the NA12878 whole-exome sequencing data sets, the combination of UnifiedGenotyper and SAMtools showed the best performance in SNP calling, and a combination of three callers (UnifiedGenotyper, FreeBayes, and Scalpel) showed higher precision and sensitivity in INDEL calling. This combination also detected a deletion about 90 bp long. We demonstrated 100% concordance in detecting 297 pathogenic single nucleotide variants and 33 pathogenic insertion-deletion mutations in 340 patients, all previously confirmed by Sanger sequencing. Intra- and inter-run reproducibility tests showed 100% efficiency (both sensitivity and precision).
Conclusion: This study provides an accurate and reproducible analysis pipeline for targeted sequencing, with 100% concordance in the mutations identified. Clinical trials using these targeted panels and the analysis pipeline will be conducted for KFDA IVD approval. This study was supported by the R&D program of MOTIE/KEIT (10053626), Republic of Korea.
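The caller-combination search in the Methods can be framed as scoring every subset of callers against a truth set such as the Genome in a Bottle calls. The abstract does not say how calls from several callers were merged, so this sketch assumes a simple majority vote within each subset and ranks subsets by F1 score; caller names, variant tuples and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

def majority_calls(call_sets: List[Set[Variant]]) -> Set[Variant]:
    """Keep variants reported by more than half of the given callers (assumed merge rule)."""
    counts = Counter(v for calls in call_sets for v in set(calls))
    need = len(call_sets) // 2 + 1
    return {v for v, n in counts.items() if n >= need}

def precision_sensitivity(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision = TP / calls; sensitivity = TP / truth."""
    tp = len(calls & truth)
    return (tp / len(calls) if calls else 0.0, tp / len(truth) if truth else 0.0)

def rank_combinations(callers: Dict[str, Set[Variant]], truth: Set[Variant]):
    """Score every non-empty subset of callers; return (f1, subset, precision, sensitivity), best first."""
    results = []
    names = sorted(callers)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            p, s = precision_sensitivity(majority_calls([callers[n] for n in combo]), truth)
            f1 = 2 * p * s / (p + s) if p + s else 0.0
            results.append((f1, combo, p, s))
    results.sort(key=lambda r: r[0], reverse=True)
    return results

if __name__ == "__main__":
    A, B, C, D = [("chr1", pos, "A", "G") for pos in (100, 200, 300, 400)]
    truth = {A, B, C, D}
    callers = {
        "caller_x": {A, B, C, ("chr1", 500, "C", "T")},
        "caller_y": {A, B, D, ("chr1", 600, "G", "A")},
        "caller_z": {A, C, D, ("chr1", 700, "T", "C")},
    }
    print(rank_combinations(callers, truth)[0])
    # (1.0, ('caller_x', 'caller_y', 'caller_z'), 1.0, 1.0)
```

In this toy data each caller alone scores F1 = 0.75, a pair (intersection) trades sensitivity for precision, and the three-caller majority removes every caller-specific false positive.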

  TBC-23: Medical Examination Data Prediction with Missing Information Using Long Short-Term Memory

Han-Gyu Kim1, Gil-Jin Jang2, Ho-Jin Choi1,*, Minho Kim3, Young-Won Kim3, and Jae-Hun Choi3

1 School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, South Korea
2 School of Electronics Engineering, Kyungpook National University, Daegu 41566, South Korea
3 Electronics and Telecommunications Research Institute, Daejeon 34129, South Korea


Abstract
In this work, we use a recurrent neural network (RNN) to predict medical examination data with missing parts. Medical examination data often have missing parts due to various human factors, for instance because subjects occasionally miss their annual examinations. Such missing parts make it hard for machines to predict future examination data, so the missing information must be imputed for accurate prediction. Among the various types of RNN, we chose long short-term memory (LSTM) to predict both the missing information and the future medical examination data, as LSTM performs well in many related applications. In our proposed method, the temporal trajectories of the medical examination measurements are modelled by an LSTM that compensates for missed measurements; the model is then used to predict future measurements, which can help diagnose the subjects' diseases in advance. We carried out experiments on a medical examination database of Korean people covering 12 consecutive years and 13 medical fields. In this database, 11,500 people took the medical check-up every year, and 7,400 people occasionally missed their examination. We use the complete data to train the LSTM, and the data with missing parts to evaluate imputation and future-measurement prediction. In terms of root mean squared error (RMSE) between the predictions and the actual measurements, the experimental results show that the proposed LSTM network predicts medical examination data much better than conventional linear regression for most examination items.
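The evaluation above compares the LSTM against conventional linear regression using RMSE. An LSTM is beyond a short example, but the baseline and the metric can be sketched in plain Python. This is a hedged illustration, not the authors' code: it assumes one measurement per year per subject, marks missed examinations as `None`, and fits a least-squares line over the observed years to impute gaps and extrapolate the next year; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept; a flat line if xs has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def linear_impute_and_forecast(series: List[Optional[float]]) -> Tuple[List[float], float]:
    """Fill missed years (None) from a line fitted to the observed years and predict the
    next year. Requires at least one observed year."""
    observed = [(t, v) for t, v in enumerate(series) if v is not None]
    slope, intercept = fit_line([t for t, _ in observed], [v for _, v in observed])
    filled = [v if v is not None else slope * t + intercept for t, v in enumerate(series)]
    return filled, slope * len(series) + intercept

def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

if __name__ == "__main__":
    # Hypothetical yearly values for one examination item; year 2 was missed.
    filled, next_year = linear_impute_and_forecast([1.0, 2.0, None, 4.0])
    print([round(v, 6) for v in filled], round(next_year, 6))  # [1.0, 2.0, 3.0, 4.0] 5.0
```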

  TBC-24: ConVarCal Facilitates Robust Massive Parallel Sequencing Variant Calling

Yonglan Zheng1, Alex Rodriguez2, Segun C. Jung2, Toshio F. Yoshimatsu1, Ravi K. Madduri2, Utpal J. Dave2, Ian Foster2, Olufunmilayo I. Olopade1,*

1 Center for Clinical Cancer Genetics, Department of Medicine, The University of Chicago, USA
2 Computation Institute and Argonne National Laboratory, The University of Chicago, USA


Abstract
Background: The rapidly increasing use of massively parallel sequencing (MPS) in academic and clinical settings demands reliable and reproducible variant calling methods. Precise detection of single nucleotide variants (SNVs), insertions and deletions (Indels), and structural variants (SVs) is required at both the individual and population levels.
Methods: Our MPS variant identification platform, ConVarCal (Confident Variant Calling), flexibly combines multiple tools using the elastic computing capability of Globus Genomics, built on Amazon Web Services. FASTQ files are submitted for BWA-MEM or Bowtie2 alignment, and the resulting BAM files are processed by highly parallelized workflows: GATK HaplotypeCaller, Platypus, FreeBayes, SAMtools mpileup, and Atlas2. The output VCF files are normalized, and a set of high-confidence variants is obtained through refinement by Consensus Genotyper before annotation with ANNOVAR or VEP. SVs are detected precisely with the MetaSV workflow, which integrates Pindel, BreakDancer, BreakSeq, CNVkit, and Manta; DELLY, LUMPY, and CONTRA are also available. For dynamic parallel computing, wrapper scripts written in the Swift parallel scripting language were implemented for some callers.
Results: We tested the performance of ConVarCal by analyzing germline targeted sequencing data (1.3 Mbp, average depth 260x) from 200 Nigerian breast cancer patients. The entire analysis was completed within a week, although the total processing time varied with the configuration and availability of cloud computing resources. ConVarCal confidently identified 25 deleterious SNVs/Indels in 29 subjects, all of which have been confirmed experimentally. In addition, users can share and trace the analytic steps; further optimize operations by adjusting parameters, reordering jobs for better parallelization, or allocating computing resources appropriately; and analyze performance by visualizing resource-performance plots.
Conclusion: ConVarCal takes full advantage of Globus Genomics for reliable and robust MPS variant calling. It is highly scalable, and its modular design allows additional tools to be added to further enhance the platform.
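Normalizing VCF output matters because different callers can report the same indel with different REF/ALT strings, which would otherwise defeat any consensus step. Full normalization (as in `bcftools norm`) also left-aligns indels against the reference genome; the sketch below shows only the reference-free part, trimming shared bases from the alleles, and uses a simple support count in place of Consensus Genotyper, whose algorithm the abstract does not describe. The function names are hypothetical, not part of ConVarCal.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), 1-based pos as in VCF

def trim_alleles(chrom: str, pos: int, ref: str, alt: str) -> Variant:
    """Drop shared trailing, then leading, bases while keeping at least one base per
    allele (VCF convention). Left-alignment would additionally need the reference."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return chrom, pos, ref, alt

def supported_by(call_sets: Iterable[Iterable[Variant]], min_callers: int) -> Set[Variant]:
    """Normalize each caller's variants and keep those reported by at least min_callers callers."""
    counts: Counter = Counter()
    for calls in call_sets:
        counts.update({trim_alleles(*v) for v in calls})
    return {v for v, n in counts.items() if n >= min_callers}

if __name__ == "__main__":
    caller_a = [("chr1", 100, "CTT", "CT")]  # the same 1-bp deletion, padded differently
    caller_b = [("chr1", 100, "CT", "C")]
    print(supported_by([caller_a, caller_b], min_callers=2))  # {('chr1', 100, 'CT', 'C')}
```

Without the trimming step, the two records above would count as different variants and neither would reach the two-caller threshold.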

  TBC-25: Creation and Validation of Metadata Registry based Personal Health Record

Hye Hyeon Kim1, Ju Han Kim1,*

1 Division of Biomedical Informatics, College of Medicine, Seoul National University, Seoul, South Korea

Abstract
A personal health record (PHR) is a collection of information about an individual's health. It includes patient data, such as allergies, medications, and family history, that help individuals and their health care providers manage their health. However, because a PHR is a kind of snapshot, its limited model has difficulty covering detailed patient data and representing it precisely and semantically. To address this problem, we adopted a metadata registry (MDR) based on the ISO/IEC 11179 metamodel in the PHR. We first adopted the CCD/CCR standard models to represent a standards-based PHR. We then developed a five-step process for creating an MDR-based PHR: 1) extracting individual health data from the EMR/EHR; 2) determining whether the MDR-based PHR is developed; 3) retrieving common data elements (CDEs) for the MDR-based PHR; 4) creating and registering PHR-related CDEs; and 5) completing the MDR-based PHR. We also developed a three-step process for validating an MDR-based PHR using the value domain information of data elements, such as data type and min/max values: 1) CCD/CCR XML Schema-based validation; 2) CCD/CCR+ XML Schema-based validation; and 3) MDR-based semantic validation. As a result, we specified, with sample patient data, how an MDR-based PHR is represented in the CCD/CCR model. A data element in the MDR-based PHR can serve as a medium that brings data semantics from the rich semantic content of the MDR. The approach also provides several benefits, including rich semantic representation with clear definitions, semantic validation of patient data, and improved semantic interoperability.
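The third validation step, MDR-based semantic validation, checks each value against its data element's value domain. A minimal sketch, assuming a simplified value domain with only a data type, optional min/max bounds and optional permissible values; the class names and example elements are hypothetical, and the real ISO/IEC 11179 metamodel is far richer.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class ValueDomain:
    data_type: str                      # "integer", "float" or "string" in this sketch
    min_value: Optional[float] = None   # bounds apply to numeric types only
    max_value: Optional[float] = None
    permissible_values: Optional[FrozenSet[str]] = None

@dataclass(frozen=True)
class DataElement:
    name: str
    domain: ValueDomain

_CASTS = {"integer": int, "float": float, "string": str}

def validate(element: DataElement, raw: str) -> List[str]:
    """Return validation errors for one raw value (an empty list means valid)."""
    d = element.domain
    cast = _CASTS.get(d.data_type)
    if cast is None:
        return [f"{element.name}: unsupported data type {d.data_type!r}"]
    try:
        value = cast(raw)
    except ValueError:
        return [f"{element.name}: {raw!r} is not a valid {d.data_type}"]
    errors = []
    if d.min_value is not None and value < d.min_value:
        errors.append(f"{element.name}: {value} is below minimum {d.min_value}")
    if d.max_value is not None and value > d.max_value:
        errors.append(f"{element.name}: {value} is above maximum {d.max_value}")
    if d.permissible_values is not None and str(value) not in d.permissible_values:
        errors.append(f"{element.name}: {value!r} is not a permissible value")
    return errors

if __name__ == "__main__":
    heart_rate = DataElement("heart rate", ValueDomain("integer", 20, 250))
    sex = DataElement("administrative sex",
                      ValueDomain("string", permissible_values=frozenset({"M", "F", "U"})))
    print(validate(heart_rate, "72"), validate(heart_rate, "300"), validate(sex, "X"))
```

Schema validation (steps 1 and 2) would catch structural errors such as a missing element; this step catches values that are well-formed but semantically out of range.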

  TBC-26: EasyFormBuilder: Form building tool based on standardized metadata repository to facilitate semantic interoperability

Hyeong Joon Kim1, Hye Hyeon Kim1, Ju Han Kim1,*

1 Division of Biomedical Informatics, College of Medicine, Seoul National University, Seoul, South Korea

Abstract
Clinical documents are effective tools for collecting patient data, including demographics, family history, and disease history. Although the HL7 CDA standard exists for developing standards-based clinical documents, clinical documents are developed differently and separately by each physician and each hospital. As a result, documents of the same kind, such as admission notes, vary greatly between hospitals. Paper- or PDF-based clinical documents also make it difficult to exchange and share clinical data. To address these problems, we developed EasyFormBuilder, a semi-automatic, web-based application that builds clinical forms composed of ISO/IEC 11179-based Common Data Elements (CDEs) to enhance semantic interoperability. Form generation in EasyFormBuilder proceeds as follows. 1) The user enters basic information about the form on the web page and uploads a CSV file based on the EasyFormBuilder template, whose items follow ISO/IEC 11179 for storing CDE information. 2) The tool generates a form using the information extracted from the CSV file. 3) The tool searches the CDEs stored in the metadata registry (MDR) to match each question and produces a ‘List of recommended CDEs’ to enhance semantic interoperability. 4) To ensure semantic representation, the user chooses the most appropriate CDE from the list and annotates the user-defined question with it. 5) Finally, the XML-based form is built. By drawing on the large number of CDEs in the MDR, the tool makes forms easy to build. The automatic form generation step creates an XML-based form that is both machine- and human-readable, making it easy to read, write, and reuse. Another benefit of our tool is that it provides rich semantic content for each question annotated with a CDE, because each CDE carries a precise definition and information on its concept and representation.
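Steps 2 to 5 of the workflow above can be sketched with the Python standard library. Everything here is an assumption for illustration: the CSV column names, the miniature in-memory MDR, the word-overlap (Jaccard) rule used to rank recommended CDEs, and the XML element names. The abstract does not specify how EasyFormBuilder matches questions to CDEs, and where the tool lets the user pick from the recommended list, the sketch simply takes the top-ranked CDE.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical miniature MDR: CDE identifier -> question text of the data element.
MDR: Dict[str, str] = {
    "CDE-0001": "date of birth",
    "CDE-0002": "family history of diabetes",
    "CDE-0003": "current smoking status",
}

def tokens(text: str) -> set:
    return set(text.lower().split())

def recommend_cdes(question: str, mdr: Dict[str, str], top_n: int = 3) -> List[str]:
    """Rank CDEs by Jaccard similarity of word sets (an assumed matching rule)."""
    q = tokens(question)
    scored = []
    for cde_id, text in mdr.items():
        t = tokens(text)
        score = len(q & t) / len(q | t) if q | t else 0.0
        if score > 0:
            scored.append((score, cde_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [cde_id for _, cde_id in scored[:top_n]]

def build_form(title: str, csv_text: str, mdr: Dict[str, str]) -> str:
    """Build an XML form from CSV rows with id, text and type columns."""
    form = ET.Element("form", title=title)
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(form, "question", id=row["id"], type=row["type"])
        item.text = row["text"]
        recs = recommend_cdes(row["text"], mdr)
        if recs:  # the real tool lets the user choose; here the top match is taken
            item.set("cde", recs[0])
    return ET.tostring(form, encoding="unicode")

if __name__ == "__main__":
    csv_text = "id,text,type\nq1,Date of birth,date\nq2,Smoking status,choice\n"
    print(build_form("Admission note", csv_text, MDR))
```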