Poster Abstracts

  TBC-1: The implications of RBBP6 in various types of cancer

Firdous Khan1,* and Ashley Pretorius1,*

1 University of the Western Cape, South Africa

Abstract
Background: The 250 kDa RBBP6 protein was found to bind both p53 and Rb1 tumor suppressor proteins. In addition, RBBP6 has been associated with multiple biological functions, such as mitosis, mRNA processing, translation and ubiquitination.
Objectives: To use an in silico approach to identify RBBP6 binding partners (BPs). The information will be used to investigate the relationship between RBBP6 and its BPs and to further probe their roles in various cancer types.
Materials & Methods: RBBP6 was used as input to identify its BPs. This was followed by expression profiling across several cancer experiments. Lastly, promoter content analysis was carried out to establish gene regulatory networks based on functional annotation (FA) and de novo motif prediction.
Results: In the current study, 20 BPs were identified for RBBP6. Expression profiling revealed that RBBP6 and its BPs are differentially expressed in 14 cancers, whilst FA analyses indicated that they are involved in similar biological processes, such as regulation of apoptosis and programmed cell death. De novo motif discovery revealed 10 regulatory elements present in the promoters of RBBP6 and its BPs.
Discussion: Differential expression in many cancers and the association with the aforementioned FAs indicate a strong implication in cancer progression. The regulatory elements identified are directly linked to the FAs identified, validating the co-expression relationship between RBBP6 and its BPs.
Conclusions: The study showed that RBBP6 and its BPs share FAs and common regulatory elements; the inference can thus be made that they are highly involved in the progression of cancer.

  TBC-2: MELLO: Medical Life-Log Ontology

Hye Hyeon Kim1, Soo Youn Lee1, Su Youn Baik1, Kye Hwa Lee1 and Ju Han Kim1,*

1 Seoul National University Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea

Abstract
Expectations for utilizing quantified-self data in medicine are increasing, as accurate and reliable medical monitoring is now possible through many body tracking devices. This has led to the development of more lifelogging devices, but has also caused the uncontrolled generation of lifelogging terms, even for devices with the same function. Computational analysis of lifelogging data has so far been hampered by the lack of adequate, integrated lifelogging terms and a related ontology. Therefore, we developed the MEdical Life-Log Ontology (MELLO), with over 500 terms, to overcome this problem. Based on the core 50 data sets extracted from 25 body tracking devices and related mobile apps, we first searched for and extracted enriched lifelogging terms from SNOMED-CT, in which lifelogging terms are scattered. Three curators then manually classified them into 7 major categories with a hierarchical structure. We completed MELLO by annotating each term with synonyms from UMLS and definitions from Wikipedia. Our ontology was successfully validated by applying it to two different devices performing the same function. We show that MELLO is able to integrate different lifelogging terms with the same semantics for personal lifelogging data analysis.

  TBC-3: Investigation on Gene Expression Patterns of Cardiac Myocyte Hypertrophy using Coexpression Network Analysis

Junbeom Kim1, Jun Hyuk Kang1 and Ho-Jin Choi1,*

1 KAIST, Republic of Korea

Abstract
Heart failure is a complex, multifactorial and life-threatening disease. Normal hearts are usually triggered by insults and driven to hypertrophy and heart failure. In this paper, we use coexpression network analysis to extract the gene expression patterns that differ between normal, hypertrophic, and failing hearts. The differentially expressed gene method is generally used for this kind of problem. However, the conventional method compares only individual genes, while coexpression network analysis considers the correlation between genes and compares modules, which are sets of genes. The contributions of this work are: 1) applying the coexpression network analysis framework to a different target disease, heart failure; 2) defining a new scheme for identifying and validating modules. The coexpression network analysis framework was originally applied to hepatocellular carcinoma to extract differentially expressed genes during development of the disease. Here, the new scheme is proposed to distinguish the stages of heart failure.

  TBC-4: Tell me your pathways

Frida Belinky1,*, Gil Stelzer1, Simon Fishilevich1, Shahar Zimmerman1, Marilyn Safran1,2 and Doron Lancet1

1 Departments of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
2 Departments of Biological Services, Weizmann Institute of Science, Rehovot, Israel


Abstract
A key annotation facet for a gene is the list of biological pathways it belongs to. However, the flat pathway list is of limited utility, due to a high degree of intra- and inter-source redundancy and inconsistency.
Striving to convey an integrated, internally consistent view of biological pathways per gene, we have clustered 3840 pathways from 12 sources into ~1600 super-pathways including singleton pathways.
This resulted in a collection of manageable super-pathways, each with no more than 80 members, and with optimal inter-cluster orthogonality.
Pathway expression, based on averaging gene expression binary vectors, reveals the super-pathway pattern of expression across 16 tissues.
Pathway evolution, inferred from gene orthology, reveals that most human pathways evolved mainly at three evolutionary time points:
(1) In the last universal common ancestor (LUCA).
(2) In the ancestor of eukaryotes.
(3) In the ancestor of Metazoa (animals).
Interestingly, super-pathways that are highly expressed in the liver are enriched in group (1), while super-pathways that are highly expressed in the testes are enriched in group (2).
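The pathway expression step described above can be illustrated with a small sketch. It assumes that each gene has a binary vector with one entry per tissue (1 = expressed) and that a super-pathway's profile is the element-wise mean over its member genes. The gene names, the three-tissue layout and the function name `pathway_expression` are hypothetical and are not taken from the authors' pipeline.

```python
def pathway_expression(pathway_genes, gene_vectors):
    """Average the binary tissue-expression vectors of a pathway's genes.

    pathway_genes: iterable of gene identifiers in the (super-)pathway.
    gene_vectors:  dict mapping gene -> list of 0/1 flags, one per tissue.
    Returns one value per tissue: the fraction of member genes expressed there.
    """
    # Genes without expression data are ignored (an assumption of this sketch).
    vectors = [gene_vectors[g] for g in pathway_genes if g in gene_vectors]
    if not vectors:
        raise ValueError("no expression data for any gene in the pathway")
    # zip(*vectors) walks the vectors tissue by tissue.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical toy data: three tissues, two genes with data, one without.
toy_vectors = {"GENE_A": [1, 0, 1], "GENE_B": [1, 1, 0]}
toy_profile = pathway_expression(["GENE_A", "GENE_B", "GENE_C"], toy_vectors)
```

A value close to 1 for a tissue then means that most genes of the super-pathway are expressed in that tissue.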

  TBC-5: Computational Morphoproteomics: Inferring Biological Relationships from Resource Description Framework Networks

Dmitriy Shin1,*, Gerald Arthur1,3, Mihail Popescu2,3,4, Dmitry Korkin3,4 and Chi-Ren Shyu3,4

1 University of Missouri, School of Medicine, Department of Pathology and Anatomical Sciences, Columbia, MO 65212, United States
2 University of Missouri, School of Medicine, Department of Health Management and Informatics, Columbia, MO 65212, United States
3 University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
4 University of Missouri, College of Engineering, Department of Computer Science, Columbia, MO 65211, United States


Abstract
Morphoproteomics is an emerging field aimed at systems-level identification of protein circuitries in a personalized medicine setting. It is based on comprehensive analysis of immunohistochemical protein expression patterns in individual patient cases. A number of morphoproteomic studies have demonstrated better clinical outcomes and the potential to improve therapeutics and diagnostics, also known as theranostics. Standard morphoproteomics practice, however, is heavily dependent on expert knowledge and is therefore prone to inter- and intra-observer variability, which can undermine its widespread usage.
We propose a computational approach to improve traditional morphoproteomics by utilizing vast amounts of curated biological knowledge. First, we transform this knowledge into Resource Description Framework (RDF) knowledge networks using description logic inference and biological ontologies. Second, inspired by ideas from probabilistic causal theory, we introduce a method to traverse these networks and infer the biological mechanisms relevant to the case. Finally, the inferred information is presented in the form of diagrams for clinical decision-making. As a proof of concept, we have applied the formalism to the data from a clinical case of Acute Lymphoblastic Leukaemia analysed using traditional morphoproteomics. The diagram inferred by our method shows a high level of concordance with the human-derived morphoproteomic diagram. The correlated expression of AKT, NF-kappa-B (nuclear) and BCL-2 proteins and the activation of an anti-apoptotic mechanism were noted by the experts in this case. The same flow of events was inferred by our computational approach.
We therefore conclude that our approach could provide important advancements for the clinical implementation of morphoproteomics. A comprehensive assessment of the approach with more experimental data will be conducted to further explore its clinical utility.

  TBC-6: A quantitative mixture model for transcriptome prediction

Qing Zhang1, Xiaodan Fan1 and Dianjing Guo1,*

1 The Chinese University of Hong Kong, Hong Kong

Abstract
Although many computational methods have been widely adopted to infer transcription regulatory networks (TRNs), quantitative models that accurately predict the dynamic behavior of genes based on gene expression data are still needed for both wet-lab experimental design and synthetic biology.
In the present work, we propose a quantitative mixture model for transcriptome inference under a wide range of experimental conditions. Using cross-validation on an E. coli transcriptome dataset, the predictive power of the proposed model was estimated under various system perturbations, such as gene knock-out, gene over-expression, and network rewiring. By linking a new experimental condition to the known conditions, the model can be used to reveal possible functional relationships between different conditions. In addition, the model can be extended to generate benchmark synthetic transcriptome data for the evaluation of TRN inference algorithms. The good performance of this method allows its wide application in synthetic biology system redesign and in biological experimental design.

  TBC-7: De novo genome sequencing project of Korean native pig: Current status of genome assembly and annotation

Won-Hyong Chung1,*, Namshin Kim1, Kyung-Tai Lee2 and Tae-Hun Kim2

1 Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Korea
2 National Institute of Animal Science, Rural Development Administration, Korea

Abstract
Starting from the panda genome sequencing project, genome sequencing using only NGS technology has been widely adopted in mammalian genome sequencing projects. The pig is one of the most important food sources and one of the oldest forms of livestock. Even though two pig breeds, Duroc swine and Mini-pig, were sequenced previously, it is important to sequence local breeds to elucidate their genomic features. The Korean native pig is assumed to have come to Korea via north China around 2000 years ago. It has long, coarse black hair, a long straight nose, and a small body weight (approximately 70 kg as an adult). Here we report the progress of the de novo genome sequencing project to make a reference genome sequence of the Korean native pig. We sequenced various kinds of libraries (170 bp, 300 bp, 400 bp, 500 bp and 600 bp insert paired-end; 2 Kbp, 5 Kbp, 7 Kbp and 10 Kbp insert mate-pair) to approximately 136x coverage using Illumina GA IIx and HiSeq 2000 platforms. A small amount of a 20 Kb long-insert library (~0.1x) was sequenced using the 454 GS FLX platform. De novo sequence assembly was performed on these sequence sets using AllPaths-LG. After filling gaps using GapCloser in the SOAPdenovo2 package, we applied the RACA pipeline, which curates and rearranges scaffolds using comparative genomic information. This resulted in 357 scaffolds totaling 2.52 Gb, with a mean scaffold size of 7 Mb and an N50 size of 17.2 Mb, covering over 95% of the Duroc swine genome (excluding unplaced scaffolds). The assembly result is superior to that of the Mini-pig genome (1,138,136 scaffolds, N50 size of 5.4 Mb). The Korean pig draft genome will be a good resource for genome-wide comparative genomics between pig breeds and for novel gene identification.
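For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common way to compute it: sort the scaffolds from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' evaluation script, and assembly tools may differ in details such as minimum-length filters.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to at least
        # half of the total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical scaffold lengths (arbitrary units), not real assembly data.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_scaffolds)
```

For the toy list, the total is 290, so the N50 is the length of the scaffold at which the cumulative sum first reaches 145.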

  TBC-8: Systems Biology Integrative approach uncovers newer molecular targets in Metachromatic Leukodystrophy

Punit Kaur1,*, Parul Sharma1, Sujata Sharma1 and T. P. Singh1

1 Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India

Abstract
Metachromatic Leukodystrophy (MLD) is a neurological disorder caused by deficiency of the enzyme arylsulfatase A (ARSA). This disease impairs the growth or development of the myelin sheath. Mutation or absence of ARSA may cause the accumulation of sulfatides in many tissues of the body, eventually destroying the myelin sheath of the nervous system. The molecular interaction analysis of MLD was carried out using the Cytoscape tool and its plug-ins. The functional modules and potential drug targets were identified as highly interconnected sub-graphs in the network. Molecular functions (gene ontology) of these genes were studied using BiNGO implemented in Cytoscape 2.8. DAVID, an online bioinformatics tool, was used for pathway and disease enrichment analysis to gain deeper insight into the molecular mechanism of MLD. The highest ranking sub-network was found to have 126 genes, and in addition to ARSA, four more genes were found to be highly interconnected, namely SMAD9, PSAP, BMPR2 and UBE3A, which may play a major role in the pathogenesis of this disease. The genes found to be potential drug targets for this disorder are TAF1, SMAD2, BRCA1, HNF4A, AR, SMAD9, CDC2, RB1, UBC, CDK2, UBB, PSAP, CDC23, MYC, MNAT1, CCNH and CDK7. The MLD initiating genes and other important candidate genes were found to be mostly involved in binding and catalytic activity. This analysis has led to the identification of genes that have been inadequately explored experimentally and are currently not reported in MLD physiopathology. Additionally, major pathways likely to be affected in MLD include sphingolipid metabolism, lysosome, proteolysis and arrhythmogenic right ventricular cardiomyopathy (ARVC). Thus, through a disease enrichment analysis, we corroborated that MLD is not only associated with neuronal degeneration but also has probable links with cancer, metabolism and the immune system.

  TBC-9: Measuring DNA methylation in large epidemiological prospective studies: an example of a nested case-control study of breast cancer using the Illumina Infinium 450k BeadChip array

Chol-Hee Jung1, Gianluca Severi2, Melissa Southey3, Dallas English4, Andrew Lonie5, Helen Tsimiklis3, John Hopper4, Graham G Giles2 and Laura Baglietto2,*

1 Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative, Carlton, Victoria, 3010, Australia
2 Cancer Epidemiology Centre, Cancer Council of Victoria, Melbourne, Australia
3 Genetic Epidemiology Laboratory, Department of Pathology, The University of Melbourne, Australia
4 Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of Population Health, University of Melbourne, Australia
5 Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative, Carlton, Victoria, 3010, Australia

Abstract
DNA methylation is a key epigenetic mechanism that regulates gene expression and is known to be involved in many human diseases including cancer. The development of new technologies to measure genome-wide DNA methylation makes it possible to conduct large epidemiological studies to test multiple hypotheses of association between methylation and disease. The major challenges posed by this type of study include the use of DNA from archival biospecimens of different type (e.g. dried blood spots, lymphocytes, buffy coats), handling missing values and controlling for batch effects. In this paper we discuss these challenges using the example of a prospective case-control study of breast cancer nested within the Melbourne Collaborative Cohort Study.

  TBC-10: Analysis of Functional Impacts on Massive Cancer Mutation Data

Seunghwan Jung1,§, Soobok Joe1,§ and Hojung Nam1,*

1 School of Information and Communications, Gwangju Institute of Science and Technology, 123, Cheomdangwagi-ro, Buk-gu, Gwangju, 500-712, Republic of Korea
§ Equal Contribution


Abstract
A genetic mutation is a change in the nucleotide sequence of the genome of an organism. Mutation can result in several different types of sequence change: (i) a change in one DNA base pair that results in the substitution of one amino acid for another in the protein made by a gene (missense mutation), (ii) a change in one DNA base pair that causes the DNA sequence to prematurely signal the cell to stop building a protein (nonsense mutation), (iii) a change in the number of DNA bases in a gene by adding a piece of DNA (insertion), (iv) a change in the number of DNA bases by removing a piece of DNA (deletion), and so on.
The importance of genetic mutations as factors in human diseases has been known for many years. In particular, mutations have a major role in the initiation and development of cancer. A common model defines two classes of mutations in cancer: driver and passenger mutations. A driver mutation is causally implicated in oncogenesis. It has conferred a growth advantage on the cancer cell and has been positively selected in the microenvironment of the tissue in which the cancer arises. On the other hand, a passenger mutation makes no contribution to cancer development. In this sense, discovering functionally important mutations, including clear 'drivers', is one goal of genome resequencing studies.
Thus, in this work, we analyze massive cancer mutation data sets using conventional analysis tools to give statistics on how many of the mutations detected in cancer could potentially be classified as driver mutations, and on their patterns in various types of cancer. Here we used cancer mutation information collected from the COSMIC database, the Cancer Cell Line Encyclopedia (CCLE), and The Cancer Genome Atlas (TCGA) project.

  TBC-11: Systematic and integrative analysis of large gene/protein interaction network for Rett syndrome

Parul Sharma1, Sujata Sharma1, T. P. Singh1 and Punit Kaur1,*

1 Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India

Abstract
Rett syndrome is a neurodevelopmental disorder of the grey matter of the brain that exclusively affects females. Mutations in the methyl-CpG-binding protein 2 gene (MECP2), found on the X chromosome, are the major cause of Rett syndrome. Only a few drugs, with low efficacy, have been reported in the literature for Rett syndrome. Additionally, there is a complete lack of knowledge about its gene co-expression network and pathogenesis. System networks are a central paradigm in biology that helps identify new drug targets, which in turn can generate a more in-depth understanding of disease mechanisms. In an effort to explore drug targets, we have implemented a computational platform that integrates gene-gene interactions, differentially expressed genes and literature mining data to build comprehensive networks for drug-target identification. We used Cytoscape and its various plugins to predict the probable drug targets, to study the expression of genes in various biological processes and to identify highly interconnected clusters of genes. We have not only confirmed the well-known relationship between this syndrome and neurodevelopmental disorder but also identified statistically significant relationships with other biological processes, such as cell apoptosis, metabolic processes and many signalling pathways that affect the nervous, musculo-skeletal, respiratory, excretory and circulatory systems. These multi-system complexes thus play a crucial role in the onset and pathogenesis of Rett syndrome. Gene Ontology (GO) enrichment analysis was performed on all the obtained clusters. GO analysis exposed significant molecular functions, such as histone deacetylase binding, transcription factor binding and transcriptional co-repressor activity, which were found to be associated with the genes that are known to play an important role.
It also revealed various important biological functions associated with the highly interconnected hubs in the network. We succeeded in detecting some well-known related genes such as MECP2, HDAC1, SIN3A, DNMT1, RCOR1 and NTNG1, and we also identified GD1, TNF, PAK1, ADIPOQ and CAP2, which have been poorly explored or are unknown in the current state of the art of Rett syndrome.

  TBC-12: Identifying Cross-Species Simple Sequence Repeat Biomarkers

Tun-Wen Pai1,*

1 Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 20224, Taiwan

Abstract
Simple sequence repeats (SSRs) are DNA segments consisting of a continuously repeated basic pattern of one to six nucleotides. SSRs are not only used as genetic markers in evolutionary studies but also play an important role in gene regulatory activities. Reports have revealed that SSR mutation or expansion may cause earlier symptoms of genetic diseases and lead to serious illness. Therefore, identifying and predicting functional SSRs through cross-species comparison is helpful for understanding the evolutionary mechanisms and associations between genes and functional SSRs, and the important biomarkers identified could be applied to further studies in genetic diseases, gene therapy, and breeding for various species. Because SSRs are so abundant, it is difficult to identify functional SSRs using only the length and basic pattern information from a single genome dataset. Hence, this study proposes a cross-species comparative approach integrated with a tag cloud visualization technique for SSR biomarker identification. The tag cloud representation uses different font sizes and colors to display the relationships between genes and retrieved SSRs. Here, the SSR database was established by selecting 12 frequently used model species, which are clustered into mammal and marine species clusters. Users either provide a set of genes or simply input keywords, from which the system selects genes automatically. The proposed system can identify ultra-conserved or unique SSRs through comparison of cross-species orthologous genes.
To demonstrate system performance, four testing gene sets were applied: (1) all orthologous genes from the 12 model species, with each gene possessing a sequence identity higher than 80% compared with the human genome; (2) 17 skeletal development related genes among the mammal and marine species clusters; (3) a functionally related gene set from the GO term "embryonic cranial skeleton morphogenesis"; (4) a gene set of all well-known genetic diseases associated with SSR biomarkers. For these testing gene datasets, the system provided effective and efficient approaches for identifying conserved and exclusive SSR biomarker candidates through a user-friendly interface. In addition, the last testing dataset successfully demonstrated that the well-known genetic diseases were indeed associated with the retrieved ultra-conserved SSR biomarkers. Through statistical analysis and enhanced tag cloud representation of functionally related gene sets and cross-species clusters, it can be seen that the patterns, loci, colors, and sizes of the identified SSR tags correlate highly with gene functions, SSR pattern qualities and the numbers of conserved species.
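The definition of an SSR given above (a basic pattern of one to six nucleotides repeated in tandem) can be made concrete with a short sketch. It uses a regular expression with a backreference to find perfect repeats in a single sequence. The function name `find_ssrs`, the minimum of three repeats and the toy sequence are assumptions made for illustration; the abstract does not describe the detection procedure used by the authors' system, and the cross-species comparison step is not covered here.

```python
import re


def find_ssrs(sequence, min_repeats=3, max_motif_len=6):
    """Find perfect tandem repeats of motifs 1..max_motif_len bases long.

    Returns a sorted list of (start, motif, repeat_count) tuples.
    """
    hits = []
    for motif_len in range(1, max_motif_len + 1):
        # ([ACGT]{k}) captures a motif; \1{n,} requires n more copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
        )
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA",
            # "ATAT"); those are already reported at the shorter length.
            is_compound = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_compound:
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


# Hypothetical toy sequence containing (CA)4 and (GAT)3.
toy_hits = find_ssrs("TTGCACACACAGGATGATGATGC")
```

Each hit records the 0-based start position, the basic pattern and the number of tandem copies, which is the kind of per-gene information a cross-species comparison could then match between orthologs.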

  TBC-13: Expression profiling using RNA-seq for identifying developmentally regulated genes in Daphnia pulex

Haein An1 and Chang-Bae Kim1,*

1 Department of Life Science, Sangmyung University, Seoul 110743, Korea

Abstract
To identify genes controlling the developmental stages of Daphnia pulex, we determined gene expression profiles in three developmental stages (late embryo, 1st~3rd instars and 4th~5th instars) using RNA-seq. Gene expression in the 1st~3rd instars was more similar to that in the 4th~5th instars than to that in the late embryo, suggesting that the late embryo is the most distinct stage in the developmental process. Differentially expressed genes (DEGs) were discovered by comparing gene expression in the late embryo with that in the post-embryonic stages. In total, 3,562 genes were up-regulated and 3,332 genes were down-regulated in the embryonic stage. Hierarchical clustering of the DEGs generated two clusters: up-regulated genes and down-regulated genes in the embryonic stage. The DEGs were enriched in GO categories. The late embryo had higher activity in synapse, transcription regulator activity and molecular transducer activity. In the post-embryonic stages, membrane-enclosed lumen, envelope, reproduction and others were highly expressed. Genomic studies of multiple developmental stages are needed to elucidate developmental mechanisms.

  TBC-14: Loss of the Heterochromatic X Chromosome in High Grade Ovarian Serous Carcinoma

Jun Kang1,*, Hee Jin Lee1, Ho Yun Lee1, Jeong Hee Lee1, Hajeong Lee1, Guhyun Kang1 and Joon Seon Song1

1 Training Program of Certified Physicians in BioMedical Informatics (CPBMI), Korea

Abstract
Introduction: Loss of the heterochromatic X chromosome occurs in certain breast and ovarian cancers. Mitotic segregation errors were thought to be the most common mechanism. However, genome-wide deficits in heterochromatin maintenance and dysfunction of BRCA1 have been suggested as alternative mechanisms of loss of the heterochromatic X chromosome. We investigated the correlation between loss of the heterochromatic X chromosome and both genome-wide methylation status and BRCA mutation in ovarian high grade serous carcinoma.
Methods: We analysed X chromosome heterochromatin indicators, including XIST expression and X chromosome methylation, in 164 ovarian high grade serous carcinomas from TCGA data (normalized RPKM of IlluminaHiSeq_RNASeqV2 for XIST level and beta value of Illumina Human Methylation 27k for methylation value). Genome-wide methylation status and BRCA mutation were analysed against X chromosome heterochromatic status.
Results: XIST RNA levels vary in ovarian high grade serous carcinoma. When samples were sorted by XIST RPKM, the X chromosome methylation pattern divided loosely into two groups at an XIST level of 1592 RPKM. The low XIST RNA group showed hypomethylation of the X chromosome, but not of the somatic chromosomes. There was no difference in BRCA1 mutation between the two groups.
Conclusion: Some high grade ovarian serous carcinomas have lost the heterochromatic X chromosome. Genome-wide deficits in heterochromatin maintenance and BRCA1 dysfunction do not seem to be the main mechanisms of loss of the heterochromatic X chromosome.

  TBC-15: How should we normalize laboratory results from multiple institutes to combine clinical data for unbiased analysis?

Dukyong Yoon1, Dong Ki Kim2, Eun-Young Jung3, Sean Hennessy4, Hyung Jin Choi5, Ju Han Kim6 and Rae Woong Park1,*

1 Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443749, Korea
2 Department of Internal Medicine, Seoul National University College of Medicine, Seoul 110799, Korea
3 Centre for u-Healthcare, Gachon Univ. Gil Hospital, Incheon 405760, Korea
4 Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
5 Department of Internal Medicine, Chungbuk National University College of Medicine, Cheongju 361711, Korea
6 Seoul National University Biomedical Informatics (SNUBI), Seoul 110799, Korea


Abstract
Combining clinical data, including laboratory results, from multiple institutions enables large-scale epidemiological studies. Although the increased adoption of electronic health record systems has facilitated this, there is no method for normalizing the combined clinical data from multiple institutes. Since the patient population at each hospital might differ, simply combining the data without considering population characteristics can lead to biased results. This study applied an age-stratification strategy to compensate for differences in age structure. To demonstrate the effect of age stratification, clinical laboratory results collected at two Korean tertiary teaching hospitals over a 5-year period were used. The hemoglobin level, which decreases with age, was selected for study. The hemoglobin readings were stratified by age from 0 to 79 years at 1-year intervals, according to the patient's age when tested. The results for patients 80 years old or older were aggregated as one group. For each group, the test results were normalized by standardization and then the data were recombined. The degree of normalization (distance) was measured using Kullback-Leibler divergence, and the results were compared with data standardized without an age-stratification strategy. The misclassification count, i.e. the number of results whose normal/abnormal state changed after normalization, was also compared. The distance between the hemoglobin level distributions of the two hospitals was smaller for the age-stratified data than for the normalized non-age-stratified data in both males and females (males, 0.051 vs. 0.345; females, 0.010 vs. 0.16). There were also fewer misclassifications in the age-stratified data: 167,737 (8.3%) vs. 205,554 (10.2%) for males and 223,683 (12.7%) vs. 234,921 (13.3%) for females. The difference in the laboratory data distribution between the two hospitals was normalized well when the population characteristics of the hospitals were considered.
Considering characteristics in addition to age should yield more finely normalized distributions.
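The age-stratified normalization described above can be sketched in a few lines. This sketch assumes that "standardization" means a z-score within each 1-year age group (with ages 80 and above pooled) and that the Kullback-Leibler divergence is estimated from simple histograms; the bin settings, the smoothing constant `eps` and all function names are hypothetical choices, not the authors' implementation.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def age_group(age):
    """1-year strata for ages 0-79; everyone aged 80 or older shares one group."""
    return min(int(age), 80)


def standardize_by_age(records):
    """records: list of (age, value) from ONE hospital.

    Returns z-scores computed within each age group, in the input order.
    """
    groups = defaultdict(list)
    for age, value in records:
        groups[age_group(age)].append(value)
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    z_scores = []
    for age, value in records:
        mu, sd = stats[age_group(age)]
        # A group with zero spread carries no information; map it to 0.
        z_scores.append((value - mu) / sd if sd > 0 else 0.0)
    return z_scores


def kl_divergence(p_samples, q_samples, bins=10, lo=-3.0, hi=3.0, eps=1e-9):
    """Histogram estimate of KL(P || Q) for two samples of z-scores."""
    def histogram(samples):
        counts = [eps] * bins  # eps avoids log(0) for empty bins
        for x in samples:
            idx = int((x - lo) / (hi - lo) * bins)
            counts[min(bins - 1, max(0, idx))] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = histogram(p_samples), histogram(q_samples)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical hemoglobin readings (age in years, g/dL) for one hospital.
toy_records = [(30, 14.0), (30, 16.0), (85, 12.0), (90, 10.0)]
toy_z = standardize_by_age(toy_records)
```

In this setup, each hospital's readings would be standardized separately and the divergence between the two sets of z-scores would then serve as the distance between hospitals; a smaller value indicates better-aligned distributions.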

  TBC-16: miSeqaid: A pipeline for the analysis of microRNA sequencing data

Jee Yeon Heo1, Hae-Seok Eo1, Yong-Jin Choi1 and Hyung-Seok Choi1,*

1 BioIT Team, Future IT R&D Laboratory, LGE Advanced Research Institute, Seocho-gu, Seoul 137-724, Korea

Abstract
Small non-coding RNAs (ncRNAs) are functional RNA molecules that take part in a variety of processes, from cell development and differentiation to stress responses and carcinogenesis, by regulating gene expression. Currently, next generation sequencing (NGS) is extensively used for small ncRNA profiling, especially for microRNAs (miRNAs), and several tools have been developed for analysing miRNA expression profiles and predicting novel miRNAs. Here, we present a novel standalone tool, miSeqaid, for analysing miRNA expression profiles and predicting novel miRNAs from NGS data. miSeqaid consists of four steps: quality control, read mapping, expression analysis and novel miRNA prediction. In step 1, 3'/5' adaptor sequences and contaminating sequences are trimmed, and low-quality and short reads are removed. In step 2, the cleaned sequences are mapped to sequences of several categories (miRBase, RNA, Rfam, Repeat, Genome) using the Bowtie and Blast programs. In step 3, miRNA expression values are normalized by RPM (reads per million) or quantile normalization. Subsequently, differentially expressed miRNAs are identified using Fisher's exact test or the Wilcoxon-Mann-Whitney (WMW) test. P-values are adjusted for multiple testing using the false discovery rate (FDR) and Bonferroni correction. In step 4, sequences that could be mapped to the reference genome but not assigned to known miRNAs are used in the prediction of novel miRNAs. For this prediction, RNAfold, which has shown the best performance in the calculation of secondary structures, is adopted. miSeqaid generates various result files, such as summary reports and analysis images, including length distribution, read classification, Genome mapping, Repeat mapping, Rfam mapping, RNA mapping, expression analysis and novel miRNA prediction. The tool is implemented in the Perl and R languages, and Gnuplot is used to plot the analysis images.
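The normalization and multiple-testing steps named in step 3 can be illustrated with a small sketch. miSeqaid itself is implemented in Perl and R; the Python below is only an illustration and does not reproduce its code. It assumes that the FDR adjustment is the Benjamini-Hochberg procedure, which the abstract does not state, and the miRNA names and function names are hypothetical.

```python
def rpm(counts):
    """Reads-per-million normalization of a {miRNA: read_count} dict."""
    total = sum(counts.values())
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Bonferroni adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical read counts and raw p-values.
toy_rpm = rpm({"miR-a": 250, "miR-b": 750})
toy_bh = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
toy_bonf = bonferroni([0.01, 0.04, 0.03, 0.005])
```

Bonferroni controls the family-wise error rate and is the stricter of the two; the Benjamini-Hochberg values are never larger than the Bonferroni ones for the same input.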

  TBC-17: High-order epistatic interaction detection using clique finding algorithm in genome-wide association studies

Hyun-Hwan Jeong1, Sangseob Leem1 and Kyubum Wee1,*

1 Department of Information and Computer Engineering, Ajou University, Suwon, S. Korea

Abstract
In recent years many methods have been proposed to detect associations between multiple SNPs and complex diseases in case-control studies. However, most of them cannot detect high-order epistatic interactions in genome-wide association studies (GWAS): they are either limited to two-way interactions or unable to cope with the heavy computational burden of processing large-scale genotype data to detect interactions of order 3 or higher.
We propose a new method to find high-order epistatic interactions using a clique-finding algorithm on a graph. The method runs as follows: (1) From every possible pair of SNPs, collect the pairs that have a significant mutual information value, where mutual information is computed between a pair of SNPs and the disease status. (2) Construct a graph from the collected pairs of SNPs. The vertices represent SNPs, and the edges represent the collected pairs. (3) Find every possible clique in the graph and compute the mutual information value of the SNPs in each clique. (4) Finally, sort the cliques found in step (3) by their mutual information value.
Our proposed method shows better performance than previous methods on simulated data. We also show that the method is feasible for real-world large-scale genotype case-control data. The method detects several instances of significant high-order epistatic interaction in coronary artery disease (CAD) case-control data provided by the Wellcome Trust Case Control Consortium (WTCCC).
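The four steps of the method can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. Significance is reduced to a fixed mutual-information `threshold` (a hypothetical parameter). Only maximal cliques are enumerated, using the Bron-Kerbosch algorithm, instead of every clique. All function names are invented for the example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(genotypes, status, snps):
    """I(G; Y) in bits: G is the joint genotype at `snps`, Y the disease status."""
    n = len(status)
    joint, geno, label = Counter(), Counter(), Counter()
    for row, y in zip(genotypes, status):
        g = tuple(row[s] for s in snps)
        joint[(g, y)] += 1
        geno[g] += 1
        label[y] += 1
    return sum(
        (c / n) * log2(c * n / (geno[g] * label[y]))
        for (g, y), c in joint.items()
    )


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def rank_epistatic_cliques(genotypes, status, threshold):
    """Steps (1)-(4): threshold pairs, build the graph, find cliques, sort."""
    n_snps = len(genotypes[0])
    adj = {s: set() for s in range(n_snps)}
    # Steps (1) and (2): keep SNP pairs whose MI with the status passes the cut.
    for a, b in combinations(range(n_snps), 2):
        if mutual_information(genotypes, status, (a, b)) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    # Step (3): score each clique of two or more SNPs by its joint MI.
    cliques = [c for c in maximal_cliques(adj) if len(c) >= 2]
    scored = [(mutual_information(genotypes, status, c), c) for c in cliques]
    # Step (4): highest mutual information first.
    return sorted(scored, reverse=True)
```

Because edges are built from pairwise scores, a purely higher-order effect with no pairwise signal would not form a clique in this sketch. The exhaustive pair scan is also quadratic in the number of SNPs, so a genome-wide run would need a far more efficient implementation.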

  TBC-18: Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies

Andrea Ganna1, Donghwan Lee1, Erik Ingelsson2 and Yudi Pawitan1,*

1 Karolinska Institutet, Sweden 2 Uppsala University, Sweden

Abstract
It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false positive findings. However, the question of what constitutes a successful validation has not been addressed rigorously.
We introduce a new measure called rediscovery rate (RDR) that quantifies the proportion of significant findings from a training sample that are replicated in a validation sample, and illustrate the benefits of using this measure for planning and assessing validation studies. In high-throughput studies, we show that the RDR is a function of false positive rate and power in both the training and validation samples. We derive its estimate based on the training data, assuming that the test statistics follow a mixture distribution. Furthermore, we explain how the RDR is connected to the power of the validation study in single-hypothesis testing and to the winner’s curse bias problem. We foresee two main applications. First, if the validation study has not yet been performed, the RDR can be used to decide the optimal combination of the proportion of findings taken forward to validation and the size of the validation study. Second, if a validation study has already been done, the RDR estimated using the training data can be compared to the observed RDR from the validation data to assess the success of the validation study. We use simulated data and real examples from metabolomics experiments in two large studies to illustrate the application of the RDR concept in high-throughput data analyses.
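To make the dependence on false positive rate and power concrete, the following Python sketch computes an expected RDR under a simple two-group model. The assumptions are ours, not taken from the abstract: a proportion `pi0` of hypotheses are true nulls, each sample has a per-test false positive rate and power, and the training and validation tests are independent given the true status of a hypothesis. The function and parameter names are hypothetical.

```python
def rediscovery_rate(pi0, alpha_train, power_train, alpha_valid, power_valid):
    """Expected share of training-significant findings also significant in validation.

    Two-group sketch: null findings pass both stages with probability
    alpha_train * alpha_valid, non-null ones with power_train * power_valid.
    """
    if not 0.0 <= pi0 <= 1.0:
        raise ValueError("pi0 must lie in [0, 1]")
    pi1 = 1.0 - pi0
    significant_train = pi0 * alpha_train + pi1 * power_train
    if significant_train == 0.0:
        raise ValueError("no findings are expected to be significant in training")
    rediscovered = pi0 * alpha_train * alpha_valid + pi1 * power_train * power_valid
    return rediscovered / significant_train
```

With 90% true nulls, a 0.05 false positive rate and 80% power in both samples, the sketch gives an expected RDR of 0.53. When every hypothesis is non-null (`pi0 = 0`), it reduces to the power of the validation study.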

  TBC-19: Molecular Subtyping of Breast Cancer using RNA-Sequence Data

Setia Pramana1,*, Stefano Calza1,2, Chen Suo1, Fredrik Jonsson1 and Yudi Pawitan1,*

1 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 2 Molecular and Translational Medicine, University of Brescia, Italy

Abstract
Molecular classification of breast cancer into clinically relevant subtypes would help to improve diagnosis and adjuvant-treatment decisions. Given that more and more women are diagnosed with early-stage cancers, better specificity in treatment decisions would save many of them from unnecessary side effects of adjuvant treatment. However, cancer classification is still a big challenge. Rapid improvements in molecular analysis, e.g. the application of next generation sequencing to cancer genomes, have the potential to bring a deeper understanding of the disease as well as new biomarker discoveries. The aim of this study is to use RNA-sequence data to classify breast-cancer patients into known molecular subtypes and gain a deeper understanding of the disease.
RNA-seq data were generated from 329 breast cancer samples obtained from The Cancer Genome Atlas (TCGA) project. Based on gene-level FPKM, a supervised classification of the RNA-seq data was performed using a k-nearest neighbour (k-NN) approach, with Swedish breast-cancer data obtained from a classical microarray platform as training data (n=369). We selected 107 genes providing a highly significant concordance rate (87%) between the supervised k-NN classification and K-means unsupervised clustering within the TCGA samples. Most of the samples were classified as luminal subtypes (luminal A 35% and luminal B 21%), and the rest were basal (15%), ERBB2 (16%) and normal-like (13%) subtypes. Our study shows that we can integrate gene expression from different platforms for molecular subtype discovery. The assigned subtypes can be used later to obtain novel subtype-associated genes based on the RNA-seq data using all genes.
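A cross-platform k-NN classification of this kind can be sketched in Python as below. The abstract does not state how the two platforms were made comparable or which distance was used. This sketch therefore assumes per-gene z-scoring within each platform and Euclidean distance, and all names are hypothetical.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev


def standardize_genes(samples):
    """Z-score each gene (column) across the samples of one platform."""
    columns = list(zip(*samples))
    centers = [mean(col) for col in columns]
    # A constant gene has zero spread; divide by 1 so it maps to 0 everywhere.
    scales = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - c) / s for value, c, s in zip(row, centers, scales)]
        for row in samples
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Assign the majority subtype among the k nearest training samples."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, the microarray training matrix and the RNA-seq matrix would each be passed through `standardize_genes` on the same selected genes before calling `knn_predict` for every RNA-seq sample.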

  TBC-20: Application of pathway descriptors to detect similarities between human diseases

Alexander Ivliev1, Marina Bessarabova1 and Yuri Nikolsky1,*

1 Thomson Reuters, IP & Science. 5901 Priestly Dr., Carlsbad, CA 92008, USA

Abstract
Identification of molecular alterations (biomarkers) shared between distinct diseases is important for understanding the underlying mechanisms and diversity of human pathologies, and for applications such as patient stratification, translational research, precision medicine and drug repositioning. Standard computational approaches for detecting disease-to-disease similarities largely focus on gene-level information, i.e. identification of genes and variants shared by the diseases. Unfortunately, clinically similar diseases or even individual cases of the same disease (e.g. cancer patient samples) can be strikingly different in their gene expression and genetic alteration patterns. Nevertheless, one can detect higher-order similarities between human clinical phenotypes at the level of biological pathways. Detection of disease similarities at the pathway level represents a broad field for data mining that remains largely unexplored by previous studies.

  TBC-21: Development of microarray analysis automation system in Cytoscape plugin

Kyung-Sik Ha1,*, Jin-Muk Lim1 and Hong-Gee Kim1

1 Biomedical Knowledge Engineering Lab, Seoul National University, Korea

Abstract
In this study, we developed a Cytoscape plugin that makes the analysis of microarray data easier for users. The plugin automates the selection of probes with significant values from raw microarray data, using packages provided by R. Using a protein-protein interaction database, it then represents the genes selected by the user and their relationships with other genes as a network. The whole process was developed on the Java platform. The plugin makes microarray data analysis more accessible, and users are expected to be able to draw gene relationship networks easily. This research was supported by MSIP (the Ministry of Science, ICT and Future Planning), Korea, under the IT-CRSP (IT Convergence Research Support Program) (NIPA-2013-H0401-13-1001) supervised by the NIPA (National IT Industry Promotion Agency).

  TBC-22: Plasma metabolites as Alzheimer’s Disease (AD) biomarkers

Petroula Proitsi1,*, Richard Dobson1, Cristina Legido-Quigley2 and John Powell1

1 King's College London, Institute of Psychiatry, United Kingdom 2 King's College London, Institute of Pharmaceutical Science, United Kingdom

Abstract
Introduction: There is a need for a better understanding of the biological mechanisms underlying AD and the identification of biomarkers for early clinical diagnosis, progression and conversion. Metabolites are the final product of interactions between gene expression, protein expression, and the cellular environment and represent a more accurate approximation of the phenotype of an organism and complex biological processes.
Aims: The aim of this project is 1) to characterise the plasma metabolic profiles of AD patients, subjects with mild cognitive impairment (MCI), and controls and to utilize these metabolic profiles in order to identify diagnostic, conversion and progression biomarkers; 2) to integrate metabolic profiles with genetic, transcriptomic and proteomic data in order to improve classification/prediction.
Methods: Ultra Performance Liquid Chromatography/Mass Spectrometry was performed on plasma samples from 35 AD, 43 MCI & 45 controls (MassLynx, Waters). Samples were divided into Train (2/3 sample) and Test (1/3 sample) datasets. Machine learning approaches were used to classify AD, MCI & CTL. The analytes which predicted disease were identified and investigated further.
Results and conclusions: 1878 analytes were extracted and raw values were normalized to the whole area mean. Following removal of analytes with <80% data, transformation and imputation, 573 analytes were analysed. The train dataset was used to tune the parameters of L(1)-L(2)-regularized regression (elastic net) using internal cross-validation, and the model was evaluated on the independent test set. A set of 34 analytes predicted AD with accuracy >75%. Including APOE, the most established AD risk gene, in the model increased accuracy to >83%. Logistic regression analyses showed that some of the individual analytes were associated with AD with p<10^-4. Most analytes were associated with changes in lipid metabolism. The AD-MCI and MCI-CTL classifiers showed lower accuracy (<75%). Data integration using genetic, expression and protein data is expected to improve classifier performance.

  TBC-23: Insight into the Binding Mode Analysis of Combinatorial Cancer Drugs with Cytochrome P450

Dhanusha Yesudhas1,§, Suresh Panneerselvam1,§, Shaherin Basith1 and Sangdun Choi1,*

1 Department of Molecular Science & Technology, Ajou University, Suwon, 443-749, Republic of Korea
§ Equal contribution


Abstract
Combinatorial drug therapy is becoming a promising strategy in the treatment of cancer. However, the patient has an increased risk of suffering from an adverse drug-drug interaction (DDI). A DDI is a situation where one drug inhibits the metabolism of another drug, thereby leading to an increased plasma concentration of either drug. Poor metabolism of the drug molecules by cytochrome P450 is one of the reasons for drug-drug interactions in combinatorial therapy. However, we have limited knowledge about the interaction of drug molecules with cytochrome P450. Hence, we utilized computational docking to predict the drug-binding mode and assessed its stability using molecular dynamics simulation studies. Nine cancer drugs that are used in combinatorial therapy were selected from the National Cancer Institute (NCI) database. Previous studies have shown that the CYP3A4 isoform metabolizes these drug molecules. Therefore, we performed docking of the selected drug molecules with CYP3A4. One hundred docking structures were generated for each drug molecule. Hydrogen bond analyses of the molecular dynamics simulations were used to confirm the selected binding mode of each drug molecule. The predicted binding modes of the drugs were found to correlate well with the available experimental data. These studies will be useful for new drug development and also provide valuable insights into the metabolism of cancer drugs.

  TBC-24: Phasing haplotype of a single individual by evolutionary algorithm

Je-Keun Rhee1, Honglan Li2, Byoung-Tak Zhang1,3, Kyu-Baek Hwang2 and Soo-Yong Shin4,*

1 Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Korea
2 School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Korea
3 School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
4 Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, Korea


Abstract
Although many genetic variations have been identified successfully, haplotype information, i.e. the combination of alleles at adjacent locations on a chromosome, can provide crucial additional knowledge for whole-genome association studies. Previously, haplotypes were inferred from population genotype information. Recently, with the development of high-throughput sequencing (HTS) technologies, approaches for determining the haplotypes of a single individual have drawn attention. Here, we present an evolutionary algorithm that assembles the haplotype of a single individual by combining its sequence reads. Based on heterozygous single nucleotide polymorphisms (SNPs), the haplotype phasing problem can be considered a combinatorial optimization problem, and an evolutionary algorithm can effectively solve this computationally complex problem. We applied the proposed method to real whole-genome sequencing datasets from NA12878. The experimental results show that our proposed approach can practically reconstruct the haplotype.

  TBC-25: Biological network inference for allelic differences between familial Creutzfeldt–Jakob disease (fCJD) patients with E200K and Healthy individuals

Sol Moe Lee1,§, Myungguen Chung2,§, Kyu Jam Hwang1, Young Ran Ju1, Jae Wook Hyeon1, Jun Sun Park1, Chi-Kyeong Kim1, Sangho Choi1, Jeongmin Lee1 and Su Yeon Kim1,*

1 Division of Zoonoses, Center for Immunology and Pathology, National Institute of Health, Korea Centers for Disease Control and Prevention, Cheongwon-gun, Chungcheongbuk-do 363-700, Republic of Korea
2 Division of Bio-Medical Informatics, Center for Genome Science, National Institute of Health, Korea Centers for Disease Control and Prevention, Cheongwon-gun, Chungcheongbuk-do 363-700, Republic of Korea
§ Equal contribution


Abstract
Human prion diseases are caused by an abnormal accumulation of misfolded prion protein in the brain. Inherited prion diseases, including familial Creutzfeldt-Jakob disease (fCJD), are associated with mutations of the prion protein gene (PRNP). The glutamate to lysine substitution at codon 200 (E200K) in PRNP is the most common pathogenic mutation causing fCJD in the world, and a few cases with E200K are reported annually in Korea. The E200K pathogenic mutation alone is not regarded as sufficient to cause prion disease, and unidentified necessary factors have been proposed to explain the penetrance of E200K-dependent fCJD. In our previous study, a total of 19 genes showed significant differences in genotype between fCJD patients with E200K and non-CJD individuals. In this study, these 19 genes were analyzed to identify biological pathways and relationships among the proteins they encode. Protein–protein interactions (PPIs) among the proteins encoded by the 19 genes from the exome sequencing study were identified using the Michigan Molecular Interaction (MiMI) database and the Prion Disease Database (PDDB), and then visualized using Cytoscape v2.8.3. Biological interactions were identified among 8 genes. All of them were linked not by direct interaction, but through 14 intermediate interactors (PLG, TAF1, FRS3, etc.). The interactions identified in the PPI network were related to complement and coagulation cascades, lysine degradation, neurodegenerative diseases, and other pathways. Our results imply a possible co-regulation mechanism and candidate necessary factors for fCJD with E200K. These biological network data can be used for further investigation of the mechanisms of genetic prion diseases.

  TBC-26: A meta-analysis of pharmacogenetic studies of ABC and SLC transporters among cancer patients

Chulbum Park1, Se Mi Lee2, Seong Eun Park3 and Ji-Yeob Choi1,*

1 Seoul National University, Korea
2 Chonnam National University, Korea
3 Duksung Women's University, Korea


Abstract
Membrane transporters can be major determinants of the pharmacokinetic profiles of anticancer drugs. A meta-analysis was conducted to investigate the association of ATP-binding cassette (ABC) and solute carrier (SLC) transporter genetic polymorphisms with pharmacogenetic outcomes in studies available up to January 2012. Eligible studies involved cancer patients, compared genetic variants in the ABC and SLC transporters with information on anticancer drugs, and reported one of the following outcomes: overall survival, progression-free survival, response rate or efficacy, drug toxicity and pharmacokinetic parameters. A total of 158 publications were identified, of which 33 were deemed eligible for inclusion. For efficacy, 6 genes (ABCB1, ABCC1, ABCC2, ABCG2, SLC28A1 and SLC28A2) with 31 polymorphisms were analyzed and no gene was significantly associated with the response. When stratified by cancer site or anticancer drug, ABCB1 variants were associated with a decreased risk of drug resistance among colorectal cancer patients (OR=0.67, 95% CI=0.47~0.96 for 5 reports) and among patients treated with nucleotide analogues (5FU or gemcitabine) (OR=0.67, 95% CI=0.52~0.86 for 11 reports). For toxicity, 5 genes (ABCB1, ABCC2, ABCC4, ABCG2, SLCO1B1) with 17 polymorphisms were analyzed and variants of ABCB1 were significantly associated with drug toxicity overall (OR=0.86, 95% CI=0.74~0.99). Any variants of ABC transporters were also associated with a decreased risk of drug toxicity, especially GI-related toxicity (OR=0.79, 95% CI=0.67~0.94 for 19 reports). For survival, patients with any variants of ABCB1 showed poorer progression-free survival than patients with wild types (HR=1.86, 95% CI=1.06~3.25). For pharmacokinetics, patients homozygous for ABCB1 variants showed higher AUC (SMD=1.75, 95% CI=-0.01~3.51, p=0.051) and lower clearance (SMD=-4.46, 95% CI=-7.05~-1.86, p=0.001) than patients with wild types.
Variants of ABC transporters were significantly associated with improved pharmacogenetic outcomes of anticancer drugs in a meta-analysis of multiple cancer sites.
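The pooling behind summary odds ratios of this kind can be illustrated with a short Python sketch. The abstract does not state which meta-analytic model was used. The sketch assumes fixed-effect inverse-variance pooling on the log scale, with each study's standard error recovered from its 95% confidence interval. All names are hypothetical.

```python
from math import exp, log, sqrt

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    return log(odds_ratio), (log(ci_high) - log(ci_low)) / (2 * Z95)


def pool_fixed_effect(studies):
    """Inverse-variance pooled OR and 95% CI from (OR, low, high) tuples."""
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        log_or, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2  # more precise studies count for more
        total_weight += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return (
        exp(pooled),
        exp(pooled - Z95 * pooled_se),
        exp(pooled + Z95 * pooled_se),
    )
```

Pooling two identical hypothetical studies, each with OR=2.0 and 95% CI 1.0~4.0, leaves the point estimate at 2.0 while narrowing the interval, which is the expected effect of adding evidence.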

  TBC-27: The value of controls in peak calling from ChIP-seq experiments

Fabian Buske1,*, Phillippa Taberlay1 and Susan Clark1

1 Garvan Institute of Medical Research, Australia

Abstract
ChIP-seq is the method of choice for interrogating the DNA occupancy of proteins involved in gene regulation. Reliable assessment of ChIP enrichment (peak calling) requires sequencing of a matched control library (e.g. input DNA) to compensate for biases (copy-number alterations, sequence content, chromatin structure, antibody quality). However, for economic reasons matched control libraries are often sequenced at lower depth than the ChIP-enriched sample or are not sequenced at all. It is therefore important to establish whether sequenced input controls are required for the accurate interpretation of ChIP-seq data and, if so, whether the input data must come from matched control libraries or unmatched input libraries will suffice.
We investigated the effect of input sequencing controls on peak calling by contrasting matched controls with libraries generated from unmatched biological replicates or obtained from the ENCODE project using the same cell lines. We considered the peaks generated from matched controls as the gold standard and assessed the accuracy of unmatched controls from the same cell lines in calling the equivalent enriched regions with at least 50% overlap. We observe for all three interrogated histone marks (H3K9K14ac, H3K4me1, H3K27ac) that high accuracy can be achieved with unmatched input controls, depending on the sequencing depth of the control library (Peakranger accuracy of 0.99 vs 0.97 vs 0.79 using 21, 13 or 5.5 million mapped reads, respectively).
Furthermore, investigating the base pair overlap of the enriched regions, we observe that the algorithm of choice has a greater impact than utilizing an unmatched control (average Jaccard similarity coefficient of 0.24 between Peakranger, Homer and Chromablocks using matched controls and 0.76 between matched versus unmatched control libraries using the same algorithm).
We therefore conclude that it is reasonable to use an unmatched control even from public data if there is high sequencing coverage. This has important ramifications in the processing of ChIP data using different antibodies to interrogate the same cell type.
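The base-pair Jaccard similarity used to compare enriched regions can be sketched in Python as follows. This is an illustration only, not the authors' pipeline. It assumes peaks on a single chromosome given as half-open `(start, end)` intervals, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Number of distinct base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def jaccard_bp(peaks_a, peaks_b):
    """Base-pair Jaccard coefficient: shared bases over bases covered by either set."""
    union = covered_bases(list(peaks_a) + list(peaks_b))
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    intersection = covered_bases(peaks_a) + covered_bases(peaks_b) - union
    return intersection / union
```

Two peak calls covering `(0, 100)` and `(50, 150)` share 50 of 150 covered bases, so their coefficient is one third. A genome-wide comparison would apply the same calculation per chromosome and sum the base counts before dividing.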

  TBC-28: NGSANE - A HPC Processing Framework for Terabyte-scale Sequencing Data

Fabian Buske1,*, Susan Clark1 and Denis Bauer2,*

1 Epigenetics Program, Cancer Research Division, Garvan Institute of Medical Research, Kinghorn Cancer Centre, Darlinghurst City, NSW 2010, Australia 2 Computational Informatics, CSIRO, North Ryde, NSW 2113, Australia

Abstract
The first steps of analysing sequencing data (2GS, NGS) have entered a transitional period in which analysis steps can be automated in standardised pipelines. With constantly evolving technology, academic software will remain the method of choice for cutting-edge data analysis. This makes setting up and maintaining analysis pipelines labour intensive, as most tools do not comply with good software-development practice (e.g. good documentation, legacy support).
Many GUI-enabled tools, like Galaxy, address this issue but are commonly tailored to biologists with only small numbers of experiments. However, with increasing study sizes, the capability of leveraging high performance compute clusters and processing libraries in parallel is paramount.
NGSANE is a lightweight, Linux-based, HPC-enabled framework that minimizes overhead for set-up and processing of new projects yet maintains full flexibility of custom scripting when processing raw sequence data. The framework separates project specific data from commonly used annotation files, scripts and software suites. NGSANE supports Sun-Grid-Engine and Portable-Batch-System job scheduling and can be operated in different modes for development and production thus enabling efficient and flexible processing of NGS data. It currently includes pipelines for adapter trimming, read mapping, peak calling, motif discovery, transcript assembly, variant calling and chromatin conformation analysis by tapping into various

  TBC-29: Evolution of IgE Sensitization Profiles for Timothy Grass and House Dust Mite Allergens

Hans-Joachim Sonntag1, Mattia Prosperi2,*, Iain Buchan2, Angela Simpson2 and Adnan Custovic2

1 University of York, United Kingdom 2 University of Manchester, United Kingdom

Abstract
The study of immune responses to allergens has been revolutionised by the routine availability of component resolved diagnostics (ImmunoCAP ISAC®) that measure the specific IgE response towards many allergen components, including timothy grass and house dust mite. Using latent class analysis on data from the population-based Manchester asthma & allergy study (1,186 children followed up from birth, 899 undergoing ISAC® IgE testing, 235 with full longitudinal information at age 5, 8 and 11 years), we confirmed the hypothesis of a “molecular spreading” pathway for timothy grass allergens, where sensitisation to the lead allergen Phl p 1 precedes a progression towards a full sensitisation to other Phl p components, with serum concentration increasing over time. Conversely, in the case of house dust mite, we found different pathways related to two distinct allergen groups (Der f 1 & Der p 1) and (Der f 2 & Der p 2). Longitudinally, from age 5 to 11 years, we could either observe a co-development trajectory of sensitisation towards these two groups or stabilisation towards a single group over time. Logistic regression was employed to demonstrate that all house dust mite sensitisation trajectories are significantly (0.05 level) associated with an increased risk of asthma at age 11 (odds ratios ranging from 4.5 to 8.6 as compared to the non-sensitisation pathway). Interestingly, there was a significant difference between the Der f/p 1 pathway and the Der f/p 2 pathway as predictors of eczema, with the former having a significant odds ratio of 3.2 [95%CI 1.2–8.0] as compared to the non-sensitisation pathway. Regression analysis with house dust mite exposure from early ages (<2 years) confirmed the association with longitudinal allergen trajectories, while house dust mite concentration at later ages showed no significant association with any of the three house dust mite sensitization trajectories.

  TBC-30: Gene Expression Similarity between Breast and Prostate Cancer

Darius Coelho1,2 and Lee Sael1,2,*

1 Department of Computer Science, State University of New York, Incheon 406840, Korea
2 Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-4400


Abstract
Epidemiologic and phenotypic evidence indicates that breast and prostate cancer have high pathological similarities. Genes that are affected by both breast and prostate cancer are investigated to gain knowledge of the similarity between their pathologies. Gene expression data extracted from RNA-seq experiments for breast invasive carcinoma (BRCA) and prostate adenocarcinoma (PRAD), retrieved from the TCGA database (http://tcga-data.nci.nih.gov/), were analyzed. An iterative SVM-based ensemble gene selection method was used to select genes that discriminate cancer samples from normal samples. Iterative SVM-based gene selection enables correlated gene expressions to be considered simultaneously, and the ensemble approach stabilizes the selection. The selected gene sets achieved classification accuracies of 90% for BRCA and 93% for PRAD. However, only two genes, transglutaminase 4 (TGM4) and complement component 4A (C4A), were common to the BRCA and PRAD gene sets. According to Ingenuity Pathways Analysis, TGM4 has no known specific association with breast or prostate cancer but has a known association with adenocarcinoma in general. C4A likewise has no known association with breast or prostate cancer. However, both genes have direct and/or indirect associations with multiple types of cancer, and their potential as drug targets for both breast and prostate cancer could be explored through guilt by association in pathway analysis. Although this information can be important in itself, the two genes may simply be common genes associated with various types of cancer, so further study is needed to confirm that breast and prostate cancer have high pathological similarity.

  TBC-31: Systematic discovery of disease-associated miRNAs using an integrated network approach

Yukyung Jun1, Kyungsun Choi2, Sanghyuk Lee1,* and Wankyu Kim1,*

1 Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Korea
2 Bio and Brain Engineering, KAIST, Korea


Abstract
miRNAs are thought to be promising diagnostic and therapeutic targets due to their frequent dysregulation in many human diseases. We develop a method that predicts disease-miRNA associations systematically, based on standard gene set analysis (GSA) using an extensive series of gene signatures for 2,078 human diseases and 1,432 miRNAs. More than 30 types of independent evidence are integrated by performing ~24 million GSA comparisons. As a result, a generic disease-miRNA association network is constructed with >40,000 associations between 956 diseases and 772 miRNAs. It includes many human diseases such as rheumatoid arthritis, muscular dystrophy and Parkinson disease as well as various cancers, where miRNAs may have a critical role in pathogenesis.
As a validation of our model, the influence on cell proliferation is tested for ten candidate miRNAs using cell lines of glioblastoma multiforme (GBM), the most malignant form of brain cancer. Five miRNAs (50%) show a significant decrease in proliferation in multiple cell lines (eight miRNAs (80%) in at least one cell line). Also, some of the miRNAs show a significant correlation with proliferation rate, cell morphology and patient survival. It suggests that disease-associated miRNAs can be identified with a reasonable accuracy, overcoming our limited knowledge on miRNA targeting that is a major hurdle in miRNA functional studies. Our disease-miRNA network provides a foundation to elucidate the functional role of miRNAs in a wide range of human diseases.

  TBC-32: FUT8 plays an important role as a glucose metabolic agent in EML4-ALK Fusion NSCLC

Jin-Muk Lim1, Hong-Gee Kim1 and Ju-Hong Jeon2

1 Biomedical Knowledge Engineering Lab, Seoul National University College of Medicine
2 Department of Physiology, Seoul National University College of Medicine


Abstract
The EML4 (echinoderm microtubule-associated protein-like 4)–ALK (anaplastic lymphoma kinase) fusion-type tyrosine kinase is an oncoprotein found in 4 to 5% of non–small-cell lung cancers, and clinical trials of specific inhibitors of ALK for the treatment of such tumors are currently under way. However, patients with these cancers invariably relapse, typically within 1 year, because of the development of drug resistance. Herein, we compare Affymetrix microarray data from an ALK-positive set (n=11) and a triple-negative (EGFR/KRAS/ALK) set (n=68). Machine learning, statistical methods, GO analysis and network analysis are used. FUT8 is identified as a target specific to the ALK-positive set. FUT8 plays an important role as a glucose metabolic agent in this cancer. Our results may help future experimental investigations to understand the signalling process of ALK and FUT8 inhibitor combination therapy in non-small-cell lung cancer. [This research was supported by MSIP (the Ministry of Science, ICT and Future Planning), Korea, under the IT-CRSP (IT Convergence Research Support Program) (NIPA-2013-H0401-13-1001) supervised by the NIPA (National IT Industry Promotion Agency)]

  TBC-33: Landscape of ceRNAs in human genome and their potential role in cancer

Taehyung Kim1,2, Leonardo Salmena5,* and Zhaolei Zhang1,2,3,4,*

1 Department of Computer Science, University of Toronto, Toronto, ON, Canada
2 The Donnelly Centre, University of Toronto, Toronto, ON, Canada
3 Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
4 Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada
5 Princess Margaret Hospital, Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada


Abstract
MicroRNAs are small non-coding RNAs that govern many cellular processes by triggering RNA degradation or by inhibiting translation. Recently, miRNAs have been identified as key components of a novel RNA-RNA crosstalk mechanism, where RNAs influence each other’s expression level by competing for a limited pool of miRNAs. This phenomenon, termed competing endogenous RNA (ceRNA), results in a positive correlation of expression between the competing transcripts. Despite its potential importance in global gene regulation, there have been very few efforts to systematically study the functional relevance of ceRNAs, especially in cancer. Herein, we aim to extend this limited knowledge by identifying novel ceRNA pairs or networks through the use of the latest microRNA target prediction methods and high-throughput sequencing technology. In particular, we are investigating the role of pseudogenes in modulating the expression of their parental genes via the ceRNA mechanism. Pseudogenes, previously dismissed as “junk DNA”, are genomic loci that resemble protein-coding genes but have lost any ability to code for a functional protein. By combining these two ideas, we hypothesize that gene-pseudogene (and gene-gene) regulation in cancer can be achieved through a ceRNA mechanism and that this phenomenon includes, but extends well beyond, the PTEN-PTENP1 paradigm. Here, we have identified a number of pseudogenes and protein-coding genes that have perturbed expression in tumour samples as compared to control tissue specimens. These perturbations will be evaluated for ceRNA potential and for their role in cancer progression. This work will not only demonstrate the existence of novel ceRNAs, but also extend our understanding of the origins and progress of cancer.
A list of confirmed genes and miRNAs contributing as ceRNA networks implicated in cancer may serve as new therapeutic targets, and thereby allow development of a new means to modulate the expression of key cancer genes and to slow down cancer progression.