TBC-1: Recently exonized Alu elements in Macaca fascicularis

 

Young-Hyun Kim1,2, Jae-Won Huh1,2 and Kyu-Tae Chang1,2

 

1National Primate Research Center, Korea Research Institute of Bioscience and Biotechnology, Ochang 363-883, Republic of Korea
2University of Science & Technology, National Primate Research Center, KRIBB, Ochang 363-883, Republic of Korea

 

The crab-eating monkey (Macaca fascicularis) and the rhesus monkey (Macaca mulatta) are frequently used and valuable primate model species. Although they are the most common primate model organisms in biomedical research, genetic information is not yet available for them, except for the rhesus monkey. In this study, we analyzed the genomic diversity of these two closely related Macaca species using recently integrated Alu elements. First, Macaca fascicularis mRNA sequences (10,221 mRNAs) were collected from the GenBank database, and mRNA sequences containing exonized 'young' Alu elements were identified with the RepeatMasker program (216 mRNAs). Second, to avoid false positives (cDNA sequences contaminated with genomic DNA), manual correction was conducted. Third, ten genes were chosen, and eight genes containing a young Alu element were identified. Finally, to verify the exonized young Alu elements, PCR amplification and sequencing were conducted using various human and primate DNA samples. Intriguingly, two genes (C9orf6 and NOLC1) harbor an insertionally polymorphic Alu element in their transcripts. Although we did not use whole-genome information for Macaca fascicularis, a genome-wide survey could be a useful tool for understanding this valuable primate model organism.

 

 

 

TBC-2: Genome diversification mechanism between human and chimpanzee 

 

Jae-Won Huh 1,2 and Kyu-Tae Chang1,2

 

1National Primate Research Center, Korea Research Institute of Bioscience and Biotechnology, Ochang 363-883, Republic of Korea
2University of Science & Technology, National Primate Research Center, KRIBB, Ochang 363-883, Republic of Korea

 

The chimpanzee is the closest living relative of humans. The human and chimpanzee genome projects showed that the genomes of the two species differ by only about 1%. Thus, comparing gene sequences between the two species could reveal the genetic components related to lineage-specific events. We compared and investigated gene regions between human and chimpanzee using bioinformatic and experimental tools. An in silico comparison was performed between the human and chimpanzee genomes. Among the 65,248 insertion-deletion (INDEL) loci, 285 gene regions were identified, and 130 gene regions were experimentally validated. Although 48 gene loci did not show any genetic differences, 32 gene loci showed lineage-specific INDEL events (insertion in human: 12 genes; deletion in chimpanzee: 20 genes). These INDEL events were categorized into five different evolutionary mechanisms: retroelement-related events (12 genes), homologous recombination and excision (12 genes), tandem repeat variation (5 genes), gene conversion (2 genes), and processed pseudogene formation (1 gene). These results suggest that not only simple integration events but also recombination-mediated deletions contribute to lineage-specific evolutionary events in the human and chimpanzee lineages.

 

 

 

TBC-3: Transcriptome sequencing and gene analyses in the crab-eating macaque

 

Kyu-Tae Chang1,2

1National Primate Research Center, Korea Research Institute of Bioscience and Biotechnology, Ochang 363-883, Republic of Korea
2University of Science & Technology, National Primate Research Center, KRIBB, Ochang 363-883, Republic of Korea

 

As a human mimic, the crab-eating macaque (Macaca fascicularis) is an invaluable non-human primate model for biomedical research, but the lack of genetic information on this primate has represented a significant obstacle for its broader use. Here, we sequenced the transcriptome of 16 tissues and identified genes to resolve the main obstacles for understanding the biological response of the crab-eating macaque. From 4 million reads with 1.4 billion base sequences, 31,786 isotigs containing genes similar to those of humans, 12,672 novel isotigs, and 348,160 singletons were identified using the GS FLX sequencing method. Approximately 86% of human genes were represented among the genes sequenced in this study. Additionally, 175 tissue-specific genes were identified, 81 of which were experimentally validated. In total, 4,314 alternative splicing (AS) events were identified and analyzed. Intriguingly, 10.4% of AS events were associated with transposable element (TE) insertions. Finally, investigation of TE exonization events and evolutionary analysis were conducted, revealing interesting phenomena of human-specific amplified trends in TE exonization events. This report represents the first large-scale transcriptome sequencing and genetic analyses of M. fascicularis and could contribute to its utility for biomedical research and basic biology.

 

 

TBC-4: Recent Positive Selection in Human Genes That are Enriched for Disease Mutations, but Limited for Polymorphism

 

Yoon-Ho Hong1, Malcolm Campbell2, Kyungjoon Lee2, In-Hee Lee2 and Sek-Won Kong2

 

1Department of Neurology, Seoul National University Boramae Municipal Hospital, Korea

2Children's Hospital Informatics Program, Boston Children's Hospital, USA

 

Examining the near-full spectrum of genetic variation across the whole human genome is now possible with the advances of high-throughput sequencing technology. This enables population scale analysis of sequence variations, which provides an opportunity to explore characteristics of human disease genes and mutations in the context of molecular evolution. Here, using the whole genome sequence data of 37 putatively healthy unrelated individuals, we investigated the effects of natural selection in shaping the frequency spectrum of genetic polymorphism and disease mutations. We found that a quantitative estimate of evolutionary constraints is significantly higher in genes with lower frequency of polymorphic coding variants. The correlation between polymorphism and natural selection is also supported by 1) population and comparative analyses at the gene level, which revealed a significantly greater spectrum of single nucleotide polymorphisms (SNPs) in genes under positive selection, and 2) analysis in the context of human disease and gene essentiality, which confirmed the limited spectrum of polymorphism in disease genes with greater essentiality. Interestingly, the signature of recent or ongoing positive selection was consistently found in a subset of disease genes that are limited for polymorphism but enriched for disease-linked mutations. This suggests that recent adaptive selection might have acted on evolutionarily conserved genes, increasing the spectrum of disease-linked mutations.

 

 

TBC-5: Gene expression changes as resistant markers to cisplatin in a panel of bladder cancer cell lines

 

Sung Han Kim1 and Seok Soo Byun2

 

1Seoul National University Hospital, Republic of Korea

2Seoul National University Bundang Hospital, Republic of Korea

 

BACKGROUND: Cisplatin is one of the most effective anticancer drugs for bladder cancer, but resistance develops during treatment through a cellular self-defense system that activates or silences a variety of genes, resulting in genetic and epigenetic alterations. As a result, the mechanism of cisplatin resistance is one of the most investigated subjects in clinical fields. To understand the resistance mechanism and to establish possible candidate genes, a panel of cisplatin-resistant and non-resistant bladder cancer cell lines was profiled with a combination of microarray and real-time PCR to investigate gene expression potentially related to cisplatin resistance.

METHOD: The human bladder cancer cell line T24, obtained from the American Type Culture Collection (ATCC), and a previously established cisplatin-resistant bladder cancer cell line (T24R2, resistant at 2.0 µg/ml of cisplatin) were used for microarray analysis to identify genes differentially expressed in cisplatin resistance. The significantly up-regulated genes were then compared, using real-time PCR, with a tissue assay of bladder cancer resistant to cisplatin chemotherapy. A fold change ≥ 2 with a p-value < 0.05 was considered significant.
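
The significance criterion above can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the function name, the input layout (gene mapped to a fold change and a p-value), and the example gene names and values are all hypothetical and chosen only to show how the two thresholds combine.

```python
def select_upregulated(results, min_fold_change=2.0, max_p_value=0.05):
    """Return genes with fold change >= min_fold_change and p-value < max_p_value.

    `results` maps a gene name to a (fold_change, p_value) tuple.
    The layout and the gene names below are illustrative assumptions.
    """
    return [
        gene
        for gene, (fold_change, p_value) in results.items()
        if fold_change >= min_fold_change and p_value < max_p_value
    ]


# Hypothetical values, not data from the study.
example = {
    "GENE_A": (4.2, 0.001),   # passes both thresholds
    "GENE_B": (1.5, 0.010),   # fold change too small
    "GENE_C": (2.0, 0.200),   # p-value too large
    "GENE_D": (2.0, 0.049),   # passes: the fold-change threshold is inclusive
}
selected = select_upregulated(example)
```

Note that the fold-change threshold is inclusive (≥ 2) while the p-value threshold is strict (< 0.05), following the wording of the criterion.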

RESULTS: Among a list of 488 up-regulated genes and 69 pathways from the microarray analysis, a panel of 23 genes was selected for real-time PCR validation from four selected cancer-related pathways (p53, apoptosis, cell cycle, and pathways in cancer). All 23 genes were found to be significantly up-regulated in both the microarray and the real-time PCR analyses, with a fold change > 2.0. The genes are PRKAR2A and PRKAR2B, CYCS, Bcl-2, BIRC3, DFFB, CASP6, CDK6, CCNE1, CUL2, FN1, STEAP3, MCM7, ORC2 and ORC5, LEF1, ANAPC1 and ANAPC7, CDC7 and CDC27, SKP1, and WNT5A and WNT5B. In particular, the fold changes of CUL2, MCM7, WNT5A and WNT5B, LEF1, Bcl-2, CYCS, and PRKAR2B were greater than 4.0, suggesting a high correlation with cisplatin resistance.

CONCLUSIONS: A panel of 23 up-regulated genes, including the 5 genes with greater fold changes, was found to be significantly differentially expressed between cisplatin-resistant and non-resistant bladder cancer cell lines. We propose that their gene expression profiles may play a key role in the mechanism of cisplatin resistance in patients with bladder cancer.

 

 

TBC-6: GlaI-qPCR assay — a new instrument for quantitative DNA methylation analysis and its application for tumor suppressor genes study

 

Vitaliy Kuznetsov1, Elena Zemlyanskaya1 and Sergey Degtyarev1

 

1SibEnzyme Ltd., Novosibirsk, Russia, 630117

 

De novo DNA methylation in mammals is performed by the DNA methyltransferases Dnmt3a and Dnmt3b, which recognize the tetranucleotide 5'-RCGY-3' and modify the inner CG dinucleotide to form 5'-R(5mC)GY-3'/3'-YG(5mC)R-5' [1].

GlaI is a novel methyl-directed site-specific DNA endonuclease that recognizes the DNA sequence 5'-R(5mC)↓GY-3' and cleaves it as indicated by the arrow [2]. Thus, the recognition sequence of GlaI corresponds exactly to the product of DNA methylation by Dnmt3a and Dnmt3b. GlaI cleaves DNA completely and requires no additional cofactors [3]. We recently developed the GlaI-PCR assay, which allows detection of 5'-R(5mC)GY-3' sites in a DNA region under study [4]. The method includes DNA hydrolysis with GlaI followed by PCR with primers designed for the DNA region of interest. Earlier, we used the GlaI-PCR assay to determine the DNA methylation status of regulatory regions of tumor suppressor genes (TSGs) [5]. In this work, we performed a real-time GlaI-PCR assay (GlaI-qPCR) for quantitative determination of 5'-R(5mC)GY-3' sites in the DNA regions under study. The assay was applied to study DNA methylation in the regulatory regions of the RARB, NOTCH1, DAPK1, SEPT9b, IGFBP3, CEBPD, MGMT and RASSF1A TSGs in the malignant cell lines HeLa, Raji, U-937 and Jurkat and in the control fibroblast cell line L-68. We obtained methylation profiles of these genes for each cell line. In agreement with previous data, the regulatory regions of TSGs are methylated in malignant cell lines. However, the methylation profiles differ between cell lines, which allows different types of cancer cells to be distinguished. The results show that the GlaI-qPCR assay may be used for quantitative determination of de novo DNA methylation.
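
As a small illustration of the recognition motif, the following Python sketch lists candidate 5'-RCGY-3' positions (R = A or G, Y = C or T) in a plain DNA string. It is only an assumption-laden aid for primer or region selection: a sequence string carries no methylation information, so these are potential GlaI sites, not cleaved sites, and the function name and example sequence are hypothetical.

```python
import re

# R = purine (A or G), Y = pyrimidine (C or T).
# The lookahead allows overlapping motifs to be reported as well.
RCGY_PATTERN = re.compile(r"(?=([AG]CG[CT]))")


def find_rcgy_sites(sequence):
    """Return 0-based start positions of every RCGY motif in `sequence`.

    Only the sequence context is checked; whether the inner cytosine is
    methylated (and therefore cleavable by GlaI) cannot be read from a string.
    """
    return [match.start() for match in RCGY_PATTERN.finditer(sequence.upper())]


# Hypothetical example sequence.
sites = find_rcgy_sites("AACGTGCGC")
```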

 

References

1. Handa V, and Jeltsch A. J. Mol. Biol. 2005; 348, 1103-1112.

2. Tarasova GV et al. BMC Mol. Biol. 2008; 9, 7.

3. Abdurashitov MA et al. BMC Genomics, 2009; 10, 322.

4. SE Scientific Library [http://science.sibenzyme.com/article12_article_53_1.phtml]

5. SE Scientific Library [http://science.sibenzyme.com/article8_article_58_1.phtml]

 

 

TBC-7: A Filtering Algorithm for Gene-Gene Interaction using Case-Only Data

 

Pin-Cian Wang1, Liang-Chuan Lai2, Mong-Hsun Tsai3, Eric Y. Chuang4, Cheng-Yan Kao1 and Pei-Chun Chen5

 

1Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taiwan  

2Graduate Institute of Physiology, National Taiwan University, Taiwan  

3Graduate Institute of Biotechnology, National Taiwan University, Taiwan  

4Bioinformatics and Biostatistics Core, Research Center for Medical Excellence, National Taiwan University, Taiwan  

5Department of Statistics and Informatics Science, Providence University, Taiwan

 

Genome-wide association studies (GWAS) are a typical study design in genetic epidemiology using whole-genome SNP data. Most GWAS use single-locus tests. However, some researchers have pointed out problems with the single-locus strategy, and gene-gene interaction has become an increasingly important issue. Exhaustive search methods such as multifactor dimensionality reduction (MDR) are powerful tools for detecting gene-gene interactions, but the main limitation of MDR is its heavy computation. Therefore, the aim of our research was to design a filtering algorithm, called the deviance of independence (DOI), that selects a candidate SNP set for further analysis, saving computation time while yielding the same predictions.

DOI describes the level of dependence between two SNPs. In the first step of the DOI calculation, the SNP data from control samples were removed, because it was hypothesized that allele and genotype frequencies are stable in the normal population. Next, the expected and observed frequencies of each two-SNP genotype combination were calculated. The expected frequency of a two-SNP combination was derived from the frequencies of the two individual SNPs according to the principle of independence. Finally, the DOI value was calculated as the sum of the absolute differences between the expected and observed frequencies over all two-SNP combinations. SNP combinations with high DOI values are expected to be more likely to be interacting combinations.
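
The calculation described above can be sketched in Python as follows. This is a minimal reading of the description, not the authors' implementation: the genotype coding (0, 1, 2), the function name, and the example data are assumptions, and the input is taken to contain case samples only.

```python
from collections import Counter
from itertools import product


def doi(snp_a, snp_b, genotypes=(0, 1, 2)):
    """Deviance of independence between two SNPs in case-only data.

    `snp_a` and `snp_b` are equal-length genotype lists for the same cases.
    Returns the sum over all genotype pairs of |observed - expected|, where
    the expected joint frequency is the product of the marginal frequencies.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("genotype lists must be non-empty and of equal length")

    n = len(snp_a)
    freq_a = Counter(snp_a)            # marginal genotype counts for SNP A
    freq_b = Counter(snp_b)            # marginal genotype counts for SNP B
    joint = Counter(zip(snp_a, snp_b))  # observed two-SNP combination counts

    total = 0.0
    for ga, gb in product(genotypes, repeat=2):
        expected = (freq_a[ga] / n) * (freq_b[gb] / n)
        observed = joint[(ga, gb)] / n
        total += abs(observed - expected)
    return total


# Hypothetical toy data: independent vs. perfectly dependent genotypes.
doi_independent = doi([0, 0, 1, 1], [0, 1, 0, 1])
doi_dependent = doi([0, 0, 1, 1], [0, 0, 1, 1])
```

In the toy data, the independent pair gives a DOI of 0 and the perfectly dependent pair gives 1.0, so ranking SNP pairs by DOI and keeping the top ones is one way the value could serve as a filter before an exhaustive method such as MDR.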

We used simulated and real data to examine DOI performance. The simulation results show that DOI values may be used to predict interacting combinations. In addition, the WTCCC rheumatoid arthritis (RA) chromosome 22 data and Parkinson's disease (PD) chromosome 20 data were used as real-data applications. The results demonstrate that potential interactions can be identified after using the DOI value as a filter criterion. In sum, the DOI algorithm is a powerful tool for filtering a candidate gene set for further interaction analysis.

 

 

TBC-8: 20-gene-based risk score classifier predicts disease recurrence in non-muscle invasive bladder cancer

 

Seon-Kyu Kim1, Young-Kyu Park1 and Seon-Young Kim1

 

1Medical Genomics Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea

 

Background

Bladder cancer is a genetic disorder driven by the progressive accumulation of multiple genetic changes. While several molecular markers for the recurrence of bladder cancer have been studied, the limited value of current prognostic markers has created the need for new molecular indicators of bladder cancer outcomes. Here, we sought to identify a molecular signature associated with disease recurrence in non-muscle invasive bladder cancer (NMIBC) and to assess its usefulness as a prognostic indicator.

Methods

Microarray gene expression profiling was performed on 102 primary NMIBC specimens (Korean cohort) to identify a gene expression signature associated with disease recurrence. The prognostic value of the gene expression signature was validated in an independent cohort (European cohort, n=302). A risk score based on the expression data of 20 genes was developed in the Korean cohort and validated in the European cohort. The association between the 20-gene-based risk score and the prognosis of NMIBC patients was assessed using Kaplan-Meier plots, the log-rank test, the Cox proportional hazards model, and leave-one-out cross-validation.

Results

The determination of gene expression patterns by microarray data analysis identified 822 genes associated with disease recurrence. Of these, 20 genes highly associated with recurrence-free survival were selected by time-dependent ROC analysis. The risk score was developed using the Cox coefficients of the 20 genes in the Korean cohort, and its robustness was validated in the European cohort (log-rank test, P < 0.001). Multivariate Cox regression analysis revealed that the risk score was a strong independent predictor of disease recurrence (hazard ratio = 6.082, 95% confidence interval = 3.280 to 11.279, P < 0.001).
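
A risk score built from Cox coefficients is commonly a weighted sum of expression values, and the sketch below assumes that form; the abstract does not state the exact formula, so this is an illustration rather than the authors' method. The gene names, coefficients, and expression values are hypothetical.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum of (Cox coefficient x expression) over signature genes.

    `expression` maps gene -> expression value for one patient.
    `coefficients` maps gene -> Cox regression coefficient from the training cohort.
    The weighted-sum form is an assumption, not taken from the abstract.
    """
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"expression missing for genes: {sorted(missing)}")
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Hypothetical two-gene signature, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}
patient = {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 9.9}  # extra genes are ignored
score = risk_score(patient, coefs)
```

Patients could then be split into high-risk and low-risk groups at a cutoff fixed in the training cohort and carried unchanged into the validation cohort; the choice of cutoff is likewise not specified here.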

Conclusions

The risk scoring method based on 20 genes represents a promising diagnostic tool to identify NMIBC patients that have a high risk of recurrence.

 

 

TBC-9: Genome-wide analysis of CNV and SNP in Koreans  

 

Sanghoon Moon1, Kwang Su Jung2, Young Jin Kim1, Miyeong Hwang1, Kyungsook Han4, Bok-Ghee Han3, Jong-Young Lee1, Kiejung Park2 and Bong-Jo Kim1

 

1Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951, Korea
2Division of Bio-Medical informatics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951, Korea
3Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951, Korea
4School of Computer Science and Engineering, Inha University, Inchon, 402-751, Korea

 

To date, single-marker association analysis in genome-wide association studies (GWAS) has identified a large number of single nucleotide polymorphisms (SNPs) that are highly associated with complex diseases, but only a small portion of genetic heritability is explained by these variants. A copy number variation (CNV) is a physical change of genomic segment ranging from a kilobase to several megabases. CNV may alter disease susceptibility and gene dosage for genetic risk, so is a useful source for finding missing heritability.

Recent studies have reported that 60% of the detected CNVs were called with a single copy-number class, which cannot be tested for association and that well-defined polymorphic CNVs tagged by SNPs are more likely to affect multiple expression traits than frequency-matched variants. CNVs encompassing single genes or a set of genes can be more causative variants of genetic disease than SNPs alone. Therefore, SNPs correlated with CNVs are a valuable resource for GWAS.

Most CNV databases (except SCAN) do not consider polymorphic CNVs (multiple copy-number classes). Moreover, the SCAN database contains CNV data for Caucasian and Yoruba populations and does not provide Asian CNV data. Because CNVs differ between distinct ethnic groups, providing polymorphic CNVs and the allele frequency of each genotype in Asian populations will help investigate CNV associations with diseases and ethnic differences.

In this study we developed a database called Korean Genomic Variant Database (KGVDB), which provides polymorphic CNV regions and well-tagged SNP information. The data were obtained from 4,700 individuals using two different genotyping platforms and publicly available CNV data. The large data set of KGVDB will provide a rich public resource for the study of CNV and SNP.

 

 

TBC-10: 3D-QSAR Pharmacophore Modeling of Thromboxane A2 Receptor for Discovery of New Inhibitors

 

Kuei-Chung Shih1, Cheng-Yu Ma1, Hsiao-Chieh Chi1 and Chuan-Yi Tang1,2

 

1Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 30013, R.O.C.
2Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan 43301, R.O.C.

 

Thromboxane A2 (TXA2) is a hormone derived from arachidonic acid (AA) through cyclooxygenases (COX) and thromboxane synthase (TXS); it activates the thromboxane A2 receptor (TP) to induce platelet aggregation and cell proliferation. Through its platelet-activating action, TXA2 is associated with thrombosis, acute myocardial infarction, and many diverse inflammatory diseases. There are several approaches to antiplatelet therapy through this prostanoid pathway. One strategy is to inhibit COX so that TXA2 cannot be produced from AA, as the best-known antiplatelet drug, aspirin, does. Although aspirin can protect against myocardial infarction and stroke, it may lead to gastrointestinal disorders and allergy. Another strategy is to inhibit TXS so that it cannot generate TXA2, but this does not work efficiently because other endoperoxides and isoprostanes can also activate TP just as TXA2 does. Accordingly, directly inhibiting TP appears attractive. However, TP antagonists, including ifetroban, sulotroban, GR32191, and other antithrombotic agents, remain in phase II or III of clinical development because of safety and efficacy concerns. Because previous studies have not provided a co-complex structure of TP with TXA2 or any of its inhibitors, it is necessary to establish a screening model for rational drug design in silico. Our research focuses on building a TP pharmacophore hypothesis for discovering other potential TP inhibitors. In this study, we developed pharmacophore hypotheses for the discovery of new TP inhibitors. The best hypothesis has one hydrogen-bond acceptor (A) and three hydrophobic aromatic groups (HYAR); its correlation coefficients for the training set and the testing set were 0.933 and 0.923, respectively.
According to statistical validation and chemical feature analysis, our best pharmacophore hypothesis has an excellent ability to help medicinal chemists identify or design new TP inhibitors.

 

 

TBC-11: Comparison of somatic mutation-calling methods based on DNA sequence from matched tumor-normal pairs

 

Su Yeon Kim1 and Terry Speed1,2

 

1University of California at Berkeley, Berkeley 94720, USA
2Walter and Eliza Hall Institute of Medical Research, Parkville Victoria 3052, Australia

 

Somatic mutation calling based on DNA from matched tumor-normal patient samples is one of the key tasks carried out by many cancer genome projects. In particular, The Cancer Genome Atlas (TCGA) is now routinely compiling catalogs of somatic mutations for hundreds of patients across various tumor types. Nonetheless, mutation calling is still a very challenging problem. TCGA benchmark studies reveal that even up-to-date mutation callers from major sequencing centers show substantial discrepancies. For most tumor types, validation data are not yet available, and even when they are, only a fraction of all candidate mutations is likely to be validated. To compare mutation callers without genome-wide gold-standard validation data, we have developed an approach using pseudo-positives (presumed somatic mutations) and pseudo-negatives (presumed non-somatic sites) that are defined using another caller. The other callers can be built using publicly available variant-calling methods such as GATK or SAMtools. This approach provides a convenient visualization of the discrepancies between the different mutation call sets and summarizes each mutation caller's performance in terms of pseudo-false-positive and pseudo-false-negative rates. Some insights were gained from observing consistent results from two other callers that are not expected to introduce the same biases.
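
One way to read the pseudo-rate summary is sketched below in Python. The exact definitions are not given in the abstract, so the formulas here are assumptions: the pseudo-false-negative rate is taken as the fraction of pseudo-positives the evaluated caller misses, and the pseudo-false-positive rate as the fraction of pseudo-negatives it calls. Function names and site identifiers are hypothetical.

```python
def pseudo_error_rates(calls, pseudo_positives, pseudo_negatives):
    """Summarize a caller against reference sets defined by another caller.

    calls            -- set of sites called somatic by the caller under evaluation
    pseudo_positives -- set of sites presumed somatic (from the other caller)
    pseudo_negatives -- set of sites presumed not somatic (from the other caller)

    Returns (pseudo_false_positive_rate, pseudo_false_negative_rate).
    These definitions are an assumed reading of the approach.
    """
    if not pseudo_positives or not pseudo_negatives:
        raise ValueError("both reference sets must be non-empty")

    missed = pseudo_positives - calls     # presumed somatic, but not called
    spurious = pseudo_negatives & calls   # presumed non-somatic, but called

    pseudo_fn_rate = len(missed) / len(pseudo_positives)
    pseudo_fp_rate = len(spurious) / len(pseudo_negatives)
    return pseudo_fp_rate, pseudo_fn_rate


# Hypothetical sites identified by (chromosome, position).
positives = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 75)}
negatives = {("chr1", 300), ("chr2", 60), ("chr2", 70), ("chr4", 10), ("chr5", 5)}
caller_calls = {("chr1", 100), ("chr1", 200), ("chr1", 300)}
fp_rate, fn_rate = pseudo_error_rates(caller_calls, positives, negatives)
```

Repeating the computation with reference sets from a second, independently built caller gives a check on whether the ranking of callers depends on the choice of reference.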

 

 

TBC-12: The estimation of heritability analyses for BMI using genotype score based on Korean Cohort

 

Nam Hee Kim1, Youngdoe Kim1, Young Jin Kim1, Ji Hee Oh1, Mee Hee Lee1 and Juyoung Lee1

 

Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Korea Centers for Disease Control and Prevention, Korea

 

The aim of this study was to estimate the variation and heritability of BMI, including a genotype score, and to compare BMI between cohorts. We constructed community-based and twin-family-based cohorts as ongoing prospective studies; the surveyed samples were drawn from the Korean Genome and Epidemiology Study and the Korea Genome Analysis Project in Korea.

We selected 2,473 subjects from the twin-family cohort, surveyed their zygosity using self-report questionnaires of about 2,000 items, and genotyped them using the Affymetrix 6.0 array. From the community-based cohort (KARE; Korea Association REsource), we selected 8,842 subjects, surveyed them using self-report questionnaires of about 1,400 items, and genotyped them using the Affymetrix 5.0 array. The heritability of BMI, including the BMI genotype score, was estimated using SOLAR, GCTA, and GenABEL.

 

 

TBC-13: Genotype instability during long-term subculture of lymphoblastoid cell lines   

 

Ji Hee Oh1, Young Jin Kim1, Sanghoon Moon1, Jong-Young Lee1 and Yoon Shin Cho1,2

 

1Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do 363-951, Republic of Korea
2Department of Biomedical Science, Hallym University, 1 Hallymdaehak-gil, Chuncheon, Gangwon-do 200-702, Republic of Korea

 

Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines (LCLs) promise to address the challenge posed by the limited availability of primary cells needed as a source of genomic DNA for genetic studies. However, the genetic stability of LCLs following prolonged culture has never been rigorously investigated. To evaluate genotypic errors caused by EBV integration into human chromosomes, we isolated genomic DNA from human peripheral blood mononuclear cells and LCLs collected from 20 individuals and genotyped the DNA samples using the Affymetrix 500K SNP array set. Genotype concordance measurements between two sources of DNA from the same individual indicated that genotypic discordance is negligible in early-passage LCLs (less than 41 passages) but substantial in late-passage LCLs (more than 40 passages). Analysis of concordance on a chromosome-by-chromosome basis identified genomic regions with a high frequency of genotypic errors resulting from the loss of heterozygosity observed in late-passage LCLs. Our findings suggest that, whereas LCLs harvested during early stages of propagation are a reliable source of genomic DNA for genetic studies, investigations that involve genotyping of the entire genome should not use DNA from late-passage LCLs.
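
The concordance measurement between two DNA sources can be illustrated with a short Python sketch. It is a simplified assumption of how such a comparison might be set up, not the procedure used in the study: genotypes are given as strings in matching SNP order, no-calls are represented by None and skipped, and the example calls are hypothetical.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of SNPs with identical genotype calls in two DNA sources.

    `calls_a` and `calls_b` list genotype calls (e.g. "AA", "AB", "BB") for the
    same SNPs in the same order. SNPs with a no-call (None) in either source
    are excluded from the comparison; that exclusion is an assumption.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call lists must cover the same SNPs")

    compared = 0
    matched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # skip SNPs that failed genotyping in either source
        compared += 1
        if a == b:
            matched += 1

    if compared == 0:
        raise ValueError("no SNPs with calls in both sources")
    return matched / compared


# Hypothetical calls from blood cells and from an LCL of the same individual.
pbmc_calls = ["AA", "AB", "BB", None, "AB"]
lcl_calls = ["AA", "AA", "BB", "AB", "AB"]   # second SNP shows AB -> AA
concordance = genotype_concordance(pbmc_calls, lcl_calls)
```

A heterozygous call in blood that becomes homozygous in the LCL, as at the second SNP here, is the kind of discordance a loss of heterozygosity would produce; computing the same fraction per chromosome would localize it.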

 

 

TBC-14: Multi-study integration of brain cancer transcriptomes reveals organ-level diagnostic signatures  

 

Jaeyun Sung1, Pan-Jun Kim1, Leroy Hood2, Donald Geman3 and Nathan Price2

 

1Asia Pacific Center for Theoretical Physics, Korea

2United States Institute for Systems Biology, USA  

3Institute for Computational Medicine, Department of Applied Mathematics and Statistics, Johns Hopkins University, USA

 

The identification of molecular signatures from either tissues or blood to accurately reflect the major cancers of an organ system would be a significant advance in molecular cancer diagnostics. Towards this goal, we identified comprehensive diagnostic signatures of major cancers of the human brain from a multi-study, integrated transcriptomic dataset. These signatures are based on comparing ranked expression values of gene-pair sets, which are aggregated into a brain cancer marker-panel of 44 unique genes. Many of these genes have established relevance to the brain cancers tested herein, with others having known roles in cancer biology. Phenotype prediction follows a diagnostic hierarchy, and the corresponding hierarchically-structured signatures achieved 90% classification accuracy against a multi-disease alternative hypothesis when training and validation sets were drawn from the same population distribution (cross validation). Despite accurately distinguishing among phenotypes in single-population cross-validation, diagnostic signatures must remain robust even across more heterogeneous populations to justify their broad clinical use. To address this issue, we found that sufficient dataset integration across multiple studies greatly enhanced reproducibility and accuracy in diagnostic performance on truly independent validation sets, whereas signatures learned from one dataset typically had high error on independent validation sets. Looking forward, we discuss our approach in the context of improving blood diagnostics for cancers of organ systems.
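
Signatures based on relative expression within gene pairs are often implemented as a majority vote over pairs, in the style of top-scoring-pair classifiers. The Python sketch below assumes that form for illustration only; the abstract does not specify the decision rule, and the gene names and values are hypothetical.

```python
def classify_by_gene_pairs(sample, gene_pairs):
    """Majority vote over gene pairs using only within-sample expression order.

    `sample` maps gene -> expression value for one specimen.
    Each pair (a, b) votes for the phenotype when expression of a is lower
    than expression of b. Returns True when more than half of the pairs vote
    for the phenotype. The voting rule is an assumed, simplified scheme.
    """
    votes = sum(1 for a, b in gene_pairs if sample[a] < sample[b])
    return votes * 2 > len(gene_pairs)


# Hypothetical three-pair signature and one specimen.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
specimen = {"G1": 1.0, "G2": 2.0, "G3": 5.0, "G4": 3.0, "G5": 0.1, "G6": 0.2}
prediction = classify_by_gene_pairs(specimen, pairs)
```

Because only the ordering of two genes within one specimen matters, a rule of this kind is unaffected by monotone per-sample normalization differences, which is one reason it can combine data from multiple studies.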

 

 

TBC-15: Methyl-directed Site-specific DNA Endonuclease MteI is a New Instrument for Analysis of CpG Island Methylation

 

Vasilina A. Sokolova1, Valery A. Chernukhin1, Danila A. Gonchar1, Elena V. Kileva1, Larisa N. Golikova1, Vladimir S. Dedkov1, Natalya A. Mikhnenkova1, Elena V. Zemlyanskaya1, Vitaliy V. Kuznetsov1 and Sergey Kh. Degtyarev1

 

1SibEnzyme Ltd., Novosibirsk, Russia 630117

 

Methyl-directed (MD) DNA endonucleases specifically cleave short methylated DNA sequences and do not cut unmethylated DNA. The biochemical properties of MD endonucleases are similar to those of restriction enzymes: both types of enzymes require only Mg2+ ions as a cofactor. To date, more than ten MD DNA endonucleases recognizing different sites containing 5-methylcytosine have been discovered and characterized [1]. Among them, the MD DNA endonucleases BlsI, BisI, PkrI and GluI share the recognition site 5'-GCNGC-3', but the activity of these enzymes depends on the number and position of 5-methylcytosines in the recognition sequence.

A new methyl-directed site-specific DNA endonuclease, MteI, was isolated from Microbacterium testaceum. MteI recognizes an extended methylated DNA sequence nine bases in length with a central pentanucleotide 5'-GCNGC-3'. MteI activity depends on the number of 5-methylcytosines and their positions in the recognition site. MteI cleaves the DNA sequence 5'-G(5mC)G(5mC)^NG(5mC)GC-3'/3'-CG(5mC)GN^(5mC)G(5mC)G-5' at the positions marked by carets (^). The enzyme activity is significantly higher if the 5'-GC-3' dinucleotides in this site are replaced by 5'-G(5mC)-3' dinucleotides and additional 5'-G(5mC)-3' dinucleotides are present in both DNA strands.

We have developed an MteI-PCR assay that allows methylated CpG islands to be identified. The method includes DNA hydrolysis with MteI followed by PCR with primers designed for the DNA region of interest. The MteI-PCR assay has been applied to study the methylation of CpG islands located in regulatory regions of tumor suppressor genes and revealed different patterns of DNA methylation.

1. http://mebase.sibenzyme.com/md-endonucleases

 

 

TBC-16: Nonunique SNP problems in association study 

 

Lyong Heo1, Young Jin Kim1, Sanghoon Moon1 and Jong-Young Lee1

 

1Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong, Korea

 

In recent years, genome-wide association studies (GWAS) have successfully identified numerous phenotype-associated SNPs. In GWAS, a SNP serves as a marker indicating a specific genomic region. The chromosomal position of each SNP is well annotated in the NCBI dbSNP database. In dbSNP, however, annotation errors have been reported, such as SNPs with multiple positions, position changes, and chromosome changes. Doron and Shweiki reported that 4.2-11.9% of HapMap SNPs mapped to nonunique genomic regions. Since a marker is valid only if it maps to a unique region, SNPs mapped to nonunique regions are not adequate for association analysis. In this study, we analyzed nonunique SNPs in two builds of the dbSNP database, b130 (hg18) and b135 (hg19). Nonunique rsIDs accounted for 3.46% and 2.26% of b130 and b135, respectively. In addition, position changes due to dbSNP build updates affected 0.39% of b130 and 0.13% of b135. We queried the GWAS catalog to study the effect of nonunique SNPs. As of August 2012, the GWAS catalog included 1,355 publications with 8,754 SNPs (7,131 unique SNPs). Among the catalogued SNPs, we found 237 SNPs mapped to nonunique positions. Our results indicate that SNPs should be carefully annotated and tested for their validity as markers in association studies.
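
Detecting rsIDs that map to more than one genomic location can be sketched in a few lines of Python. The record layout (rsID, chromosome, position) and the example entries are assumptions for illustration; actual dbSNP files have their own formats, and this is not the procedure used in the study.

```python
from collections import defaultdict


def find_nonunique_snps(records):
    """Return rsIDs that map to more than one (chromosome, position).

    `records` is an iterable of (rsid, chromosome, position) tuples, assumed
    to come from a single dbSNP build. Exact duplicate records are collapsed,
    so only genuinely different locations count as nonunique.
    """
    locations = defaultdict(set)
    for rsid, chromosome, position in records:
        locations[rsid].add((chromosome, position))
    return {
        rsid: sorted(sites)
        for rsid, sites in locations.items()
        if len(sites) > 1
    }


# Hypothetical mapping records.
mapping = [
    ("rs1", "1", 100),
    ("rs2", "2", 200),
    ("rs2", "X", 300),   # same rsID, different chromosome
    ("rs1", "1", 100),   # exact duplicate, not a second location
]
nonunique = find_nonunique_snps(mapping)
```

Applying the same grouping to two builds and comparing the location sets per rsID would flag position or chromosome changes between builds.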

 

 

TBC-P17: Exonic variants in Korean population

 

Young Jin Kim1, Kwang Joong Kim1, Lyong Heo1, Yun Kyoung Kim1, Sanghoon Moon1, Youngdoe Kim1, Mi Yeong Hwang1, Bong-Jo Kim1 and Jong-Young Lee1

 

1Division of Structural and Functional Genomics, Center for Genome Science, KNIH, KCDC

 

Recent advances in high-throughput genotyping technologies have enabled genome-wide association studies (GWAS) in large cohorts. The main goal of a GWAS is to identify loci associated with complex phenotypes, and the discovery of such loci helps us understand the mechanisms underlying complex traits. Despite the great success of GWAS, however, the limited number of susceptibility variants discovered so far accounts for only a small proportion of phenotypic variance. This missing heritability is a bottleneck that prevents progress toward personal genomes, personalized medication, and disease prediction and prevention. In this context, next-generation sequencing (NGS) technology has attracted much attention because it provides access to genomic data at base-pair resolution. Exome sequencing of 400 Korean samples facilitated the assessment of the full spectrum of allele frequencies, including coding-altering variants. The analysis of all variants within coding regions may reveal undiscovered causal common or rare variants near previously associated loci.

 

 

 

TBC-18: Development of Korea Common Data Model for Adverse Drug Signal Detection based on multi-center EMR systems  

 

Si Ra Kim1, Seung Ho Park2, Bum Joon Park2, Kwang Soo Jang2 and In Young Choi1

 

1Graduate School of Healthcare Management and Policy, The Catholic University of Korea, Seoul 137701, Korea
2Master course of engineering, Hanyang University of Korea, Seoul 133791, Korea

 

Adverse drug reaction (ADR) research based on a Clinical Data Warehouse (CDW) is becoming more important than spontaneous ADR reporting as electronic clinical information such as the Electronic Medical Record (EMR) becomes available. Drug safety monitoring based on EMR can collect more objective pharmacovigilance data and analyze ADRs earlier than spontaneous ADR reporting. We analyzed three drug safety surveillance models: the EU-ADR data model of Europe, the Mini-Sentinel data model of the Food and Drug Administration (FDA), and the Observational Medical Outcomes Partnership (OMOP) data model of the National Institutes of Health (NIH). Based on a comparison of the three data models, we developed the Korea ADR common data model (CDM) for early detection of adverse drug reactions in Korea. This project is called K-ADR (Korea-Adverse Drug Reaction). K-ADR consists of eight tables: demographic, drug, visit, procedure, diagnosis, death, laboratory, and report-machinery. Each table consists of 5~12 fields. In addition, terminology standards such as ICD-10 and WHO-ART will be provided to integrate multiple EMR systems. K-ADR, which reflects Korean EMR structures, will contribute to pharmacovigilance activity. EMR-based pharmacovigilance enables accurate signal detection from each patient's diagnosis names and drug prescription information. K-ADR could also detect adverse drug events (ADEs), including under-reported ADEs and deficient ADEs. Further effort to develop standardized guidelines for procedure codes and laboratory codes will be needed for a multi-institutional pharmacovigilance database system. EMR-based pharmacovigilance will be a cost-effective method to detect ADR signals.

Acknowledgement: This research was supported by a grant (12172KFDA212) from the Korea Food and Drug Administration in 2012.

 

 

 

TBC-19: Various nucleosome positioning patterns in Drosophila

 

Doo Yang1,2 and Ilya Ioshikhes1,2

 

1Ottawa Institute of Systems Biology, Canada

2Department of Biochemistry, Microbiology & Immunology University of Ottawa, Canada

 

Nucleosomes play an important role in gene regulation by affecting the accessibility of DNA to transcription factors. DNA sequence is one of the factors that position nucleosomes.

Finding the nucleosome positioning sequence (NPS) is challenging because nucleosome binding is not as sequence-specific as transcription factor binding. However, some sequence features, such as dinucleotide periodicity, can be observed by analyzing nucleosome sequences collectively.

Drosophila genome sequences of H2A and H2A.Z nucleosomes were analyzed to find novel NPS patterns and their relationship with biological functions.

The nucleosome positions and sequences were obtained from published ChIP-Seq data (Mavrich, et al., 2008, Nature for H2A.Z and Henikoff, et al., 2011, Genes & Development). To minimize noise in the sequence patterns, only the +1 nucleosome sequences were selected and separated into H2A and H2A.Z sequences. The dinucleotide patterns were then analyzed.
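A collective dinucleotide analysis of this kind can be sketched as a per-position frequency profile over aligned nucleosome sequences. The sketch below is an assumption about the general technique, not the authors' code: it assumes equal-length sequences aligned on the dyad, and the function name and toy sequences are hypothetical.

```python
WEAK = set("AT")    # W: A or T
STRONG = set("GC")  # S: G or C


def dinucleotide_profile(sequences, first, second):
    """Fraction of sequences with a dinucleotide of the given class at each position.

    `first` and `second` are the allowed base sets for the two positions of the
    dinucleotide, e.g. (WEAK, WEAK) for WW or (STRONG, STRONG) for SS.
    All sequences must have the same length (aligned on the dyad).
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    profile = []
    for i in range(length - 1):
        hits = sum(1 for s in sequences if s[i] in first and s[i + 1] in second)
        profile.append(hits / len(sequences))
    return profile


# Toy aligned sequences (made up); real input would be 147 bp nucleosome DNA.
toy = ["AATGC", "TAGCC", "ATGGA"]
ww = dinucleotide_profile(toy, WEAK, WEAK)
ss = dinucleotide_profile(toy, STRONG, STRONG)
```

A periodicity of about 10 bp would then show up as regularly spaced peaks in such a profile; RR/YY profiles follow by passing purine and pyrimidine base sets instead.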

Two novel NPS patterns, WW/SS and RR/YY, are proposed. The WW/SS sequence pattern is similar but not identical to the previously proposed yeast NPS. The Drosophila WW/SS NPS has a higher SS content at the dyad. Its 10 bp periodicity is stronger away from the dyad and disrupted near the dyad. In the RR/YY NPS, dinucleotides are more periodic between 25 and 45 bp from the dyad than near the dyad or in the outer region. GO analysis of the genes having either WW/SS or RR/YY nucleosomes showed differences in biological functions, suggesting a possible relationship between gene functions and nucleosome sequences.

Comparison of the H2A and H2A.Z NPS showed differences in the dinucleotide pattern. The most significant difference is that the H2A.Z NPS has stronger peaks at ±45 bp from the dyad instead of ±55 bp in H2A. These DNA positions are close to the protein domain where the H2A.Z and H2A histones differ. In yeast, H2A.Z positioning depends on SWR1, and H2A.Z is immobile once positioned. H2A.Z is also well phased downstream of the TSS. Combined with the fact that H2A.Z plays a role in proper gene activation, H2A.Z may serve as a barrier for downstream nucleosomes to maintain the proper binding sites for transcription factors and other proteins.

 

 

TBC-20: Anonymized Patient Chart Review Tool in Asan Medical Center  

 

Soo-Yong Shin1,2, Yongdon Shin2, Yong-Man Lyu2, Hyo Joung Choi2, Jihyun Park2 and Jaeho Lee1,2,3

 

1Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, Korea
2Office of Clinical Research Information, Asan Medical Center, Seoul 138-736, Korea
3Department of Emergency Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul 138-736, Korea

 

Asan Medical Center (AMC) has been developing a biomedical research infrastructure to improve the efficiency of clinical research as well as to protect the privacy of patients. As a first step, AMC developed an anonymized patient chart review tool that protects patients’ privacy by complying with government regulations in Korea. The primary purpose of this tool is to decide whether a chosen patient should be included in or excluded from a proposed study by reviewing the patient’s anonymized clinical data. For this purpose, the tool aims to provide the comprehensive clinical data in the AMC data warehouse, including diagnoses, medications, lab results, pathology/radiology reports, progress notes, admission notes, discharge summaries, and operative reports. It also provides an easy user interface by implementing the same interface as other AMC medical information systems. To generate the anonymized clinical data, the 18 identifiers defined by HIPAA were removed as follows: 1) Each patient was assigned a new research ID that differs from the hospital patient ID. 2) All structured identifiers stored in the EMR database were removed. 3) The remaining identifiers in the narrative texts were masked using pre-defined regular expressions. As future work, we plan to scramble the dates in clinical data and to develop a one-time research ID method that generates a different ID each time, even for the same patient, for stronger protection of patients’ privacy. We are also developing a research cohort discovery tool to estimate the approximate number of patients satisfying the research criteria.
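Step 3, masking identifiers in narrative text with pre-defined regular expressions, might look roughly like the sketch below. The three patterns (ISO-style dates, hyphenated phone numbers, e-mail addresses) and the placeholder labels are illustrative assumptions; the actual AMC expressions are not given in the abstract, and a production rule set would need to cover all 18 HIPAA identifier types.

```python
import re

# Hypothetical masking rules: (compiled pattern, placeholder).
# Dates are masked first so that their digits cannot be mistaken for phone numbers.
PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),
    (re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
]


def mask_identifiers(text):
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Made-up progress note for illustration.
note = "Seen on 2012-08-01; call 010-1234-5678 or mail kim@example.org."
masked = mask_identifiers(note)
```

Regular expressions only catch identifiers with a predictable shape, so free-text names and addresses would still need dictionary-based or manual review.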

 

 

TBC-21: Integrate Genomics and Molecular Interactome Data for Brain Tumor Pathway Discovery and Prognosis

 

Jongkwang Kim1, Gao Long1 and Kai Tan1

 

1University of Iowa, Dept. of Internal Medicine, Dept. of Biomedical Engineering, 65536 Iowa city, USA

 

Glioblastoma (GBM; grade IV astrocytoma) is the most common and lethal form of brain cancer. Median patient survival time is 15 months, and few predictive gene markers exist for prognosis and treatment. This study integrates three types of data (transcriptomic profiles, epigenomic profiles, and the protein-protein interactome) to find pathway markers that distinguish long-term survival (LTS) from short-term survival (STS). Thirteen pathway markers were found in the integrated analysis. The pathway markers were tested on 115 GBM patient samples for accuracy in classifying STS and LTS cases. The accuracy (82.2%) is 13.6% higher than that obtained using one or two types of data, demonstrating that integration of transcriptomic, epigenomic, and interactome data is a more powerful approach to elucidating molecular pathways distinguishing GBM subtypes.

 

 

TBC-22: Development of a Consumer-engaged Obesity Management Ontology based on Nursing Process

 

Hyun-Young Kim1, Hyeoun-Ae Park2, Yul Ha Min2 and Eun-Joo Jeon2

 

1Eulji University, College of Nursing, Daejeon 301-832, Korea
2Seoul National University, Seoul 110-799, Korea

 

The purpose of this study is to develop an ontology that represents the consumer-engaged obesity management process based on clinical practice guidelines. Since lifestyle modification by consumers is the most important aspect of obesity management, we introduced the concept of consumer engagement into the obesity management process. We also considered data traffic when developing the ontology.

We developed the ontology by defining the scope of obesity management, selecting a foundational ontology, extracting the concepts, assigning relations among classes, and representing classes and relations with Protégé.

We identified behavioural intervention, dietary advice, and physical activity from the guideline as obesity management strategies. The nursing process was selected as a foundational ontology to represent consumer engagement in the obesity management process, since consumers engage in their obesity management when they identify expected outcomes. The nursing process is a patient-centered, goal-oriented method consisting of five phases (assessment, nursing diagnosis, outcome identification, implementation, and evaluation). These phases are repetitive and cyclic in the obesity management process. The first cycle represents the first encounter of obesity management, from initial assessment to outcome identification. The second cycle represents the second encounter and onward. The two cycles are connected because the assessment in the second cycle is the evaluation of the first cycle. With this approach we were able to minimize data traffic in the obesity management process. We extracted 127 concepts, which included assessment data (such as sex, body mass index, and waist circumference) and the inferred data representing nursing diagnosis and evaluation (such as the degree of and reason for obesity and success or failure in lifestyle modification). The relations linking concepts are “part of”, “instance of”, “derives from”, “derives into”, “has plan”, “followed by”, and “has intention”. The concepts and relations were formally represented using Protégé.

We were able to represent obesity management with consumer engagement using the nursing process as a foundational ontology. The nursing process can be used as a foundational ontology to support the development of ontologies representing consumers’ behavioural modification.

Acknowledgements: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (no. 2012-012257 and no. 2012-0000998).

 

 

TBC-23: Performance of microRNA target prediction algorithms 

 

Jee Yeon Heo1, Yongjin Choi1, Hae-Seok Eo1, Youngho Kim1, Taesung Park2 and Hyung-Seok Choi1

 

1Bio&Health Team, Future IT R&D Laboratory, LGE Advanced Research Institute, Seocho-gu, Seoul 137-724, Korea
2Department of Statistics, Seoul National University, Gwanak-gu, Seoul 151-747, Korea

 

MicroRNAs (miRNAs) are a class of small non-coding RNAs (~22 nt) that regulate gene expression by binding to their target mRNAs and suppressing translation or inducing mRNA degradation. They act in multiple biological processes such as cell cycle control, cell growth, cell differentiation, apoptosis, and embryo development. Many computational approaches to predicting the target mRNAs of each miRNA have been developed, including miRanda, PITA, TargetScan, DIANA-microT, Microcosm, and miRDB. Here, we compared the performance of these six miRNA target prediction algorithms. First, from the total of 2,842,985 miRNA-target mRNA pairs predicted by the six algorithms, 6,901 pairs (0.003%) common to all six were selected. Second, 3,507 validated miRNA-target mRNA pairs were collected from databases of experimentally validated interactions, including TarBase, miR2Disease, miRTarBase, and miRecords. Among them, 879 pairs (25%) were not predicted by any algorithm and 214 pairs (6%) were predicted by all six algorithms. Finally, receiver operating characteristic (ROC) curves and area under the curve (AUC) values were calculated to compare the performance of each algorithm. Our comparison shows that DIANA-microT has the highest accuracy (60%) and miRanda the lowest (49%), and that the prediction scores of the algorithms are only weakly correlated with each other.
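The two computations named above, finding the pairs predicted by every algorithm and scoring one algorithm by AUC against validated pairs, can be sketched in a few lines. The data layout, function names, and toy values are assumptions for illustration; the AUC here is the rank-based (Mann-Whitney) form, which equals the area under the ROC curve.

```python
def common_pairs(predictions):
    """Pairs predicted by every algorithm.

    `predictions` maps an algorithm name to a set of (miRNA, mRNA) pairs.
    """
    sets = list(predictions.values())
    return set.intersection(*sets) if sets else set()


def auc(scores, labels):
    """AUC as the probability that a validated pair outscores an unvalidated one.

    Ties count as 0.5. `labels` are truthy for validated pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one validated and one unvalidated pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions from two hypothetical algorithms.
toy_predictions = {
    "algo_a": {("miR-1", "G1"), ("miR-1", "G2")},
    "algo_b": {("miR-1", "G2"), ("miR-2", "G3")},
}
shared = common_pairs(toy_predictions)

# Toy prediction scores with validation labels (1 = experimentally validated).
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

This pairwise form is quadratic in the number of pairs; for millions of predictions a sort-based rank computation would be the practical choice.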

 

 

TBC-24: Graphical modeling of regulatory interactions in sporadic Inclusion Body Myositis

 

Thomas Thorne1, Pietro Fratta2, Michael Hanna3, Elizabeth Fisher2 and Michael Stumpf1

 

1Centre for Bioinformatics and Systems Biology Imperial College London, UK
2Department of Neurodegenerative Disease, UCL Institute of Neurology, UK
3National Hospital for Neurology & Neurosurgery, University College London, UK

 

Sporadic Inclusion Body Myositis (sIBM) is a disease that causes inflammation of the muscles and progressive weakening and wasting of the muscles, and the mechanisms by which it acts are not currently fully understood. Here we present an analysis of gene expression microarray data from both disease and control cases in an attempt to identify regulatory interactions that may be involved in the disease. To model the regulatory network structure we employ a Gaussian Graphical Model (GGM) formalism, whereby the data are assumed to be generated from a multivariate Normal distribution. In the GGM model a pair of genes will only share an edge if they have a non-zero partial correlation – that is if their correlation cannot be explained by the expression of any of the other genes. Since we are faced with a situation in which there are a significantly larger number of genes than data points, we apply a sparse regression methodology to infer the partial correlations between genes. Here we choose to apply a sparse Bayesian regression method that has been demonstrated to outperform methods such as the Lasso. To perform inference of the model parameters we apply variational inference, a technique whereby the Bayesian posterior distribution is approximated by a factorised set of exponential family distributions.
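The edge criterion of a Gaussian graphical model can be illustrated with the simplest case of three genes. The sketch below computes a first-order partial correlation from pairwise Pearson correlations; it is a textbook formula chosen for illustration and not the sparse Bayesian regression with variational inference used in the study. The toy expression vectors are constructed so that gene z drives both x and y.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y are each z plus a disturbance that is orthogonal to z
# and to the other disturbance.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z + [1, -1, -1, 1]
y = [-4, 2, -2, 4]   # z + [-1, 3, -3, 1]

marginal = pearson(x, y)
conditional = partial_correlation(x, y, z)
```

Here x and y are clearly correlated marginally, yet their partial correlation given z vanishes, so a GGM would place no edge between them. With many more genes than samples the full set of partial correlations cannot be estimated this way, which is why a sparse regression approach is needed.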

 

 

TBC-25: Jiffynet: A web server generating Gene networks for newly sequenced species  

 

Eiru Kim1 and Insuk Lee1

 

1Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Korea

 

Systems biology, a field that focuses on complex interactions in biological systems, is one of the emerging approaches to studying such systems. Since the development of next-generation sequencing technology, large amounts of sequencing data from diverse species have become available. However, because genetic analyses of these species are lacking, it is not possible to study them with systematic approaches. For biologists who want to study a novel species systematically, we have developed a web server that provides draft models of various networks. The draft network, which we call "JiffyNet", is made by mapping associalogs onto well-established existing networks such as HumanNet, WormNet, YeastNet, and RiceNet. Associalogs are derived by combining the orthologs between two species with their interactions. In this way, a JiffyNet for a user-defined species can be built by finding associalogs and mapping them to the base networks. We are building a web server that enables biologists to build their own JiffyNet: a biologist uploads sequencing data, and the server sends the JiffyNet created from the data by e-mail.
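The associalog idea, transferring an interaction from a base network to a new species when both partners have orthologs, can be sketched as below. This is a simplified reading of the abstract: the function name, the dictionary layout of the ortholog table, and the gene identifiers are all hypothetical, and any scoring or weighting used by the real server is omitted.

```python
def transfer_network(base_edges, orthologs):
    """Map base-network edges to a target species through orthologs.

    base_edges: iterable of (gene_a, gene_b) pairs in the base species.
    orthologs:  dict mapping a base-species gene to a set of target-species genes.
    An edge is transferred only when both partners have at least one ortholog.
    """
    transferred = set()
    for a, b in base_edges:
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                if ta != tb:  # skip self-links created by shared orthologs
                    transferred.add(tuple(sorted((ta, tb))))
    return transferred


# Toy base network and ortholog table (made-up identifiers).
base_edges = [("Y1", "Y2"), ("Y2", "Y3")]
orthologs = {"Y1": {"N1"}, "Y2": {"N2", "N3"}}  # Y3 has no ortholog

draft_network = transfer_network(base_edges, orthologs)
```

Edges are stored as sorted tuples so that each undirected link appears once regardless of the order in the base network.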

 

 

TBC-26: Studying Plant Complex Traits Through Network-assisted Systems Genetics of Arabidopsis Thaliana

 

Tak Lee1, Jung Eun Shim1 and Insuk Lee1

 

1Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 262 Seongsanno, Seodaemun-Gu, Seoul, 120-749, Korea

 

As next-generation sequencing (NGS) technology develops rapidly, genome-wide association studies (GWAS) are being highlighted as a way to search for genes associated with certain traits, such as disease genes in humans and stress-resistance genes in plants. By sequencing the genomes of organisms and statistically associating sequence variants with traits, GWAS is expected to perform well in the discovery of novel genes. However, even though GWAS is costly and labor-intensive, it has not yet delivered the expected outcomes. Here, we present a novel way of analyzing associations between genetic variants and phenotypes of a plant model organism, Arabidopsis thaliana, using a network-guided approach.

Using the Arabidopsis functional gene network (AraNet), we develop an algorithm to effectively predict significant variant-phenotype associations in Arabidopsis GWAS. AraNet is constructed by integrating various omics data and predicts functional relationships for 73% of the total Arabidopsis genome. An algorithm that combines GWAS data with the integrated omics data of AraNet would give more power to predict genes that have low significance in GWAS but are still important for certain phenotypes.

 

 

TBC-27: Systematic analysis of cell line data for the development of novel cancer treatment

 

Nayoung Kim1 and Sukjoon Yoon1

 

1Department of Biological Sciences, Sookmyung Women’s University, Seoul 140742, Korea

 

Integrating large-scale omics and drug response data across various cell lines enables us to identify cellular signaling and drug sensitivity in cancer. Here we present a system-level analysis of cell line data for predicting the sensitivity to and mechanism of targeted drug response based on the major genotypes of cancers. An association study with genotypic classification was performed on drug data and omics data, such as transcriptome, proteome, and phosphoproteome data, from human cancer cell lines. This approach reproduced known patterns of mechanism-based drug response in cancers. Furthermore, gene and protein signatures significantly associated with genotype were identified and integrated into a drug-centered network. This study provides an integrated approach to omics data, drug response data, and cancer mutation types. Our platform can be used to accelerate hypothesis generation and to validate the optimized therapeutic window for single or combined anticancer agents.

 

 

TBC-28: Genome Signature Image (GSI): Concise visualization of species/strain-specific profiles of repetitive element occurrences for cataloguing and evolutionary studies

 

Kang-Hoon Lee1, Kyung-Seop Shin2, Woo-Chan Kim2, Jeongkyu Roh2, Seung-Ho Choi2, Dong-Ho Cho2 and Kiho Cho1

 

1Department of Surgery, University of California, Davis and Shriners Hospitals for Children Northern California, USA

2Division of Electrical Engineering, School of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Korea

 

The genomes of living organisms, ranging from bacteria to humans, contain diverse populations of repetitive elements (REs). Our recent studies revealed that the RE profile of the human genome, including RE arrays, is unique in comparison with that of the mouse genome, although the gene sequences of humans and mice share ~90% homology. A preliminary survey of the genomes of various other species also demonstrated that genomic RE profiles are species-specific. In this study, we developed a suite of protocols/programs to concisely visualize genome signatures using species/strain-specific RE profiles. Since the genomes of higher eukaryotes, including humans and non-human primates, have not yet been fully decoded, we developed the genome signature technology using complete genome sequences from the domains Archaea and Bacteria. The genome sequences of 117 Archaea-domain and 1,068 Bacteria-domain members were obtained from the National Center for Biotechnology Information and subjected to a genome-wide survey for the occurrence of 5-nucleotide REs. The 50 highest-frequency REs were then selected from each genome and assembled, from high to low frequency, into an RE string of 250 nucleotides. This string of high-frequency REs represents a unique signature of each genome. Of note, the two key parameters for generating genome signature sequences (the number of high-frequency REs and the RE length) are tuneable. The genome signature sequence was then rendered as an image, named the Genome Signature Image (GSI), using a CMYK color scheme. Interestingly, not all members within a pre-established phylogenetic branch shared similar CMYK color patterns, as confirmed by examining the GSIs of the 1,185 microorganisms under different parameters. The tuneable GSIs represent and visualize unique characteristics of any genome, and the concise RE string of each genome enables phylogenetic studies involving large sample numbers.
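The construction of the signature string can be sketched as follows, with the RE length and the number of top REs exposed as the two tuneable parameters. This is a minimal sketch under assumptions: overlapping k-mers are counted on a single strand, ties are broken alphabetically, and the function name is hypothetical; the abstract does not specify these details, and the CMYK rendering step is not shown.

```python
from collections import Counter


def signature_string(genome, k=5, top=50):
    """Concatenate the `top` most frequent k-mers, from high to low frequency.

    With k=5 and top=50 the result is 250 nucleotides long, provided the
    genome contains at least 50 distinct 5-mers.
    """
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    # Sort by descending count; alphabetical order breaks ties (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(kmer for kmer, _ in ranked[:top])


# Toy example with short parameters so the result can be checked by hand:
# 2-mer counts are AC=3, CG=2, GT=2, TA=2.
toy_signature = signature_string("ACGTACGTAC", k=2, top=2)
```

Two genomes can then be compared through their signature strings alone, which is what makes the approach practical for large numbers of samples.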

 

 

TBC-29: Analysis of copy number variation in exome sequencing data    

 

Mi Yeong Hwang1, Sanghoon Moon1, Young Jin Kim1, Lyong Heo1, Yun Kyoung Kim1, Youngdoe Kim1, Bok-Ghee Han2, Jong-Young Lee1, and Bong-Jo Kim1

 

1Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951, Korea

2Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951, Korea

 

Copy number variations (CNVs) have been reported to be associated with complex diseases such as schizophrenia and obesity. To discover CNVs in the human genome, array comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) arrays have mainly been used. However, CNVs from these array-based platforms have inaccurate breakpoints due to low resolution, so it is hard to determine the exact size of CNV regions. Moreover, small genomic variants, such as those shorter than 500 bp, are rarely detected. Recently, next-generation sequencing (NGS) techniques have developed rapidly, and exome sequencing has come to be regarded as a tool for Mendelian disease gene discovery.

In this study, 139 randomly selected individuals enrolled in a population-based cohort were genotyped with Agilent/HiSeq exome sequencing. Many of the detected CNV regions were validated with the Agilent 60K aCGH. As a result, we discovered 10,084 CNVs from exome sequencing. More than 80% of the CNVs detected from exome sequencing (8,113/10,084) were less than 300 bp in length. We compared all of the detected CNV regions with previously reported regions and also examined recurrent copy-number deletion regions that might cause loss of function.

 

 

TBC-30: Identification of functional nucleotide sequence variant in the promoter of CEBPE gene

 

Hyunju Ryoo1, Minyoung Kong1, Younyoung Kim1 and Chaeyoung Lee1

 

1School of Systems Biomedical Science, Soongsil University, Seoul, Korea

 

Research efforts have been made to identify genetic factors for susceptibility to acute lymphoblastic leukemia (ALL), a complex disease known as the most common childhood malignancy. In particular, a recent genome-wide association study (GWAS) revealed an association (odds ratio = 1.34, P = 2.88 × 10−7) of ALL with the SNP rs2239633 in the 5’ upstream region of the gene encoding CCAAT/enhancer binding protein epsilon (CEBPE) in an English population (907 cases and 2,398 controls). The current study examined activity in the promoter region to see whether sequence variants can regulate the expression of the gene and to identify functional variant(s). Three haplotypes were estimated from rs2239633 and nearby single nucleotide polymorphisms (SNPs) in strong linkage with it. The wild-type haplotype was TGTTTTC (HT1), and the second most frequent haplotype consisted entirely of the opposite alleles (CCACGCT, HT2). Minigene constructs carrying the haplotypes were used to measure luciferase activity, which showed the strongest expression with HT2 and the weakest with HT1. Further luciferase assays showed that rs2239632 was the functional nucleotide variant responsible for the difference in expression. The promoter activity concurred with our in silico analysis, in which different transcription factors were predicted depending on the haplotype. We concluded that rs2239632 could regulate the expression of the CEBPE gene. This might explain the association reported in the previous GWAS with rs2239633, which is strongly linked to rs2239632 (r2 = 0.949). The risk allele would increase the gene product and lead to leukemogenesis. As a result, a person with the allele or the corresponding haplotype would be more susceptible to ALL.

 

 

TBC-31: Functional promoter nucleotide variants and their haplotypes of the gene encoding CCL21

 

Wonhee Jang1, Hyunju Ryoo1, Jihye Ryu1, Jeyoung Woo1, Minyoung Kong1, Younyoung Kim1 and Chaeyoung Lee1

 

1School of Systems Biomedical Sciences, Soongsil University, Seoul, Korea

 

The genetic architecture of rheumatoid arthritis (RA) remains poorly understood despite great interest in its causal factors. Recent genome-wide association studies (GWAS), however, have identified several genetic signals associated with susceptibility to RA. In particular, a meta-analysis of previously published GWAS showed an association (P = 2.8 × 10−7, OR = 1.12) with the gene encoding chemokine (C-C motif) ligand 21 (CCL21) using a total of 3,393 cases and 12,462 controls. The sequence variant (rs2812378) identified in the meta-analysis is located in the 5’ upstream region of the gene. The current study aimed to identify functional variants in the promoter region in which the association signal was observed. Four nucleotide variants in an estimated linkage disequilibrium block were considered as candidate functional variants. Different transcription factors were predicted for allelic substitutions at all of the variants. A luciferase assay revealed that the minigene construct with the wild-type haplotype (TCGG) had a lower expression level than that with the CCTG haplotype, which includes the risk allele of rs2812378 identified in the meta-analysis. We concluded that the haplotype CCTG and the C allele of rs2812378 could overproduce CCL21 compared with their corresponding wild types. The overexpression of the chemokine would lead to greater susceptibility to RA, considering that the chemokine is involved in the ectopic lymphoid structures affected by RA.

 

 

TBC-32: Development of Web-based Case Report System in Traditional Korean Medicine for Clinic Doctor

 

Boyoung Kim1, Seung-Min Baek1 and Sunmi Choi1

 

1Korea Institute of Oriental Medicine, Daejeon 305811, Korea

 

This paper develops a web-based case report system for Traditional Korean Medicine to be provided to Oriental Medicine doctors in local clinics. First, we collect existing case report papers, organize them according to STRICTA, and provide them as educational materials. In addition, the various types of case report should be standardized so that they are accessible through the web-based system. Finally, the proposed system lays the foundation for practicing evidence-based medicine in Traditional Korean Medicine.

 

 

TBC-33: ChemTools : Python based Chemoinformatics Toolkit 

 

Jehoon Jun1, Minjae Yoo1 and Kwang-Hwi Cho1

 

1Soongsil University, Korea

 

A Python-based chemoinformatics toolkit (ChemTools) has been developed. The development of NMR and X-ray equipment led to the discovery of numerous chemical compound structures, and the resulting chemical structure databases led to in silico drug discovery. Among the many in silico methods, virtual screening is an essential tool that is widely used in most pharmaceutical companies and related academic fields. In these drug discovery processes, computational tools for managing, mining, and collecting databases are very important. However, some publicly available tools are limited in accuracy and performance. For this reason, we have developed a chemoinformatics toolkit that includes several in-house and external codes. ChemTools contains modules that are useful for handling large chemical databases, such as yaChI (a chemical line notation), 3DG (a 3D structure generator from connectivity), a conformer generator, and filters for eliminating unwanted data from large chemical databases. ChemTools can also edit molecules atom by atom and bond by bond using a very simple syntax. Because ChemTools is based on Python, its modules can be combined in any way in Python scripts. The toolkit inherits some modules from Pybel, such as the SMILES code generator, the InChI code generator, and the energy minimizer. The modules we developed, such as yaChI and 3DG, are more reliable than other released modules. The performance of the in-house codes is compared with that of their counterparts and shows improvement. ChemTools would be very useful for research that handles large chemical databases, such as in silico drug discovery or material design.

 

 

TBC-34: Molecular Dynamics Studies to predict protein-protein interactions using GPU accelerated AMBER: application to TBC1 interacting Rab family proteins

 

Ok Sung Jung1, Bong Hun Ji1 and Kwang-Hwi Cho1

 

1Soongsil University, Korea

 

Current advances in computer simulation enable us to perform large-scale molecular simulations relatively easily. In particular, the GPU-accelerated AMBER package (AMBER-GPU) shows improved speed compared with the CPU version. AMBER-GPU has been applied to study TBC1-interacting Rab family proteins. As TBC family proteins function as GTPase-activating proteins for Rab family proteins, they are considered to have important roles in the cell cycle and in differentiation in various tissues. Rab family proteins are known to participate in protein transport, membrane traffic, exocytosis, and endosomal recycling by taking part in transport from the endoplasmic reticulum to the Golgi complex. Therefore, understanding the interaction of TBC family proteins with Rab family proteins is essential for studying the transport system.

However, it is time-consuming and expensive to study the interactions between the various TBC family and Rab family proteins experimentally. It is therefore necessary to apply a computational approach to predict the interactions of the complexes prior to in vitro experiments.

TBC1D4 (also known as AS160) and TBC1D1 are the two RabGAPs integral to GLUT4 translocation in adipocytes and skeletal myocytes, respectively, and their crystal structures have recently been reported (PDB ID: 3QYE). There are about 60 Rab family proteins, and 18 of them have been investigated experimentally for association with GLUT4 vesicles. Among them, only a few (four) Rabs have been shown to be potential substrates for TBC1D1 or TBC1D4. Recently, structures of TBC1D1 and Rab family proteins have been reported, and more are coming. Using these structures, the experimental results on protein-protein interactions between TBC1 and Rab family proteins were validated with a computational method using AMBER. An energy cutoff was found between binders and non-binders. We are extending our work to Rab family proteins for which no experiments have yet been done, to find possible interacting partners.

 

 

 

TBC-35: A Novel Data Mining Approach for Inferring Phenotypic Association Networks to Discover the Pleiotropic Effects

 

Sung Hee Park1 and Sangsoo Kim1

 

1School of Systems Medical Science, Soongsil University, Seoul, Korea

 

Pleiotropy is the genetic phenomenon in which a single gene affects multiple phenotypes. In human diseases and model organisms, pleiotropy can imply that different mutations in the same gene cause different pathological effects. More examples of pleiotropic effects are being observed as the number of variants identified through genome-wide association studies (GWAS) increases. However, current GWAS are performed in a single-trait framework that does not consider genetic correlations between important disease traits. Hence, the general GWAS framework is limited in its ability to discover genetic risk factors involving pleiotropic genes.

This work reports a novel data mining approach that discovers patterns of multiple phenotypic associations across 52 anthropometric and biochemical traits in KARE and infers phenotypic association networks from the patterns, expressed as association rules. The method was applied to a GWAS of the multivariate phenotype highLDLhighTG, which was derived from the predicted patterns of the phenotypic networks associated with high triglyceride levels. The patterns of the phenotypic association networks were informative for relating plasma lipid levels to bone mineral density and to a cluster of common traits (obesity, hypertension, and insulin resistance) related to metabolic syndrome (MS). Fifteen variants in six genes (PAK7, C20orf103, NRIP1, BCL2, TRPM3, and NAV1) were significantly associated with highLDLhighTG.

Our results, based on analysis of mouse QTL and the protein-protein interaction (PPI) network in addition to the discovered phenotypic associations, suggest that the six pleiotropic genes may play important roles in the pleiotropic effects on lipid metabolism and MS, which increase the risk of type 2 diabetes and cardiovascular disease. This work provides insight into disease comorbidity when pleiotropic genes share common etiological pathways.
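
The abstract above expresses phenotypic patterns as association rules but does not state how the rules were scored. As a rough illustration only, the sketch below computes the three standard rule measures (support, confidence, and lift) for a rule such as highTG -> highLDL. The function name `rule_metrics` and the toy records are hypothetical and are not taken from the KARE data or from the authors' method.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    records: list of sets, each holding the trait labels observed in one subject.
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(records)
    n_a = sum(1 for r in records if a <= r)          # subjects matching the antecedent
    n_c = sum(1 for r in records if c <= r)          # subjects matching the consequent
    n_ac = sum(1 for r in records if (a | c) <= r)   # subjects matching both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy records invented for illustration; not real study data.
records = [
    frozenset({"highTG", "highLDL", "obesity"}),
    frozenset({"highTG", "highLDL"}),
    frozenset({"highTG", "hypertension"}),
    frozenset({"obesity"}),
]

support, confidence, lift = rule_metrics(records, {"highTG"}, {"highLDL"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the two traits co-occur more often than expected if they were independent, which is the kind of relation a phenotypic association network would draw as an edge.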

 

 

TBC-36: Transcription Interference Networks are the coordinators of the gene expressions

 

Zsolt Boldogkoi1 and Dora Tombacz1

 

1Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged 6720, Hungary

 

Gene expression is controlled mainly at the level of transcription. Non-coding RNAs play very important roles in this process at various levels of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, we report the detection of genome-wide expression of antisense non-coding RNAs from the genome of pseudorabies virus, a neurotropic herpesvirus. We put forward the Transcription Interference Network (TIN) hypothesis in an attempt to explain the genomic design and the existence of the antisense RNAs within a common interpretive framework. This hypothesis suggests the existence of a novel genetic regulatory layer that controls the cascade of herpesvirus gene expression at the level of transcription. The TIN is proposed to represent a mechanism that plays a central role in the programmed step-by-step switches of transcription between kinetic classes and subclasses of viral genes. The proposed model may not be restricted to herpesviruses, but might explain the mechanism of an important regulatory system existing in other organisms belonging to various taxonomic classes.

This project is supported by the Swiss Hungarian Contribution and the European Union and co-financed by the European Social Fund.

 

 

TBC-37: Subnetwork-based analysis of human disease in protein complex with housekeeping functions  

 

Sanghun Bae1, Hyunwook Han2, Hanwool Kim3 and Jisook Moon1,2,3

 

1College of Life Science, Department of Applied Bioscience, CHA University, Seoul, Korea

2Department of Biomedical Science, CHA University, Seoul, Republic of Korea

3CHA Stem Cell Institute, CHA Health Systems, Seoul, Republic of Korea

 

Proteins in a living system serve as components of protein complexes or molecular machines that carry out cellular processes, and aberrant relationships among proteins contribute to disorders of the molecular system. A comprehensive analysis of the protein-protein interaction network (PPIN) is therefore essential for a systemic understanding of human disease.

However, the size and complexity of the full set of protein interactions make network-based research difficult, so analysis of sub-networks, otherwise known as small worlds, is necessary because it greatly reduces the number of proteins to be analysed. The present study therefore concerns the sub-network consisting of the components of one protein complex responsible for basic cellular maintenance functions, together with their interactors, with the aim of a systemic approach to human disease.

To construct the human interactome PPIN as a first step, we extracted binary protein-protein interaction data from eight molecular interaction databases (HIPPIE, HPRD, REACTOME, BIOGRID, InnateDB, DIP, MINT and IntAct) and integrated them (172,400 interactions) to increase the coverage of PPI data. The proteins of interest were used as seed proteins, and they and their neighbours in the integrated PPIN were selected to create the sub-network, whose components were mapped to OMIM (Online Mendelian Inheritance in Man) and GAD (Genetic Association Database) data, representative sources of genotype-phenotype correlation. In enrichment analysis (hypergeometric test), certain disease class terms were over-represented in the sub-network. Moreover, network properties and GO term and pathway enrichment analyses revealed that the sub-network has distinct features that provide a possible explanation for the over-representation of particular disease categories in the protein complex with housekeeping functions.

Our findings suggest that a subnetwork-based, focused analysis can be a practical application for understanding the underlying nature of human disease and allow us to interpret the properties of disease-related genes on a systemic level.
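
The enrichment analysis in the abstract above is described only as a hypergeometric test. A minimal sketch of such a one-sided test is given below, assuming the usual set-up: N background genes, K of them annotated to a disease class, n genes in the sub-network, and k of those annotated. The function name and the numbers in the example are illustrative assumptions, not values from the study.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background network
    K: background genes annotated to the disease class
    n: genes in the sub-network
    k: sub-network genes annotated to the disease class
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers chosen only to show the call.
p = hypergeom_enrichment_p(N=10, K=5, n=5, k=5)
print(p)  # 1/252, about 0.004
```

In practice one p-value is computed per disease class, so a multiple-testing correction would normally follow; the abstract does not say which correction, if any, was used.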

 

 

TBC-38: Functional haplotypes in 5' region of RGS14 gene

 

Jeyoung Woo1, Minyoung Kong1, Younyoung Kim1 and Chaeyoung Lee1

 

1School of Systems Biomedical Science, Soongsil University, Seoul 156-743, Korea

 

Little is known about the genetic factors underlying multiple sclerosis (MS), which leads to nerve degeneration in the brain and spinal cord. A recent genome-wide association study (GWAS) showed that a single nucleotide polymorphism (SNP, rs4075958) confers risk of MS. The variant is located in the promoter region of the gene encoding regulator of G-protein signaling 14 (RGS14), a GTPase-activating protein (GAP). We investigated the promoter activity of variants in the region to determine whether the sequences can regulate expression of the gene and to identify functional variants. Three haplotypes were estimated from rs4075958 and 4 SNPs in strong linkage with it. For each haplotype, a minigene was constructed containing the selected SNPs and the firefly luciferase gene. The luciferase activity of each haplotype was measured with the Dual-Luciferase Reporter Assay system. Promoter activity differed among the haplotypes. In particular, the largest difference was observed between the wild-type haplotype and the haplotype in which every allele differs from the wild type. This concurred with the previous GWAS in which the SNP conferred risk of MS. We concluded that the haplotype with the non-wild-type alleles could increase expression of the RGS14 gene. The overexpressed product suppresses Gαi/o of mGluR4 and thus increases cAMP, which activates TH17. Consequently, TH17 would lead to neuroinflammation, and accumulated neuroinflammation might increase susceptibility to MS.

 

 

TBC-39: Health SORA, the Smart Health Care Program for Cancer Survivors

 

Young-Ho Yun1, Ye-Ni Choi1, Moon-Kyung Shin1, Kwang-Choon Kim2 and Jaegeol Cho2

 

1Seoul National University College of Medicine, Korea

2Samsung DMC R&D Center, Korea

 

Although the number of cancer survivors is steadily growing, few programs are designed to provide survivors with information technology (IT)-based health promotion. According to previous studies, the quality of life (QOL) of cancer survivors is significantly lower than that of the general population, yet few programs address survivors' QOL, and those that exist focus only on specific areas such as exercise and nutrition. Recognizing the need for a comprehensive health care program, we designed an IT-based program called Health SORA (Smart, Optimizing, Realistic, Authentic health care program) customized for the total health care of cancer survivors.

We studied and analyzed strategies and theories from various fields: the transtheoretical model (TTM), behavioral and health psychology, fundamental principles of coaching, and other leadership theories. The program flow chart was developed by combining these theories. The health care categories to be managed were determined from previous publications. The categories cover physical, mental, social, and existential areas for complete health care.

Twelve categories are managed: exercise, nutrition, emotion, physical examination, fatigue, sleep, weight control, family and society, existential well-being, comorbidity and medication, pain, and smoking cessation and moderate drinking. Each category is managed through the following steps, and the cycle repeats weekly for most of them: 1) evaluation, 2) analysis, 3) decision making, 4) planning, 5) acting, and 6) monitoring and receiving feedback. For example, the user first assesses his or her exercise behavior (TTM, amount of exercise, regularity, etc.) in the evaluation step. Next, the user reviews his or her current exercise status and decides whether to manage it. Once the user decides to manage it, he or she can plan specific education and activities. After performing the activity, the user manages the category by reviewing the change in his or her status in the management phase.

This is the first smart and comprehensive prognosis program that covers 12 important health care areas for cancer survivors. We believe that this total health care program can contribute effectively to improving the health and QOL of cancer survivors.

 

 

TBC-40: A computational framework for differential alternative polyadenylation profiles between cancer and normal cells

 

Jimin Shin1,2, Hyunmin Kim1, Chaeyoung Lee2 and David Bentley1

 

1Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
2School of Systems Biomedical Science, Soongsil University, Seoul, Korea

 

Alternative polyadenylation of mRNAs has attracted considerable attention as an important mechanism of post-transcriptional regulation of eukaryotic genes. Approximately half of all expressed genes in humans are thought to produce alternatively polyadenylated mRNAs. Recent studies have shown that tissue-specific alternative polyadenylation is important in oncogenesis. For example, mRNA isoforms with longer or shorter UTRs were observed in breast cancer cell lines, and the direction of the length change is cell-type-dependent. This study aimed to overcome two limitations of currently available computational analyses of alternative polyadenylation: the lack of appropriate statistical background models and the lack of quantification of changes in the number of polyA sites. We propose a computational framework for evaluating differential alternative polyadenylation profiles between normal and cancer cells. The proposed approach handles peak identification and peak comparison. It uses a nonparametric normalization with the LASSO algorithm in order to penalize peak patterns with artifacts. The method is called the polyA shifting index (PSI). The PSI captures non-linear trends in the changes in the numbers of polyA sites. Furthermore, the corresponding statistic is unbiased with respect to changes over a long distance. The proposed method should be made publicly available, which would accelerate the identification of differential alternative polyadenylation profiles.

 

 

TBC-41: The genetic regulation of aging process and age-related disease 

 

Han Wool Kim1, Hyun Wook Han2, Sang Hun Bae2 and Ji Sook Moon1,2

 

1CHA Stem Cell Institute, CHA Health Systems, Seoul, Republic of Korea

2Department of Biomedical Science, CHA University, Seoul, Republic of Korea

 

Aging is an inevitable biological process in all life, and its fundamental mechanism remains unresolved. Recent studies have investigated only simple differences in network properties and disease classification based on the relationship between aging genes and genetic disease genes. Additional factors such as methylation and miRNA are important for uncovering the aging process and the pathogenesis of diseases. Here, we compiled and analyzed human disease (OMIM) and aging (GenAGE) genes to investigate the relationship between aging and disease genes. We categorized the genes into three groups: disease-only genes, aging-only genes, and aging-disease genes. Each group was subsequently characterized. Of the 2117 genes, 1856 were disease-only, 155 were aging-only, and 106 were aging-disease genes. Interestingly, analyses of GO (Gene Ontology) enrichment, transcription factors, the protein interaction network, and methylation revealed that each gene group is involved in distinct functional categories and shows different transcription factors, miRNAs, degree centrality, and methylation patterns. In addition, analyses of disease genes showed that disease-only and aging-disease genes are enriched in different disease categories. Our results shed light on the relationship between the genesis of various diseases and the aging process.

 

 

 

TBC-42: Discovery of Pathway Information Content of Protein Domains based on Domain Co-occurrence Network

 

Jung Eun Shim1 and Insuk Lee1

 

1Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 262 Seongsanno, Seodaemun-Gu, Seoul, 120-749, Korea

 

Identifying functional building blocks such as proteins, genes, and protein domains is important for understanding the biological processes of a cell. The protein domain is a particularly useful feature because it is the structural, functional, and evolutionary unit of proteins. However, domain-based identification of protein function remains a difficult problem. For this reason, we developed a network-based quantification of domain function, the Domain Information Content Score (DomICS), to identify protein domains that play a critical role in driving protein-level functions. In this framework, we first constructed a gene network from domain co-occurrence, giving larger weights to rarer domains, and then measured association scores for a specific pathway using the linkage information in the network. Finally, we derived the pathway information content of each domain, which reflects the specificity of pathway-associated domains. To evaluate the performance of the proposed method in a microbe, yeast (Saccharomyces cerevisiae), and in multicellular human (Homo sapiens), we assessed the predicted pathway information content of each domain against the literature and by enrichment analysis with domains known for Gene Ontology biological process (GO-BP) terms from InterPro2GO.

 

 

TBC-43: Identification and Characterization of Gastric Cancer Subtypes using Expression Microarray Data  

 

Haein Kim1, Ensel Oh1, Young Kee Shin1 and Yoon-La Choi2

 

1Laboratory of Molecular Pathology and Cancer Genomics, Seoul National University College of Pharmacy, Seoul, Korea
2Laboratory of Cancer Genomics and Molecular Pathology, Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea

 

Gastric cancer is one of the most common cancers in Korea, and the development of targeted therapies for gastric cancer has been accelerated by the emerging understanding of the gastric cancer genome. Like other types of cancer, gastric cancer is highly heterogeneous, and identifying and characterizing its subtypes is the first step in the search for novel targets for anti-gastric cancer drugs. We selected 265 genes that were significantly overexpressed in gastric tumors by comparing expression microarray data from 80 paired gastric tumor and matched normal tissues using Significance Analysis of Microarrays (SAM) and NetRank with the COXPRESdb database. With the selected genes, we identified two subtypes of gastric cancer (subtype A and subtype B) by clustering 200 independent gastric cancer tissues. According to GO analysis, the 88 genes highly expressed in subtype A were related to angiogenesis and Wnt signaling, and the rest of the selected genes, which were highly expressed in subtype B, were involved in immune responses such as monocyte and leukocyte chemotaxis. Subtype A included more high-stage (stage III and IV) tumors than subtype B, which seemed to be related to the active angiogenesis and Wnt signaling in subtype A. In subtype B, high immune response activity seemed to keep early tumors from progressing to a higher stage. By identifying and characterizing the two subtypes, we could understand the gastric cancer genome more deeply, and the selected genes may provide clues to targets for anti-gastric cancer drugs.

 

 

TBC-44: Functional nucleotide polymorphism in the promoter region of WFS1 gene

 

Yoonsook Moon1, Minyoung Kong1, Younyoung Kim1 and Chaeyoung Lee1

 

1School of Systems Biomedical Science, Soongsil University, Seoul, Korea

 

Genome-wide association studies, especially those by several international consortia, have identified common variants associated with genetic risk for type 2 diabetes (T2D). A recent meta-analysis revealed four nucleotide variants, including rs4689388, associated with T2D (P < 2 × 10^-8). This variant is located in the promoter of the Wolfram syndrome 1 (WFS1) gene. We therefore investigated promoter activity with 2 haplotypes (ATCGT with a frequency of 0.67 and GATCG with a frequency of 0.33) estimated from 5 SNPs (rs4689388, rs4320200, rs13107806, rs13127445, and rs4273545) in strong linkage around rs4689388. A luciferase assay of reporter-WFS1 haplotype constructs in HEK293 cells showed that the minigene with the wild haplotype had a higher expression level than that with the minor haplotype (P < 0.05). Further analysis revealed that the expression level with the minor haplotype was lower (P < 0.05) than that with its first allele substituted (AATCG), whereas expression with the substituted haplotype was comparable to that with the wild haplotype (P > 0.05). In conclusion, rs4689388 was the functional variant for up-regulation of the WFS1 gene. Its major allele (A) could produce excess gene product, which increases endoplasmic reticulum (ER) stress. Finally, considerable ER stress would lead to increased susceptibility to T2D.

 

 

TBC-45: Comparison of Formaldehyde Fixed Paraffin Embedded (FFPE) and Frozen Tissues for Exome Sequencing

 

Ensel Oh1, Yoon-La Choi2 and Young Kee Shin1

 

1Laboratory of Molecular Pathology and Cancer Genomics, Seoul National University College of Pharmacy, Seoul, Korea

2Laboratory of Cancer Genomics and Molecular Pathology, Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea

 

Formalin-fixed, paraffin-embedded (FFPE) tissue is the most widely used form of clinical sample preservation and archiving. However, FFPE tissues have been disfavored for next-generation sequencing (NGS) because their DNA and RNA are likely to be mutated or degraded by the formaldehyde fixation procedure. We investigated whether DNA from FFPE tissue is comparable to DNA from frozen tissue for exome sequencing. Exome sequencing was performed on two pairs of FFPE and frozen tissues generated from two dermatofibrosarcoma protuberans (DFSP) tumors. The DNA from the FFPE tissues was severely degraded compared with that from the frozen tissues; therefore, the insert size for the FFPE tissues was considerably shorter than for the frozen tissues. However, the sequencing base quality of the FFPE tissues was as good as that of the frozen tissues, and the average coverage of both tissue types was almost the same, at about 100×. The rate of properly mapped paired reads was about 90% for frozen tissues and 70% for FFPE tissues, and more than 95% of the total targeted exome was completely covered in both frozen and FFPE tissues. The number of SNPs called from FFPE tissues was similar to that from frozen tissues, and the dbSNP rate and Ti/Tv ratio of SNPs from FFPE tissues were 95% and 2.5, respectively. The number of indels from FFPE tissues was also similar to that from frozen tissues. Tumor-specific SNPs were selected by subtracting the SNPs found in blood from the SNPs found in either FFPE or frozen tissues, and the FFPE and frozen tissues showed well-overlapping SNP lists, indicating that FFPE tissue is comparable to frozen tissue for exome sequencing. From these results, we conclude that FFPE tissue could be a good resource for cancer genome studies using exome sequencing.
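
The Ti/Tv ratio reported above is a common sanity check on a SNP call set. The sketch below shows how it is computed from (reference, alternate) base pairs: transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes, and every other substitution is a transversion. The function names and the toy call list are hypothetical; the abstract does not describe the authors' variant-calling pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) bases."""
    ti = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution, ignore
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


# Toy calls invented for illustration: 3 transitions, 2 transversions.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ti_tv_ratio(calls))  # 1.5
```

A ratio far from the value obtained on a matched high-quality sample would hint at systematic artifacts, which is why comparing the ratio between FFPE and frozen tissue is informative.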

 

 

TBC-46: Molecular and biochemical characterization on the artificial hibernation in the olive flounder, Paralichthys olivaceus

 

Meehye Kang1, Gila Jung1, Sung Kim1, Wan-Soo Kim1 and Youn-Ho Lee1

 

1Marine Ecosystem Research Division, Korea Institute of Ocean Science & Technology, Ansan, Korea

 

The aim of this study was to understand the molecular and physiological changes in artificially hibernated olive flounder, Paralichthys olivaceus. First, the biochemical properties of the artificially hibernated fish were examined by blood analysis. Serum glucose and triglyceride increased significantly (p < 0.05) during hibernation, while alkaline phosphatase (ALP) and glutamic-pyruvic transaminase (GPT) showed no significant change (p > 0.05). Genes associated with artificial hibernation were then investigated in brain tissue using RNA-seq. Changes in gene expression were examined with the DEGseq R package and gene ontology (GO) functional enrichment analysis. A total of 915 differentially expressed genes (DEGs), comprising 468 up-regulated and 447 down-regulated genes (p < 0.001), were identified. GO analysis of the DEGs revealed 45 significantly enriched GO terms indicating up- and down-regulation of genes, most of which were associated with protein binding, transcription factor activity, transcription factor complex, and sequence-specific DNA binding. Several genes, such as intestinal fatty acid binding protein (IF), period 4, and somatolactin (SL), showed significant changes in expression level. For IF and SL, the change in expression level was confirmed quantitatively by real-time PCR.

 

 

TBC-47: Unraveling selection signatures by composite log likelihood  

 

Jihye Ryu1 and Chaeyoung Lee1

 

1School of Systems Biomedical Science, Soongsil University, Seoul 156-743, Korea

 

Positive selection not only increases the frequency of a beneficial allele but also raises the allele frequencies of nearby sequence variants. Signals of positive selection can therefore be identified from the distribution of sequence variants around a favourable mutation, and a signal is declared when the statistic differs from the value expected by chance. We introduce a composite log likelihood-based method (CLL), which calculates a composite likelihood of the allele frequencies observed across sliding windows of 5 adjacent loci and compares the value with a critical statistic estimated from 50,000 permutations. We applied the method to identify selection signatures in Korean cattle. A total of 11,799 nucleotide polymorphisms were used for 71 Korean cattle and 209 foreign beef cattle. As a result, 147 signals were observed between Korean cattle and foreign cattle (P < 0.01). The selection signatures with the greatest CLL on each of 30 chromosomes encompassed 148 sequence variants, of which 41 were located in protein-coding regions. The signals might represent candidate genetic factors for the beef quality for which Korean cattle have been selected.

 

 

TBC-48: The Health Avatar Platform: development of platform for interacting health agents and personal avatar

 

Hee-Joon Chung1,2, Byoungoh Kim1, Taehun Kim1, Keun Bong Kwak1 and Dongman Lee1

 

1Department of Computer Science, KAIST, Daejeon 305701, Korea

2Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110799, Korea

 

eHealth is a field of increasing interest with the potential to revolutionize the way health care and prevention are provided, shifting the balance of power and responsibility from health care professionals to patients and citizens. A health avatar is a user application that provides health information through health agents based on personal medical, genomic, and ubiquitous data. The Health Avatar Platform (HAP) is a run-time environment that allows appropriate intelligent health agents to be plugged in to a health avatar and provides a data and access grid for heterogeneous clinical and genomic data.

We have completed the first phase of the HAP: a) defining an application programming interface for both avatar and agent developers, b) developing a broker that provides a match-making service between agent and avatar and a communication channel between them, and c) prototyping an obesity management agent application as a showcase of the system capabilities.

 

 

TBC-49: Systematic Analysis of Genotype-dependent Gene Expression Signatures and Drug Sensitivity in NCI60 Datasets

 

Ningning He1 and Sukjoon Yoon1

 

1Department of Biological Sciences, Sookmyung Women's University, Seoul 140-742, Korea

 

Most cell lines recapitulate known tumor-associated genotypes and genetically defined cancer subsets, irrespective of tissue type. Drug treatment of many different cell lines provides an important preclinical model for early clinical applications of novel targeted inhibitors. The NCI60 is a program developed by the NCI/NIH that aims to discover new chemotherapeutic agents to treat cancer. Here we present a novel statistical method, CLEA (Cell Line Enrichment Analysis), to quantitatively correlate genotype with gene expression signatures and drug sensitivity in cancer cell lines. The results provided new insights into genotype-dependent gene expression signatures, cancer pathways, and chemical sensitivity. The method will have applications in predicting and optimizing the therapeutic windows of anti-cancer agents.

 

 

TBC-50: The role of TRP channel interactome in prostate cancer

 

Jin-Muk Lim1, Jung Nyeo Chun2, Hong-Gee Kim1 and Ju-Hong Jeon2

 

1Biomedical Knowledge Engineering Lab, Seoul National University, Korea
2Department of Physiology, Seoul National University College of Medicine, Korea

 

Transient receptor potential (TRP) channels translate various cellular stimuli into electrochemical signals, leading to changes in membrane potential and intracellular Ca2+ levels. Aberrant regulation of intracellular Ca2+ homeostasis is closely associated with various cancers, particularly prostate cancer; however, the possible involvement of TRP channels in prostate cancer is largely unknown. To explore the role of TRP channels in prostate cancer, we attempted to extract and integrate two datasets: prostate cancer microarray data from the GEO database (accession GSE3325) and TRP channel interactome data from the TRIP Database 2.0 (http://www.trpchannel.org). We found that the expression patterns of TRP channel interactome components were altered according to tumor stage (benign, primary, and metastatic); these patterns are represented as node-weighted networks using the Cytoscape program. Co-expression correlation analysis showed that certain TRP channel isotypes tend to be co-expressed with their interacting proteins, which can support the disease module hypothesis of network medicine. In addition, we performed GO and pathway analyses to identify how certain TRP channels are associated with prostate cancer phenotypes. Our results may help future experimental investigations to understand the role of TRP channel-mediated Ca2+ signaling in prostate cancer biology and to develop novel therapeutic strategies for the treatment of prostate cancer. [This research was supported by the MKE (The Ministry of Knowledge Economy), IT Convergence Healthcare Research Center support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0401-12-1001)]

 

 

TBC-51: Using CSSP to predict chameleon peptides   

 

Xiaoqi Wang1 and Sukjoon Yoon1

 

1Department of Biological Sciences, Sookmyung Women's University, Seoul 140-742, Korea

 

The sequence potential for non-native β-strand formation and the presence of protein sequences have been investigated extensively from the perspective that such structural features are implicated in protein stability and effectiveness. We demonstrated that calculation of contact-dependent secondary structure propensity (CSSP) is highly sensitive in detecting non-native β-strand propensities in helical regions of proteins. β-sheet formation is the main reason for protein aggregation. Based on our study, the CSSP method offers an alternative for designing peptide fragments with varied propensity for conformational change between helix and β-strand.

 

 

TBC-52: Transcriptome analysis during the developmental stages for predator induced polyphenism in Daphnia pulex

 

Haein An1, Gila Jung2 and Chang-Bae Kim1

 

1Department of Green Life Science, Sangmyung University, Seoul 110743, Korea

2Marine Ecosystem Research Division, Korea Institute of Ocean Science and Technology, Ansan 426744, Korea

 

The crustacean Daphnia pulex is one of the most suitable models for understanding how organisms adapt to and survive aquatic environmental stresses, including through predator-induced morphological responses. The formation and maintenance of neckteeth at critical times is known to be a defensive mechanism of D. pulex against the predator Chaoborus sp. Very little is known about the genetic mechanism of defensive morph formation and maintenance across developmental stages. To understand the genomic mechanism, we carried out comprehensive transcriptome analysis at various developmental stages in D. pulex using RNA-seq. As a result, 37 Gb of raw reads were generated and assembled. The 62,228 unigene clusters were annotated by blastx alignment against the NCBI non-redundant (NR), COG, SwissProt, GO, and KEGG databases. According to the searches, 30,495 unigene clusters matched at least one database. Gene expression differences among developmental stages were greater than those between the two phases, normal and defensive morph, within each stage. Differentially expressed transcripts (DETs) were discovered by measuring and comparing gene expression between the two phases in each stage. The most distinct phase differences in gene expression appeared in the adult/egg stage. According to detailed analyses, the defensive morph in this stage shows lower activity in signalling molecules and interaction and in nucleotide metabolism. We identified 68 transcripts as candidate defensive morph markers, including insect cuticle protein and receptor-transporting protein. This study could contribute to further studies of the candidate genes and the epigenetic mechanism of defensive morph formation and maintenance in D. pulex.

 

 

TBC-53: Network analysis by phylogenetic profiling revealed domain-specific evolution of cellular pathways

 

Junha Shin1 and Insuk Lee1

 

1Network Biology Laboratory, Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120749, Korea

 

Phylogenetic profiling is a computational method for identifying functional associations between genes within one organism by comparing their patterns of evolutionary co-inheritance across the completely sequenced genomes of other organisms. Two important factors affect the utility of a profile: the composition (both abundance and heterogeneity) of the genome set, and the scoring scheme used to measure relationships. Because a profile can be generated from genome sequence data alone, it is a practical bioinformatic technique given recent advances in sequencing and the exponentially growing volume of sequence data. Several previous reports found that the method works optimally with a genome set consisting of bacterial organisms only.

Here we reinvestigated the optimal conditions for phylogenetic profiling using the increased number of fully sequenced genomes, many of which were not available in previous studies. We verified that prediction performance improves as the number of genomes grows; the method can therefore now be used not only to discover functional associations between genes even in higher eukaryotes but also to retrieve human disease genes by investigating the resulting network model. Moreover, associations between co-inherited genes differ in various features between prokaryotic and eukaryotic inheritance patterns. From these distinctions, we could identify the domain-specific nature of pathway-level evolution and explain its molecular mechanisms.
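
As a rough illustration of how two profiles can be compared, the following Python sketch scores a pair of genes by the mutual information of their presence/absence profiles across a genome set. The gene names, the toy profiles, and the choice of mutual information as the scoring scheme are assumptions made for this example; the abstract does not state which scoring scheme was used.

```python
import math
from collections import Counter


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two presence/absence profiles."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and cover the same genome set")
    n = len(profile_a)
    joint = Counter(zip(profile_a, profile_b))
    count_a = Counter(profile_a)
    count_b = Counter(profile_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical profiles: 1 = homolog present in that genome, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],  # same inheritance pattern as geneA
    "geneC": [1, 0, 1, 0, 1, 0, 0, 1],  # pattern unrelated to geneA
}

score_ab = mutual_information(profiles["geneA"], profiles["geneB"])
score_ac = mutual_information(profiles["geneA"], profiles["geneC"])
```

In this toy case the co-inherited pair scores higher than the unrelated pair; a network would be built by keeping gene pairs whose score exceeds some threshold, which is also an assumption here.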

 

 

TBC-54: Functional polymorphism located in the promoter of the coagulation factor XI gene as a putative genetic factor for susceptibility to venous thromboembolism

 

Minyoung Kong1, Younyoung Kim1 and Chaeyoung Lee1

 

1School of Systems Biomedical Science, Soongsil University, Seoul, Korea

 

Several genome-wide association studies (GWAS) and meta-analyses of GWAS have been conducted for venous thromboembolism (VTE). The recent MARTHA and FARIVE projects reported rs3756008, in the promoter region of the coagulation factor XI (FXI) gene, as a nucleotide sequence variant associated with VTE in Europeans (P = 6.46 x 10^-11). FXI is the zymogen of a plasma serine protease (FXIa) that triggers the middle phase of the intrinsic blood coagulation pathway, and its plasma levels have been associated with VTE. We therefore searched for SNPs in strong linkage with rs3756008 and selected rs3756009. We investigated the alteration of luciferase reporter gene expression by the 2 haplotypes (AA, frequency 0.62; TG, frequency 0.38) and by each SNP in HEK293 cells. The wild-type haplotype reporter minigene showed a higher expression level than the minor haplotype reporter minigene (P < 0.001). Further analysis revealed that the nucleotide substitution (A to T) at rs3756008 accounted for the difference in expression level between the 2 haplotypes (P < 0.001). In conclusion, the minor allele (T) at rs3756008 was the regulatory allele for low expression of the FXI gene. Low FXI levels might result in reduced functional activity of activated coagulation factor XII (FXIIa), and blockage of FXIIa activity might be involved in the risk of vessel occlusion. We cannot exclude the possibility that low FXI levels lead to susceptibility to VTE.

 

 

TBC-55: Temporal gene expression profiles identify genetically determined transcriptional regulation of human leukocytes  

 

SeongBeom Cho1, InSong Go2, Hyo-Jeong Ban1, Hyesun Yoon1, Yeunjung Kim1, Jaepill Jeon1 and BokGhee Han1

 

1Center for Genome Science, National Institute of Health, Korea Center for Disease Control, Chungcheongbuk-do, Republic of Korea

2Department of Physiology, School of Medicine, Hanyang University, Kyungkido, Republic of Korea

 

In this study, we investigated genetic markers affecting temporal gene expression in human leukocytes using expression quantitative trait loci (eQTL) analysis. During an oral glucose tolerance test, glucose levels, insulin levels and gene expression of leukocytes in peripheral blood were measured at three time points. Through eQTL analysis, we identified relationships among gene expression, genetic components and environmental factors. Association analysis between gene expression and SNPs only (marginal model) found cis SNPs showing differential allele-specific gene expression. The analysis with interaction terms (interaction model) identified interactions between SNPs and temporal glucose or insulin levels, or both, which significantly affected gene expression. Functional annotation revealed that the significant SNPs of the marginal model were related to various diseases. Moreover, SNPs of the interaction model showed a strong tendency for transcription factor binding site enrichment. Finally, using a differential allele-specific coexpression (DACE) method, we searched for SNP–pathway pairs that showed molecular networks with significant allele-specific changes in coexpression. The DACE method identified a trans-regulatory effect of the SNPs on pathway gene coexpression patterns. In conclusion, we identified tentative genetic markers affecting temporal gene expression change in human leukocytes, either through a genetic component alone or through interaction between genetic components and glucose and/or insulin. These results will be a resource for studying regulatory components of biological processes that are determined either by a genetic component alone or by gene–environment cross talk.
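
The difference between the marginal and interaction models can be written out as follows. This is only a generic sketch: the symbols (y for expression of a gene in individual i at time point t, g for the SNP genotype, G and I for the glucose and insulin levels) are assumptions introduced here, and the abstract does not specify the covariates used or how the repeated measurements within an individual were handled.

```latex
% Marginal model: expression depends on the SNP genotype only
y_{it} = \beta_0 + \beta_g\, g_i + \varepsilon_{it}

% Interaction model: genotype, glucose, insulin, and genotype-by-environment terms
y_{it} = \beta_0 + \beta_g\, g_i + \beta_G\, G_{it} + \beta_I\, I_{it}
         + \beta_{gG}\, g_i G_{it} + \beta_{gI}\, g_i I_{it} + \varepsilon_{it}
```

Under this sketch, a SNP that acts through interaction would show a non-zero genotype-by-glucose or genotype-by-insulin coefficient even when its marginal coefficient is small.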

 

 

TBC-56: gsGator – an integrated web platform for cross-species gene set analysis  

 

Hyunjung Kang1, Sooyoung Cho1, Ikjung Choi1, Yeongjun Jang2, Sanghyuk Lee1,2 and Wankyu Kim1

 

1Department of Life and Pharmaceutical Science, Ewha Womans University, Ewha Research Center for Systems Biology, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750 Korea

2Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea

 

Gene set analysis (GSA) is useful for interpreting the biological theme of a gene list using a priori defined gene sets such as gene ontology terms or pathways. While model organisms are a rich source for inferring the function of human genes, few GSA tools make use of this information. Here, we developed gsGator, a web-based platform for functional interpretation of gene sets with many useful features such as cross-species GSA, simultaneous analysis of multiple gene sets, and a fully integrated network viewer. An extensive set of gene annotation information has been amassed, including GO and pathway, genomic annotation, molecular network, miRNA target and phenotype information from various model organisms. gsGator enables virtually fully automated analysis, providing intuitive understanding of the relations among genes and gene sets using an interactive network viewer. In particular, gsGator supports cross-species GSA in a user-friendly manner, allowing full use of accumulated knowledge, e.g. knockout phenotypes from model organisms. Cross-species GSA greatly expands the scope of GSA, leading to the discovery of conserved gene modules among different species. gsGator is available at http://gsGator.ewha.ac.kr.

 

 

TBC-57: Identification of transcriptional network regulating prognostic gene expression signature of colorectal cancer patients  

 

Taejeong Bae1,2,2, Kyoohyoung Rho1, Yong-Ho In2,3 and Sunghoon Kim1

 

1College of Pharmacy, Seoul National University, Seoul 151-742, Korea
2Information Center for Bio-pharmacological Network, Seoul National University, Suwon 443-270, Korea
3Medicinal Bioconvergence Research Center, Advanced Institutes of Convergence Technology, Suwon 443-270, Korea
4Korean Bioinformation Center, Daejeon, Korea
5World Class University Program Department of Molecular Medicine and Biopharmaceutical Sciences, Seoul National University, Seoul 151-742, Korea

 

Background

Identification of gene expression signatures in cancer patients has proven useful for determining cancer type and stage and for predicting patient prognosis. However, an expression signature by itself does not provide information about the causes of changes in pathological cellular states. Construction of a transcription network that regulates the cancer signature can provide clues to hidden mechanisms of cancer progression.

Results

Here we inferred and analysed the transcriptional network regulating a prognostic gene expression signature of colorectal cancer that is known to classify patients into good and poor prognosis groups. To construct a colon cancer-specific regulatory network, we used the ARACNE algorithm followed by a series of filtering algorithms to find significant transcription factors. The inferred network consists of 9 transcription factors (TFs) regulating 75 of the 86 genes in the colon cancer signature. Subsequent analysis identified 6 TFs (PRRX1, SPDEF, FOSL2, HIF1A, RUNX1 and FOXD1) as master regulators of high-risk signature genes for the poorer prognostic subgroup and 3 others (PLAGL2, ASCL2 and TCF7) as regulators of low-risk signature genes for the better prognostic subgroup. The common tumorigenic features of HIF1A, RUNX1 and FOSL2 suggest that the prognostic gene signature may be involved in metastasis of colorectal cancer, while the tumorigenic roles of PRRX1, SPDEF and FOXD1 are unclear.

Conclusions

These results show that transcriptional network analysis is a powerful tool for revealing the regulatory programs related to the prognosis of colorectal cancer patients.

 

 

TBC-58: Local Similarity Search of Physicochemical Properties in Protein-Ligand Binding Sites

 

Lee Sael1 and Daisuke Kihara2

 

1State University of New York Korea, Korea

2Purdue University, USA

 

Physicochemical similarity search of protein binding sites has various applications, such as finding protein binding partners, predicting protein function, and predicting unintended drug binders. We present two ligand binding pocket comparison methods: Pocket-Surfer (Chikhi R. et al. Proteins, 2010) and Patch-Surfer (Sael L. et al. Proteins, 2012). Pocket-Surfer captures the shape and physicochemical properties of a binding site surface globally. In contrast, Patch-Surfer represents a binding site as a combination of segmented surface patches, each characterized by its geometric shape, electrostatic potential, hydrophobicity, and concaveness. Relaxing the constraint imposed by the rigidity of the global binding site structure allows local similarities to be captured. This is effective when pockets that bind the same ligand type differ slightly in shape because of structural flexibility. Both methods encode the surface properties of the whole pocket, or of the patches that compose it, with 3D Zernike descriptors, which have been found to be successful in representing global protein surface properties (Sael L., Li B., et al. Proteins, 2008; Sael L., La D. et al. Proteins, 2008). We validated the two methods by measuring the accuracy of ligand binding predictions, i.e., predictions of the types of ligand that can bind to a protein. Performance was evaluated on a data set of 100 non-homologous proteins, each of which binds one of nine types of ligand. Using shape and pocket size information, 84.0% of the binding ligands were predicted correctly within the top three scoring ligands with Patch-Surfer and 81.0% with Pocket-Surfer. Performance improved further to 87.0% when surface properties, i.e. electrostatic potential and hydrophobicity, were added in Patch-Surfer.
Overall, we show that the proposed methods are powerful for protein binding site similarity analysis even in the absence of homologous proteins in the database.
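
The "within the top three scoring ligands" criterion used above can be made concrete with a small Python sketch. The function name, the ligand labels, and the toy rankings are hypothetical and serve only to show how such a top-k accuracy is counted; they are not taken from the benchmark data set.

```python
def top_k_accuracy(ranked_predictions, true_ligands, k=3):
    """Fraction of queries whose true ligand type is among the k best-scoring predictions."""
    if len(ranked_predictions) != len(true_ligands) or not true_ligands:
        raise ValueError("need exactly one ranking per query and at least one query")
    hits = sum(
        1
        for ranking, truth in zip(ranked_predictions, true_ligands)
        if truth in ranking[:k]
    )
    return hits / len(true_ligands)


# Hypothetical rankings (best score first) for four query pockets.
rankings = [
    ["ATP", "NAD", "FAD", "HEM"],
    ["HEM", "FAD", "ATP", "NAD"],
    ["NAD", "FAD", "HEM", "ATP"],
    ["FAD", "NAD", "HEM", "ATP"],
]
truth = ["ATP", "ATP", "NAD", "ATP"]

top3 = top_k_accuracy(rankings, truth, k=3)
top1 = top_k_accuracy(rankings, truth, k=1)
```

A top-3 criterion is more forgiving than top-1, which is worth keeping in mind when comparing the reported percentages with results from other methods.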

 

 

TBC-59: Association analysis of CNV data with linear mixed model

 

Meilling Liu1, Sanghoon Moon2, Youngjin Kim2 and Sungho Won1

 

1Dept of Statistics, Chung-Ang University, Korea

2The Center for Genome Science, Korea National Institute of Health, Korea

 

Copy number variation (CNV) is expected to have an important effect on human genetic diseases. However, although several statistical methods have been proposed for CNV association studies, most existing approaches are restricted to independent individuals. Here we present a new method for the analysis of CNV in related samples, which can also be applied to unrelated samples in the presence of population substructure. The proposed approach consists of a signal model, a phenotype model and a copy number model: the signal model describes the relationship between the observed intensity and the unknown CNV, and the phenotype model describes the effect of the CNV on the phenotype. We considered the correlation structure in both the signal and phenotype models and incorporated multiple probe intensities into them. Our simulation studies show that the proposed method outperforms previous approaches, and we illustrate the practical implications of the new method with an application to Alzheimer's disease.

 

 

TBC-60: Analysis of longitudinal data: Applications of Linear Mixed Model to the Korean Association Resource (KARE)

 

Young Lee1, Suyeon Park1, Woojoo Lee2 and Sungho Won1

 

1Dept of Statistics, Chung-Ang University, Seoul, Korea

2Department of Statistics, Inha University, Korea

 

Over the last decade, genome-wide association studies (GWAS) have been conducted successfully and have identified many SNPs significantly associated with phenotypes of interest. However, the multiple testing problem remains difficult, and it becomes more serious in next-generation sequencing analysis. In this study, we investigated the analysis of longitudinal data for GWAS. Because genotyping is often more expensive than phenotyping, longitudinal data analysis can be an alternative way to address the multiple testing problem. Here a linear mixed model was applied to phenotypes with repeated observations in the Korean Association REsource (KARE) project, and principal component analysis (PCA) was conducted to adjust for population stratification. We found that power increases with the number of repeated measurements and the sample size, and decreases as the correlation between repeated observations increases.
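
The direction of these effects can be illustrated with the standard design-effect formula for equally correlated repeated measurements. The Python sketch below is not the power calculation used in the study; it assumes an exchangeable correlation rho between any two measurements of the same subject, and the function name and example numbers are hypothetical.

```python
def effective_sample_size(n_subjects, n_repeats, rho):
    """Number of independent observations carrying the same information about
    a subject-level mean as n_repeats equally correlated measurements
    (pairwise correlation rho) on each of n_subjects subjects."""
    if n_subjects <= 0 or n_repeats <= 0:
        raise ValueError("n_subjects and n_repeats must be positive")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho is assumed to lie in [0, 1]")
    # Design effect: each subject's mean has variance sigma^2 * (1 + (m - 1) * rho) / m.
    return n_subjects * n_repeats / (1.0 + (n_repeats - 1) * rho)


# Three repeated measurements on 1,000 subjects at different correlations.
uncorrelated = effective_sample_size(1000, 3, 0.0)
moderate = effective_sample_size(1000, 3, 0.5)
identical = effective_sample_size(1000, 3, 1.0)
```

Under this assumption, repeated measurements add the most information when they are weakly correlated and add nothing when they are perfectly correlated.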

 

 

TBC-61: Differential influences of common variants on erythrocyte-related traits according to Sasang constitutional types

 

Seongwon Cha1, Hyunjoo Yu1 and Jong Kim2

 

1Constitutional Medicine & Diagnosis Research Group, 2Vice-President, Korea Institute of Oriental Medicine (KIOM), Daejeon, 305-811, Korea

 

Hematological disorders such as anemia and erythrocytosis, which are characterized by measuring erythrocyte-related traits, are known to be associated with cardiometabolic diseases. Genetic variants associated with hematological traits have been elucidated in several genome-wide association studies (GWAS). In Sasang constitutional medicine (a Korea-specific type of personalized medicine), human beings are categorized into four types with differing prevalence of cardiometabolic diseases and anemia. In this study, we investigated whether each constitutional type has different genetic factors associated with hematological traits. To this end, we examined, according to Sasang constitutional type, the effects of variants that previous GWAS had reported to be associated with the same hematological traits. We performed multiple linear regression analyses with measurements of RBC, Hb, Hct, MCV, MCH, MCHC, and RDW in two Korean populations: 1,701 and 3,472 subjects recruited from the Korea Constitution Multicenter Study and the Korea Genome and Epidemiology Study, respectively. The Sasang constitutional types were determined with the Sasang Constitutional Analysis Tool: in total, 2,696 subjects with Taeum type, 1,881 with Soyang type, and 596 with Soeum type. Among more than 30 initially selected polymorphisms, we found 4 variants in 4 genetic loci (HBS1L-MYB, TMPRSS6, SPTA1, and ITFG3) showing association signals in both populations. Two variants, in HBS1L-MYB and TMPRSS6, were associated with measurements of RBC, MCV, MCH, MCHC, and/or RDW in the total population and in the two sub-populations with Taeum and Soyang types. The SPTA1 variant was associated with MCHC in the total population, and the ITFG3 variant was associated with Hb in the sub-population with Soeum type.
These results show that the profile of variants associated with hematological traits differs according to Sasang constitutional type, especially between the Soeum type and the others.

 

 

TBC-62: Comparing algorithms for genotype imputations in family-based design  

 

Kim Youngdoe1, Lim Jungmin2, Li Donghe2, Lee Jaemoon2 and Won Sungho2

 

1Division of Structural and Functional Genomics, The Center for Genome Science, Korea National Institute of Health, KCDC, Osong, Korea

2Department of Applied Statistics, Chung-Ang University, Seoul, Korea

 

Genotype imputation is now an essential tool in the analysis of genome-wide association scans for handling missing data, untyped genotypes, and similar problems. However, despite its importance, only a few approaches have been proposed for genotype imputation in family-based designs, and the accuracy of each method has not been confirmed. In this study we compared several genotype imputation methods using the Korean Healthy Twin cohort. We compared IMPUTE2, BEAGLE, MACH and GHOST and calculated the accuracy of each program. In addition, we considered a two-stage imputation algorithm in which genotypes are first imputed by Mendelian transmission and a haplotype-based imputation algorithm is then applied. Although the differences among the programs are small, our results show that the two-stage algorithm performs slightly better.
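
The first stage of such a two-stage scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm used in the study: genotypes are coded as minor-allele counts (0, 1, 2), None marks a missing call, only parent-offspring trios are considered, and a child is imputed only when both parents are homozygous so that Mendelian transmission leaves no ambiguity. The function names are hypothetical.

```python
def impute_from_parents(father, mother):
    """Return the child's genotype if Mendelian transmission determines it, else None.

    Genotypes are minor-allele counts (0, 1, 2); None marks a missing call.
    """
    if father in (0, 2) and mother in (0, 2):
        # A homozygous parent can transmit only one allele type.
        return father // 2 + mother // 2
    return None  # ambiguous or parent missing: left for the haplotype-based stage


def first_stage(trios):
    """Fill missing child genotypes in (father, mother, child) trios where possible."""
    filled = []
    for father, mother, child in trios:
        if child is None:
            child = impute_from_parents(father, mother)
        filled.append((father, mother, child))
    return filled


# Hypothetical trios at one SNP.
trios = [(0, 2, None), (1, 2, None), (2, 2, 0), (None, 0, None)]
after_stage_one = first_stage(trios)
```

Genotypes that remain None after this step would be passed to a haplotype-based program in the second stage. Observed calls are left untouched, even when they are inconsistent with the parents, as in the third trio.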

 

 

TBC-63: A large-scale genome-wide association study of Korean Family cohorts for genetic variants influencing metabolic syndrome

 

Youngdoe Kim1,2, Yong Ki Jung2, Sung Oh Kang2, Nam Hee Kim1, Young Jin Kim1, Juyoung Lee1 and Sungho Won2

 

1Division of Structural and Functional Genomics, The Center for Genome Science, Korea National Institute of Health, KCDC, Osong, Korea

2Department of Applied Statistics, Chung-Ang University, Seoul, Korea

 

To identify genetic factors influencing several traits related to metabolic syndrome (MetS), namely height, body mass index (BMI), triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), diastolic blood pressure (DBP) and systolic blood pressure (SBP), we conducted a genome-wide association study (GWAS) with 1,801 samples from the Korean Healthy Twin cohort and 784 samples from the Ansung Family extended cohort recruited in Korea. TG values were not normally distributed and were therefore log-transformed for GWAS. A linear mixed model fitted by restricted maximum likelihood (REML) was applied to detect significant associations. We found that two SNPs were significantly associated with log TG at the genome-wide level, and both SNPs were replicated in the other cohort.

 

 

TBC-64: Ethical, Legal, and Social Frameworks on Issues of Bioinformatics

 

Hannah Kim1, Ilhak Lee1, Ji Yong Park1, Sang Hyun Kim2 and So Yoon Kim1,3

 

1Department of Health Law and Bioethics, College of Medicine, Yonsei University, Korea

2Department of Health Law and Bioethics, Graduate School of Public Health, Yonsei University, Korea

3Centre for ELSI Research, Asian Institute for Bioethics and Health Law, Yonsei University, Seoul 120821, Korea

 

A fundamental role of bioinformatics is to identify the genes and cellular pathways related to diseases and to link them to advanced clinical fields such as the prevention, diagnosis, and treatment of human diseases. As the field rapidly develops and becomes more widely used, it raises various ethical, legal, and social questions concerning patients and research participants.

Thus, the Centre for Ethical, Legal, and Social Issues Research (Centre for ELSI Research) developed frameworks to investigate, analyse, and evaluate these issues in their ethical, legal, and social contexts. The frameworks are useful not only for predicting the effects of translational bioinformatics and medicine, so that appropriate responses or strategies can be prepared, but also for multinational comparative studies. We expect the frameworks to be applicable from bioinformatics to other cutting-edge areas of biotechnology.

As the development of the frameworks reaches its final stage, we are planning the next step: to address the implications for individuals and society by identifying all prospective ethical, legal, and social issues in each sub-project and by reviewing key issues through discussions with researchers and expert panels. This article introduces the overall scheme with a view to refining it further.

 

 

TBC-65: PATH2: Software for Conducting Gene-Ontology and Pathway-Based Analyses using Genome-Wide Association Data

 

Denise Daley1, David Zamar1, Ben Tripp1, Brad Cavanagh1 and George Ellis1

 

1University of British Columbia, Canada

 

Most genome-wide association (GWA) studies lack the power to detect single nucleotide polymorphisms (SNPs) with small effects. However, the aggregate effect of several SNPs working together within a pathway is more easily detectable. Testing for pathway-based association is a promising approach in identifying genes with small additive effects that work together to increase or decrease susceptibility to common complex diseases. Perhaps the most important role performed by pathway-based approaches is in the identification of underlying biological mechanisms leading to disease. Although several algorithms exist for conducting pathway-based analyses, not all of them have been implemented for public usage. We have developed a software package that implements several pathway-based methods and provides an easy to use interface for conducting analyses. Source code and binaries are freely available for download at http://genapha.icapture.ubc.ca/Path2. Our software is implemented in Java, but makes use of both Perl and R and is supported on Linux and Windows. To illustrate its usage, we perform an ontology-based and a pathway-based analysis of the published results from the GABRIEL consortium large-scale genome-wide association study of asthma.
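
To illustrate why an aggregate of several modest SNP signals can be easier to detect than any single one, the Python sketch below combines per-SNP p-values within a pathway using Fisher's method. This is one generic approach chosen for the example and is not claimed to be among the methods implemented in PATH2; it also assumes the p-values are independent, which linkage disequilibrium between SNPs can violate. The function name and p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-SNP p-values for one pathway: none is striking on its own.
pathway_p = fisher_combined_p([0.04, 0.06, 0.05, 0.03])
```

In this toy example the combined p-value is smaller than the smallest individual p-value, which is the kind of gain in detectability that pathway-level tests aim for.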

 

 

TBC-66: Comparison of Genetic Variations in Drug Metabolizing Enzyme and Transporter Genes among Korean, Japanese, and Chinese Population

 

SoJeong Yi1, Sangin Lee2, Youngjo Lee2, Seonghae Yoon1, Inbum Chung1, HyeKyung Han1, Jae-Yong Chung1, Ichiro Ieiri3 and In-Jin Jang1

 

1Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital
2Department of Statistics, Seoul National University, Seoul, 110-799, Korea
3Department of Clinical Pharmacokinetics, Graduate School of Pharmaceutical Sciences, Kyushu University, Fukuoka, 812-8582, Japan

 

Inter-ethnic differences in genetic polymorphisms of genes encoding drug-metabolizing enzymes and drug transporters are one of the major factors causing ethnic sensitivity in drug response. In this study, the authors explored genetic differences among 3 major East Asian populations (Korean, Japanese, and Chinese) in single nucleotide polymorphisms (SNPs) of genes related to drug absorption, metabolism, disposition, and transport.

Using the DMET® Plus platform (Affymetrix, USA), the allele or genotype frequencies of 1,936 variants (1,931 SNPs and 5 copy number variations) in 225 drug-metabolizing enzyme and transporter genes were determined in 786 healthy male participants (448 Koreans, 208 Japanese, and 130 Chinese). To compare allele or genotype frequencies among the 3 ethnic groups in this high-dimensional data, principal component analysis (PCA) and a regularized multinomial logit model, which is a multi-class classification procedure, were employed.

Of the 1,936 variants, 1,071 (55.3%) were monomorphic and 127 (6.6%) were 'no call'; the remaining 738 biallelic variants were therefore analysed. PCA showed that Korean, Japanese, and Chinese participants were not distinguished by the first few principal components. However, a multinomial logit model with the least absolute shrinkage and selection operator (LASSO) could classify the three ethnic groups using a model with 105, 98 and 99 selected markers for Korean, Japanese, and Chinese, respectively. The accuracy of the prediction model was 87.9% (misclassification error rate 12.1%). The most significant genetic variations were EPHX1_16466T>C for Korean (coefficient = -1.24), CYP2A6_1799T>A for Japanese (coefficient = 2.45), and rs17064 on ABCB1 for Chinese (coefficient = 2.37).

In conclusion, this comprehensive genetic variant assessment suggests that genetic differences in genes encoding drug-metabolizing enzymes and drug transporters are very small among the Korean, Japanese, and Chinese populations.