Biomedical Informatics Grand Round

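The following laboratories currently participate:
EWUBI: Prof. Sanghyuk Lee's laboratory, Division of Molecular Life Sciences, Ewha Womans University
Prof. Hyun Seok Park's laboratory, Department of Computer Science, Ewha Womans University
SABB: Prof. Heebal Kim's laboratory, Department of Food and Animal Biotechnology, Seoul National University
SNUBI: Prof. Ju Han Kim's laboratory, Seoul National University College of Medicine
For inquiries, please contact Dokyoon Kim of Prof. Ju Han Kim's laboratory (dkkim@snu.ac.kr, 02-740-8319).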

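The fourth seminar begins at 9:00 a.m. on April 21, 2007, in the main conference room (Room 3016, Building 200) of the College of Agriculture and Life Sciences, Seoul National University Gwanak Campus.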

Affiliation	Presenter	Abstract
SNUBI	Mi Ryung Han	Protein classification from protein-domain and gene-ontology annotation information using formal concept analysis
		A number of different attributes describe the ontology of proteins, such as protein structure, biomolecular interaction, cellular location, and protein domains, which represent the basic evolutionary units that form proteins. In this paper, we propose a mathematical approach, formal concept analysis (FCA), which aims at abstracting from attribute-based object descriptions. Based on this theory, we present an extended version of the algorithm, the tripartite lattice, to compute a concept lattice. By analyzing the tripartite lattice, we attempt to extract proteins that are related to domains and gene ontology (GO) terms, from the bottom nodes to the top of the lattice. In summary, using tripartite lattices, we classified proteins by protein domain composition together with the gene ontology (GO) terms that describe them.
BOIPOP	Kyung Mo Kim	Molecular Evolution and Phylogenetic Potential of Lanosterol Synthase in Animals and Fungi
		Lanosterol synthase is strongly related to the fluidity and ion permeability of cell membranes and the metabolism of steroid hormones. The absence of this enzyme can abolish the production of cholesterol and ergosterol, which is fatal to cell viability in animals and fungi. In terms of evolution, lanosterol synthase is the most recent common ancestor in the biosynthetic pathways related to cholesterol and ergosterol. Of 255 homologous sequences retrieved from public databases, we identified 25 orthologs of lanosterol synthase. The phylogenetic relationships of lanosterol synthase were almost completely congruent with the existing species divergence. The statistical tests, including maximum likelihood analyses of codon-based models, showed that negative selection has acted on the evolution of lanosterol synthase, indicating that the molecule has been under strong functional constraints. The results of the TLD and PTP tests showed that lanosterol synthase has a strong phylogenetic signal. Additionally, our novel combined test of bootstrapping and PHT revealed that the lanosterol synthase gene is highly compatible with the small subunit sequences of ribosomal DNA, indicating that the gene can be a good partner to the rDNA marker for phylogenetic studies of animals and fungi.
EWUBI	Youngah Shin	DEGASEST – a database of differentially expressed genes and alternative splicing using EST information
		Differentially expressed genes (DEG) are valuable resources for various biological and medical applications. DEGASEST allows the user to explore differentially expressed genes, transcripts (isoforms), and alternative splicing (AS) events based on EST information for human and mouse. Over 8,600 cDNA libraries were manually classified into 52 tissue/organ and cancer types for human, while over 1,100 cDNA libraries were classified into 36 tissue/organ, developmental stage and cancer types for mouse. Specific expression in any tissue and/or cancer type is inferred from statistical testing of EST clusters at three levels: gene, transcript, and splicing event. ECgene's genome-based EST clustering, which is quite similar to UniGene, was used to assess gene-level expression. Additionally, DEGASEST predicts isoform-level expression using ECgene's assembly and sub-clusters. Transcripts may be differentially regulated at the isoform level even though the gene itself has no specific expression pattern. Furthermore, DEGASEST includes differentially regulated AS events such as exon skipping, alternative donor/acceptor sites, and intron retention. Genome-wide search results are stored in a relational database, and a user-friendly web interface is provided to support various types of queries.
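As a rough illustration of the formal concept analysis mentioned in the SNUBI abstract, the sketch below enumerates the formal concepts of a tiny protein-by-attribute context. It is plain FCA by brute force, not the tripartite-lattice algorithm from the talk, and the protein names, domain labels and GO labels are invented for the example.

```python
from itertools import combinations

# Hypothetical toy context: protein -> attributes (domains and GO terms).
CONTEXT = {
    "P1": {"dom:kinase", "GO:ATP binding"},
    "P2": {"dom:kinase", "GO:ATP binding", "dom:SH2"},
    "P3": {"dom:SH2"},
}

def common_attributes(proteins, context):
    """Attributes shared by every protein in the set (all attributes if the set is empty)."""
    shared = set().union(*context.values())
    for protein in proteins:
        shared &= context[protein]
    return shared

def proteins_having(attributes, context):
    """Proteins that carry every attribute in the given set."""
    return {p for p, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of proteins.

    Exponential in the number of proteins, so only suitable for toy inputs.
    """
    concepts = set()
    proteins = sorted(context)
    for size in range(len(proteins) + 1):
        for subset in combinations(proteins, size):
            intent = common_attributes(subset, context)
            extent = proteins_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each concept pairs a maximal group of proteins with the full set of attributes they share; ordering the concepts by extent inclusion gives the concept lattice.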


The third seminar begins at 9:00 a.m. on October 21, 2006, in Room 308 of the Main Building, Seoul National University College of Medicine.

Affiliation	Presenter	Abstract
SNUBI	Mingoo Kim	Extracting Regulatory Modules from Heterogeneous Gene Expression Data by Sequential Pattern Mining
		Motivation: Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. In our approach, the co-regulated genes are identified as a sequential pattern, resulting from sequential pattern mining on microarray data. The co-regulating conditions are identified as the samples corresponding to the genes. To fit the algorithm to the biological setting at hand, the conventional definition of a sequential pattern is relaxed by allowing trivial switches between consecutive elements in a sequence. The search method is also modified to enhance flexibility and scalability. The modified method enables the algorithm to run on very large microarray data sets and to finish in a reasonable time. The proposed algorithm is of great benefit when RMs are identified from a large-scale gene expression matrix with heterogeneous conditions. Results: The resulting RMs are significantly enriched for known annotations (of both genes and conditions) and are consistent with known biological knowledge. In addition, the types of relations between RMs are further investigated; they are categorized into one of four types: independent, conditionally co-regulated, separately co-regulated, and similar, based on the degree of overlap between two modules. The respective types of inter-module relations are exemplified with biological inferences via enrichment study.
SABB	Jongeun Park	Evolutionary characterization of the proteins containing KRAB-Zinc finger domains
		In a previous study (Kim et al., 2006), we showed that the proteins containing KRAB-Zinc finger domains are likely to be responsible for lineage-specific functions in mammals. Using the non-redundant protein sets of the International Protein Index (IPI), developed by the EMBL, Pfam domain searches were conducted to extract the proteins containing KRAB-Zinc finger domains in five species: human, mouse, rat, chicken and zebrafish. The number of KRAB-Zinc finger proteins was much higher in mammals, human (312), mouse (430) and rat (486), than in non-mammalian vertebrates, chicken (12) and zebrafish (0). To elucidate the evolutionary relationships of the proteins, phylogenetic analyses were conducted within species with homologous sequences, and between species with orthologous sequences. Here, we suggest that the proteins containing KRAB-Zinc finger domains expanded mainly after the evolutionary branching point of mammalian and non-mammalian vertebrates.
EWUBI	Bumjin Kim	Improved tag-to-gene assignment for reliable interpretation of SAGE data
		Serial Analysis of Gene Expression (SAGE) is a tag-based method of probing gene expression at the genome-wide level. Reliable tag-to-gene assignment is essential but often complicated by many factors such as (i) sequencing errors, (ii) tag redundancy owing to short tag length (10bp in short SAGE, 21bp in long SAGE), (iii) internal priming (use of alternative restriction sites), (iv) alternative polyA tails, and (v) the presence of SNPs in the restriction enzyme site or inside the tags. The conventional procedure uses the tags extracted from the mRNA and EST sequences in a UniGene cluster without addressing those problems. We developed a computational pipeline that takes alternate tags and experimental problems into consideration. First, we created the ‘virtual’ tag libraries from various gene models, including the RefSeq and the ECgene models of splice variants. Second, we created the ‘observed’ tag library after removing erroneous tags due to sequencing errors using a Monte Carlo simulation. The resulting observed tag library was compared with the virtual tag libraries at various confidence levels. The ECgene model of splice variants takes the alternative polyA tails into consideration. Alternative tags arising from SNPs and internal priming were deduced.
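The 'virtual' tag idea in the SAGE abstract can be sketched as follows. This is a minimal illustration, not the pipeline from the talk: it assumes the anchoring enzyme is NlaIII (recognition site CATG, the usual choice in SAGE, though the abstract does not name it), and the function name and `tag_length` parameter are hypothetical.

```python
ANCHOR_SITE = "CATG"  # assumed NlaIII site; the abstract does not name the enzyme

def virtual_tags(transcript, tag_length=10):
    """Return candidate tags for a transcript, 3'-most anchoring site first.

    Each tag is the tag_length bases immediately downstream of an anchoring
    site. The first entry is the canonical tag; later entries are alternate
    tags that could appear if an upstream site is used instead.
    """
    seq = transcript.upper()
    tags = []
    pos = seq.find(ANCHOR_SITE)
    while pos != -1:
        start = pos + len(ANCHOR_SITE)
        tag = seq[start:start + tag_length]
        if len(tag) == tag_length:  # skip sites too close to the 3' end
            tags.append(tag)
        pos = seq.find(ANCHOR_SITE, pos + 1)
    tags.reverse()  # sites were found 5'->3'; report the 3'-most first
    return tags

if __name__ == "__main__":
    example = "AA" + "CATG" + "A" * 10 + "GG" + "CATG" + "C" * 10 + "TT"
    print(virtual_tags(example))
```

Running the same function over every splice variant of a gene model would give one virtual tag library per model, which could then be compared against observed tags.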


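The second seminar begins at 4:00 p.m. on February 28, 2006, in Room B101, Building C of the Science Complex, Ewha Womans University. See the linked map and transportation information for directions to Ewha Womans University, and the following map for the location of the Science Complex on the Ewha campus.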

Affiliation	Presenter	Abstract
SABB	임다정	GOBias: A significance test of the spatial bias of genes in a gene ontology term
		GOBias is a web tool for testing the statistical significance of the chromosomal spatial bias of genes in a gene ontology (GO) term versus random chance. The distributions of the random chances of each node were drawn using 10,000 bootstraps. Currently, GOBias covers five species: human, mouse, rat, chicken and zebrafish. The user can find the bootstrapping distribution and significance value of any GO term for these species. GOBias also visualizes the genomic distribution of genes in a GO term, and provides a test of significance for the clustering using a query protein list for the five species.
EWUBI	이영희	ASviewer: Visualizing the transcript structure and functional domains of alternatively-spliced genes
		Alternative splicing (AS) produces diverse transcript structures by differential use of splice sites. Comparing the gene structure and functional domains of splice variants is an essential but nontrivial task, given the numerous gene predictions publicly available. We developed a novel viewer (ASviewer) that intuitively visualizes the transcript structure and functional inference of alternatively spliced genes. Key ideas involve clustering of overlapping exons and representing introns at arbitrary scales. Using the representative exons in the master coordinate facilitates comparison of the transcript structures of many isoforms. The most distinctive feature of the viewer is that it can act as a genome browser or a transcript viewer through arbitrary intron scaling. An intron scale of 100% makes the view equivalent to a genome browser, which is most convenient for specifying genomic features. ASviewer at an intron scale of 0% shows transcripts in the mRNA (exon) coordinate, which is suitable for depicting features in mRNA sequences such as functional domains. Therefore, arbitrary intron scaling makes it possible to combine the advantages of a genome browser and a transcript viewer in a single viewer. The current Java implementation supports five well-known gene predictions (RefSeq, Ensembl, AceView, CCDS and ECgene) as well as uploading user sequences and features in various formats. ASviewer is available at http://genome.ewha.ac.kr/ASviewer. [doc]
EWUBI	이영희	Genome-wide survey of domain changes due to alternative pre-mRNA splicing
		Alternative splicing (AS) is an important mechanism of increasing proteome diversity. Domain changes due to AS events have a direct effect on molecular function of the gene, and many examples of functional changes are reported in terms of cell communication, signaling, development and apoptosis. Some splice variants are known to carry out even the opposite function. In an effort to elucidate the functional role of alternative splicing, we performed a genome-wide analysis of domain changes due to alternative pre-mRNA splicing using ECgene model. ECgene provides one of the most complete catalogs of splice variants. We calculated the PFAM domains for all ECgene transcripts and classified the type of alternative splicing - exon skipping, donor/acceptor site variation, alternative initial/terminal transcription. The origin of changes in functional domains was analyzed in terms of the AS types and frame shifts. We find that a substantial portion of domain changes arise from the frame shift, not from skipping exons with functional domains. Furthermore, the correlation with normal/cancer phenotypes is explored by inspecting the EST sequences consistent with each isoform structure. The result would be valuable to examine the phenotypic consequences of domain changes due to AS events. [doc]
SNUBI	이혜원	The Tissue Microarray Object Model: a data model for storage, analysis and exchange of tissue microarray experimental data
		Tissue microarray (TMA) is an array-based technology allowing the examination of hundreds of tissue samples on a single slide. To handle, exchange, and disseminate TMA data, we need standard representations of the methods used, of the data generated, and of the clinical and histopathological information related to TMA data analysis. This study aims to create a comprehensive data model with flexibility that supports diverse experimental designs and with expressivity and extensibility that enables an adequate and comprehensive description of new clinical and histopathological data elements. We designed a Tissue Microarray Object Model (TMA-OM). Both the Array Information and the Experimental Procedure models are created by referring to Microarray Gene Expression Object Model, Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE), and the TMA Data Exchange Specifications (TMA DES). The Clinical and Histopathological Information model is created by using CAP Cancer Protocols and National Cancer Institute Common Data Elements (NCI CDEs). MGED Ontology, UMLS and the terms extracted from CAP Cancer Protocols and NCI CDEs are used to create a controlled vocabulary for unambiguous annotation. We implemented a web-based application for TMA-OM, supporting data export in XML format conforming to the TMA DES or the DTD derived from TMA-OM. TMA-OM provides a comprehensive data model for storage, analysis and exchange of TMA data and facilitates model-level integration of other biological models. Availability: Xperanto-TMA is available at http://xperanto.snubi.org/TMA/. [doc]
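The intron scaling described in the ASviewer abstract can be pictured as a coordinate mapping. The sketch below is an assumption-laden illustration rather than ASviewer's actual Java code: exons are half-open `(start, end)` pairs, the scale is given as a fraction (1.0 for 100%, 0.0 for 0%), and the function names are hypothetical.

```python
def merge_exons(exons):
    """Cluster overlapping (start, end) exons into representative exons."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous cluster: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def to_display(pos, exons, intron_scale):
    """Map a genomic position to a display coordinate.

    Exonic bases keep their length; intronic bases are multiplied by
    intron_scale. Positions before the first exon are not handled specially,
    and positions past the last exon clamp to the end of the last exon.
    """
    blocks = merge_exons(exons)
    display = 0.0
    prev_end = blocks[0][0]
    for start, end in blocks:
        if pos < start:  # inside the intron before this exon
            return display + (pos - prev_end) * intron_scale
        display += (start - prev_end) * intron_scale
        if pos <= end:  # inside this exon
            return display + (pos - start)
        display += end - start
        prev_end = end
    return display

if __name__ == "__main__":
    exons = [(100, 200), (150, 250), (400, 500)]
    for scale in (1.0, 0.5, 0.0):
        print(scale, to_display(450, exons, scale))
```

With a scale of 1.0 the result is the genomic offset from the first exon, and with 0.0 it is the offset in the spliced (exon-only) coordinate, which mirrors the two extremes the abstract describes.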

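The first seminar begins at 9:00 a.m. on November 19, 2005, in the main conference room (Room 3016, Building 200) of the College of Agriculture and Life Sciences, Seoul National University Gwanak Campus.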

The schedule is presentations from 9:00 to 11:00, followed by a jokgu (foot volleyball) match and then lunch.

Affiliation Presenter Abstract

EWUBI 남승윤 Transcriptional regulatory network

Biological networks are the representation of multiple interactions within a cell. Recent advances in molecular and computational biology have made possible the study of intricate transcriptional regulatory networks that describe gene expression as a function of regulatory inputs specified by interactions between proteins and DNA. Here we have developed an approach to identify genome-wide transcriptional binding sites by using a knowledge-based transcriptional binding factor database, TRANSFAC® Professional 8.3. The approach is combined with comparative genomics among multiple species to find the evolutionarily conserved binding sites. The present study concentrates on searching for neighboring transcriptional binding factor pairs and defining the statistical boundary for neighborhood measurement.

SNUBI 정희준 ArrayXPath

ArrayXPath (http://www.snuib.org/software/ArrayXPath) is a web-based service for mapping and visualizing microarray gene-expression data with integrated biological pathway resources using Scalable Vector Graphics (SVG). Deciphering the crosstalk among and integrating biomedical ontologies and knowledge bases may help biological interpretation of microarray data. ArrayXPath is empowered by integrating gene-pathway, disease-pathway, drug-pathway, and pathway-pathway correlations with integrated Gene Ontology (GO), Medical Subject Headings (MeSH), and OMIM Morbid Map-based annotations. We applied Fisher’s exact test and relative risk to evaluate the statistical significance of the correlations. ArrayXPath produces Javascript-enabled SVGs for web-enabled interactive visualization of gene expression profiles integrated with gene-pathway-disease interactions enriched by biomedical ontologies.
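To make the statistics in the ArrayXPath abstract concrete, here is a minimal sketch of a one-sided Fisher's exact test and a relative risk for gene-pathway enrichment. It is not ArrayXPath's implementation: the one-sided formulation, the function names and the example counts are assumptions made for illustration (requires Python 3.8+ for `math.comb`).

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for pathway enrichment.

    N: genes on the array, K: array genes in the pathway,
    n: selected (e.g. differentially expressed) genes,
    k: selected genes that are in the pathway.
    Returns P(X >= k) under the hypergeometric null.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

def relative_risk(k, n, K, N):
    """Pathway membership rate among selected genes divided by the rate
    among unselected genes. Undefined (division by zero) when k == K."""
    return (k / n) / ((K - k) / (N - n))

if __name__ == "__main__":
    # Hypothetical counts: 20 genes, 5 in the pathway, 4 selected, 3 of them in the pathway.
    print(fisher_enrichment_p(3, 4, 5, 20))
    print(relative_risk(3, 4, 5, 20))
```

A small p-value together with a relative risk well above 1 would suggest that the selected genes are over-represented in the pathway.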

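SABB 문선진 Evolution of mitochondrial DNA and mito-proteins

Mitochondria, sometimes called the 'power plants' of the cell, are organelles essential for cellular energy metabolism. Mitochondrial dysfunction causes problems ranging from metabolic diseases to, through a shortage of energy such as ATP, serious impairment of gene expression, particularly of genes that produce long peptides. In other words, the expression of the proteins that make up muscle is often affected. Mitochondria are therefore the most important factor in an organism's adaptation to its environment and are subject to stronger evolutionary forces than anything else. Human mitochondrial DNA has an independent genetic system and carries 13 protein-coding genes. Although mitochondria evolved into organelles through endosymbiosis, this number is far smaller than the number of genes in even the simplest unicellular organisms and bacteria. However, more than a thousand proteins (mito-proteins) are expressed from the human genome and imported into mitochondria, which allows mitochondria to function properly. The current view is that mitochondria were free-living bacteria hundreds of millions of years ago that entered eukaryotic cells, and that (some evolutionary force) has since moved the genes they carried, one by one, into the nuclear genome.
Complete mitochondrial DNA sequences for about 1,000 species are currently registered at the US National Center for Biotechnology Information (NCBI). In addition, about 800 mito-protein genes are registered for human, and about 300 to 600 mito-protein sequences are registered for yeast, mouse, and plants. Analyzing these mitochondrial DNA sequences and mito-proteins from an evolutionary perspective can provide indirect evidence not only of mitochondrial evolution but also of how organisms have adapted to their environments. Furthermore, because mito-protein genes moved to nuclear DNA through the process of endosymbiosis, they essentially began their life in nuclear DNA without introns. Mito-protein genes are the origin of the complex functions that eukaryotic cells came to have, and they are also retroposons, one kind of duplication mechanism. Therefore, tracing the evolutionary changes of mito-proteins in human, chimpanzee, mouse, and chicken may shed light on how gene structures such as exons, introns, and UTRs were formed.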

EWUBI 김보라 ChimerDB

Chromosome translocation and gene fusion are frequent events in the human genome and are often the cause of many types of tumor. ChimerDB is the database of fusion sequences encompassing bioinformatics analysis of mRNA and expressed sequence tag (EST) sequences in GenBank, manual collection of literature data and integration with other known databases such as OMIM. Our bioinformatics analysis identifies the fusion transcripts that have nonoverlapping alignments at multiple genomic loci. Fusion events at exon-exon borders are selected to filter out the cloning artifacts in cDNA library preparation. The result is classified into two groups: genuine chromosome translocation and fusion between neighboring genes owing to intergenic splicing. We also integrated manually collected literature and OMIM data for chromosome translocation as an aid to assess the validity of each fusion event. The database is available at http://genome.ewha.ac.kr/ChimerDB/ for human, mouse and rat genomes.