Development and application of workbench for analysis and visualization of whole genome sequence

Jeong-Hyeon Choi; Hee Jeong Jin; Cheol Min Kim; Chul Hun L. Chang; Hwan Gue Cho

Research output: Contribution to journal › Article › peer-review

Abstract

The growing number of genome sequencing projects has led to explosive growth in whole genome sequence data, and the number of studies on the functions of individual genes has also increased rapidly. However, in-memory algorithms are not applicable to the analysis of whole genome sequences, since the size of an individual whole genome ranges from several million base pairs to hundreds of billions of base pairs. To manipulate such large sequence data effectively, an indexed data structure for external memory is needed. In this paper, we introduce the development and application of a workbench for the analysis and visualization of whole genome sequences, built on the string B-tree, which is suitable for the analysis of very large data. The system consists of two main parts: an analysis query part and a visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists understand whole genome structure and specificity through several viewers: a whole genome sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, a k-mer viewer, an RWP (Random Walk Plot) viewer, and a map viewer. Using the workbench, we can find relationships among organisms, support gene prediction in a genome, and study the function of junk DNA. We apply the workbench to investigating specific sequences such as avoided sequences, common sequences, and classifiable sequences.
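
As a rough illustration of two of the analyses named in the abstract, the sketch below counts k-mers, lists k-mers that never occur in a sequence (one plausible reading of "avoided sequence"), and computes Chaos Game Representation points. It is not the authors' implementation: it works entirely in memory on a short string, whereas the workbench uses a string B-tree on external memory, and the function names and the CGR corner assignment are assumptions chosen for this example.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (in memory, small inputs only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def avoided_kmers(seq, k, alphabet="ACGT"):
    """Return, in sorted order, the k-mers over alphabet that never occur in seq."""
    counts = kmer_counts(seq, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in candidates if kmer not in counts)

# Corner assignment is an assumption; other conventions place the bases differently.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Chaos Game Representation: start at the centre of the unit square and
    move halfway toward the corner of each successive base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For example, `avoided_kmers("AAAA", 1)` lists the three bases that are absent from the sequence, and plotting `cgr_points(seq)` for a long sequence gives the kind of image a CGR viewer would display.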

Original language	English (US)
Pages (from-to)	205-217
Number of pages	13
Journal	Korean Journal of Genetics
Volume	24
Issue number	2
State	Published - Jun 2002
Externally published	Yes

Keywords

Avoided sequence
Chaos game representation
Classifiable sequence
Common sequence
Genome
Random walk plot
Sequence analysis
Workbench
k-mer analysis

ASJC Scopus subject areas

Genetics

Cite this

@article{b9e298f83d8144e49d8f4290606c5add,

title = "Development and application of workbench for analysis and visualization of whole genome sequence",

abstract = "An increasing number of genome sequencing projects results in explosive growth of whole genome sequences. Furthermore the number of studies on the functions of individual genes has also been rapidly increased. However on-memory algorithms are not applicable to the analysis of whole genome sequences, since the size of individual whole genome ranges from several million base pairs to hundreds billion base pairs. In order to effectively manipulate the huge sequence data, it is necessary to use the indexed data structure for external memory. In this paper, we introduce the development and application of the workbench for the analysis and visualization of whole genome sequences using string B-tree that is suitable for the analysis of huge data. This system consists of two main parts, the analysis query part and the visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists to easily understand whole genome structure and specificity by various kinds of visualization such as whole genome sequence viewer, annotation viewer, CGR (Chaos Game Representation) viewer, k-mer viewer, RWP (Random Walk Plot) viewer, and map viewer. We can find the relationships among organisms, support gene prediction in a genome, and study the function of junk DNA using our workbench. In this paper, we apply our workbench to investigating specific sequence such as avoided sequence, common sequence, and classifiable sequence.",

keywords = "Avoided sequence, Chaos game representation, Classifiable sequence, Common sequence, Genome, Random walk plot, Sequence analysis, Workbench, k-mer analysis",

author = "Choi, {Jeong-Hyeon} and Jin, {Hee Jeong} and Kim, {Cheol Min} and Chang, {Chul Hun L.} and Cho, {Hwan Gue}",

year = "2002",

month = jun,

language = "English (US)",

volume = "24",

pages = "205--217",

journal = "Korean Journal of Genetics",

issn = "0254-5934",

publisher = "Springer Verlag",

number = "2",

}

TY - JOUR

T1 - Development and application of workbench for analysis and visualization of whole genome sequence

AU - Choi, Jeong-Hyeon

AU - Jin, Hee Jeong

AU - Kim, Cheol Min

AU - Chang, Chul Hun L.

AU - Cho, Hwan Gue

PY - 2002/6

Y1 - 2002/6

N2 - An increasing number of genome sequencing projects results in explosive growth of whole genome sequences. Furthermore the number of studies on the functions of individual genes has also been rapidly increased. However on-memory algorithms are not applicable to the analysis of whole genome sequences, since the size of individual whole genome ranges from several million base pairs to hundreds billion base pairs. In order to effectively manipulate the huge sequence data, it is necessary to use the indexed data structure for external memory. In this paper, we introduce the development and application of the workbench for the analysis and visualization of whole genome sequences using string B-tree that is suitable for the analysis of huge data. This system consists of two main parts, the analysis query part and the visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists to easily understand whole genome structure and specificity by various kinds of visualization such as whole genome sequence viewer, annotation viewer, CGR (Chaos Game Representation) viewer, k-mer viewer, RWP (Random Walk Plot) viewer, and map viewer. We can find the relationships among organisms, support gene prediction in a genome, and study the function of junk DNA using our workbench. In this paper, we apply our workbench to investigating specific sequence such as avoided sequence, common sequence, and classifiable sequence.

AB - An increasing number of genome sequencing projects results in explosive growth of whole genome sequences. Furthermore the number of studies on the functions of individual genes has also been rapidly increased. However on-memory algorithms are not applicable to the analysis of whole genome sequences, since the size of individual whole genome ranges from several million base pairs to hundreds billion base pairs. In order to effectively manipulate the huge sequence data, it is necessary to use the indexed data structure for external memory. In this paper, we introduce the development and application of the workbench for the analysis and visualization of whole genome sequences using string B-tree that is suitable for the analysis of huge data. This system consists of two main parts, the analysis query part and the visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists to easily understand whole genome structure and specificity by various kinds of visualization such as whole genome sequence viewer, annotation viewer, CGR (Chaos Game Representation) viewer, k-mer viewer, RWP (Random Walk Plot) viewer, and map viewer. We can find the relationships among organisms, support gene prediction in a genome, and study the function of junk DNA using our workbench. In this paper, we apply our workbench to investigating specific sequence such as avoided sequence, common sequence, and classifiable sequence.

KW - Avoided sequence

KW - Chaos game representation

KW - Classifiable sequence

KW - Common sequence

KW - Genome

KW - Random walk plot

KW - Sequence analysis

KW - Workbench

KW - k-mer analysis

UR - http://www.scopus.com/inward/record.url?scp=0038046367&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038046367&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0038046367

SN - 0254-5934

VL - 24

SP - 205

EP - 217

JO - Korean Journal of Genetics

JF - Korean Journal of Genetics

IS - 2

ER -
