Abstract
The growing number of genome sequencing projects has led to explosive growth in the number of available whole genome sequences. The number of studies on the functions of individual genes has also increased rapidly. However, in-memory algorithms are not applicable to the analysis of whole genome sequences, since an individual genome ranges in size from several million base pairs to hundreds of billions of base pairs. To manipulate such large sequence data effectively, an indexed data structure designed for external memory is needed. In this paper, we describe the development and application of a workbench for the analysis and visualization of whole genome sequences, built on the string B-tree, an index structure suited to the analysis of very large data. The system consists of two main parts: an analysis query part and a visualization part. The query system supports queries such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists understand whole genome structure and specificity through several viewers: a whole genome sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, a k-mer viewer, an RWP (Random Walk Plot) viewer, and a map viewer. With the workbench, we can find relationships among organisms, support gene prediction in a genome, and study the function of junk DNA. Here, we apply the workbench to investigating specific sequences such as avoided sequences, common sequences, and classifiable sequences.
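To make two of the analyses named above concrete, the following is a minimal Python sketch of k-mer counting and Chaos Game Representation coordinates. It is an in-memory illustration for short sequences only, not the string B-tree index the workbench uses for external memory. The function names (`kmer_counts`, `absent_kmers`, `cgr_points`) are hypothetical, the corner assignment for CGR is a commonly used convention rather than one confirmed by the abstract, and treating "avoided sequences" as k-mers that never occur is an assumption made here for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (in memory, illustration only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def absent_kmers(seq, k, alphabet="ACGT"):
    """List the k-mers over the alphabet that never occur in seq.

    Assumption: this is one simple reading of an 'avoided sequence';
    the paper may define the term differently.
    """
    counts = kmer_counts(seq, k)
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in counts)


# Assumed corner layout of the unit square (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return CGR coordinates: start at the center of the square and, for
    each base, move halfway toward that base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For example, `kmer_counts("ACGTACGT", 2)` counts the 2-mers of a short test string, and `cgr_points("AC")` gives the first two points of its CGR trajectory. For genome-scale input, the counting step would be served by an external-memory index rather than a `Counter` held in RAM.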
Original language | English (US) |
---|---|
Pages (from-to) | 205-217 |
Number of pages | 13 |
Journal | Korean Journal of Genetics |
Volume | 24 |
Issue number | 2 |
State | Published - Jun 2002 |
Externally published | Yes |
Keywords
- Avoided sequence
- Chaos game representation
- Classifiable sequence
- Common sequence
- Genome
- Random walk plot
- Sequence analysis
- Workbench
- k-mer analysis
ASJC Scopus subject areas
- Genetics