TY - JOUR
T1 - ESTclean
T2 - A cleaning tool for next-gen transcriptome shotgun sequencing
AU - Tae, Hongseok
AU - Ryu, Dongsung
AU - Sureshchandra, Suhas
AU - Choi, Jeong-Hyeon
N1 - Funding Information:
We would like to give special thanks to H. Tang, J. K. Colbourne, J. Carter, Z. Lai, K. Mockaitis, and Z. Smith at the Center for Genomics and Bioinformatics, Indiana University for valuable comments. This work was supported in part by the National Institutes of Health [CA134304] and the National Research Foundation of Korea Grant funded by the Korean Government [NRF-2009-352-D00275].
PY - 2012/9/26
Y1 - 2012/9/26
N2 - Background: With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols in 454 sequencing have been invented. As each protocol uses its own short DNA tags or adapters attached to the ends of cDNA fragments for labeling or sequencing, different contaminants may lead to mis-assembly and inaccurate sequence products. Results: We have designed and implemented a new program for raw sequence cleaning in a graphical user interface and a batch script. The cleaning process consists of several modules including barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low quality region trimming. These modules can be combined based on various sequencing applications. Conclusions: ESTclean is a software package not only for cleaning cDNA sequences, but also for helping to develop sequencing protocols by providing summary tables and figures for sequencing quality control in a graphical user interface. It outperforms in cleaning read sequences from complicated sequencing protocols which use barcodes and multiple amplification primers.
AB - Background: With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols in 454 sequencing have been invented. As each protocol uses its own short DNA tags or adapters attached to the ends of cDNA fragments for labeling or sequencing, different contaminants may lead to mis-assembly and inaccurate sequence products. Results: We have designed and implemented a new program for raw sequence cleaning in a graphical user interface and a batch script. The cleaning process consists of several modules including barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low quality region trimming. These modules can be combined based on various sequencing applications. Conclusions: ESTclean is a software package not only for cleaning cDNA sequences, but also for helping to develop sequencing protocols by providing summary tables and figures for sequencing quality control in a graphical user interface. It outperforms in cleaning read sequences from complicated sequencing protocols which use barcodes and multiple amplification primers.
UR - http://www.scopus.com/inward/record.url?scp=84866554191&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866554191&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-13-247
DO - 10.1186/1471-2105-13-247
M3 - Article
C2 - 23009593
AN - SCOPUS:84866554191
SN - 1471-2105
VL - 13
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 247
ER -