TY - CHAP
T1 - A statistical change-point analysis approach for modeling the ratio of next generation sequencing reads
AU - Chen, Jie
AU - Li, Hua
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
AB - One of the key tasks of statistical change-point analysis is to estimate the unknown change-point locations for various statistical models imposed on the sample data. This can be done through hypothesis testing, model selection, or a Bayesian approach, among other methods. Change-point analysis has a wide range of applications in research fields such as statistical quality control, finance and economics, climate study, medicine, and genetics. This paper provides a change-point analysis motivated by the modeling of genomic data. High-throughput next generation sequencing (NGS) technology is now frequently used to profile tumor and control samples for the study of DNA copy number variants (CNVs). In particular, the ratio of the read count of the tumor sample to that of the control sample is widely used for identifying CNV regions. Identifying CNV regions is equivalent to finding change-points that potentially exist in the NGS reads ratio data. We present a change-point model and a Bayesian solution for estimating the change-point locations in NGS reads ratio data. Simulation studies indicate that the proposed method is effective in identifying change-point locations. Applications of the proposed change-point model to identifying the boundaries of CNV regions, using next generation sequencing data from breast cancer/tumor cell lines and a lung cancer cell line, are presented.
KW - Change point analysis
KW - DNA copy numbers
KW - Next generation sequencing data
UR - http://www.scopus.com/inward/record.url?scp=85071479056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071479056&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-34139-2_13
DO - 10.1007/978-3-319-34139-2_13
M3 - Chapter
AN - SCOPUS:85071479056
T3 - Association for Women in Mathematics Series
SP - 283
EP - 300
BT - Association for Women in Mathematics Series
PB - Springer
ER -