TY - JOUR
T1 - Model for comparative analysis of antigen receptor repertoires
AU - Rempala, Grzegorz A.
AU - Seweryn, Michał
AU - Ignatowicz, Leszek
N1 - Funding Information:
This work was partially supported by funds from the National Institutes of Health under Grants 1R01CA152158 (G.A.R.) and 5R01AI078285, 5R01AI079277 (L.I.). The authors would like to thank Dr. Rhea-Beth Markowitz and Alicja Ignatowicz for their help with proofreading the manuscript as well as acknowledge the insightful comments of the Associate Editor and the Reviewers, which helped them improve the original version of the article.
PY - 2011/1/21
Y1 - 2011/1/21
N2 - In modern molecular biology one of the standard ways of analyzing a vertebrate immune system is to sequence and compare the counts of specific antigen receptor clones (either immunoglobulins or T-cell receptors) derived from various tissues under different experimental or clinical conditions. The resulting statistical challenges are difficult and do not fit readily into the standard statistical framework of contingency tables primarily due to the serious under-sampling of the receptor populations. This under-sampling is caused, on the one hand, by the extreme diversity of antigen receptor repertoires maintained by the immune system and, on the other, by the high cost and labor intensity of the receptor data collection process. In most of the recent immunological literature the differences across antigen receptor populations are examined via non-parametric statistical measures of the species overlap and diversity borrowed from ecological studies. While this approach is robust in a wide range of situations, it seems to provide little insight into the underlying clonal size distribution and the overall mechanism differentiating the receptor populations. As a possible alternative, the current paper presents a parametric method that adjusts for the data under-sampling and provides a unifying approach to a simultaneous comparison of multiple receptor groups by means of the modern statistical tools of unsupervised learning. The parametric model is based on a flexible multivariate Poisson-lognormal distribution and is seen to be a natural generalization of the univariate Poisson-lognormal models used in the ecological studies of biodiversity patterns. The procedure for evaluating a model's fit is described along with the public domain software developed to perform the necessary diagnostics. The model-driven analysis is seen to compare favorably vis-à-vis traditional methods when applied to the data from T-cell receptors in transgenic mouse populations.
AB - In modern molecular biology one of the standard ways of analyzing a vertebrate immune system is to sequence and compare the counts of specific antigen receptor clones (either immunoglobulins or T-cell receptors) derived from various tissues under different experimental or clinical conditions. The resulting statistical challenges are difficult and do not fit readily into the standard statistical framework of contingency tables primarily due to the serious under-sampling of the receptor populations. This under-sampling is caused, on the one hand, by the extreme diversity of antigen receptor repertoires maintained by the immune system and, on the other, by the high cost and labor intensity of the receptor data collection process. In most of the recent immunological literature the differences across antigen receptor populations are examined via non-parametric statistical measures of the species overlap and diversity borrowed from ecological studies. While this approach is robust in a wide range of situations, it seems to provide little insight into the underlying clonal size distribution and the overall mechanism differentiating the receptor populations. As a possible alternative, the current paper presents a parametric method that adjusts for the data under-sampling and provides a unifying approach to a simultaneous comparison of multiple receptor groups by means of the modern statistical tools of unsupervised learning. The parametric model is based on a flexible multivariate Poisson-lognormal distribution and is seen to be a natural generalization of the univariate Poisson-lognormal models used in the ecological studies of biodiversity patterns. The procedure for evaluating a model's fit is described along with the public domain software developed to perform the necessary diagnostics. The model-driven analysis is seen to compare favorably vis-à-vis traditional methods when applied to the data from T-cell receptors in transgenic mouse populations.
KW - Computational immunology
KW - Lognormal distribution
KW - Poisson abundance models
KW - Species diversity estimation
KW - T-cell antigen receptors
UR - http://www.scopus.com/inward/record.url?scp=77958138360&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77958138360&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2010.10.001
DO - 10.1016/j.jtbi.2010.10.001
M3 - Article
C2 - 20955715
AN - SCOPUS:77958138360
SN - 0022-5193
VL - 269
SP - 1
EP - 15
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
IS - 1
ER -