Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips

James Stec; Jing Wang; Kevin Coombes; Mark Ayers; Sebastian Hoersch; David L. Gold; Jeffrey S. Ross; Kenneth R. Hess; Stephen Tirrell; Gerald Linette; Gabriel N. Hortobagyi; W. Fraser Symmans; Lajos Pusztai

doi:10.1016/S1525-1578(10)60565-X

Research output: Contribution to journal › Article › peer-review

48 Scopus citations

Abstract

We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45% and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed an average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed a misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene.
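The cross-platform concordance figure in the abstract (the share of matched measurements with Pearson r ≥ 0.7) can be illustrated with a minimal sketch. This is not the authors' pipeline: the gene names, expression values, and function names below are hypothetical, and the code assumes probes on both platforms have already been matched to a shared gene identifier.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def fraction_concordant(platform_a, platform_b, threshold=0.7):
    """Correlate each shared gene across samples on the two platforms.

    Both arguments map a gene identifier to a list of expression values,
    one per sample, with samples in the same order on both platforms.
    Returns (fraction of shared genes with r >= threshold, per-gene r).
    """
    shared = sorted(set(platform_a) & set(platform_b))
    if not shared:
        raise ValueError("no genes are shared between the two platforms")
    r_by_gene = {g: pearson(platform_a[g], platform_b[g]) for g in shared}
    hits = sum(1 for r in r_by_gene.values() if r >= threshold)
    return hits / len(shared), r_by_gene


# Hypothetical toy data: four samples, three genes present on both platforms.
affy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
    "GENE_C": [2.0, 1.0, 4.0, 3.0],
}
cdna = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 2.0, 3.0, 4.0],
    "GENE_D": [0.5, 0.7, 0.2, 0.9],  # cDNA only, so it is ignored
}

fraction, r_by_gene = fraction_concordant(affy, cdna)
```

In this toy input only one of the three shared genes clears the threshold. Genes present on just one platform are dropped before the comparison, which mirrors the "missing genes" problem the abstract names as one cause of lost accuracy.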

Original language	English (US)
Pages (from-to)	357-367
Number of pages	11
Journal	Journal of Molecular Diagnostics
Volume	7
Issue number	3
DOIs	https://doi.org/10.1016/S1525-1578(10)60565-X
State	Published - Aug 2005
Externally published	Yes

ASJC Scopus subject areas

Pathology and Forensic Medicine
Molecular Medicine

Cite this

Stec, J., Wang, J., Coombes, K., Ayers, M., Hoersch, S., Gold, D. L., Ross, J. S., Hess, K. R., Tirrell, S., Linette, G., Hortobagyi, G. N., Symmans, W. F., & Pusztai, L. (2005). Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips. Journal of Molecular Diagnostics, 7(3), 357-367. https://doi.org/10.1016/S1525-1578(10)60565-X

Stec, J, Wang, J, Coombes, K, Ayers, M, Hoersch, S, Gold, DL, Ross, JS, Hess, KR, Tirrell, S, Linette, G, Hortobagyi, GN, Symmans, WF & Pusztai, L 2005, 'Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips', Journal of Molecular Diagnostics, vol. 7, no. 3, pp. 357-367. https://doi.org/10.1016/S1525-1578(10)60565-X

@article{1705bc72d7814cfca2ba591792f12edf,

title = "Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips",

abstract = "We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene.",

author = "James Stec and Jing Wang and Kevin Coombes and Mark Ayers and Sebastian Hoersch and Gold, {David L.} and Ross, {Jeffrey S.} and Hess, {Kenneth R.} and Stephen Tirrell and Gerald Linette and Hortobagyi, {Gabriel N.} and Symmans, {W. Fraser} and Lajos Pusztai",

year = "2005",

month = aug,

doi = "10.1016/S1525-1578(10)60565-X",

language = "English (US)",

volume = "7",

pages = "357--367",

journal = "Journal of Molecular Diagnostics",

issn = "1525-1578",

publisher = "Association for Molecular Pathology",

number = "3",

}

TY - JOUR

T1 - Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips

AU - Stec, James

AU - Wang, Jing

AU - Coombes, Kevin

AU - Ayers, Mark

AU - Hoersch, Sebastian

AU - Gold, David L.

AU - Ross, Jeffrey S.

AU - Hess, Kenneth R.

AU - Tirrell, Stephen

AU - Linette, Gerald

AU - Hortobagyi, Gabriel N.

AU - Symmans, W. Fraser

AU - Pusztai, Lajos

PY - 2005/8

Y1 - 2005/8

N2 - We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene.

AB - We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene.

UR - http://www.scopus.com/inward/record.url?scp=23844519571&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=23844519571&partnerID=8YFLogxK

U2 - 10.1016/S1525-1578(10)60565-X

DO - 10.1016/S1525-1578(10)60565-X

M3 - Article

C2 - 16049308

AN - SCOPUS:23844519571

SN - 1525-1578

VL - 7

SP - 357

EP - 367

JO - Journal of Molecular Diagnostics

JF - Journal of Molecular Diagnostics

IS - 3

ER -
