A Highly Accurate Ensemble Classifier for the Molecular Diagnosis of ASD at Ages 1 to 4 Years

Austin Chiang

doi:10.1101/2021.07.08.21260225

Immunology Center of Georgia

Research output: Other contribution

Abstract

ASD diagnosis remains behavior-based, and the median age at first diagnosis remains unchanged at ∼52 months, nearly 5 years after the disorder's first-trimester origin. The long delay between ASD's prenatal onset and its eventual diagnosis likely represents a missed opportunity. However, accurate and clinically translatable early-age diagnostic methods do not exist because of ASD's genetic and clinical heterogeneity. Early-age diagnostic biomarkers of ASD that are robust to this heterogeneity are needed.

Objective: To develop a single blood-based molecular classifier that accurately diagnoses ASD at the age of first symptoms.

Design, Setting, and Participants: Clinical, diagnostic, and leukocyte RNA data were collected from N=264 ASD, typically developing (TD), and language-delayed (LD) toddlers. Datasets included Discovery (n=175 ASD and TD subjects), Longitudinal (n=33 ASD and TD subjects), and Replication (n=89 ASD, TD, and LD subjects). We developed an ensemble of ASD classifiers by testing 42,840 models, formed by combining 3,570 feature selection sets with 12 classification methods. Models were trained on the Discovery dataset with 5-fold cross-validation. The results were used to construct a Bayesian model averaging (BMA)-based ensemble classifier, which was tested in the Discovery and Replication datasets. Data were collected from 2007 to 2012 and analyzed from August 2019 to April 2021.

Main Outcomes and Measures: Primary outcomes were (1) the performance of the 42,840 classifier models in correctly distinguishing ASD from TD and LD subjects in the Discovery and Replication datasets and (2) the performance of the ensemble model composed of 1,076 models weighted by Bayesian model averaging.

Results: Of the 42,840 models trained in the Discovery dataset, 1,076 had a mean area under the receiver operating characteristic curve (AUC-ROC) above 0.8. These 1,076 models used 191 different feature routes and 2,764 gene features. Using BMA weighting of these features and routes, we constructed an ensemble classifier that demonstrated excellent performance in the Discovery and Replication datasets, with ASD classification AUC-ROC scores of 84% to 88%. ASD classification accuracy was comparable when ASD toddlers were compared with LD subjects and with TD subjects, and in the Longitudinal dataset. ASD toddlers with ensemble scores above and below the ASD ensemble mean had similar diagnostic and psychometric scores, but those below the ASD ensemble mean had more prenatal risk events than TD toddlers. Ensemble features include genes involved in immune/inflammatory responses, response to cytokines, transcriptional regulation, the mitotic cell cycle, and the PI3K-AKT, RAS, and Wnt signaling pathways.

Conclusions and Relevance: An ensemble ASD molecular classifier has high and replicable accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years, and it has potential for clinical translation.

Key Points

Question: Given that ASD is genetically and clinically heterogeneous, can a single blood-based molecular classifier accurately diagnose ASD at the age of first symptoms?

Findings: To address heterogeneity, we developed an ASD classifier method that tested 42,840 models. An ensemble of 1,076 models using 191 different feature routes and 2,764 gene features, weighted by Bayesian model averaging, demonstrated excellent performance in the Discovery and Replication datasets, classifying ASD with area under the receiver operating characteristic curve (AUC-ROC) scores of 84% to 88%. Features include genes involved in immune/inflammatory responses, response to cytokines, transcriptional regulation, the mitotic cell cycle, and the PI3K-AKT, RAS, and Wnt signaling pathways.

Meaning: An ensemble gene expression ASD classifier has high accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years.
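
The screening-and-weighting step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes each candidate model (one feature selection set paired with one classification method) has already produced held-out P(ASD) predictions for the Discovery subjects in each of its 5 cross-validation folds. Models are kept if their mean per-fold AUC-ROC exceeds 0.8, as in the abstract. Because the abstract does not say how the BMA weights were computed, the sketch substitutes pseudo-BMA weighting (weights proportional to the exponentiated cross-validated log predictive density), one common approximation. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One CV fold: (label, held-out predicted P(ASD)) pairs; label 1 = ASD, 0 = non-ASD.
Fold = List[Tuple[int, float]]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs in which the positive subject scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds: List[Fold]) -> float:
    """Average of the per-fold AUC-ROC values (the screening statistic)."""
    aucs = [auc_roc([y for y, _ in f], [p for _, p in f]) for f in folds]
    return sum(aucs) / len(aucs)


def cv_log_density(folds: List[Fold], eps: float = 1e-6) -> float:
    """Sum of held-out Bernoulli log-likelihoods over all folds."""
    total = 0.0
    for fold in folds:
        for y, p in fold:
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def pseudo_bma_weights(models: Dict[str, List[Fold]],
                       auc_threshold: float = 0.8) -> Dict[str, float]:
    """Keep models whose mean fold AUC exceeds the threshold, then weight each
    survivor by exp(CV log density), normalised to sum to 1 (pseudo-BMA)."""
    kept = {name: folds for name, folds in models.items()
            if mean_fold_auc(folds) > auc_threshold}
    if not kept:
        return {}
    lpd = {name: cv_log_density(folds) for name, folds in kept.items()}
    top = max(lpd.values())  # subtract the max before exp() to avoid underflow
    raw = {name: math.exp(v - top) for name, v in lpd.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def ensemble_score(weights: Dict[str, float], new_probs: Dict[str, float]) -> float:
    """Weighted average of the retained models' P(ASD) for one new subject."""
    return sum(w * new_probs[name] for name, w in weights.items())


if __name__ == "__main__":
    # Hypothetical toy held-out predictions for three candidate models, 2 folds each.
    models = {
        "good": [[(1, 0.9), (1, 0.8), (0, 0.2), (0, 0.1)],
                 [(1, 0.7), (1, 0.9), (0, 0.3), (0, 0.2)]],
        "ok": [[(1, 0.6), (1, 0.55), (0, 0.5), (0, 0.45)],
               [(1, 0.6), (1, 0.4), (0, 0.5), (0, 0.3)]],
        "bad": [[(1, 0.4), (1, 0.6), (0, 0.5), (0, 0.5)],
                [(1, 0.4), (1, 0.6), (0, 0.5), (0, 0.5)]],
    }
    weights = pseudo_bma_weights(models)  # "bad" (mean AUC 0.5) is screened out
    print(weights)
    print(ensemble_score(weights, {"good": 0.9, "ok": 0.6}))
```

In the study's setting, the candidate pool would be the 3,570 × 12 = 42,840 feature-set/classifier combinations. The pairwise Mann-Whitney AUC is quadratic in fold size, which is acceptable for Discovery folds of roughly 35 subjects (175 / 5); subtracting the largest log density before exponentiating keeps the weights from underflowing when models differ by many log units.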

Original language	Undefined
DOIs	https://doi.org/10.1101/2021.07.08.21260225
State	Published - Jul 9 2021

Cite this

@misc{6538f2c40839482783fce2a7245ee3c4,

title = "A Highly Accurate Ensemble Classifier for the Molecular Diagnosis of ASD at Ages 1 to 4 Years",

author = "Austin Chiang",

year = "2021",

month = jul,

day = "9",

doi = "10.1101/2021.07.08.21260225",

language = "Undefined",

type = "Other",

}

TY - GEN

T1 - A Highly Accurate Ensemble Classifier for the Molecular Diagnosis of ASD at Ages 1 to 4 Years

AU - Chiang, Austin

PY - 2021/7/9

Y1 - 2021/7/9

UR - http://dx.doi.org/10.1101/2021.07.08.21260225

U2 - 10.1101/2021.07.08.21260225

DO - 10.1101/2021.07.08.21260225

M3 - Other contribution

ER -
