Bayesian shrinkage estimation of the relative abundance of mRNA transcripts using SAGE

Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes

Research output: Contribution to journal, Article, peer-review

17 Scopus citations


Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expressed mRNA species present in the tissue. Empirical estimators of mRNA species' relative abundance effectively ignore these missing species, and as a result tend to overestimate the abundance of the scarce observed species comprising a vast majority of the total. We have developed a new Bayesian estimation procedure that quantifies our prior information about these characteristics, yielding a nonlinear shrinkage estimator with efficiency advantages over the MLE. Our prior is a mixture of Dirichlets, whereby species are stochastically partitioned into abundant and scarce classes, each with its own multivariate prior. Simulation studies reveal our estimator has lower integrated mean squared error (IMSE) than the MLE for the SAGE scenarios simulated, and yields relative abundance profiles closer in Euclidean distance to the truth for all samples simulated. We apply our method to a SAGE library of normal colon tissue, and discuss its implications for assessing differential expression.
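The missing-species effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator: it uses a single symmetric Dirichlet prior as a simplified stand-in for the mixture of Dirichlets, and the Zipf-like true profile, the library size, the number of species, and the prior weight `alpha` are all assumptions chosen only for illustration. The function names are hypothetical.

```python
import numpy as np


def simulate_sage_library(n_species=5000, n_tags=2000, seed=0):
    """Draw one multinomial sample from a skewed abundance profile.

    The Zipf-like profile and the sizes are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, n_species + 1)
    truth = 1.0 / ranks
    truth /= truth.sum()  # normalize to a probability vector
    counts = rng.multinomial(n_tags, truth)
    return truth, counts


def mle(counts):
    """Empirical relative abundance: observed count over library size."""
    return counts / counts.sum()


def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean under a single symmetric Dirichlet(alpha) prior.

    A simplified stand-in for the paper's mixture prior: every species,
    observed or not, receives a small positive estimate.
    """
    return (counts + alpha) / (counts.sum() + alpha * counts.size)


truth, counts = simulate_sage_library()
observed = counts > 0

p_mle = mle(counts)
p_shrunk = dirichlet_posterior_mean(counts, alpha=0.1)

# True probability mass carried by species the sample never captured.
missing_mass = truth[~observed].sum()

print("species observed:", int(observed.sum()), "of", counts.size)
print("true mass of unobserved species:", round(float(missing_mass), 4))
print("MLE mass on unobserved species:", float(p_mle[~observed].sum()))
print("shrunk mass on unobserved species:",
      round(float(p_shrunk[~observed].sum()), 4))
```

Because the MLE places all of its mass on the observed species, their estimated total (exactly 1) necessarily exceeds their true total whenever any expressed species is missed. The Dirichlet posterior mean moves some mass back to the unobserved species; how much depends entirely on the assumed `alpha`.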

Original language: English (US)
Pages (from-to): 476-486
Number of pages: 11
Issue number: 3
State: Published - Sep 2003
Externally published: Yes


  • Bayesian methods
  • Bioinformatics
  • Mixture distributions
  • Multinomial distribution
  • SAGE
  • Shrinkage estimators

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Agricultural and Biological Sciences(all)
  • Applied Mathematics

