Estimating selection on synonymous codon usage from noisy experimental data

Wallace EWJ, Airoldi EM, and Drummond DA, Molecular Biology & Evolution (2013).


A key goal in molecular evolution is to extract mechanistic insights from signatures of selection. A case study is codon usage, where despite many recent advances and hypotheses, two longstanding problems remain: the relative contribution of selection and mutation in determining codon frequencies and the relative contribution of translational speed and accuracy to selection. The relevant targets of selection—the rate of translation and of mistranslation of a codon per unit time in the cell—can only be related to mechanistic properties of the translational apparatus if the number of transcripts per cell is known, requiring use of gene expression measurements. Perhaps surprisingly, different gene-expression data sets yield markedly different estimates of selection. We show that this is largely due to measurement noise, notably due to differences between studies rather than instrument error or biological variability. We develop an analytical framework that explicitly models noise in expression in the context of the population-genetic model. Estimates of mutation and selection strength in budding yeast produced by this method are robust to the expression data set used and are substantially higher than estimates using a noise-blind approach. We introduce per-gene selection estimates that correlate well with previous scoring systems, such as the codon adaptation index, while now carrying an evolutionary interpretation. On average, selection for codon usage in budding yeast is weak, yet our estimates show that genes range from virtually unselected to average per-codon selection coefficients above the inverse population size. Our analytical framework may be generally useful for distinguishing biological signals from measurement noise in other applications that depend upon measurements of gene expression.