Estimating a structured covariance matrix from multi-lab measurements in high-throughput biology

Franks AM, Csárdi G, Drummond DA, Airoldi EM, Journal of the American Statistical Association 110(509):27–44 (2014).


We consider the problem of quantifying the degree of coordination between transcription and translation in yeast. Over the years, several studies using diverse technologies have reported a surprising lack of coordination in organisms as different as yeast and humans. However, a close look at this literature suggests that the reported lack of correlation may not reflect the biology of regulation. These reports do not control for between-study biases and structure in the measurement errors, ignore key aspects of how the data connect to the estimand, and systematically underestimate the correlation as a consequence. Here, we design a careful meta-analysis of 27 yeast datasets, supported by a multilevel model, full uncertainty quantification, a suite of sensitivity analyses, and novel theory, to produce a more accurate estimate of the correlation between mRNA and protein levels, a proxy for coordination. From a statistical perspective, this problem motivates new theory on the impact of noise, model misspecification, and nonignorable missing data on estimates of the correlation between high-dimensional responses. We find that the correlation between mRNA and protein levels in yeast is quite high under the studied conditions, suggesting that post-transcriptional regulation plays a less prominent role than previously thought.
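
The claim that unmodeled measurement error leads to systematic underestimation of the correlation can be illustrated with a small simulation. The sketch below is not the multilevel model described in the abstract; it swaps in the classical Spearman correction for attenuation, applied to synthetic data. All names and parameters (`simulate`, `true_rho`, `noise_sd`, the number of genes, the Gaussian noise model, and the use of two replicates per quantity) are assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_genes=20000, true_rho=0.9, noise_sd=1.0, seed=0):
    """Return (naive, corrected) correlation estimates on synthetic data.

    Hypothetical setup: true log-mRNA and log-protein levels are standard
    normal with correlation `true_rho`; each is measured twice with
    independent Gaussian noise of standard deviation `noise_sd`.
    """
    rng = random.Random(seed)

    # Latent (noise-free) levels with the chosen correlation.
    mrna_true, prot_true = [], []
    for _ in range(n_genes):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        mrna_true.append(z1)
        prot_true.append(true_rho * z1 + math.sqrt(1.0 - true_rho ** 2) * z2)

    def noisy(values):
        # One replicate measurement: latent value plus independent noise.
        return [v + rng.gauss(0.0, noise_sd) for v in values]

    m1, m2 = noisy(mrna_true), noisy(mrna_true)
    p1, p2 = noisy(prot_true), noisy(prot_true)

    # Naive estimate: correlate one noisy mRNA set with one noisy protein set.
    naive = pearson(m1, p1)

    # Reliability of each measurement, estimated from replicate agreement.
    rel_mrna = pearson(m1, m2)
    rel_prot = pearson(p1, p2)

    # Spearman's correction for attenuation (not the paper's estimator).
    corrected = naive / math.sqrt(rel_mrna * rel_prot)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"naive: {naive:.3f}  corrected: {corrected:.3f}")
```

Under these assumptions each measurement has reliability 1 / (1 + `noise_sd`²) = 0.5, so the naive correlation is expected to fall near 0.9 × 0.5 = 0.45, while the corrected value should land close to the simulated 0.9. The correction relies on independent, homoscedastic noise and complete data; it does not address between-study biases or nonignorable missingness, which the abstract identifies as further sources of error.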