American Statistical Association
With advances in genomic technologies, it is common to have two high-dimensional datasets available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers that is the best measure of the underlying biological process. This same biological process may also be measured by W, which comes from a prior technology but is correlated with X. On a moderately sized sample we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost prediction of Y by X. When p is large and the sub-sample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, our goal is not the typical one of calibrating W for use in future studies, but rather to use the available data on W and Y in the larger sample while predicting Y with X in new subjects. We propose to shrink the regression coefficients of Y on X towards different targets that use information derived from W in the larger dataset, and we compare these with the classical ridge regression of Y on X, which does not use W. We unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator, which is a linear combination of multiple estimators of the regression coefficients and balances efficiency and robustness in a data-adaptive way to theoretically yield smaller prediction error than any of its constituents. We also explore fully Bayesian methods with hierarchical priors to conduct joint analysis of all the available data. The methods are evaluated via simulation studies. We apply them to a gene-expression dataset related to lung cancer. mRNA expression of 91 genes is measured by quantitative real-time polymerase chain reaction (qRT-PCR) and microarray technology on 47 lung cancer patients, with microarray measurements available on an additional 392 patients. The goal is to predict survival time using qRT-PCR. The methods are evaluated on an independent sample of 101 patients.
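The abstract does not state the estimator's formula. As a rough illustration only, one common way to shrink regression coefficients towards a target is to minimize ||y - Xb||^2 + lam * ||b - target||^2, which has the closed form b = (X'X + lam*I)^{-1} (X'y + lam*target). The sketch below assumes that form; the function name `targeted_ridge`, the simulated data, and the placeholder target are all hypothetical and are not taken from the talk, where the targets are derived from W in the larger sample.

```python
import numpy as np

def targeted_ridge(X, y, target, lam):
    """Minimize ||y - X b||^2 + lam * ||b - target||^2 over b.

    Assumed form of a ridge estimator shrunk towards `target`;
    target = 0 gives classical ridge regression.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)      # invertible for lam > 0, even when p > n
    rhs = X.T @ y + lam * target
    return np.linalg.solve(A, rhs)

# Simulated p > n data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

# Classical ridge: shrink towards zero, ignoring any outside information.
classical = targeted_ridge(X, y, np.zeros(p), lam=1.0)

# Placeholder target standing in for information derived from W.
target = beta_true + 0.1 * rng.normal(size=p)
shrunk = targeted_ridge(X, y, target, lam=1e8)  # very large lam: estimate ~ target
```

In this assumed form, lam controls how strongly the estimate is pulled towards the target: a small lam trusts the (Y,X) sample, while a large lam trusts the target.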
Date: Wednesday, July 24, 2013
Time: 11:00 A.M. - 12:00 P.M.
Memorial Sloan-Kettering Cancer Center
Department of Epidemiology and Biostatistics
307 East 63rd Street
(between First and Second Avenues)
3rd Floor Conference Room
New York, New York
Note: To gain access to the building, please follow the directions by the telephone in the foyer.