American Statistical Association
New York City
Metropolitan Area Chapter

Memorial Sloan-Kettering Cancer Center
Biostatistics Special Summer Seminar



Bhramar Mukherjee
Department of Biostatistics
University of Michigan

INCORPORATING AUXILIARY INFORMATION FOR IMPROVED PREDICTION
IN HIGH DIMENSIONAL DATASETS: AN ENSEMBLE OF SHRINKAGE APPROACHES

With advances in genomic technologies, it is common that two high-dimensional datasets are available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers that is the best measure of the underlying biological process. This same biological process may also be measured by W, coming from prior technology but correlated with X. On a moderately sized sample we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost prediction of Y by X. When p is large and the sub-sample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, our goal is not the typical one of calibrating W for use in future studies, but rather of using the available data on W and Y in the larger sample while predicting Y with X in new subjects. We propose to shrink the regression coefficients of Y on X towards different targets that use information derived from W in the larger dataset, comparing these with the classical ridge regression of Y on X, which does not use W. We unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator which is a linear combination of multiple estimators of the regression coefficients and balances efficiency and robustness in a data-adaptive way to theoretically yield smaller prediction error than any of its constituents. We also explore fully Bayesian methods with hierarchical priors to conduct joint analysis of all the available data. The methods are evaluated via simulation studies. We apply them to a gene-expression dataset related to lung cancer. mRNA expression of 91 genes is measured by quantitative real-time polymerase chain reaction (qRT-PCR) and microarray technology on 47 lung cancer patients, with microarray measurements available on an additional 392 patients. The goal is to predict survival time using qRT-PCR. The methods are evaluated on an independent sample of 101 patients.
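
As a rough illustration of what shrinking towards a target can look like, the sketch below uses a generic textbook formulation: minimize ||Y - Xb||^2 + lam * ||b - target||^2, whose closed-form solution is b = (X'X + lam*I)^(-1) (X'Y + lam*target). This is an assumption made for illustration and is not necessarily the estimator presented in the talk. The function name targeted_ridge, the penalty lam, the simulated data, and the target vector (a stand-in for information derived from W) are all hypothetical.

```python
import numpy as np

def targeted_ridge(X, y, target, lam):
    """Minimize ||y - X b||^2 + lam * ||b - target||^2 over b (closed form)."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)      # positive definite for lam > 0, even when p > n
    rhs = X.T @ y + lam * target       # the target enters only through this term
    return np.linalg.solve(A, rhs)

# Simulated p > n example; every quantity here is made up for illustration.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + rng.standard_normal(n)

# Hypothetical target: a noisy version of the truth, standing in for
# coefficients informed by W in the larger sample.
target = beta_true + 0.1 * rng.standard_normal(p)

b_ridge = targeted_ridge(X, y, np.zeros(p), lam=5.0)   # classical ridge: target = 0
b_target = targeted_ridge(X, y, target, lam=5.0)       # shrink towards the target
```

Setting the target to zero recovers classical ridge regression, and letting lam grow pulls the estimate all the way to the target, which is why the choice of target and penalty matters.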

This is joint work with Jeremy Taylor and Philip S. Boonstra.


Date: Wednesday, July 24, 2013
Time: 11:00 A.M. - 12:00 P.M.
Location: Memorial Sloan-Kettering Cancer Center
Department of Epidemiology and Biostatistics
307 East 63rd Street
(between First and Second Avenues)
3rd Floor Conference Room
New York, New York
Note: To gain access to the building, please follow the directions by the telephone in the foyer.

RESERVATIONS ARE NOT REQUIRED



Page last modified on July 12, 2013

Copyright © 1998-2013 by New York City Metropolitan Area Chapter of the ASA
Designed and maintained by Cynthia Scherer
Send questions or comments to nycasa@mindspring.com