American Statistical Association
New York City
Metropolitan Area Chapter

New York State Psychiatric Institute
at Columbia University Medical Center
Biostatistics Seminar



MODEL SELECTION CRITERIA BASED ON
COMPUTATIONALLY INTENSIVE ESTIMATORS
OF THE EXPECTED OPTIMISM

by

Joseph E. Cavanaugh, Ph.D.
Department of Biostatistics
Department of Statistics and Actuarial Science
The University of Iowa


Abstract

A model selection criterion is often formulated by constructing an approximately unbiased estimator of an expected discrepancy, a measure that gauges the separation between the true model and a fitted candidate model. The expected discrepancy reflects how well, on average, the fitted candidate model predicts “new” data generated under the true model. A related measure, the estimated discrepancy, reflects how well the fitted candidate model predicts the data at hand. In general, a model selection criterion consists of a goodness-of-fit term and a penalty term. The natural estimator of the expected discrepancy, the estimated discrepancy, corresponds to the goodness-of-fit term of the criterion. However, the estimated discrepancy yields an overly optimistic assessment of how effectively the fitted model predicts new data. It therefore serves as a negatively biased estimator of the expected discrepancy. Correcting for this bias leads to the penalty term. Specifically, the penalty term provides an approximation to the expectation of the difference between the expected discrepancy and the estimated discrepancy, a measure known as the expected optimism.
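The relationships described above can be written compactly. The notation below is assumed for illustration only and is not taken from the talk: $\hat{d}$ stands for the estimated discrepancy (the goodness-of-fit term), $\Delta$ for the expected discrepancy, $\Omega$ for the expected optimism, and $\hat{\Omega}$ for whatever approximation to $\Omega$ serves as the penalty term.

```latex
\begin{align*}
\text{negative bias of the goodness-of-fit term:} \quad & E[\hat{d}\,] < \Delta, \\
\text{expected optimism:} \quad & \Omega = E[\Delta - \hat{d}\,] = \Delta - E[\hat{d}\,], \\
\text{criterion:} \quad & \hat{d} + \hat{\Omega}, \qquad \text{where } E[\hat{d} + \Omega] = \Delta .
\end{align*}
```

A familiar instance is the Akaike information criterion, which pairs the goodness-of-fit term $-2 \ln L(\hat{\theta})$ with the penalty term $2k$, where $k$ is the number of estimated parameters.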

Classical approaches to approximating the expected optimism often lead to simplistic penalty terms based on the sample size and the dimension of the fitted candidate model. However, such approaches generally involve large-sample arguments, restrictive assumptions on the form of the candidate model, or both. The resulting penalty terms may fail to perform adequately in small-sample applications or in settings where the requisite assumptions do not hold. Modern computational statistical methods, such as Monte Carlo simulation, bootstrapping, and cross validation, facilitate the development of flexible and accurate estimators of the expected optimism. Model selection criteria based on such penalty terms often provide more realistic measures of predictive efficacy than their classical counterparts, thereby resulting in superior model determinations.
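As a rough illustration of the Monte Carlo idea, the sketch below approximates the expected optimism for a simple linear regression by simulation. It is not the speaker's method: the squared-error discrepancy, the function names, and the assumption that the true model is known (as it would be in a simulation study) are all choices made here for illustration. Under these assumptions the classical value is 2 * p * sigma^2, with p = 2 regression parameters, which gives a check on the simulated estimate.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def sse(x, y, intercept, slope):
    """Sum of squared prediction errors of the fitted line on (x, y)."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def simulate(x, intercept, slope, sigma, rng):
    """Draw one response vector from the (assumed known) true model."""
    return [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]


def monte_carlo_optimism(x, intercept, slope, sigma, n_rep, seed=0):
    """Average, over n_rep simulated data sets, of the error on new data
    minus the error on the data used for fitting."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        y = simulate(x, intercept, slope, sigma, rng)      # data at hand
        y_new = simulate(x, intercept, slope, sigma, rng)  # "new" data
        a, b = fit_line(x, y)
        total += sse(x, y_new, a, b) - sse(x, y, a, b)
    return total / n_rep


if __name__ == "__main__":
    x = [float(i) for i in range(20)]
    estimate = monte_carlo_optimism(x, 1.0, 0.5, 1.0, n_rep=20000, seed=1)
    print("simulated expected optimism:", round(estimate, 3))
    print("classical value 2 * p * sigma^2:", 2 * 2 * 1.0 ** 2)
```

In practice the true model is unknown, so the simulation step would have to be replaced by something computable from the data, such as resampling in a bootstrap or data splitting in cross validation.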

In this talk, we review the general paradigm for discrepancy-based model selection criteria, and discuss computationally intensive approaches to approximating the expected optimism. We illustrate the utility of some of the resulting criteria in a simulation study based on the state-space time series modeling framework, and in an application from dentistry that involves generalized linear models.

Biographical Note

Joseph E. Cavanaugh is currently a Professor in the Department of Biostatistics at The University of Iowa with a secondary appointment in the Department of Statistics and Actuarial Science. He obtained his Ph.D. in Statistics in 1993 from the University of California, Davis. Dr. Cavanaugh’s research interests include biostatistics, model selection, time series analysis, state-space models, linear models, mixed models, modeling diagnostics, discrimination and classification, and computational statistics.


Date: Wednesday, February 25, 2009
Time: Presentation: 11:00 A.M. - 12:00 P.M.
Discussion: 12:00 - 12:30 P.M.
Location: Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
Biostatistics Computer Lab
6th Floor - Room 656
New York, New York

RESERVATIONS ARE NOT REQUIRED

Coffee will be served.



Copyright © 2009 by New York City Metropolitan Area Chapter of the ASA