American Statistical Association
We discuss a general roadmap for causal inference based on observational studies used to generate real-world evidence. The roadmap defines the statistical estimation problem in terms of knowledge about the data-generating experiment and a target estimand, where the target estimand is chosen to identify or best approximate the causal quantity of interest. We review targeted minimum loss estimation (TMLE), which provides a general template for constructing asymptotically efficient plug-in estimators of a target estimand in realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure. The first stage uses ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric model-based estimators) to build a single most powerful machine learning algorithm. In the second stage, TMLE maximizes a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data, where this least favorable parametric model also involves an estimator of the treatment and censoring mechanism. This second stage bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also present collaborative TMLE, an approach that regularizes the targeting step through targeted estimation of the treatment and censoring mechanism, thereby further optimizing and robustifying the TMLE.
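To make the two-stage procedure concrete, here is a minimal sketch of TMLE for the average causal effect of a binary treatment on a binary outcome. Everything specific in it is an assumption made for illustration and is not taken from the talk: the simulated data-generating process (true effect 0.2), the function names `simulate` and `tmle_ate`, the absence of censoring, and the use of a deliberately misspecified initial outcome fit (arm means that ignore the covariate) in place of a super-learner.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n, seed=0):
    """Hypothetical data: W confounds treatment A and outcome Y; true effect is 0.2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        y = 1 if rng.random() < 0.2 + 0.2 * a + 0.4 * w else 0
        data.append((w, a, y))
    return data


def tmle_ate(data):
    # Stage 1a: initial outcome regression. Deliberately misspecified here
    # (arm means that ignore W); a super-learner fit would go in its place.
    q0 = {a: mean(y for (_, ai, y) in data if ai == a) for a in (0, 1)}
    # Stage 1b: treatment mechanism g(w) = P(A = 1 | W = w), by stratification.
    g = {w: mean(a for (wi, a, _) in data if wi == w) for w in (0, 1)}

    def clever(a, w):
        # "Clever covariate" spanning the least favorable submodel for the effect.
        return a / g[w] - (1 - a) / (1.0 - g[w])

    # Stage 2: maximize the Bernoulli likelihood over the one-dimensional
    # fluctuation logit Q_eps = logit Q0 + eps * H, using Newton's method.
    eps = 0.0
    for _ in range(100):
        score, info = 0.0, 0.0
        for w, a, y in data:
            h = clever(a, w)
            p = expit(logit(q0[a]) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break

    def q_star(a, w):
        return expit(logit(q0[a]) + eps * clever(a, w))

    # Plug-in estimate from the targeted (updated) outcome regression.
    ate = mean(q_star(1, w) - q_star(0, w) for (w, _, _) in data)
    naive = q0[1] - q0[0]  # plug-in estimate from the untargeted initial fit
    return naive, ate, eps


if __name__ == "__main__":
    naive, ate, eps = tmle_ate(simulate(20000, seed=1))
    print("initial plug-in:", naive, "targeted plug-in:", ate, "epsilon:", eps)
```

In this toy setup the initial plug-in estimate is confounded by W, while the targeting step uses the estimated treatment mechanism to pull the estimate back toward the true effect. Because the treatment mechanism is estimated by exact stratification on a single binary covariate, the targeted estimate here coincides with simple stratified standardization; the value of the targeting step shows up in realistic models where no such saturated estimator is available.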
The asymptotic normality and efficiency of the TMLE rely on the asymptotic negligibility of a second-order remainder term. This typically requires the initial (super-learner) estimator to converge at a rate faster than n^(-1/4) in sample size n. We show that a new Highly Adaptive LASSO (HAL) estimator of the data distribution and its functionals indeed converges at a sufficient rate regardless of the dimensionality of the data/model, under almost no additional regularity conditions. This allows us to propose a general TMLE, using a super-learner whose library includes HAL, that is asymptotically normal and efficient in great generality.
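As an illustration of where the n^(-1/4) requirement comes from, consider the treatment-specific mean, one half of the average causal effect. The notation below is assumed for this sketch and does not appear in the abstract: \bar{Q}(1,W) is an estimate of the outcome regression E[Y | A = 1, W], g(1|W) is an estimate of the treatment mechanism, a subscript 0 marks the true quantities, \|\cdot\| is the L^2(P_0) norm, and \delta > 0 is an assumed lower bound on g(1|W).

```latex
\[
R_2(P, P_0)
  = E_0\!\left[
      \frac{g(1 \mid W) - g_0(1 \mid W)}{g(1 \mid W)}
      \,\bigl(\bar{Q}(1, W) - \bar{Q}_0(1, W)\bigr)
    \right],
\qquad
|R_2(P, P_0)|
  \le \frac{1}{\delta}\,
      \| g - g_0 \| \, \| \bar{Q} - \bar{Q}_0 \| .
\]
```

The bound follows from the Cauchy-Schwarz inequality. If both estimators converge faster than n^{-1/4}, the product is o_P(n^{-1/2}), so the remainder is negligible relative to the first-order term that drives the normal limit.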
We demonstrate the practical performance of the corresponding HAL-TMLE (and its confidence intervals) for the average causal effect for dimensions up to 10, based on simulations that randomly generate data distributions. We also discuss a nonparametric bootstrap method for inference that takes into account the higher-order contributions of the HAL-TMLE, providing excellent robust coverage.
Date: Thursday, October 17, 2019
Time: 11:30 A.M. - 12:30 P.M.
Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
8th Floor Auditorium
New York, New York