American Statistical Association
New York City
Metropolitan Area Chapter

Levin Lecture Series: Fall 2019 Colloquium Seminars
Department of Biostatistics
Mailman School of Public Health
Columbia University



TARGETED MACHINE LEARNING FOR
CAUSAL INFERENCE BASED ON REAL WORLD DATA

by

Dr. Mark van der Laan
Jiann-Ping Hsu/Karl E. Peace Endowed Chair and Professor of Biostatistics
University of California - Berkeley

Host: Caleb Miles


Abstract

We discuss a general roadmap for causal inference based on observational studies used to generate real-world evidence. The roadmap defines the statistical estimation problem in terms of knowledge about the data-generating experiment and a target estimand, where the target estimand is chosen to identify or best approximate the causal quantity of interest. We review targeted minimum loss estimation (TMLE), which provides a general template for constructing asymptotically efficient plug-in estimators of a target estimand in realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure. The first stage uses ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric model-based estimators) to build a single most powerful machine learning algorithm. In the second stage, the TMLE maximizes a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data, where this least favorable parametric model also involves an estimator of the treatment and censoring mechanism. This second stage bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (e.g., confidence intervals and p-values). We present an approach, collaborative TMLE, that regularizes the targeting step through targeted estimation of the treatment and censoring mechanism, thereby further optimizing and robustifying the TMLE.
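The two-stage procedure described above can be illustrated with a minimal sketch for the average treatment effect of a binary treatment A on a binary outcome Y given covariates W, with no censoring. This is not the speaker's implementation: plain logistic regressions stand in for the super-learner, the data are simulated, and all function names (fit_logistic, tmle_ate) and the propensity bound g_bound are hypothetical choices made for illustration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression by Newton-Raphson, with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def tmle_ate(W, A, Y, g_bound=0.01):
    """Sketch of TMLE for E[Y(1)] - E[Y(0)] with binary A and Y."""
    n = len(Y)
    ones = np.ones((n, 1))

    # Stage 1: initial fits. ASSUMPTION: logistic regressions replace the
    # super-learner ensemble mentioned in the abstract.
    beta_q = fit_logistic(np.hstack([ones, A[:, None], W]), Y)
    Q1 = np.clip(expit(np.hstack([ones, ones, W]) @ beta_q), 1e-6, 1 - 1e-6)
    Q0 = np.clip(expit(np.hstack([ones, 0 * ones, W]) @ beta_q), 1e-6, 1 - 1e-6)
    Xg = np.hstack([ones, W])
    g1 = np.clip(expit(Xg @ fit_logistic(Xg, A)), g_bound, 1 - g_bound)

    # Stage 2: targeting. Fit the one-dimensional fluctuation parameter eps
    # by maximum likelihood, with the initial fit as offset and the
    # "clever covariate" H as the only regressor.
    H1, H0 = 1.0 / g1, -1.0 / (1.0 - g1)
    HA = np.where(A == 1, H1, H0)
    QA = np.where(A == 1, Q1, Q0)
    eps = fit_logistic(HA[:, None], Y, offset=logit(QA))[0]
    Q1s = expit(logit(Q1) + eps * H1)
    Q0s = expit(logit(Q0) + eps * H0)
    QAs = np.where(A == 1, Q1s, Q0s)

    # Plug-in estimate and influence-curve-based standard error.
    psi = np.mean(Q1s - Q0s)
    ic = HA * (Y - QAs) + (Q1s - Q0s) - psi
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return psi, se, (psi - 1.96 * se, psi + 1.96 * se)

# Simulated example (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.5 * W[:, 0] - 0.5 * W[:, 1])).astype(float)
Y = rng.binomial(1, expit(-0.5 + A + 0.7 * W[:, 0])).astype(float)
psi, se, ci = tmle_ate(W, A, Y)
print(f"TMLE estimate: {psi:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The targeting step changes the initial fit only through the single parameter eps, which is what makes the Wald-type confidence interval in the last lines of tmle_ate available. In practice the stage 1 fits would come from a cross-validated ensemble rather than the two parametric models used here.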

The asymptotic normality and efficiency of the TMLE rely on the asymptotic negligibility of a second-order remainder term. This typically requires the initial (super-learner) estimator to converge at a rate faster than n^(-1/4) in sample size n. We show that a new Highly Adaptive LASSO (HAL) estimator of the data distribution and its functionals indeed converges at a sufficient rate regardless of the dimensionality of the data/model, under almost no additional regularity conditions. This allows us to propose a general TMLE, using a super-learner whose library includes HAL, that is asymptotically normal and efficient in great generality.
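To see where the n^(-1/4) rate comes from, consider as a worked example the treatment-specific mean \psi(P) = E_P[\bar Q(1, W)], where \bar Q(a, w) = E_P[Y | A = a, W = w] and g(1 | w) = P(A = 1 | W = w). This notation and the choice of estimand are assumptions made for illustration; the abstract does not fix them. The standard expansion of a TMLE P_n^* that solves the efficient influence curve equation for this estimand is:

```latex
\psi(P_n^*) - \psi(P_0) = (P_n - P_0)\, D^*(P_n^*) + R_2(P_n^*, P_0),
\qquad
R_2(P, P_0) = E_{P_0}\!\left[ \frac{g(1 \mid W) - g_0(1 \mid W)}{g(1 \mid W)}
  \,\bigl( \bar Q(1, W) - \bar Q_0(1, W) \bigr) \right],

\text{and, if } g(1 \mid W) \ge \delta > 0, \text{ the Cauchy-Schwarz inequality gives}
\quad
\lvert R_2(P, P_0) \rvert \le \frac{1}{\delta}\,
  \lVert g - g_0 \rVert_{L^2(P_0)} \, \lVert \bar Q - \bar Q_0 \rVert_{L^2(P_0)} .
```

The remainder is a product of the two estimation errors, so if each is o_P(n^(-1/4)), the product is o_P(n^(-1/2)) and is negligible relative to the leading empirical-process term.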

We demonstrate the practical performance of the corresponding HAL-TMLE (and its confidence intervals) for the average causal effect in dimensions up to 10, based on simulations that randomly generate data distributions. We also discuss a nonparametric bootstrap method for inference that takes into account the higher-order contributions of the HAL-TMLE, providing excellent robust coverage.


Date: Thursday, October 17, 2019
Time: 11:30 A.M. - 12:30 P.M.
Location: Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
AR Building
8th Floor Auditorium
New York, New York

Page last modified on October 11, 2019
Copyright © 1998-2019 by New York City Metropolitan Area Chapter of the ASA
Designed and maintained by Cynthia Scherer
Send questions or comments to nycasa@nycasa.org