
Machine Learning and Causal Inference

for Policy Evaluation


Susan Athey
Stanford Graduate School of Business
655 Knight Way
Stanford, CA 94305
1-650-725-1813
[email protected]

ABSTRACT

A large literature on causal inference in statistics, econometrics, biostatistics, and epidemiology (see, e.g., Imbens and Rubin [2015] for a recent survey) has focused on methods for statistical estimation and inference in a setting where the researcher wishes to answer a question about the (counterfactual) impact of a change in a policy, or “treatment” in the terminology of the literature. The policy change has not necessarily been observed before, or may have been observed only for a subset of the population; examples include a change in minimum wage law or a change in a firm’s price. The goal is then to estimate the impact of a small set of “treatments” using data from randomized experiments or, more commonly, “observational” studies (that is, non-experimental data). The literature identifies a variety of assumptions that, when satisfied, allow the researcher to draw the same types of conclusions that would be available from a randomized experiment. To estimate causal effects given non-random assignment of individuals to alternative policies in observational studies, popular techniques include propensity score weighting, matching, and regression analysis; all of these methods adjust for differences in observed attributes of individuals.
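As a concrete illustration of one of these techniques, the following minimal sketch applies inverse propensity score weighting to simulated data. The data-generating process, variable names, and the logistic-regression propensity model are assumptions for illustration, not part of the talk:

```python
# Minimal sketch: inverse propensity score weighting (IPW) in an
# observational study. The simulated data and the logistic-regression
# propensity model are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                  # observed attributes
p_true = 1 / (1 + np.exp(-X[:, 0]))          # treatment more likely when X[:, 0] is high
W = rng.binomial(1, p_true)                  # non-random treatment assignment
Y = 2.0 * W + X[:, 0] + rng.normal(size=n)   # outcome; true effect is 2.0

# Model the probability of treatment given attributes (the propensity score).
e_hat = LogisticRegression().fit(X, W).predict_proba(X)[:, 1]

# Weight each unit by the inverse probability of the treatment it actually
# received; the weighted contrast adjusts for confounding in X.
ate_ipw = np.mean(W * Y / e_hat - (1 - W) * Y / (1 - e_hat))
print(f"IPW estimate of the average treatment effect: {ate_ipw:.2f}")
```

Matching and regression adjustment target the same confounding problem, differing in how the adjustment for observed attributes is carried out.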
Another strand of literature in econometrics, referred to as “structural modeling,” fully specifies the preferences of actors as well as a behavioral model, and estimates those parameters from data (for applications to auction-based electronic commerce, see Athey and Haile [2007] and Athey and Nekipelov [2012]). In both cases, parameter estimates are interpreted as “causal,” and they are used to make predictions about the effect of policy changes.
In contrast, the supervised machine learning literature has traditionally focused on prediction, providing data-driven approaches to building rich models and relying on cross-validation as a powerful tool for model selection. These methods have been highly successful in practice. This talk will review several recent papers that attempt to bring the tools of supervised machine learning to bear on the problem of policy evaluation, where the papers are connected by three themes.

The first theme is that it is important for both estimation and inference to distinguish between parts of the model that relate to the causal question of interest, and “attributes,” that is, features or variables that describe attributes of individual units that are held fixed when policies change. Specifically, we propose to divide the features of a model into causal features, whose values may be manipulated in a counterfactual policy environment, and attributes. A second theme is that relative to conventional tools from the policy evaluation literature, tools from supervised machine learning can be particularly effective at modeling the association of outcomes with attributes, as well as in modeling how causal effects vary with attributes. A final theme is that modifications of existing methods may be required to deal with the “fundamental problem of causal inference,” namely, that no unit is observed in multiple counterfactual worlds at the same time: we do not see a patient at the same time with and without the medication, and we do not see a consumer at the same moment exposed to two different prices. This creates a substantial challenge for cross-validation, as the ground truth for the causal effect is not observed for any individual.

The talk reviews several lines of research that incorporate these themes. The first, exemplified by Athey and Imbens [2015a], focuses on estimating heterogeneity in treatment effects, identifying (based on unit attributes) subpopulations of units that have larger or smaller than average treatment effects. The method enables valid inference: confidence intervals for the size of the treatment effect in each subpopulation are derived. Thus, large-scale randomized experiments for drugs or A/B tests in online settings can be evaluated systematically, with the method discovering the magnitude of treatment effect heterogeneity. The challenge in this setting is to find a method that is optimized for the problem of predicting causal effects, rather than for predicting outcomes. The approach can also be applied to observational studies under some additional conditions. Our approach addresses the problem of cross-validation by constructing an unbiased (but noisy) estimate of each unit’s treatment effect. More generally, we pose the question of how best to modify supervised machine learning methods to use estimated parameters rather than observed data in cross-validation. In ongoing research, we explore this question in a variety of settings.
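To make the unbiased-but-noisy estimate concrete: for a randomized experiment with known treatment probability, Athey and Imbens [2015a] use a “transformed outcome” whose conditional expectation equals the treatment effect, so it can stand in for the unobserved ground truth during cross-validation. Below is a minimal sketch; the variable names and the simulated experiment are illustrative:

```python
# Sketch: the transformed outcome for cross-validation on causal effects.
# With treatment probability e known (a randomized experiment),
#   Y_star = W * Y / e - (1 - W) * Y / (1 - e)
# satisfies E[Y_star | X] = tau(X), the conditional average treatment
# effect, so Y_star can serve as a noisy ground truth for model selection.
import numpy as np

def transformed_outcome(y, w, e):
    """Unbiased (but noisy) unit-level estimate of the treatment effect."""
    return w * y / e - (1 - w) * y / (1 - e)

def causal_cv_loss(tau_hat, y, w, e):
    """Cross-validation criterion: MSE of predicted effects against Y_star."""
    return float(np.mean((tau_hat - transformed_outcome(y, w, e)) ** 2))

# Illustration: a 50/50 experiment where the true effect is 1 + x.
rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
w = rng.binomial(1, 0.5, size=n)
y = (1 + x) * w + rng.normal(size=n)

print(causal_cv_loss(1 + x, y, w, 0.5))        # near-oracle model: lower loss
print(causal_cv_loss(np.zeros(n), y, w, 0.5))  # poor model: higher loss
```

Because no unit-level effect is ever observed, the loss against the transformed outcome is what makes it possible to rank candidate models of the treatment effect.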
A second line of research analyzes the robustness of causal estimates. In applied social science studies of the impact of policy changes, it is common for researchers to present a handful of alternative models to assess the robustness of the causal estimates. Although the importance of model robustness has been highlighted by many researchers (e.g., Leamer [1983]), to date no metric for the robustness of a model has gained widespread adoption in the policy evaluation literature. Athey and Imbens [2015b] propose a measure of robustness of parameter estimates. A starting point is to define the causal estimand of interest as well as the attributes of individuals in the dataset (features that may affect the robustness of the causal estimate). The method for constructing the robustness measure is inspired by the machine learning technique of regression trees. The sample is split according to each attribute in turn, and the original model is re-estimated on the two subsamples. The split point is determined as the one that leads to the greatest improvement in model fit. An alternative estimate of the causal effect is constructed by taking a weighted average of the estimates in the two subsamples. The robustness measure is then defined as the standard deviation of the estimates, taken over all of the alternative estimates (one for each attribute). This measure has some attractive properties: there is no need to define an estimation approach other than the one used in the baseline model, and the measure is robust to monotone transformations of the individual attributes. The measure lacks other desirable properties, however: it can be reduced by adding irrelevant attributes to the model, for example. An ongoing research agenda addresses this and other issues.
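A sketch of that construction follows, with ordinary least squares as a stand-in baseline model and a simple quantile grid over candidate split points; these implementation details are assumptions for illustration and may differ from the paper’s exact choices:

```python
# Sketch of a robustness measure in the spirit of Athey and Imbens [2015b].
# Baseline model (an assumption here): OLS of y on treatment w plus
# attributes X. Split criterion (also an assumption): total SSE reduction.
import numpy as np

def effect_and_sse(y, w, X):
    """Fit y ~ [1, w, X] by least squares; return (coefficient on w, SSE)."""
    Z = np.column_stack([np.ones(len(y)), w, X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return beta[1], float(resid @ resid)

def robustness_measure(y, w, X, n_grid=20, min_leaf=30):
    base_effect, _ = effect_and_sse(y, w, X)
    alternatives = []
    for j in range(X.shape[1]):                 # split on each attribute in turn
        best = None
        for c in np.quantile(X[:, j], np.linspace(0.1, 0.9, n_grid)):
            left = X[:, j] <= c
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue                        # skip degenerate splits
            eff_l, sse_l = effect_and_sse(y[left], w[left], X[left])
            eff_r, sse_r = effect_and_sse(y[~left], w[~left], X[~left])
            fit = sse_l + sse_r                 # smaller = better model fit
            if best is None or fit < best[0]:
                # Alternative estimate: sample-size-weighted average of the
                # effects estimated on the two subsamples.
                best = (fit, left.mean() * eff_l + (~left).mean() * eff_r)
        if best is not None:
            alternatives.append(best[1])
    # Robustness measure: spread of the alternative estimates across attributes.
    return base_effect, float(np.std(alternatives))
```

Note that splitting on quantiles is one reason the measure is unaffected by monotone transformations of the attributes, as described above.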

Finally, Abadie et al. [2014] consider the problem of inference in environments where the researcher may observe a large part of a population, or an entire population. It is typical in social science to treat causal features and attributes symmetrically when conducting inference about parameter estimates, and to justify inference by appealing to the idea that the data are a random sample from a larger population. We argue that this convention is not appropriate, and that the source of uncertainty for causal estimands is not purely sampling variation; rather, uncertainty arises because we do not observe all of the potential outcomes for any unit. The distinction is especially clear if we observe the entire population of interest: we may observe average income for all fifty states or all countries in the world, or we may observe all advertisers or sellers or consumers on an electronic commerce platform. When the population is observed, there is no uncertainty about the answers to questions such as: what is the average difference in income or average online purchases between coastal and interior states? On the other hand, if we attempt to estimate the effect of changing minimum wage policy or prices, we have residual uncertainty about the effect of making such a change even if we observe a randomized experiment comparing the two policies, as we do not observe any given unit under multiple policies at the same time. We propose an alternative approach to conducting inference in regression models that takes these factors into account, showing that in general conventional standard errors are conservative. More broadly, this paper highlights the theme that the theory of inference is different for causal estimates than it is for parameter estimates associated with fixed attributes of individuals.
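The flavor of the conservativeness result can be seen in the classic Neyman setup for a completely randomized experiment on a fixed, fully observed population, a special case of the regression setting treated by Abadie et al. [2014]: the conventional variance estimator omits a non-positive correction involving the (unobservable) variance of unit-level treatment effects. A small simulation sketch under assumed potential outcomes:

```python
# Sketch: conventional standard errors are conservative for a causal
# estimand on a fixed, fully observed population. The Neyman estimator
#   s1^2 / n1 + s0^2 / n0
# overstates the true design variance when treatment effects vary across
# units. The potential outcomes below are simulated assumptions.
import numpy as np

rng = np.random.default_rng(2)
N = 1_000
y0 = rng.normal(size=N)                    # outcome for each unit if untreated
y1 = y0 + rng.normal(1.0, 1.0, size=N)     # heterogeneous treatment effects
n1 = N // 2

def one_experiment():
    """Randomize treatment; return the estimate and its conventional variance."""
    treated = rng.permutation(N) < n1      # completely randomized assignment
    est = y1[treated].mean() - y0[~treated].mean()
    var_conv = y1[treated].var(ddof=1) / n1 + y0[~treated].var(ddof=1) / (N - n1)
    return est, var_conv

draws = np.array([one_experiment() for _ in range(2_000)])
print(f"avg conventional variance:    {draws[:, 1].mean():.4f}")
print(f"actual variance of estimator: {draws[:, 0].var():.4f}")  # smaller
```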
ACM Classification
• Computing methodologies~Supervised learning by regression
• Computing methodologies~Classification and regression trees
• Computing methodologies~Cross-validation

Keywords
Supervised machine learning, cross-validation, causal inference, model robustness, policy evaluation, counterfactual prediction, randomized experiments, A/B tests, treatment effects.

Short Biography
Susan Athey is The Economics of Technology Professor at Stanford Graduate School of Business. She received her bachelor's degree from Duke University and her Ph.D. from Stanford, and she holds an honorary doctorate from Duke University. She previously taught at the economics departments at MIT, Stanford and Harvard. In 2007, Professor Athey received the John Bates Clark Medal, awarded by the American Economic Association to “that American economist under the age of forty who is adjudged to have made the most significant contribution to economic thought and knowledge.” She was elected to the National Academy of Sciences in 2012 and to the American Academy of Arts and Sciences in 2008. Professor Athey’s research focuses on the economics of the internet, online advertising, the news media, marketplace design, virtual currencies and the intersection of computer science, machine learning and economics. She advises governments and businesses on marketplace design and platform economics, notably serving since 2007 as a long-term consultant to Microsoft Corporation in a variety of roles, including consulting chief economist.

REFERENCES
[1] Guido Imbens and Donald Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge, United Kingdom.
[2] Susan Athey and Philip Haile. 2007. Nonparametric approaches to auctions. In James J. Heckman and Edward E. Leamer, eds., Handbook of Econometrics, Volume 6. Elsevier, 3847-3965.
[3] Susan Athey and Denis Nekipelov. 2012. A Structural Model of Sponsored Search Advertising Auctions. Working paper, Stanford University. Retrieved May 30, 2015 from http://faculty-gsb.stanford.edu/athey/documents/Structural_Sponsored_Search.pdf.
[4] Susan Athey and Guido Imbens. 2015a. Machine learning methods for estimating heterogeneous causal effects. arXiv e-print 1504.01132. Retrieved May 30, 2015 from http://arxiv.org/abs/1504.01132.
[5] Edward Leamer. 1983. Let's take the con out of econometrics. American Economic Review 73, 1 (Mar. 1983), 31-43.
[6] Susan Athey and Guido Imbens. 2015b. A measure of robustness to misspecification. American Economic Review 105, 5 (May 2015), 476-480. DOI=10.1257/aer.p20151020.
[7] Alberto Abadie, Susan Athey, Guido Imbens, and Jeffrey Wooldridge. 2014. Finite population standard errors. NBER Working Paper 20325. Retrieved May 30, 2015 from http://www.nber.org/papers/w20325.
