Showing 1–32 of 32 results for author: Athey, S

Searching in archive cs.
  1. arXiv:2409.09894  [pdf, other]

    cs.LG econ.EM stat.ME stat.ML

    Estimating Wage Disparities Using Foundation Models

    Authors: Keyon Vafa, Susan Athey, David M. Blei

    Abstract: One thread of empirical work in social science focuses on decomposing group differences in outcomes into unexplained components and components explained by observable factors. In this paper, we study gender wage decompositions, which require estimating the portion of the gender wage gap explained by career histories of workers. Classical methods for decomposing the wage gap employ simple predictiv…

    Submitted 15 September, 2024; originally announced September 2024.

  2. arXiv:2406.17972  [pdf, other]

    cs.LG cs.CL econ.EM

    LABOR-LLM: Language-Based Occupational Representations with Large Language Models

    Authors: Tianyu Du, Ayush Kanodia, Herman Brunborg, Keyon Vafa, Susan Athey

    Abstract: Many empirical studies of labor market questions rely on estimating relatively simple predictive models using small, carefully constructed longitudinal survey datasets based on hand-engineered features. Large Language Models (LLMs), trained on massive datasets, encode vast quantities of world knowledge and can be used for the next job prediction problem. However, while an off-the-shelf LLM produce…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2405.04636  [pdf, ps, other]

    cs.LG stat.ML

    Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt

    Authors: Sanath Kumar Krishnamurthy, Susan Athey, Emma Brunskill

    Abstract: We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class o…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2310.08672  [pdf, other]

    econ.EM cs.LG stat.ME stat.ML

    Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal

    Authors: Susan Athey, Niall Keleher, Jann Spiess

    Abstract: In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with…

    Submitted 31 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  5. arXiv:2307.02108  [pdf, other]

    cs.LG stat.ML

    Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

    Authors: Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill

    Abstract: In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter de…

    Submitted 2 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

  6. arXiv:2305.12407  [pdf, other]

    cs.LG cs.DC econ.EM stat.ML

    Federated Offline Policy Learning with Heterogeneous Observational Data

    Authors: Aldo Gael Carranza, Susan Athey

    Abstract: We consider the problem of learning personalized decision policies on observational data from heterogeneous data sources. Moreover, we examine this problem in the federated setting where a central server aims to learn a policy on the data distributed across the heterogeneous sources without exchanging their raw data. We present a federated policy learning algorithm based on aggregation of local po…

    Submitted 21 May, 2023; originally announced May 2023.

  7. arXiv:2304.01906  [pdf, other]

    cs.LG cs.MS econ.EM

    Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python

    Authors: Tianyu Du, Ayush Kanodia, Susan Athey

    Abstract: The torch-choice is an open-source library for flexible, fast choice modeling with Python and PyTorch. torch-choice provides a ChoiceDataset data structure to manage databases flexibly and memory-efficiently. The paper demonstrates constructing a ChoiceDataset from databases of various formats and functionalities of ChoiceDataset. The package…

    Submitted 14 July, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  8. arXiv:2212.13638  [pdf, other]

    cs.SI stat.AP

    Battling the Coronavirus Infodemic Among Social Media Users in Kenya and Nigeria

    Authors: Molly Offer-Westort, Leah R. Rosenzweig, Susan Athey

    Abstract: How can we induce social media users to be discerning when sharing information during a pandemic? An experiment on Facebook Messenger with users from Kenya (n = 7,498) and Nigeria (n = 7,794) tested interventions designed to decrease intentions to share COVID-19 misinformation without decreasing intentions to share factual posts. The initial stage of the study incorporated: (i) a factorial design…

    Submitted 15 September, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

    Comments: 52 pages including appendix, 9 figures

  9. arXiv:2211.12004  [pdf, other]

    econ.EM cs.LG stat.ML

    Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning

    Authors: Susan Athey, Undral Byambadalai, Vitor Hadad, Sanath Kumar Krishnamurthy, Weiwen Leung, Joseph Jay Williams

    Abstract: We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted treatment assignment policy, where the goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation. The design balances two competing objectives: optimizing the outcomes for the subjects in the experiment ("cumulative regret minimization") and g…

    Submitted 21 November, 2022; originally announced November 2022.

    ACM Class: G.3; I.2.6

  10. arXiv:2203.16668  [pdf, other]

    cs.LG math.ST stat.ME stat.ML

    Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles

    Authors: Aldo Gael Carranza, Sanath Kumar Krishnamurthy, Susan Athey

    Abstract: Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, building on recent work on ora…

    Submitted 24 February, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

  11. arXiv:2202.08370  [pdf, other]

    cs.LG econ.EM

    CAREER: A Foundation Model for Labor Sequence Data

    Authors: Keyon Vafa, Emil Palikot, Tianyu Du, Ayush Kanodia, Susan Athey, David M. Blei

    Abstract: Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although machine learning methods offer promise for such problems, these survey datasets are too small to take advantage of them. In recent years large datasets of online resumes have also become available, providing data about the career trajectories of mil…

    Submitted 29 February, 2024; v1 submitted 16 February, 2022; originally announced February 2022.

  12. arXiv:2107.11732  [pdf, other]

    cs.LG econ.EM q-bio.QM stat.ME

    Federated Causal Inference in Heterogeneous Observational Data

    Authors: Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

    Abstract: We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the…

    Submitted 2 April, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

  13. arXiv:2106.06483  [pdf, ps, other]

    cs.LG stat.ML

    Towards Costless Model Selection in Contextual Bandits: A Bias-Variance Perspective

    Authors: Sanath Kumar Krishnamurthy, Adrienne Margaret Propp, Susan Athey

    Abstract: Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic contextual bandit setting. Recent work [Marinov and Zimmert, 2021] identifies instances where no algorithm can guarantee costless regret bounds. Nevertheless,…

    Submitted 23 October, 2023; v1 submitted 11 June, 2021; originally announced June 2021.

  14. arXiv:2106.02029  [pdf, other]

    stat.ML cs.LG stat.ME

    Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

    Authors: Ruohan Zhan, Vitor Hadad, David A. Hirshberg, Susan Athey

    Abstract: It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment assignment policies to guide future innovation or experiments. However, policy evaluation is challenging if the target policy differs from the one used to collect data, and popular estimators, including doubly robust (DR)…

    Submitted 10 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

  15. arXiv:2105.02344  [pdf, other]

    stat.ML cs.LG econ.EM

    Policy Learning with Adaptively Collected Data

    Authors: Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou

    Abstract: Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education. The growing policy learning literature focuses on settings where the data collection rule stays fixed throughout the experiment. However, adaptive data collection is becoming more common in practice, from two primary sources:…

    Submitted 16 November, 2022; v1 submitted 5 May, 2021; originally announced May 2021.

    Comments: Improved the upper bound; added simulations

  16. arXiv:2102.13240  [pdf, other]

    cs.LG stat.ML

    Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles

    Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

    Abstract: Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to…

    Submitted 11 June, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  17. arXiv:2010.13013  [pdf, other]

    cs.LG math.ST stat.ML

    Tractable contextual bandits beyond realizability

    Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

    Abstract: Tractable contextual bandit algorithms often rely on the realizability assumption - i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does no…

    Submitted 25 February, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: 35 pages, 6 figures

  18. arXiv:2002.09814  [pdf, other]

    cs.LG econ.EM stat.ML

    Survey Bandits with Regret Guarantees

    Authors: Sanath Kumar Krishnamurthy, Susan Athey

    Abstract: We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user. In a number of applications (like healthcare), collecting features from users can be costly. To address this issue, we propose algorithms that avoid needless feature collection while maintaining strong…

    Submitted 22 February, 2020; originally announced February 2020.

    Comments: 17 pages, 10 figures

  19. arXiv:2001.11713  [pdf, other]

    cs.LG stat.ML

    Stable Prediction with Model Misspecification and Agnostic Distribution Shift

    Authors: Kun Kuang, Ruoxuan Xiong, Peng Cui, Susan Athey, Bo Li

    Abstract: For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge on the test data and on the underlying true model. Under model misspecification, agnostic dis…

    Submitted 31 January, 2020; originally announced January 2020.

  20. arXiv:1911.02768  [pdf, other]

    stat.ML cs.LG stat.ME

    Confidence Intervals for Policy Evaluation in Adaptive Experiments

    Authors: Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, Susan Athey

    Abstract: Adaptive experiment designs can dramatically improve statistical efficiency in randomized trials, but they also complicate statistical inference. For example, it is now well known that the sample mean is biased in adaptive trials. Inferential challenges are exacerbated when our parameter of interest differs from the parameter the trial was designed to target, such as when we are interested in esti…

    Submitted 12 February, 2021; v1 submitted 7 November, 2019; originally announced November 2019.

  21. arXiv:1908.09874  [pdf, other]

    stat.ML cs.LG

    Sufficient Representations for Categorical Variables

    Authors: Jonathan Johannemann, Vitor Hadad, Susan Athey, Stefan Wager

    Abstract: Many learning algorithms require categorical data to be transformed into real vectors before it can be used as input. Often, categorical variables are encoded as one-hot (or dummy) vectors. However, this mode of representation can be wasteful since it adds many low-signal regressors, especially when the number of unique categories is large. In this paper, we investigate simple alternative solution…

    Submitted 28 October, 2021; v1 submitted 26 August, 2019; originally announced August 2019.

  22. arXiv:1906.02635  [pdf, other]

    cs.LG econ.EM stat.ML

    Counterfactual Inference for Consumer Choice Across Many Product Categories

    Authors: Rob Donnelly, Francisco R. Ruiz, David Blei, Susan Athey

    Abstract: This paper proposes a method for estimating consumer preferences among discrete choices, where the consumer chooses at most one product in a category, but selects from multiple categories in parallel. The consumer's utility is additive in the different categories. Her preferences about product attributes as well as her price sensitivity vary across products and are in general correlated across pro…

    Submitted 6 August, 2023; v1 submitted 6 June, 2019; originally announced June 2019.

    Journal ref: Quantitative Marketing and Economics, volume 19, pages 369-407 (2021)

  23. arXiv:1812.06227  [pdf, other]

    cs.LG stat.ML

    Balanced Linear Contextual Bandits

    Authors: Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

    Abstract: Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inf…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: AAAI 2019 Oral Presentation. arXiv admin note: substantial text overlap with arXiv:1711.07077

  24. arXiv:1810.04778  [pdf, other]

    stat.ML cs.LG econ.EM

    Offline Multi-Action Policy Learning: Generalization and Optimization

    Authors: Zhengyuan Zhou, Susan Athey, Stefan Wager

    Abstract: In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. While there is a growing body of literature devoted to this problem, most existing r…

    Submitted 19 November, 2018; v1 submitted 10 October, 2018; originally announced October 2018.

  25. arXiv:1808.05293  [pdf, ps, other]

    econ.EM cs.LG math.ST

    Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption

    Authors: Susan Athey, Guido Imbens

    Abstract: In this paper we study estimation of and inference for average treatment effects in a setting with panel data. We focus on the setting where units, e.g., individuals, firms, or states, adopt the policy or treatment of interest at a particular point in time, and then remain exposed to this treatment at all times afterwards. We take a design perspective where we investigate the properties of estimat…

    Submitted 1 September, 2018; v1 submitted 15 August, 2018; originally announced August 2018.

  26. arXiv:1807.11408  [pdf, other]

    stat.ML cs.LG econ.EM math.ST

    Local Linear Forests

    Authors: Rina Friedberg, Julie Tibshirani, Susan Athey, Stefan Wager

    Abstract: Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure…

    Submitted 4 September, 2020; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: Forthcoming in the Journal of Computational and Graphical Statistics

  27. arXiv:1806.06270  [pdf, other]

    cs.LG stat.ML

    Stable Prediction across Unknown Environments

    Authors: Kun Kuang, Ruoxuan Xiong, Peng Cui, Susan Athey, Bo Li

    Abstract: In many important machine learning applications, the training distribution used to learn a probabilistic classifier differs from the testing distribution on which the classifier will be used to make predictions. Traditional methods correct the distribution shift by reweighting the training data with the ratio of the density between test and training data. In many applications training takes place…

    Submitted 10 July, 2018; v1 submitted 16 June, 2018; originally announced June 2018.

  28. arXiv:1801.07826  [pdf, other]

    econ.EM cs.AI stat.AP stat.ML

    Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data

    Authors: Susan Athey, David Blei, Robert Donnelly, Francisco Ruiz, Tobias Schmidt

    Abstract: This paper analyzes consumer choices over lunchtime restaurants using data from a sample of several thousand anonymous mobile phone users in the San Francisco Bay Area. The data is used to identify users' approximate typical morning location, as well as their choices of lunchtime restaurants. We build a model where restaurants have latent characteristics (whose distribution may depend on restauran…

    Submitted 22 January, 2018; originally announced January 2018.

  29. arXiv:1711.07077  [pdf, other]

    stat.ML cs.LG econ.EM

    Estimation Considerations in Contextual Bandits

    Authors: Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

    Abstract: Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits bu…

    Submitted 16 December, 2018; v1 submitted 19 November, 2017; originally announced November 2017.

  30. arXiv:1711.03560  [pdf, other]

    stat.ML cs.LG econ.EM

    SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements

    Authors: Francisco J. R. Ruiz, Susan Athey, David M. Blei

    Abstract: We develop SHOPPER, a sequential probabilistic model of shopping data. SHOPPER uses interpretable components to model the forces that drive how a customer chooses products; in particular, we designed SHOPPER to capture how items interact with other items. We develop an efficient posterior inference algorithm to estimate these forces from large-scale data, and we analyze a large dataset from a majo…

    Submitted 9 June, 2019; v1 submitted 9 November, 2017; originally announced November 2017.

    Comments: Published at Annals of Applied Statistics. 27 pages, 4 figures

  31. arXiv:1709.10367  [pdf, other]

    cs.CL cs.LG stat.ML

    Structured Embedding Models for Grouped Data

    Authors: Maja Rudolph, Francisco Ruiz, Susan Athey, David Blei

    Abstract: Word embeddings are a powerful approach for analyzing language, and exponential family embeddings (EFE) extend them to other types of data. Here we develop structured exponential family embeddings (S-EFE), a method for discovering embeddings that vary across related groups of data. We study how the word usage of U.S. Congressional speeches varies across states and party affiliation, how words are…

    Submitted 28 September, 2017; originally announced September 2017.

  32. arXiv:1702.02896  [pdf, other]

    math.ST cs.LG econ.EM stat.ML

    Policy Learning with Observational Data

    Authors: Susan Athey, Stefan Wager

    Abstract: In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to…

    Submitted 4 September, 2020; v1 submitted 9 February, 2017; originally announced February 2017.

    Comments: Forthcoming in Econometrica. Original title: Efficient Policy Learning