Showing 1–50 of 86 results for author: Ghavamzadeh, M

  1. arXiv:2404.01863  [pdf, other]

    cs.LG cs.AI

    Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

    Authors: Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dvijotham, Jinwoo Shin, Kimin Lee

    Abstract: Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent. However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models, a phenomenon known as reward overoptimization. To investigate this issue in depth, we introduce th…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  2. arXiv:2401.08016  [pdf, other]

    cs.LG stat.ML

    Contextual Bandits with Stage-wise Constraints

    Authors: Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

    Abstract: We study contextual bandits in the presence of a stage-wise constraint (a constraint at each round), when the constraint must be satisfied both with high probability and in expectation. Obviously the setting where the constraint is in expectation is a relaxation of the one with high probability. We start with the linear case where both the contextual bandit problem (reward function) and the stage-…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 53 pages. arXiv admin note: text overlap with arXiv:2006.10185

  3. arXiv:2311.17855  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    Maximum Entropy Model Correction in Reinforcement Learning

    Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand

    Abstract: We propose and theoretically analyze an approach for planning with an approximate model in reinforcement learning that can reduce the adverse impact of model error. If the model is accurate enough, it accelerates the convergence to the true value function too. One of its key components is the MaxEnt Model Correction (MoCo) procedure that corrects the model's next-state distributions based on a Max…

    Submitted 29 November, 2023; originally announced November 2023.

  4. arXiv:2311.02085  [pdf, other]

    cs.IR cs.AI

    Preference Elicitation with Soft Attributes in Interactive Recommendation

    Authors: Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to express their preferences for item characteristics. Unfortunately, users often wish to describe their preferences using soft attributes for which no ground-truth se…

    Submitted 22 October, 2023; originally announced November 2023.

  5. arXiv:2310.18434  [pdf, other]

    cs.LG stat.ML

    Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

    Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

    Abstract: The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is the distribution shift, which refers to the difference between the state-action visitation distribution of the data generating policy and the learning policy. Many recent works…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 33 pages, preprint

  6. arXiv:2310.06176  [pdf, other]

    cs.AI

    Factual and Personalized Recommendations using Language Models and Reinforcement Learning

    Authors: Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs interact with users in natural language. In this work, we develop a comPelling, Precise, Personalized, Preference-relevant language model (P4LM) that recom…

    Submitted 9 October, 2023; originally announced October 2023.

  7. arXiv:2306.01237  [pdf, other]

    cs.LG stat.ML

    Bayesian Regret Minimization in Offline Bandits

    Authors: Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh

    Abstract: We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that the reliance on LCB is inherently flawed in this setting and propose a new algorithm that directly minimizes upper bounds on the Bayesian regret using efficient conic optimization solvers. Our…

    Submitted 2 July, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Journal ref: International Conference on Machine Learning, 2024

  8. arXiv:2305.16381  [pdf, other]

    cs.LG cs.CV

    DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

    Authors: Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee

    Abstract: Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the rewa…

    Submitted 1 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  9. arXiv:2305.07751  [pdf, other]

    cs.LG cs.CR cs.IT math.ST

    Private and Communication-Efficient Algorithms for Entropy Estimation

    Authors: Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed

    Abstract: Modern statistical estimation is often performed in a distributed setting where each sample belongs to a single user who shares their data with a central server. Users are typically concerned with preserving the privacy of their samples, and also with minimizing the amount of data they must transmit to the server. We give improved private and communication-efficient algorithms for estimating sever…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Originally published at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). This version corrects some errors in the original version

  10. arXiv:2304.12477  [pdf, other]

    math.OC cs.AI

    On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

    Authors: Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik

    Abstract: Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal…

    Submitted 23 April, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), 2023

  11. arXiv:2304.11431  [pdf, other]

    cs.CV

    A Review of Deep Learning for Video Captioning

    Authors: Moloud Abdar, Meenakshi Kollati, Swaraja Kuraparthi, Farhad Pourpanah, Daniel McDuff, Mohammad Ghavamzadeh, Shuicheng Yan, Abduallah Mohamed, Abbas Khosravi, Erik Cambria, Fatih Porikli

    Abstract: Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction. In essence, VC involves understanding a video and describing it with language. Captioning is used in a host of applications from creating more accessible interfaces (e.g., low-vision navigatio…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: 42 pages, 10 figures

  12. arXiv:2302.12192  [pdf, other]

    cs.LG cs.AI cs.CV

    Aligning Text-to-Image Models using Human Feedback

    Authors: Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu

    Abstract: Deep generative models have shown impressive results in text-to-image synthesis. However, current text-to-image models often generate images that are inadequately aligned with text prompts. We propose a fine-tuning method for aligning such models using human feedback, comprising three stages. First, we collect human feedback assessing model output alignment from a set of diverse text prompts. We t…

    Submitted 23 February, 2023; originally announced February 2023.

  13. arXiv:2302.10850  [pdf, other]

    cs.LG cs.AI cs.CL

    Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

    Authors: Dhawal Gupta, Yinlam Chow, Aza Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in RL and language models (LMs), using RL to power conversational chatbots remains challenging, in part because RL requires online exploration to learn effectively, whereas collecting…

    Submitted 29 October, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  14. arXiv:2212.04720  [pdf, other]

    cs.LG cs.AI

    Multi-Task Off-Policy Learning from Bandit Feedback

    Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Many practical applications, such as recommender systems and learning to rank, involve solving multiple similar tasks. One example is learning of recommendation policies for users with similar movie preferences, where the users may still rank the individual movies slightly differently. Such tasks can be organized in a hierarchy, where similar tasks are related through a shared structure. In this w…

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: 14 pages, 3 figures

  15. arXiv:2211.13937  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    Operator Splitting Value Iteration

    Authors: Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand

    Abstract: We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI achieves a much faster convergence…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2022

  16. arXiv:2209.04067  [pdf, other]

    cs.LG cs.AI

    RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

    Authors: Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel

    Abstract: Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust…

    Submitted 14 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Journal ref: Artificial Intelligence and Statistics (AISTATS), 2023

  17. arXiv:2208.05129  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Reinforcement Learning using Offline Data

    Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

    Abstract: The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to…

    Submitted 18 October, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Appeared in Neural Information Processing Systems (NeurIPS) 2022

  18. arXiv:2207.00468  [pdf, other]

    cs.CL cs.LG

    Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings

    Authors: Jorge A. Mendez, Alborz Geramifard, Mohammad Ghavamzadeh, Bing Liu

    Abstract: Learning task-oriented dialog policies via reinforcement learning typically requires large amounts of interaction with users, which in practice renders such methods unusable for real-world applications. In order to reduce the data requirements, we propose to leverage data from across different dialog domains, thereby reducing the amount of data required from each given domain. In particular, we pr…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Presented in the Conversational AI Workshop, NeurIPS 2019

  19. arXiv:2206.00059  [pdf, other]

    cs.CL cs.AI

    A Mixture-of-Expert Approach to RL-based Dialogue Management

    Authors: Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the wor…

    Submitted 31 May, 2022; originally announced June 2022.

  20. arXiv:2205.06331  [pdf, other]

    cs.LG cs.MA

    Collaborative Multi-agent Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

    Abstract: We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one…

    Submitted 12 May, 2022; originally announced May 2022.

    Journal ref: American Control Conference (ACC), 2022

  21. arXiv:2205.06326  [pdf, other]

    cs.LG

    Multi-Environment Meta-Learning in Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh

    Abstract: In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when tas…

    Submitted 12 May, 2022; originally announced May 2022.

    Journal ref: IEEE International Symposium on Information Theory (ISIT), 2022

  22. arXiv:2205.05138  [pdf, other]

    cs.LG

    Efficient Risk-Averse Reinforcement Learning

    Authors: Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

    Abstract: In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. A risk measure often focuses on the worst returns out of the agent's experience. As a result, standard methods for risk-averse RL often ignore high-return strategies. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypas…

    Submitted 12 October, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted to NeurIPS 2022

  23. arXiv:2202.13001  [pdf, other]

    cs.LG stat.ML

    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

    Authors: MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

    Abstract: We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M\le K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximizati…

    Submitted 18 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  24. arXiv:2202.12888  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Learning for Simple Regret Minimization

    Authors: Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

    Abstract: We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm h…

    Submitted 4 July, 2023; v1 submitted 25 February, 2022; originally announced February 2022.

  25. arXiv:2202.02830  [pdf, other]

    cs.IR cs.AI cs.LG

    Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

    Authors: Christina Göpfert, Alex Haig, Yinlam Chow, Chih-wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue). Yet more research is ne…

    Submitted 2 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

  26. arXiv:2202.01454  [pdf, other]

    cs.LG stat.ML

    Deep Hierarchy in Bandits

    Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of a user for recommended products and their categories. To maximize statistical efficiency, it is important to leverage these correlations when learning. We formulate a bandit variant of this problem where the correlations of mean action rewards are represented…

    Submitted 3 February, 2022; originally announced February 2022.

  27. arXiv:2111.06929  [pdf, other]

    cs.LG cs.AI

    Hierarchical Bayesian Bandits

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Meta-, multi-task, and federated learning can be all viewed as solving similar tasks, drawn from a distribution that reflects task similarities. We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit. We propose and analyze a natural hierarchical Thompson sampling algorithm (HierTS) for this class of problems. Our regret bounds hold for many variants…

    Submitted 5 March, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

  28. arXiv:2106.05608  [pdf, other]

    cs.LG cs.AI stat.ML

    Thompson Sampling with a Mixture Prior

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and…

    Submitted 5 March, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

  29. arXiv:2106.05378  [pdf, other]

    cs.LG

    Feature and Parameter Selection in Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh

    Abstract: We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in…

    Submitted 17 June, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Machine Learning, 2022

  30. arXiv:2106.04763  [pdf, other]

    cs.LG

    Fixed-Budget Best-Arm Identification in Structured Bandits

    Authors: Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh

    Abstract: Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations. Most works on this topic study unstructured problems with a small number of arms, which limits their applicability. We propose a general tractable algorithm that incorporates the structure, by succ…

    Submitted 4 July, 2023; v1 submitted 8 June, 2021; originally announced June 2021.

  31. arXiv:2103.00755  [pdf, other]

    cs.LG

    Adaptive Sampling for Minimax Fair Classification

    Authors: Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

    Abstract: Machine learning models trained on uncurated datasets can often end up adversely affecting inputs belonging to underrepresented groups. To address this issue, we consider the problem of adaptively constructing training sets which allow us to learn classifiers that are fair in a minimax sense. We first propose an adaptive sampling algorithm based on the principle of optimism, and derive theoretical…

    Submitted 19 July, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: 30 pages, 13 figures

  32. arXiv:2012.00386  [pdf, other]

    cs.LG cs.AI

    Non-Stationary Latent Bandits

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Users of recommender systems often behave in a non-stationary fashion, due to their evolving preferences and tastes over time. In this work, we propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 15 pages, 4 figures

  33. arXiv:2011.14495  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Soft-Robust Algorithms for Batch Reinforcement Learning

    Authors: Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

    Abstract: In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome the…

    Submitted 26 February, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

  34. A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

    Authors: Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U Rajendra Acharya, Vladimir Makarenkov, Saeid Nahavandi

    Abstract: Uncertainty quantification (UQ) plays a pivotal role in the reduction of uncertainties during both optimization and decision-making processes. It can be applied to solve a variety of real-world applications in science and engineering. Bayesian approximation and ensemble learning techniques are the two most widely used UQ methods in the literature. In this regard, researchers have proposed different UQ met…

    Submitted 5 January, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Report number: INFFUS_1411

    Journal ref: 2021

  35. arXiv:2009.06548  [pdf, other]

    cs.LG stat.ML

    Variance-Reduced Off-Policy Memory-Efficient Policy Search

    Authors: Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

    Abstract: Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues with convergence. A few variance-reduced on-policy policy gradient algorithms have been recently proposed that use methods from stochastic optimization to reduce…

    Submitted 14 September, 2020; originally announced September 2020.

  36. arXiv:2006.15637  [pdf, other]

    cs.LG stat.ML

    Deep Bayesian Quadrature Policy Optimization

    Authors: Akella Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Anima Anandkumar, Yisong Yue

    Abstract: We study the problem of obtaining accurate policy gradient estimates using a finite number of samples. Monte-Carlo methods have been the default choice for policy gradient estimation, despite suffering from high variance in the gradient estimates. On the other hand, more sample efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational co…

    Submitted 16 December, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

    Comments: Conference paper: AAAI-21. Code available at https://github.com/Akella17/Deep-Bayesian-Quadrature-Policy-Optimization

  37. arXiv:2006.14364  [pdf, other]

    cs.LG stat.ML

    Finite-Sample Analysis of Proximal Gradient TD Algorithms

    Authors: Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

    Abstract: In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has not been much work on finite-sample analysis for convergent off-policy reinforcement le…

    Submitted 3 July, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: 31st Conference on Uncertainty in Artificial Intelligence (UAI). arXiv admin note: substantial text overlap with arXiv:2006.03976

  38. arXiv:2006.13408  [pdf, other]

    cs.LG cs.AI stat.ML

    Control-Aware Representations for Model-based Reinforcement Learning

    Authors: Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

    Abstract: A major challenge in modern reinforcement learning (RL) is efficient control of dynamical systems from high-dimensional sensory observations. Learning controllable embedding (LCE) is a promising approach that addresses this challenge by embedding the observations into a lower-dimensional latent space, estimating the latent dynamics, and utilizing it to perform control in the latent space. Two impo…

    Submitted 23 June, 2020; originally announced June 2020.

  39. arXiv:2006.10185  [pdf, other]

    cs.LG stat.ML

    Stochastic Bandits with Linear Constraints

    Authors: Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

    Abstract: We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies, whose expected cumulative reward over the course of $T$ rounds is maximum, and each has an expected cost below a certain threshold $τ$. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: 9 pages

  40. arXiv:2006.05443  [pdf, other]

    cs.LG cs.AI stat.ML

    Variational Model-based Policy Optimization

    Authors: Yinlam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh

    Abstract: Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL. However, designing such algorithms is often challenging because the bias in simulated data may overshadow the ease of data generation. A potential solution to this challenge is to jointly lear…

    Submitted 23 June, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  41. arXiv:2006.03976  [pdf, other]

    cs.LG stat.ML

    Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

    Authors: Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

    Abstract: In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual s…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: Journal of Artificial Intelligence Research (JAIR)

  42. arXiv:2006.03947  [pdf, other]

    eess.SY cs.LG stat.ML

    Neural Lyapunov Redesign

    Authors: Arash Mehrjou, Mohammad Ghavamzadeh, Bernhard Schölkopf

    Abstract: Learning controllers merely based on a performance metric has been proven effective in many physical and non-physical tasks in both control theory and reinforcement learning. However, in practice, the controller must guarantee some notion of safety to ensure that it does not harm either the agent or the environment. Stability is a crucial notion of safety, whose violation can certainly cause unsaf…

    Submitted 22 November, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: 27 pages

  43. arXiv:2005.09814  [pdf, other]

    cs.LG cs.AI stat.ML

    Mirror Descent Policy Optimization

    Authors: Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

    Abstract: Mirror descent (MD), a well-known first-order method in constrained convex optimization, has recently been shown to be an important tool for analyzing trust-region algorithms in reinforcement learning (RL). However, there remains a considerable gap between such theoretically analyzed algorithms and the ones used in practice. Inspired by this, we propose an efficient RL algorithm, called mirror desc…

    Submitted 7 June, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

  44. arXiv:2003.03297  [pdf, other]

    stat.ML cs.LG

    Active Model Estimation in Markov Decision Processes

    Authors: Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric

    Abstract: We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first a…

    Submitted 22 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

  45. arXiv:2003.01086  [pdf, other]

    cs.LG eess.SY stat.ML

    Predictive Coding for Locally-Linear Control

    Authors: Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui

    Abstract: High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks. The Learning Controllable Embedding (LCE) framework addresses these challenges by embedding the observations into a lower dimensional latent space, estimating the latent dynamics, and then performing control directly in the latent space. To ensure the lear…

    Submitted 2 March, 2020; originally announced March 2020.

  46. arXiv:2003.00030  [pdf, other]

    cs.AI

    Policy-Aware Model Learning for Policy Gradient Methods

    Authors: Romina Abachi, Mohammad Ghavamzadeh, Amir-massoud Farahmand

    Abstract: This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should incorporate the way the planner is going to use the model. This is in contrast to conventional model learning approaches, such as those based on maximum likelihood estimate, that…

    Submitted 3 January, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

  47. arXiv:2002.03221  [pdf, other]

    cs.LG stat.ML

    Improved Algorithms for Conservative Exploration in Bandits

    Authors: Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

    Abstract: In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better…

    Submitted 8 February, 2020; originally announced February 2020.

  48. arXiv:2002.03218  [pdf, other]

    cs.LG stat.ML

    Conservative Exploration in Reinforcement Learning

    Authors: Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

    Abstract: While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward. Although the agent will eventually learn a good or optimal policy, there is no guarantee on the quality of the intermediate policies. This lack of control is undesired in real-world application…

    Submitted 15 July, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: AISTATS 2020

  49. arXiv:1910.12406  [pdf, other]

    stat.ML cs.LG

    Adaptive Sampling for Estimating Multiple Probability Distributions

    Authors: Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh

    Abstract: We consider the problem of allocating samples to a finite set of discrete distributions in order to learn them uniformly well in terms of four common distance measures: $\ell_2^2$, $\ell_1$, $f$-divergence, and separation distance. To present a unified treatment of these distances, we first propose a general optimistic tracking algorithm and analyze its sample allocation performance w.r.t. an orac…

    Submitted 6 December, 2019; v1 submitted 27 October, 2019; originally announced October 2019.

    Comments: 40 pages, 3 figures

  50. arXiv:1910.02919  [pdf, other]

    cs.LG stat.ML

    Multi-step Greedy Reinforcement Learning Algorithms

    Authors: Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

    Abstract: Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: $κ$-Policy Iteration ($κ$-PI) and $κ$-Value Iteration ($κ$-VI). These methods it…

    Submitted 12 July, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: ICML 2020