Search | arXiv e-print repository

Decision-Focused Learning to Predict Action Costs for Planning

Authors: Jayanta Mandi, Marco Foschini, Daniel Holler, Sylvie Thiebaux, Jorg Hoffmann, Tias Guns

Abstract: In many automated planning applications, action costs can be hard to specify. An example is the time needed to travel through a certain road segment, which depends on many factors, such as the current weather conditions. A natural way to address this issue is to learn to predict these parameters based on input features (e.g., weather forecasts) and use the predicted action costs in automated plann… ▽ More In many automated planning applications, action costs can be hard to specify. An example is the time needed to travel through a certain road segment, which depends on many factors, such as the current weather conditions. A natural way to address this issue is to learn to predict these parameters based on input features (e.g., weather forecasts) and use the predicted action costs in automated planning afterward. Decision-Focused Learning (DFL) has been successful in learning to predict the parameters of combinatorial optimization problems in a way that optimizes solution quality rather than prediction quality. This approach yields better results than treating prediction and optimization as separate tasks. In this paper, we investigate for the first time the challenges of implementing DFL for automated planning in order to learn to predict the action costs. There are two main challenges to overcome: (1) planning systems are called during gradient descent learning, to solve planning problems with negative action costs, which are not supported in planning. We propose novel methods for gradient computation to avoid this issue. (2) DFL requires repeated planner calls during training, which can limit the scalability of the method. We experiment with different methods approximating the optimal plan as well as an easy-to-implement caching mechanism to speed up the learning process. As the first work that addresses DFL for automated planning, we demonstrate that the proposed gradient computation consistently yields significantly better plans than predictions aimed at minimizing prediction error; and that caching can temper the computation requirements. △ Less

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2406.03995 [pdf, other]

AC4MPC: Actor-Critic Reinforcement Learning for Nonlinear Model Predictive Control

Authors: Rudolf Reiter, Andrea Ghezzi, Katrin Baumgärtner, Jasper Hoffmann, Robert D. McAllister, Moritz Diehl

Abstract: \Ac{MPC} and \ac{RL} are two powerful control strategies with, arguably, complementary advantages. In this work, we show how actor-critic \ac{RL} techniques can be leveraged to improve the performance of \ac{MPC}. The \ac{RL} critic is used as an approximation of the optimal value function, and an actor roll-out provides an initial guess for primal variables of the \ac{MPC}. A parallel control arc… ▽ More \Ac{MPC} and \ac{RL} are two powerful control strategies with, arguably, complementary advantages. In this work, we show how actor-critic \ac{RL} techniques can be leveraged to improve the performance of \ac{MPC}. The \ac{RL} critic is used as an approximation of the optimal value function, and an actor roll-out provides an initial guess for primal variables of the \ac{MPC}. A parallel control architecture is proposed where each \ac{MPC} instance is solved twice for different initial guesses. Besides the actor roll-out initialization, a shifted initialization from the previous solution is used. Thereafter, the actor and the critic are again used to approximately evaluate the infinite horizon cost of these trajectories. The control actions from the lowest-cost trajectory are applied to the system at each time step. We establish that the proposed algorithm is guaranteed to outperform the original \ac{RL} policy plus an error term that depends on the accuracy of the critic and decays with the horizon length of the \ac{MPC} formulation. Moreover, we do not require globally optimal solutions for these guarantees to hold. The approach is demonstrated on an illustrative toy example and an \ac{AD} overtaking scenario. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.02598 [pdf, other]

UDUC: An Uncertainty-driven Approach for Learning-based Robust Control

Authors: Yuan Zhang, Jasper Hoffmann, Joschka Boedecker

Abstract: Learning-based techniques have become popular in both model predictive control (MPC) and reinforcement learning (RL). Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics, showcasing the ability to capture uncertainty and scalability in high-dimensional control scenarios. However, PE models are susceptible to mode collapse, resulting in non-robust control whe… ▽ More Learning-based techniques have become popular in both model predictive control (MPC) and reinforcement learning (RL). Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics, showcasing the ability to capture uncertainty and scalability in high-dimensional control scenarios. However, PE models are susceptible to mode collapse, resulting in non-robust control when faced with environments slightly different from the training set. In this paper, we introduce the $\textbf{u}$ncertainty-$\textbf{d}$riven rob$\textbf{u}$st $\textbf{c}$ontrol (UDUC) loss as an alternative objective for training PE models, drawing inspiration from contrastive learning. We analyze the robustness of UDUC loss through the lens of robust optimization and evaluate its performance on the challenging Real-world Reinforcement Learning (RWRL) benchmark, which involves significant environmental mismatches between the training and testing environments. △ Less

Submitted 4 May, 2024; originally announced May 2024.

arXiv:2404.18863 [pdf, other]

PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control

Authors: Jasper Hoffmann, Diego Fernandez, Julien Brosseit, Julian Bernhard, Klemens Esterle, Moritz Werling, Michael Karg, Joschka Boedecker

Abstract: Model predictive control (MPC) is a powerful, optimization-based approach for controlling dynamical systems. However, the computational complexity of online optimization can be problematic on embedded devices. Especially, when we need to guarantee fixed control frequencies. Thus, previous work proposed to reduce the computational burden using imitation learning (IL) approximating the MPC policy by… ▽ More Model predictive control (MPC) is a powerful, optimization-based approach for controlling dynamical systems. However, the computational complexity of online optimization can be problematic on embedded devices. Especially, when we need to guarantee fixed control frequencies. Thus, previous work proposed to reduce the computational burden using imitation learning (IL) approximating the MPC policy by a neural network. In this work, we instead learn the whole planned trajectory of the MPC. We introduce a combination of a novel neural network architecture PlanNetX and a simple loss function based on the state trajectory that leverages the parameterized optimal control structure of the MPC. We validate our approach in the context of autonomous driving by learning a longitudinal planner and benchmarking it extensively in the CommonRoad simulator using synthetic scenarios and scenarios derived from real data. Our experimental results show that we can learn the open-loop MPC trajectory with high accuracy while improving the closed-loop performance of the learned control policy over other baselines like behavior cloning. △ Less

Submitted 22 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: 6th Annual Learning for Dynamics & Control Conference (L4DC 2024)

arXiv:2403.10704 [pdf, other]

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Authors: Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Simral Chaudhary, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon

Abstract: Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. But training models with RLHF is computationally expensive, and an overall complex process. In this work, we study RLHF where the underlying models are trained using the parameter efficient method of Low-Rank Adaptation (LoRA) introduced by Hu… ▽ More Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. But training models with RLHF is computationally expensive, and an overall complex process. In this work, we study RLHF where the underlying models are trained using the parameter efficient method of Low-Rank Adaptation (LoRA) introduced by Hu et al. [2021]. We investigate the setup of "Parameter Efficient Reinforcement Learning" (PERL), in which we perform reward model training and reinforcement learning using LoRA. We compare PERL to conventional fine-tuning (full-tuning) across various configurations for 7 benchmarks, including 2 novel datasets, of reward modeling and reinforcement learning. We find that PERL performs on par with the conventional RLHF setting, while training faster, and with less memory. This enables the high performance of RLHF, while reducing the computational burden that limits its adoption as an alignment technique for Large Language Models. We also release 2 novel thumbs up/down preference datasets: "Taskmaster Coffee", and "Taskmaster Ticketing" to promote research around RLHF. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.08904 [pdf, other]

Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics

Authors: Tyler A. Chang, Katrin Tomanek, Jessica Hoffmann, Nithum Thain, Erin van Liemt, Kathleen Meier-Hellstern, Lucas Dixon

Abstract: We explore a strategy to handle controversial topics in LLM-based chatbots based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives. We frame this as retrieval augmented generation, where perspectives are retrieved from a knowledge base and the LLM is tasked with generating a fluent and faithful response from the… ▽ More We explore a strategy to handle controversial topics in LLM-based chatbots based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives. We frame this as retrieval augmented generation, where perspectives are retrieved from a knowledge base and the LLM is tasked with generating a fluent and faithful response from the given perspectives. As a starting point, we use a deterministic retrieval system and then focus on common LLM failure modes that arise during this approach to text generation, namely hallucination and coverage errors. We propose and evaluate three methods to detect such errors based on (1) word-overlap, (2) salience, and (3) LLM-based classifiers. Our results demonstrate that LLM-based classifiers, even when trained only on synthetic errors, achieve high error detection performance, with ROC AUC scores of 95.3% for hallucination and 90.5% for coverage error detection on unambiguous error cases. We show that when no training data is available, our other methods still yield good results on hallucination (84.0%) and coverage error (85.2%) detection. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2402.02992 [pdf, other]

Decoding-time Realignment of Language Models

Authors: Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel

Abstract: Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between human preference rewards and a proximity regularization term that encourages staying close to the unaligned model. Selecting an appropriate level of regularizat… ▽ More Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between human preference rewards and a proximity regularization term that encourages staying close to the unaligned model. Selecting an appropriate level of regularization is critical: insufficient regularization can lead to reduced model capabilities due to reward hacking, whereas excessive regularization hinders alignment. Traditional methods for finding the optimal regularization level require retraining multiple models with varying regularization strengths. This process, however, is resource-intensive, especially for large models. To address this challenge, we propose decoding-time realignment (DeRa), a simple method to explore and evaluate different regularization strengths in aligned models without retraining. DeRa enables control over the degree of alignment, allowing users to smoothly transition between unaligned and aligned models. It also enhances the efficiency of hyperparameter tuning by enabling the identification of effective regularization strengths using a validation dataset. △ Less

Submitted 24 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: In Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.09830 [pdf, other]

AutoPlanBench: Automatically generating benchmarks for LLM planners from PDDL

Authors: Katharina Stein, Daniel Fišer, Jörg Hoffmann, Alexander Koller

Abstract: LLMs are being increasingly used for planning-style tasks, but their capabilities for planning and reasoning are poorly understood. We present AutoPlanBench, a novel method for automatically converting planning benchmarks written in PDDL into textual descriptions and offer a benchmark dataset created with our method. We show that while the best LLM planners do well on some planning tasks, others r… ▽ More LLMs are being increasingly used for planning-style tasks, but their capabilities for planning and reasoning are poorly understood. We present AutoPlanBench, a novel method for automatically converting planning benchmarks written in PDDL into textual descriptions and offer a benchmark dataset created with our method. We show that while the best LLM planners do well on some planning tasks, others remain out of reach of current methods. △ Less

Submitted 9 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2309.08042 [pdf, other]

Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved

Authors: Yao Sun, Anna Kruspe, Liqiu Meng, Yifan Tian, Eike J Hoffmann, Stefan Auer, Xiao Xiang Zhu

Abstract: Crowdsourced platforms provide huge amounts of street-view images that contain valuable building information. This work addresses the challenges in applying Scene Text Recognition (STR) in crowdsourced street-view images for building attribute mapping. We use Flickr images, particularly examining texts on building facades. A Berlin Flickr dataset is created, and pre-trained STR models are used for… ▽ More Crowdsourced platforms provide huge amounts of street-view images that contain valuable building information. This work addresses the challenges in applying Scene Text Recognition (STR) in crowdsourced street-view images for building attribute mapping. We use Flickr images, particularly examining texts on building facades. A Berlin Flickr dataset is created, and pre-trained STR models are used for text detection and recognition. Manual checking on a subset of STR-recognized images demonstrates high accuracy. We examined the correlation between STR results and building functions, and analysed instances where texts were recognized on residential buildings but not on commercial ones. Further investigation revealed significant challenges associated with this task, including small text regions in street-view images, the absence of ground truth labels, and mismatches in buildings in Flickr images and building footprints in OpenStreetMap (OSM). To develop city-wide mapping beyond urban hotspot locations, we suggest differentiating the scenarios where STR proves effective while developing appropriate algorithms or bringing in additional data for handling other cases. Furthermore, interdisciplinary collaboration should be undertaken to understand the motivation behind building photography and labeling. The STR-on-Flickr results are publicly available at https://github.com/ya0-sun/STR-Berlin. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.01261 [pdf, other]

Worst-Case Input Generation for Concurrent Programs under Non-Monotone Resource Metrics

Authors: Long Pham, Jan Hoffmann

Abstract: Worst-case input generation aims to automatically generate inputs that exhibit the worst-case performance of programs. It has several applications, and can, for example, detect vulnerabilities to denial-of-service attacks. However, it is non-trivial to generate worst-case inputs for concurrent programs, particularly for resources like memory where the peak cost depends on how processes are schedul… ▽ More Worst-case input generation aims to automatically generate inputs that exhibit the worst-case performance of programs. It has several applications, and can, for example, detect vulnerabilities to denial-of-service attacks. However, it is non-trivial to generate worst-case inputs for concurrent programs, particularly for resources like memory where the peak cost depends on how processes are scheduled. This article presents the first sound worst-case input generation algorithm for concurrent programs under non-monotone resource metrics like memory. The key insight is to leverage resource-annotated session types and symbolic execution. Session types describe communication protocols on channels in process calculi. Equipped with resource annotations, resource-annotated session types not only encode cost bounds but also indicate how many resources can be reused and transferred between processes. This information is critical for identifying a worst-case execution path during symbolic execution. The algorithm is sound: if it returns any input, it is guaranteed to be a valid worst-case input. The algorithm is also relatively complete: as long as resource-annotated session types are sufficiently expressive and the background theory for SMT solving is decidable, a worst-case input is guaranteed to be returned. A simple case study of a web server's memory usage demonstrates the utility of the worst-case input generation algorithm. △ Less

Submitted 23 July, 2024; v1 submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.15470 [pdf, other]

Policy composition in reinforcement learning via multi-objective policy optimization

Authors: Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

Abstract: We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies… ▽ More We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies can help speed up learning, particularly in the absence of shaping rewards. In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task. Depending on the specified combination of task and teacher(s), teacher(s) may naturally act to limit the final performance of an agent. The extent to which agents are required to adhere to teacher policies are determined by hyperparameters which determine both the effect of teachers on learning speed and the eventual performance of the agent on the task. In the humanoid domain (Tassa et al. 2018), we also equip agents with the ability to control the selection of teachers. With this ability, agents are able to meaningfully compose from the teacher policies to achieve a superior task reward on the walk task than in cases without access to the teacher policies. We show the resemblance of composed task policies with the corresponding teacher policies through videos. △ Less

Submitted 30 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.02041 [pdf]

Regulating AI: Applying insights from behavioural economics and psychology to the application of article 5 of the EU AI Act

Authors: Huixin Zhong, Eamonn O'Neill, Janina A. Hoffmann

Abstract: Article 5 of the European Union's Artificial Intelligence Act is intended to regulate AI use to prevent potentially harmful consequences. Nevertheless, applying this legislation practically is likely to be challenging because of ambiguously used terminologies and because it fails to specify which manipulation techniques may be invoked by AI, potentially leading to significant harm. This paper aims… ▽ More Article 5 of the European Union's Artificial Intelligence Act is intended to regulate AI use to prevent potentially harmful consequences. Nevertheless, applying this legislation practically is likely to be challenging because of ambiguously used terminologies and because it fails to specify which manipulation techniques may be invoked by AI, potentially leading to significant harm. This paper aims to bridge this gap by defining key terms and demonstrating how AI may invoke these techniques, drawing from insights in psychology and behavioural economics. First, this paper provides definitions of the terms "subliminal techniques", "manipulative techniques" and "deceptive techniques". Secondly, we identified from the literature in cognitive psychology and behavioural economics three subliminal and five manipulative techniques and exemplify how AI might implement these techniques to manipulate users in real-world case scenarios. These illustrations may serve as a practical guide for stakeholders to detect cases of AI manipulation and consequently devise preventive measures. Article 5 has also been criticised for offering inadequate protection. We critically assess the protection offered by Article 5, proposing specific revisions to paragraph 1, points (a) and (b) of Article 5 to increase its protective effectiveness. △ Less

Submitted 25 February, 2024; v1 submitted 24 July, 2023; originally announced August 2023.

Comments: This paper was accepted for publication by AAAI 2024 paper on December of 2023

arXiv:2305.17300 [pdf, other]

Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence

Authors: Erik C. Johnson, Brian S. Robinson, Gautam K. Vallabha, Justin Joyce, Jordan K. Matelsky, Raphael Norman-Tenazas, Isaac Western, Marisel Villafañe-Delgado, Martha Cervantes, Michael S. Robinette, Arun V. Reddy, Lindsey Kitchell, Patricia K. Rivlin, Elizabeth P. Reilly, Nathan Drenkow, Matthew J. Roos, I-Jeng Wang, Brock A. Wester, William R. Gray-Roncal, Joan A. Hoffmann

Abstract: Despite the progress in deep learning networks, efficient learning at the edge (enabling adaptable, low-complexity machine learning solutions) remains a critical need for defense and commercial applications. We envision a pipeline to utilize large neuroimaging datasets, including maps of the brain which capture neuron and synapse connectivity, to improve machine learning approaches. We have pursue… ▽ More Despite the progress in deep learning networks, efficient learning at the edge (enabling adaptable, low-complexity machine learning solutions) remains a critical need for defense and commercial applications. We envision a pipeline to utilize large neuroimaging datasets, including maps of the brain which capture neuron and synapse connectivity, to improve machine learning approaches. We have pursued different approaches within this pipeline structure. First, as a demonstration of data-driven discovery, the team has developed a technique for discovery of repeated subcircuits, or motifs. These were incorporated into a neural architecture search approach to evolve network architectures. Second, we have conducted analysis of the heading direction circuit in the fruit fly, which performs fusion of visual and angular velocity features, to explore augmenting existing computational models with new insight. Our team discovered a novel pattern of connectivity, implemented a new model, and demonstrated sensor fusion on a robotic platform. Third, the team analyzed circuitry for memory formation in the fruit fly connectome, enabling the design of a novel generative replay approach. Finally, the team has begun analysis of connectivity in mammalian cortex to explore potential improvements to transformer networks. These constraints increased network robustness on the most challenging examples in the CIFAR-10-C computer vision robustness benchmark task, while reducing learnable attention parameters by over an order of magnitude. Taken together, these results demonstrate multiple potential approaches to utilize insight from neural systems for developing robust and efficient machine learning techniques. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures

arXiv:2304.13627 [pdf, ps, other]

Automatic Amortized Resource Analysis with Regular Recursive Types

Authors: Jessie Grosen, David M. Kahn, Jan Hoffmann

Abstract: The goal of automatic resource bound analysis is to statically infer symbolic bounds on the resource consumption of the evaluation of a program. A longstanding challenge for automatic resource analysis is the inference of bounds that are functions of complex custom data structures. This article builds on type-based automatic amortized resource analysis (AARA) to address this challenge. AARA is bas… ▽ More The goal of automatic resource bound analysis is to statically infer symbolic bounds on the resource consumption of the evaluation of a program. A longstanding challenge for automatic resource analysis is the inference of bounds that are functions of complex custom data structures. This article builds on type-based automatic amortized resource analysis (AARA) to address this challenge. AARA is based on the potential method of amortized analysis and reduces bound inference to standard type inference with additional linear constraint solving, even when deriving non-linear bounds. A key component of AARA is resource functions that generate the space of possible bounds for values of a given type while enjoying necessary closure properties. Existing work on AARA defined such functions for many data structures such as lists of lists but the question of whether such functions exist for arbitrary data structures remained open. This work answers this questions positively by uniformly constructing resource polynomials for algebraic data structures defined by regular recursive types. These functions are a generalization of all previously proposed polynomial resource functions and can be seen as a general notion of polynomials for values of a given recursive type. A resource type system for FPC, a core language with recursive types, demonstrates how resource polynomials can be integrated with AARA while preserving all benefits of past techniques. The article also proposes the use of new techniques useful for stating the rules of this type system and proving it sound. First, multivariate potential annotations are stated in terms of free semimodules, substantially abstracting details of the presentation of annotations and the proofs of their properties. Second, a logical relation giving semantic meaning to resource types enables a proof of soundness by a single induction on typing derivations. △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: 15 pages, 5 figures; to be published in LICS'23

arXiv:2302.06541 [pdf, other]

Towards Agile Text Classifiers for Everyone

Authors: Maximilian Mozes, Jessica Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, Ann Yuan, Tolga Bolukbasi, Lucas Dixon

Abstract: Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies require different classifiers, and safety policies themselves improve from iteration and adaptation. This paper introduces and evaluates methods for agile text cla… ▽ More Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies require different classifiers, and safety policies themselves improve from iteration and adaptation. This paper introduces and evaluates methods for agile text classification, whereby classifiers are trained using small, targeted datasets that can be quickly developed for a particular policy. Experimenting with 7 datasets from three safety-related domains, comprising 15 annotation schemes, led to our key finding: prompt-tuning large language models, like PaLM 62B, with a labeled dataset of as few as 80 examples can achieve state-of-the-art performance. We argue that this enables a paradigm shift for text classification, especially for models supporting safer online discourse. Instead of collecting millions of examples to attempt to create universal safety classifiers over months or years, classifiers could be tuned using small datasets, created by individuals or small organizations, tailored for specific use cases, and iterated on and adapted in the time-span of a day. △ Less

Submitted 21 October, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: Findings of EMNLP 2023

arXiv:2212.01607 [pdf, other]

A Hierarchical Approach for Strategic Motion Planning in Autonomous Racing

Authors: Rudolf Reiter, Jasper Hoffmann, Joschka Boedecker, Moritz Diehl

Abstract: We present an approach for safe trajectory planning, where a strategic task related to autonomous racing is learned sample-efficient within a simulation environment. A high-level policy, represented as a neural network, outputs a reward specification that is used within the cost function of a parametric nonlinear model predictive controller (NMPC). By including constraints and vehicle kinematics… ▽ More We present an approach for safe trajectory planning, where a strategic task related to autonomous racing is learned sample-efficient within a simulation environment. A high-level policy, represented as a neural network, outputs a reward specification that is used within the cost function of a parametric nonlinear model predictive controller (NMPC). By including constraints and vehicle kinematics in the NLP, we are able to guarantee safe and feasible trajectories related to the used model. Compared to classical reinforcement learning (RL), our approach restricts the exploration to safe trajectories, starts with a good prior performance and yields full trajectories that can be passed to a tracking lowest-level controller. We do not address the lowest-level controller in this work and assume perfect tracking of feasible trajectories. We show the superior performance of our algorithm on simulated racing tasks that include high-level decision making. The vehicle learns to efficiently overtake slower vehicles and to avoid getting overtaken by blocking faster vehicles. △ Less

Submitted 3 December, 2022; originally announced December 2022.

arXiv:2211.00543 [pdf]

Geo-Information Harvesting from Social Media Data

Authors: Xiao Xiang Zhu, Yuanyuan Wang, Mrinalini Kochupillai, Martin Werner, Matthias Häberle, Eike Jens Hoffmann, Hannes Taubenböck, Devis Tuia, Alex Levering, Nathan Jacobs, Anna Kruspe, Karam Abdulahhad

Abstract: As unconventional sources of geo-information, massive imagery and text messages from open platforms and social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Due to its complementarity to remote sensing data, geo-information from these sources offers promising perspectives, but harvesting is not trivial due to its data characterist… ▽ More As unconventional sources of geo-information, massive imagery and text messages from open platforms and social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Due to its complementarity to remote sensing data, geo-information from these sources offers promising perspectives, but harvesting is not trivial due to its data characteristics. In this article, we address key aspects in the field, including data availability, analysis-ready data preparation and data management, geo-information extraction from social media text messages and images, and the fusion of social media and remote sensing data. We then showcase some exemplary geographic applications. In addition, we present the first extensive discussion of ethical considerations of social media data in the context of geo-information harvesting and geographic applications. With this effort, we wish to stimulate curiosity and lay the groundwork for researchers who intend to explore social media data for geo-applications. We encourage the community to join forces by sharing their code and data. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: Accepted for publication IEEE Geoscience and Remote Sensing Magazine

arXiv:2206.06054 [pdf, other]

Specifying and Testing $k$-Safety Properties for Machine-Learning Models

Authors: Maria Christakis, Hasan Ferit Eniser, Jörg Hoffmann, Adish Singla, Valentin Wüstholz

Abstract: Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to… ▽ More Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about $k$ different executions, so-called $k$-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of $k$-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models. △ Less

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2204.08524 [pdf, other]

So2Sat POP -- A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale

Authors: Sugandha Doda, Yuanyuan Wang, Matthias Kahl, Eike Jens Hoffmann, Kim Ouan, Hannes Taubenböck, Xiao Xiang Zhu

Abstract: Obtaining a dynamic population distribution is key to many decision-making processes such as urban planning, disaster management and most importantly helping the government to better allocate socio-technical supply. For the aspiration of these objectives, good population data is essential. The traditional method of collecting population data through the census is expensive and tedious. In recent y… ▽ More Obtaining a dynamic population distribution is key to many decision-making processes such as urban planning, disaster management and most importantly helping the government to better allocate socio-technical supply. For the aspiration of these objectives, good population data is essential. The traditional method of collecting population data through the census is expensive and tedious. In recent years, statistical and machine learning methods have been developed to estimate population distribution. Most of the methods use data sets that are either developed on a small scale or not publicly available yet. Thus, the development and evaluation of new methods become challenging. We fill this gap by providing a comprehensive data set for population estimation in 98 European cities. The data set comprises a digital elevation model, local climate zone, land use proportions, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative. We anticipate that it would be a valuable addition to the research community for the development of sophisticated approaches in the field of population estimation. △ Less

Submitted 10 November, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

arXiv:2203.15556 [pdf, other]

Training Compute-Optimal Large Language Models

Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

Abstract: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion… ▽ More We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4$\times$ more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher. △ Less

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.09361 [pdf, other]

Expressivity of Planning with Horn Description Logic Ontologies (Technical Report)

Authors: Stefan Borgwardt, Jörg Hoffmann, Alisa Kovtunova, Markus Krötzsch, Bernhard Nebel, Marcel Steinmetz

Abstract: State constraints in AI Planning globally restrict the legal environment states. Standard planning languages make closed-domain and closed-world assumptions. Here we address open-world state constraints formalized by planning over a description logic (DL) ontology. Previously, this combination of DL and planning has been investigated for the light-weight DL DL-Lite. Here we propose a novel compila… ▽ More State constraints in AI Planning globally restrict the legal environment states. Standard planning languages make closed-domain and closed-world assumptions. Here we address open-world state constraints formalized by planning over a description logic (DL) ontology. Previously, this combination of DL and planning has been investigated for the light-weight DL DL-Lite. Here we propose a novel compilation scheme into standard PDDL with derived predicates, which applies to more expressive DLs and is based on the rewritability of DL queries into Datalog with stratified negation. We also provide a new rewritability result for the DL Horn-ALCHOIQ, which allows us to apply our compilation scheme to quite expressive ontologies. In contrast, we show that in the slight extension Horn-SROIQ no such compilation is possible unless the weak exponential hierarchy collapses. Finally, we show that our approach can outperform previous work on existing benchmarks for planning with DL ontologies, and is feasible on new benchmarks taking advantage of more expressive ontologies. That is an extended version of a paper accepted at AAAI 22. △ Less

Submitted 17 March, 2022; originally announced March 2022.

Comments: 16 pages with appendix

MSC Class: 68 ACM Class: I.2.4; I.2.8

arXiv:2202.07315 [pdf, other]

Using Social Media Images for Building Function Classification

Authors: Eike Jens Hoffmann, Karam Abdulahhad, Xiao Xiang Zhu

Abstract: Urban land use on a building instance level is crucial geo-information for many applications, yet difficult to obtain. An intuitive approach to close this gap is predicting building functions from ground level imagery. Social media image platforms contain billions of images, with a large variety of motifs including but not limited to street perspectives. To cope with this issue this study proposes… ▽ More Urban land use on a building instance level is crucial geo-information for many applications, yet difficult to obtain. An intuitive approach to close this gap is predicting building functions from ground level imagery. Social media image platforms contain billions of images, with a large variety of motifs including but not limited to street perspectives. To cope with this issue this study proposes a filtering pipeline to yield high quality, ground level imagery from large social media image datasets. The pipeline ensures that all resulting images have full and valid geotags with a compass direction to relate image content and spatial objects from maps. We analyze our method on a culturally diverse social media dataset from Flickr with more than 28 million images from 42 cities around the world. The obtained dataset is then evaluated in a context of 3-classes building function classification task. The three building classes that are considered in this study are: commercial, residential, and other. Fine-tuned state-of-the-art architectures yield F1-scores of up to 0.51 on the filtered images. Our analysis shows that the performance is highly limited by the quality of the labels obtained from OpenStreetMap, as the metrics increase by 0.2 if only human validated labels are considered. Therefore, we consider these labels to be weak and publish the resulting images from our pipeline together with the buildings they are showing as a weakly labeled dataset. △ Less

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.01169 [pdf, other]

Unified Scaling Laws for Routed Language Models

Authors: Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu , et al. (1 additional authors not shown)

Abstract: The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better… ▽ More The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters. △ Less

Submitted 9 February, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: Fixing typos and affiliation clarity

arXiv:2201.03288 [pdf, other]

A statistical shape model for radiation-free assessment and classification of craniosynostosis

Authors: Matthias Schaufelberger, Reinald Peter Kühle, Andreas Wachter, Frederic Weichel, Niclas Hagen, Friedemann Ringwald, Urs Eisenmann, Jürgen Hoffmann, Michael Engel, Christian Freudlsperger, Werner Nahm

Abstract: The assessment of craniofacial deformities requires patient data which is sparsely available. Statistical shape models provide realistic and synthetic data enabling comparisons of existing methods on a common dataset. We build the first publicly available statistical 3D head model of craniosynostosis patients and the first model focusing on infants younger than 1.5 years. We further present a sh… ▽ More The assessment of craniofacial deformities requires patient data which is sparsely available. Statistical shape models provide realistic and synthetic data enabling comparisons of existing methods on a common dataset. We build the first publicly available statistical 3D head model of craniosynostosis patients and the first model focusing on infants younger than 1.5 years. We further present a shape-model-based classification pipeline to distinguish between three different classes of craniosynostosis and a control group on photogrammetric surface scans. To the best of our knowledge, our study uses the largest dataset of craniosynostosis patients in a classification study for craniosynostosis and statistical shape modeling to date. We demonstrate that our shape model performs similar to other statistical shape models of the human head. Craniosynostosis-specific pathologies are represented in the first eigenmodes of the model. Regarding the automatic classification of craniosynostis, our classification approach yields an accuracy of 97.8%, comparable to other state-of-the-art methods using both computed tomography scans and stereophotogrammetry. Our publicly available, craniosynostosis-specific statistical shape model enables the assessment of craniosynostosis on realistic and synthetic data. We further present a state-of-the-art shape-model-based classification approach for a radiation-free diagnosis of craniosynostosis. △ Less

Submitted 28 March, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

arXiv:2112.11446 [pdf, other]

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor , et al. (55 additional authors not shown)

Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop… ▽ More Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms. △ Less

Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: 120 pages

arXiv:2112.04426 [pdf, other]

Improving language models by retrieving from trillions of tokens

Authors: Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan , et al. (3 additional authors not shown)

Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to d… ▽ More We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. △ Less

Submitted 7 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Fix incorrect reported numbers in Table 14

arXiv:2111.00607 [pdf, other]

A Systematic Investigation of Commonsense Knowledge in Large Language Models

Authors: Xiang Lorraine Li, Adhiguna Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh

Abstract: Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup. Here we aim to better understand the extent to which such models learn commonsense knowledge -- a critical component of many NLP applications. We conduct a systematic and rigorous zero-shot and few-shot commonsense evaluation of large pre-trained LMs, w… ▽ More Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup. Here we aim to better understand the extent to which such models learn commonsense knowledge -- a critical component of many NLP applications. We conduct a systematic and rigorous zero-shot and few-shot commonsense evaluation of large pre-trained LMs, where we: (i) carefully control for the LMs' ability to exploit potential surface cues and annotation artefacts, and (ii) account for variations in performance that arise from factors that are not related to commonsense knowledge. Our findings highlight the limitations of pre-trained LMs in acquiring commonsense knowledge without task-specific supervision; furthermore, using larger models or few-shot evaluation are insufficient to achieve human-level commonsense performance. △ Less

Submitted 31 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

Comments: Accepted to EMNLP 2022

arXiv:2108.12251 [pdf, other]

Changes in Twitter geolocations: Insights and suggestions for future usage

Authors: Anna Kruspe, Matthias Häberle, Eike J. Hoffmann, Samyo Rode-Hasinger, Karam Abdulahhad, Xiao Xiang Zhu

Abstract: Twitter data has become established as a valuable source of data for various application scenarios in the past years. For many such applications, it is necessary to know where Twitter posts (tweets) were sent from or what location they refer to. Researchers have frequently used exact coordinates provided in a small percentage of tweets, but Twitter removed the option to share these coordinates in… ▽ More Twitter data has become established as a valuable source of data for various application scenarios in the past years. For many such applications, it is necessary to know where Twitter posts (tweets) were sent from or what location they refer to. Researchers have frequently used exact coordinates provided in a small percentage of tweets, but Twitter removed the option to share these coordinates in mid-2019. Moreover, there is reason to suspect that a large share of the provided coordinates did not correspond to GPS coordinates of the user even before that. In this paper, we explain the situation and the 2019 policy change and shed light on the various options of still obtaining location information from tweets. We provide usage statistics including changes over time, and analyze what the removal of exact coordinates means for various common research tasks performed with Twitter data. Finally, we make suggestions for future research requiring geolocated tweets. △ Less

Submitted 22 September, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

arXiv:2107.01820 [pdf]

An Explainable AI System for the Diagnosis of High Dimensional Biomedical Data

Authors: Alfred Ultsch, Jörg Hoffmann, Maximilian Röhnert, Malte Von Bonin, Uta Oelschlägel, Cornelia Brendel, Michael C. Thrun

Abstract: Typical state of the art flow cytometry data samples consists of measures of more than 100.000 cells in 10 or more features. AI systems are able to diagnose such data with almost the same accuracy as human experts. However, there is one central challenge in such systems: their decisions have far-reaching consequences for the health and life of people, and therefore, the decisions of AI systems nee… ▽ More Typical state of the art flow cytometry data samples consists of measures of more than 100.000 cells in 10 or more features. AI systems are able to diagnose such data with almost the same accuracy as human experts. However, there is one central challenge in such systems: their decisions have far-reaching consequences for the health and life of people, and therefore, the decisions of AI systems need to be understandable and justifiable by humans. In this work, we present a novel explainable AI method, called ALPODS, which is able to classify (diagnose) cases based on clusters, i.e., subpopulations, in the high-dimensional data. ALPODS is able to explain its decisions in a form that is understandable for human experts. For the identified subpopulations, fuzzy reasoning rules expressed in the typical language of domain experts are generated. A visualization method based on these rules allows human experts to understand the reasoning used by the AI system. A comparison to a selection of state of the art explainable AI systems shows that ALPODS operates efficiently on known benchmark data and also on everyday routine case data. △ Less

Submitted 1 March, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 29 pages, 5 figure, 5 tables, data available at https://data.mendeley.com/datasets/jk4dt6wprv/1

MSC Class: 68T05 ACM Class: I.2; I.5

arXiv:2106.13936 [pdf, ps, other]

Automatic Amortized Resource Analysis with the Quantum Physicist's Method

Authors: David M Kahn, Jan Hoffmann

Abstract: We present a novel method for working with the physicist's method of amortized resource analysis, which we call the quantum physicist's method. These principles allow for more precise analyses of resources that are not monotonically consumed, like stack. This method takes its name from its two major features, worldviews and resource tunneling, which behave analogously to quantum superposition and… ▽ More We present a novel method for working with the physicist's method of amortized resource analysis, which we call the quantum physicist's method. These principles allow for more precise analyses of resources that are not monotonically consumed, like stack. This method takes its name from its two major features, worldviews and resource tunneling, which behave analogously to quantum superposition and quantum tunneling. We use the quantum physicist's method to extend the Automatic Amortized Resource Analysis (AARA) type system, enabling the derivation of resource bounds based on tree depth. In doing so, we also introduce remainder contexts, which aid bookkeeping in linear type systems. We then evaluate this new type system's performance by bounding stack use of functions in the Set module of OCaml's standard library. Compared to state-of-the-art implementations of AARA, our new system derives tighter bounds with only moderate overhead. △ Less

Submitted 25 June, 2021; originally announced June 2021.

arXiv:2106.12182 [pdf, other]

Fairness for Image Generation with Uncertain Sensitive Attributes

Authors: Ajil Jalal, Sushrut Karmalkar, Jessica Hoffmann, Alexandros G. Dimakis, Eric Price

Abstract: This work tackles the issue of fairness in the context of generative procedures, such as image super-resolution, which entail different definitions from the standard classification setting. Moreover, while traditional group fairness definitions are typically defined with respect to specified protected groups -- camouflaging the fact that these groupings are artificial and carry historical and poli… ▽ More This work tackles the issue of fairness in the context of generative procedures, such as image super-resolution, which entail different definitions from the standard classification setting. Moreover, while traditional group fairness definitions are typically defined with respect to specified protected groups -- camouflaging the fact that these groupings are artificial and carry historical and political motivations -- we emphasize that there are no ground truth identities. For instance, should South and East Asians be viewed as a single group or separate groups? Should we consider one race as a whole or further split by gender? Choosing which groups are valid and who belongs in them is an impossible dilemma and being "fair" with respect to Asians may require being "unfair" with respect to South Asians. This motivates the introduction of definitions that allow algorithms to be \emph{oblivious} to the relevant groupings. We define several intuitive notions of group fairness and study their incompatibilities and trade-offs. We show that the natural extension of demographic parity is strongly dependent on the grouping, and \emph{impossible} to achieve obliviously. On the other hand, the conceptually new definition we introduce, Conditional Proportional Representation, can be achieved obliviously through Posterior Sampling. Our experiments validate our theoretical results and achieve fair image reconstruction using state-of-the-art generative models. △ Less

Submitted 2 July, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

arXiv:2104.03598 [pdf, other]

Sound Probabilistic Inference via Guide Types

Authors: Di Wang, Jan Hoffmann, Thomas Reps

Abstract: Probabilistic programming languages aim to describe and automate Bayesian modeling and inference. Modern languages support programmable inference, which allows users to customize inference algorithms by incorporating guide programs to improve inference performance. For Bayesian inference to be sound, guide programs must be compatible with model programs. One pervasive but challenging condition for… ▽ More Probabilistic programming languages aim to describe and automate Bayesian modeling and inference. Modern languages support programmable inference, which allows users to customize inference algorithms by incorporating guide programs to improve inference performance. For Bayesian inference to be sound, guide programs must be compatible with model programs. One pervasive but challenging condition for model-guide compatibility is absolute continuity, which requires that the model and guide programs define probability distributions with the same support. This paper presents a new probabilistic programming language that guarantees absolute continuity, and features general programming constructs, such as branching and recursion. Model and guide programs are implemented as coroutines that communicate with each other to synchronize the set of random variables they sample during their execution. Novel guide types describe and enforce communication protocols between coroutines. If the model and guide are well-typed using the same protocol, then they are guaranteed to enjoy absolute continuity. An efficient algorithm infers guide types from code so that users do not have to specify the types. The new programming language is evaluated with an implementation that includes the type-inference algorithm and a prototype compiler that targets Pyro. Experiments show that our language is capable of expressing a variety of probabilistic models with nontrivial control flow and recursion, and that the coroutine-based computation does not introduce significant overhead in actual Bayesian inference. △ Less

Submitted 8 April, 2021; originally announced April 2021.

arXiv:2103.16105 [pdf, ps, other]

Expected-Cost Analysis for Probabilistic Programs and Semantics-Level Adaption of Optional Stopping Theorems

Authors: Di Wang, Jan Hoffmann, Thomas Reps

Abstract: In this article, we present a semantics-level adaption of the Optional Stopping Theorem, sketch an expected-cost analysis as its application, and survey different variants of the Optional Stopping Theorem that have been used in static analysis of probabilistic programs. In this article, we present a semantics-level adaption of the Optional Stopping Theorem, sketch an expected-cost analysis as its application, and survey different variants of the Optional Stopping Theorem that have been used in static analysis of probabilistic programs. △ Less

Submitted 30 March, 2021; originally announced March 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2001.10150

arXiv:2011.09705 [pdf, other]

Iterative Planning with Plan-Space Explanations: A Tool and User Study

Authors: Rebecca Eifler, Jörg Hoffmann

Abstract: In a variety of application settings, the user preference for a planning task - the precise optimization objective - is difficult to elicit. One possible remedy is planning as an iterative process, allowing the user to iteratively refine and modify example plans. A key step to support such a process are explanations, answering user questions about the current plan. In particular, a relevant kind o… ▽ More In a variety of application settings, the user preference for a planning task - the precise optimization objective - is difficult to elicit. One possible remedy is planning as an iterative process, allowing the user to iteratively refine and modify example plans. A key step to support such a process are explanations, answering user questions about the current plan. In particular, a relevant kind of question is "Why does the plan you suggest not satisfy $p$?", where p is a plan property desirable to the user. Note that such a question pertains to plan space, i.e., the set of possible alternative plans. Adopting the recent approach to answer such questions in terms of plan-property dependencies, here we implement a tool and user interface for human-guided iterative planning including plan-space explanations. The tool runs in standard Web browsers, and provides simple user interfaces for both developers and users. We conduct a first user study, whose outcome indicates the usefulness of plan-property dependency explanations in iterative planning. △ Less

Submitted 19 November, 2020; originally announced November 2020.

Comments: Proceedings of the International Workshop of Explainable AI Planning (XAIP'20), at ICAPS'20

arXiv:2011.09037 [pdf, other]

Probabilistic Resource-Aware Session Types

Authors: Ankush Das, Di Wang, Jan Hoffmann

Abstract: Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To model and analyze such systems, this article introduces probabilistic session types and explores their application… ▽ More Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To model and analyze such systems, this article introduces probabilistic session types and explores their application in automatic expected resource analysis. Probabilistic session types describe probability distributions over messages and are a conservative extension of intuitionistic (binary) session types. To send on a probabilistic channel, processes have to utilize internal randomness from a probabilistic branching expression or external randomness from receiving on a probabilistic channel. The analysis for expected resource bounds is integrated with the type system and is a variant of automatic amortized resource analysis. It can automatically derive symbolic bounds for different cost metrics by reducing type inference to linear constraint solving. The technical contributions include the meta theory that is based on a novel nested multiverse semantics and a type-reconstruction algorithm that allows flexible mixing of different sources of randomness without burdening the programmer with type annotations. The type system has been implemented in the language PRast. Experiments demonstrate that PRast is applicable in different domains such as resource analysis of randomized distributed algorithms, verification of limiting distributions in Markov chains, and analysis of probabilistic digital contracts. △ Less

Submitted 17 November, 2020; originally announced November 2020.

Comments: technical report

arXiv:2010.16353 [pdf, other]

Typable Fragments of Polynomial Automatic Amortized Resource Analysis

Authors: Long Pham, Jan Hoffmann

Abstract: Being a fully automated technique for resource analysis, automatic amortized resource analysis (AARA) can fail in returning worst-case cost bounds of programs, fundamentally due to the undecidability of resource analysis. For programmers who are unfamiliar with the technical details of AARA, it is difficult to predict whether a program can be successfully analyzed in AARA. Motivated by this proble… ▽ More Being a fully automated technique for resource analysis, automatic amortized resource analysis (AARA) can fail in returning worst-case cost bounds of programs, fundamentally due to the undecidability of resource analysis. For programmers who are unfamiliar with the technical details of AARA, it is difficult to predict whether a program can be successfully analyzed in AARA. Motivated by this problem, this article identifies classes of programs that can be analyzed in type-based polynomial AARA. Firstly, it is shown that the set of functions that are typable in univariate polynomial AARA coincides with the complexity class PTIME. Secondly, the article presents a sufficient condition for typability that axiomatically requires every sub-expression of a given program to be polynomial-time. It is proved that this condition implies typability in multivariate polynomial AARA under some syntactic restrictions. △ Less

Submitted 30 October, 2020; originally announced October 2020.

Comments: This is the full version of our CSL 2021 paper

MSC Class: 68N15 ACM Class: D.3.1; F.3.1

arXiv:2010.03982 [pdf, other]

Generating Instructions at Different Levels of Abstraction

Authors: Arne Köhn, Julia Wichlacz, Álvaro Torralba, Daniel Höller, Jörg Hoffmann, Alexander Koller

Abstract: When generating technical instructions, it is often convenient to describe complex objects in the world at different levels of abstraction. A novice user might need an object explained piece by piece, while for an expert, talking about the complex object (e.g. a wall or railing) directly may be more succinct and efficient. We show how to generate building instructions at different levels of abstra… ▽ More When generating technical instructions, it is often convenient to describe complex objects in the world at different levels of abstraction. A novice user might need an object explained piece by piece, while for an expert, talking about the complex object (e.g. a wall or railing) directly may be more succinct and efficient. We show how to generate building instructions at different levels of abstraction in Minecraft. We introduce the use of hierarchical planning to this end, a method from AI planning which can capture the structure of complex objects neatly. A crowdsourcing evaluation shows that the choice of abstraction level matters to users, and that an abstraction strategy which balances low-level and high-level object descriptions compares favorably to ones which don't. △ Less

Submitted 8 October, 2020; originally announced October 2020.

Comments: Accepted COLING 2020 long paper

arXiv:2008.08262 [pdf, other]

Quarantines as a Targeted Immunization Strategy

Authors: Jessica Hoffmann, Matt Jordan, Constantine Caramanis

Abstract: In the context of the recent COVID-19 outbreak, quarantine has been used to "flatten the curve" and slow the spread of the disease. In this paper, we show that this is not the only benefit of quarantine for the mitigation of an SIR epidemic spreading on a graph. Indeed, human contact networks exhibit a powerlaw structure, which means immunizing nodes at random is extremely ineffective at slowing t… ▽ More In the context of the recent COVID-19 outbreak, quarantine has been used to "flatten the curve" and slow the spread of the disease. In this paper, we show that this is not the only benefit of quarantine for the mitigation of an SIR epidemic spreading on a graph. Indeed, human contact networks exhibit a powerlaw structure, which means immunizing nodes at random is extremely ineffective at slowing the epidemic, while immunizing high-degree nodes can efficiently guarantee herd immunity. We theoretically prove that if quarantines are declared at the right moment, high-degree nodes are disproportionately in the Removed state, which is a form of targeted immunization. Even if quarantines are declared too early, subsequent waves of infection spread slower than the first waves. This leads us to propose an opening and closing strategy aiming at immunizing the graph while infecting the minimum number of individuals, guaranteeing the population is now robust to future infections. To the best of our knowledge, this is the only strategy that guarantees herd immunity without requiring vaccines. We extensively verify our results on simulated and real-life networks. △ Less

Submitted 20 February, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

arXiv:2008.00766 [pdf, other]

Tracking the Race Between Deep Reinforcement Learning and Imitation Learning -- Extended Version

Authors: Timo P. Gros, Daniel Höller, Jörg Hoffmann, Verena Wolf

Abstract: Learning-based approaches for solving large sequential decision making problems have become popular in recent years. The resulting agents perform differently and their characteristics depend on those of the underlying learning approach. Here, we consider a benchmark planning problem from the reinforcement learning domain, the Racetrack, to investigate the properties of agents derived from differen… ▽ More Learning-based approaches for solving large sequential decision making problems have become popular in recent years. The resulting agents perform differently and their characteristics depend on those of the underlying learning approach. Here, we consider a benchmark planning problem from the reinforcement learning domain, the Racetrack, to investigate the properties of agents derived from different deep (reinforcement) learning approaches. We compare the performance of deep supervised learning, in particular imitation learning, to reinforcement learning for the Racetrack model. We find that imitation learning yields agents that follow more risky paths. In contrast, the decisions of deep reinforcement learning are more foresighted, i.e., avoid states in which fatal decisions are more likely. Our evaluations show that for this sequential decision making problem, deep reinforcement learning performs best in many aspects even though for imitation learning optimal decisions are considered. △ Less

Submitted 3 August, 2020; originally announced August 2020.

Comments: Extended Version of the Conference Paper published in the Proceedings of the 17th International Conference on Quantitative Evaluation of SysTems (QEST)

arXiv:2006.16233 [pdf, ps, other]

Liquid Resource Types

Authors: Tristan Knoth, Di Wang, Adam Reynolds, Jan Hoffmann, Nadia Polikarpova

Abstract: This article presents liquid resource types, a technique for automatically verifying the resource consumption of functional programs. Existing resource analysis techniques trade automation for flexibility -- automated techniques are restricted to relatively constrained families of resource bounds, while more expressive proof techniques admitting value-dependent bounds rely on handwritten proofs. L… ▽ More This article presents liquid resource types, a technique for automatically verifying the resource consumption of functional programs. Existing resource analysis techniques trade automation for flexibility -- automated techniques are restricted to relatively constrained families of resource bounds, while more expressive proof techniques admitting value-dependent bounds rely on handwritten proofs. Liquid resource types combine the best of these approaches, using logical refinements to automatically prove precise bounds on a program's resource consumption. The type system augments refinement types with potential annotations to conduct an amortized resource analysis. Importantly, users can annotate data structure declarations to indicate how potential is allocated within the type, allowing the system to express bounds with polynomials and exponentials, as well as more precise expressions depending on program values. We prove the soundness of the type system, provide a library of flexible and reusable data structures for conducting resource analysis, and use our prototype implementation to automatically verify resource bounds that previously required a manual proof. △ Less

Submitted 1 July, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

arXiv:2006.14010 [pdf, ps, other]

Raising Expectations: Automating Expected Cost Analysis with Types

Authors: Di Wang, David M Kahn, Jan Hoffmann

Abstract: This article presents a type-based analysis for deriving upper bounds on the expected execution cost of probabilistic programs. The analysis is naturally compositional, parametric in the cost model, and supports higher order functions and inductive data types. The derived bounds are multivariate polynomials that are functions of data structures. Bound inference is enabled by local type rules that… ▽ More This article presents a type-based analysis for deriving upper bounds on the expected execution cost of probabilistic programs. The analysis is naturally compositional, parametric in the cost model, and supports higher order functions and inductive data types. The derived bounds are multivariate polynomials that are functions of data structures. Bound inference is enabled by local type rules that reduce type inference to linear constraint solving. The type system is based on the potential method of amortized analysis and extends automatic amortized resource analysis (AARA) for deterministic programs. A main innovation is that bounds can contain symbolic probabilities, which may appear in data structures and function arguments. Another contribution is a novel soundness proof that establishes the correctness of the derived bounds with respect to a distribution-based operational cost semantics that also includes nontrivial diverging behavior. For cost models like time, derived bounds imply termination with probability one. To highlight the novel ideas, the presentation focuses on linear potential and a core language. However, the analysis is implemented as an extension of Resource Aware ML and supports polynomial bounds and user defined data structures. The effectiveness of the technique is evaluated by analyzing the sample complexity of discrete distributions and with a novel average-case estimation for deterministic programs that combines expected cost analysis with statistical methods. △ Less

Submitted 21 September, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.07360 [pdf, other]

AlgebraNets

Authors: Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen

Abstract: Neural networks have historically been built layerwise from the set of functions in ${f: \mathbb{R}^n \to \mathbb{R}^m }$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$. Our work considers a richer set of objects for activations and weights, and undertakes a comprehensive study of alternative algebras as number representations by studying their performance… ▽ More Neural networks have historically been built layerwise from the set of functions in ${f: \mathbb{R}^n \to \mathbb{R}^m }$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$. Our work considers a richer set of objects for activations and weights, and undertakes a comprehensive study of alternative algebras as number representations by studying their performance on two challenging problems: large-scale image classification using the ImageNet dataset and language modeling using the enwiki8 and WikiText-103 datasets. We denote this broader class of models as AlgebraNets. Our findings indicate that the conclusions of prior work, which explored neural networks constructed from $\mathbb{C}$ (complex numbers) and $\mathbb{H}$ (quaternions) on smaller datasets, do not always transfer to these challenging settings. However, our results demonstrate that there are alternative algebras which deliver better parameter and computational efficiency compared with $\mathbb{R}$. We consider $\mathbb{C}$, $\mathbb{H}$, $M_{2}(\mathbb{R})$ (the set of $2\times2$ real-valued matrices), $M_{2}(\mathbb{C})$, $M_{3}(\mathbb{R})$ and $M_{4}(\mathbb{R})$. Additionally, we note that multiplication in these algebras has higher compute density than real multiplication, a useful property in situations with inherently limited parameter reuse such as auto-regressive inference and sparse neural networks. We therefore investigate how to induce sparsity within AlgebraNets. We hope that our strong results on large-scale, practical benchmarks will spur further exploration of these unconventional architectures which challenge the default choice of using real numbers for neural network weights and activations. △ Less

Submitted 16 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2005.04074 [pdf, other]

Adversarial Graph Embeddings for Fair Influence Maximization over Social Networks

Authors: Moein Khajehnejad, Ahmad Asgharian Rezaei, Mahmoudreza Babaei, Jessica Hoffmann, Mahdi Jalili, Adrian Weller

Abstract: Influence maximization is a widely studied topic in network science, where the aim is to reach the maximum possible number of nodes, while only targeting a small initial set of individuals. It has critical applications in many fields, including viral marketing, information propagation, news dissemination, and vaccinations. However, the objective does not usually take into account whether the final… ▽ More Influence maximization is a widely studied topic in network science, where the aim is to reach the maximum possible number of nodes, while only targeting a small initial set of individuals. It has critical applications in many fields, including viral marketing, information propagation, news dissemination, and vaccinations. However, the objective does not usually take into account whether the final set of influenced nodes is fair with respect to sensitive attributes, such as race or gender. Here we address fair influence maximization, aiming to reach minorities more equitably. We introduce Adversarial Graph Embeddings: we co-train an auto-encoder for graph embedding and a discriminator to discern sensitive attributes. This leads to embeddings which are similarly distributed across sensitive attributes. We then find a good initial set by clustering the embeddings. We believe we are the first to use embeddings for the task of fair influence maximization. While there are typically trade-offs between fairness and influence maximization objectives, our experiments on synthetic and real-world datasets show that our approach dramatically reduces disparity while remaining competitive with state-of-the-art influence maximization methods. △ Less

Submitted 10 May, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: In Proc. of the 29th International Joint Conference on Artificial Intelligence (IJCAI'20), 2020

arXiv:2002.09519 [pdf, ps, other]

Exponential Automatic Amortized Resource Analysis

Authors: David M Kahn, Jan Hoffmann

Abstract: Automatic amortized resource analysis (AARA) is a type-based technique for inferring concrete (non-asymptotic) bounds on a program's resource usage. Existing work on AARA has focused on bounds that are polynomial in the sizes of the inputs. This paper presents and extension of AARA to exponential bounds that preserves the benefits of the technique, such as compositionality and efficient type infer… ▽ More Automatic amortized resource analysis (AARA) is a type-based technique for inferring concrete (non-asymptotic) bounds on a program's resource usage. Existing work on AARA has focused on bounds that are polynomial in the sizes of the inputs. This paper presents and extension of AARA to exponential bounds that preserves the benefits of the technique, such as compositionality and efficient type inference based on linear constraint solving. A key idea is the use of the Stirling numbers of the second kind as the basis of potential functions, which play the same role as the binomial coefficients in polynomial AARA. To formalize the similarities with the existing analyses, the paper presents a general methodology for AARA that is instantiated to the polynomial version, the exponential version, and a combined system with potential functions that are formed by products of Stirling numbers and binomial coefficients. The soundness of exponential AARA is proved with respect to an operational cost semantics and the analysis of representative example programs demonstrates the effectiveness of the new analysis. △ Less

Submitted 5 March, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

arXiv:2001.10150 [pdf, other]

Central Moment Analysis for Cost Accumulators in Probabilistic Programs

Authors: Di Wang, Jan Hoffmann, Thomas Reps

Abstract: For probabilistic programs, it is usually not possible to automatically derive exact information about their properties, such as the distribution of states at a given program point. Instead, one can attempt to derive approximations, such as upper bounds on tail probabilities. Such bounds can be obtained via concentration inequalities, which rely on the moments of a distribution, such as the expect… ▽ More For probabilistic programs, it is usually not possible to automatically derive exact information about their properties, such as the distribution of states at a given program point. Instead, one can attempt to derive approximations, such as upper bounds on tail probabilities. Such bounds can be obtained via concentration inequalities, which rely on the moments of a distribution, such as the expectation (the first raw moment) or the variance (the second central moment). Tail bounds obtained using central moments are often tighter than the ones obtained using raw moments, but automatically analyzing central moments is more challenging. This paper presents an analysis for probabilistic programs that automatically derives symbolic upper and lower bounds on variances, as well as higher central moments, of cost accumulators. To overcome the challenges of higher-moment analysis, it generalizes analyses for expectations with an algebraic abstraction that simultaneously analyzes different moments, utilizing relations between them. A key innovation is the notion of moment-polymorphic recursion, and a practical derivation system that handles recursive functions. The analysis has been implemented using a template-based technique that reduces the inference of polynomial bounds to linear programming. Experiments with our prototype central-moment analyzer show that, despite the analyzer's upper/lower bounds on various quantities, it obtains tighter tail bounds than an existing system that uses only raw moments, such as expectations. △ Less

Submitted 8 April, 2021; v1 submitted 27 January, 2020; originally announced January 2020.

arXiv:1909.10893 [pdf, other]

Recurrent Independent Mechanisms

Authors: Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, Bernhard Schölkopf

Abstract: Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes. We propose Recurrent Independent Mechanisms (RIMs), a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the… ▽ More Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes. We propose Recurrent Independent Mechanisms (RIMs), a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant. We show that this leads to specialization amongst the RIMs, which in turn allows for dramatically improved generalization on tasks where some factors of variation differ systematically between training and evaluation. △ Less

Submitted 17 November, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

arXiv:1909.00949 [pdf, other]

Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

Authors: Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, Yoshua Bengio

Abstract: Generative models have achieved impressive results in many domains including image and text generation. In the natural sciences, generative models have led to rapid progress in automated drug discovery. Many of the current methods focus on either 1-D or 2-D representations of typically small, drug-like molecules. However, many molecules require 3-D descriptors and exceed the chemical complexity of… ▽ More Generative models have achieved impressive results in many domains including image and text generation. In the natural sciences, generative models have led to rapid progress in automated drug discovery. Many of the current methods focus on either 1-D or 2-D representations of typically small, drug-like molecules. However, many molecules require 3-D descriptors and exceed the chemical complexity of commonly used dataset. We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms. We construct a smooth and continuous 3-D density representation of each crystal based on the positions of different atoms. Two different neural networks were trained on a dataset of over 120,000 three-dimensional samples of single and repeating crystal structures, made by rotating the single unit cells. The first, an Encoder-Decoder pair, constructs a compressed latent space representation of each molecule and then decodes this description into an accurate reconstruction of the input. The second network segments the resulting output into atoms and assigns each atom an atomic number. By generating compressed, continuous latent spaces representations of molecules we are able to decode random samples, interpolate between two molecules, and alter known molecules. △ Less

Submitted 3 September, 2019; originally announced September 2019.

arXiv:1908.01000 [pdf, other]

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization

Authors: Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, Jian Tang

Abstract: This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios. Graph-level representations are critical in a variety of real-world applications such as predicting the properties of molecules and community analysis in social networks. Traditional graph kernel based methods are simple, yet effective for obtaining fixed-length representations for g… ▽ More This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios. Graph-level representations are critical in a variety of real-world applications such as predicting the properties of molecules and community analysis in social networks. Traditional graph kernel based methods are simple, yet effective for obtaining fixed-length representations for graphs but they suffer from poor generalization due to hand-crafted designs. There are also some recent methods based on language models (e.g. graph2vec) but they tend to only consider certain substructures (e.g. subtrees) as graph representatives. Inspired by recent progress of unsupervised representation learning, in this paper we proposed a novel method called InfoGraph for learning graph-level representations. We maximize the mutual information between the graph-level representation and the representations of substructures of different scales (e.g., nodes, edges, triangles). By doing so, the graph-level representations encode aspects of the data that are shared across different scales of substructures. Furthermore, we further propose InfoGraph*, an extension of InfoGraph for semi-supervised scenarios. InfoGraph* maximizes the mutual information between unsupervised graph representations learned by InfoGraph and the representations learned by existing supervised methods. As a result, the supervised encoder learns from unlabeled data while preserving the latent semantic space favored by the current supervised task. Experimental results on the tasks of graph classification and molecular property prediction show that InfoGraph is superior to state-of-the-art baselines and InfoGraph* can achieve performance competitive with state-of-the-art semi-supervised models. △ Less

Submitted 17 January, 2020; v1 submitted 31 July, 2019; originally announced August 2019.

Comments: ICLR 2020 (spotlight)

arXiv:1906.07159 [pdf, other]

vGraph: A Generative Model for Joint Community Detection and Node Representation Learning

Authors: Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang

Abstract: This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and… ▽ More This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities. △ Less

Submitted 17 September, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Accepted Paper at NeurIPS 2019

Showing 1–50 of 82 results for author: Hoffmann, J