-
PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters
Authors:
Xun Jiao,
Fred Lin,
Harish D. Dixit,
Joel Coburn,
Abhinav Pandey,
Han Wang,
Venkat Ramesh,
Jianyu Huang,
Wang Xu,
Daniel Moore,
Sriram Sankar
Abstract:
Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can…
▽ More
Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address this question, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by architectural vulnerability factor (AVF) in computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference -- recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency such as mapping vulnerable AI parameter components to well-protected hardware modules. PVF metric is applicable to any AI model and has a potential to help unify and standardize AI vulnerability/resilience evaluation practice.
△ Less
Submitted 11 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
User-customizable Shared Control for Robot Teleoperation via Virtual Reality
Authors:
Rui Luo,
Mark Zolotas,
Drake Moore,
Taskin Padir
Abstract:
Shared control can ease and enhance a human operator's ability to teleoperate robots, particularly for intricate tasks demanding fine control over multiple degrees of freedom. However, the arbitration process dictating how much autonomous assistance to administer in shared control can confuse novice operators and impede their understanding of the robot's behavior. To overcome these adverse side-ef…
▽ More
Shared control can ease and enhance a human operator's ability to teleoperate robots, particularly for intricate tasks demanding fine control over multiple degrees of freedom. However, the arbitration process dictating how much autonomous assistance to administer in shared control can confuse novice operators and impede their understanding of the robot's behavior. To overcome these adverse side-effects, we propose a novel formulation of shared control that enables operators to tailor the arbitration to their unique capabilities and preferences. Unlike prior approaches to customizable shared control where users could indirectly modify the latent parameters of the arbitration function by issuing a feedback command, we instead make these parameters observable and directly editable via a virtual reality (VR) interface. We present our user-customizable shared control method for a teleoperation task in SE(3), known as the buzz wire game. A user study is conducted with participants teleoperating a robotic arm in VR to complete the game. The experiment spanned two weeks per subject to investigate longitudinal trends. Our findings reveal that users allowed to interactively tune the arbitration parameters across trials generalize well to adaptations in the task, exhibiting improvements in precision and fluency over direct teleoperation and conventional shared control.
△ Less
Submitted 14 August, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Unraveling codes: fast, robust, beyond-bound error correction for DRAM
Authors:
Mike Hamburg,
Eric Linstadt,
Danny Moore,
Thomas Vogelsang
Abstract:
Generalized Reed-Solomon (RS) codes are a common choice for efficient, reliable error correction in memory and communications systems. These codes add $2t$ extra parity symbols to a block of memory, and can efficiently and reliably correct up to $t$ symbol errors in that block. Decoding is possible beyond this bound, but it is imperfectly reliable and often computationally expensive. Beyond-bound…
▽ More
Generalized Reed-Solomon (RS) codes are a common choice for efficient, reliable error correction in memory and communications systems. These codes add $2t$ extra parity symbols to a block of memory, and can efficiently and reliably correct up to $t$ symbol errors in that block. Decoding is possible beyond this bound, but it is imperfectly reliable and often computationally expensive. Beyond-bound decoding is an important problem to solve for error-correcting Dynamic Random Access Memory (DRAM). These memories are often designed so that each access touches two extra memory devices, so that a failure in any one device can be corrected. But system architectures increasingly require DRAM to store metadata in addition to user data. When the metadata replaces parity data, a single-device failure is then beyond-bound. An error-correction system can either protect each access with a single RS code, or divide it into several segments protected with a shorter code, usually in an Interleaved Reed-Solomon (IRS) configuration. The full-block RS approach is more reliable, both at correcting errors and at preventing silent data corruption (SDC). The IRS option is faster, and is especially efficient at beyond-bound correction of single- or double-device failures. Here we describe a new family of "unraveling" Reed-Solomon codes that bridges the gap between these options. Our codes are full-block generalized RS codes, but they can also be decoded using an IRS decoder. As a result, they combine the speed and beyond-bound correction capabilities of interleaved codes with the robustness of full-block codes, including the ability of the latter to reliably correct failures across multiple devices. We show that unraveling codes are an especially good fit for high-reliability DRAM error correction.
△ Less
Submitted 27 May, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Shared Affordance-awareness via Augmented Reality for Proactive Assistance in Human-robot Collaboration
Authors:
Drake Moore,
Mark Zolotas,
Taskin Padir
Abstract:
Enabling humans and robots to collaborate effectively requires purposeful communication and an understanding of each other's affordances. Prior work in human-robot collaboration has incorporated knowledge of human affordances, i.e., their action possibilities in the current context, into autonomous robot decision-making. This "affordance awareness" is especially promising for service robots that n…
▽ More
Enabling humans and robots to collaborate effectively requires purposeful communication and an understanding of each other's affordances. Prior work in human-robot collaboration has incorporated knowledge of human affordances, i.e., their action possibilities in the current context, into autonomous robot decision-making. This "affordance awareness" is especially promising for service robots that need to know when and how to assist a person that cannot independently complete a task. However, robots still fall short in performing many common tasks autonomously. In this work-in-progress paper, we propose an augmented reality (AR) framework that bridges the gap in an assistive robot's capabilities by actively engaging with a human through a shared affordance-awareness representation. Leveraging the different perspectives from a human wearing an AR headset and a robot's equipped sensors, we can build a perceptual representation of the shared environment and model regions of respective agent affordances. The AR interface can also allow both agents to communicate affordances with one another, as well as prompt for assistance when attempting to perform an action outside their affordance region. This paper presents the main components of the proposed framework and discusses its potential through a domestic cleaning task experiment.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Domain Adaptive Few-Shot Open-Set Learning
Authors:
Debabrata Pal,
Deeptej More,
Sai Bhargav,
Dipesh Tamboli,
Vaneet Aggarwal,
Biplab Banerjee
Abstract:
Few-shot learning has made impressive strides in addressing the crucial challenges of recognizing unknown samples from novel classes in target query sets and managing visual shifts between domains. However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution…
▽ More
Few-shot learning has made impressive strides in addressing the crucial challenges of recognizing unknown samples from novel classes in target query sets and managing visual shifts between domains. However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution to both problems. To address these challenges comprehensively, we propose a novel approach called Domain Adaptive Few-Shot Open Set Recognition (DA-FSOS) and introduce a meta-learning-based architecture named DAFOSNET. During training, our model learns a shared and discriminative embedding space while creating a pseudo open-space decision boundary, given a fully-supervised source domain and a label-disjoint few-shot target domain. To enhance data density, we use a pair of conditional adversarial networks with tunable noise variances to augment both domains closed and pseudo-open spaces. Furthermore, we propose a domain-specific batch-normalized class prototypes alignment strategy to align both domains globally while ensuring class-discriminativeness through novel metric objectives. Our training approach ensures that DAFOS-NET can generalize well to new scenarios in the target domain. We present three benchmarks for DA-FSOS based on the Office-Home, mini-ImageNet/CUB, and DomainNet datasets and demonstrate the efficacy of DAFOS-NET through extensive experimentation
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors
Authors:
Dongning Ma,
Xun Jiao,
Fred Lin,
Mengshi Zhang,
Alban Desmaison,
Thomas Sellinger,
Daniel Moore,
Sriram Sankar
Abstract:
Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware error…
▽ More
Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware errors. We develop Terrorch, a user-friendly, efficient and flexible error injection framework on top of the widely-used PyTorch. We evaluate a wide range of models and datasets and observe that the DRS robustness against hardware errors is influenced by various factors from model parameters to input characteristics. We also explore 3 error mitigation methods including algorithm based fault tolerance (ABFT), activation clipping and selective bit protection (SBP). We find that applying activation clipping can recover up to 30% of the degraded AUC-ROC score, making it a promising mitigation method.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
Generative Adversarial Networks for Scintillation Signal Simulation in EXO-200
Authors:
S. Li,
I. Ostrovskiy,
Z. Li,
L. Yang,
S. Al Kharusi,
G. Anton,
I. Badhrees,
P. S. Barbeau,
D. Beck,
V. Belov,
T. Bhatta,
M. Breidenbach,
T. Brunner,
G. F. Cao,
W. R. Cen,
C. Chambers,
B. Cleveland,
M. Coon,
A. Craycraft,
T. Daniels,
L. Darroch,
S. J. Daugherty,
J. Davis,
S. Delaquis,
A. Der Mesrobian-Kabakian
, et al. (65 additional authors not shown)
Abstract:
Generative Adversarial Networks trained on samples of simulated or actual events have been proposed as a way of generating large simulated datasets at a reduced computational cost. In this work, a novel approach to perform the simulation of photodetector signals from the time projection chamber of the EXO-200 experiment is demonstrated. The method is based on a Wasserstein Generative Adversarial N…
▽ More
Generative Adversarial Networks trained on samples of simulated or actual events have been proposed as a way of generating large simulated datasets at a reduced computational cost. In this work, a novel approach to perform the simulation of photodetector signals from the time projection chamber of the EXO-200 experiment is demonstrated. The method is based on a Wasserstein Generative Adversarial Network - a deep learning technique allowing for implicit non-parametric estimation of the population distribution for a given set of objects. Our network is trained on real calibration data using raw scintillation waveforms as input. We find that it is able to produce high-quality simulated waveforms an order of magnitude faster than the traditional simulation approach and, importantly, generalize from the training sample and discern salient high-level features of the data. In particular, the network correctly deduces position dependency of scintillation light response in the detector and correctly recognizes dead photodetector channels. The network output is then integrated into the EXO-200 analysis framework to show that the standard EXO-200 reconstruction routine processes the simulated waveforms to produce energy distributions comparable to that of real waveforms. Finally, the remaining discrepancies and potential ways to improve the approach further are highlighted.
△ Less
Submitted 8 May, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
PyGFI: Analyzing and Enhancing Robustness of Graph Neural Networks Against Hardware Errors
Authors:
Ruixuan Wang,
Fred Lin,
Daniel Moore,
Sriram Sankar,
Xun Jiao
Abstract:
Graph neural networks (GNNs) have recently emerged as a promising learning paradigm in learning graph-structured data and have demonstrated wide success across various domains such as recommendation systems, social networks, and electronic design automation (EDA). Like other deep learning (DL) methods, GNNs are being deployed in sophisticated modern hardware systems, as well as dedicated accelerat…
▽ More
Graph neural networks (GNNs) have recently emerged as a promising learning paradigm in learning graph-structured data and have demonstrated wide success across various domains such as recommendation systems, social networks, and electronic design automation (EDA). Like other deep learning (DL) methods, GNNs are being deployed in sophisticated modern hardware systems, as well as dedicated accelerators. However, despite the popularity of GNNs and the recent efforts of bringing GNNs to hardware, the fault tolerance and resilience of GNNs have generally been overlooked. Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale and empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy. By developing a customized fault injection tool on top of PyTorch, we perform extensive fault injection experiments on various GNN models and application datasets. We observe that the error resilience of GNN models varies by orders of magnitude with respect to different models and application datasets. Further, we explore a low-cost error mitigation mechanism for GNN to enhance its resilience. This GNN resilience study aims to open up new directions and opportunities for future GNN accelerator design and architectural optimization.
△ Less
Submitted 24 April, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Debiasing Methods for Fairer Neural Models in Vision and Language Research: A Survey
Authors:
Otávio Parraga,
Martin D. More,
Christian M. Oliveira,
Nathan S. Gavenski,
Lucas S. Kupssinskü,
Adilson Medronha,
Luis V. Moura,
Gabriel S. Simões,
Rodrigo C. Barros
Abstract:
Despite being responsible for state-of-the-art results in several computer vision and natural language processing tasks, neural networks have faced harsh criticism due to some of their current shortcomings. One of them is that neural networks are correlation machines prone to model biases within the data instead of focusing on actual useful causal relationships. This problem is particularly seriou…
▽ More
Despite being responsible for state-of-the-art results in several computer vision and natural language processing tasks, neural networks have faced harsh criticism due to some of their current shortcomings. One of them is that neural networks are correlation machines prone to model biases within the data instead of focusing on actual useful causal relationships. This problem is particularly serious in application domains affected by aspects such as race, gender, and age. To prevent models from incurring on unfair decision-making, the AI community has concentrated efforts in correcting algorithmic biases, giving rise to the research area now widely known as fairness in AI. In this survey paper, we provide an in-depth overview of the main debiasing methods for fairness-aware neural networks in the context of vision and language research. We propose a novel taxonomy to better organize the literature on debiasing methods for fairness, and we discuss the current challenges, trends, and important future work directions for the interested researcher and practitioner.
△ Less
Submitted 10 November, 2022;
originally announced November 2022.
-
Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unlabeled, unannotated pathology slides
Authors:
Adalberto Claudio Quiros,
Nicolas Coudray,
Anna Yeaton,
Xinyu Yang,
Bojing Liu,
Hortense Le,
Luis Chiriboga,
Afreen Karimkhan,
Navneet Narula,
David A. Moore,
Christopher Y. Park,
Harvey Pass,
Andre L. Moreira,
John Le Quesne,
Aristotelis Tsirigos,
Ke Yuan
Abstract:
Definitive cancer diagnosis and management depend upon the extraction of information from microscopy images by pathologists. These images contain complex information requiring time-consuming expert human interpretation that is prone to human bias. Supervised deep learning approaches have proven powerful for classification tasks, but they are inherently limited by the cost and quality of annotation…
▽ More
Definitive cancer diagnosis and management depend upon the extraction of information from microscopy images by pathologists. These images contain complex information requiring time-consuming expert human interpretation that is prone to human bias. Supervised deep learning approaches have proven powerful for classification tasks, but they are inherently limited by the cost and quality of annotations used for training these models. To address this limitation of supervised methods, we developed Histomorphological Phenotype Learning (HPL), a fully blue{self-}supervised methodology that requires no expert labels or annotations and operates via the automatic discovery of discriminatory image features in small image tiles. Tiles are grouped into morphologically similar clusters which constitute a library of histomorphological phenotypes, revealing trajectories from benign to malignant tissue via inflammatory and reactive phenotypes. These clusters have distinct features which can be identified using orthogonal methods, linking histologic, molecular and clinical phenotypes. Applied to lung cancer tissues, we show that they align closely with patient survival, with histopathologically recognised tumor types and growth patterns, and with transcriptomic measures of immunophenotype. We then demonstrate that these properties are maintained in a multi-cancer study. These results show the clusters represent recurrent host responses and modes of tumor growth emerging under natural selection. Code, pre-trained models, learned embeddings, and documentation are available to the community at https://github.com/AdalbertoCq/Histomorphological-Phenotype-Learning
△ Less
Submitted 1 September, 2023; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling
Authors:
Gianluigi Silvestri,
Emily Fertig,
Dave Moore,
Luca Ambrogioni
Abstract:
Normalizing flows have shown great success as general-purpose density estimators. However, many real world applications require the use of domain-specific knowledge, which normalizing flows cannot readily incorporate. We propose embedded-model flows (EMF), which alternate general-purpose transformations with structured layers that embed domain-specific inductive biases. These layers are automatica…
▽ More
Normalizing flows have shown great success as general-purpose density estimators. However, many real world applications require the use of domain-specific knowledge, which normalizing flows cannot readily incorporate. We propose embedded-model flows (EMF), which alternate general-purpose transformations with structured layers that embed domain-specific inductive biases. These layers are automatically constructed by converting user-specified differentiable probabilistic models into equivalent bijective transformations. We also introduce gated structured layers, which allow bypassing the parts of the models that fail to capture the statistics of the data. We demonstrate that EMFs can be used to induce desirable properties such as multimodality, hierarchical coupling and continuity. Furthermore, we show that EMFs enable a high performance form of variational inference where the structure of the prior model is embedded in the variational architecture. In our experiments, we show that this approach outperforms state-of-the-art methods in common structured inference problems.
△ Less
Submitted 15 March, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Fine-grained Uncertainty Modeling in Neural Networks
Authors:
Rahul Soni,
Naresh Shah,
Jimmy D. Moore
Abstract:
Existing uncertainty modeling approaches try to detect an out-of-distribution point from the in-distribution dataset. We extend this argument to detect finer-grained uncertainty that distinguishes between (a). certain points, (b). uncertain points but within the data distribution, and (c). out-of-distribution points. Our method corrects overconfident NN decisions, detects outlier points and learns…
▽ More
Existing uncertainty modeling approaches try to detect an out-of-distribution point from the in-distribution dataset. We extend this argument to detect finer-grained uncertainty that distinguishes between (a). certain points, (b). uncertain points but within the data distribution, and (c). out-of-distribution points. Our method corrects overconfident NN decisions, detects outlier points and learns to say ``I don't know'' when uncertain about a critical point between the top two predictions. In addition, we provide a mechanism to quantify class distributions overlap in the decision manifold and investigate its implications in model interpretability.
Our method is two-step: in the first step, the proposed method builds a class distribution using Kernel Activation Vectors (kav) extracted from the Network. In the second step, the algorithm determines the confidence of a test point by a hierarchical decision rule based on the chi-squared distribution of squared Mahalanobis distances.
Our method sits on top of a given Neural Network, requires a single scan of training data to estimate class distribution statistics, and is highly scalable to deep networks and wider pre-softmax layer. As a positive side effect, our method helps to prevent adversarial attacks without requiring any additional training. It is directly achieved when the Softmax layer is substituted by our robust uncertainty layer at the evaluation phase.
△ Less
Submitted 11 February, 2020;
originally announced February 2020.
-
Adversarial TCAV -- Robust and Effective Interpretation of Intermediate Layers in Neural Networks
Authors:
Rahul Soni,
Naresh Shah,
Chua Tat Seng,
Jimmy D. Moore
Abstract:
Interpreting neural network decisions and the information learned in intermediate layers is still a challenge due to the opaque internal state and shared non-linear interactions. Although (Kim et al, 2017) proposed to interpret intermediate layers by quantifying its ability to distinguish a user-defined concept (from random examples), the questions of robustness (variation against the choice of ra…
▽ More
Interpreting neural network decisions and the information learned in intermediate layers is still a challenge due to the opaque internal state and shared non-linear interactions. Although (Kim et al, 2017) proposed to interpret intermediate layers by quantifying its ability to distinguish a user-defined concept (from random examples), the questions of robustness (variation against the choice of random examples) and effectiveness (retrieval rate of concept images) remain. We investigate these two properties and propose improvements to make concept activations reliable for practical use.
Effectiveness: If the intermediate layer has effectively learned a user-defined concept, it should be able to recall --- at the testing step --- most of the images containing the proposed concept. For instance, we observed that the recall rate of Tiger shark and Great white shark from the ImageNet dataset with "Fins" as a user-defined concept was only 18.35% for VGG16. To increase the effectiveness of concept learning, we propose A-CAV --- the Adversarial Concept Activation Vector --- this results in larger margins between user concepts and (negative) random examples. This approach improves the aforesaid recall to 76.83% for VGG16.
For robustness, we define it as the ability of an intermediate layer to be consistent in its recall rate (the effectiveness) for different random seeds. We observed that TCAV has a large variance in recalling a concept across different random seeds. For example, the recall of cat images (from a layer learning the concept of tail) varies from 18% to 86% with 20.85% standard deviation on VGG16. We propose a simple and scalable modification that employs a Gram-Schmidt process to sample random noise from concepts and learn an average "concept classifier". This approach improves the aforesaid standard deviation from 20.85% to 6.4%.
△ Less
Submitted 26 February, 2020; v1 submitted 10 February, 2020;
originally announced February 2020.
-
tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware
Authors:
Junpeng Lao,
Christopher Suter,
Ian Langmore,
Cyril Chimisov,
Ashish Saxena,
Pavel Sountsov,
Dave Moore,
Rif A. Saurous,
Matthew D. Hoffman,
Joshua V. Dillon
Abstract:
Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations…
▽ More
Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations that motivated its design.
△ Less
Submitted 4 February, 2020;
originally announced February 2020.
-
Automatic structured variational inference
Authors:
Luca Ambrogioni,
Kate Lin,
Emily Fertig,
Sharad Vikram,
Max Hinne,
Dave Moore,
Marcel van Gerven
Abstract:
Stochastic variational inference offers an attractive option as a default method for differentiable probabilistic programming. However, the performance of the variational approach depends on the choice of an appropriate variational family. Here, we introduce automatic structured variational inference (ASVI), a fully automated method for constructing structured variational families, inspired by the…
▽ More
Stochastic variational inference offers an attractive option as a default method for differentiable probabilistic programming. However, the performance of the variational approach depends on the choice of an appropriate variational family. Here, we introduce automatic structured variational inference (ASVI), a fully automated method for constructing structured variational families, inspired by the closed-form update in conjugate Bayesian models. These convex-update families incorporate the forward pass of the input probabilistic program and can therefore capture complex statistical dependencies. Convex-update families have the same space and time complexity as the input probabilistic program and are therefore tractable for a very large family of models including both continuous and discrete variables. We validate our automatic variational method on a wide range of low- and high-dimensional inference problems. We find that ASVI provides a clear improvement in performance when compared with other popular approaches such as the mean-field approach and inverse autoregressive flows. We provide an open source implementation of ASVI in TensorFlow Probability.
△ Less
Submitted 10 February, 2021; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Joint Distributions for TensorFlow Probability
Authors:
Dan Piponi,
Dave Moore,
Joshua V. Dillon
Abstract:
A central tenet of probabilistic programming is that a model is specified exactly once in a canonical representation which is usable by inference algorithms. We describe JointDistributions, a family of declarative representations of directed graphical models in TensorFlow Probability.
A central tenet of probabilistic programming is that a model is specified exactly once in a canonical representation which is usable by inference algorithms. We describe JointDistributions, a family of declarative representations of directed graphical models in TensorFlow Probability.
△ Less
Submitted 21 January, 2020;
originally announced January 2020.
-
BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
Authors:
Emad Elwany,
Dave Moore,
Gaurav Oberoi
Abstract:
Fine-tuning language models, such as BERT, on domain specific corpora has proven to be valuable in domains like scientific papers and biomedical text. In this paper, we show that fine-tuning BERT on legal documents similarly provides valuable improvements on NLP tasks in the legal domain. Demonstrating this outcome is significant for analyzing commercial agreements, because obtaining large legal c…
▽ More
Fine-tuning language models, such as BERT, on domain specific corpora has proven to be valuable in domains like scientific papers and biomedical text. In this paper, we show that fine-tuning BERT on legal documents similarly provides valuable improvements on NLP tasks in the legal domain. Demonstrating this outcome is significant for analyzing commercial agreements, because obtaining large legal corpora is challenging due to their confidential nature. As such, we show that having access to large legal corpora is a competitive advantage for commercial applications, and academic research on analyzing contracts.
△ Less
Submitted 1 November, 2019;
originally announced November 2019.
-
Quantifying the pathways to life using assembly spaces
Authors:
Stuart M. Marshall,
Douglas Moore,
Alastair R. G. Murray,
Sara I. Walker,
Leroy Cronin
Abstract:
We have developed the concept of pathway assembly to explore the amount of extrinsic information required to build an object. To quantify this information in an agnostic way, we present a method to determine the amount of pathway assembly information contained within such an object by deconstructing the object into its irreducible parts, and then evaluating the minimum number of steps to reconstru…
▽ More
We have developed the concept of pathway assembly to explore the amount of extrinsic information required to build an object. To quantify this information in an agnostic way, we present a method to determine the amount of pathway assembly information contained within such an object by deconstructing the object into its irreducible parts, and then evaluating the minimum number of steps to reconstruct the object along any pathway. The mathematical formalisation of this approach uses an assembly space. By finding the minimal number of steps contained in the route by which the objects can be assembled within that space, we can compare how much information (I) is gained from knowing this pathway assembly index (PA) according to I_PA=log (|N|)/(|N_PA |) where, for an end product with PA=x, N is the set of objects possible that can be created from the same irreducible parts within x steps regardless of PA, and NPA is the subset of those objects with the precise pathway assembly index PA=x. Applying this formalism to objects formed in 1D, 2D and 3D space allows us to identify objects in the world or wider Universe that have high assembly numbers. We propose that objects with PA greater than a threshold are important because these are uniquely identifiable as those that must have been produced by biological or technological processes, rather than the assembly occurring via unbiased random processes alone. We think this approach is needed to help identify the new physical and chemical laws needed to understand what life is, by quantifying what life does.
△ Less
Submitted 9 August, 2019; v1 submitted 6 July, 2019;
originally announced July 2019.
-
Automatic Reparameterisation of Probabilistic Programs
Authors:
Maria I. Gorinova,
Dave Moore,
Matthew D. Hoffman
Abstract:
Probabilistic programming has emerged as a powerful paradigm in statistics, applied science, and machine learning: by decoupling modelling from inference, it promises to allow modellers to directly reason about the processes generating data. However, the performance of inference algorithms can be dramatically affected by the parameterisation used to express a model, requiring users to transform th…
▽ More
Probabilistic programming has emerged as a powerful paradigm in statistics, applied science, and machine learning: by decoupling modelling from inference, it promises to allow modellers to directly reason about the processes generating data. However, the performance of inference algorithms can be dramatically affected by the parameterisation used to express a model, requiring users to transform their programs in non-intuitive ways. We argue for automating these transformations, and demonstrate that mechanisms available in recent modeling frameworks can implement non-centring and related reparameterisations. This enables new inference algorithms, and we propose two: a simple approach using interleaved sampling and a novel variational formulation that searches over a continuous space of parameterisations. We show that these approaches enable robust inference across a range of models, and can yield more efficient samplers than the best fixed parameterisation.
△ Less
Submitted 7 June, 2019;
originally announced June 2019.
-
Effect Handling for Composable Program Transformations in Edward2
Authors:
Dave Moore,
Maria I. Gorinova
Abstract:
Algebraic effects and handlers have emerged in the programming languages community as a convenient, modular abstraction for controlling computational effects. They have found several applications including concurrent programming, meta programming, and more recently, probabilistic programming, as part of Pyro's Poutines library. We investigate the use of effect handlers as a lightweight abstraction…
▽ More
Algebraic effects and handlers have emerged in the programming languages community as a convenient, modular abstraction for controlling computational effects. They have found several applications including concurrent programming, meta programming, and more recently, probabilistic programming, as part of Pyro's Poutines library. We investigate the use of effect handlers as a lightweight abstraction for implementing probabilistic programming languages (PPLs). We interpret the existing design of Edward2 as an accidental implementation of an effect-handling mechanism, and extend that design to support nested, composable transformations. We demonstrate that this enables straightforward implementation of sophisticated model transformations and inference algorithms.
△ Less
Submitted 14 November, 2018;
originally announced November 2018.
-
Simple, Distributed, and Accelerated Probabilistic Programming
Authors:
Dustin Tran,
Matthew Hoffman,
Dave Moore,
Christopher Suter,
Srinivas Vasudevan,
Alexey Radul,
Matthew Johnson,
Rif A. Saurous
Abstract:
We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction---the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-…
▽ More
We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction---the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.
△ Less
Submitted 28 November, 2018; v1 submitted 5 November, 2018;
originally announced November 2018.
-
A Deep Neural Network for Pixel-Level Electromagnetic Particle Identification in the MicroBooNE Liquid Argon Time Projection Chamber
Authors:
MicroBooNE collaboration,
C. Adams,
M. Alrashed,
R. An,
J. Anthony,
J. Asaadi,
A. Ashkenazi,
M. Auger,
S. Balasubramanian,
B. Baller,
C. Barnes,
G. Barr,
M. Bass,
F. Bay,
A. Bhat,
K. Bhattacharya,
M. Bishai,
A. Blake,
T. Bolton,
L. Camilleri,
D. Caratelli,
I. Caro Terrazas,
R. Carr,
R. Castillo Fernandez,
F. Cavanna
, et al. (148 additional authors not shown)
Abstract:
We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction cha…
▽ More
We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction chain for the MicroBooNE detector. We show the first demonstration of a network's validity on real LArTPC data using MicroBooNE collection plane images. The demonstration is performed for stopping muon and a $ν_μ$ charged current neutral pion data samples.
△ Less
Submitted 22 August, 2018;
originally announced August 2018.
-
Polarity and Intensity: the Two Aspects of Sentiment Analysis
Authors:
Leimin Tian,
Catherine Lai,
Johanna D. Moore
Abstract:
Current multimodal sentiment analysis frames sentiment score prediction as a general Machine Learning task. However, what the sentiment score actually represents has often been overlooked. As a measurement of opinions and affective states, a sentiment score generally consists of two aspects: polarity and intensity. We decompose sentiment scores into these two aspects and study how they are conveye…
▽ More
Current multimodal sentiment analysis frames sentiment score prediction as a general Machine Learning task. However, what the sentiment score actually represents has often been overlooked. As a measurement of opinions and affective states, a sentiment score generally consists of two aspects: polarity and intensity. We decompose sentiment scores into these two aspects and study how they are conveyed through individual modalities and combined multimodal models in a naturalistic monologue setting. In particular, we build unimodal and multimodal multi-task learning models with sentiment score prediction as the main task and polarity and/or intensity classification as the auxiliary tasks. Our experiments show that sentiment analysis benefits from multi-task learning, and individual modalities differ when conveying the polarity and intensity aspects of sentiment.
△ Less
Submitted 4 July, 2018;
originally announced July 2018.
-
PACER: Peripheral Activity Completion Estimation and Recognition
Authors:
Daniel Moore,
Alexander Dean
Abstract:
Embedded peripheral devices such as memories, sensors and communications interfaces are used to perform a function external to a host microcontroller. The device manufacturer typically specifies worst-case current consumption and latency estimates for each of these peripheral actions. Peripheral Activity Completion, Estimation and Recognition (PACER) is introduced as a suite of algorithms that can…
▽ More
Embedded peripheral devices such as memories, sensors and communications interfaces are used to perform a function external to a host microcontroller. The device manufacturer typically specifies worst-case current consumption and latency estimates for each of these peripheral actions. Peripheral Activity Completion, Estimation and Recognition (PACER) is introduced as a suite of algorithms that can be applied to detect completed peripheral operations in real-time. By detecting activity completion, PACER enables the host to exploit slack between the worst-case estimate and the actual response time. These methods were tested independently and in conjunction with IODVS on multiple common peripheral devices. For the peripheral devices under test, the test fixture confirmed decreases in energy expenditures of up to 80% and latency reductions of up to 67%.
△ Less
Submitted 14 January, 2018;
originally announced January 2018.
-
TensorFlow Distributions
Authors:
Joshua V. Dillon,
Ian Langmore,
Dustin Tran,
Eugene Brevdo,
Srinivas Vasudevan,
Dave Moore,
Brian Patton,
Alex Alemi,
Matt Hoffman,
Rif A. Saurous
Abstract:
The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors…
▽ More
The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.
△ Less
Submitted 28 November, 2017;
originally announced November 2017.
-
Meta-Learning MCMC Proposals
Authors:
Tongzhou Wang,
Yi Wu,
David A. Moore,
Stuart J. Russell
Abstract:
Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progresses in meta-learning for training learning agents that can generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide…
▽ More
Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progresses in meta-learning for training learning agents that can generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling.
△ Less
Submitted 1 January, 2019; v1 submitted 20 August, 2017;
originally announced August 2017.
-
Signal-based Bayesian Seismic Monitoring
Authors:
David A. Moore,
Stuart J. Russell
Abstract:
Detecting weak seismic events from noisy sensors is a difficult perceptual task. We formulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially distributed stations. Our system, SIGVISA, is the first to directly model seismic waveforms, allowing it to incorporate a rich representation of the physics underlying the signal gene…
▽ More
Detecting weak seismic events from noisy sensors is a difficult perceptual task. We formulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially distributed stations. Our system, SIGVISA, is the first to directly model seismic waveforms, allowing it to incorporate a rich representation of the physics underlying the signal generation process. We use Gaussian processes over wavelet parameters to predict detailed waveform fluctuations based on historical events, while degrading smoothly to simple parametric envelopes in regions with no historical seismicity. Evaluating on data from the western US, we recover three times as many events as previous work, and reduce mean location errors by a factor of four while greatly increasing sensitivity to low-magnitude events.
△ Less
Submitted 1 March, 2017;
originally announced March 2017.
-
Gaussian Process Random Fields
Authors:
David A. Moore,
Stuart J. Russell
Abstract:
Gaussian processes have been successful in both supervised and unsupervised machine learning tasks, but their computational complexity has constrained practical applications. We introduce a new approximation for large-scale Gaussian processes, the Gaussian Process Random Field (GPRF), in which local GPs are coupled via pairwise potentials. The GPRF likelihood is a simple, tractable, and paralleliz…
▽ More
Gaussian processes have been successful in both supervised and unsupervised machine learning tasks, but their computational complexity has constrained practical applications. We introduce a new approximation for large-scale Gaussian processes, the Gaussian Process Random Field (GPRF), in which local GPs are coupled via pairwise potentials. The GPRF likelihood is a simple, tractable, and parallelizeable approximation to the full GP marginal likelihood, enabling latent variable modeling and hyperparameter selection on large datasets. We demonstrate its effectiveness on synthetic spatial data as well as a real-world application to seismic event location.
△ Less
Submitted 30 October, 2015;
originally announced November 2015.
-
A nested transaction mechanism for LOCUS
Authors:
Erik T. Mueller,
Johanna D. Moore,
Gerald J. Popek
Abstract:
A working implementation of nested transactions has been produced for LOCUS, an integrated distributed operating system which provides a high degree of network transparency. Several aspects of our mechanism are novel. First, the mechanism allows a transaction to access objects directly without regard to the location of the object. Second, processes running on behalf of a single transaction may b…
▽ More
A working implementation of nested transactions has been produced for LOCUS, an integrated distributed operating system which provides a high degree of network transparency. Several aspects of our mechanism are novel. First, the mechanism allows a transaction to access objects directly without regard to the location of the object. Second, processes running on behalf of a single transaction may be located at many sites. Thus there is no need to invoke a new transaction to perform processing or access objects at a remote site. Third, unlike other environments, LOCUS allows replication of data objects at more than one site in the network, and this capability is incorporated into the transaction mechanism. If the copy of an object that is currently being accessed becomes unavailable, it is possible to continue work by using another one of the replicated copies. Finally, an efficient orphan removal algorithm is presented, and the problem of providing continued operation during network partitions is addressed in detail.
△ Less
Submitted 10 December, 1998;
originally announced December 1998.
-
An Empirical Investigation of Proposals in Collaborative Dialogues
Authors:
Barbara Di Eugenio,
Pamela W. Jordan,
Johanna D. Moore,
Richmond H. Thomason
Abstract:
We describe a corpus-based investigation of proposals in dialogue. First, we describe our DRI compliant coding scheme and report our inter-coder reliability results. Next, we test several hypotheses about what constitutes a well-formed proposal.
We describe a corpus-based investigation of proposals in dialogue. First, we describe our DRI compliant coding scheme and report our inter-coder reliability results. Next, we test several hypotheses about what constitutes a well-formed proposal.
△ Less
Submitted 25 June, 1998;
originally announced June 1998.
-
Learning Features that Predict Cue Usage
Authors:
Barbara Di Eugenio,
Johanna D. Moore,
Massimo Paolucci
Abstract:
Our goal is to identify the features that predict the occurrence and placement of discourse cues in tutorial explanations in order to aid in the automatic generation of explanations. Previous attempts to devise rules for text generation were based on intuition or small numbers of constructed examples. We apply a machine learning program, C4.5, to induce decision trees for cue occurrence and plac…
▽ More
Our goal is to identify the features that predict the occurrence and placement of discourse cues in tutorial explanations in order to aid in the automatic generation of explanations. Previous attempts to devise rules for text generation were based on intuition or small numbers of constructed examples. We apply a machine learning program, C4.5, to induce decision trees for cue occurrence and placement from a corpus of data coded for a variety of features previously thought to affect cue usage. Our experiments enable us to identify the features with most predictive power, and show that machine learning can be used to induce decision trees useful for text generation.
△ Less
Submitted 21 October, 1997;
originally announced October 1997.
-
DPOCL: A Principled Approach to Discourse Planning
Authors:
R. Michael Young,
Johanna D. Moore
Abstract:
Research in discourse processing has identified two representational requirements for discourse planning systems. First, discourse plans must adequately represent the intentional structure of the utterances they produce in order to enable a computational discourse agent to respond effectively to communicative failures \cite{MooreParisCL}. Second, discourse plans must represent the informational…
▽ More
Research in discourse processing has identified two representational requirements for discourse planning systems. First, discourse plans must adequately represent the intentional structure of the utterances they produce in order to enable a computational discourse agent to respond effectively to communicative failures \cite{MooreParisCL}. Second, discourse plans must represent the informational structure of utterances. In addition to these representational requirements, we argue that discourse planners should be formally characterizable in terms of soundness and completeness.
△ Less
Submitted 10 June, 1994;
originally announced June 1994.
-
Towards a Principled Representation of Discourse Plans
Authors:
R. Michael Young,
Johanna D. Moore,
Martha E. Pollack
Abstract:
We argue that discourse plans must capture the intended causal and decompositional relations between communicative actions. We present a planning algorithm, DPOCL, that builds plan structures that properly capture these relations, and show how these structures are used to solve the problems that plagued previous discourse planners, and allow a system to participate effectively and flexibly in an…
▽ More
We argue that discourse plans must capture the intended causal and decompositional relations between communicative actions. We present a planning algorithm, DPOCL, that builds plan structures that properly capture these relations, and show how these structures are used to solve the problems that plagued previous discourse planners, and allow a system to participate effectively and flexibly in an ongoing dialogue.
△ Less
Submitted 1 June, 1994;
originally announced June 1994.