Search | arXiv e-print repository

Safe and Reliable Training of Learning-Based Aerospace Controllers

Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto Ravaioli, Baoluo Meng, Michael Durling, Kerianne Hobbs, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

Abstract: In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of… ▽ More In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of DRL controllers, which can help ensure their safe behavior. We showcase a design-for-verification approach utilizing k-induction and demonstrate its use in verifying liveness properties. In addition, we also give a brief overview of neural Lyapunov Barrier certificates and summarize their capabilities on a case study. Finally, we describe several other novel reachability-based approaches which, despite failing to provide guarantees of interest, could be effective for verification of other DRL systems, and could be of further interest to the community. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: 10 pages, 3 figures

arXiv:2406.18912 [pdf, ps, other]

The nonexistence of unicorns and many-sorted Löwenheim-Skolem theorems

Authors: Benjamin Przybocki, Guilherme Toledo, Yoni Zohar, Clark Barrett

Abstract: Stable infiniteness, strong finite witnessability, and smoothness are model-theoretic properties relevant to theory combination in satisfiability modulo theories. Theories that are strongly finitely witnessable and smooth are called strongly polite and can be effectively combined with other theories. Toledo, Zohar, and Barrett conjectured that stably infinite and strongly finitely witnessable theo… ▽ More Stable infiniteness, strong finite witnessability, and smoothness are model-theoretic properties relevant to theory combination in satisfiability modulo theories. Theories that are strongly finitely witnessable and smooth are called strongly polite and can be effectively combined with other theories. Toledo, Zohar, and Barrett conjectured that stably infinite and strongly finitely witnessable theories are smooth and therefore strongly polite. They called counterexamples to this conjecture unicorn theories, as their existence seemed unlikely. We prove that, indeed, unicorns do not exist. We also prove versions of the Löwenheim-Skolem theorem and the Łoś-Vaught test for many-sorted logic. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: To appear in FM24

arXiv:2406.15882 [pdf, ps, other]

Equivalence Hypergraphs: E-Graphs for Monoidal Theories

Authors: Dan R. Ghica, Chris Barrett, Aleksei Tiurin

Abstract: The technique of equipping graphs with an equivalence relation, called equality saturation, has recently proved both powerful and practical in program optimisation, particularly for satisfiability modulo theory solvers. We give a categorical semantics to these structures, called e-graphs, in terms of Cartesian categories enriched over a semilattice. We show how this semantics can be generalised to… ▽ More The technique of equipping graphs with an equivalence relation, called equality saturation, has recently proved both powerful and practical in program optimisation, particularly for satisfiability modulo theory solvers. We give a categorical semantics to these structures, called e-graphs, in terms of Cartesian categories enriched over a semilattice. We show how this semantics can be generalised to monoidal categories, which opens the door to new applications of e-graph techniques, from algebraic to monoidal theories. Finally, we present a sound and complete combinatorial representation of morphisms in such a category, based on a generalisation of hypergraphs which we call e-hypergraphs. They have the usual advantage that many of their structural equations are absorbed into a general notion of isomorphism. △ Less

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2405.14058 [pdf, other]

Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates

Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto J. Ravaioli, Baoluo Meng, Michael Durling, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

Abstract: Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the ``black box'' nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functio… ▽ More Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the ``black box'' nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functions over the system whose properties indirectly imply that an agent behaves as desired. However, NLB-based certificates are typically difficult to learn and even more difficult to verify, especially for complex systems. In this work, we present a novel method for training and verifying NLB-based certificates for discrete-time systems. Specifically, we introduce a technique for certificate composition, which simplifies the verification of highly-complex systems by strategically designing a sequence of certificates. When jointly verified with neural network verification engines, these certificates provide a formal guarantee that a DRL agent both achieves its goals and avoids unsafe behavior. Furthermore, we introduce a technique for certificate filtering, which significantly simplifies the process of producing formally verified certificates. We demonstrate the merits of our approach with a case study on providing safety and liveness guarantees for a DRL-controlled spacecraft. △ Less

Submitted 14 August, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: To appear in FMCAD 2024

arXiv:2405.10801 [pdf, ps, other]

doi 10.1145/3661814.3662091

The Relational Machine Calculus

Authors: Chris Barrett, Daniel Castle, Willem Heijltjes

Abstract: This paper presents the Relational Machine Calculus (RMC): a simple, foundational model of first-order relational programming. The RMC originates from the Functional Machine Calculus (FMC), which generalizes the lambda-calculus and its standard call-by-name stack machine in two directions. One, "locations", introduces multiple stacks, which enable effect operators to be encoded into the abstractio… ▽ More This paper presents the Relational Machine Calculus (RMC): a simple, foundational model of first-order relational programming. The RMC originates from the Functional Machine Calculus (FMC), which generalizes the lambda-calculus and its standard call-by-name stack machine in two directions. One, "locations", introduces multiple stacks, which enable effect operators to be encoded into the abstraction and application constructs. The second, "sequencing", introduces the imperative notions of "skip" and "sequence", similar to kappa-calculus and concatenative programming languages. The key observation of the RMC is that the first-order fragment of the FMC exhibits a latent duality which, given a simple decomposition of the relevant constructors, can be concretely expressed as an involution on syntax. Semantically, this gives rise to a sound and complete calculus for string diagrams of Frobenius monoids. We consider unification as the corresponding symmetric generalization of beta-reduction. By further including standard operators of Kleene algebra, the RMC embeds a range of computational models: the kappa-calculus, logic programming, automata, Interaction Nets, and Petri Nets, among others. These embeddings preserve operational semantics, which for the RMC is again given by a generalization of the standard stack machine for the lambda-calculus. The equational theory of the RMC (which supports reasoning about its operational semantics) is conservative over both the first-order lambda-calculus and Kleene algebra, and can be oriented to give a confluent reduction relation. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: LICS paper, 15 pages excluding references

MSC Class: 03B70 ACM Class: D.3.1; D.3.2; F.4.1; F.1.1; I.2.3

arXiv:2405.06624 [pdf, other]

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

Abstract: Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these appro… ▽ More Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches. △ Less

Submitted 8 July, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06127 [pdf, other]

Efficiently Synthesizing Lowest Cost Rewrite Rules for Instruction Selection

Authors: Ross Daly, Caleb Donovick, Caleb Terrill, Jackson Melchert, Priyanka Raina, Clark Barrett, Pat Hanrahan

Abstract: Compiling programs to an instruction set architecture (ISA) requires a set of rewrite rules that map patterns consisting of compiler instructions to patterns consisting of ISA instructions. We synthesize such rules by constructing SMT queries, whose solutions represent two functionally equivalent programs. These two programs are interpreted as an instruction selection rewrite rule. Existing work i… ▽ More Compiling programs to an instruction set architecture (ISA) requires a set of rewrite rules that map patterns consisting of compiler instructions to patterns consisting of ISA instructions. We synthesize such rules by constructing SMT queries, whose solutions represent two functionally equivalent programs. These two programs are interpreted as an instruction selection rewrite rule. Existing work is limited to single-instruction ISA patterns, whereas our solution does not have that restriction. Furthermore, we address inefficiencies of existing work by developing two optimized algorithms. The first only generates unique rules by preventing synthesis of duplicate and composite rules. The second only generates lowest-cost rules by preventing synthesis of higher-cost rules. We evaluate our algorithms on multiple ISAs. Without our optimizations, the vast majority of synthesized rewrite rules are either duplicates, composites, or higher cost. Our optimizations result in synthesis speed-ups of up to 768x and 4004x for the two algorithms. △ Less

Submitted 17 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 9 pages, 2 figures, in submission to FMCAD24, fixed small bugs in results generation, improved baseline comparison, improved background and writing clarity

arXiv:2405.03057 [pdf, ps, other]

Verifying SQL Queries using Theories of Tables and Relations

Authors: Mudathir Mohamed, Andrew Reynolds, Cesare Tinelli, Clark Barrett

Abstract: We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite r… ▽ More We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite relations. Furthermore, we add the ability to reason about tables with null values by introducing a theory of nullable sorts based on an extension of the theory of algebraic datatypes. We implemented solvers for these theories in the SMT solver cvc5 and evaluated them on a set of benchmarks derived from public sets of SQL equivalence problems. △ Less

Submitted 27 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.18988 [pdf, other]

Markovian Agents for Informative Language Modeling

Authors: Scott Viteri, Max Lamparth, Peter Chatain, Clark Barrett

Abstract: Chain-of-Thought (CoT) reasoning could in principle enable a deeper understanding of a language model's (LM) internal reasoning. However, prior work suggests that LMs can answer questions similarly despite changes in their CoT, suggesting that those models are not truly using the CoT. We propose an reinforcement learning technique to produce CoTs that are sufficient alone for predicting future tex… ▽ More Chain-of-Thought (CoT) reasoning could in principle enable a deeper understanding of a language model's (LM) internal reasoning. However, prior work suggests that LMs can answer questions similarly despite changes in their CoT, suggesting that those models are not truly using the CoT. We propose an reinforcement learning technique to produce CoTs that are sufficient alone for predicting future text, independent of other context. This methodology ensures that if the LM can predict future tokens, then it must have used the CoT to understand its context. We formalize the informativeness of a sender to a receiver LM as the degree to which the sender helps the receiver predict their future observations, and we define a "Markovian" LM as one which predicts future text given only a CoT as context. We derive a "Markovian training" procedure by applying our definition of informativeness to a Markovian LM and optimizing via policy gradient and Proximal Policy Optimization (PPO). We demonstrate our training algorithm's effectiveness on fifteen-term arithmetic problems, show the model utilizes the CoT, and externally validate that the generated CoT is meaningful and usable by another model. △ Less

Submitted 22 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: 21 pages, 8 figures

arXiv:2404.16122 [pdf, ps, other]

Generalized Optimization Modulo Theories

Authors: Nestan Tsiskaridze, Clark Barrett, Cesare Tinelli

Abstract: Optimization Modulo Theories (OMT) has emerged as an important extension of the highly successful Satisfiability Modulo Theories (SMT) paradigm. The OMT problem requires solving an SMT problem with the restriction that the solution must be optimal with respect to a given objective function. We introduce a generalization of the OMT problem where, in particular, objective functions can range over pa… ▽ More Optimization Modulo Theories (OMT) has emerged as an important extension of the highly successful Satisfiability Modulo Theories (SMT) paradigm. The OMT problem requires solving an SMT problem with the restriction that the solution must be optimal with respect to a given objective function. We introduce a generalization of the OMT problem where, in particular, objective functions can range over partially ordered sets. We provide a formalization of and an abstract calculus for the generalized OMT problem and prove their key correctness properties. Generalized OMT extends previous work on OMT in several ways. First, in contrast to many current OMT solvers, our calculus is theory-agnostic, enabling the optimization of queries over any theories or combinations thereof. Second, our formalization unifies both single- and multi-objective optimization problems, allowing us to study them both in a single framework and facilitating the use of objective functions that are not supported by existing OMT approaches. Finally, our calculus is sufficiently general to fully capture a wide variety of current OMT approaches (each of which can be realized as a specific strategy for rule application in the calculus) and to support the exploration of new search strategies. Much like the original abstract DPLL(T) calculus for SMT, our Generalized OMT calculus is designed to establish a theoretical foundation for understanding and research and to serve as a framework for studying variations of and extensions to existing OMT methodologies. △ Less

Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2401.14461 [pdf, other]

Marabou 2.0: A Versatile Formal Analyzer of Neural Networks

Authors: Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, Matthew Daggitt, Wen Kokke, Idan Refaeli, Guy Amir, Kyle Julian, Shahaf Bassan, Pei Huang, Ori Lahav, Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark Barrett

Abstract: This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release. This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release. △ Less

Submitted 20 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Condensed version accepted at CAV'24

arXiv:2401.04827 [pdf]

The site linkage spectrum of data arrays

Authors: Christopher Barrett, Andrei Bura, Fenix Huang, Christian Reidys

Abstract: A new perspective is introduced regarding the analysis of Multiple Sequence Alignments (MSA), representing aligned data defined over a finite alphabet of symbols. The framework is designed to produce a block decomposition of an MSA, where each block is comprised of sequences exhibiting a certain site-coherence. The key component of this framework is an information theoretical potential defined on… ▽ More A new perspective is introduced regarding the analysis of Multiple Sequence Alignments (MSA), representing aligned data defined over a finite alphabet of symbols. The framework is designed to produce a block decomposition of an MSA, where each block is comprised of sequences exhibiting a certain site-coherence. The key component of this framework is an information theoretical potential defined on pairs of sites (links) within the MSA. This potential quantifies the expected drop in variation of information between the two constituent sites, where the expectation is taken with respect to all possible sub-alignments, obtained by removing a finite, fixed collection of rows. It is proved that the potential is zero for linked sites representing columns, whose symbols are in bijective correspondence and it is strictly positive, otherwise. It is furthermore shown that the potential assumes its unique minimum for links at which each symbol pair appears with the same multiplicity. Finally, an application is presented regarding anomaly detection in an MSA, composed of inverse fold solutions of a fixed tRNA secondary structure, where the anomalies are represented by inverse fold solutions of a different RNA structure. △ Less

Submitted 9 January, 2024; originally announced January 2024.

Comments: 15 pages, 2 figures

MSC Class: 94D99 (Primary)

arXiv:2312.12679 [pdf, other]

Towards Efficient Verification of Quantized Neural Networks

Authors: Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett

Abstract: Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying properties of quantized neural networks. Our baseline technique is based on integer linear programming which guarantees both soundness and completeness. We then sho… ▽ More Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying properties of quantized neural networks. Our baseline technique is based on integer linear programming which guarantees both soundness and completeness. We then show how efficiency can be improved by utilizing gradient-based heuristic search methods and also bound-propagation techniques. We evaluate our approach on perception networks quantized with PyTorch. Our results show that we can verify quantized networks with better scalability and efficiency than the previous state of the art. △ Less

Submitted 27 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: This paper has been accepted by AAAI2024

arXiv:2312.07104 [pdf, other]

SGLang: Efficient Execution of Structured Language Model Programs

Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

Abstract: Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and executing these applications. We introduce SGLang, a system for efficient execution of complex language model programs. SGLang consists of a frontend langua… ▽ More Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and executing these applications. We introduce SGLang, a system for efficient execution of complex language model programs. SGLang consists of a frontend language and a runtime. The frontend simplifies programming with primitives for generation and parallelism control. The runtime accelerates execution with novel optimizations like RadixAttention for KV cache reuse and compressed finite state machines for faster structured output decoding. Experiments show that SGLang achieves up to 6.4x higher throughput compared to state-of-the-art inference systems on various large language and multi-modal models on tasks including agent control, logical reasoning, few-shot learning benchmarks, JSON decoding, retrieval-augmented generation pipelines, and multi-turn chat. The code is publicly available at https://github.com/sgl-project/sglang △ Less

Submitted 5 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2310.17807 [pdf, other]

Clover: Closed-Loop Verifiable Code Generation

Authors: Chuyue Sun, Ying Sheng, Oded Padon, Clark Barrett

Abstract: The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces… ▽ More The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results show that for this dataset, (i) LLMs are reasonably successful at automatically generating formal specifications; and (ii) our consistency checker achieves a promising acceptance rate (up to 87%) for correct instances while maintaining zero tolerance for incorrect ones (no false positives). △ Less

Submitted 3 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.04870 [pdf, other]

Lemur: Integrating Large Language Models in Automated Program Verification

Authors: Haoze Wu, Clark Barrett, Nina Narodytska

Abstract: The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe… ▽ More The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks. △ Less

Submitted 24 April, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted at ICLR'24

arXiv:2308.14840 [pdf, other]

doi 10.1561/3300000041

Identifying and Mitigating the Security Risks of Generative AI

Authors: Clark Barrett, Brad Boyd, Elie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang

Abstract: Every major technical invention resurfaces the dual-use dilemma -- the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code-completion, and text-to-image generation and editing). However, GenAI can be used just as well… ▽ More Every major technical invention resurfaces the dual-use dilemma -- the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code-completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks. This paper reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. This paper is not meant to be comprehensive, but is rather an attempt to synthesize some of the interesting findings from the workshop. We discuss short-term and long-term goals for the community on this topic. We hope this paper provides both a launching point for a discussion on this important topic as well as interesting problems that the research community can work to address. △ Less

Submitted 28 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Journal ref: Foundations and Trends in Privacy and Security 6 (2023) 1-52

arXiv:2308.13106 [pdf, other]

PEak: A Single Source of Truth for Hardware Design and Verification

Authors: Caleb Donovick, Ross Daly, Jackson Melchert, Lenny Truong, Priyanka Raina, Pat Hanrahan, Clark Barrett

Abstract: Domain-specific languages for hardware can significantly enhance designer productivity, but sometimes at the cost of ease of verification. On the other hand, ISA specification languages are too static to be used during early stage design space exploration. We present PEak, an open-source hardware design and specification language, which aims to improve both design productivity and verification cap… ▽ More Domain-specific languages for hardware can significantly enhance designer productivity, but sometimes at the cost of ease of verification. On the other hand, ISA specification languages are too static to be used during early stage design space exploration. We present PEak, an open-source hardware design and specification language, which aims to improve both design productivity and verification capability. PEak does this by providing a single source of truth for functional models, formal specifications, and RTL. PEak has been used in several academic projects, and PEak-generated RTL has been included in three fabricated hardware accelerators. In these projects, the formal capabilities of PEak were crucial for enabling both novel design space exploration techniques and automated compiler synthesis. △ Less

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2307.07885 [pdf, ps, other]

Combining Finite Combination Properties: Finite Models and Busy Beavers

Authors: Guilherme Toledo, Yoni Zohar, Clark Barrett

Abstract: This work is a part of an ongoing effort to understand the relationships between properties used in theory combination. We here focus on including two properties that are related to shiny theories: the finite model property and stable finiteness. For any combination of properties, we consider the question of whether there exists a theory that exhibits it. When there is, we provide an example with… ▽ More This work is a part of an ongoing effort to understand the relationships between properties used in theory combination. We here focus on including two properties that are related to shiny theories: the finite model property and stable finiteness. For any combination of properties, we consider the question of whether there exists a theory that exhibits it. When there is, we provide an example with the simplest possible signature. One particular class of interest includes theories with the finite model property that are not finitely witnessable. To construct such theories, we utilize the Busy Beaver function. △ Less

Submitted 15 July, 2023; originally announced July 2023.

arXiv:2306.14048 [pdf, other]

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Authors: Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

Abstract: Large Language Models (LLMs), despite their recent impressive accomplishments, are notably cost-prohibitive to deploy, particularly for applications involving long-content generation, such as dialogue systems and story writing. Often, a large amount of transient state information, referred to as the KV cache, is stored in GPU memory in addition to model parameters, scaling linearly with the sequen… ▽ More Large Language Models (LLMs), despite their recent impressive accomplishments, are notably cost-prohibitive to deploy, particularly for applications involving long-content generation, such as dialogue systems and story writing. Often, a large amount of transient state information, referred to as the KV cache, is stored in GPU memory in addition to model parameters, scaling linearly with the sequence length and batch size. In this paper, we introduce a novel approach for implementing the KV cache which significantly reduces its memory footprint. Our approach is based on the noteworthy observation that a small portion of tokens contributes most of the value when computing attention scores. We call these tokens Heavy Hitters (H$_2$). Through a comprehensive investigation, we find that (i) the emergence of H$_2$ is natural and strongly correlates with the frequent co-occurrence of tokens in the text, and (ii) removing them results in significant performance degradation. Based on these insights, we propose Heavy Hitter Oracle (H$_2$O), a KV cache eviction policy that dynamically retains a balance of recent and H$_2$ tokens. We formulate the KV cache eviction as a dynamic submodular problem and prove (under mild assumptions) a theoretical guarantee for our novel eviction algorithm which could help guide future work. We validate the accuracy of our algorithm with OPT, LLaMA, and GPT-NeoX across a wide range of tasks. Our implementation of H$_2$O with 20% heavy hitters improves the throughput over three leading inference systems DeepSpeed Zero-Inference, Hugging Face Accelerate, and FlexGen by up to 29$\times$, 29$\times$, and 3$\times$ on OPT-6.7B and OPT-30B. With the same batch size, H2O can reduce the latency by up to 1.9$\times$. The code is available at https://github.com/FMInference/H2O. △ Less

Submitted 18 December, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.05854 [pdf, other]

Partitioning Strategies for Distributed SMT Solving

Authors: Amalee Wilson, Andres Noetzli, Andrew Reynolds, Byron Cook, Cesare Tinelli, Clark Barrett

Abstract: For many users of Satisfiability Modulo Theories (SMT) solvers, the solver's performance is the main bottleneck in their application. One promising approach for improving performance is to leverage the increasing availability of parallel and cloud computing. However, despite many efforts, the best parallel approach to date consists of running a portfolio of solvers, meaning that performance is sti… ▽ More For many users of Satisfiability Modulo Theories (SMT) solvers, the solver's performance is the main bottleneck in their application. One promising approach for improving performance is to leverage the increasing availability of parallel and cloud computing. However, despite many efforts, the best parallel approach to date consists of running a portfolio of solvers, meaning that performance is still limited by the best possible sequential performance. In this paper, we revisit divide-and-conquer approaches to parallel SMT, in which a challenging problem is partitioned into several subproblems. We introduce several new partitioning strategies and evaluate their performance, both alone as well as within portfolios, on a large set of difficult SMT benchmarks. We show that hybrid portfolios that include our new strategies can significantly outperform traditional portfolios for parallel SMT. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: Submitted to FMCAD 2023

arXiv:2306.02003 [pdf, other]

On Optimal Caching and Model Multiplexing for Large Model Inference

Authors: Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Abstract: Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to… ▽ More Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing. Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings. By combining a caching algorithm, namely Greedy Dual Size with Frequency (GDSF) or Least Expected Cost (LEC), with a model multiplexer, we achieve optimal rates in both offline and online settings. Empirically, simulations show that the combination of our caching and model multiplexing algorithms greatly improves over the baselines, with up to $50\times$ improvement over the baseline when the ratio between the maximum cost and minimum cost is $100$. Experiments on real datasets show a $4.3\times$ improvement in FLOPs over the baseline when the ratio for FLOPs is $10$, and a $1.8\times$ improvement in latency when the ratio for average latency is $1.85$. △ Less

Submitted 28 August, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

arXiv:2305.16073 [pdf, ps, other]

On the Simply-Typed Functional Machine Calculus: Categorical Semantics and Strong Normalisation

Authors: Chris Barrett

Abstract: The Functional Machine Calculus (FMC) was recently introduced as a generalization of the lambda-calculus to include higher-order global state, probabilistic and non-deterministic choice, and input and output, while retaining confluence. The calculus can encode both the call-by-name and call-by-value semantics of these effects. This is enabled by two independent generalisations, both natural from t… ▽ More The Functional Machine Calculus (FMC) was recently introduced as a generalization of the lambda-calculus to include higher-order global state, probabilistic and non-deterministic choice, and input and output, while retaining confluence. The calculus can encode both the call-by-name and call-by-value semantics of these effects. This is enabled by two independent generalisations, both natural from the perspective of the FMC's operational semantics, which is given by a simple multi-stack machine. The first generalization decomposes the syntax of the lambda-calculus in a way that allows for sequential composition of terms and the encoding of reduction strategies. Specifically, there exist translations of the call-by-name and call-by-value lambda-calculus which preserve operational semantics. The second parameterizes application and abstraction in terms of 'locations' (corresponding to the multiple stacks of the machine), which gives a unification of the operational semantics, syntax, and reduction rules of the given effects with those of the lambda-calculus. The FMC further comes equipped with a simple type system which restricts and captures the behaviour of effects. This thesis makes two main contributions, showing that two fundamental properties of the lambda-calculus are preserved by the FMC. The first is to show that the categorical semantics of the FMC, modulo an appropriate equational theory, is given by the free Cartesian closed category. The equational theory is validated by a notion of observational equivalence. The second contribution is a proof that typed FMC-terms are strongly normalising. This is an extension (and small simplification) of Gandy's proof for the lambda-calculus, which additionally emphasizes its latent operational intuition. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: Ph.D. thesis; 158 pages; 11 figures

ACM Class: F.1.1; F.3.2

arXiv:2305.11087 [pdf, other]

Lightweight Online Learning for Sets of Related Problems in Automated Reasoning

Authors: Haoze Wu, Christopher Hahn, Florian Lonsing, Makai Mann, Raghuram Ramanujan, Clark Barrett

Abstract: We present Self-Driven Strategy Learning ($\textit{sdsl}$), a lightweight online learning methodology for automated reasoning tasks that involve solving a set of related problems. $\textit{sdsl}$ does not require offline training, but instead automatically constructs a dataset while solving earlier problems. It fits a machine learning model to this data which is then used to adjust the solving str… ▽ More We present Self-Driven Strategy Learning ($\textit{sdsl}$), a lightweight online learning methodology for automated reasoning tasks that involve solving a set of related problems. $\textit{sdsl}$ does not require offline training, but instead automatically constructs a dataset while solving earlier problems. It fits a machine learning model to this data which is then used to adjust the solving strategy for later problems. We formally define the approach as a set of abstract transition rules. We describe a concrete instance of the sdsl calculus which uses conditional sampling for generating data and random forests as the underlying machine learning model. We implement the approach on top of the Kissat solver and show that the combination of Kissat+$\textit{sdsl}$ certifies larger bounds and finds more counter-examples than other state-of-the-art bounded model checking approaches on benchmarks obtained from the latest Hardware Model Checking Competition. △ Less

Submitted 15 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Condensed version published at FMCAD'23

arXiv:2305.06064 [pdf, other]

DNN Verification, Reachability, and the Exponential Function Problem

Authors: Omri Isac, Yoni Zohar, Clark Barrett, Guy Katz

Abstract: Deep neural networks (DNNs) are increasingly being deployed to perform safety-critical tasks. The opacity of DNNs, which prevents humans from reasoning about them, presents new safety and security challenges. To address these challenges, the verification community has begun developing techniques for rigorously analyzing DNNs, with numerous verification algorithms proposed in recent years. While a… ▽ More Deep neural networks (DNNs) are increasingly being deployed to perform safety-critical tasks. The opacity of DNNs, which prevents humans from reasoning about them, presents new safety and security challenges. To address these challenges, the verification community has begun developing techniques for rigorously analyzing DNNs, with numerous verification algorithms proposed in recent years. While a significant amount of work has gone into developing these verification algorithms, little work has been devoted to rigorously studying the computability and complexity of the underlying theoretical problems. Here, we seek to contribute to the bridging of this gap. We focus on two kinds of DNNs: those that employ piecewise-linear activation functions (e.g., ReLU), and those that employ piecewise-smooth activation functions (e.g., Sigmoids). We prove the two following theorems: 1) The decidability of verifying DNNs with a particular set of piecewise-smooth activation functions is equivalent to a well-known, open problem formulated by Tarski; and 2) The DNN verification problem for any quantifier-free linear arithmetic specification can be reduced to the DNN reachability problem, whose approximation is NP-complete. These results answer two fundamental questions about the computability and complexity of DNN verification, and the ways it is affected by the network's activation functions and error tolerance; and could help guide future efforts in developing DNN verification tools. △ Less

Submitted 10 July, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: This is a preprint version of the paper that appears at CONCUR 2023

arXiv:2305.02384 [pdf, ps, other]

Combining Combination Properties: An Analysis of Stable Infiniteness, Convexity, and Politeness

Authors: Guilherme Vicentin de Toledo, Yoni Zohar, Clark Barrett

Abstract: We make two contributions to the study of theory combination in satisfiability modulo theories. The first is a table of examples for the combinations of the most common model-theoretic properties in theory combination, namely stable infiniteness, smoothness, convexity, finite witnessability, and strong finite witnessability (and therefore politeness and strong politeness as well). All of our examp… ▽ More We make two contributions to the study of theory combination in satisfiability modulo theories. The first is a table of examples for the combinations of the most common model-theoretic properties in theory combination, namely stable infiniteness, smoothness, convexity, finite witnessability, and strong finite witnessability (and therefore politeness and strong politeness as well). All of our examples are sharp, in the sense that we also offer proofs that no theories are available within simpler signatures. This table significantly progresses the current understanding of the various properties and their interactions. The most remarkable example in this table is of a theory over a single sort that is polite but not strongly polite (the existence of such a theory was only known until now for two-sorted signatures). The second contribution is a new combination theorem showing that in order to apply polite theory combination, it is sufficient for one theory to be stably infinite and strongly finitely witnessable, thus showing that smoothness is not a critical property in this combination method. This result has the potential to greatly simplify the process of showing which theories can be used in polite combination, as showing stable infiniteness is considerably simpler than showing smoothness. △ Less

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2303.13697 [pdf, other]

Soy: An Efficient MILP Solver for Piecewise-Affine Systems

Authors: Haoze Wu, Min Wu, Dorsa Sadigh, Clark Barrett

Abstract: Piecewise-affine (PWA) systems are widely used for modeling and control of robotics problems including modeling contact dynamics. A common approach is to encode the control problem of the PWA system as a Mixed-Integer Convex Program (MICP), which can be solved by general-purpose off-the-shelf MICP solvers. To mitigate the scalability challenge of solving these MICP problems, existing work focuses… ▽ More Piecewise-affine (PWA) systems are widely used for modeling and control of robotics problems including modeling contact dynamics. A common approach is to encode the control problem of the PWA system as a Mixed-Integer Convex Program (MICP), which can be solved by general-purpose off-the-shelf MICP solvers. To mitigate the scalability challenge of solving these MICP problems, existing work focuses on devising efficient and strong formulations of the problems, while less effort has been spent on exploiting their specific structure to develop specialized solvers. The latter is the theme of our work. We focus on efficiently handling one-hot constraints, which are particularly relevant when encoding PWA dynamics. We have implemented our techniques in a tool, Soy, which organically integrates logical reasoning, arithmetic reasoning, and stochastic local search. For a set of PWA control benchmarks, Soy solves more problems, faster, than two state-of-the-art MICP solvers. △ Less

Submitted 15 August, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Same version published at IROS'2023

arXiv:2303.06865 [pdf, other]

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Authors: Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

Abstract: The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generat… ▽ More The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generation engine for running LLMs with limited GPU memory. FlexGen can be flexibly configured under various hardware resource constraints by aggregating memory and computation from the GPU, CPU, and disk. By solving a linear programming problem, it searches for efficient patterns to store and access tensors. FlexGen further compresses the weights and the attention cache to 4 bits with negligible accuracy loss. These techniques enable FlexGen to have a larger space of batch size choices and thus significantly increase maximum throughput. As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144. On the HELM benchmark, FlexGen can benchmark a 30B model with a 16GB GPU on 7 representative sub-scenarios in 21 hours. The code is available at https://github.com/FMInference/FlexGen △ Less

Submitted 12 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

arXiv:2303.01713 [pdf, other]

Convex Bounds on the Softmax Function with Applications to Robustness Verification

Authors: Dennis Wei, Haoze Wu, Min Wu, Pin-Yu Chen, Clark Barrett, Eitan Farchi

Abstract: The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well. This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models. We derive bounds using both a natural exponential-reciprocal decom… ▽ More The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well. This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models. We derive bounds using both a natural exponential-reciprocal decomposition of the softmax as well as an alternative decomposition in terms of the log-sum-exp function. The new bounds are provably and/or numerically tighter than linear bounds obtained in previous work on robustness verification of transformers. As illustrations of the utility of the bounds, we apply them to verification of transformers as well as of the robustness of predictive uncertainty estimates of deep ensembles. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: AISTATS 2023

arXiv:2212.01051 [pdf, other]

VeriX: Towards Verified Explainability of Deep Neural Networks

Authors: Min Wu, Haoze Wu, Clark Barrett

Abstract: We present VeriX (Verified eXplainability), a system for producing optimal robust explanations and generating counterfactuals along decision boundaries of machine learning models. We build such explanations and counterfactuals iteratively using constraint solving techniques and a heuristic based on feature-level sensitivity ranking. We evaluate our method on image recognition benchmarks and a real… ▽ More We present VeriX (Verified eXplainability), a system for producing optimal robust explanations and generating counterfactuals along decision boundaries of machine learning models. We build such explanations and counterfactuals iteratively using constraint solving techniques and a heuristic based on feature-level sensitivity ranking. We evaluate our method on image recognition benchmarks and a real-world scenario of autonomous aircraft taxiing. △ Less

Submitted 25 September, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

arXiv:2211.13140 [pdf, other]

The Functional Machine Calculus II: Semantics

Authors: Chris Barrett, Willem Heijltjes, Guy McCusker

Abstract: The Functional Machine Calculus (FMC), recently introduced by the authors, is a generalization of the lambda-calculus which may faithfully encode the effects of higher-order mutable store, I/O and probabilistic/non-deterministic input. Significantly, it remains confluent and can be simply typed in the presence of these effects. In this paper, we explore the denotational semantics of the FMC. We ha… ▽ More The Functional Machine Calculus (FMC), recently introduced by the authors, is a generalization of the lambda-calculus which may faithfully encode the effects of higher-order mutable store, I/O and probabilistic/non-deterministic input. Significantly, it remains confluent and can be simply typed in the presence of these effects. In this paper, we explore the denotational semantics of the FMC. We have three main contributions: first, we argue that its syntax -- in which both effects and lambda-calculus are realised using the same syntactic constructs -- is semantically natural, corresponding closely to the structure of a Scott-style domain theoretic semantics. Second, we show that simple types confer strong normalization by extending Gandy's proof for the lambda-calculus, including a small simplification of the technique. Finally, we show that the typed FMC (without considering the specifics of encoded effects), modulo an appropriate equational theory, is a complete language for Cartesian closed categories. △ Less

Submitted 5 February, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: 40 pages, published in Computer Science Logic 2023 Updated to conform to published version

ACM Class: F.1.1; F.3.2

arXiv:2210.12871 [pdf, other]

Tighter Abstract Queries in Neural Network Verification

Authors: Elazar Cohen, Yizhak Yisrael Elboher, Clark Barrett, Guy Katz

Abstract: Neural networks have become critical components of reactive systems in various domains within computer science. Despite their excellent performance, using neural networks entails numerous risks that stem from our lack of ability to understand and reason about their behavior. Due to these risks, various formal methods have been proposed for verifying neural networks; but unfortunately, these typica… ▽ More Neural networks have become critical components of reactive systems in various domains within computer science. Despite their excellent performance, using neural networks entails numerous risks that stem from our lack of ability to understand and reason about their behavior. Due to these risks, various formal methods have been proposed for verifying neural networks; but unfortunately, these typically struggle with scalability barriers. Recent attempts have demonstrated that abstraction-refinement approaches could play a significant role in mitigating these limitations; but these approaches can often produce networks that are so abstract, that they become unsuitable for verification. To deal with this issue, we present CEGARETTE, a novel verification mechanism where both the system and the property are abstracted and refined simultaneously. We observe that this approach allows us to produce abstract networks which are both small and sufficiently accurate, allowing for quick verification times while avoiding a large number of refinement steps. For evaluation purposes, we implemented CEGARETTE as an extension to the recently proposed CEGAR-NN framework. Our results are very promising, and demonstrate a significant improvement in performance over multiple benchmarks. △ Less

Submitted 14 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2209.05201 [pdf, other]

Proof-Stitch: Proof Combination for Divide and Conquer SAT Solvers

Authors: Abhishek Nair, Saranyu Chattopadhyay, Haoze Wu, Alex Ozdemir, Clark Barrett

Abstract: With the increasing availability of parallel computing power, there is a growing focus on parallelizing algorithms for important automated reasoning problems such as Boolean satisfiability (SAT). Divide-and-Conquer (D&C) is a popular parallel SAT solving paradigm that partitions SAT instances into independent sub-problems which are then solved in parallel. For unsatisfiable instances, state-of-the… ▽ More With the increasing availability of parallel computing power, there is a growing focus on parallelizing algorithms for important automated reasoning problems such as Boolean satisfiability (SAT). Divide-and-Conquer (D&C) is a popular parallel SAT solving paradigm that partitions SAT instances into independent sub-problems which are then solved in parallel. For unsatisfiable instances, state-of-the-art D&C solvers generate DRAT refutations for each sub-problem. However, they do not generate a single refutation for the original instance. To close this gap, we present Proof-Stitch, a procedure for combining refutations of different sub-problems into a single refutation for the original instance. We prove the correctness of the procedure and propose optimizations to reduce the size and checking time of the combined refutations by invoking existing trimming tools in the proof-combination process. We also provide an extensible implementation of the proposed technique. Experiments on instances from last year's SAT competition show that the optimized refutations are checkable up to seven times faster than unoptimized refutations. △ Less

Submitted 4 September, 2022; originally announced September 2022.

Comments: 6 pages

arXiv:2208.07669 [pdf, ps, other]

On Optimizing Back-Substitution Methods for Neural Network Verification

Authors: Tom Zelazny, Haoze Wu, Clark Barrett, Guy Katz

Abstract: With the increasing application of deep learning in mission-critical systems, there is a growing need to obtain formal guarantees about the behaviors of neural networks. Indeed, many approaches for verifying neural networks have been recently proposed, but these generally struggle with limited scalability or insufficient accuracy. A key component in many state-of-the-art verification schemes is co… ▽ More With the increasing application of deep learning in mission-critical systems, there is a growing need to obtain formal guarantees about the behaviors of neural networks. Indeed, many approaches for verifying neural networks have been recently proposed, but these generally struggle with limited scalability or insufficient accuracy. A key component in many state-of-the-art verification schemes is computing lower and upper bounds on the values that neurons in the network can obtain for a specific input domain -- and the tighter these bounds, the more likely the verification is to succeed. Many common algorithms for computing these bounds are variations of the symbolic-bound propagation method; and among these, approaches that utilize a process called back-substitution are particularly successful. In this paper, we present an approach for making back-substitution produce tighter bounds. To achieve this, we formulate and then minimize the imprecision errors incurred during back-substitution. Our technique is general, in the sense that it can be integrated into numerous existing symbolic-bound propagation techniques, with only minor modifications. We implement our approach as a proof-of-concept tool, and present favorable results compared to state-of-the-art verifiers that perform back-substitution. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: This is the extended version of a paper with the same title that appeared at FMCAD 2022

MSC Class: 68Q60

arXiv:2206.03669 [pdf, other]

Toward Certified Robustness Against Real-World Distribution Shifts

Authors: Haoze Wu, Teruhiro Tagomori, Alexander Robey, Fengjun Yang, Nikolai Matni, George Pappas, Hamed Hassani, Corina Pasareanu, Clark Barrett

Abstract: We consider the problem of certifying the robustness of deep neural networks against real-world distribution shifts. To do so, we bridge the gap between hand-crafted specifications and realistic deployment settings by proposing a novel neural-symbolic verification framework, in which we train a generative model to learn perturbations from data and define specifications with respect to the output o… ▽ More We consider the problem of certifying the robustness of deep neural networks against real-world distribution shifts. To do so, we bridge the gap between hand-crafted specifications and realistic deployment settings by proposing a novel neural-symbolic verification framework, in which we train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations, which are fundamental to many state-of-the-art generative models. To address this challenge, we propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement. The key idea is to "lazily" refine the abstraction of sigmoid functions to exclude spurious counter-examples found in the previous abstraction, thus guaranteeing progress in the verification process while keeping the state-space small. Experiments on the MNIST and CIFAR-10 datasets show that our framework significantly outperforms existing methods on a range of challenging distribution shifts. △ Less

Submitted 6 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: SatML'23. Keywords: certified robustness, distribution shift, generative models, S-shaped activations, CEGAR

arXiv:2206.00512 [pdf, other]

Neural Network Verification with Proof Production

Authors: Omri Isac, Clark Barrett, Min Zhang, Guy Katz

Abstract: Deep neural networks (DNNs) are increasingly being employed in safety-critical systems, and there is an urgent need to guarantee their correctness. Consequently, the verification community has devised multiple techniques and tools for verifying DNNs. When DNN verifiers discover an input that triggers an error, that is easy to confirm; but when they report that no error exists, there is no way to e… ▽ More Deep neural networks (DNNs) are increasingly being employed in safety-critical systems, and there is an urgent need to guarantee their correctness. Consequently, the verification community has devised multiple techniques and tools for verifying DNNs. When DNN verifiers discover an input that triggers an error, that is easy to confirm; but when they report that no error exists, there is no way to ensure that the verification tool itself is not flawed. As multiple errors have already been observed in DNN verification tools, this calls the applicability of DNN verification into question. In this work, we present a novel mechanism for enhancing Simplex-based DNN verifiers with proof production capabilities: the generation of an easy-to-check witness of unsatisfiability, which attests to the absence of errors. Our proof production is based on an efficient adaptation of the well-known Farkas' lemma, combined with mechanisms for handling piecewise-linear functions and numerical precision errors. As a proof of concept, we implemented our technique on top of the Marabou DNN verifier. Our evaluation on a safety-critical system for airborne collision avoidance shows that proof production succeeds in almost all cases and requires only minimal overhead. △ Less

Submitted 27 August, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: This is a preprint version of the paper that appeared at FMCAD 2022

arXiv:2205.14867 [pdf, other]

Measuring and mitigating voting access disparities: a study of race and polling locations in Florida and North Carolina

Authors: Mohsen Abbasi, Suresh Venkatasubramanian, Sorelle A. Friedler, Kristian Lum, Calvin Barrett

Abstract: Voter suppression and associated racial disparities in access to voting are long-standing civil rights concerns in the United States. Barriers to voting have taken many forms over the decades. A history of violent explicit discouragement has shifted to more subtle access limitations that can include long lines and wait times, long travel times to reach a polling station, and other logistical barri… ▽ More Voter suppression and associated racial disparities in access to voting are long-standing civil rights concerns in the United States. Barriers to voting have taken many forms over the decades. A history of violent explicit discouragement has shifted to more subtle access limitations that can include long lines and wait times, long travel times to reach a polling station, and other logistical barriers to voting. Our focus in this work is on quantifying disparities in voting access pertaining to the overall time-to-vote, and how they could be remedied via a better choice of polling location or provisioning more sites where voters can cast ballots. However, appropriately calibrating access disparities is difficult because of the need to account for factors such as population density and different community expectations for reasonable travel times. In this paper, we quantify access to polling locations, developing a methodology for the calibrated measurement of racial disparities in polling location "load" and distance to polling locations. We apply this methodology to a study of real-world data from Florida and North Carolina to identify disparities in voting access from the 2020 election. We also introduce algorithms, with modifications to handle scale, that can reduce these disparities by suggesting new polling locations from a given list of identified public locations (including schools and libraries). Applying these algorithms on the 2020 election location data also helps to expose and explore tradeoffs between the cost of allocating more polling locations and the potential impact on access disparities. The developed voting access measurement methodology and algorithmic remediation technique is a first step in better polling location assignment. △ Less

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.08095 [pdf, other]

Reasoning About Vectors using an SMT Theory of Sequences

Authors: Ying Sheng, Andres Nötzli, Andrew Reynolds, Yoni Zohar, David Dill, Wolfgang Grieskamp, Junkil Park, Shaz Qadeer, Clark Barrett, Cesare Tinelli

Abstract: Dynamic arrays, also referred to as vectors, are fundamental data structures used in many programs. Modeling their semantics efficiently is crucial when reasoning about such programs. The theory of arrays is widely supported but is not ideal, because the number of elements is fixed (determined by its index sort) and cannot be adjusted, which is a problem, given that the length of vectors often pla… ▽ More Dynamic arrays, also referred to as vectors, are fundamental data structures used in many programs. Modeling their semantics efficiently is crucial when reasoning about such programs. The theory of arrays is widely supported but is not ideal, because the number of elements is fixed (determined by its index sort) and cannot be adjusted, which is a problem, given that the length of vectors often plays an important role when reasoning about vector programs. In this paper, we propose reasoning about vectors using a theory of sequences. We introduce the theory, propose a basic calculus adapted from one for the theory of strings, and extend it to efficiently handle common vector operations. We prove that our calculus is sound and show how to construct a model when it terminates with a saturated configuration. Finally, we describe an implementation of the calculus in cvc5 and demonstrate its efficacy by evaluating it on verification conditions for smart contracts and benchmarks derived from existing array benchmarks. △ Less

Submitted 21 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: IJCAR 2022

arXiv:2203.11201 [pdf, other]

Efficient Neural Network Analysis with Sum-of-Infeasibilities

Authors: Haoze Wu, Aleksandar Zeljić, Guy Katz, Clark Barrett

Abstract: Inspired by sum-of-infeasibilities methods in convex optimization, we propose a novel procedure for analyzing verification queries on neural networks with piecewise-linear activation functions. Given a convex relaxation which over-approximates the non-convex activation functions, we encode the violations of activation functions as a cost function and optimize it with respect to the convex relaxati… ▽ More Inspired by sum-of-infeasibilities methods in convex optimization, we propose a novel procedure for analyzing verification queries on neural networks with piecewise-linear activation functions. Given a convex relaxation which over-approximates the non-convex activation functions, we encode the violations of activation functions as a cost function and optimize it with respect to the convex relaxation. The cost function, referred to as the Sum-of-Infeasibilities (SoI), is designed so that its minimum is zero and achieved only if all the activation functions are satisfied. We propose a stochastic procedure, DeepSoI, to efficiently minimize the SoI. An extension to a canonical case-analysis-based complete search procedure can be achieved by replacing the convex procedure executed at each search state with DeepSoI. Extending the complete search with DeepSoI achieves multiple simultaneous goals: 1) it guides the search towards a counter-example; 2) it enables more informed branching decisions; and 3) it creates additional opportunities for bound derivation. An extensive evaluation across different benchmarks and solvers demonstrates the benefit of the proposed techniques. In particular, we demonstrate that SoI significantly improves the performance of an existing complete search procedure. Moreover, the SoI-based implementation outperforms other state-of-the-art complete verifiers. We also show that our technique can efficiently improve upon the perturbation bound derived by a recent adversarial attack algorithm. △ Less

Submitted 19 March, 2022; originally announced March 2022.

Comments: TACAS'22

arXiv:2203.03153 [pdf, other]

doi 10.1145/3563325

Scalable Verification of GNN-based Job Schedulers

Authors: Haoze Wu, Clark Barrett, Mahmood Sharif, Nina Narodytska, Gagandeep Singh

Abstract: Recently, Graph Neural Networks (GNNs) have been applied for scheduling jobs over clusters, achieving better performance than hand-crafted heuristics. Despite their impressive performance, concerns remain over whether these GNN-based job schedulers meet users' expectations about other important properties, such as strategy-proofness, sharing incentive, and stability. In this work, we consider form… ▽ More Recently, Graph Neural Networks (GNNs) have been applied for scheduling jobs over clusters, achieving better performance than hand-crafted heuristics. Despite their impressive performance, concerns remain over whether these GNN-based job schedulers meet users' expectations about other important properties, such as strategy-proofness, sharing incentive, and stability. In this work, we consider formal verification of GNN-based job schedulers. We address several domain-specific challenges such as networks that are deeper and specifications that are richer than those encountered when verifying image and NLP classifiers. We develop vegas, the first general framework for verifying both single-step and multi-step properties of these schedulers based on carefully designed algorithms that combine abstractions, refinements, solvers, and proof transfer. Our experimental results show that vegas achieves significant speed-up when verifying important properties of a state-of-the-art GNN-based scheduler compared to previous methods. △ Less

Submitted 15 September, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: Condensed version published at OOPSLA'22

arXiv:2201.01978 [pdf, other]

An Abstraction-Refinement Approach to Verifying Convolutional Neural Networks

Authors: Matan Ostrovsky, Clark Barrett, Guy Katz

Abstract: Convolutional neural networks have gained vast popularity due to their excellent performance in the fields of computer vision, image processing, and others. Unfortunately, it is now well known that convolutional networks often produce erroneous results - for example, minor perturbations of the inputs of these networks can result in severe classification errors. Numerous verification approaches hav… ▽ More Convolutional neural networks have gained vast popularity due to their excellent performance in the fields of computer vision, image processing, and others. Unfortunately, it is now well known that convolutional networks often produce erroneous results - for example, minor perturbations of the inputs of these networks can result in severe classification errors. Numerous verification approaches have been proposed in recent years to prove the absence of such errors, but these are typically geared for fully connected networks and suffer from exacerbated scalability issues when applied to convolutional networks. To address this gap, we present here the Cnn-Abs framework, which is particularly aimed at the verification of convolutional networks. The core of Cnn-Abs is an abstraction-refinement technique, which simplifies the verification problem through the removal of convolutional connections in a way that soundly creates an over-approximation of the original problem; and which restores these connections if the resulting problem becomes too abstract. Cnn-Abs is designed to use existing verification engines as a backend, and our evaluation demonstrates that it can significantly boost the performance of a state-of-the-art DNN verification engine, reducing runtime by 15.7% on average. △ Less

Submitted 6 January, 2022; originally announced January 2022.

MSC Class: 68Q60

arXiv:2108.06081 [pdf, ps, other]

doi 10.34727/2021/isbn.978-3-85448-046-4_12

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition

Authors: Saranyu Chattopadhyay, Florian Lonsing, Luca Piccolboni, Deepraj Soni, Peng Wei, Xiaofan Zhang, Yuan Zhou, Luca Carloni, Deming Chen, Jason Cong, Ramesh Karri, Zhiru Zhang, Caroline Trippel, Clark Barrett, Subhasish Mitra

Abstract: Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest as… ▽ More Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest assumptions, A-QED is both sound and complete. However, as is well-known, large design sizes significantly limit the scalability of formal verification, including A-QED. We overcome this scalability challenge through a new decomposition technique for A-QED, called A-QED with Decomposition (A-QED$^2$). A-QED$^2$ systematically decomposes an HA into smaller, functional sub-modules, called sub-accelerators, which are then verified independently using A-QED. We prove completeness of A-QED$^2$; in particular, if the full HA under verification contains a bug, then A-QED$^2$ ensures detection of that bug during A-QED verification of the corresponding sub-accelerators. Results on over 100 (buggy) versions of a wide variety of HAs with millions of logic gates demonstrate the effectiveness and practicality of A-QED$^2$. △ Less

Submitted 17 August, 2021; v1 submitted 13 August, 2021; originally announced August 2021.

Comments: preprint of a paper to appear at FMCAD 2021, including appendix

arXiv:2108.05987 [pdf, other]

Automating System Configuration

Authors: Nestan Tsiskaridze, Maxwell Strange, Makai Mann, Kavya Sreedhar, Qiaoyi Liu, Mark Horowitz, Clark Barrett

Abstract: The increasing complexity of modern configurable systems makes it critical to improve the level of automation in the process of system configuration. Such automation can also improve the agility of the development cycle, allowing for rapid and automated integration of decoupled workflows. In this paper, we present a new framework for automated configuration of systems representable as state machin… ▽ More The increasing complexity of modern configurable systems makes it critical to improve the level of automation in the process of system configuration. Such automation can also improve the agility of the development cycle, allowing for rapid and automated integration of decoupled workflows. In this paper, we present a new framework for automated configuration of systems representable as state machines. The framework leverages model checking and satisfiability modulo theories (SMT) and can be applied to any application domain representable using SMT formulas. Our approach can also be applied modularly, improving its scalability. Furthermore, we show how optimization can be used to produce configurations that are best according to some metric and also more likely to be understandable to humans. We showcase this framework and its flexibility by using it to configure a CGRA memory tile for various image processing applications. △ Less

Submitted 18 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

arXiv:2106.10392 [pdf]

Effective Pre-Silicon Verification of Processor Cores by Breaking the Bounds of Symbolic Quick Error Detection

Authors: Karthik Ganesan, Florian Lonsing, Srinivasa Shashank Nuthakki, Eshan Singh, Mohammad Rahmani Fadiheh, Wolfgang Kunz, Dominik Stoffel, Clark Barrett, Subhasish Mitra

Abstract: We present a novel approach to pre-silicon verification of processor designs. The purpose of pre-silicon verification is to find logic bugs in a design at an early stage and thus avoid time- and cost-intensive post-silicon debugging. Our approach relies on symbolic quick error detection (Symbolic QED, or SQED). SQED is targeted at finding logic bugs in a symbolic representation of a design by comb… ▽ More We present a novel approach to pre-silicon verification of processor designs. The purpose of pre-silicon verification is to find logic bugs in a design at an early stage and thus avoid time- and cost-intensive post-silicon debugging. Our approach relies on symbolic quick error detection (Symbolic QED, or SQED). SQED is targeted at finding logic bugs in a symbolic representation of a design by combining bounded model checking (BMC) with QED tests. QED tests are powerful in generating short sequences of instructions (traces) that trigger bugs. We extend an existing SQED approach with symbolic starting states. This way, we enable the BMC tool to select starting states arbitrarily when generating a trace. To avoid false positives, (e.g., traces starting in unreachable states that may not be-have in accordance with the processor instruction-set architecture), we define constraints to restrict the set of possible starting states. We demonstrate that these constraints, togeth-er with reasonable assumptions about the system behavior, allow us to avoid false positives. Using our approach, we discovered previously unknown bugs in open-source RISC-V processor cores that existing methods cannot detect. Moreover, our novel approach out-performs existing ones in the detection of bugs having long traces and in the detection of hardware Trojans, i.e., unauthorized modifications of a design. △ Less

Submitted 18 June, 2021; originally announced June 2021.

Comments: This article has the full author list which was missing in arXiv:1908.06757. arXiv admin note: substantial text overlap with arXiv:1908.06757

arXiv:2105.09743 [pdf, ps, other]

lazybvtoint at the SMT Competition 2020

Authors: Yoni Zohar, Ahmed Irfan, Makai Mann, Andres Notzli, Andrew Reynolds, Clark Barrett

Abstract: lazybvtoint is a new prototype SMT-solver, that will participate in the incremental and non-incremental tracks of the \qfbv logic. lazybvtoint is a new prototype SMT-solver, that will participate in the incremental and non-incremental tracks of the \qfbv logic. △ Less

Submitted 7 May, 2021; originally announced May 2021.

arXiv:2105.01382 [pdf, other]

doi 10.1145/3545116

A Subatomic Proof System for Decision Trees

Authors: Chris Barrett, Alessio Guglielmi

Abstract: We design a proof system for propositional classical logic that integrates two languages for Boolean functions: standard conjunction-disjunction-negation and binary decision trees. We give two reasons to do so. The first is proof-theoretical naturalness: the system consists of all and only the inference rules generated by the single, simple, linear scheme of the recently introduced subatomic logic… ▽ More We design a proof system for propositional classical logic that integrates two languages for Boolean functions: standard conjunction-disjunction-negation and binary decision trees. We give two reasons to do so. The first is proof-theoretical naturalness: the system consists of all and only the inference rules generated by the single, simple, linear scheme of the recently introduced subatomic logic. Thanks to this regularity, cuts are eliminated via a natural construction. The second reason is that the system generates efficient proofs. Indeed, we show that a certain class of tautologies due to Statman, which cannot have better than exponential cut-free proofs in the sequent calculus, have polynomial cut-free proofs in our system. We achieve this by using the same construction that we use for cut elimination. In summary, by expanding the language of propositional logic, we make its proof theory more regular and generate more proofs, some of which are very efficient. That design is made possible by considering atoms as superpositions of their truth values, which are connected by self-dual, non-commutative connectives. A proof can then be projected via each atom into two proofs, one for each truth value, without a need for cuts. Those projections are semantically natural and are at the heart of the constructions in this paper. To accommodate self-dual non-commutativity, we compose proofs in deep inference. △ Less

Submitted 30 June, 2022; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: To appear on ACM Transactions on Computational Logic

arXiv:2104.14155 [pdf, other]

Automated Design Space Exploration of CGRA Processing Element Architectures using Frequent Subgraph Analysis

Authors: Jackson Melchert, Kathleen Feng, Caleb Donovick, Ross Daly, Clark Barrett, Mark Horowitz, Pat Hanrahan, Priyanka Raina

Abstract: The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy efficiency of an application running on the CGRA. This paper presents an automated approach for generating specialized PE architectures for an application or an application domain. Frequent subgraphs mined from a set of applications are merged to form a PE… ▽ More The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy efficiency of an application running on the CGRA. This paper presents an automated approach for generating specialized PE architectures for an application or an application domain. Frequent subgraphs mined from a set of applications are merged to form a PE architecture specialized to that application domain. For the image processing and machine learning domains, we generate specialized PEs that are up to 10.5x more energy efficient and consume 9.1x less area than a baseline PE. △ Less

Submitted 29 April, 2021; originally announced April 2021.

arXiv:2104.11738 [pdf, ps, other]

Politeness and Stable Infiniteness: Stronger Together

Authors: Ying Sheng, Yoni Zohar, Christophe Ringeissen, Andrew Reynolds, Clark Barrett, Cesare Tinelli

Abstract: We make two contributions to the study of polite combination in satisfiability modulo theories. The first contribution is a separation between politeness and strong politeness, by presenting a polite theory that is not strongly polite. This result shows that proving strong politeness (which is often harder than proving politeness) is sometimes needed in order to use polite combination. The s… ▽ More We make two contributions to the study of polite combination in satisfiability modulo theories. The first contribution is a separation between politeness and strong politeness, by presenting a polite theory that is not strongly polite. This result shows that proving strong politeness (which is often harder than proving politeness) is sometimes needed in order to use polite combination. The second contribution is an optimization to the polite combination method, obtained by borrowing from the Nelson-Oppen method. In its non-deterministic form, the Nelson-Oppen method is based on guessing arrangements over shared variables. In contrast, polite combination requires an arrangement over \emph{all} variables of the shared sort (not just the shared variables). We show that when using polite combination, if the other theory is stably infinite with respect to a shared sort, only the shared variables of that sort need be considered in arrangements, as in the Nelson-Oppen method. Reasoning about arrangements of variables is exponential in the worst case, so reducing the number of variables that are considered has the potential to improve performance significantly. We show preliminary evidence for this in practice by demonstrating a speed-up on a smart contract verification benchmark. △ Less

Submitted 27 April, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

arXiv:2103.01629 [pdf, other]

DeepCert: Verification of Contextually Relevant Robustness for Neural Network Image Classifiers

Authors: Colin Paterson, Haoze Wu, John Grese, Radu Calinescu, Corina S. Pasareanu, Clark Barrett

Abstract: We introduce DeepCert, a tool-supported method for verifying the robustness of deep neural network (DNN) image classifiers to contextually relevant perturbations such as blur, haze, and changes in image contrast. While the robustness of DNN classifiers has been the subject of intense research in recent years, the solutions delivered by this research focus on verifying DNN robustness to small pertu… ▽ More We introduce DeepCert, a tool-supported method for verifying the robustness of deep neural network (DNN) image classifiers to contextually relevant perturbations such as blur, haze, and changes in image contrast. While the robustness of DNN classifiers has been the subject of intense research in recent years, the solutions delivered by this research focus on verifying DNN robustness to small perturbations in the images being classified, with perturbation magnitude measured using established Lp norms. This is useful for identifying potential adversarial attacks on DNN image classifiers, but cannot verify DNN robustness to contextually relevant image perturbations, which are typically not small when expressed with Lp norms. DeepCert addresses this underexplored verification problem by supporting:(1) the encoding of real-world image perturbations; (2) the systematic evaluation of contextually relevant DNN robustness, using both testing and formal verification; (3) the generation of contextually relevant counterexamples; and, through these, (4) the selection of DNN image classifiers suitable for the operational context (i)envisaged when a potentially safety-critical system is designed, or (ii)observed by a deployed system. We demonstrate the effectiveness of DeepCert by showing how it can be used to verify the robustness of DNN image classifiers build for two benchmark datasets (`German Traffic Sign' and `CIFAR-10') to multiple contextually relevant perturbations. △ Less

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2101.06825 [pdf, other]

doi 10.46298/lmcs-18(3:26)2022

Counterexample-Guided Prophecy for Model Checking Modulo the Theory of Arrays

Authors: Makai Mann, Ahmed Irfan, Alberto Griggio, Oded Padon, Clark Barrett

Abstract: We develop a framework for model checking infinite-state systems by automatically augmenting them with auxiliary variables, enabling quantifier-free induction proofs for systems that would otherwise require quantified invariants. We combine this mechanism with a counterexample-guided abstraction refinement scheme for the theory of arrays. Our framework can thus, in many cases, reduce inductive rea… ▽ More We develop a framework for model checking infinite-state systems by automatically augmenting them with auxiliary variables, enabling quantifier-free induction proofs for systems that would otherwise require quantified invariants. We combine this mechanism with a counterexample-guided abstraction refinement scheme for the theory of arrays. Our framework can thus, in many cases, reduce inductive reasoning with quantifiers and arrays to quantifier-free and array-free reasoning. We evaluate the approach on a wide set of benchmarks from the literature. The results show that our implementation often outperforms state-of-the-art tools, demonstrating its practical potential. △ Less

Submitted 30 August, 2022; v1 submitted 17 January, 2021; originally announced January 2021.

Journal ref: Logical Methods in Computer Science, Volume 18, Issue 3 (August 31, 2022) lmcs:8436

Showing 1–50 of 84 results for author: Barrett, C