Showing 1–30 of 30 results for author: Schwarz, J

Searching in archive cs.
  1. arXiv:2407.06483  [pdf, other]

    cs.LG cs.CL

    Composable Interventions for Language Models

    Authors: Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

    Abstract: Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventi…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.04761  [pdf, ps, other]

    cs.DS math.OC

    A Decomposition Theorem for Dynamic Flows

    Authors: Lukas Graf, Tobias Harks, Julian Schwarz

    Abstract: The famous edge flow decomposition theorem of Gallai (1958) states that any static edge $s$,$d$-flow in a directed graph can be decomposed into a linear combination of incidence vectors of paths and cycles. In this paper, we study the decomposition problem for the setting of dynamic edge $s$,$d$-flows assuming a quite general dynamic flow propagation model. We prove the following decomposition the…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 54 pages, 3 figures

  3. arXiv:2406.10670  [pdf, other]

    cs.LG cs.AI cs.CL

    CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

    Authors: David Brandfonbrener, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, Sham Kakade

    Abstract: Selecting high-quality data for pre-training is crucial in shaping the downstream task performance of language models. A major challenge lies in identifying this optimal subset, a problem generally considered intractable, thus necessitating scalable and effective heuristics. In this work, we propose a data selection method, CoLoR-Filter (Conditional Loss Reduction Filtering), which leverages an em…

    Submitted 24 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  4. arXiv:2404.02831  [pdf, other]

    cs.AI

    Empowering Biomedical Discovery with AI Agents

    Authors: Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik

    Abstract: We envision "AI scientists" as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate AI models and biomedical tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis…

    Submitted 24 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  5. arXiv:2403.08477  [pdf, other]

    cs.CV cs.LG

    Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

    Authors: Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei

    Abstract: Recent successes suggest that parameter-efficient fine-tuning of foundation models as the state-of-the-art method for transfer learning in vision, replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far only shown limited success and crucially tends…

    Submitted 1 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: The Forty-first International Conference on Machine Learning, 2024

  6. arXiv:2403.04317  [pdf, other]

    cs.LG cs.CL

    Online Adaptation of Language Models with a Memory of Amortized Contexts

    Authors: Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, Jonathan Richard Schwarz

    Abstract: Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. Due to this crucial need to keep models updated, online learning has emerged as a critical necessity when utilizing LLMs for real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of moder…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 14 pages

  7. arXiv:2312.05328  [pdf, other]

    cs.AI

    Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

    Authors: Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Schwarz, Ryutaro Tanno, Olivier J. Henaff

    Abstract: Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples. Despite their appeal, these methods have yet to be widely adopted since no one algorithm has been shown to a) generalize across models and tasks b) scale to large datasets and c) yield over…

    Submitted 14 February, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Technical report

  8. arXiv:2312.02753  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    C3: High-performance and low-complexity neural compression from a single image or video

    Authors: Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, Emilien Dupont

    Abstract: Most neural compression models are trained on large datasets of images or videos in order to generalize to unseen data. Such generalization typically requires large and expressive architectures with a high decoding complexity. Here we introduce C3, a neural compression method with strong rate-distortion (RD) performance that instead overfits a small model to each image or video separately. The res…

    Submitted 5 December, 2023; originally announced December 2023.

  9. arXiv:2309.04382  [pdf, other]

    cond-mat.dis-nn cs.LG cs.NE

    Emergent learning in physical systems as feedback-based aging in a glassy landscape

    Authors: Vidyesh Rao Anisetti, Ananth Kandala, J. M. Schwarz

    Abstract: By training linear physical networks to learn linear transformations, we discern how their physical properties evolve due to weight update rules. Our findings highlight a striking similarity between the learning behaviors of such networks and the processes of aging and memory formation in disordered and glassy systems. We show that the learning dynamics resembles an aging process, where the system…

    Submitted 30 October, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: 11 pages, 7 figures

  10. arXiv:2307.00075  [pdf, other]

    math.DS cs.NE

    Quantum State Assignment Flows

    Authors: Jonathan Schwarz, Jonas Cassel, Bastian Boll, Martin Gärttner, Peter Albers, Christoph Schnörr

    Abstract: This paper introduces assignment flows for density matrices as state spaces for representing and analyzing data associated with vertices of an underlying weighted graph. Determining an assignment flow by geometric integration of the defining dynamical system causes an interaction of the non-commuting states across the graph, and the assignment of a pure (rank-one) state to each vertex after conver…

    Submitted 30 June, 2023; originally announced July 2023.

  11. arXiv:2302.03130  [pdf, other]

    cs.LG cs.CV

    Spatial Functa: Scaling Functa to ImageNet Classification and Generation

    Authors: Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz, Hyunjik Kim

    Abstract: Neural fields, also known as implicit neural representations, have emerged as a powerful means to represent complex signals of various modalities. Based on this Dupont et al. (2022) introduce a framework that views neural fields as data, termed *functa*, and proposes to do deep learning directly on this dataset of neural fields. In this work, we show that the proposed framework faces limitations w…

    Submitted 9 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  12. arXiv:2302.00617  [pdf, other]

    cs.LG cs.AI

    Learning Large-scale Neural Fields via Context Pruned Meta-Learning

    Authors: Jihoon Tack, Subin Kim, Sihyun Yu, Jaeho Lee, Jinwoo Shin, Jonathan Richard Schwarz

    Abstract: We introduce an efficient optimization-based meta-learning technique for large-scale neural field training by realizing significant memory savings through automated online context point selection. This is achieved by focusing each learning step on the subset of data with the highest expected immediate improvement in model quality, resulting in the almost instantaneous modeling of global structure…

    Submitted 24 October, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Published as a conference proceeding for NeurIPS 2023

  13. arXiv:2301.09479  [pdf, other]

    stat.ML cs.AI cs.LG

    Modality-Agnostic Variational Compression of Implicit Neural Representations

    Authors: Jonathan Richard Schwarz, Jihoon Tack, Yee Whye Teh, Jaeho Lee, Jinwoo Shin

    Abstract: We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR). Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism. This allows the specialisation of a shared INR network to each data item through subnetwork selecti…

    Submitted 7 April, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

  14. arXiv:2208.08862  [pdf, ps, other]

    cond-mat.dis-nn cs.LG cs.NE

    Frequency propagation: Multi-mechanism learning in nonlinear physical networks

    Authors: Vidyesh Rao Anisetti, A. Kandala, B. Scellier, J. M. Schwarz

    Abstract: We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency, and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an `activa…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: 9 pages, 0 figures

  15. arXiv:2205.08957  [pdf, other]

    stat.ML cs.AI cs.LG

    Meta-Learning Sparse Compression Networks

    Authors: Jonathan Richard Schwarz, Yee Whye Teh

    Abstract: Recent work in Deep Learning has re-imagined the representation of data as functions mapping from a coordinate space to an underlying continuous signal. When such functions are approximated by neural networks this introduces a compelling alternative to the more common multi-dimensional array representation. Recent work on such Implicit Neural Representations (INRs) has shown that - following caref…

    Submitted 8 August, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: Published in TMLR (2022)

  16. arXiv:2203.12098  [pdf, other]

    cond-mat.soft cs.LG cs.NE

    Learning by non-interfering feedback chemical signaling in physical networks

    Authors: Vidyesh Rao Anisetti, B. Scellier, J. M. Schwarz

    Abstract: Both non-neural and neural biological systems can learn. So rather than focusing on purely brain-like learning, efforts are underway to study learning in physical systems. Such efforts include equilibrium propagation (EP) and coupled learning (CL), which require storage of two different states-the free state and the perturbed state-during the learning process to retain information about gradients.…

    Submitted 23 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: 9 pages, 2 figures

  17. arXiv:2110.00296  [pdf, other]

    stat.ML cs.AI cs.LG

    Powerpropagation: A sparsity inducing weight reparameterisation

    Authors: Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh

    Abstract: The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient based training on model sparsity.…

    Submitted 6 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  18. arXiv:2107.13298  [pdf, ps, other]

    cs.GT cs.DM cs.DS math.OC

    Generalized Nash Equilibrium Problems with Mixed-Integer Variables

    Authors: Tobias Harks, Julian Schwarz

    Abstract: We consider generalized Nash equilibrium problems (GNEPs) with non-convex strategy spaces and non-convex cost functions. This general class of games includes the important case of games with mixed-integer variables for which only a few results are known in the literature. We present a new approach to characterize equilibria via a convexification technique using the Nikaido-Isoda function. To any g…

    Submitted 9 April, 2024; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 38 pages, 10 figures, 3 tables

    Journal ref: Mathematical Programming (2024)

  19. arXiv:2010.14274  [pdf, other]

    cs.AI cs.LG

    Behavior Priors for Efficient Reinforcement Learning

    Authors: Dhruva Tirumala, Alexandre Galashov, Hyeonwoo Noh, Leonard Hasenclever, Razvan Pascanu, Jonathan Schwarz, Guillaume Desjardins, Wojciech Marian Czarnecki, Arun Ahuja, Yee Whye Teh, Nicolas Heess

    Abstract: As we deploy reinforcement learning agents to solve increasingly challenging problems, methods that allow us to inject prior knowledge about the structure of the world and effective solution strategies becomes increasingly important. In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Submitted to Journal of Machine Learning Research (JMLR)

  20. arXiv:2006.01523  [pdf]

    cs.CY

    Psychiatric Home Treatment for Inpatient Care -- Design, Implementation and Participation

    Authors: Stefan Hochwarter, Pierre Tangermann, Martin Heinze, Julian Schwarz

    Abstract: The use of information and communication technologies (ICT) to support long-term care is gaining attention, also in the light of population ageing. Known in Scandinavian countries under the term of welfare technology, it aims to increase the quality of life and independence of people with physical, psychological or social impairments. In Germany, a new form of psychiatric home treatment, inpatient…

    Submitted 2 June, 2020; originally announced June 2020.

    Journal ref: Vol. 27 No. 1 (2019): Proceedings from the annual NOKOBIT conference held in Narvik 26-27 November 2019

  21. arXiv:1905.01240  [pdf, other]

    cs.LG cs.AI stat.ML

    Information asymmetry in KL-regularized RL

    Authors: Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess

    Abstract: Many real world tasks exhibit rich structure that is repeated across different parts of the state space or in time. In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. We start from the KL regularized expected reward objective which introduces an additional component, a default policy. Instead of relying on a fixed default policy, we lea…

    Submitted 3 May, 2019; originally announced May 2019.

    Comments: Accepted as a conference paper at ICLR 2019

  22. arXiv:1903.11907  [pdf, other]

    stat.ML cs.LG

    Meta-Learning surrogate models for sequential decision making

    Authors: Alexandre Galashov, Jonathan Schwarz, Hyunjik Kim, Marta Garnelo, David Saxton, Pushmeet Kohli, S. M. Ali Eslami, Yee Whye Teh

    Abstract: We introduce a unified probabilistic framework for solving sequential decision making problems ranging from Bayesian optimisation to contextual bandits and reinforcement learning. This is accomplished by a probabilistic model-based approach that explains observed data while capturing predictive uncertainty during the decision making process. Crucially, this probabilistic model is chosen to be a Me…

    Submitted 12 June, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

  23. arXiv:1901.11356  [pdf, other]

    stat.ML cs.LG

    Functional Regularisation for Continual Learning with Gaussian Processes

    Authors: Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, Yee Whye Teh

    Abstract: We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this we rely…

    Submitted 11 February, 2020; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: 17 pages, 7 figures

  24. arXiv:1901.05761  [pdf, other]

    cs.LG stat.ML

    Attentive Neural Processes

    Authors: Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh

    Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, an…

    Submitted 9 July, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

  25. arXiv:1811.11682  [pdf, other]

    cs.LG cs.AI stat.ML

    Experience Replay for Continual Learning

    Authors: David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P. Lillicrap, Greg Wayne

    Abstract: Continual learning is the problem of learning new tasks or knowledge while protecting old knowledge and ideally generalizing from old experience to learn new tasks faster. Neural networks trained by stochastic gradient descent often degrade on old tasks when trained successively on new tasks with different data distributions. This phenomenon, referred to as catastrophic forgetting, is considered a…

    Submitted 26 November, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: NeurIPS 2019

  26. arXiv:1807.01622  [pdf, other]

    cs.LG stat.ML

    Neural Processes

    Authors: Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, Yee Whye Teh

    Abstract: A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexibl…

    Submitted 4 July, 2018; originally announced July 2018.

  27. arXiv:1805.11534  [pdf, other]

    stat.ML cs.LG

    airpred: A Flexible R Package Implementing Methods for Predicting Air Pollution

    Authors: M. Benjamin Sabath, Qian Di, Danielle Braun, Joel Schwarz, Francesca Dominici, Christine Choirat

    Abstract: Fine particulate matter (PM$_{2.5}$) is one of the criteria air pollutants regulated by the Environmental Protection Agency in the United States. There is strong evidence that ambient exposure to (PM$_{2.5}$) increases risk of mortality and hospitalization. Large scale epidemiological studies on the health effects of PM$_{2.5}$ provide the necessary evidence base for lowering the safety standards…

    Submitted 30 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

  28. arXiv:1805.06370  [pdf, other]

    stat.ML cs.LG

    Progress & Compress: A scalable framework for continual learning

    Authors: Jonathan Schwarz, Jelena Luketina, Wojciech M. Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, Raia Hadsell

    Abstract: We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of…

    Submitted 2 July, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: Accepted at ICML 2018

  29. arXiv:1712.07040  [pdf, other]

    cs.CL cs.AI cs.NE

    The NarrativeQA Reading Comprehension Challenge

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecti…

    Submitted 19 December, 2017; originally announced December 2017.

  30. arXiv:1501.00513  [pdf]

    cs.DC

    Self-Repairing Disk Arrays

    Authors: Jehan-François Pâris, Ahmed Amer, Darrell D. E. Long, Thomas J. E. Schwarz

    Abstract: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behavio…

    Submitted 2 January, 2015; originally announced January 2015.

    Comments: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)

    Report number: ADAPT/2015/02