Search | arXiv e-print repository

Reducing the Cost of Quantum Chemical Data By Backpropagating Through Density Functional Theory

Authors: Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon

Abstract: Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimazaki, 2017) on which subsequent NNs are trained within a week. DFT labels molecules by minimizing energy $E(\cdot )$ as a "loss function." We bypass dataset creation by directly training NNs with $E(\cdot )$ as a loss function. For comparison, Schütt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.

Submitted 6 February, 2024; originally announced February 2024.
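The abstract above rests on one idea: train with the energy $E(\cdot)$ itself as the loss instead of fitting precomputed labels. The toy sketch below illustrates why this can work, under stated assumptions:

- a small real symmetric matrix `H` stands in for a Hamiltonian;
- the energy is the Rayleigh quotient $E(x) = x^\top H x / x^\top x$;
- the function name `variational_ground_energy` is hypothetical.

This is not DFT and not the authors' method. It only illustrates the variational principle: minimizing the energy directly, with no labels, recovers the ground-state energy, which is the smallest eigenvalue of `H`.

```python
import numpy as np


def variational_ground_energy(H, steps=2000, lr=0.1, seed=0):
    """Toy 'energy as loss': minimise E(x) = x^T H x / x^T x by gradient descent.

    Assumption: H is a small real symmetric matrix standing in for a Hamiltonian
    (this is NOT DFT). No labels are used; only the energy drives the optimisation.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(H.shape[0])    # random trial state
    for _ in range(steps):
        x = x / np.linalg.norm(x)          # stay on the unit sphere
        energy = x @ H @ x                 # E(x) for a unit vector
        grad = 2.0 * (H @ x - energy * x)  # gradient of the Rayleigh quotient at ||x|| = 1
        x = x - lr * grad                  # descend on the energy
    x = x / np.linalg.norm(x)
    return float(x @ H @ x)


H = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(variational_ground_energy(H), np.linalg.eigvalsh(H)[0])  # both close to 2 - sqrt(2)
```

For this tridiagonal `H`, both printed values approach $2-\sqrt{2}\approx 0.586$. The fixed step size `lr=0.1` is only stable because the spread of the eigenvalues of `H` is small; a real DFT energy is far more involved.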

arXiv:2311.01135

Generating QM1B with PySCF$_{\text{IPU}}$

Authors: Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters

Abstract: The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. These datasets are limited in size because the labels are computed using the accurate (but computationally demanding) predictions of Density Functional Theory (DFT). Notably, prior DFT datasets were created using CPU supercomputers without leveraging hardware acceleration. In this paper, we take a first step towards utilising hardware accelerators by introducing the data generator PySCF$_{\text{IPU}}$ using Intelligence Processing Units (IPUs). This allowed us to create the dataset QM1B with one billion training examples containing 9-11 heavy atoms. We demonstrate that a simple baseline neural network (SchNet 9M) improves its performance by simply increasing the amount of training data without additional inductive biases. To encourage future researchers to use QM1B responsibly, we highlight several limitations of QM1B and emphasise the low resolution of our DFT options, which also serves as motivation for even larger, more accurate datasets. Code and dataset are available on GitHub: http://github.com/graphcore-research/pyscf-ipu

Submitted 2 November, 2023; originally announced November 2023.

Comments: 15 pages, 7 figures. NeurIPS 2023 Track Datasets and Benchmarks

ACM Class: I.2.6; J.2

arXiv:2009.14554

One Reflection Suffice

Authors: Alexander Mathiasen, Frederik Hvilshøj

Abstract: Orthogonal weight matrices are used in many areas of deep learning. Much previous work attempts to reduce the additional computational resources required to constrain weight matrices to be orthogonal. One popular approach utilizes *many* Householder reflections. The only practical drawback is that many reflections cause low GPU utilization. We mitigate this final drawback by proving that *one* reflection is sufficient, if the reflection is computed by an auxiliary neural network.

Submitted 30 September, 2020; originally announced September 2020.
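The abstract above relies on two properties of a Householder reflection $H = I - 2vv^\top / (v^\top v)$. It is orthogonal for every nonzero $v$, and it can be applied to a vector in $O(d)$ time without forming $H$. The numpy sketch below checks both facts. As a hypothetical stand-in for the abstract's auxiliary neural network, a single `tanh` layer with weights `W` chooses $v$ from the input. The function names and the one-layer "network" are assumptions for illustration, not the construction from the paper.

```python
import numpy as np


def householder(v):
    """Explicit Householder matrix H = I - 2 v v^T / (v^T v); orthogonal for nonzero v."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / (v @ v)


def apply_householder(v, x):
    """Compute H x in O(d) time without materialising H."""
    v = np.asarray(v, dtype=float)
    return x - 2.0 * v * (v @ x) / (v @ v)


def input_dependent_reflection(x, W):
    """Hypothetical sketch: an auxiliary map v = tanh(W x) picks one reflection per input."""
    v = np.tanh(W @ x)
    return apply_householder(v, x)


rng = np.random.default_rng(1)
v, x, W = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal((4, 4))
H = householder(v)
print(np.allclose(H.T @ H, np.eye(4)))                    # True: H is orthogonal
print(np.allclose(H @ x, apply_householder(v, x)))        # True: fast path matches
y = input_dependent_reflection(x, W)
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))   # True: norm preserved
```

Because the reflection depends on `x`, the overall map is no longer linear. Each individual output is still an orthogonal transform of its own input, so norms are preserved.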

arXiv:2009.14075

Backpropagating through Fréchet Inception Distance

Authors: Alexander Mathiasen, Frederik Hvilshøj

Abstract: The Fréchet Inception Distance (FID) has been used to evaluate hundreds of generative models. We introduce FastFID, which can efficiently train generative models with FID as a loss function. Using FID as an additional loss for Generative Adversarial Networks improves their FID.

Submitted 14 April, 2021; v1 submitted 29 September, 2020; originally announced September 2020.
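For reference, FID fits a Gaussian to feature vectors of real and generated samples and computes $\lVert\mu_1-\mu_2\rVert^2 + \operatorname{Tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\big)$. The numpy sketch below evaluates this formula from precomputed feature matrices. Rows are samples; in practice they would come from an Inception network, which is omitted here.

The sketch uses the fact that $\operatorname{Tr}\big((\Sigma_1\Sigma_2)^{1/2}\big)$ equals the sum of the square roots of the eigenvalues of $\Sigma_1\Sigma_2$. It is a plain FID evaluation under these assumptions and does not reproduce FastFID's speed-ups. The function name is hypothetical.

```python
import numpy as np


def fid_from_features(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature matrices (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Eigenvalues of cov_a @ cov_b are real and non-negative for PSD inputs;
    # clip tiny negative round-off before taking square roots.
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
real = rng.standard_normal((200, 5))
fake = 1.5 * rng.standard_normal((200, 5)) + 0.3
print(fid_from_features(real, real))  # ~0: identical feature sets
print(fid_from_features(real, fake))  # > 0: the distributions differ
```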

arXiv:2009.13977

What if Neural Networks had SVDs?

Authors: Alexander Mathiasen, Frederik Hvilshøj, Jakob Rødsgaard Jørgensen, Anshul Nasery, Davide Mottin

Abstract: Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Previous work allows using the SVD in Neural Networks without computing it. In theory, these techniques can speed up matrix operations; in practice, however, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations. The algorithm increases the degree of parallelism of an underlying matrix multiplication $H\cdot X$ where $H$ is an orthogonal matrix represented by a product of Householder matrices. Code is available at www.github.com/AlexanderMath/fasth .

Submitted 29 September, 2020; originally announced September 2020.
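The multiplication $H\cdot X$ in the abstract above, with $H = H_1 H_2 \cdots H_k$ a product of Householder matrices, can be computed by applying the reflections one after another. That is the sequential baseline whose limited parallelism the paper targets; the sketch below implements that baseline, not the paper's algorithm.

The sketch also shows the payoff the abstract mentions. Suppose a weight matrix is kept in SVD form $W = U\,\mathrm{diag}(\sigma)\,V^\top$, with $U$ and $V$ given by Householder vectors. Then $W^{-1}X = V\,\mathrm{diag}(1/\sigma)\,U^\top X$ needs no general matrix inversion. Function names are hypothetical, and `X` is assumed to be a 2D array of shape `(d, m)`.

```python
import numpy as np


def apply_householder_product(vecs, X, transpose=False):
    """Compute H @ X (or H^T @ X) for H = H(v_1) H(v_2) ... H(v_k), v_i = rows of vecs.

    Reflections are applied one at a time, so the d x d matrix H is never formed.
    Each H(v) = I - 2 v v^T / (v^T v) is symmetric, hence H^T = H(v_k) ... H(v_1).
    X must have shape (d, m).
    """
    order = vecs if transpose else vecs[::-1]  # rightmost factor acts on X first
    for v in order:
        X = X - 2.0 * np.outer(v, v @ X) / (v @ v)
    return X


def svd_inverse_apply(u_vecs, sigma, v_vecs, X):
    """For W = U diag(sigma) V^T, return W^{-1} X = V diag(1/sigma) U^T X."""
    Y = apply_householder_product(u_vecs, X, transpose=True)  # U^T X
    Y = Y / sigma[:, None]                                    # diag(1/sigma) U^T X
    return apply_householder_product(v_vecs, Y)               # V diag(1/sigma) U^T X


rng = np.random.default_rng(2)
d = 4
u_vecs, v_vecs = rng.standard_normal((d, d)), rng.standard_normal((d, d))
sigma = rng.uniform(0.5, 2.0, size=d)
U = apply_householder_product(u_vecs, np.eye(d))
V = apply_householder_product(v_vecs, np.eye(d))
W = U @ np.diag(sigma) @ V.T
X = rng.standard_normal((d, 3))
print(np.allclose(svd_inverse_apply(u_vecs, sigma, v_vecs, X), np.linalg.solve(W, X)))  # True
```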

arXiv:2006.01491

The Fine-Grained and Parallel Complexity of Andersen's Pointer Analysis

Authors: Anders Alnor Mathiasen, Andreas Pavlogiannis

Abstract: Pointer analysis is one of the fundamental problems in static program analysis. Given a set of pointers, the task is to produce a useful over-approximation of the memory locations that each pointer may point to at runtime. The most common formulation is Andersen's Pointer Analysis (APA), defined as an inclusion-based set of $m$ pointer constraints over a set of $n$ pointers. Existing algorithms solve APA in $O(n^2\cdot m)$ time, while it has been conjectured that the problem has no truly sub-cubic algorithm, with a proof so far having remained elusive. In this work we draw a rich fine-grained and parallel complexity landscape of APA, and present upper and lower bounds. First, we establish an $O(n^3)$ upper bound for general APA, improving over $O(n^2\cdot m)$ since $n=O(m)$. Second, we show that even on-demand APA ("may a specific pointer $a$ point to a specific location $b$?") has an $\Omega(n^3)$ (combinatorial) lower bound under standard complexity-theoretic hypotheses. This formally establishes the long-conjectured "cubic bottleneck" of APA, and shows that our $O(n^3)$-time algorithm is optimal. Third, we show that under mild restrictions, APA is solvable in $\tilde{O}(n^{\omega})$ time, where $\omega<2.373$ is the matrix-multiplication exponent. It is believed that $\omega=2+o(1)$, in which case this bound becomes quadratic. Fourth, we show that under such restrictions, even the on-demand problem has an $\Omega(n^2)$ lower bound under standard complexity-theoretic hypotheses, and hence our algorithm is optimal when $\omega=2+o(1)$.
Fifth, we study the parallelizability of APA and establish lower and upper bounds: (i) in general, the problem is P-complete and hence unlikely to be parallelizable, whereas (ii) under mild restrictions, the problem is parallelizable. Our theoretical treatment formalizes several insights that can lead to practical improvements in the future.

Submitted 14 October, 2020; v1 submitted 2 June, 2020; originally announced June 2020.
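Andersen-style analysis is usually stated with four constraint forms: `a = &b`, `a = b`, `a = *b` and `*a = b`. Each is read as a subset (inclusion) relation between points-to sets.

The sketch below is a naive fixpoint solver for that formulation, written for clarity. It makes no claim to the $O(n^3)$ bound or any other complexity result from the abstract above. The tuple encoding (`"addr"`, `"copy"`, `"load"`, `"store"`) and the function name are assumptions for illustration. An on-demand query ("may $a$ point to $b$?") then becomes a membership test on the result.

```python
from collections import defaultdict


def andersen_points_to(constraints):
    """Naive fixpoint solver for inclusion-based (Andersen-style) points-to analysis.

    Each constraint is (kind, a, b):
      "addr":  a = &b  ->  b in pts(a)
      "copy":  a = b   ->  pts(a) ⊇ pts(b)
      "load":  a = *b  ->  pts(a) ⊇ pts(o) for every o in pts(b)
      "store": *a = b  ->  pts(o) ⊇ pts(b) for every o in pts(a)
    """
    pts = defaultdict(set)
    for kind, a, b in constraints:
        if kind == "addr":
            pts[a].add(b)
    changed = True
    while changed:  # iterate until no points-to set grows
        changed = False
        for kind, a, b in constraints:
            if kind == "copy":
                edges = [(a, b)]
            elif kind == "load":
                edges = [(a, o) for o in list(pts[b])]
            elif kind == "store":
                edges = [(o, b) for o in list(pts[a])]
            else:
                continue  # "addr" constraints were handled up front
            for dst, src in edges:  # enforce pts(dst) ⊇ pts(src)
                before = len(pts[dst])
                pts[dst] |= pts[src]
                changed |= len(pts[dst]) != before
    return {p: set(s) for p, s in pts.items()}


# p = &x; q = &y; r = p; *r = q; s = *p
program = [("addr", "p", "x"), ("addr", "q", "y"), ("copy", "r", "p"),
           ("store", "r", "q"), ("load", "s", "p")]
result = andersen_points_to(program)
print(sorted(result["s"]))  # ['y']: s may point to y, via the store through r into x
```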

arXiv:1909.12518

Margin-Based Generalization Lower Bounds for Boosted Classifiers

Authors: Allan Grønlund, Lior Kamma, Kasper Green Larsen, Alexander Mathiasen, Jelani Nelson

Abstract: Boosting is one of the most successful ideas in machine learning. The most well-accepted explanations for the low generalization error of boosting algorithms such as AdaBoost stem from margin theory. The study of margins in the context of boosting algorithms was initiated by Schapire, Freund, Bartlett and Lee (1998) and has inspired numerous boosting algorithms and generalization bounds. To date, the strongest known generalization upper bound is the $k$th margin bound of Gao and Zhou (2013). Despite the numerous generalization upper bounds that have been proved over the last two decades, nothing is known about the tightness of these bounds. In this paper, we give the first margin-based lower bounds on the generalization error of boosted classifiers. Our lower bounds nearly match the $k$th margin bound and thus almost settle the generalization performance of boosted classifiers in terms of margins.

Submitted 7 May, 2020; v1 submitted 27 September, 2019; originally announced September 2019.
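Consider a voting classifier $f(x) = \sum_t \alpha_t h_t(x)$ with base hypotheses $h_t(x)\in\{-1,+1\}$. Its margin on a training example $(x_i, y_i)$ is commonly normalized as $y_i f(x_i) / \sum_t |\alpha_t|$. This is a value in $[-1, 1]$ that is positive exactly when the example is classified correctly.

The sketch below computes these margins and the $k$-th smallest one. $k = 1$ gives the minimal margin that the entry "Optimal Minimal Margin Maximization with Boosting" is concerned with. The normalization convention and the function names are assumptions for illustration, and the $k$th margin bound itself is not reproduced here.

```python
import numpy as np


def normalized_margins(alphas, predictions, labels):
    """Margins y_i f(x_i) / sum_t |alpha_t| of the voting classifier f = sum_t alpha_t h_t.

    predictions: shape (T, n), entry [t, i] = h_t(x_i) in {-1, +1}
    labels:      shape (n,),   entry [i]    = y_i    in {-1, +1}
    """
    alphas = np.asarray(alphas, dtype=float)
    votes = alphas @ np.asarray(predictions, dtype=float)  # f(x_i) for every i
    return np.asarray(labels, dtype=float) * votes / np.abs(alphas).sum()


def kth_margin(margins, k):
    """The k-th smallest margin; k = 1 is the minimal margin."""
    return float(np.sort(margins)[k - 1])


alphas = [0.5, 0.3, 0.2]
predictions = [[1, 1, -1, 1],
               [1, -1, 1, 1],
               [-1, 1, 1, 1]]
labels = [1, 1, 1, 1]
m = normalized_margins(alphas, predictions, labels)
print(m)                 # approximately [0.6, 0.4, 0.0, 1.0]
print(kth_margin(m, 2))  # approximately 0.4
```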

arXiv:1901.10789

Optimal Minimal Margin Maximization with Boosting

Authors: Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen

Abstract: Boosting algorithms produce a classifier by iteratively combining base hypotheses. It has been observed experimentally that the generalization error keeps improving even after achieving zero training error. One popular explanation attributes this to improvements in margins. A common goal in a long line of research is to maximize the smallest margin using as few base hypotheses as possible, culminating in the AdaBoostV algorithm of Rätsch and Warmuth [JMLR'04]. The AdaBoostV algorithm was later conjectured to yield an optimal trade-off between the number of hypotheses trained and the minimal margin over all training points (Nie et al. [JMLR'13]). Our main contribution is a new algorithm refuting this conjecture. Furthermore, we prove a lower bound which implies that our new algorithm is optimal.

Submitted 30 January, 2019; originally announced January 2019.

arXiv:1701.07204

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

Authors: Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, Mingzhou Song

Abstract: The $k$-Means clustering problem on $n$ points is NP-hard for any dimension $d\ge 2$; however, for the 1D case there exist exact polynomial-time algorithms. Previous literature reported an $O(kn^2)$-time dynamic programming algorithm that uses $O(kn)$ space. It turns out that the problem was considered under a different name more than twenty years ago. We present all the existing work that had been overlooked and compare the various solutions theoretically. Moreover, we show how to reduce the space usage for some of them, as well as generalize them to data structures that can quickly report an optimal $k$-Means clustering for any $k$. Finally, we also generalize all the algorithms to work for the absolute distance and for any Bregman Divergence. We complement our theoretical contributions with experiments that compare the practical performance of the various algorithms.

Submitted 25 April, 2018; v1 submitted 25 January, 2017; originally announced January 2017.
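The $O(kn^2)$-time, $O(kn)$-space dynamic program mentioned in the abstract above rests on one fact: an optimal 1D $k$-Means clustering consists of contiguous runs of the sorted points. With $D[j][i]$ the optimal cost of splitting the first $i$ sorted points into $j$ clusters, the recurrence is $D[j][i] = \min_{j-1 \le m < i} \big(D[j-1][m] + \mathrm{cost}(m, i)\big)$.

The sketch below is a textbook version of that recurrence, using prefix sums for $O(1)$ cluster costs. It returns only the optimal cost, and it is not one of the faster or space-reduced algorithms the paper surveys. The function name is hypothetical, and it assumes $1 \le k \le n$.

```python
import numpy as np


def kmeans_1d_dp(points, k):
    """Exact 1D k-means cost (within-cluster sum of squares) via an O(k n^2) DP.

    Assumes 1 <= k <= len(points). Optimal clusters are contiguous in sorted order.
    """
    x = np.sort(np.asarray(points, dtype=float))
    n = x.size
    s1 = np.concatenate(([0.0], np.cumsum(x)))      # prefix sums of x
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))  # prefix sums of x^2

    def cost(m, i):
        # Sum of squared deviations of x[m:i] from its mean, in O(1).
        tot = s1[i] - s1[m]
        return (s2[i] - s2[m]) - tot * tot / (i - m)

    INF = float("inf")
    # D[j][i]: optimal cost of splitting the first i sorted points into j clusters.
    D = [[INF] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            # The last cluster is x[m:i]; the first m points form j - 1 clusters.
            D[j][i] = min(D[j - 1][m] + cost(m, i) for m in range(j - 1, i))
    return D[k][n]


points = [30.0, 1.0, 12.0, 2.0, 11.0, 10.0]
print(kmeans_1d_dp(points, 3))  # 2.5: clusters {1, 2}, {10, 11, 12}, {30}
```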

Showing 1–9 of 9 results for author: Mathiasen, A