Michael Poli

Palo Alto, California, United States

About

Deep learning, numerics and systems.

I like to architect big neural nets that run…


Experience

  • Liquid AI
  • -

  • -

  • -

    San Francisco Bay Area

  • -

    Redmond, Washington, United States

  • -

    Daejeon, South Korea

  • -

    Daejeon, South Korea

  • -

  • -

    Yuseong-gu, Daejeon, Korea

  • -

    Singapore

  • -

    Nanjing City, China

Education

  • Stanford University

    -

    Research at the intersection of machine learning, systems and signal processing

    Advised by Stefano Ermon.

  • -

    Deep learning, dynamical systems and differential equations.

  • -

    Upper-intermediate class: comprehensive Chinese, speaking, listening, writing

  • -

  • -

Publications

  • Hypersolvers: Toward Fast Continuous-Depth Models

    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS20)

    The infinite-depth paradigm pioneered by Neural ODEs has launched a renaissance in the search for novel dynamical system-inspired deep learning primitives; however, their utilization in problems of non-trivial size has often proved impossible due to poor computational scalability. This work paves the way for scalable Neural ODEs with time-to-prediction comparable to traditional discrete networks. We introduce hypersolvers, neural networks designed to solve ODEs with low overhead and theoretical guarantees on accuracy. The synergistic combination of hypersolvers and Neural ODEs allows for cheap inference and unlocks a new frontier for practical application of continuous-depth models. Experimental evaluations on standard benchmarks, such as sampling for continuous normalizing flows, reveal consistent Pareto efficiency over classical numerical methods.

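    A minimal sketch of the hypersolver idea described above, assuming a HyperEuler-style scheme in PyTorch: a plain Euler step is augmented with a small learned network g that approximates the local truncation error. All module and variable names here are illustrative, not taken from the paper's code.

    import torch
    import torch.nn as nn

    class HyperEuler(nn.Module):
        """One explicit Euler step plus a learned O(dt^2) correction."""
        def __init__(self, f: nn.Module, g: nn.Module):
            super().__init__()
            self.f = f  # vector field of the Neural ODE, dx/dt = f(x)
            self.g = g  # small hypersolver net approximating the local error

        def step(self, x, dt):
            # Base Euler update plus a dt^2-scaled learned residual.
            return x + dt * self.f(x) + dt**2 * self.g(x)

        def trajectory(self, x, t_span=1.0, n_steps=10):
            dt = t_span / n_steps
            xs = [x]
            for _ in range(n_steps):
                x = self.step(x, dt)
                xs.append(x)
            return torch.stack(xs)

    # Illustrative usage: 2-d state, tiny MLPs for both networks.
    f = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
    g = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
    x0 = torch.randn(16, 2)
    traj = HyperEuler(f, g).trajectory(x0)  # shape (11, 16, 2)

    In the paper's setting, g would be trained to match the residual against a high-accuracy reference solver; that training loop is omitted here.
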
  • WATTNet: Learning to Trade FX via Hierarchical Spatio-Temporal Representation of Highly Multivariate Time Series

    The 29th International Joint Conference on Artificial Intelligence, IJCAI-PRICAI2020

    Finance is a particularly challenging application area for deep learning models due to low signal-to-noise ratio, non-stationarity, and partial observability. Non-deliverable-forwards (NDF), a derivatives contract used in foreign exchange (FX) trading, presents additional difficulty in the form of long-term planning required for an effective selection of start and end date of the contract. In this work, we focus on tackling the problem of NDF tenor selection by leveraging high-dimensional sequential data consisting of spot rates, technical indicators and expert tenor patterns. To this end, we construct a dataset from the Depository Trust & Clearing Corporation (DTCC) NDF data that includes a comprehensive list of NDF volumes and daily spot rates for 64 FX pairs. We introduce WaveATTentionNet (WATTNet), a novel temporal convolutional network (TCN) model for spatio-temporal modeling of highly multivariate time series, and validate it across NDF markets with varying degrees of dissimilarity between the training and test periods in terms of volatility and general market regimes. The proposed method achieves a significant positive return on investment (ROI) in all NDF markets under analysis, outperforming recurrent and classical baselines by a wide margin. Finally, we propose two orthogonal interpretability approaches to verify noise stability and detect the driving factors of the learned tenor selection strategy.

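    The abstract above describes temporal convolutions combined with attention across series. A rough sketch of that combination in PyTorch, under my own simplifying assumptions (layer sizes, a single attention head, and all module names are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DilatedCausalConv(nn.Module):
        """Causal 1-d convolution over time; dilation widens the receptive field."""
        def __init__(self, channels, dilation):
            super().__init__()
            self.pad = dilation  # left-pad so outputs never see the future
            self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

        def forward(self, x):  # x: (batch, n_series, time)
            return torch.tanh(self.conv(F.pad(x, (self.pad, 0))))

    class SpatioTemporalBlock(nn.Module):
        """Per-series temporal convolution, then attention mixing across series."""
        def __init__(self, n_series, hidden, dilation):
            super().__init__()
            self.tcn = DilatedCausalConv(n_series, dilation)
            self.proj = nn.Linear(1, hidden)
            self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)

        def forward(self, x):  # x: (batch, n_series, time)
            h = self.tcn(x)
            z = self.proj(h[..., -1:])  # last time step: (batch, n_series, hidden)
            z, _ = self.attn(z, z, z)   # exchange information across the series
            return z

    # Illustrative usage: 64 FX series, 100 time steps.
    block = SpatioTemporalBlock(n_series=64, hidden=16, dilation=2)
    out = block(torch.randn(8, 64, 100))  # (8, 64, 16)
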
  • Neural Ordinary Differential Equation Value Networks for Parametrized Action Spaces

    Eighth International Conference on Learning Representations (ICLR2020), Workshop on Integration of Deep Neural Models and Differential Equations

    Action spaces equipped with parameter sets are a common occurrence in reinforcement learning applications. Solutions to problems of this class have been developed under different frameworks, such as parametrized action Markov decision processes (PAMDP) or hierarchical reinforcement learning (HRL). These approaches often require extensions or modifications to existing algorithms developed for standard MDPs. For this reason they can be unwieldy and, particularly in the case of HRL, computationally inefficient. We propose adopting a different parametrization scheme for state-action value networks based on neural ordinary differential equations (NODEs) as a scalable, plug-and-play approach for parametrized action spaces. NODE value networks do not require extensive modification to existing algorithms nor the adoption of HRL methods. Our solution can be directly integrated into existing training algorithms and opens up new opportunities in single-agent and multi-agent settings with tight precision constraints on the action parameters, such as robotics.

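    A sketch of the core parametrization idea, assuming the torchdiffeq library: the state and continuous action parameters enter a continuous-depth block, and a linear head reads out one value per discrete action. Dimensions and module names are my own illustrative choices, not the paper's.

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint  # pip install torchdiffeq

    class VectorField(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

        def forward(self, t, h):  # signature expected by odeint
            return self.net(h)

    class NODEQNetwork(nn.Module):
        """Q-values for a parametrized action space via a continuous-depth block."""
        def __init__(self, state_dim, param_dim, n_actions, hidden=32):
            super().__init__()
            self.encode = nn.Linear(state_dim + param_dim, hidden)
            self.flow = VectorField(hidden)
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, state, params):
            h0 = torch.tanh(self.encode(torch.cat([state, params], dim=-1)))
            h1 = odeint(self.flow, h0, torch.tensor([0.0, 1.0]))[-1]
            return self.head(h1)  # one value per discrete action

    q = NODEQNetwork(state_dim=8, param_dim=2, n_actions=4)
    values = q(torch.randn(5, 8), torch.randn(5, 2))  # (5, 4)
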
  • Port-Hamiltonian Gradient Flows

    Eighth International Conference on Learning Representations (ICLR2020), Workshop on Integration of Deep Neural Models and Differential Equations

    In this paper we present a general framework for continuous-time gradient descent, often referred to as gradient flow. We extend Hamiltonian gradient flows, which ascribe mechanical dynamics to neural network parameters and constitute a natural continuous-time alternative to discrete momentum-based gradient descent approaches. The proposed Port-Hamiltonian Gradient Flow (PHGF) casts neural network training into a system-theoretic framework: a fictitious physical system is coupled to the neural network by setting the loss function as an energy term of the system. As autonomous port-Hamiltonian systems naturally tend to dissipate energy towards one of their minima by construction, solving the system simultaneously trains the neural network. We show that general PHGFs are compatible with both continuous-time data-stream optimization, where the optimizer processes a continuous stream of data, and standard fixed-step optimization. In continuous time, PHGFs allow for the embedding of black-box adaptive-step ODE solvers and are able to stick to the energy manifold, thus avoiding divergence due to large learning rates. In fixed-step optimization, on the other hand, PHGFs open the door to novel fixed-step approaches based on symplectic discretizations of the port-Hamiltonian system, with a memory footprint and computational complexity similar to momentum optimizers.

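    A minimal fixed-step sketch of the dissipative dynamics described above: the loss acts as potential energy, a momentum variable carries kinetic energy, and friction dissipates the total energy so the flow settles at a minimum. This is my own illustration of the idea, not the authors' implementation; dt and gamma are arbitrary.

    import torch

    def phgf_step(params, momenta, loss_fn, dt=0.1, gamma=0.9):
        """One symplectic-Euler step of a dissipative Hamiltonian flow."""
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params)
        with torch.no_grad():
            for q, p, g in zip(params, momenta, grads):
                p -= dt * (g + gamma * p)  # momentum update with friction
                q += dt * p                # parameter ("position") update
        return loss.item()

    # Illustrative usage: roll the flow forward on a toy quadratic loss.
    w = torch.randn(3, requires_grad=True)
    m = torch.zeros_like(w)
    for _ in range(100):
        phgf_step([w], [m], lambda: ((w - 1.0) ** 2).sum())
    # w is now close to the minimizer at 1.0

    Discretized this way, the update closely resembles momentum gradient descent, which is the correspondence the abstract points to.
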
  • Dissecting Neural ODEs

    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS20)

    Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner workings is still an open challenge and most of their applications are currently limited to their inclusion as generic black-box modules. In this work, we "open the box" and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and neural ODEs with data-controlled vector fields.

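    One of the architectures mentioned above, a neural ODE with a data-controlled vector field, can be sketched with torchdiffeq as follows. Here the flow is conditioned on the initial datum x0; all sizes and names are illustrative assumptions.

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class DataControlledField(nn.Module):
        """Vector field conditioned on the initial condition: dx/dt = f([x, x0])."""
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, dim))
            self.x0 = None  # set before each integration

        def forward(self, t, x):
            return self.net(torch.cat([x, self.x0], dim=-1))

    class DataControlledNODE(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.field = DataControlledField(dim)

        def forward(self, x0):
            self.field.x0 = x0  # the vector field now depends on the input
            return odeint(self.field, x0, torch.tensor([0.0, 1.0]))[-1]

    model = DataControlledNODE(dim=2)
    y = model(torch.randn(8, 2))  # (8, 2)
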
  • Graph Neural Ordinary Differential Equations

    The AAAI-20 Workshop on Deep Learning on Graphs: Methodologies and Applications (AAAI-DLGMA 2020)

    We extend the framework of graph neural networks (GNNs) to continuous time. Graph neural ordinary differential equations (GDEs) are introduced as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers. The GDE framework is shown to be compatible with the majority of commonly used GNN models with minimal modification to the original formulations. We evaluate the effectiveness of GDEs on both static and dynamic datasets: results demonstrate their general effectiveness even in cases where the data is not generated by continuous-time processes.

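    A compact sketch of the GDE idea under my own assumptions (a dense adjacency matrix, a single GCN-style layer as the vector field, torchdiffeq as the solver):

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class GCNField(nn.Module):
        """GDE vector field: dH/dt = tanh(A_hat @ H @ W), a graph convolution."""
        def __init__(self, adj, dim):
            super().__init__()
            a = adj + torch.eye(adj.size(0))  # add self-loops
            d = a.sum(-1).rsqrt()             # D^{-1/2}
            self.register_buffer("a_hat", d[:, None] * a * d[None, :])
            self.lin = nn.Linear(dim, dim)

        def forward(self, t, h):  # h: (n_nodes, dim)
            return torch.tanh(self.a_hat @ self.lin(h))

    # Illustrative usage: node features evolved through a continuum of GCN layers.
    adj = (torch.rand(10, 10) < 0.3).float()
    adj = ((adj + adj.T) > 0).float()  # symmetrize
    field = GCNField(adj, dim=4)
    h1 = odeint(field, torch.randn(10, 4), torch.tensor([0.0, 1.0]))[-1]
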
  • WATTNet: Learning to Trade FX via Hierarchical Spatio-Temporal Representation of Highly Multivariate Time Series

    The AAAI-20 Workshop on Knowledge Discovery from Unstructured Data in Financial Services (AAAI-KDF 2020)

    Finance is a particularly challenging application area for deep learning models due to low signal-to-noise ratio, non-stationarity, and partial observability. Non-deliverable-forwards (NDF), a derivatives contract used in foreign exchange (FX) trading, presents additional difficulty in the form of long-term planning required for an effective selection of start and end date of the contract. In this work, we focus on tackling the problem of NDF tenor selection by leveraging high-dimensional sequential data consisting of spot rates, technical indicators and expert tenor patterns. To this end, we construct a dataset from the Depository Trust & Clearing Corporation (DTCC) NDF data that includes a comprehensive list of NDF volumes and daily spot rates for 64 FX pairs. We introduce WaveATTentionNet (WATTNet), a novel temporal convolutional network (TCN) model for spatio-temporal modeling of highly multivariate time series, and validate it across NDF markets with varying degrees of dissimilarity between the training and test periods in terms of volatility and general market regimes. The proposed method achieves a significant positive return on investment (ROI) in all NDF markets under analysis, outperforming recurrent and classical baselines by a wide margin. Finally, we propose two orthogonal interpretability approaches to verify noise stability and detect the driving factors of the learned tenor selection strategy.

  • Port-Hamiltonian Approach to Neural Network Training

    IEEE Conference on Decision and Control (CDC19)

    Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations. Recent work proposes networks with layer outputs which are no longer quantized but are solutions of an ordinary differential equation (ODE); however, these networks are still optimized via discrete methods (e.g. gradient descent). In this paper, we explore a different direction: namely, we propose a novel framework for learning in which the parameters themselves are solutions of ODEs. By viewing the optimization process as the evolution of a port-Hamiltonian system, we can ensure convergence to a minimum of the objective function. Numerical experiments have been performed to show the validity and effectiveness of the proposed methods.


Courses

  • AI-based Time Series Analysis

    AI608

  • Advanced Machine Learning

    CS671

  • Business Intelligence

    KSE521

  • Data-Driven Decision Making and Control

    IE437

  • Dynamic Programming and Reinforcement Learning

    IE540

  • Game Theory and Multi-Agent Reinforcement Learning

    IE801

  • Graph Mining and Social Network Analysis

    AI607

  • Introduction to Financial Engineering

    IE471

  • Linear Programming

    IE521

  • Machine Learning for Healthcare

    AI810

  • Optimization for AI

    AI505

  • Parallel Architectures for Deep Learning

    EE488

  • Probability and Statistics

    CC511

  • Recent Advances in Deep Learning

    EE807

  • Statistical Learning Theory

    EE531

Honors & Awards

  • "Almatong" double-degree project scholarship

    University of Bologna

    Double-degree scholarship awarded on the basis of a high academic ranking. It covers most of the expenses of the stay at Tongji University in Shanghai, a Bologna-Shanghai round-trip flight, and medical insurance.

Languages

  • English

    Native or bilingual proficiency

  • Italian

    Native or bilingual proficiency

  • Chinese

    Professional working proficiency

  • French

    Elementary proficiency

  • Korean

    Elementary proficiency
