Michael Poli

Palo Alto, California, United States

About

Deep learning, numerics and systems.

I like to architect big neural nets that run…


Experience

  • Liquid AI
  • -

  • -

  • -

    San Francisco Bay Area

  • -

    Redmond, Washington, United States

  • -

    Daejeon, South Korea

  • -

    Daejeon, South Korea

  • -

  • -

    Yuseong-gu, Daejeon, Korea

  • -

    Singapore

  • -

    Nanjing City, China

Education

  • Stanford University

    -

    Research at the intersection of machine learning, systems and signal processing

    Advised by Stefano Ermon.

  • -

    Deep learning, dynamical systems and differential equations.

  • -

    Upper-intermediate class: comprehensive Chinese, speaking, listening, writing

  • -

  • -

Publications

  • Hypersolvers: Toward Fast Continuous-Depth Models

    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS20)

    The infinite-depth paradigm pioneered by Neural ODEs has launched a renaissance in the search for novel dynamical system-inspired deep learning primitives; however, their utilization in problems of non-trivial size has often proved impossible due to poor computational scalability. This work paves the way for scalable Neural ODEs with time-to-prediction comparable to traditional discrete networks. We introduce hypersolvers, neural networks designed to solve ODEs with low overhead and theoretical guarantees on accuracy. The synergistic combination of hypersolvers and Neural ODEs allows for cheap inference and unlocks a new frontier for practical application of continuous-depth models. Experimental evaluations on standard benchmarks, such as sampling for continuous normalizing flows, reveal consistent Pareto efficiency over classical numerical methods.

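    A minimal sketch of the hypersolver idea described above, assuming a HyperEuler-style scheme in PyTorch: a plain Euler step is augmented with a small learned network g that approximates the local truncation error. All module and variable names here are illustrative, not taken from the paper's code.

    import torch
    import torch.nn as nn

    class HyperEuler(nn.Module):
        """One explicit Euler step plus a learned O(dt^2) correction."""
        def __init__(self, f: nn.Module, g: nn.Module):
            super().__init__()
            self.f = f  # vector field of the Neural ODE, dx/dt = f(x)
            self.g = g  # small hypersolver net approximating the local error

        def step(self, x, dt):
            # Base Euler update plus a dt^2-scaled learned residual.
            return x + dt * self.f(x) + dt**2 * self.g(x)

        def trajectory(self, x, t_span=1.0, n_steps=10):
            dt = t_span / n_steps
            xs = [x]
            for _ in range(n_steps):
                x = self.step(x, dt)
                xs.append(x)
            return torch.stack(xs)

    # Illustrative usage: 2-d state, tiny MLPs for both networks.
    f = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
    g = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
    x0 = torch.randn(16, 2)
    traj = HyperEuler(f, g).trajectory(x0)  # shape (11, 16, 2)

    In the paper's setting, g would be trained to match the residual against a high-accuracy reference solver; that training loop is omitted here.
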
  • WATTNet: Learning to Trade FX via Hierarchical Spatio-Temporal Representation of Highly Multivariate Time Series

    The 29th International Joint Conference on Artificial Intelligence, IJCAI-PRICAI2020

    Finance is a particularly challenging application area for deep learning models due to low signal-to-noise ratio, non-stationarity, and partial observability. Non-deliverable-forwards (NDF), a derivatives contract used in foreign exchange (FX) trading, presents additional difficulty in the form of long-term planning required for an effective selection of start and end date of the contract. In this work, we focus on tackling the problem of NDF tenor selection by leveraging high-dimensional sequential data consisting of spot rates, technical indicators and expert tenor patterns. To this end, we construct a dataset from the Depository Trust & Clearing Corporation (DTCC) NDF data that includes a comprehensive list of NDF volumes and daily spot rates for 64 FX pairs. We introduce WaveATTentionNet (WATTNet), a novel temporal convolutional network (TCN) model for spatio-temporal modeling of highly multivariate time series, and validate it across NDF markets with varying degrees of dissimilarity between the training and test periods in terms of volatility and general market regimes. The proposed method achieves a significant positive return on investment (ROI) in all NDF markets under analysis, outperforming recurrent and classical baselines by a wide margin. Finally, we propose two orthogonal interpretability approaches to verify noise stability and detect the driving factors of the learned tenor selection strategy.

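    The abstract above describes temporal convolutions combined with attention across series. A rough sketch of that combination in PyTorch, under my own simplifying assumptions (layer sizes, a single attention head, and all module names are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DilatedCausalConv(nn.Module):
        """Causal 1-d convolution over time; dilation widens the receptive field."""
        def __init__(self, channels, dilation):
            super().__init__()
            self.pad = dilation  # left-pad so outputs never see the future
            self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

        def forward(self, x):  # x: (batch, n_series, time)
            return torch.tanh(self.conv(F.pad(x, (self.pad, 0))))

    class SpatioTemporalBlock(nn.Module):
        """Per-series temporal convolution, then attention mixing across series."""
        def __init__(self, n_series, hidden, dilation):
            super().__init__()
            self.tcn = DilatedCausalConv(n_series, dilation)
            self.proj = nn.Linear(1, hidden)
            self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)

        def forward(self, x):  # x: (batch, n_series, time)
            h = self.tcn(x)
            z = self.proj(h[..., -1:])  # last time step: (batch, n_series, hidden)
            z, _ = self.attn(z, z, z)   # exchange information across the series
            return z

    # Illustrative usage: 64 FX series, 100 time steps.
    block = SpatioTemporalBlock(n_series=64, hidden=16, dilation=2)
    out = block(torch.randn(8, 64, 100))  # (8, 64, 16)
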
  • Neural Ordinary Differential Equation Value Networks for Parametrized Action Spaces

    Eighth International Conference on Learning Representations (ICLR2020), Workshop on Integration of Deep Neural Models and Differential Equations

    Action spaces equipped with parameter sets are a common occurrence in reinforcement learning applications. Solutions to problems of this class have been developed under different frameworks, such as parametrized action Markov decision processes (PAMDP) or hierarchical reinforcement learning (HRL). These approaches often require extensions or modifications to existing algorithms developed for standard MDPs. For this reason they can be unwieldy and, particularly in the case of HRL, computationally inefficient. We propose adopting a different parametrization scheme for state-action value networks based on neural ordinary differential equations (NODEs) as a scalable, plug-and-play approach for parametrized action spaces. NODE value networks do not require extensive modification to existing algorithms nor the adoption of HRL methods. Our solution can be directly integrated into existing training algorithms and opens up new opportunities in single-agent and multi-agent settings with tight precision constraints on the action parameters, such as robotics.

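    A sketch of the core parametrization idea, assuming the torchdiffeq library: the state and continuous action parameters enter a continuous-depth block, and a linear head reads out one value per discrete action. Dimensions and module names are my own illustrative choices, not the paper's.

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint  # pip install torchdiffeq

    class VectorField(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

        def forward(self, t, h):  # signature expected by odeint
            return self.net(h)

    class NODEQNetwork(nn.Module):
        """Q-values for a parametrized action space via a continuous-depth block."""
        def __init__(self, state_dim, param_dim, n_actions, hidden=32):
            super().__init__()
            self.encode = nn.Linear(state_dim + param_dim, hidden)
            self.flow = VectorField(hidden)
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, state, params):
            h0 = torch.tanh(self.encode(torch.cat([state, params], dim=-1)))
            h1 = odeint(self.flow, h0, torch.tensor([0.0, 1.0]))[-1]
            return self.head(h1)  # one value per discrete action

    q = NODEQNetwork(state_dim=8, param_dim=2, n_actions=4)
    values = q(torch.randn(5, 8), torch.randn(5, 2))  # (5, 4)
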
  • Port-Hamiltonian Gradient Flows

    Eighth International Conference on Learning Representations (ICLR2020), Workshop on Integration of Deep Neural Models and Differential Equations

    In this paper we present a general framework for continuous-time gradient descent, often referred to as gradient flow. We extend Hamiltonian gradient flows, which ascribe mechanical dynamics to neural network parameters and constitute a natural continuous-time alternative to discrete momentum-based gradient descent approaches. The proposed Port-Hamiltonian Gradient Flow (PHGF) casts neural network training into a system-theoretic framework: a fictitious physical system is coupled to the neural network by setting the loss function as an energy term of the system. As autonomous port-Hamiltonian systems naturally tend to dissipate energy towards one of their minima by construction, solving the system simultaneously trains the neural network. We show that general PHGFs are compatible with both continuous-time data-stream optimization, where the optimizer processes a continuous stream of data, and standard fixed-step optimization. In continuous time, PHGFs allow for the embedding of black-box adaptive-step ODE solvers and are able to stick to the energy manifold, thus avoiding divergence due to large learning rates. In fixed-step optimization, on the other hand, PHGFs open the door to novel fixed-step approaches based on symplectic discretizations of the port-Hamiltonian system, with a memory footprint and computational complexity similar to momentum optimizers.

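    A minimal fixed-step sketch of the dissipative dynamics described above: the loss acts as potential energy, a momentum variable carries kinetic energy, and friction dissipates the total energy so the flow settles at a minimum. This is my own illustration of the idea, not the authors' implementation; dt and gamma are arbitrary.

    import torch

    def phgf_step(params, momenta, loss_fn, dt=0.1, gamma=0.9):
        """One symplectic-Euler step of a dissipative Hamiltonian flow."""
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params)
        with torch.no_grad():
            for q, p, g in zip(params, momenta, grads):
                p -= dt * (g + gamma * p)  # momentum update with friction
                q += dt * p                # parameter ("position") update
        return loss.item()

    # Illustrative usage: roll the flow forward on a toy quadratic loss.
    w = torch.randn(3, requires_grad=True)
    m = torch.zeros_like(w)
    for _ in range(100):
        phgf_step([w], [m], lambda: ((w - 1.0) ** 2).sum())
    # w is now close to the minimizer at 1.0

    Discretized this way, the update closely resembles momentum gradient descent, which is the correspondence the abstract points to.
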
  • Dissecting Neural ODEs

    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS20)

    Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner workings is still an open challenge and most of their applications are currently limited to their inclusion as generic black-box modules. In this work, we "open the box" and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and neural ODEs with data-controlled vector fields.

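    One of the architectures mentioned above, a neural ODE with a data-controlled vector field, can be sketched with torchdiffeq as follows. Here the flow is conditioned on the initial datum x0; all sizes and names are illustrative assumptions.

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class DataControlledField(nn.Module):
        """Vector field conditioned on the initial condition: dx/dt = f([x, x0])."""
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, dim))
            self.x0 = None  # set before each integration

        def forward(self, t, x):
            return self.net(torch.cat([x, self.x0], dim=-1))

    class DataControlledNODE(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.field = DataControlledField(dim)

        def forward(self, x0):
            self.field.x0 = x0  # the vector field now depends on the input
            return odeint(self.field, x0, torch.tensor([0.0, 1.0]))[-1]

    model = DataControlledNODE(dim=2)
    y = model(torch.randn(8, 2))  # (8, 2)
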
  • Graph Neural Ordinary Differential Equations

    The AAAI-20 Workshop on Deep Learning on Graphs: Methodologies and Applications (AAAI-DLGMA 2020)

    We extend the framework of graph neural networks (GNNs) to continuous time. Graph neural ordinary differential equations (GDEs) are introduced as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers. The GDE framework is shown to be compatible with the majority of commonly used GNN models with minimal modification to the original formulations. We evaluate the effectiveness of GDEs on both static and dynamic datasets: results demonstrate their general effectiveness even in cases where the data is not generated by continuous-time processes.

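    A compact sketch of the GDE idea under my own assumptions (a dense adjacency matrix, a single GCN-style layer as the vector field, torchdiffeq as the solver):

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class GCNField(nn.Module):
        """GDE vector field: dH/dt = tanh(A_hat @ H @ W), a graph convolution."""
        def __init__(self, adj, dim):
            super().__init__()
            a = adj + torch.eye(adj.size(0))  # add self-loops
            d = a.sum(-1).rsqrt()             # D^{-1/2}
            self.register_buffer("a_hat", d[:, None] * a * d[None, :])
            self.lin = nn.Linear(dim, dim)

        def forward(self, t, h):  # h: (n_nodes, dim)
            return torch.tanh(self.a_hat @ self.lin(h))

    # Illustrative usage: node features evolved through a continuum of GCN layers.
    adj = (torch.rand(10, 10) < 0.3).float()
    adj = ((adj + adj.T) > 0).float()  # symmetrize
    field = GCNField(adj, dim=4)
    h1 = odeint(field, torch.randn(10, 4), torch.tensor([0.0, 1.0]))[-1]
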
  • WATTNet: Learning to Trade FX via Hierarchical Spatio-Temporal Representation of Highly Multivariate Time Series

    The AAAI-20 Workshop on Knowledge Discovery from Unstructured Data in Financial Services (AAAI-KDF 2020)

    Finance is a particularly challenging application area for deep learning models due to low signal-to-noise ratio, non-stationarity, and partial observability. Non-deliverable-forwards (NDF), a derivatives contract used in foreign exchange (FX) trading, presents additional difficulty in the form of long-term planning required for an effective selection of start and end date of the contract. In this work, we focus on tackling the problem of NDF tenor selection by leveraging high-dimensional sequential data consisting of spot rates, technical indicators and expert tenor patterns. To this end, we construct a dataset from the Depository Trust & Clearing Corporation (DTCC) NDF data that includes a comprehensive list of NDF volumes and daily spot rates for 64 FX pairs. We introduce WaveATTentionNet (WATTNet), a novel temporal convolutional network (TCN) model for spatio-temporal modeling of highly multivariate time series, and validate it across NDF markets with varying degrees of dissimilarity between the training and test periods in terms of volatility and general market regimes. The proposed method achieves a significant positive return on investment (ROI) in all NDF markets under analysis, outperforming recurrent and classical baselines by a wide margin. Finally, we propose two orthogonal interpretability approaches to verify noise stability and detect the driving factors of the learned tenor selection strategy.

  • Port-Hamiltonian Approach to Neural Network Training

    IEEE Conference on Decision and Control (CDC19)

    Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations. Recent work proposes networks with layer outputs which are no longer quantized but are solutions of an ordinary differential equation (ODE); however, these networks are still optimized via discrete methods (e.g. gradient descent). In this paper, we explore a different direction: namely, we propose a novel framework for learning in which the parameters themselves are solutions of ODEs. By viewing the optimization process as the evolution of a port-Hamiltonian system, we can ensure convergence to a minimum of the objective function. Numerical experiments have been performed to show the validity and effectiveness of the proposed methods.


Courses

  • AI-based Time Series Analysis

    AI608

  • Advanced Machine Learning

    CS671

  • Business Intelligence

    KSE521

  • Data-Driven Decision Making and Control

    IE437

  • Dynamic Programming and Reinforcement Learning

    IE540

  • Game Theory and Multi-Agent Reinforcement Learning

    IE801

  • Graph Mining and Social Network Analysis

    AI607

  • Introduction to Financial Engineering

    IE471

  • Linear Programming

    IE521

  • Machine Learning for Healthcare

    AI810

  • Optimization for AI

    AI505

  • Parallel Architectures for Deep Learning

    EE488

  • Probability and Statistics

    CC511

  • Recent Advances in Deep Learning

    EE807

  • Statistical Learning Theory

    EE531

Honors & Awards

  • "Almatong" double-degree project scholarship

    University of Bologna

    Double-degree scholarship awarded on the basis of a high academic ranking. It covers most of the expenses of the stay at Tongji University in Shanghai, a Bologna-Shanghai round-trip flight, and medical insurance.

Languages

  • English

    Native or bilingual proficiency

  • Italian

    Native or bilingual proficiency

  • Chinese

    Professional working proficiency

  • French

    Elementary proficiency

  • Korean

    Elementary proficiency
