Machine Learning for Time-Series with Python: Forecast, predict, and detect anomalies with state-of-the-art machine learning methods
Ebook, 734 pages, 4 hours

About this ebook

The Python time-series ecosystem is huge and often hard to get a good grasp on, since there are so many new libraries and models. This book aims to deepen your understanding of time series by providing a comprehensive overview of popular Python time-series packages and to help you build better predictive systems.

Machine Learning for Time-Series with Python starts by re-introducing the basics of time series and then builds your understanding of traditional autoregressive models as well as modern non-parametric models. By observing practical examples and the theory behind them, you will become confident with loading time-series datasets from any source, applying deep learning models such as recurrent neural networks and causal convolutional networks, and using gradient boosting with feature engineering.

This book will also guide you in matching the right model to the right problem by explaining the theory behind several useful models. You’ll also have a look at real-world case studies covering weather, traffic, biking, and stock market data.

By the end of this book, you should feel at home with analyzing time-series and applying machine learning methods to them effectively.

Language: English
Release date: Oct 29, 2021
ISBN: 9781801816106

    Machine Learning for Time-Series with Python

    Forecast, predict, and detect anomalies with state-of-the-art machine learning methods

    Ben Auffarth

    BIRMINGHAM—MUMBAI

    Python and the Python Logo are trademarks of the Python Software Foundation.

    Machine Learning for Time-Series with Python

    Copyright © 2021 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Producer: Dr. Shailesh Jain

    Acquisition Editor – Peer Reviews: Saby Dsilva

    Project Editor: Namrata Katare

    Content Development Editor: Alex Patterson

    Copy Editor: Safis Editor

    Technical Editor: Aditya Sawant

    Proofreader: Safis Editor

    Indexer: Sejal Dsilva

    Presentation Designer: Pranit Padwal

    First published: October 2021

    Production reference: 3110322

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80181-962-6

    www.packt.com

    Contributors

    About the author

    Ben Auffarth is the author of Artificial Intelligence with Python Cookbook, and he co-founded and is the former president of Data Science Speakers, London. With a Ph.D. in computer science, Ben Auffarth has analyzed experiments with terabytes of data, run brain models on up to 64k cores, built systems processing hundreds of thousands of transactions per day, and trained neural networks on millions of text documents. He often encounters time-series problems in his work.

    My partner was working hard over the weekends so I could concentrate, and my son of two and a half years would often tell me to get to work (work, papa). I'm reading lots of stories to him to make up for this time. I'd like to thank the technical reviewer for his fantastic suggestions and spotting many errors (any remaining ones are on me).

    About the reviewers

    Kevin Sheppard is an academic economist who specializes in the application of statistical methodology to measuring economic phenomena. His research focuses on developing statistical methodology for measuring, modeling, and forecasting measures of risk. Kevin's research is widely used in portfolio management and risk measurement. He is the maintainer of the arch and linearmodels Python packages. He is also a core contributor to statsmodels and a committer to pandas and PyData.

    In 2019, his contributions to NumPy were recognized by an award from NumFocus. He has worked at the University of Oxford for the past 15 years. During this period, he has also worked for the Office of Financial Research in the U.S. Department of Treasury and has worked as a consultant to other governments and in the finance industry. Prior to joining Oxford, Kevin completed his PhD at the University of California-San Diego.

    Dr Andrey Kostenko recently assumed the role of lead data scientist at the Hydroinformatics Institute (H2i.sg), a specialized consultancy and solution services provider for all aspects of water management. Prior to joining H2i, Andrey had worked as a senior data scientist at IAG InsurTech Innovation Hub for over 3 years. Before moving to Singapore in 2018, he worked as a data scientist at TrafficGuard.ai, an Australian AdTech start-up developing novel data-driven algorithms for mobile ad fraud detection. In 2013, Andrey received his doctorate degree in mathematics and statistics from Monash University, Australia, after earning an MBA degree from the UK and his first university degree from Russia.

    Andrey is an enthusiastic, self-motivated, and result-oriented data science and machine learning professional, with extensive experience across a variety of disciplines and industries, including hands-on coding in R and Python to build, train, and serve time-series models for forecasting and other applications. He believes that lifelong learning and open source software are both critical for innovation in advanced analytics and artificial intelligence. Andrey is very passionate about data science in general and sequential data in particular, so one of his current focuses is on applications of deep learning to spatiotemporal data in the context of weather-related decision making.

    In his spare time, Andrey is often found engaged in competitive data science projects, learning new tools across the R and Python ecosystems, exploring the latest trends in web development, solving chess puzzles, or reading about the history of science and mathematics.

    Contents

    Preface

    Who this book is for

    What this book covers

    To get the most out of this book

    Get in touch

    Introduction to Time-Series with Python

    What Is a Time-Series?

    Characteristics of Time-Series

    Time-Series and Forecasting – Past and Present

    Demography

    Genetics

    Astronomy

    Economics

    Meteorology

    Medicine

    Applied Statistics

    Python for Time-Series

    Installing libraries

    Jupyter Notebook and JupyterLab

    NumPy

    pandas

    Best practice in Python

    Summary

    Time-Series Analysis with Python

    What is time-series analysis?

    Working with time-series in Python

    Requirements

    Datetime

    pandas

    Understanding the variables

    Uncovering relationships between variables

    Identifying trend and seasonality

    Summary

    Preprocessing Time-Series

    What Is Preprocessing?

    Feature Transforms

    Scaling

    Log and Power Transformations

    Imputation

    Feature Engineering

    Date- and Time-Related Features

    ROCKET

    Shapelets

    Python Practice

    Log and Power Transformations in Practice

    Imputation

    Holiday Features

    Date Annotation

    Paydays

    Seasons

    The Sun and Moon

    Business Days

    Automated Feature Extraction

    ROCKET

    Shapelets in Practice

    Summary

    Introduction to Machine Learning for Time-Series

    Machine learning with time-series

    Supervised, unsupervised, and reinforcement learning

    History of machine learning

    Machine learning workflow

    Cross-validation

    Error metrics for time-series

    Regression

    Classification

    Comparing time-series

    Machine learning algorithms for time-series

    Distance-based approaches

    Shapelets

    ROCKET

    Time-Series Forest and Canonical Interval Forest

    Symbolic approaches

    HIVE-COTE

    Discussion

    Implementations

    Summary

    Forecasting with Moving Averages and Autoregressive Models

    What are classical models?

    Moving average and autoregression

    Model selection and order

    Exponential smoothing

    ARCH and GARCH

    Vector autoregression

    Python libraries

    Statsmodels

    Python practice

    Requirements

    Modeling in Python

    Summary

    Unsupervised Methods for Time-Series

    Unsupervised methods for time-series

    Anomaly detection

    Microsoft

    Google

    Amazon

    Facebook

    Twitter

    Implementations

    Change point detection

    Clustering

    Python practice

    Requirements

    Anomaly detection

    Change point detection

    Summary

    Machine Learning Models for Time-Series

    More machine learning methods for time-series

    Validation

    K-nearest neighbors with dynamic time warping

    Silverkite

    Gradient boosting

    Python exercise

    Virtual environments

    K-nearest neighbors with dynamic time warping in Python

    Silverkite

    Gradient boosting

    Ensembles with Kats

    Summary

    Online Learning for Time-Series

    Online learning for time-series

    Online algorithms

    Drift

    Drift detection methods

    Adaptive learning methods

    Python practice

    Drift detection

    Regression

    Model selection

    Summary

    Probabilistic Models for Time-Series

    Probabilistic Models for Time-Series

    Prophet

    Markov Models

    Fuzzy Modeling

    Bayesian Structural Time-Series Models

    Python Exercise

    Prophet

    Markov Switching Model

    Fuzzy Time-Series

    Bayesian Structural Time-Series Modeling

    Summary

    Deep Learning for Time-Series

    Introduction to deep learning

    Deep learning for time-series

    Autoencoders

    InceptionTime

    DeepAR

    N-BEATS

    Recurrent neural networks

    ConvNets

    Transformer architectures

    Informer

    Python practice

    Fully connected network

    Recurrent neural network

    Dilated causal convolutional neural network

    Summary

    Reinforcement Learning for Time-Series

    Introduction to reinforcement learning

    Reinforcement Learning for Time-Series

    Bandit algorithms

    Deep Q-Learning

    Python Practice

    Recommendations

    Trading with DQN

    Summary

    Multivariate Forecasting

    Forecasting a Multivariate Time-Series

    Python practice

    What's next for time-series?

    Other Books You May Enjoy

    Index


    Preface

    Time-series are ubiquitous in industry and in research. Examples of time-series can be found in healthcare, energy, finance, user behavior, and website metrics, to name just a few. Because they are so prevalent, being able to model and forecast them accurately is of great economic importance.

    While traditional and well-established approaches have been dominating econometrics research and – until recently – industry, machine learning for time-series is a relatively new research field that's only recently come out of its infancy.

    In the last few years, a lot of progress has been made in machine learning on time-series; however, little of this has been made available in book form for a technical audience. Many books focus on traditional techniques, but hardly deal with recent machine learning techniques. This book aims to fill this gap and covers a lot of the latest progress, as evident in results from competitions such as M4, or the current state of the art in time-series classification.

    If you read this book, you'll learn about established as well as cutting edge techniques and tools in Python for machine learning with time-series. Each chapter covers a different topic, such as anomaly detection, probabilistic models, drift detection and adaptive online learning, deep learning models, and reinforcement learning. Each of these topics comes with a review of the latest research and an introduction to popular libraries with examples.

    Who this book is for

    If you want to build models that are reactive to the latest trends, seasonality, and business cycles, this is the book for you. This book is for data scientists, analysts, or programmers who want to learn more about time-series, and want to catch up on different techniques in machine learning.

    What this book covers

    Chapter 1, Introduction to Time-Series with Python, is a general introduction to the topic. You'll learn about time-series, why they are important, and many of the conventions used to describe them, and you'll see an overview of applications and techniques that will be explained in more detail in dedicated chapters.

    Chapter 2, Time-Series Analysis with Python, breaks down the steps for analyzing time-series. It explains statistical tests and visualizations relevant for making sense of and drawing insights from time-series.

    Chapter 3, Preprocessing Time-Series, is about data treatment for time-series for traditional techniques and for machine learning. Methods such as naïve and Loess STL decomposition for seasonal and trend effects are covered, along with normalizations for values, as well as specific feature extraction techniques such as catch22 and ROCKET.

    Chapter 4, Introduction to Machine Learning for Time-Series, deals with an overview of the state of the art for univariate and multivariate time-series forecasts and predictions.

    Chapter 5, Forecasting with Moving Averages and Autoregressive Models, focuses on forecasting, mostly on univariate time-series (see Chapter 12, Multivariate Forecasting for multivariate time-series). Well-established traditional methods used in econometrics are introduced, explained, and applied on data sets.

    Chapter 6, Unsupervised Methods for Time-Series, introduces anomaly detection, change detection, and clustering. The chapter reviews industry practices at major technology companies such as Facebook, Amazon, Google, and others, and gives practical examples for both anomaly detection and change detection.

    Chapter 7, Machine Learning Models for Time-Series, reviews recent research on machine learning for time-series at institutes such as at the University of East Anglia and Monash University. Many techniques are summarized and compared throughout the chapter, and there's a practical section with many examples.

    Chapter 8, Online Learning for Time-Series, introduces online learning, a topic often neglected. Online models continuously update their parameters based on latest samples, and some of them have mechanisms to deal with different kinds of drift – a common problem with time-series.

    Chapter 9, Probabilistic Models for Time-Series, covers probabilistic models for time-series. This includes models with confidence intervals such as Facebook's Prophet, Markov Models, Fuzzy Models, and counter-factual causal models such as Bayesian Structural Time-Series Models as proposed by Google.

    Chapter 10, Deep Learning for Time-Series, reviews recent literature and benchmarks for different tasks. The chapter explains techniques such as autoencoders, InceptionTime, DeepAR, N-BEATS, Recurrent Neural Networks, ConvNets, and Informer. Deep learning still hasn't completely caught up with more traditional or other machine learning techniques; however, the progress has been promising, and for certain applications such as multivariate predictions, deep learning techniques are emerging as the state of the art, as can be seen in competitions such as M4.

    Chapter 11, Reinforcement Learning for Time-Series, gives an overview of basic concepts in reinforcement learning. It introduces techniques relevant for time-series such as bandit algorithms and Deep Q-Learning, and they are applied for a recommender system and for a trading algorithm.

    Chapter 12, Multivariate Forecasting, gives practical examples for multivariate multistep forecasts of energy demand with deep learning models.

    To get the most out of this book

    You should have a basic knowledge of Python to get started.

    All notebooks used in this book come with links to Google Colab, where you should be able to execute them.

    Download the example code files

    The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Time-Series-with-Python. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801819626_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "Let's use UCBRegressor to select the best learning rate for a linear regression model."

    A block of code is set as follows:

    import numpy as np
    import pandas as pd
    from keras.layers import Conv1D, Input, Add, Activation, Dropout
    from keras.models import Sequential, Model

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    owid_covid["date"] = pd.to_datetime(owid_covid["date"])

    Any command-line input or output is written as follows:

    pip install xgboost

    Bold: Indicates a new term, an important word, or words that you see on the screen, for example, in menus or dialog boxes, also appear in the text like this. For example: "The task of identifying, quantifying, and decomposing these and other characteristics is called time-series analysis."

    Warnings or important notes appear like this.

    Tips and tricks appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

    Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

    Share your thoughts

    Once you've read Machine Learning for Time-Series with Python, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

    1

    Introduction to Time-Series with Python

    This book is about machine learning for time-series with Python, and you can see this chapter as a 101 class for time-series. In this chapter, we'll introduce time-series, the history of research into time-series, and how to use Python for time-series.

    We'll start with what a time-series is and its main properties. We'll then look at the history of the study of time-series in different scientific disciplines foundational to the field, such as demography, astronomy, medicine, and economics.

    Then, we'll go over the capabilities of Python for time-series and why Python is the go-to language for doing machine learning with time-series. Finally, I will describe how to install the most prominent libraries in Python for time-series analysis and machine learning, and we'll cover the basics of Python as relevant to time-series and machine learning.

    We're going to cover the following topics:

    What Is a Time-Series?

    Characteristics of Time-Series

    Time-Series and Forecasting – Past and Present

    Demography

    Genetics

    Astronomy

    Economics

    Meteorology

    Medicine

    Applied Statistics

    Python for Time-Series

    But what is a time-series? Let's start with a definition!

    What Is a Time-Series?

    Since this is a book about time-series data, we should start with a clarification of what we are talking about. In this section, we'll introduce time-series and their characteristics, and we'll go through different kinds of problems and types of analyses relevant to machine learning and statistics.

    Many disciplines, such as finance, public administration, energy, retail, and healthcare, are dominated by time-series data. Large areas of micro- and macro-economics rely on applied statistics with an emphasis on time-series analyses and modeling. The following are examples of time-series data:

    Daily closing values of a stock index

    Number of weekly infections of a disease

    Weekly series of train accidents

    Rainfall per day

    Sensor data such as temperature measurements per hour

    Population growth per year

    Quarterly earnings of a company over a number of years

    These are just a few examples. Any data that records changes over time is a time-series.

    It might be worth defining briefly what is considered a time-series.

    Definition: Time-Series are datasets where observations are arranged in chronological order.

    This is a very broad definition. Alternatively, we could have said that a time-series is a sequence of data points taken sequentially over time, or that a time-series is the result of a stochastic process.

    Formally, we can define a time-series in two ways. The first one is as a mapping from the time domain to the domain of real numbers:

    $x: T \rightarrow \mathbb{R}^k$

    where $T \subseteq \mathbb{R}$ and $k \in \mathbb{N}$.

    Another way to define a time-series is as a stochastic process:

    $\{X_t, t \in T\}$

    Here, $X_t$ or $X(t)$ denotes the value of the random variable X at time point t.

    If T is a set of real numbers, it's a continuous-time stochastic process. If T is a set of integers, we call it a stochastic process in discrete time. The convention in the latter case is to write $\{X_n\}$.
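
    To make the two definitions concrete, here is a minimal sketch using only Python's standard library. The names time_index, series, and random_walk are illustrative assumptions, not code from this book, and the Gaussian random walk is just one convenient example of a discrete-time stochastic process.

```python
import random
from datetime import date, timedelta

# Discrete time domain T: ten consecutive days (an arbitrary, illustrative choice).
start = date(2021, 1, 1)
time_index = [start + timedelta(days=n) for n in range(10)]

# View 1: a time-series as a mapping from time points to real numbers.
values = [0.5 * n for n in range(10)]
series = dict(zip(time_index, values))

# View 2: a time-series as one realization of a stochastic process.
# Here X_n is a Gaussian random walk: X_n = X_{n-1} + noise.
def random_walk(n_steps, seed=None):
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path

realization_a = random_walk(10, seed=1)
realization_b = random_walk(10, seed=2)  # same process, different realization
```

    The two realizations come from the same process but differ in their values, which is what distinguishes the stochastic-process view from a single fixed mapping.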

    Since time is the primary index of the dataset, by implication, time-series datasets describe how the world changes over time. They often deal with the question of how the past influences the present or future.

    The increase of monitoring and data collection brings with it the need for both statistical and machine learning techniques applied to time-series to predict and characterize the behavior of complex systems or components within a system. An important part of working with time-series is the question of how the future can be predicted based on the past. This is called forecasting.
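
    As a rough illustration of what predicting the future from the past can mean in its simplest form, the sketch below implements two baseline forecasts in plain Python. The function names and the sample values are hypothetical and chosen only for this example; the methods covered later in the book are considerably more capable.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for each future step."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon

def moving_average_forecast(history, horizon, window=3):
    """Forecast each future step as the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon

observed = [10.0, 12.0, 11.0, 13.0, 15.0]
print(naive_forecast(observed, horizon=2))           # [15.0, 15.0]
print(moving_average_forecast(observed, horizon=2))  # [13.0, 13.0]
```

    Baselines like these are useful mainly as a reference point: a more elaborate model should at least beat them.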

    Some methods allow adding business cycles as additional features. These additional features are called exogenous features: they are time-dependent, explanatory variables. We'll go through examples of feature generation in Chapter 3, Preprocessing Time-Series.
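
    A small sketch of what time-dependent explanatory variables can look like in pandas follows. The sales figures and column names are made up for illustration; only the calendar attributes of the index are derived from real dates.

```python
import pandas as pd

# Hypothetical daily sales for one week, starting Friday, 1 January 2021.
dates = pd.date_range("2021-01-01", periods=7, freq="D")
df = pd.DataFrame({"sales": [20, 35, 40, 18, 17, 19, 21]}, index=dates)

# Calendar-based exogenous features derived from the time index.
df["day_of_week"] = df.index.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = df["day_of_week"] >= 5
df["month"] = df.index.month

print(df)
```

    A model can then use these extra columns alongside the past values of the target variable.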

    Characteristics of Time-Series

    Here's an extract of a time-series dataset as an example, exported from Google Trends, on searches for Python, R, and Julia:


    Figure 1.1: Extract of a time-series dataset

    This is a multivariate time-series, with columns for Python, R, and Julia. The first column is the index, a date column, and its period is the month. In cases where we have only a single variable, we speak of a univariate series. This dataset would be univariate if we had only one programming language instead of three.
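
    The structure described above can be sketched in pandas as follows. The numbers are invented placeholders rather than actual Google Trends values, and the variable names are assumptions made for this example.

```python
import pandas as pd

# Monthly index ("MS" = month start); the values below are placeholders.
index = pd.date_range("2021-01-01", periods=4, freq="MS")
trends = pd.DataFrame(
    {"Python": [80, 82, 85, 83], "R": [40, 39, 41, 40], "Julia": [5, 6, 5, 7]},
    index=index,
)

# Multivariate: three variables share one time index.
print(trends.shape)

# Univariate: selecting a single column yields a Series with the same index.
python_only = trends["Python"]
print(python_only)
```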

    Time-series mostly come in discrete time, with the same time difference between consecutive points. The most important characteristics of time-series are the following:

    Long-term movements of the values (trend)

    Seasonal variations (seasonality)

    Irregular or cyclic components

    A trend is the general direction in which something is developing or changing, such as a long-term increase or decrease in a sequence. An example of where a trend can be observed would be global warming, the process by which the temperatures on our planet have been rising over the last half-century.

    Here's a plot of global surface temperature changes from 1880 to 2019, taken from the GISS Surface Temperature Analysis dataset released by NASA:


    Figure 1.2: GISS surface temperature analysis from 1880 to 2019

    As you can see in Figure 1.2, temperature changes varied around 0 until the mid-20th century; however, since then, there's been a clearly visible trend of an overall rise in the yearly temperature.
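
    One simple way to quantify a trend is to fit a straight line to the series. The sketch below does this with NumPy on synthetic data: the series is generated with a known slope plus noise and is not the GISS dataset, and a linear fit is only one of several ways to estimate a trend.

```python
import numpy as np

# Synthetic yearly anomalies: a linear rise plus Gaussian noise (not real data).
rng = np.random.default_rng(0)
years = np.arange(1960, 2020)
true_slope = 0.02
elapsed = years - years[0]
anomaly = true_slope * elapsed + rng.normal(0.0, 0.1, size=years.size)

# Least-squares fit of a degree-1 polynomial; coefficients come highest degree first.
slope, intercept = np.polyfit(elapsed, anomaly, deg=1)
trend = intercept + slope * elapsed

# Subtracting the fitted line leaves the fluctuations around the trend.
detrended = anomaly - trend
print(f"estimated slope per year: {slope:.4f}")
```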

    Seasonality is variation that recurs at specific regular intervals of up to a year. It can occur on different time spans, such as daily, weekly, monthly, or yearly. An example of weekly seasonality would be sales of ice cream picking up each weekend. Also, depending on where you live, ice cream might only be sold in spring and summer. This is a yearly variation.
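
    Weekly seasonality of the kind described in the ice cream example can be made visible by averaging values per day of the week. The sketch below uses invented sales numbers with a fixed weekend boost; the names and figures are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Four weeks of hypothetical daily sales, starting on Monday, 4 January 2021.
dates = pd.date_range("2021-01-04", periods=28, freq="D")
weekend_boost = np.where(dates.dayofweek >= 5, 50, 0)  # Saturday=5, Sunday=6
sales = pd.Series(100 + weekend_boost, index=dates)

# Average per day of the week: a stable pattern here indicates weekly seasonality.
weekday_profile = sales.groupby(sales.index.dayofweek).mean()
print(weekday_profile)
```

    In real data, the profile would be noisier, but a consistent difference between weekdays and weekends would still stand out.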

    Other than seasonal changes and trends, there is variability that's not of a fixed frequency or that rises and falls in a way that's not based on seasonal frequency. Some of these we might be able to explain based on the knowledge we have.

    As an example of cyclic variability that's irregular, bank holidays can fall on different calendar days each year, and promotional campaigns could depend on business decisions, such as the introduction of a new product. As an example of cyclic changes that are not seasonal, changes at the scale of milliseconds or that take place over time periods longer than a year would not be called seasonal effects.

    Stationarity is the property of a time-series not to change its distribution over time as described by its summary statistics. If a time-series is stationary, it means that it has no trend and no deterministic seasonal variability, although other cyclical variability is permitted. This is an important feature for the algorithms that we'll discuss in Chapter 5, Forecasting with Moving Averages and Autoregressive Models. To apply them, we'll need to transform non-stationary data into stationary data by removing seasonality and trend.
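
    The following sketch shows one common transformation, first-order differencing, removing a linear trend from a synthetic series. Comparing the means of the two halves is only a crude, informal check chosen for this example; it is not a formal stationarity test, and the helper half_means is a hypothetical name.

```python
import numpy as np

# Synthetic non-stationary series: linear trend plus noise (not real data).
rng = np.random.default_rng(42)
t = np.arange(200)
series = 0.5 * t + rng.normal(0.0, 1.0, size=t.size)

# First-order differencing: x_t - x_{t-1} removes a linear trend.
diffed = np.diff(series)

def half_means(x):
    """Return the means of the first and second half of an array."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

raw_first, raw_second = half_means(series)
diff_first, diff_second = half_means(diffed)
print(f"raw:         {raw_first:.2f} vs {raw_second:.2f}")
print(f"differenced: {diff_first:.2f} vs {diff_second:.2f}")
```

    The mean of the raw series shifts strongly between the two halves, while the differenced series has roughly the same mean throughout.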

    We'll discuss these and other concepts in more detail in Chapter 2, Time-Series Analysis with Python, and Chapter 3, Preprocessing Time-Series.

    The task of identifying, quantifying, and decomposing these and other characteristics is called time-series analysis. Exploratory time-series analysis is often the first step before any feature transformation and machine learning.

    Time-Series and Forecasting – Past and Present

    Time-Series have been studied since antiquity, and since then, time-series analysis and forecasting have come a long way. A variety of disciplines contributed to the development of techniques applied to time-series, including mathematics, astronomy, demographics, and statistics. Many innovations came initially from mathematics, later statistics, and finally machine learning. Many innovations in applied statistics had their origins in demography (used in public administration), economics, or other fields.

    In this section, I'll sketch the development path from simpler methods leading up to the machine learning methods available today. I'll try to chart the development of concepts relevant to time-series from the time of the Industrial Revolution to modernity. We'll deal with the more technical and up-to-date side of things in Chapter 4, Introduction to Machine Learning for Time-Series.

    There's still much more to come for time-series. The development of wearable sensors and the Internet of Things means that big data is available to be analyzed and used for forecasting. The availability of large datasets for benchmarks and competitions has been helping create new methods in recent years as we'll discuss in later chapters.

    Demography

    Much of the early work that went into establishing the theory and practice of time-series analysis came from demography as used in public administration. Many of the people mentioned in this section either worked as public servants or contributed in a private capacity out of interest in abstract problems.

    John Graunt, originally a haberdasher by profession, became interested in death records as
