Machine Learning Engineering with Python: Manage the production life cycle of machine learning models using MLOps with practical examples
Ebook, 460 pages, 3 hours

About this ebook

Machine learning engineering is a thriving discipline at the interface of software development and machine learning. This book will help developers working with machine learning and Python to put their knowledge to work and create high-quality machine learning products and services.

Machine Learning Engineering with Python takes a hands-on approach to help you get to grips with essential technical concepts, implementation patterns, and development methodologies to have you up and running in no time. You'll begin by understanding key steps of the machine learning development life cycle before moving on to practical illustrations and getting to grips with building and deploying robust machine learning solutions. As you advance, you'll explore how to create your own toolsets for training and deployment across all your projects in a consistent way. The book will also help you get hands-on with deployment architectures and discover methods for scaling up your solutions while building a solid understanding of how to use cloud-based tools effectively. Finally, you'll work through examples to help you solve typical business problems.

By the end of this book, you'll be able to build end-to-end machine learning services using a variety of techniques and design your own processes for consistently performant machine learning engineering.

Language: English
Release date: Nov 5, 2021
ISBN: 9781801077101
    Book preview

    Machine Learning Engineering with Python - Andrew P. McMahon

    BIRMINGHAM—MUMBAI

    Machine Learning Engineering with Python

    Copyright © 2021 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Publishing Product Manager: Ali Abidi

    Senior Editor: David Sugarman

    Content Development Editor: Nathanya Dias

    Technical Editor: Sonam Pandey

    Copy Editor: Safis Editing

    Project Coordinator: Aparna Ravikumar Nair

    Proofreader: Safis Editing

    Indexer: Sejal Dsilva

    Production Designer: Jyoti Chauhan

    First published: November 2021

    Production reference: 1280921

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80107-925-9

    www.packt.com

    Contributors

    About the author

    Andrew Peter (Andy) McMahon is a machine learning engineer and data scientist with experience of working in, and leading, successful analytics and software teams. His expertise centers on building production-grade ML systems that can deliver value at scale. He is currently ML Engineering Lead at NatWest Group and was previously Analytics Team Lead at Aggreko.

    He has an undergraduate degree in theoretical physics from the University of Glasgow, as well as master's and Ph.D. degrees in condensed matter physics from Imperial College London. In 2019, Andy was named Data Scientist of the Year at the International Data Science Awards. He currently co-hosts the AI Right podcast, discussing hot topics in AI with other members of the Scottish tech scene.

    This book, and everything I've ever achieved, would not have been possible without a lot of people. I wish to thank my mum, for introducing me to science and science fiction, my dad, for teaching me not to have any regrets, and my wider family and friends for making my life full of laughter. Most of all, I want to thank my wife, Hayley, and my son, Teddy, for being my sunshine every single day and giving me a reason to keep pushing myself to be the best I can be.

    About the reviewers

    Daksh Trehan began his career as a data analyst, driven by a love of data and statistics. Working with various statistical techniques introduced him to the world of ML and data science. While his focus is on being a data analyst, he enjoys building forecasts from data using ML techniques. He understands the power of data in today's world and tries to put it to use through ML and his strong data visualization skills. He writes articles on ML and AI, which have earned more than 100,000 views to date. He has also contributed as an ML consultant to 365 Days as a TikTok creator, written by Dr. Markus Rach, which is publicly available on the Amazon e-book store.

    Ved Prakash Upadhyay is an experienced machine learning professional. He did his master's in information science at the University of Illinois Urbana-Champaign. Currently, he is working at IQVIA as a senior machine learning engineer. His work focuses on building recommendation systems for various pharma clients of IQVIA. He has strong experience with productionalizing machine learning pipelines and is skilled with the different tools that are used in the industry. Furthermore, he has acquired an in-depth conceptual knowledge of machine learning algorithms. IQVIA is a leading global provider of advanced analytics, technology solutions, and clinical research services to the life sciences industry.

    Michael Petrey is a data scientist with a background in education and consulting. He holds a master's in analytics from Georgia Tech and loves using data visualization and analysis to get people the best tools for their jobs. You might find Michael on a hike near Atlanta, eating ice cream in Boston, or at a café in Wellington.

    Table of Contents

    Preface

    Section 1: What Is ML Engineering?

    Chapter 1: Introduction to ML Engineering

    Technical requirements

    Defining a taxonomy of data disciplines

    Data scientist

    ML engineer

    Data engineer

    Assembling your team

    ML engineering in the real world

    What does an ML solution look like?

    Why Python?

    High-level ML system design

    Example 1: Batch anomaly detection service

    Example 2: Forecasting API

    Example 3: Streamed classification

    Summary

    Chapter 2: The Machine Learning Development Process

    Technical requirements

    Setting up our tools

    Setting up an AWS account

    Concept to solution in four steps

    Discover

    Play

    Develop

    Deploy

    Summary

    Section 2: ML Development and Deployment

    Chapter 3: From Model to Model Factory

    Technical requirements

    Defining the model factory

    Designing your training system

    Training system design options

    Train-run

    Train-persist

    Retraining required

    Detecting drift

    Engineering features for consumption

    Engineering categorical features

    Engineering numerical features

    Learning about learning

    Defining the target

    Cutting your losses

    Hierarchies of automation

    Optimizing hyperparameters

    AutoML

    Auto-sklearn

    Persisting your models

    Building the model factory with pipelines

    Scikit-learn pipelines

    Spark ML pipelines

    Summary

    Chapter 4: Packaging Up

    Technical requirements

    Writing good Python

    Recapping the basics

    Tips and tricks

    Adhering to standards

    Writing good PySpark

    Choosing a style

    Object-oriented programming

    Functional programming

    Packaging your code

    Why package?

    Selecting use cases for packaging

    Designing your package

    Building your package

    Testing, logging, and error handling

    Testing

    Logging

    Error handling

    Not reinventing the wheel

    Summary

    Chapter 5: Deployment Patterns and Tools

    Technical requirements

    Architecting systems

    Exploring the unreasonable effectiveness of patterns

    Swimming in data lakes

    Microservices

    Event-based designs

    Batching

    Containerizing

    Hosting your own microservice on AWS

    Pushing to ECR

    Hosting on ECS

    Creating a load balancer

    Pipelining 2.0

    Revisiting CI/CD

    Summary

    Chapter 6: Scaling Up

    Technical requirements

    Scaling with Spark

    Spark tips and tricks

    Spark on the cloud

    Spinning up serverless infrastructure

    Containerizing at scale with Kubernetes

    Summary

    Section 3: End-to-End Examples

    Chapter 7: Building an Example ML Microservice

    Technical requirements

    Understanding the forecasting problem

    Designing our forecasting service

    Selecting the tools

    Executing the build

    Training pipeline and forecaster

    Training and forecast handlers

    Summary

    Chapter 8: Building an Extract Transform Machine Learning Use Case

    Technical requirements

    Understanding the batch processing problem

    Designing an ETML solution

    Selecting the tools

    Interfaces

    Scaling of models

    Scheduling of ETML pipelines

    Executing the build

    Not reinventing the wheel in practice

    Using the Gitflow workflow

    Injecting some engineering practices

    Other Books You May Enjoy

    Preface

    Machine Learning (ML) is rightfully recognized as one of the most powerful tools available for organizations to extract value from their data. As the capabilities of ML algorithms have grown over the years, it has become increasingly obvious that implementing them in a scalable, fault-tolerant, and automated way is a discipline in its own right. This discipline, ML engineering, is the focus of this book.

    The book covers a wide variety of topics in order to help you understand the tools, techniques, and processes you can apply to engineer your ML solutions, with an emphasis on introducing the key concepts so that you can build on them in your own work. Much of what we will cover will also help you maintain and monitor your solutions, the purview of the closely related discipline of Machine Learning Operations (MLOps).

    All the code examples are given in Python, the most popular programming language for data applications. Python is a high-level and object-oriented language with a rich ecosystem of tools focused on data science and ML. Packages such as scikit-learn and pandas often form the backbone of ML modeling code in data science teams across the world. In this book, we will also use these tools but discuss how to wrap them up in production-grade pipelines and deploy them using appropriate cloud and open source tools. We will not spend a lot of time on how to build the best ML model, though some of the tools covered will certainly help with that. Instead, the aim is to understand what to do after you have an ML model.

    Many of the examples in the book will leverage services and solutions from Amazon Web Services (AWS). I believe that the accompanying explanations and discussions will, however, mean that you can still apply everything you learn here to any cloud provider or even in an on-premises setting.

    Machine Learning Engineering with Python will help you to navigate the challenges of taking ML to production and give you the confidence to start applying MLOps in your organizations.

    Who this book is for

    This book is for ML engineers, data scientists, and software developers who want to build robust software solutions with ML components. It is also relevant to anyone who manages or wants to understand the production life cycle of these systems. The book assumes intermediate-level knowledge of Python. Basic knowledge of AWS and Bash will also be beneficial.

    What this book covers

    Chapter 1, Introduction to ML Engineering, explains what we mean by ML engineering and how this relates to the disciplines of data science and data engineering. It covers what you need to do to build an effective ML engineering team, as well as what real software solutions containing ML can look like.

    Chapter 2, The Machine Learning Development Process, explores a development process that will be applicable to almost any ML engineering project. It discusses how you can set your development tooling up for success for later chapters as well.

    Chapter 3, From Model to Model Factory, teaches you how to build solutions that train multiple ML models during the product life cycle. It also covers drift detection and pipelining to help you start to build out your MLOps practices.

    Chapter 4, Packaging Up, discusses best practices for coding in Python and how this relates to building your own packages and libraries for reuse in multiple projects.

    Chapter 5, Deployment Patterns and Tools, teaches you some of the standard ways you can get your ML system into production. In particular, the chapter will focus on hosting solutions in the cloud.

    Chapter 6, Scaling Up, teaches you how to take your solutions and scale them to massive datasets or large numbers of prediction requests using Apache Spark and serverless infrastructure.

    Chapter 7, Building an Example ML Microservice, walks through how to use what you have learned elsewhere in the book to build a forecasting service that can be triggered via an API.

    Chapter 8, Building an Extract Transform Machine Learning Use Case, walks through how to use what you have learned to build a pipeline that performs batch processing. We do this by adding a lot of our newly acquired ML engineering best practices to the simple package created in Chapter 4, Packaging Up.

    To get the most out of this book

    To get the most out of the examples in the book, you will need access to a computer or server where you have privileges to install and run Python and Apache Spark applications. For many of the examples, you will also require access to a terminal, such as Bash. The examples in the book were built on a Linux machine running Bash so you may need to translate some pieces for your operating system and terminal. For some examples using AWS, you will require an account where you can enable billing. Examples in the book used Apache Spark v3.0.2.

    In Chapter 5, Deployment Patterns and Tools, we use the Managed Workflows for Apache Airflow (MWAA) service from AWS. There is no free tier option for MWAA, so as soon as you spin up the example, you will be charged for the environment and any instances. Make sure you are happy with this before proceeding; I recommend shutting down your MWAA environment and instances when you have finished.

    In Chapter 7, Building an Example ML Microservice, we build out a use case leveraging the AWS Forecast service, which is only available in a subset of AWS Regions. To check the availability in your Region, and what Regions you can switch to for that example, you can use https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/.

    Technical requirements are given in most of the chapters, but to support this, there are Conda environment .yml files provided in the book repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Python.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Python. If there's an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801079259_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.

    A block of code is set as follows:

    html, body, #map {
      height: 100%;
      margin: 0;
      padding: 0
    }

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    [default]
    exten => s,1,Dial(Zap/1|30)
    exten => s,2,Voicemail(u100)
    exten => s,102,Voicemail(b100)
    exten => i,1,Voicemail(s0)

    Any command-line input or output is written as follows:

    $ mkdir css
    $ cd css

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you've read Machine Learning Engineering with Python, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

    Section 1: What Is ML Engineering?

    The objective of this section is to provide a discussion of what activities could be classed as ML engineering and how this constitutes an important element of using data to generate value in organizations. You will also be introduced to an example software development process that captures the key aspects required in any successful ML engineering project.

    This section comprises the following chapters:

    Chapter 1, Introduction to ML Engineering

    Chapter 2, The Machine Learning Development Process

    Chapter 1: Introduction to ML Engineering

    Welcome to Machine Learning Engineering with Python, a book that aims to introduce you to the exciting world of making Machine Learning (ML) systems production-ready.

    This book will take you through a series of chapters covering training systems, scaling up solutions, system design, model tracking, and a host of other topics, to prepare you for your own work in ML engineering or to work with others in this space. No book can be exhaustive on this topic, so this one will focus on concepts and examples that I think cover the foundational principles of this increasingly important discipline.

    You will get a lot from this book even if you do not run the technical examples, or even if you try to apply the main points in other programming languages or with different tools. In covering the key principles, the aim is that you come away from this book feeling more confident in tackling your own ML engineering challenges, whatever your chosen toolset.

    In this first chapter, you will learn about the different types of data role relevant to ML engineering and how to distinguish them; how to use this knowledge to build and work within appropriate teams; some of the key points to remember when building working ML products in the real world; how to start to isolate appropriate problems for engineered ML solutions; and how to create your own high-level ML system designs for a variety of typical business problems.

    We will cover all of these aspects in the following sections:

    Defining a taxonomy of data disciplines

    Assembling your team

    ML engineering in the real world

    What does an ML solution look like?

    High-level ML system design

    Now that we have explained what we are going after in this first chapter, let's get started!

    Technical requirements

    Throughout the book, we will assume that Python 3 is installed and working. The following Python packages are used in this chapter:

    Scikit-learn 0.23.2

    NumPy

    pandas

    imblearn

    Prophet 0.7.1
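
    As a quick sanity check before running the examples, a minimal sketch like the following can compare an installed environment against the list above. It uses only the standard library's importlib.metadata (Python 3.8+). The REQUIREMENTS dictionary and the check_requirements helper are hypothetical names introduced here for illustration, and the distribution names (for instance, imbalanced-learn for imblearn) are assumptions about how these packages are published, so adjust them to match your environment.

```python
from importlib import metadata

# Pins taken from the list above; None means no version is stated there.
# The keys are ASSUMED distribution names, not something the book specifies.
REQUIREMENTS = {
    "scikit-learn": "0.23.2",
    "numpy": None,
    "pandas": None,
    "imbalanced-learn": None,  # assumed distribution name behind `import imblearn`
    "prophet": "0.7.1",  # assumption: older releases may be installed as `fbprophet`
}


def check_requirements(requirements):
    """Map each package name to a (status, installed_version) tuple."""
    report = {}
    for name, wanted in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            report[name] = ("missing", None)
            continue
        if wanted is None or installed == wanted:
            report[name] = ("ok", installed)
        else:
            report[name] = ("version differs", installed)
    return report


if __name__ == "__main__":
    for name, (status, installed) in check_requirements(REQUIREMENTS).items():
        print(f"{name}: {status} (installed: {installed})")
```

    Running the script prints one line per package. A "version differs" result is not necessarily a problem, but it is worth noting if an example behaves differently from the text.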

    Defining a taxonomy of data disciplines

    The explosion of data and the potential applications of that data over the past few years have led to a proliferation of job roles and responsibilities. The debate that once raged over how a data scientist was different from a statistician has now become extremely complex. I would argue, however, that it does not have to be so complicated. The activities that have to be undertaken to get value from data are pretty consistent, no matter what business vertical you are in, so it should be reasonable to expect that the skills and roles you need to perform these steps will also be relatively consistent. In this chapter, we will explore some of the main data disciplines that I think you will always need in any data project. As you can guess, given the name of this book, I will be particularly keen to explore the notion of ML engineering and how this
