
MLOps Engineering at Scale
Ebook · 697 pages · 4 hours


About this ebook

Dodge costly and time-consuming infrastructure tasks, and rapidly bring your machine learning models to production with MLOps and pre-built serverless tools!

In MLOps Engineering at Scale you will learn:

    Extracting, transforming, and loading datasets
    Querying datasets with SQL
    Understanding automatic differentiation in PyTorch
    Deploying model training pipelines as a service endpoint
    Monitoring and managing your pipeline’s life cycle
    Measuring performance improvements

MLOps Engineering at Scale shows you how to put machine learning into production efficiently by using pre-built services from AWS and other cloud vendors. You’ll learn how to rapidly create flexible and scalable machine learning systems without laboring over time-consuming operational tasks or taking on the costly overhead of physical hardware. Following a real-world use case for calculating taxi fares, you will engineer an MLOps pipeline for a PyTorch model using AWS serverless capabilities.

About the technology
A production-ready machine learning system includes efficient data pipelines, integrated monitoring, and means to scale up and down based on demand. Using cloud-based services to implement ML infrastructure reduces development time and lowers hosting costs. Serverless MLOps eliminates the need to build and maintain custom infrastructure, so you can concentrate on your data, models, and algorithms.

About the book
MLOps Engineering at Scale teaches you how to implement efficient machine learning systems using pre-built services from AWS and other cloud vendors. This easy-to-follow book guides you step-by-step as you set up your serverless ML infrastructure, even if you’ve never used a cloud platform before. You’ll also explore tools like PyTorch Lightning, Optuna, and MLFlow that make it easy to build pipelines and scale your deep learning models in production.

What's inside

    Reduce or eliminate ML infrastructure management
    Learn state-of-the-art MLOps tools like PyTorch Lightning and MLFlow
    Deploy training pipelines as a service endpoint
    Monitor and manage your pipeline’s life cycle
    Measure performance improvements

About the reader
Readers need to know Python, SQL, and the basics of machine learning. No cloud experience required.

About the author
Carl Osipov implemented his first neural net in 2000 and has worked on deep learning and machine learning at Google and IBM.

Table of Contents

PART 1 - MASTERING THE DATA SET
1 Introduction to serverless machine learning
2 Getting started with the data set
3 Exploring and preparing the data set
4 More exploratory data analysis and data preparation
PART 2 - PYTORCH FOR SERVERLESS MACHINE LEARNING
5 Introducing PyTorch: Tensor basics
6 Core PyTorch: Autograd, optimizers, and utilities
7 Serverless machine learning at scale
8 Scaling out with distributed training
PART 3 - SERVERLESS MACHINE LEARNING PIPELINE
9 Feature selection
10 Adopting PyTorch Lightning
11 Hyperparameter optimization
12 Machine learning pipeline
Language: English
Publisher: Manning
Release date: Mar 22, 2022
ISBN: 9781638356509

    Book preview

    MLOps Engineering at Scale - Carl Osipov

    MLOps Engineering at Scale

    CARL OSIPOV

    To comment go to liveBook


    Manning

    Shelter Island

    For more information on this and other Manning titles go to

    www.manning.com

    Copyright

    For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.

    For more information, please contact

    Special Sales Department

    Manning Publications Co.

    20 Baldwin Road

    PO Box 761

    Shelter Island, NY 11964

    Email: [email protected]

    ©2022 by Manning Publications Co. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

    ♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

    ISBN: 9781617297762

    contents

    preface

    acknowledgments

    about this book

    about the author

    about the cover illustration

    Part 1 Mastering the data set

    1 Introduction to serverless machine learning

    1.1 What is a machine learning platform?

    1.2 Challenges when designing a machine learning platform

    1.3 Public clouds for machine learning platforms

    1.4 What is serverless machine learning?

    1.5 Why serverless machine learning?

    Serverless vs. IaaS and PaaS

    Serverless machine learning life cycle

    1.6 Who is this book for?

    What you can get out of this book

    1.7 How does this book teach?

    1.8 When is this book not for you?

    1.9 Conclusions

    2 Getting started with the data set

    2.1 Introducing the Washington, DC taxi rides data set

    What is the business use case?

    What are the business rules?

    What is the schema for the business service?

    What are the options for implementing the business service?

    What data assets are available for the business service?

    Downloading and unzipping the data set

    2.2 Starting with object storage for the data set

    Understanding object storage vs. filesystems

    Authenticating with Amazon Web Services

    Creating a serverless object storage bucket

    2.3 Discovering the schema for the data set

    Introducing AWS Glue

    Authorizing the crawler to access your objects

    Using a crawler to discover the data schema

    2.4 Migrating to columnar storage for more efficient analytics

    Introducing column-oriented data formats for analytics

    Migrating to a column-oriented data format

    3 Exploring and preparing the data set

    3.1 Getting started with interactive querying

    Choosing the right use case for interactive querying

    Introducing AWS Athena

    Preparing a sample data set

    Interactive querying using Athena from a browser

    Interactive querying using a sample data set

    Querying the DC taxi data set

    3.2 Getting started with data quality

    From garbage in, garbage out to data quality

    Before starting with data quality

    Normative principles for data quality

    3.3 Applying VACUUM to the DC taxi data

    Enforcing the schema to ensure valid values

    Cleaning up invalid fare amounts

    Improving the accuracy

    3.4 Implementing VACUUM in a PySpark job

    4 More exploratory data analysis and data preparation

    4.1 Getting started with data sampling

    Exploring the summary statistics of the cleaned-up data set

    Choosing the right sample size for the test data set

    Exploring the statistics of alternative sample sizes

    Using a PySpark job to sample the test set

    Part 2 PyTorch for serverless machine learning

    5 Introducing PyTorch: Tensor basics

    5.1 Getting started with tensors

    5.2 Getting started with PyTorch tensor creation operations

    5.3 Creating PyTorch tensors of pseudorandom and interval values

    5.4 PyTorch tensor operations and broadcasting

    5.5 PyTorch tensors vs. native Python lists

    6 Core PyTorch: Autograd, optimizers, and utilities

    6.1 Understanding the basics of autodiff

    6.2 Linear regression using PyTorch automatic differentiation

    6.3 Transitioning to PyTorch optimizers for gradient descent

    6.4 Getting started with data set batches for gradient descent

    6.5 Data set batches with PyTorch Dataset and DataLoader

    6.6 Dataset and DataLoader classes for gradient descent with batches

    7 Serverless machine learning at scale

    7.1 What if a single node is enough for my machine learning model?

    7.2 Using IterableDataset and ObjectStorageDataset

    7.3 Gradient descent with out-of-memory data sets

    7.4 Faster PyTorch tensor operations with GPUs

    7.5 Scaling up to use GPU cores

    8 Scaling out with distributed training

    8.1 What if the training data set does not fit in memory?

    Illustrating gradient accumulation

    Preparing a sample model and data set

    Understanding gradient descent using out-of-memory data shards

    8.2 Parameter server approach to gradient accumulation

    8.3 Introducing logical ring-based gradient descent

    8.4 Understanding ring-based distributed gradient descent

    8.5 Phase 1: Reduce-scatter

    8.6 Phase 2: All-gather

    Part 3 Serverless machine learning pipeline

    9 Feature selection

    9.1 Guiding principles for feature selection

    Related to the label

    Recorded before inference time

    Supported by abundant examples

    Expressed as a number with a meaningful scale

    Based on expert insights about the project

    9.2 Feature selection case studies

    9.3 Feature selection using guiding principles

    Related to the label

    Recorded before inference time

    Supported by abundant examples

    Numeric with meaningful magnitude

    Bring expert insight to the problem

    9.4 Selecting features for the DC taxi data set

    10 Adopting PyTorch Lightning

    10.1 Understanding PyTorch Lightning

    Converting PyTorch model training to PyTorch Lightning

    Enabling test and reporting for a trained model

    Enabling validation during model training

    11 Hyperparameter optimization

    11.1 Hyperparameter optimization with Optuna

    Understanding loguniform hyperparameters

    Using categorical and log-uniform hyperparameters

    11.2 Neural network layers configuration as a hyperparameter

    11.3 Experimenting with the batch normalization hyperparameter

    Using Optuna study for hyperparameter optimization

    Visualizing an HPO study in Optuna

    12 Machine learning pipeline

    12.1 Describing the machine learning pipeline

    12.2 Enabling PyTorch-distributed training support with Kaen

    Understanding PyTorch-distributed training settings

    12.3 Unit testing model training in a local Kaen container

    12.4 Hyperparameter optimization with Optuna

    Enabling MLFlow support

    Using HPO for DcTaxiModel in a local Kaen provider

    Training with the Kaen AWS provider

    Appendix A Introduction to machine learning

    Appendix B Getting started with Docker

    index

    front matter

    preface

    A useful piece of feedback that I got from a reviewer of this book was that it became a cheat code for them to scale the steep MLOps learning curve. I hope that the content of this book will help you become a better informed practitioner of machine learning engineering and data science, as well as a more productive contributor to your projects, your team, and your organization.

    In 2021, major technology companies are vocal about their efforts to democratize artificial intelligence (AI) by making technologies like deep learning more accessible to a broader population of scientists and engineers. Regrettably, the democratization approach taken by the corporations focuses too much on core technologies and not enough on the practice of delivering AI systems to end users. As a result, machine learning (ML) engineers and data scientists are well prepared to create experimental, proof-of-concept AI prototypes but fall short in successfully delivering these prototypes to production. This is evident from a wide spectrum of issues: from unacceptably high failure rates of AI projects to ethical controversies about AI systems that make it to end users. I believe that, to become successful, the effort to democratize AI must progress beyond the myopic focus on core, enabling technologies like Keras, PyTorch, and TensorFlow. MLOps emerged as a unifying term for the practice of taking experimental ML code and running it effectively in production. Serverless ML is the leading cloud-native software development model for ML and MLOps, abstracting away infrastructure and improving productivity of the practitioners.

    I also encourage you to make use of the Jupyter notebooks that accompany this book. The DC taxi fare project used in the notebook code is designed to give you the practice you need to grow as a practitioner. Happy reading and happy coding!

    acknowledgments

    I am forever grateful to my daughter, Sophia. You are my eternal source of happiness and inspiration. My wife, Alla, was boundlessly patient with me while I wrote my first book. You were always there to support me and to cheer me along. To my father, Mikhael, I wouldn’t be who I am without you.

    I also want to thank the people at Manning who made this book possible: Marina Michaels, my development editor; Frances Buontempo, my technical development editor; Karsten Strøbaek, my technical proofreader; Deirdre Hiam, my project editor; Michele Mitchell, my copyeditor; and Keri Hales, my proofreader.

    Many thanks go to the technical peer reviewers: Conor Redmond, Daniela Zapata, Dianshuang Wu, Dimitris Papadopoulos, Dinesh Ghanta, Dr. Irfan Ullah, Girish Ahankari, Jeff Hajewski, Jesús A. Juárez-Guerrero, Trichy Venkataraman Krishnamurthy, Lucian-Paul Torje, Manish Jain, Mario Solomou, Mathijs Affourtit, Michael Jensen, Michael Wright, Pethuru Raj Chelliah, Philip Kirkbride, Rahul Jain, Richard Vaughan, Sayak Paul, Sergio Govoni, Srinivas Aluvala, Tiklu Ganguly, and Todd Cook. Your suggestions helped make this a better book.

    about this book

    Thank you for purchasing MLOps Engineering at Scale.

    Who should read this book

    To get the most value from this book, you’ll want to have existing skills in data analysis with Python and SQL, as well as have some experience with machine learning. I expect that if you are reading this book, you are interested in developing your expertise as a machine learning engineer, and you are planning to deploy your machine learning-based prototypes to production.

    This book is for information technology professionals or those in academia who have had some exposure to machine learning and are working on or are interested in launching a machine learning system in production. There is a refresher on machine learning prerequisites for this book in appendix A. Keep in mind that if you are brand new to machine learning you may find that studying both machine learning and cloud-based infrastructure for machine learning at the same time can be overwhelming.

    If you are a software or a data engineer, and you are planning on starting a machine learning project, this book can help you gain a deeper understanding of the machine learning project life cycle. You will see that although the practice of machine learning depends on traditional information technologies (i.e., computing, storage, and networking), it is different from traditional information technology in practice. The former is significantly more experimental and more iterative than you may have experienced as a software or a data professional, and you should be prepared for the outcomes to be less known in advance. When working with data, the machine learning practice is more like the scientific process, including forming hypotheses about data, testing alternative models to answer questions about the hypothesis, and ranking and choosing the best performing models to launch atop your machine learning platform.

    If you are a machine learning engineer or practitioner, or a data scientist, keep in mind that this book is not about making you a better researcher. The book is not written to educate you about the frontiers of science in machine learning. This book also will not attempt to reteach you the machine learning basics, although you may find the material in appendix A, targeted at information technology professionals, a useful reference. Instead, you should expect to use this book to become a more valuable collaborator on your machine learning team. The book will help you do more with what you already know about data science and machine learning so that you can deliver ready-to-use contributions to your project or your organization. For example, you will learn how to implement your insights about improving machine learning model accuracy and turn them into production-ready capabilities.

    How this book is organized: A road map

    This book is composed of three parts. In part 1, I chart out the landscape of what it takes to put a machine learning system in production, describe an engineering gap between experimental machine learning code and production machine learning systems, and explain how serverless machine learning can help bridge the gap. By the end of part 1, I’ll have taught you how to use serverless features of a public cloud (Amazon Web Services) to get started with a real-world machine learning use case, prepare a working machine learning data set for the use case, and ensure that you are prepared to apply machine learning to the use case.

    Chapter 1 presents a broad view of the field of machine learning systems engineering and what it takes to put these systems into production.

    Chapter 2 introduces you to the taxi trips data set for the Washington, DC, municipality and teaches you how to start using the data set for machine learning in the Amazon Web Services (AWS) public cloud.

    Chapter 3 applies the AWS Athena interactive query service to dig deeper into the data set, uncover data quality issues, and then address them through a rigorous and principled data quality assurance process.

    Chapter 4 demonstrates how to use statistical measures to summarize data set samples and to quantify their similarity to the entire data set. The chapter also covers how to pick the right size for your test, training, and validation data sets and use distributed processing in the cloud to prepare the data set samples for machine learning.

    In part 2, I teach you to use the PyTorch deep learning framework to develop models for a structured data set, explain how to distribute and scale up machine learning model training in the cloud, and show how to deploy trained machine learning models to scale with user demand. In the process, you’ll learn to evaluate and assess the performance of alternative machine learning model implementations and how to pick the right one for the use case.

    Chapter 5 covers the PyTorch fundamentals by introducing the core tensor application programming interface (API) and helping you gain a level of fluency with using the API.
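    To give a flavor of the tensor API that chapter 5 builds fluency with, here is a minimal sketch of tensor creation and broadcasting (illustrative only; the chapter's own listings differ):

```python
import torch

# Create tensors from Python data and from factory functions.
a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # 2x3 tensor from a nested list
b = torch.ones(3)                     # 1-D tensor of three 1.0 values

# Broadcasting: the 1-D tensor b is stretched across each row of a.
c = a + b

print(c.shape)   # torch.Size([2, 3])
print(c[0])      # tensor([2., 3., 4.])
```

    Broadcasting is what lets per-feature operations (like adding a bias vector) be written without explicit loops over rows.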

    Chapter 6 focuses on the deep learning aspects of PyTorch, including support for automatic differentiation, alternative gradient descent algorithms, and supporting utilities.
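    The interplay between automatic differentiation and an optimizer that chapter 6 covers can be sketched with a hypothetical one-parameter model (a toy example, not the book's own code):

```python
import torch

# Fit y = w * x to a single data point using autograd and SGD.
x = torch.tensor(2.0)
y = torch.tensor(6.0)
w = torch.tensor(1.0, requires_grad=True)  # trainable parameter

opt = torch.optim.SGD([w], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = (w * x - y) ** 2   # squared error
    loss.backward()           # autodiff populates w.grad
    opt.step()                # gradient descent update

print(round(w.item(), 2))     # prints 3.0
```

    The same zero_grad/backward/step loop scales unchanged from this toy to deep networks, which is why the chapter dwells on it.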

    Chapter 7 explains how to scale up your PyTorch programs by teaching about the graphical processing unit (GPU) features and how to take advantage of them to accelerate your deep learning code.
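    The device-agnostic pattern behind GPU acceleration can be sketched as follows (a minimal example; chapter 7 goes much further):

```python
import torch

# Use a GPU when one is present; otherwise the identical code runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(256, 256, device=device)  # tensor allocated on the device
y = x @ x                                 # matrix multiply runs there too

print(tuple(y.shape))  # (256, 256)
```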

    Chapter 8 teaches data-parallel approaches for distributed PyTorch training and covers, in depth, the distinction between traditional parameter server-based approaches and ring-based distributed training (e.g., Horovod).
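    The two phases of ring-based gradient descent that chapter 8 walks through (reduce-scatter, then all-gather) can be simulated without any framework. This is a toy, single-process sketch; real implementations such as Horovod run the sends in parallel and overlap them with computation:

```python
def ring_allreduce(grads):
    """Toy simulation of ring all-reduce: each of n workers starts with
    its own gradient vector; after a reduce-scatter phase and an
    all-gather phase, every worker holds the elementwise sum."""
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "toy sketch: length must be divisible by n"
    step = size // n
    chunks = [list(g) for g in grads]  # each worker's local buffer

    # Phase 1: reduce-scatter. In round r, worker i passes chunk
    # (i - r) % n to its ring neighbor, which accumulates it.
    for r in range(n - 1):
        outgoing = []  # snapshot: all sends in a round are simultaneous
        for i in range(n):
            c = (i - r) % n
            outgoing.append((c, chunks[i][c * step:(c + 1) * step]))
        for i, (c, data) in enumerate(outgoing):
            dst = (i + 1) % n
            for k, v in enumerate(data):
                chunks[dst][c * step + k] += v

    # Phase 2: all-gather. Each worker now owns one fully reduced
    # chunk; n - 1 more rounds circulate the reduced chunks so that
    # every worker ends up with the complete summed gradient.
    for r in range(n - 1):
        outgoing = []
        for i in range(n):
            c = (i + 1 - r) % n
            outgoing.append((c, chunks[i][c * step:(c + 1) * step]))
        for i, (c, data) in enumerate(outgoing):
            dst = (i + 1) % n
            chunks[dst][c * step:(c + 1) * step] = data

    return chunks

# Two workers, each holding a 4-element gradient.
print(ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]]))
# [[11, 22, 33, 44], [11, 22, 33, 44]]
```

    The appeal of the ring topology is that each worker sends and receives only one chunk per round, so bandwidth per worker stays constant as the number of workers grows, unlike a parameter server that becomes a hotspot.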

    In part 3, I introduce you to the battle-tested techniques of machine learning practitioners and cover feature engineering, hyperparameter tuning, and machine learning pipeline assembly. By the conclusion of this book, you will have set up a machine learning platform that ingests raw data, prepares it for machine learning, applies feature engineering, and trains high-performance, hyperparameter-tuned machine learning models.

    Chapter 9 explores the use cases around feature selection and feature engineering, using case studies to build intuition about the features that can be selected or engineered for the DC taxi data set.

    Chapter 10 teaches how to eliminate boilerplate engineering code in your DC taxi PyTorch model implementation by adopting a framework called PyTorch Lightning. Also, the chapter navigates through the steps required to train, validate, and test your enhanced deep learning model.

    Chapter 11 integrates your deep learning model with an open-source hyperparameter optimization framework called Optuna, helping you train multiple models based on alternative hyperparameter values, and then ranking the trained models according to their loss and metric performance.
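    The log-uniform hyperparameter ranges chapter 11 relies on can be illustrated without Optuna. The random-search loop below is a hypothetical stand-in for an Optuna study, and the objective is a toy function whose best learning rate sits near 1e-2:

```python
import math
import random

def suggest_loguniform(low, high, rng):
    """Sample so that log(value) is uniform on [log(low), log(high)]:
    every decade of the range is equally likely to be explored."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)

def objective(lr):
    # Toy validation loss, minimized at lr = 1e-2.
    return (math.log10(lr) + 2.0) ** 2

trials = [suggest_loguniform(1e-4, 1e-1, rng) for _ in range(100)]
best_lr = min(trials, key=objective)
print(f"best lr across 100 trials: {best_lr:.4g}")
```

    A plain uniform sample over [1e-4, 1e-1] would spend roughly 90% of its trials above 1e-2; log-uniform sampling spreads trials evenly across the decades, which is why it suits scale-sensitive hyperparameters like learning rates.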

    Chapter 12 packages your deep learning model implementation into a Docker container in order to run it through the various stages of the entire machine learning pipeline, starting from the development data set all the way to a trained model ready for production deployment.

    About the code

    You can access the code for this book from my GitHub repository: github.com/osipov/smlbook. The code in this repository is packaged as Jupyter notebooks and is designed to be used in a Linux-based Jupyter notebook environment. This means that you have options when it comes to how you can execute the code. If you have your own local Jupyter environment, for example, with the Jupyter native client (JupyterApp: https://github.com/jupyterlab/jupyterlab_app) or a Conda distribution (https://jupyter.org/install), that’s great! If you do not use a local Jupyter distribution, you can run the code from the notebooks using a cloud-based service such as Google Colab or Binder. My GitHub repository README.md file includes badges and hyperlinks to help you launch chapter-specific notebooks in Google Colab.

    I strongly urge you to use a local Jupyter installation as opposed to a cloud service, especially if you are worried about the security of your AWS account credentials. Some steps of the code will require you to use your AWS credentials for tasks like creating storage buckets, launching AWS Glue extract-transform-load (ETL) jobs, and more. The code for chapter 12 must be executed on a node with Docker installed, so I recommend planning to use a local Jupyter installation on a laptop or a desktop where you have sufficient capacity to install Docker. You can find out more about Docker installation requirements in appendix B.

    liveBook discussion forum

    Purchase of MLOps Engineering at Scale includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users.

    To access the forum, go to https://livebook.manning.com/#!/book/mlops-engineering-at-scale/discussion. Be sure to join the forum and say hi! You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

    Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

    about the author

    about the cover illustration

    The figure on the cover of MLOps Engineering at Scale is captioned Femme du Thibet, or a woman of Tibet. The illustration is taken from a collection of dress costumes from various countries by Jacques Grasset de Saint-Sauveur (1757–1810), titled Costumes de Différents Pays, published in France in 1797. Each illustration is finely drawn and colored by hand. The rich variety of Grasset de Saint-Sauveur’s collection reminds us vividly of how culturally apart the world’s towns and regions were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or in the countryside, it was easy to identify where they lived and what their trade or station in life was just by their dress.

    The way we dress has changed since then and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns, regions, or countries. Perhaps we have traded cultural diversity for a more varied personal life—certainly for a more varied and fast-paced technological life.

    At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Grasset de Saint-Sauveur’s pictures.

    Part 1 Mastering the data set

    Engineering an effective machine learning system depends on a thorough understanding of the project data set. If you have prior experience building machine learning models, you might be tempted to skip this step. After all, shouldn’t the machine learning algorithms automate the learning of the patterns from the data? However, as you are going to observe throughout this book, machine learning systems that succeed in production depend on a practitioner who understands the project data set and then applies human insights about the data in ways that modern algorithms can’t.

    1 Introduction to serverless machine learning

    This chapter covers

    What serverless machine learning is and why you should care

    The difference between machine learning code and a machine learning platform

    How this book teaches about serverless machine learning

    The target audience for this book

    What you can learn from this book

    A Grand Canyon-like gulf separates experimental machine learning code and production machine learning systems. The scenic view across the canyon is magical: when a machine learning system is running successfully in production it can seem prescient. The first time I started typing a query into a machine learning-powered autocomplete search bar and saw the system anticipate my words, I was hooked. I must have tried dozens of different queries to see how well the system worked. So, what does it take to trek across the canyon?

    It is surprisingly easy to get started. Given the right data and less than an hour of coding time, it is possible to write the experimental machine learning code and re-create the remarkable experience I had using the search bar that predicted my words. In my conversations with information technology professionals, I find that many have started to experiment with machine learning. Online classes in machine learning, such as Andrew Ng’s on Coursera, have a wealth of information about how to get started with machine learning basics. Increasingly, companies that hire for information technology jobs expect entry-level experience with machine learning.¹

    While it is relatively easy to experiment with machine learning, building on the results of the experiments to deliver products, services, or features has proven to be difficult. Some companies have even started to use the word unicorn to describe the unreasonably hard-to-find machine learning practitioners with the skills needed to launch production machine learning systems. Practitioners with successful launch experience often have skills that span machine learning, software engineering, and many information technology specialties.

    This book is for those who are interested in trekking the journey from experimental machine learning code to a production machine learning system. In this book, I will teach you how to assemble the components for a machine learning platform and use them as a foundation for your production machine learning system. In the process, you will learn:

    How to use and integrate public cloud services, including the ones from Amazon Web Services (AWS), for machine learning, including data ingest, storage, and processing

    How to assess and achieve data quality standards for machine learning from structured data

    How to engineer synthetic features to improve machine learning effectiveness

    How to reproducibly sample structured data into experimental subsets for exploration and analysis

    How to implement machine learning models using PyTorch and Python in a Jupyter notebook environment

    How to implement data processing and machine learning pipelines to achieve both high throughput and low latency

    How to train and deploy machine learning models that depend on data processing pipelines

    How to monitor and manage the life cycle of your machine learning system once it is put in production

    Why should you invest the time to learn these skills? They will not make you a renowned machine learning researcher or help you discover the next ground-breaking machine learning algorithm. However, if you learn from this book, you can prepare yourself to deliver the results of your machine learning efforts sooner and more productively, and grow to be a more valuable contributor to your machine learning project, team, or organization.

    1.1 What is a machine learning platform?

    If you have never heard of the phrase yak shaving as it is used in the information technology industry,² here’s a hypothetical example of how it may show up during a day in the life of a machine learning practitioner:

    My company wants our machine learning system to launch in a month . . . but it is taking us too long to train our machine learning models . . . so I should speed things up by enabling graphical processing units (GPUs) for training . . . but our GPU device drivers are incompatible with our machine learning framework . . . so I need to upgrade to the latest Linux device drivers for compatibility . . . which means that I need to be on the new version of the Linux distribution.

    There are many more similar possibilities in which you need to shave a yak to speed up machine learning. The contemporary practice of launching machine learning-based systems in production and keeping them running has too much in common with the yak-shaving story. Instead of focusing on the features needed to make the product a resounding success, too much engineering time is spent on apparently unrelated activities like re-installing Linux device drivers or searching the web for the right cluster settings to configure the data processing middleware.

    Why is that? Even if you have the expertise of machine learning PhDs on your project, you still need the support of many information technology services and resources to launch the system. Hidden Technical Debt in Machine Learning Systems, a peer-reviewed article published in 2015 and based on insights from dozens of machine learning practitioners at Google, advises that mature machine learning systems end up being (at most) 5% machine learning code (http://mng.bz/01jl).

    This book uses the phrase machine learning platform to describe the 95% that plays a supporting yet critical role in the entire system. Having the right machine learning platform can make or break your product.

    If you take a closer look at figure 1.1, you should be able to describe some of the capabilities you need from a machine learning platform. Obviously, the platform needs to ingest and store data, process data (which includes applying machine learning and other computations to data), and serve the insights discovered by machine learning to the users of the platform. The less obvious observation is that the platform should be able to handle multiple, concurrent machine learning projects and enable multiple users to run the projects in isolation from each other. Otherwise, replacing only the machine learning code translates to reworking 95% of the system.


    Figure 1.1 Although machine learning code is what makes your machine learning system stand out, it amounts to only about 5% of the system code according to the experiences described in Hidden Technical Debt in Machine Learning Systems by Google’s Sculley et al. Serverless machine learning helps you assemble the other 95% using cloud-based infrastructure.

    1.2 Challenges when designing a machine learning platform

    How much data should the platform be able to store and process? AcademicTorrents.com is a website dedicated to helping machine learning practitioners get access to public data sets suitable for machine learning. The website lists over 50 TB of data sets, of which the largest are 1-5 TB in size. Kaggle, a website popular for hosting data science competitions, includes data sets as large as 3 TB. You might be tempted to ignore the largest data sets as outliers and focus on more common data sets that are at the scale of gigabytes. However, you should keep in mind that successes in machine learning are often due to reliance on larger data sets. The Unreasonable Effectiveness of Data, by Alon Halevy, Peter Norvig, and Fernando Pereira (http://mng.bz/5Zz4), argues in favor of machine learning systems that can take advantage of larger data sets: simple models and a lot of data trump more elaborate models based on less data.

    A machine learning platform that is expected to operate on a scale of terabytes to petabytes of data for storage and processing must be built as a distributed computing system, using multiple inter-networked servers in a cluster, each processing a part of the data set. Otherwise, a data set of hundreds of gigabytes to terabytes will cause out-of-memory problems when processed by a single server with a typical hardware configuration. Having a cluster of servers as part of a machine learning platform also addresses the input/output bandwidth limitations of individual servers: most servers can supply a CPU with just a few gigabytes of data per second. This means that most types of data processing performed by a machine learning platform can be sped up by splitting the data sets into chunks (sometimes called shards) that are processed in parallel by the servers in the cluster. This distributed systems design for a machine learning platform is commonly known as scaling out.
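    The scale-out idea can be sketched in a few lines of Python. This is an illustrative toy, not the platform's actual code: it splits a data set into shards and processes them in parallel with a process pool, where each worker process stands in for a server in a cluster.

```python
from concurrent.futures import ProcessPoolExecutor

def shard(records, num_shards):
    """Split a data set into roughly equal chunks (shards)."""
    size = (len(records) + num_shards - 1) // num_shards
    return [records[i:i + size] for i in range(0, len(records), size)]

def process_shard(records):
    """Stand-in for per-server work, such as feature extraction."""
    return sum(records)

if __name__ == "__main__":
    data = list(range(1_000_000))
    shards = shard(data, num_shards=4)
    # Each shard is handled by a separate worker process, mimicking a
    # cluster of servers that each hold only a part of the data set.
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(process_shard, shards))
    print(sum(partials))  # same result as processing on a single server
```

    Because the shards are independent, adding workers (or servers) shortens the wall-clock time without changing the result, which is exactly what makes scaling out attractive.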

    A significant portion of figure 1.1 is the serving part of the infrastructure used in the platform. This is the part that exposes the data insights produced by the machine learning code to the users of the platform. If you have ever had your email provider classify your emails as spam or not spam, or used a product recommendation feature of your favorite e-commerce website, you have interacted as a user with the serving infrastructure of a machine learning platform. The serving infrastructure for a major email or e-commerce provider needs to be capable of making decisions for millions of users around the globe, millions of times a second. Of course, not every machine learning platform needs to operate at this scale. However, if you are planning to deliver a product based on machine learning, keep in mind that it is within the realm of possibility for digital products and services to reach hundreds of millions of users in months. For example, Pokémon Go, a machine learning-powered video game from Niantic, reached half a billion users in less than two months.

    Is it prohibitively expensive to launch and operate a machine learning platform at scale? As recently as the 2000s, running a scalable machine learning platform would have required a significant upfront investment in servers, storage, and networking, as well as the software and expertise needed to build one. The first machine learning platform I worked on for a customer, back in 2009, cost over $100,000 USD and was built using on-premises hardware and open source Apache Hadoop (and Mahout) middleware. In addition to upfront costs, machine learning platforms can be expensive to operate because of wasted resources: most machine learning code underutilizes the capacity of the platform. The training phase of machine learning is resource intensive, leading to high utilization of computing, storage, and networking. However, training runs are intermittent and relatively rare for a machine learning system in production, translating to low average utilization. Serving infrastructure utilization varies based on the specific use case for a machine learning system and fluctuates with factors like time of day, seasonality, marketing events, and more.
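    A back-of-the-envelope calculation makes the underutilization point concrete. All of the figures below are hypothetical, chosen for illustration rather than taken from any vendor's price list: a GPU server that is busy only during intermittent training runs sits idle most of the month, so paying just for the hours used can cut the bill dramatically even at a higher hourly rate.

```python
# Hypothetical figures, for illustration only.
HOURS_PER_MONTH = 730
training_runs_per_month = 4
hours_per_training_run = 6

dedicated_cost_per_hour = 3.00  # always-on GPU server
on_demand_cost_per_hour = 4.00  # metered per hour used (often pricier per hour)

used_hours = training_runs_per_month * hours_per_training_run
utilization = used_hours / HOURS_PER_MONTH

dedicated_monthly = HOURS_PER_MONTH * dedicated_cost_per_hour
on_demand_monthly = used_hours * on_demand_cost_per_hour

print(f"utilization: {utilization:.1%}")         # ~3.3%
print(f"dedicated:   ${dedicated_monthly:,.2f}") # $2,190.00
print(f"on demand:   ${on_demand_monthly:,.2f}") # $96.00
```

    With these illustrative numbers the always-on server is more than 20 times as expensive per month, despite its lower hourly rate, because it is idle about 97% of the time.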

    1.3 Public clouds for machine learning platforms

    The good news is that public cloud-computing infrastructure can help you create a machine learning platform and address the challenges described in the previous section. In particular, the approach described in this book will take advantage of public clouds from vendors like Amazon Web Services, Microsoft Azure, or Google Cloud to provide your machine learning platform with:

    Secure isolation so that multiple users of your platform can work in parallel with different machine learning projects and code

    Access to information technologies like data storage, computing, and networking when your projects need them and for as long as they are needed

    Metering based on consumption so that your machine learning projects are billed just for the resources you used

    This book will teach you how to create a machine learning platform from public cloud infrastructure using Amazon Web Services as the primary example. In particular, I will teach you:

    How to use public cloud services to cost-effectively store data sets, regardless of whether they are made of kilobytes or terabytes of data

    How to optimize the utilization and cost of your machine learning platform computing infrastructure so that you are using just the servers you need

    How to elastically scale your serving infrastructure to reduce the operational costs of your machine learning platform
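    The elasticity behind the last point can be illustrated with a toy autoscaling policy. The capacity figures, parameter names, and the policy itself are hypothetical (real cloud autoscalers are configured rather than hand-coded like this): given the current request rate, compute how many serving instances are needed and let the fleet grow and shrink with the load.

```python
import math

def desired_instances(requests_per_second: float,
                      capacity_per_instance: float = 500.0,
                      min_instances: int = 1,
                      max_instances: int = 100) -> int:
    """Toy autoscaling policy: run just enough instances to absorb the
    current load, clamped to a configured range."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

# As load fluctuates with time of day, the fleet is resized,
# so you pay only for the instances you actually run.
for rps in (50, 2_000, 75_000):
    print(rps, "req/s ->", desired_instances(rps), "instances")
```

    The `max_instances` clamp in this sketch is a cost guardrail: it keeps a traffic spike, or a bug, from scaling the bill without bound.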

    1.4 What is serverless machine learning?

    Serverless machine learning is a software development model for machine learning code written to run on a machine learning platform hosted in cloud-computing infrastructure with consumption-based metering and billing.
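    Consumption-based metering can be made concrete with a small calculation in the style of function-as-a-service billing. The rates below are hypothetical, not any vendor's actual prices: charges accrue per invocation and per unit of memory-time consumed, so an idle system costs nothing.

```python
# Hypothetical serverless pricing, for illustration only.
PRICE_PER_INVOCATION = 0.0000002    # dollars per request
PRICE_PER_GB_SECOND = 0.0000166667  # dollars per GB-second of compute

def monthly_bill(invocations: int, memory_gb: float, seconds_per_call: float) -> float:
    """The bill is proportional to actual consumption: zero traffic, zero cost."""
    gb_seconds = invocations * memory_gb * seconds_per_call
    return invocations * PRICE_PER_INVOCATION + gb_seconds * PRICE_PER_GB_SECOND

print(f"${monthly_bill(1_000_000, 0.5, 0.2):.2f}")  # a million light requests
print(f"${monthly_bill(0, 0.5, 0.2):.2f}")          # $0.00 with no traffic
```

    Contrast this with the always-on server model, where the meter runs around the clock whether or not the machine learning system is doing any work.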

    If a machine learning system
