Automated Machine Learning with Microsoft Azure: Build highly accurate and scalable end-to-end AI solutions with Azure AutoML
Ebook, 597 pages, 4 hours

About this ebook

Automated Machine Learning with Microsoft Azure will teach you how to build high-performing, accurate machine learning models in record time. It will equip you with the knowledge and skills to easily harness the power of artificial intelligence and increase the productivity and profitability of your business.

Graphical user interfaces (GUIs) enable both novices and seasoned data scientists to easily train and deploy machine learning solutions to production. Using a careful, step-by-step approach, this book will teach you how to use Azure AutoML with a GUI as well as the AzureML Python software development kit (SDK).

First, you'll learn how to prepare data, train models, and register them to your Azure Machine Learning workspace. You'll then discover how to take those models and use them to create both automated batch solutions using machine learning pipelines and real-time scoring solutions using Azure Kubernetes Service (AKS).

Finally, you will be able to use AutoML on your own data to not only train regression, classification, and forecasting models but also use them to solve a wide variety of business problems.
By the end of this Azure book, you'll be able to show your business partners exactly how your ML models are making predictions through automatically generated charts and graphs, earning their trust and respect.

Language: English
Release date: Apr 23, 2021
ISBN: 9781800561977

    Book preview

    Automated Machine Learning with Microsoft Azure - Dennis Michael Sawyers

    BIRMINGHAM—MUMBAI

    Automated Machine Learning with Microsoft Azure

    Copyright © 2021 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Kunal Parikh

    Publishing Product Manager: Ali Abidi

    Senior Editor: David Sugarman

    Content Development Editor: Tazeen Shaikh

    Technical Editor: Sonam Pandey

    Copy Editor: Safis Editing

    Project Coordinator: Aparna Ravikumar Nair

    Proofreader: Safis Editing

    Indexer: Manju Arasan

    Production Designer: Vijay Kamble

    First published: April 2021

    Production reference: 1260321

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80056-531-9

    www.packt.com

    To my wife, Kyoko Sawyers, who has always been by my side and supported me through many long evenings, and to my daughter, Sophia Rose, who was born halfway through the writing of this book.

    – Dennis Sawyers

    Contributors

    About the author

    Dennis Michael Sawyers is a senior cloud solutions architect (CSA) at Microsoft, specializing in data and AI. In his role as a CSA, he helps Fortune 500 companies leverage Microsoft Azure cloud technology to build top-class machine learning and AI solutions. Prior to his role at Microsoft, he was a data scientist at Ford Motor Company in Global Data Insight and Analytics (GDIA) and a researcher in anomaly detection at the highly regarded Carnegie Mellon Auton Lab. He received a master's degree in data analytics from Carnegie Mellon's Heinz College and a bachelor's degree from the University of Michigan. More than anything, Dennis is passionate about democratizing AI solutions through automated machine learning technology.

    I want to thank the people who have been close to me and supported me, especially my wife, Kyoko, for encouraging me to finish this book, Rick Durham and Sam Istephan, for teaching me Azure Machine Learning, and Sabina Cartacio, Aniththa Umamahesan, and Deepti Mokkapati from the Microsoft Azure product team for helping me learn the ins and outs of AutoML.

    About the reviewer

    Marek Chmel is a senior CSA at Microsoft, specializing in data and AI. He is a speaker and trainer with more than 15 years' experience. He has been a Data Platform MVP since 2012. He has earned numerous certifications, including Azure Architect, Data Engineer and Scientist Associate, Certified Ethical Hacker, and several eLearnSecurity certifications. Marek earned his master's degree in business and informatics from Nottingham Trent University. He started his career as a trainer for Microsoft Server courses and later worked as SharePoint team lead and principal database administrator. He has authored two books, Hands-On Data Science with SQL Server 2017 and SQL Server 2017 Administrator's Guide.

    Table of Contents

    Preface

    Section 1: AutoML Explained – Why, What, and How

    Chapter 1: Introducing AutoML

    Explaining data science's ROI problem

    Defining machine learning, data science, and AI

    Machine learning versus traditional software

    The five steps to machine learning success

    Putting it all together

    Analyzing why AI projects fail slowly

    Solving the ROI problem with AutoML

    Summary

    Chapter 2: Getting Started with Azure Machine Learning Service

    Technical requirements

    Creating your first AMLS workspace

    Creating an Azure account

    Creating an AMLS workspace

    Creating an AMLS workspace with code

    Navigating AML studio

    Building compute to run your AutoML jobs

    Creating a compute instance

    Creating a compute cluster

    Creating a compute cluster and compute instance with the Azure CLI

    Working with data in AMLS

    Creating a dataset using the GUI

    Creating a dataset using code

    Understanding how AutoML works on Azure

    Ensuring data quality with data guardrails

    Improving data with intelligent feature engineering

    Normalizing data for ML with iterative data transformation

    Training models quickly with iterative ML model building

    Getting the best results with ML model ensembling

    Summary

    Chapter 3: Training Your First AutoML Model

    Technical requirements

    Loading data into AMLS for AutoML

    Creating an AutoML solution

    Interpreting your AutoML results

    Understanding data guardrails

    Understanding model metrics

    Explaining your AutoML model

    Obtaining better AutoML performance

    Summary

    Section 2: AutoML for Regression, Classification, and Forecasting – A Step-by-Step Guide

    Chapter 4: Building an AutoML Regression Solution

    Technical requirements

    Preparing data for AutoML regression

    Setting up your Jupyter environment

    Preparing your data for AutoML

    Training an AutoML regression model

    Registering your trained regression model

    Fine-tuning your AutoML regression model

    Improving AutoML regression models

    Understanding AutoML regression algorithms

    Summary

    Chapter 5: Building an AutoML Classification Solution

    Technical requirements

    Prepping data for AutoML classification

    Navigating to your Jupyter environment

    Loading and transforming your data

    Training an AutoML classification model

    Registering your trained classification model

    Training an AutoML multiclass model

    Fine-tuning your AutoML classification model

    Improving AutoML classification models

    Understanding AutoML classification algorithms

    Summary

    Chapter 6: Building an AutoML Forecasting Solution

    Technical requirements

    Prepping data for AutoML forecasting

    Navigating to your Jupyter environment

    Loading and transforming your data

    Training an AutoML forecasting model

    Training a forecasting model with standard algorithms

    Training a forecasting model with Prophet and ARIMA

    Registering your trained forecasting model

    Fine-tuning your AutoML forecasting model

    Improving AutoML forecasting models

    Understanding AutoML forecasting algorithms

    Summary

    Chapter 7: Using the Many Models Solution Accelerator

    Technical requirements

    Installing the many models solution accelerator

    Creating a new notebook in your Jupyter environment

    Installing the MMSA from GitHub

    Prepping data for many models

    Prepping the sample OJ dataset

    Prepping a pandas dataframe

    Training many models simultaneously

    Training the sample OJ dataset

    Training your sample dataset with the MMSA

    Scoring new data for many models

    Scoring OJ sales data with the MMSA

    Scoring your sample dataset with many models

    Improving your many models results

    Summary

    Section 3: AutoML in Production – Automating Real-Time and Batch Scoring Solutions

    Chapter 8: Choosing Real-Time versus Batch Scoring

    Technical requirements

    Architecting batch scoring solutions

    Understanding the five-step batch scoring process

    Scheduling your batch scoring solution

    Scoring data in batches and delivering results

    Choosing batch over real time

    Architecting real-time scoring solutions

    Understanding the four-step real-time scoring process

    Training a model for real-time deployment  

    Delivering results in real time

    Knowing when to use real-time scoring

    Choosing real-time over batch solutions

    Determining batch versus real-time scoring scenarios

    Scenarios for real-time or batch scoring

    Answers for the type of solution appropriate for each scenario

    Summary

    Chapter 9: Implementing a Batch Scoring Solution

    Technical requirements

    Creating an ML pipeline

    Coding the first three steps of your ML scoring pipeline

    Creating a Python script to score data in your ML pipeline

    Creating and containerizing an environment

    Configuring and running your ML scoring pipeline

    Accessing your scored predictions via AML studio

    Creating a parallel scoring pipeline

    Coding the first three steps of your ML parallel scoring pipeline

    Creating Python scripts to score data in your ML parallel pipeline

    Configuring and running your ML parallel scoring pipeline

    Creating an AutoML training pipeline

    Coding the first two steps of your AutoML training pipeline

    Configuring your AutoML model training settings and step

    Creating a Python script to register your model

    Configuring and running your AutoML training pipeline

    Triggering and scheduling your ML pipelines

    Triggering your published pipeline from the GUI

    Triggering and scheduling a published pipeline through code

    Summary

    Chapter 10: Creating End-to-End AutoML Solutions

    Technical requirements

    Connecting AMLS to ADF

    Creating an ADF

    Creating a service principal and granting access

    Creating a linked service to connect ADF with AMLS

    Scheduling a machine learning pipeline in ADF

    Transferring data using ADF

    Installing a self-hosted integration runtime

    Creating an Azure Blob storage linked service

    Creating a linked service to your PC

    Creating an ADF pipeline to copy data

    Automating an end-to-end scoring solution

    Editing an ML pipeline to score new data

    Creating an ADF pipeline to run your ML pipeline

    Adding a trigger to your ADF pipeline

    Automating an end-to-end training solution

    Creating a pipeline to copy data into Azure

    Editing an ML pipeline to train with new data

    Adding a Machine Learning Execute Pipeline activity to your ADF pipeline

    Summary

    Chapter 11: Implementing a Real-Time Scoring Solution

    Technical requirements

    Creating real-time endpoints through the UI

    Creating an ACI-hosted endpoint through the UI

    Creating an AKS cluster through the UI

    Creating an AKS-hosted endpoint through the UI

    Creating real-time endpoints through the SDK

    Creating and testing a real-time endpoint with ACI through Python

    Creating an AKS cluster through Python

    Creating and testing a real-time endpoint with AKS through Python

    Improving performance on your AKS cluster

    Summary

    Chapter 12: Realizing Business Value with AutoML

    Technical requirements

    Architecting AutoML solutions

    Making key architectural decisions for AutoML solutions

    Architecting a batch solution

    Architecting a real-time solution

    Visualizing AutoML modeling results

    Visualizing the results of classification

    Visualizing the results of forecasting and regression

    Explaining AutoML results to your business

    Using AutoML in other Microsoft products

    Using AutoML within PowerBI

    Using AutoML within Azure Synapse Analytics

    Using AutoML with ML.NET

    Using AutoML on SQL Server, HDInsight, and Azure Databricks

    Realizing business value

    Getting the business to adopt a new, automated solution

    Getting the business to replace an older, automated process

    Getting the business to adopt a new, decision-assistance tool

    Getting the business to replace an old decision assistance tool

    Summary

    Why subscribe?

    Other Books You May Enjoy

    Preface

    Automated Machine Learning with Microsoft Azure will help you build high-performing, accurate machine learning models in record time. It allows anyone to easily harness the power of artificial intelligence and increase the productivity and profitability of their business. With a series of clicks on a graphical user interface (GUI), novices and seasoned data scientists alike can easily train and deploy machine learning solutions to production.

    This book will teach you how to use Azure AutoML with both the GUI and the Azure Machine Learning Python SDK in a careful, step-by-step fashion. First, you'll learn how to prepare data, train models, and register them to your Azure Machine Learning workspace. Then, you'll learn how to take those models and use them to create both automated batch solutions using machine learning pipelines and real-time scoring solutions using Azure Kubernetes Service (AKS).

    By the time you finish Automated Machine Learning with Microsoft Azure, you will be able to use AutoML on your own data to not only train regression, classification, and forecasting models but also use them to solve a wide variety of business problems. You'll be able to show your business partners exactly how your machine learning models make predictions through automatically generated charts and graphs, earning their trust and respect.

    Who this book is for

    Data scientists, aspiring data scientists, machine learning engineers, and anyone interested in applying artificial intelligence or machine learning in their business will find this book useful. You need to have beginner-level knowledge of artificial intelligence and a technical background in computer science, statistics, or information technology before getting started with this machine learning book. Having a background in Python will help you implement this book's more advanced features, but even data analysts and SQL experts will be able to train machine learning models after finishing this book.

    What this book covers

    Chapter 1, Introducing AutoML, begins by explaining the current state of data science and artificial intelligence in industry and why so many companies are having such a hard time extracting value from data. It explains how data scientists work, why their processes are inherently slow, and why they need to be made quicker. Finally, it introduces AutoML as the solution to achieve the return on investment required by industry.

    Chapter 2, Getting Started with Azure Machine Learning Service, goes into depth in explaining the different artifacts of Azure Machine Learning and how they integrate to form end-to-end machine learning solutions. You'll learn about datastores, datasets, compute instances, compute clusters, environments, and experiments, and how you use them to create machine learning solutions on Azure.

    Chapter 3, Training Your First AutoML Model, will have you create your first AutoML model using publicly available Titanic data. You will use the Azure Machine Learning Studio GUI to upload your data into your workspace, create a dataset, and run an AutoML classification job to predict Titanic survivors. Lastly, you'll use AutoML's explainability features to see which factors were most vital to predicting survival.

    Chapter 4, Building an AutoML Regression Solution, will help you train an AutoML regression model using the Azure Machine Learning SDK in Python. You'll learn how to access Jupyter notebooks within Azure Machine Learning, use compute clusters for remote training on the cloud, and create an AutoML model that predicts a number. By the end of this chapter, you will be able to replicate this work for any regression problem you have in the future.

    Chapter 5, Building an AutoML Classification Solution, will help you train an AutoML classification model using the Azure Machine Learning SDK in Python in two ways. First, you'll train a binary classification model to predict one of two categories. Then, you will train a multiclass classification model to predict one of three categories. By the end of this chapter, you'll be an expert in training all types of classification models with AutoML.

    Chapter 6, Building an AutoML Forecasting Solution, looks at forecasting, one of the most common machine learning problems and one of the hardest to master. In this chapter, you'll learn how to code a forecasting solution with AutoML, making use of advanced forecasting-specific algorithms and features. You'll learn the ins and outs of forecasting and be able to avoid many of the common mistakes people make while forecasting.

    Chapter 7, Using the Many Models Solution Accelerator, introduces the Many Models Solution Accelerator (MMSA), a cutting-edge Azure technology that lets companies train hundreds of thousands of models quickly and easily. Here, you will learn how to access the MMSA and adapt it to your own problems. This is a powerful code-only solution aimed at seasoned data scientists, but even novices will be able to use it by the end of this chapter.

    Chapter 8, Choosing Real-Time versus Batch Scoring, explores how real-time solutions and batch solutions represent the two ways to score machine learning models. This chapter delves into common business scenarios and explains how you should choose which type of solution to create. The end of this chapter features a quiz that will test your ability to match business problems to the correct type of solution, saving you time and money.

    Chapter 9, Implementing a Batch Scoring Solution, emphasizes how machine learning pipelines are Azure Machine Learning's batch scoring solution of choice. Machine learning pipelines are containerized code that, once created, you can easily rerun and schedule on an automated basis. This chapter has you use the AutoML models you created in earlier chapters to create powerful batch scoring solutions that run on a schedule of your choice.

    Chapter 10, Creating End-to-End AutoML Solutions, emphasizes how Azure Data Factory (ADF) is a code-free data orchestration tool that integrates easily with machine learning pipelines. In this chapter, you'll learn how to seamlessly move data into and out of Azure, and how to integrate that flow with your scoring pipelines. By the end of this chapter, you will understand how ADF and AMLS combine to create the ultimate data science experience.

    Chapter 11, Implementing a Real-Time Scoring Solution, teaches you how to create real-time scoring endpoints hosted on AKS and Azure Container Instances (ACI). You'll learn how to deploy AutoML models to an endpoint with a single click from the Azure Machine Learning Studio GUI as well as through Python code in a Jupyter notebook, completing your AutoML training.

    Chapter 12, Realizing Business Value with AutoML, focuses on how creating an end-to-end solution is just the first step in realizing business value; you'll also need to gain end user trust. This chapter focuses on how to gain this trust through architectural diagrams, model interpretability, and presenting results in an intuitive, easy-to-understand manner. You'll learn how to become and be seen as a trusted, reliable partner to your business.
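As a rough illustration of the SDK-based chapters described above, the following is a minimal sketch of how an AutoML classification run might be configured with the Azure Machine Learning Python SDK (v1). It is not taken from the book: the label column name Survived, the 15-minute timeout, the cross-validation count, and the helper name build_automl_config are all assumptions chosen for illustration, and the dataset and compute target are expected to come from your own workspace.

```python
# Hypothetical settings for an AutoML classification run (azureml SDK v1).
# The label column "Survived", the timeout, and the fold count are
# illustrative assumptions, not values taken from the book.
automl_settings = {
    "task": "classification",
    "primary_metric": "accuracy",
    "label_column_name": "Survived",
    "experiment_timeout_minutes": 15,
    "n_cross_validations": 5,
    "enable_early_stopping": True,
}


def build_automl_config(training_data, compute_target):
    """Return an AutoMLConfig built from the settings above.

    The import is deferred so the settings can be inspected on a machine
    that does not have the Azure Machine Learning SDK installed.
    """
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        training_data=training_data,    # a registered tabular dataset
        compute_target=compute_target,  # e.g. a compute cluster
        **automl_settings,
    )
```

Keeping the settings in a plain dictionary makes it easy to review or version them separately from the code that submits the run.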

    To get the most out of this book

    You will need to have the following requirements:

    In order to use Automated Machine Learning with Microsoft Azure, you will need a working internet connection. We recommend either Microsoft Edge or Google Chrome to have the best experience with the Azure portal. Furthermore, you will be required to create an Azure account (at no cost) if you do not already have one.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    As you work through the book, please feel free to try AutoML with your own data. It helps greatly in your learning experience to solve problems that interest you. At the end of each chapter, try adapting your own datasets to the example code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Automated-Machine-Learning-with-Microsoft-Azure. In case there's an update to the code, it will be updated on the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800565319_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: You have another helper function here, get_forecasting_output.

    A block of code is set as follows:

    from azureml.core import Workspace, Dataset, Datastore
    from azureml.core import Experiment
    from azureml.core.compute import ComputeTarget
    from azureml.train.automl import AutoMLConfig
    from azureml.train.automl.run import AutoMLRun
    from azureml.widgets import RunDetails

    Any command-line input or output is written as follows:

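    from azureml.core import Experiment, Workspace
    from azureml.pipeline.core import PipelineRun
    ws = Workspace.from_config()  # assumes a workspace config.json is available
    experiment = Experiment(ws, 'your-experiment_name')
    pipeline_run = PipelineRun(experiment, 'your-pipeline-run-id')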

    Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Go to Experiments under Assets in Azure Machine Learning Studio, click your experiment name, select your run ID, click the Models tab, select the highest-performing algorithm, and click the Metrics tab.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

    Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Reviews

    Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

    For more information about Packt, please visit packt.com.

    Section 1: AutoML Explained – Why, What, and How

    In this first part, you will understand why you should use AutoML and how it solves common industry problems. You will also build an AutoML solution through a UI. 

    This section comprises the following chapters:

    Chapter 1, Introducing AutoML

    Chapter 2, Getting Started with Azure Machine Learning Service

    Chapter 3, Training Your First AutoML Model

    Chapter 1: Introducing AutoML

    AI is everywhere. From recommending products on your favorite websites to optimizing the supply chains of Fortune 500 companies to forecasting demand for shops of all sizes, AI has emerged as a dominant force. Yet, as AI becomes more and more prevalent in the workplace, a worrisome trend has emerged: most AI projects fail.

    Failure occurs for a variety of technical and non-technical reasons. Sometimes, it's because the AI model performs poorly. Other times, it's due to data issues. Machine learning algorithms require reliable, accurate, timely data, and sometimes your data fails to meet those standards. When data isn't the issue and your model performs well, failure usually occurs because end users simply do not trust AI to guide their decision making.

    For every worrisome trend, however, there is a promising solution. Microsoft and a host of other companies have developed automated machine learning (AutoML) to increase the success of your AI projects. In this book, you will learn how to use AutoML on Microsoft's Azure cloud platform. This book will teach you how to boost your productivity if you are a data scientist. If you are not a data scientist, this book will enable you to build machine learning models and harness the power of AI.

    In this chapter, we begin by defining AI and machine learning and explaining why companies have had such trouble seeing a return on their investment in AI. Then, we take a deeper dive into how data scientists work and why that workflow is inherently slow and mistake-prone from a project success perspective. Finally, we conclude the chapter by introducing AutoML as the key to unlocking productivity in machine learning projects.

    In this chapter, we will cover the following topics:

    Explaining data science's ROI problem

    Analyzing why AI projects fail slowly

    Solving the ROI problem with AutoML

    Explaining data science's ROI problem

    Data scientist was consistently ranked the best job in America by Forbes Magazine from 2016 to 2019, yet the best job in America has not produced the best results for the companies that employ data scientists. According to VentureBeat, 87% of data science projects fail to make it into production. This means that most of the work that data scientists perform does not impact their employer in any meaningful way.
