
    Machine Learning Engineering on AWS - Joshua Arvin Lat


    BIRMINGHAM—MUMBAI

    Machine Learning Engineering on AWS

    Copyright © 2022 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Publishing Product Manager: Ali Abidi

    Content Development Editor: Priyanka Soam

    Technical Editor: Devanshi Ayare

    Copy Editor: Safis Editing

    Project Coordinator: Farheen Fathima

    Proofreader: Safis Editing

    Indexer: Sejal Dsilva

    Production Designer: Ponraj Dhandapani

    Marketing Coordinator: Shifa Ansari

    First published: October 2022

    Production reference: 1290922

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80324-759-5

    www.packt.com

    Contributors

    About the author

    Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies, as well as Director of Software Development and Engineering for multiple e-commerce start-ups, experience that has helped him become a more effective leader. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.

    About the reviewers

    Raphael Jambalos manages the Cloud-Native Development Team at eCloudValley, Philippines. His team architects and implements solutions that leverage AWS services to deliver reliable applications. He is also a community leader for the AWS user group MegaManila, organizing monthly meetups and growing the community. In his free time, he loves to read books and write about tech on his blog (https://dev.to/raphael_jambalos). He holds five AWS certifications and is an AWS APN Ambassador for the Philippines. He was also a technical reviewer for the Packt book Machine Learning with Amazon SageMaker Cookbook.

    Sophie Soliven is the General Manager of E-commerce Services and Dropship for BeautyMnl. As one of the pioneers and leaders of the company, she contributed to its growth from its humble beginnings to what it is today – the biggest homegrown e-commerce platform in the Philippines – by using a data-driven approach to scale its operations. She has obtained a number of certifications on data analytics and cloud computing, including Microsoft Power BI Data Analyst Associate, Tableau Desktop Specialist, and AWS Certified Cloud Practitioner. For the last couple of years, she has been sharing her knowledge and experience in data-driven operations at local and international conferences and events.

    Table of Contents

    Preface

    Part 1: Getting Started with Machine Learning Engineering on AWS

    1

    Introduction to ML Engineering on AWS

    Technical requirements

    What is expected from ML engineers?

    How ML engineers can get the most out of AWS

    Essential prerequisites

    Creating the Cloud9 environment

    Increasing Cloud9’s storage

    Installing the Python prerequisites

    Preparing the dataset

    Generating a synthetic dataset using a deep learning model

    Exploratory data analysis

    Train-test split

    Uploading the dataset to Amazon S3

    AutoML with AutoGluon

    Setting up and installing AutoGluon

    Performing your first AutoGluon AutoML experiment

    Getting started with SageMaker and SageMaker Studio

    Onboarding with SageMaker Studio

    Adding a user to an existing SageMaker Domain

    No-code machine learning with SageMaker Canvas

    AutoML with SageMaker Autopilot

    Summary

    Further reading

    2

    Deep Learning AMIs

    Technical requirements

    Getting started with Deep Learning AMIs

    Launching an EC2 instance using a Deep Learning AMI

    Locating the framework-specific DLAMI

    Choosing the instance type

    Ensuring a default secure configuration

    Launching the instance and connecting to it using EC2 Instance Connect

    Downloading the sample dataset

    Training an ML model

    Loading and evaluating the model

    Cleaning up

    Understanding how AWS pricing works for EC2 instances

    Using multiple smaller instances to reduce the overall cost of running ML workloads

    Using spot instances to reduce the cost of running training jobs

    Summary

    Further reading

    3

    Deep Learning Containers

    Technical requirements

    Getting started with AWS Deep Learning Containers

    Essential prerequisites

    Preparing the Cloud9 environment

    Downloading the sample dataset

    Using AWS Deep Learning Containers to train an ML model

    Serverless ML deployment with Lambda’s container image support

    Building the custom container image

    Testing the container image

    Pushing the container image to Amazon ECR

    Running ML predictions on AWS Lambda

    Completing and testing the serverless API setup

    Summary

    Further reading

    Part 2: Solving Data Engineering and Analysis Requirements

    4

    Serverless Data Management on AWS

    Technical requirements

    Getting started with serverless data management

    Preparing the essential prerequisites

    Opening a text editor on your local machine

    Creating an IAM user

    Creating a new VPC

    Uploading the dataset to S3

    Running analytics at scale with Amazon Redshift Serverless

    Setting up a Redshift Serverless endpoint

    Opening Redshift query editor v2

    Creating a table

    Loading data from S3

    Querying the database

    Unloading data to S3

    Setting up Lake Formation

    Creating a database

    Creating a table using an AWS Glue Crawler

    Using Amazon Athena to query data in Amazon S3

    Setting up the query result location

    Running SQL queries using Athena

    Summary

    Further reading

    5

    Pragmatic Data Processing and Analysis

    Technical requirements

    Getting started with data processing and analysis

    Preparing the essential prerequisites

    Downloading the Parquet file

    Preparing the S3 bucket

    Automating data preparation and analysis with AWS Glue DataBrew

    Creating a new dataset

    Creating and running a profile job

    Creating a project and configuring a recipe

    Creating and running a recipe job

    Verifying the results

    Preparing ML data with Amazon SageMaker Data Wrangler

    Accessing Data Wrangler

    Importing data

    Transforming the data

    Analyzing the data

    Exporting the data flow

    Turning off the resources

    Verifying the results

    Summary

    Further reading

    Part 3: Diving Deeper with Relevant Model Training and Deployment Solutions

    6

    SageMaker Training and Debugging Solutions

    Technical requirements

    Getting started with the SageMaker Python SDK

    Preparing the essential prerequisites

    Creating a service limit increase request

    Training an image classification model with the SageMaker Python SDK

    Creating a new Notebook in SageMaker Studio

    Downloading the training, validation, and test datasets

    Uploading the data to S3

    Using the SageMaker Python SDK to train an ML model

    Using the %store magic to store data

    Using the SageMaker Python SDK to deploy an ML model

    Using the Debugger Insights Dashboard

    Utilizing Managed Spot Training and Checkpoints

    Cleaning up

    Summary

    Further reading

    7

    SageMaker Deployment Solutions

    Technical requirements

    Getting started with model deployments in SageMaker

    Preparing the pre-trained model artifacts

    Preparing the SageMaker script mode prerequisites

    Preparing the inference.py file

    Preparing the requirements.txt file

    Preparing the setup.py file

    Deploying a pre-trained model to a real-time inference endpoint

    Deploying a pre-trained model to a serverless inference endpoint

    Deploying a pre-trained model to an asynchronous inference endpoint

    Creating the input JSON file

    Adding an artificial delay to the inference script

    Deploying and testing an asynchronous inference endpoint

    Cleaning up

    Deployment strategies and best practices

    Summary

    Further reading

    Part 4: Securing, Monitoring, and Managing Machine Learning Systems and Environments

    8

    Model Monitoring and Management Solutions

    Technical prerequisites

    Registering models to SageMaker Model Registry

    Creating a new notebook in SageMaker Studio

    Registering models to SageMaker Model Registry using the boto3 library

    Deploying models from SageMaker Model Registry

    Enabling data capture and simulating predictions

    Scheduled monitoring with SageMaker Model Monitor

    Analyzing the captured data

    Deleting an endpoint with a monitoring schedule

    Cleaning up

    Summary

    Further reading

    9

    Security, Governance, and Compliance Strategies

    Managing the security and compliance of ML environments

    Authentication and authorization

    Network security

    Encryption at rest and in transit

    Managing compliance reports

    Vulnerability management

    Preserving data privacy and model privacy

    Federated Learning

    Differential Privacy

    Privacy-preserving machine learning

    Other solutions and options

    Establishing ML governance

    Lineage Tracking and reproducibility

    Model inventory

    Model validation

    ML explainability

    Bias detection

    Model monitoring

    Traceability, observability, and auditing

    Data quality analysis and reporting

    Data integrity management

    Summary

    Further reading

    Part 5: Designing and Building End-to-end MLOps Pipelines

    10

    Machine Learning Pipelines with Kubeflow on Amazon EKS

    Technical requirements

    Diving deeper into Kubeflow, Kubernetes, and EKS

    Preparing the essential prerequisites

    Preparing the IAM role for the EC2 instance of the Cloud9 environment

    Attaching the IAM role to the EC2 instance of the Cloud9 environment

    Updating the Cloud9 environment with the essential prerequisites

    Setting up Kubeflow on Amazon EKS

    Running our first Kubeflow pipeline

    Using the Kubeflow Pipelines SDK to build ML workflows

    Cleaning up

    Recommended strategies and best practices

    Summary

    Further reading

    11

    Machine Learning Pipelines with SageMaker Pipelines

    Technical requirements

    Diving deeper into SageMaker Pipelines

    Preparing the essential prerequisites

    Running our first pipeline with SageMaker Pipelines

    Defining and preparing our first ML pipeline

    Running our first ML pipeline

    Creating Lambda functions for deployment

    Preparing the Lambda function for deploying a model to a new endpoint

    Preparing the Lambda function for checking whether an endpoint exists

    Preparing the Lambda function for deploying a model to an existing endpoint

    Testing our ML inference endpoint

    Completing the end-to-end ML pipeline

    Defining and preparing the complete ML pipeline

    Running the complete ML pipeline

    Cleaning up

    Recommended strategies and best practices

    Summary

    Further reading

    Index

    Other Books You May Enjoy

    Preface

    There is a growing need for professionals with experience in working on machine learning (ML) engineering requirements as well as those with knowledge of automating complex MLOps pipelines in the cloud. This book explores a variety of AWS services, such as Amazon Elastic Kubernetes Service, AWS Glue, AWS Lambda, Amazon Redshift, and AWS Lake Formation, which ML practitioners can leverage to meet various data engineering and ML engineering requirements in production.

    This machine learning book covers the essential concepts as well as step-by-step instructions that are designed to help you get a solid understanding of how to manage and secure ML workloads in the cloud. As you progress through the chapters, you’ll discover how to use several container and serverless solutions when training and deploying TensorFlow and PyTorch deep learning models on AWS. You’ll also delve into proven cost optimization techniques as well as data privacy and model privacy preservation strategies in detail as you explore best practices when using each AWS service.

    By the end of this AWS book, you'll be able to build, scale, and secure your own ML systems and pipelines, which will give you the experience and confidence needed to architect custom solutions using a variety of AWS services for ML engineering requirements.

    Who this book is for

    This book is for ML engineers, data scientists, and AWS cloud engineers interested in working on production data engineering, machine learning engineering, and MLOps requirements using a variety of AWS services such as Amazon EC2, Amazon Elastic Kubernetes Service (EKS), Amazon SageMaker, AWS Glue, Amazon Redshift, AWS Lake Formation, and AWS Lambda. All you need is an AWS account to get started. Prior knowledge of AWS, machine learning, and the Python programming language will help you to grasp the concepts covered in this book more effectively.

    What this book covers

    Chapter 1, Introduction to ML Engineering on AWS, focuses on helping you get set up, understand the key concepts, and get your feet wet quickly with several simplified AutoML examples.

    Chapter 2, Deep Learning AMIs, introduces AWS Deep Learning AMIs and how they are used to help ML practitioners perform ML experiments faster inside EC2 instances. Here, we will also dive a bit deeper into how AWS pricing works for EC2 instances so that you will have a better idea of how to optimize and reduce the overall costs of running ML workloads in the cloud.

    Chapter 3, Deep Learning Containers, introduces AWS Deep Learning Containers and how they are used to help ML practitioners perform ML experiments faster using containers. Here, we will also deploy a trained deep learning model inside an AWS Lambda function using Lambda’s container image support.

    Chapter 4, Serverless Data Management on AWS, presents several serverless solutions, such as Amazon Redshift Serverless and AWS Lake Formation, for managing and querying data on AWS.

    Chapter 5, Pragmatic Data Processing and Analysis, focuses on the different services available when working on data processing and analysis requirements, such as AWS Glue DataBrew and Amazon SageMaker Data Wrangler.

    Chapter 6, SageMaker Training and Debugging Solutions, presents the different solutions and capabilities available when training an ML model using Amazon SageMaker. Here, we dive a bit deeper into the different options and strategies when training and tuning ML models in SageMaker.

    Chapter 7, SageMaker Deployment Solutions, focuses on the relevant deployment solutions and strategies when performing ML inference on the AWS platform.

    Chapter 8, Model Monitoring and Management Solutions, presents the different monitoring and management solutions available on AWS.

    Chapter 9, Security, Governance, and Compliance Strategies, focuses on the relevant security, governance, and compliance strategies needed to secure production environments. Here, we will also dive a bit deeper into the different techniques to ensure data privacy and model privacy.

    Chapter 10, Machine Learning Pipelines with Kubeflow on Amazon EKS, focuses on using Kubeflow Pipelines, Kubernetes, and Amazon EKS to deploy an automated end-to-end MLOps pipeline on AWS.

    Chapter 11, Machine Learning Pipelines with SageMaker Pipelines, focuses on using SageMaker Pipelines to design and build automated end-to-end MLOps pipelines. Here, we will apply, combine, and connect the different strategies and techniques we learned in the previous chapters of the book.

    To get the most out of this book

    You will need an AWS account and a stable internet connection to complete the hands-on solutions in this book. If you still do not have an AWS account, feel free to check the AWS Free Tier page and click Create a Free Account: https://aws.amazon.com/free/.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/jeBII.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: ENTRYPOINT is set to /opt/conda/bin/python -m awslambdaric. The CMD command is then set to app.handler. The ENTRYPOINT and CMD instructions define which command is executed when the container starts to run.

    A block of code is set as follows:

    SELECT booking_changes, has_booking_changes, *
    FROM dev.public.bookings
    WHERE
        (booking_changes=0 AND has_booking_changes='True')
        OR
        (booking_changes>0 AND has_booking_changes='False');

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    ---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: kubeflow-eks-000
      region: us-west-2
      version: 1.21
    availabilityZones: [us-west-2a, us-west-2b, us-west-2c, us-west-2d]
    managedNodeGroups:
    - name: nodegroup
      desiredCapacity: 5
      instanceType: m5.xlarge
      ssh:
        enableSsm: true

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: After clicking the FILTER button, a drop-down menu should appear. Locate and select Greater than or equal to from the list of options under By condition. This should update the pane on the right side of the page and show the list of configuration options for the Filter values operation.

    Tips or Important Notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you’ve read Machine Learning Engineering on AWS, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Part 1: Getting Started with Machine Learning Engineering on AWS

    In this section, readers will be introduced to the world of ML engineering on AWS.

    This section comprises the following chapters:

    Chapter 1, Introduction to ML Engineering on AWS

    Chapter 2, Deep Learning AMIs

    Chapter 3, Deep Learning Containers

    1

    Introduction to ML Engineering on AWS

    Most of us started our machine learning (ML) journey by training our first ML model using a sample dataset on our laptops or home computers. Things are somewhat straightforward until we need to work with much larger datasets and run our ML experiments in the cloud. It also becomes more challenging once we need to deploy our trained models to production-level inference endpoints or web servers. There are a lot of things to consider when designing and building ML systems and these are just some of the challenges data scientists and ML engineers face when working on real-life requirements. That said, we must use the right platform, along with the right set of tools, when performing ML experiments and deployments in the cloud.

    At this point, you might be wondering why we should even use a cloud platform when running our workloads. Can’t we build this platform ourselves? Perhaps you might be thinking that building and operating your own data center is a relatively easy task. In the past, different teams and companies tried setting up infrastructure within their own data centers and on-premises hardware. Over time, these companies started migrating their workloads to the cloud as they realized how hard and expensive it was to manage and operate data centers. A good example of this is the Netflix team, which migrated their resources to the AWS cloud. Migrating to the cloud allowed them to scale better and significantly increase service availability.

    The Amazon Web Services (AWS) platform provides a lot of services and capabilities that professionals and companies around the world can use to manage different types of workloads in the cloud. Over the past couple of years, AWS has announced and released a significant number of services, capabilities, and features that can be used for production-level ML experiments and deployments as well, driven by the growing volume of ML workloads being migrated to the cloud globally. As we go through each of the chapters in this book, we will gain a better understanding of how different services are used to solve the challenges of productionizing ML models.

    The following diagram shows the hands-on journey for this chapter:

    Figure 1.1 – Hands-on journey for this chapter

    Figure 1.1 – Hands-on journey for this chapter

    In this introductory chapter, we will focus on getting our feet wet by trying out different options when building an ML model on AWS. As shown in the preceding diagram, we will use a variety of AutoML services and solutions to build ML models that can help us predict if a hotel booking will be cancelled or not based on the information available. We will start by setting up a Cloud9 environment, which will help us run our code through an integrated development environment (IDE) in our browser. In this environment, we will generate a realistic synthetic dataset using a deep learning model called the Conditional Generative Adversarial Network. We will upload this dataset to Amazon S3 using the AWS CLI. Inside the Cloud9 environment, we will also install AutoGluon and run an AutoML experiment to train and generate multiple models using the synthetic dataset. Finally, we will use SageMaker Canvas and SageMaker Autopilot to run AutoML experiments using the uploaded dataset in S3. If you are wondering what these fancy terms are, keep reading as we demystify each of these in this chapter.
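    If you are curious about what the AutoGluon portion of this journey looks like in code, here is a minimal sketch of a tabular AutoML experiment. Note that the file names (bookings.train.csv and bookings.test.csv) and the label column (is_cancelled) are hypothetical placeholders for illustration; we will prepare the actual dataset and column names later in this chapter:

    from autogluon.tabular import TabularDataset, TabularPredictor

    # Load the training data (hypothetical file name)
    train_data = TabularDataset("bookings.train.csv")

    # Train and compare multiple candidate models automatically
    predictor = TabularPredictor(label="is_cancelled").fit(train_data)

    # Rank every trained model against held-out test data
    test_data = TabularDataset("bookings.test.csv")
    print(predictor.leaderboard(test_data))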

    In this chapter, we will cover the following topics:

    What is expected from ML engineers?

    How ML engineers can get the most out of AWS

    Essential prerequisites

    Preparing the dataset

    AutoML with AutoGluon

    Getting started with SageMaker and SageMaker Studio

    No-code machine learning with SageMaker Canvas

    AutoML with SageMaker Autopilot

    In addition to getting our feet wet using key ML services, libraries, and tools to perform AutoML experiments, this introductory chapter will help us gain a better understanding of several ML and ML engineering concepts that will be relevant to the succeeding chapters of this book. With this in mind, let’s get started!

    Technical requirements

    Before we start, we must have an AWS account. If you do not have an AWS account yet, simply create an account here: https://aws.amazon.com/free/. You may proceed with the next steps once the account is ready.

    The Jupyter notebooks, source code, and other files for each chapter are available in this book’s GitHub repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

    What is expected from ML engineers?

    ML engineering involves using ML and software engineering concepts and techniques to design, build, and manage production-level ML systems, along with pipelines. In a team working to build ML-powered applications, ML engineers are generally expected to build and operate the ML infrastructure that’s used to train and deploy models. In some cases, data scientists may also need to work on infrastructure-related requirements, especially if there is no clear delineation between the roles and responsibilities of ML engineers and data scientists in an organization.

    There are several things an ML engineer should consider when designing and building ML systems and platforms. These would include the quality of the deployed ML model, along with the security, scalability, evolvability, stability, and overall cost of the ML infrastructure used. In this book, we will discuss the different strategies and best practices to achieve the different objectives of an ML engineer.

    ML engineers should also be capable of designing and building automated ML workflows using a variety of solutions. Deployed models degrade over time and model retraining becomes essential in ensuring the quality of deployed ML models. Having automated ML pipelines in place helps enable automated model retraining and deployment.

    Important note

    If you are excited to learn more about how to build custom ML pipelines on AWS, then you should check out the last section of this book: Designing and building end-to-end MLOps pipelines. You should find several chapters dedicated to deploying complex ML pipelines on AWS!

    How ML engineers can get the most out of AWS

    There are many services and capabilities in the AWS platform that an ML engineer can choose from. Professionals who are already familiar with using virtual machines can easily spin up EC2 instances and run ML experiments using deep learning frameworks inside these virtual private servers. Services such as AWS Glue, Amazon EMR, and Amazon Athena can be utilized by ML engineers and data engineers for different data management and processing needs. Once the ML models need to be deployed into dedicated inference endpoints, a variety of options become available:

    Figure 1.2 – AWS machine learning stack

    Figure 1.2 – AWS machine learning stack

    As shown in the preceding diagram, data scientists, developers, and ML engineers can make use of multiple services and capabilities from the AWS machine learning stack. The services grouped under AI services can easily be used by developers with minimal ML experience. To use the services listed here, all we need is some experience working with data, along with the software development skills required to use SDKs and APIs. If we want to quickly build ML-powered applications with features such as language translation, text-to-speech, and product recommendation, then we can easily do that using the services under the AI Services bucket. In the middle, we have ML services and their capabilities, which help solve the more custom ML requirements of data scientists and ML engineers. To use the services and capabilities listed here, a solid understanding of the ML process is needed. The last layer, ML frameworks and infrastructure, offers the highest level of flexibility and customizability, as it includes the ML infrastructure and framework support needed by more advanced use cases.
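    To make this concrete, the following is a minimal sketch of what using the AI services layer looks like in practice, calling Amazon Translate through the boto3 SDK. This example is for illustration only and is not part of this chapter’s hands-on exercises; it assumes AWS credentials are already configured:

    import boto3

    # AI services are consumed through simple, high-level API calls
    translate = boto3.client("translate")

    response = translate.translate_text(
        Text="Machine learning in the cloud",
        SourceLanguageCode="en",
        TargetLanguageCode="es",
    )
    print(response["TranslatedText"])

    A single API call like this gives us language translation without training or deploying any model ourselves, which is exactly why this layer requires minimal ML experience.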

    So, how can ML engineers make the most out of the AWS machine learning stack? The ability of ML engineers to design, build, and manage ML systems improves as they become more familiar with the services, capabilities, and tools available in the AWS platform. They may start with AI services to quickly build AI-powered applications on AWS. Over time, these ML engineers will make use of the different services, capabilities, and infrastructure from the lower two layers as they become more comfortable dealing with intermediate ML engineering requirements.

    Essential prerequisites

    In this section, we will prepare the following:

    The Cloud9 environment

    The S3 bucket

    The synthetic dataset, which will be generated using a deep learning model

    Let’s get started.

    Creating the Cloud9 environment

    One of the more convenient options when performing ML experiments inside a virtual private server is to use the AWS Cloud9 service. AWS Cloud9 allows developers, data scientists, and ML engineers to manage and run code within a development environment using a browser. The code is stored and executed inside an EC2 instance, which provides an environment similar to what most developers have.

    Important note

    It is recommended to use an Identity and Access Management (IAM) user with limited permissions instead of the root account when running the examples in this book. We will discuss this along with other security best practices in detail in Chapter 9, Security, Governance, and Compliance Strategies. If you are just starting to use AWS, you may proceed with using the root account in the meantime.

    Follow these steps to create a Cloud9 environment where we will generate the synthetic dataset and run the AutoGluon AutoML experiment:

    Type cloud9 in the search bar. Select Cloud9 from the list of results:

    Figure 1.3 – Navigating to the Cloud9 console

    Figure 1.3 – Navigating to the Cloud9 console

    Here, we can see that the region is currently set to Oregon (us-west-2). Make sure that you change this to where you want the resources to be created.

    Next, click Create environment.

    Under the Name environment field, specify a name for the Cloud9 environment (for example, mle-on-aws) and click Next step.

    Under Environment type, choose Create a new EC2 instance for environment (direct access). Select m5.large for Instance type and then Ubuntu Server (18.04 LTS) for Platform:

    Figure 1.4 – Configuring the Cloud9 environment settings

    Figure 1.4 – Configuring the Cloud9 environment settings

    Here, we can see that there are other options for the instance type. In the meantime, we will stick with m5.large as it should be enough to run the hands-on solutions in this chapter.

    For the Cost-saving setting option, choose After four hours from the list of drop-down options. This means that the server where the Cloud9 environment is running will automatically shut down after 4 hours of inactivity.

    Under Network settings (advanced), select the default VPC of the region for the Network (VPC) configuration. It should have a format similar to vpc-abcdefg (default). For the Subnet option, choose the option that has a format similar to subnet-abcdefg | Default in us-west-2a.

    Important note

    It is recommended that you use the default VPC since the networking configuration is simple. This will help you avoid issues, especially if you’re just getting started with VPCs. If you encounter any VPC-related issues when launching a Cloud9 instance, you may need to check if the selected subnet has been configured with internet access via the route table configuration in the VPC console. You may retry launching the instance using another subnet or by using a new VPC altogether. If you are planning on creating a new VPC, navigate to https://go.aws/3sRSigt and create a VPC with a Single Public Subnet. If none of these options work, you may try launching the Cloud9 instance in another region. We’ll discuss Virtual Private Cloud (VPC) networks in detail in Chapter 9, Security, Governance, and Compliance Strategies.

    Click Next Step.

    On the review page, click Create environment. This should redirect you to the Cloud9 environment, which should take a minute or so to load. The Cloud9 IDE is shown in the following screenshot. This is where we can write our code and run the scripts and commands needed to work on some of the hands-on solutions in this book:

    Figure 1.5 – AWS Cloud9 interface

    Figure 1.5 – AWS Cloud9 interface

    Using this IDE is fairly straightforward as it looks very similar to code editors such as Visual Studio Code and Sublime Text. As shown in the preceding screenshot, we can find the menu bar at the top (A). The file tree can be found on the left-hand side (B). The editor covers a major portion of the screen in the middle (C). Lastly, we can find the terminal at the bottom (D).

    Important note

    If this is your first time using AWS Cloud9, here is a 4-minute introduction video from AWS to help you get started: https://www.youtube.com/watch?v=JDHZOGMMkj8.

    Now that we have our Cloud9 environment ready, it is time we configure it with a larger storage space.

    Increasing Cloud9’s storage

    When a Cloud9 instance is created, the attached volume starts with only 10GB of disk space. Given that we will be installing different libraries and frameworks while running ML experiments in this instance, we will need more than 10GB of disk space. We will resize the volume programmatically using the boto3 library.

    Important note

    If this is your first time using the boto3 library, it is the AWS SDK for Python, which gives us a way to programmatically manage the different AWS resources in our AWS accounts. It is a service-level SDK that helps us list, create, update, and delete AWS resources such as EC2 instances, S3 buckets, and EBS volumes.
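    For example, listing the S3 buckets in an account takes only a few lines of Python. This is a minimal sketch that assumes the environment already has the appropriate AWS credentials configured (as a Cloud9 environment does by default):

    import boto3

    # List every S3 bucket visible to the current credentials
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])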

    Follow these steps to download and run some scripts to increase the volume disk space from 10GB to 120GB:

    In the terminal of our Cloud9 environment (right after the $ sign at the bottom of the screen), run the following bash command:

    wget -O resize_and_reboot.py https://bit.ly/3ea96tW

    This will download the script file located at https://bit.ly/3ea96tW. Here, we are simply using a URL shortener, which maps the shortened link to https://raw.githubusercontent.com/PacktPublishing/Machine-Learning-Engineering-on-AWS/main/chapter01/resize_and_reboot.py.

    Important note

    Note that we are using the big O flag instead of a small o or a zero (0) when using the wget command.

    What’s inside the file we just downloaded? Let’s quickly inspect the file before we run the script. Double-click the resize_and_reboot.py file in the file tree (located on the left-hand side of the screen) to open the Python script file in the editor pane. As shown in the following screenshot, the resize_and_reboot.py script has three major sections. The first block of code focuses on importing the prerequisites needed to run the script. The second block of code focuses on resizing the volume of a selected EC2 instance using the boto3 library. It makes use of the describe_volumes() method to get the volume ID of the current instance, and then makes use of the modify_volume() method to update the volume size to 120GB. The last section involves a single line of code that simply reboots the EC2 instance. This line of code uses the os.system() method to run the sudo reboot shell command:

    Figure 1.6 – The resize_and_reboot.py script file

    Figure 1.6 – The resize_and_reboot.py script file

    You can find the resize_and_reboot.py script file in this book’s GitHub repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS/blob/main/chapter01/resize_and_reboot.py. Note that for this script to work, the EC2_INSTANCE_ID environment variable must be set to select the correct target instance. We’ll set this environment variable a few steps from now before we run the resize_and_reboot.py script.
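    If you would rather not open the file, the following sketch shows the general shape of such a resize script based on the description above. This is an approximation for illustration, not the exact contents of resize_and_reboot.py; refer to the repository for the authoritative version:

    import os
    import time

    import boto3

    # The target instance is selected through the EC2_INSTANCE_ID environment variable
    instance_id = os.environ["EC2_INSTANCE_ID"]

    ec2 = boto3.client("ec2")

    # Find the EBS volume attached to the target instance
    response = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    volume_id = response["Volumes"][0]["VolumeId"]

    # Request a resize of the volume to 120GB
    ec2.modify_volume(VolumeId=volume_id, Size=120)

    # Give the modification a moment to start before rebooting
    # (the actual script may handle this differently)
    time.sleep(10)
    os.system("sudo reboot")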

    Next, run the following command in the terminal:

    python3 -m pip install --user --upgrade boto3

    This will upgrade the version of boto3 using pip.

    Important note

    If this is your first time using pip, it is the package installer for Python. It makes it convenient to install different packages and libraries using the command line.

    You may use python3 -m pip show boto3 to check the version you are using. This book assumes that you are using version 1.20.26 or later.
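    Alternatively, you can print the version directly from Python with a one-liner in the terminal:

    python3 -c "import boto3; print(boto3.__version__)"

    This prints the version string of whichever boto3 installation the python3 interpreter picks up.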

    The next set of commands focuses on getting the Cloud9 environment’s instance_id from the instance metadata service and storing this value in the EC2_INSTANCE_ID environment variable. Let’s run the following in the terminal:

    TARGET_METADATA_URL=http://169.254.169.254/latest/meta-data/instance-id
    export EC2_INSTANCE_ID=$(curl -s $TARGET_METADATA_URL)
    echo $EC2_INSTANCE_ID

    This should give us an EC2 instance ID with a format similar to i-01234567890abcdef.

    Now that we have the EC2_INSTANCE_ID environment variable set with the appropriate value, we can run the following command:

    python3 resize_and_reboot.py

    This will run the Python script we downloaded earlier using the wget command. After performing the volume resize operation using boto3, the script will reboot the instance. You should see a Reconnecting… notification at the top of the page while the Cloud9 environment’s EC2 instance is being restarted.

    Important note

    Feel free to run the lsblk command after the instance has been restarted. This should help you verify that the volume of the Cloud9 environment instance has been resized to 120GB.

    Now that we have successfully resized the volume to 120GB, we should be able to work on the next set of solutions without having to worry about disk space.
