The Machine Learning Solutions Architect Handbook: Practical strategies and best practices on the ML lifecycle, system design, MLOps, and generative AI
By David Ping
About this ebook
David Ping, Head of GenAI and ML Solution Architecture for global industries at AWS, provides expert insights and practical examples to help you become a proficient ML solutions architect, linking technical architecture to business-related skills.
You'll learn about ML algorithms, cloud infrastructure, system design, MLOps, and how to apply ML to solve real-world business problems. David explains the generative AI project lifecycle and examines Retrieval Augmented Generation (RAG), an effective architecture pattern for generative AI applications. You’ll also learn about open-source technologies, such as Kubernetes/Kubeflow, for building a data science environment and ML pipelines before building an enterprise ML architecture using AWS. Alongside coverage of ML risk management and the different stages of AI/ML adoption, the biggest new addition to the handbook is the deep exploration of generative AI.
By the end of this book, you’ll have gained a comprehensive understanding of AI/ML across all key aspects, including business use cases, data science, real-world solution architecture, risk management, and governance. You’ll possess the skills to design and construct ML solutions that effectively cater to common use cases and follow established ML architecture patterns, enabling you to excel as a true professional in the field.
The Machine Learning Solutions Architect Handbook - David Ping
The Machine Learning Solutions Architect Handbook
Second Edition
Practical strategies and best practices on the ML lifecycle, system design, MLOps, and generative AI
David Ping
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Bhavesh Amin
Acquisition Editor – Peer Reviews: Gaurav Gavas
Project Editor: Amisha Vathare
Content Development Editor: Tanya D’cruz
Copy Editor: Safis Editing
Technical Editor: Anjitha Murali
Proofreader: Safis Editing
Indexer: Hemangini Bari
Presentation Designer: Ajay Patule
Developer Relations Marketing Executive: Monika Sangwan
First published: January 2022
Second edition: April 2024
Production reference: 1080424
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80512-250-0
www.packt.com
Contributors
About the author
David Ping is a seasoned technology executive with over 25 years of experience in the technology and financial services sectors. Specializing in cloud architecture, AI/ML, generative AI, ML platforms, and data analytics, he currently leads a global AI/ML solutions architecture team for industries at AWS, guiding companies worldwide in deploying cutting-edge AI/ML solutions. Previously holding executive roles at Credit Suisse and JPMorgan, David began his career as a software engineer at Intel after graduating with an engineering degree from Cornell University.
About the reviewers
Sepehr Pakbaz has been developing software since 2000 and has experience in full-stack software development, working with a variety of programming languages such as Python, JavaScript, .NET, and recently Golang. He has also worked as a product owner, consultant, and cloud solution architect. He has worked for companies like IBM and Microsoft in the past and is currently a Solutions Architect at Amazon Web Services. Additionally, he works as a consultant for his own company, Starspak LLC, as a side hustle.
Chakravarthy Nagarajan is a technology evangelist with 23 years of industry experience in ML, big data, and high-performance computing. He is currently working as a Principal AI/ML Specialist Solutions Architect at Amazon Web Services, based in the Bay Area, USA. He helps customers solve real-world complex business problems by building prototypes with end-to-end AI/ML solutions on cloud and edge devices. His specialization includes generative AI, computer vision, natural language processing, time series forecasting, and personalization. In his current role, Chakravarthy helps customers across start-ups, enterprises, and ISVs to solve their business problems using AI and ML solutions across North America.
Amit Nandi is a Solutions and Enterprise Architect specializing in driving innovation across diverse industries, including financial, pharmaceutical, manufacturing, and retail. He is recognized for architecting and implementing groundbreaking business paradigms through the integration of big data technologies, real-time streaming, and cutting-edge ML and AI solutions. He built an ML/AI-powered cybersecurity platform and enabled MLOps for the research team of a large pharmaceutical company.
Join our community on Discord
Join our community’s Discord space for discussions with the author and other readers:
https://packt.link/mlsah
Contents
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Navigating the ML Lifecycle with ML Solutions Architecture
ML versus traditional software
ML lifecycle
Business problem understanding and ML problem framing
Data understanding and data preparation
Model training and evaluation
Model deployment
Model monitoring
Business metric tracking
ML challenges
ML solutions architecture
Business understanding and ML transformation
Identification and verification of ML techniques
System architecture design and implementation
ML platform workflow automation
Security and compliance
Summary
Exploring ML Business Use Cases
ML use cases in financial services
Capital market front office
Sales trading and research
Investment banking
Wealth management
Capital market back office operations
Net Asset Value review
Post-trade settlement failure prediction
Risk management and fraud
Anti-money laundering
Trade surveillance
Credit risk
Insurance
Insurance underwriting
Insurance claim management
ML use cases in media and entertainment
Content development and production
Content management and discovery
Content distribution and customer engagement
ML use cases in healthcare and life sciences
Medical imaging analysis
Drug discovery
Healthcare data management
ML use cases in manufacturing
Engineering and product design
Manufacturing operations – product quality and yield
Manufacturing operations – machine maintenance
ML use cases in retail
Product search and discovery
Targeted marketing
Sentiment analysis
Product demand forecasting
ML use cases in the automotive industry
Autonomous vehicles
Perception and localization
Decision and planning
Control
Advanced driver assistance systems (ADAS)
Summary
Exploring ML Algorithms
Technical requirements
How machines learn
Overview of ML algorithms
Consideration for choosing ML algorithms
Algorithms for classification and regression problems
Linear regression algorithms
Logistic regression algorithms
Decision tree algorithms
Random forest algorithm
Gradient boosting machine and XGBoost algorithms
K-nearest neighbor algorithm
Multi-layer perceptron (MLP) networks
Algorithms for clustering
Algorithms for time series analysis
ARIMA algorithm
DeepAR algorithm
Algorithms for recommendation
Collaborative filtering algorithm
Multi-armed bandit/contextual bandit algorithm
Algorithms for computer vision problems
Convolutional neural networks
ResNet
Algorithms for natural language processing (NLP) problems
Word2Vec
BERT
Generative AI algorithms
Generative adversarial network
Generative pre-trained transformer (GPT)
Large Language Model
Diffusion model
Hands-on exercise
Problem statement
Dataset description
Setting up a Jupyter Notebook environment
Running the exercise
Summary
Data Management for ML
Technical requirements
Data management considerations for ML
Data management architecture for ML
Data storage and management
AWS Lake Formation
Data ingestion
Kinesis Firehose
AWS Glue
AWS Lambda
Data cataloging
AWS Glue Data Catalog
Custom data catalog solution
Data processing
ML data versioning
S3 partitions
Versioned S3 buckets
Purpose-built data version tools
ML feature stores
Data serving for client consumption
Consumption via API
Consumption via data copy
Special databases for ML
Vector databases
Graph databases
Data pipelines
Authentication and authorization
Data governance
Data lineage
Other data governance measures
Hands-on exercise – data management for ML
Creating a data lake using Lake Formation
Creating a data ingestion pipeline
Creating a Glue Data Catalog
Discovering and querying data in the data lake
Creating an Amazon Glue ETL job to process data for ML
Building a data pipeline using Glue workflows
Summary
Exploring Open-Source ML Libraries
Technical requirements
Core features of open-source ML libraries
Understanding the scikit-learn ML library
Installing scikit-learn
Core components of scikit-learn
Understanding the Apache Spark ML library
Installing Spark ML
Core components of the Spark ML library
Understanding the TensorFlow deep learning library
Installing TensorFlow
Core components of TensorFlow
Hands-on exercise – training a TensorFlow model
Understanding the PyTorch deep learning library
Installing PyTorch
Core components of PyTorch
Hands-on exercise – building and training a PyTorch model
How to choose between TensorFlow and PyTorch
Summary
Kubernetes Container Orchestration Infrastructure Management
Technical requirements
Introduction to containers
Overview of Kubernetes and its core concepts
Namespaces
Pods
Deployment
Kubernetes Job
Kubernetes custom resources and operators
Services
Networking on Kubernetes
Security and access management
API authentication and authorization
Hands-on – creating a Kubernetes infrastructure on AWS
Problem statement
Lab instruction
Summary
Open-Source ML Platforms
Core components of an ML platform
Open-source technologies for building ML platforms
Implementing a data science environment
Building a model training environment
Registering models with a model registry
Serving models using model serving services
The Gunicorn and Flask inference engine
The TensorFlow Serving framework
The TorchServe serving framework
KFServing framework
Seldon Core
Triton Inference Server
Monitoring models in production
Managing ML features
Automating ML pipeline workflows
Apache Airflow
Kubeflow Pipelines
Designing an end-to-end ML platform
ML platform-based strategy
ML component-based strategy
Summary
Building a Data Science Environment Using AWS ML Services
Technical requirements
SageMaker overview
Data science environment architecture using SageMaker
Onboarding SageMaker users
Launching Studio applications
Preparing data
Preparing data interactively with SageMaker Data Wrangler
Preparing data at scale interactively
Processing data as separate jobs
Creating, storing, and sharing features
Training ML models
Tuning ML models
Deploying ML models for testing
Best practices for building a data science environment
Hands-on exercise – building a data science environment using AWS services
Problem statement
Dataset description
Lab instructions
Setting up SageMaker Studio
Launching a JupyterLab notebook
Training the BERT model in the Jupyter notebook
Training the BERT model with the SageMaker Training service
Deploying the model
Building ML models with SageMaker Canvas
Summary
Designing an Enterprise ML Architecture with AWS ML Services
Technical requirements
Key considerations for ML platforms
The personas of ML platforms and their requirements
ML platform builders
Platform users and operators
Common workflow of an ML initiative
Platform requirements for the different personas
Key requirements for an enterprise ML platform
Enterprise ML architecture pattern overview
Model training environment
Model training engine using SageMaker
Automation support
Model training lifecycle management
Model hosting environment
Inference engines
Authentication and security control
Monitoring and logging
Adopting MLOps for ML workflows
Components of the MLOps architecture
Monitoring and logging
Model training monitoring
Model endpoint monitoring
ML pipeline monitoring
Service provisioning management
Best practices in building and operating an ML platform
ML platform project execution best practices
ML platform design and implementation best practices
Platform use and operations best practices
Summary
Advanced ML Engineering
Technical requirements
Training large-scale models with distributed training
Distributed model training using data parallelism
Parameter server overview
AllReduce overview
Distributed model training using model parallelism
Naïve model parallelism overview
Tensor parallelism/tensor slicing overview
Implementing model-parallel training
Achieving low-latency model inference
How model inference works and opportunities for optimization
Hardware acceleration
Central processing units (CPUs)
Graphics processing units (GPUs)
Application-specific integrated circuit
Model optimization
Quantization
Pruning (also known as sparsity)
Graph and operator optimization
Graph optimization
Operator optimization
Model compilers
TensorFlow XLA
PyTorch Glow
Apache TVM
Amazon SageMaker Neo
Inference engine optimization
Inference batching
Enabling parallel serving sessions
Picking a communication protocol
Inference in large language models
Text Generation Inference (TGI)
DeepSpeed-Inference
FastTransformer
Hands-on lab – running distributed model training with PyTorch
Problem statement
Dataset description
Modifying the training script
Modifying and running the launcher notebook
Summary
Building ML Solutions with AWS AI Services
Technical requirements
What are AI services?
Overview of AWS AI services
Amazon Comprehend
Amazon Textract
Amazon Rekognition
Amazon Transcribe
Amazon Personalize
Amazon Lex V2
Amazon Kendra
Amazon Q
Evaluating AWS AI services for ML use cases
Building intelligent solutions with AI services
Automating loan document verification and data extraction
Loan document classification workflow
Loan data processing flow
Media processing and analysis workflow
E-commerce product recommendation
Customer self-service automation with intelligent search
Designing an MLOps architecture for AI services
AWS account setup strategy for AI services and MLOps
Code promotion across environments
Monitoring operational metrics for AI services
Hands-on lab – running ML tasks using AI services
Summary
AI Risk Management
Understanding AI risk scenarios
The regulatory landscape around AI risk management
Understanding AI risk management
Governance oversight principles
AI risk management framework
Applying risk management across the AI lifecycle
Business problem identification and definition
Data acquisition and management
Risk considerations
Risk mitigations
Experimentation and model development
Risk considerations
Risk mitigations
AI system deployment and operations
Risk considerations
Risk mitigations
Designing ML platforms with governance and risk management considerations
Data and model documentation
Lineage and reproducibility
Observability and auditing
Scalability and performance
Data quality
Summary
Bias, Explainability, Privacy, and Adversarial Attacks
Understanding bias
Understanding ML explainability
LIME
SHAP
Understanding security and privacy-preserving ML
Differential privacy
Understanding adversarial attacks
Evasion attacks
PGD attacks
HopSkipJump attacks
Data poisoning attacks
Clean-label backdoor attack
Model extraction attack
Attacks against generative AI models
Defense against adversarial attacks
Robustness-based methods
Detector-based method
Open-source tools for adversarial attacks and defenses
Hands-on lab – detecting bias, explaining models, training privacy-preserving models, and simulating adversarial attacks
Problem statement
Detecting bias in the training dataset
Explaining feature importance for a trained model
Training privacy-preserving models
Simulate a clean-label backdoor attack
Summary
Charting the Course of Your ML Journey
ML adoption stages
Exploring AI/ML
Disjointed AI/ML
Integrated AI/ML
Advanced AI/ML
AI/ML maturity and assessment
Technical maturity
Business maturity
Governance maturity
Organization and talent maturity
Maturity assessment and improvement process
AI/ML operating models
Centralized model
Decentralized model
Hub and spoke model
Solving ML journey challenges
Developing the AI vision and strategy
Getting started with the first AI/ML initiative
Solving scaling challenges with AI/ML adoption
Solving ML use case scaling challenges
Solving technology scaling challenges
Solving governance scaling challenges
Summary
Navigating the Generative AI Project Lifecycle
The advancement and economic impact of generative AI
What industries are doing with generative AI
Financial services
Healthcare and life sciences
Media and entertainment
Automotive and manufacturing
The lifecycle of a generative AI project and the core technologies
Business use case selection
FM selection and evaluation
Initial screening via manual assessment
Automated model evaluation
Human evaluation
Assessing AI risks for FMs
Other evaluation consideration
Building FMs from scratch via pre-training
Adaptation and customization
Domain adaptation pre-training
Fine-tuning
Reinforcement learning from human feedback
Prompt engineering
Model management and deployment
The limitations, risks, and challenges of adopting generative AI
Summary
Designing Generative AI Platforms and Solutions
Operational considerations for generative AI platforms and solutions
New generative AI workflow and processes
New technology components
New roles
Exploring generative AI platforms
The prompt management component
FM benchmark workbench
Supervised fine-tuning and RLHF
FM monitoring
The retrieval-augmented generation pattern
Open-source frameworks for RAG
LangChain
LlamaIndex
Evaluating a RAG pipeline
Advanced RAG patterns
Designing a RAG architecture on AWS
Choosing an LLM adaptation method
Response quality
Cost of the adaptation
Implementation complexity
Bringing it all together
Considerations for deploying generative AI applications in production
Model readiness
Decision-making workflow
Responsible AI assessment
Guardrails in production environments
External knowledge change management
Practical generative AI business solutions
Generative AI-powered semantic search engine
Financial data analysis and research workflow
Clinical trial recruiting workflow
Media entertainment content creation workflow
Car design workflow
Contact center customer service operation
Are we close to having artificial general intelligence?
The symbolic approach
The connectionist/neural network approach
The neural-symbolic approach
Summary
Other Books You May Enjoy
Index
Preface
As artificial intelligence (AI) continues to gain traction across diverse industries, the need for proficient machine learning (ML) solutions architects is on the rise. These professionals play a pivotal role in bridging business requirements with ML solutions, crafting ML technology platforms that address both business and technical challenges. This book is designed to equip individuals with a comprehensive understanding of business use cases, ML algorithms, system architecture patterns, ML tools, AI risk management, enterprise AI adoption strategies, and the emerging field of generative AI.
Upon completing this book, you will possess a comprehensive understanding of AI/ML and generative AI topics, encompassing business use cases, scientific principles, technological underpinnings, architectural considerations, risk management, operational aspects, and the journey towards enterprise adoption. Moreover, you will acquire hands-on technical proficiency with a diverse array of open-source and AWS technologies, empowering you to build and deploy cutting-edge AI/ML and generative AI solutions effectively. This holistic knowledge and practical skillset will enable you to articulate and address the multifaceted challenges and opportunities presented by these disruptive technologies.
Who this book is for
This book is designed for two primary audiences: developers and cloud architects who are looking for guidance and hands-on learning materials to become ML solutions architects, and experienced ML architecture practitioners and data scientists who are looking to develop a broader understanding of industry ML use cases, enterprise data and ML architecture patterns, data management and ML tools, ML governance, and advanced ML engineering techniques. This book can also benefit data engineers and cloud system administrators looking to understand how data management and cloud system architecture fit into the overall ML platform architecture. Risk professionals, AI product managers, and technology decision makers will also benefit from topics on AI risk management, business AI use cases, and ML maturity journey and best practices.
This book assumes you have some Python programming knowledge and are familiar with AWS services. Some of the chapters are designed for ML beginners to learn the core ML fundamentals, and they might overlap with the knowledge already possessed by experienced ML practitioners.
What this book covers
Chapter 1, Navigating the ML Lifecycle with ML Solutions Architecture, introduces ML solutions architecture functions, covering its fundamentals and scope.
Chapter 2, Exploring ML Business Use Cases, talks about real-world applications of AI/ML across various industries such as financial services, healthcare, media and entertainment, automotive, manufacturing, and retail.
Chapter 3, Exploring ML Algorithms, introduces common ML and deep learning algorithms for classification, regression, clustering, time series, recommendations, computer vision, natural language processing, and generative AI tasks. You will get hands-on experience of setting up a Jupyter server and building ML models on your local machine.
Chapter 4, Data Management for ML, addresses the crucial topic of data management for ML, detailing how to leverage an array of AWS services to construct robust data management architectures. You will develop hands-on skills with AWS services for building data management pipelines for ML.
Chapter 5, Exploring Open-Source ML Libraries, covers the core features of scikit-learn, Spark ML, PyTorch and TensorFlow, and how to use these ML libraries for data preparation, model training, and model serving. You will practice building deep learning models using TensorFlow and PyTorch.
Chapter 6, Kubernetes Container Orchestration Infrastructure Management, introduces containers, Kubernetes concepts, Kubernetes networking, and Kubernetes security. Kubernetes is a core open-source infrastructure for building open-source ML solutions. You will also practice setting up the Kubernetes platform on AWS EKS and deploying an ML workload in Kubernetes.
Chapter 7, Open-Source ML Platforms, talks about the core concepts and the technical details of various open-source ML platform technologies, such as Kubeflow, MLflow, Airflow, and Seldon Core. The chapter also covers how to use these technologies to build a data science environment and ML automation pipeline.
Chapter 8, Building a Data Science Environment Using AWS ML Services, introduces various AWS managed services for building data science environments, including Amazon SageMaker, Amazon ECR, and Amazon CodeCommit. You will also get hands-on experience with these services to configure a data science environment for experimentation and model training.
Chapter 9, Designing an Enterprise ML Architecture with AWS ML Services, talks about the core requirements for an enterprise ML platform, discusses the architecture patterns and best practices for building an enterprise ML platform on AWS, and dives deep into the various core ML capabilities of SageMaker and other AWS services.
Chapter 10, Advanced ML Engineering, provides insights into advanced ML engineering aspects such as distributed model training and low-latency model serving, crucial for meeting the demands of large-scale model training and high-performance serving requirements. You will also get hands-on experience with distributed data parallel model training using a SageMaker training cluster.
Chapter 11, Building ML Solutions with AWS AI Services, introduces AWS AI services and the types of problems these services can help solve without building an ML model from scratch. You will learn about the core capabilities of some key AI services and where they can be leveraged for building ML-powered business applications.
Chapter 12, AI Risk Management, covers AI risk scenarios, guiding principles, frameworks, and risk mitigation considerations across the entire ML lifecycle. It also explains how ML platforms can facilitate governance through documentation, model inventory maintenance, and monitoring processes.
Chapter 13, Bias, Explainability, Privacy, and Adversarial Attacks, delves into the technical aspects of various risks, providing in-depth explanations of bias detection techniques, model explainability methods, privacy preservation approaches, as well as adversarial attack scenarios and corresponding mitigation strategies.
Chapter 14, Charting the Course of Your ML Journey, outlines the stages of adoption and presents a corresponding maturity model designed to facilitate progress along the ML journey. Additionally, it addresses key considerations essential for overcoming the hurdles encountered throughout this process.
Chapter 15, Navigating the Generative AI Project Lifecycle, discusses the advancement and economic impact of generative AI and the industry trends in its adoption. It then guides readers through the stages of a generative AI project, from ideation to deployment, exploring generative AI technologies along with their limitations and challenges.
Chapter 16, Designing Generative AI Platforms and Solutions, explores generative AI platforms’ architecture, the retrieval-augmented generation (RAG) application architecture and best practices, considerations for generative AI production deployment, and practical generative AI-powered business applications across diverse industry use cases.
The chapter finishes with a discussion on artificial general intelligence (AGI) and various theoretical approaches the research community has taken in their pursuit of AGI.
To get the most out of this book
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
For the hardware/software requirements for the book, all you will need is a Windows or Mac machine, and an AWS account.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-and-Risk-Management-Handbook-Second-Edition/. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/gbp/9781805122500.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
import pandas as pd
churn_data = pd.read_csv("churn.csv")
churn_data.head()
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
# The following command calculates the various statistics for the features.
churn_data.describe()
# The following command displays the histograms for the different features.
# You can replace the column names to plot the histograms for other features
churn_data.hist(['CreditScore', 'Age', 'Balance'])
# The following command calculates the correlations among features
churn_data.corr()
Any command-line input or output is written as follows:
! pip3 install --upgrade tensorflow
Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "An example of a deep learning-based solution is the Amazon Echo virtual assistant."
Warnings or important notes appear like this.
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at
and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at
with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share your thoughts
Once you’ve read The Machine Learning Solutions Architect Handbook, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:
https://packt.link/free-ebook/9781805122500
Submit your proof of purchase.
That’s it! We’ll send your free PDF and other benefits to your email directly.
1
Navigating the ML Lifecycle with ML Solutions Architecture
The field of artificial intelligence (AI) and machine learning (ML) has a long history. Over the last 70+ years, ML has evolved from checkers-playing computer programs in the 1950s to advanced AI capable of beating the human world champion in the game of Go. More recently, Generative AI (GenAI) technology such as ChatGPT has been taking the industry by storm, generating huge interest among company executives and consumers alike, and promising new ways to transform business areas such as drug discovery, new media content, financial report analysis, and consumer product design. Along the way, the technology infrastructure for ML has also evolved from a single machine/server for small experiments and models to highly complex end-to-end ML platforms capable of training, managing, and deploying tens of thousands of ML models. The hyper-growth in the AI/ML field has resulted in the creation of many new professional roles, such as MLOps engineer, AI/ML product manager, ML software engineer, AI risk manager, and AI strategist, across a range of industries.
Machine learning solutions architecture (ML solutions architecture) is another relatively new discipline that is playing an increasingly critical role in the full end-to-end ML lifecycle as ML projects become increasingly complex in terms of business impact, science sophistication, and the technology landscape.
This chapter will help you understand where ML solutions architecture fits in the full data science lifecycle. We will discuss the different steps it will take to get an ML project from the ideation stage to production and the challenges faced by organizations, such as use case identification, data quality issues, and shortage of ML talent when implementing an ML initiative. Finally, we will finish the chapter by briefly discussing the core focus areas of ML solutions architecture, including system architecture, workflow automation, and security and compliance.
In this chapter, we are going to cover the following main topics:
ML versus traditional software
The ML lifecycle and its key challenges
What is ML solutions architecture, and where does it fit in the overall lifecycle?
Upon completing this chapter, you will understand the role of an ML solutions architect and what business and technology areas you need to focus on to support end-to-end ML initiatives. The intent of this chapter is to offer a fundamental introduction to the ML lifecycle for those in the early stages of their exploration in the field. Experienced ML practitioners may wish to skip this foundational overview and proceed directly to more advanced content.
The more advanced section commences in Chapter 4; however, many technical practitioners may find Chapter 2 helpful, as they often need more business understanding of where ML can be applied in different businesses and workflows. Additionally, Chapter 3 could prove beneficial for certain practitioners, as it provides an introduction to ML algorithms for those new to this topic and can also serve as a refresher for those practicing these concepts regularly.
ML versus traditional software
Before I started working in the field of AI/ML, I spent many years building computer software platforms for large financial services institutions. Some of the business problems I worked on had complex rules, such as identifying companies for comparable analysis for investment banking deals or creating a master database for all the different companies’ identifiers from the different data providers. We had to implement hardcoded rules in database-stored procedures and application server backends to solve these problems. We often debated if certain rules made sense or not for the business problems we tried to solve.
As rules changed, we had to reimplement the rules and make sure the changes did not break anything. To test new releases or changes, we often relied on human experts to exhaustively test and validate all the business logic implemented before the production release. It was a very time-consuming and error-prone process and required a significant amount of engineering, testing against the documented specification, and rigorous change management for deployment every time new rules were introduced or existing rules needed to be changed. We often relied on users to report business logic issues in production, and when an issue was reported in production, we sometimes had to open up the source code to troubleshoot or explain the logic of how it worked. I remember I often asked myself if there were better ways to do this.
After I started working in the field of AI/ML, I started to solve many similar challenges using ML techniques. With ML, I did not need to come up with complex decision-making rules, which often require deep data and domain expertise to create and maintain. Instead, I focused on collecting high-quality data and used ML algorithms to learn the rules and patterns from the data directly. This new approach eliminated many of the challenging aspects of creating new rules (for example, the need for deep domain expertise and the difficulty of avoiding human bias) and of maintaining existing rules. To validate the model before the production release, we could examine model performance metrics such as accuracy. While it still required data science expertise to interpret the model metrics against the nature of the business problems and dataset, it did not require exhaustive manual testing of all the different scenarios. When a model was deployed into production, we would check whether it performed as expected by monitoring any significant changes in production data versus the data we had collected for model training. We would collect new production data along with its labels and test the model performance periodically to ensure that its predictive accuracy remained robust when faced with previously unseen production data. To explain why a model made a decision the way it did, we did not need to open up the source code to re-examine the hardcoded logic. Instead, we would rely on ML techniques to help explain the relative importance of different input features to understand what factors were most influential in the decision-making by the ML models.
The following figure shows a graphical view of the process differences between developing a piece of software and training an ML model:
Figure 1.1: ML and computer software
Now that you know the difference between ML and traditional software, it is time to dive deep into understanding the different stages in an ML lifecycle.
ML lifecycle
One of the early ML projects that I worked on was a fascinating yet daunting sports predictive analytics problem for a major league brand. I was given a list of predictive analytics outcomes to think about to see if there were ML solutions for the problems. I was a casual viewer of the sport; I didn’t know anything about the analytics to be generated, nor the rules of the games in the detail that was needed. I was provided with some sample data but had no idea what to do with it.
The first thing I started to work on was an immersion in the sport itself. I delved into the intricacies of the game, studying the different player positions and events that make up each game and play. Only after being armed with the newfound domain knowledge did the data start to make sense. Together with the stakeholder, we evaluated the impact of the different analytics outcomes and assessed the modeling feasibility based on the data we had. With a clear understanding of the data, we came up with a couple of top ML analytics with the most business impact to focus on. We also decided how they would be integrated into the existing business workflow, and how they would be measured on their impacts.
Subsequently, I delved deeper into the data to ascertain what information was available and what was lacking. The raw dataset had a lot of irrelevant data points that needed to be removed while the relevant data points needed to be transformed to provide the strongest signals for model training. I processed and prepared the dataset based on a few of the ML algorithms I had considered and conducted experiments to determine the best approach. I lacked a tool to track the different experiment results, so I had to document what I had done manually. After some initial rounds of experimentation, it became evident that the existing data was not sufficient to train a high-performance model. Hence, I decided to build a custom deep learning model to incorporate data of different modalities as the data points had temporal dependencies and required additional spatial information for the modeling. The data owner was able to provide the additional datasets I required, and after more experiments with custom algorithms and significant data preparations and feature engineering, I eventually trained a model that met the business objectives.
After completing the model, another hard challenge began – deploying and operationalizing the model in production and integrating it into the existing business workflow and system architecture. We engaged in many architecture and engineering discussions and eventually built out a deployment architecture for the model.
As you can see from my personal experience, the journey from business idea to ML production deployment involved many steps. A typical lifecycle of an ML project follows a formal structure, which includes several essential stages like business understanding, data acquisition and understanding, data preparation, model building, model evaluation, and model deployment. Since a big component of the lifecycle is experimentation with different datasets, features, and algorithms, the whole process is highly iterative. Furthermore, it is essential to note that there is no guarantee of a successful outcome. Factors such as the availability and quality of data, feature engineering techniques (the process of using domain knowledge to extract useful features from raw data), and the capability of the learning algorithms, among others, can all affect the final results.
Figure 1.2: ML lifecycle
The preceding figure illustrates the key steps in ML projects, and in the subsequent sections, we will delve into each of these steps in greater detail.
Business problem understanding and ML problem framing
The first stage in the lifecycle is business understanding. This stage involves understanding the business goals and defining business metrics that can measure the project’s success. The following are some examples of business goals:
Cost reduction for operational processes, such as document processing.
Mitigation of business or operational risks, such as fraud and compliance.
Product or service revenue improvements, such as better target marketing, new insight generation for better decision making, and increased customer satisfaction.
To measure success, you may use specific business metrics such as the number of hours saved in a business process, an increase in the number of true positive frauds detected, a conversion rate improvement from target marketing, or a reduction in the churn rate. This is an essential step to get right to ensure there is sufficient justification for an ML project and that the outcome of the project can be successfully measured.
After you have defined the business goals and business metrics, you need to evaluate if there is an ML solution for the business problem. While ML has a wide scope of applications, it is not always an optimal solution for every business problem.
Data understanding and data preparation
The saying that "data is the new oil" holds particularly true for ML. Without the required data, you cannot move forward with an ML project. That’s why the next step in the ML lifecycle is data acquisition, understanding, and preparation.
Based on the business problems and ML approach, you will need to gather and comprehend the available data to determine if you have the right data and data volume to solve the ML problem. For example, suppose the business problem to address is credit card fraud detection. In that case, you will need datasets such as historical credit card transaction data, customer demographics, account data, device usage data, and networking access data. Detailed data analysis is then necessary to determine if the dataset features and quality are sufficient for the modeling tasks. You also need to decide if the data needs labeling, such as fraud or not-fraud. During this step, depending on the data quality, a significant amount of data wrangling might be performed to prepare and clean the data and to generate the datasets for model training and model evaluation.
Model training and evaluation
Using the training and validation datasets established, a data scientist must run a number of experiments using different ML algorithms and dataset features for feature selection and model development. This is a highly iterative process and could require numerous runs of data processing and model development to find the right algorithm and dataset combination for optimal model performance. In addition to model performance, factors such as data bias and model explainability may need to be considered to comply with internal or regulatory requirements.
Prior to deployment into production, the model quality must be validated using the relevant technical metrics, such as the accuracy score. This is usually accomplished using a holdout dataset, also known as a test dataset, to gauge how the model performs on unseen data. It is crucial to understand which metrics are appropriate for model validation, as they vary depending on the ML problems and the dataset used. For example, model accuracy would be a suitable validation metric for a document classification use case if the number of document types is relatively balanced. However, model accuracy would not be a good metric to evaluate the model performance for a fraud detection use case. This is because the number of frauds is small, and even if the model predicts not-fraud all the time, the model accuracy could still be very high.
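To make the imbalance problem concrete, here is a minimal sketch in plain Python. The label counts (10 frauds in 1,000 transactions) are made up for illustration, and recall is used here only as one example of a metric that exposes the problem; it is not necessarily the metric you would choose for a real fraud use case.

```python
# Hypothetical label counts, chosen only to illustrate class imbalance.
y_true = [1] * 10 + [0] * 990      # 1 = fraud, 0 = not-fraud
y_pred = [0] * len(y_true)         # a "model" that always predicts not-fraud


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"recall:   {recall(y_true, y_pred):.2f}")
```

The always-not-fraud model scores 99% accuracy while catching none of the frauds, which is why a metric focused on the minority class is usually examined alongside accuracy for this kind of problem.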
Model deployment
After the model is fully trained and validated to meet the expected performance metric, it can be deployed into production and the business workflow. There are two main deployment concepts here. The first involves the deployment of the model itself to be used by a client application to generate predictions. The second concept is to integrate this prediction workflow into a business workflow application. For example, deploying the credit fraud model would involve either hosting the model behind an API for real-time prediction or packaging it so that it can be loaded dynamically to support batch predictions. Moreover, this prediction workflow also needs to be integrated into business workflow applications for fraud detection, which might include the fraud detection of real-time transactions, decision automation based on prediction output, and detailed fraud analytics.
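The two deployment concepts can be sketched as follows. Everything in this example is hypothetical: FraudModel is a stand-in with a made-up scoring rule rather than a trained model, and handle_request, score_batch, and decide are illustrative names, not part of any particular serving framework.

```python
import json


class FraudModel:
    """Hypothetical stand-in for a trained model; the scoring rule is made up."""

    def predict_proba(self, txn):
        return 0.9 if txn["amount"] > 1000 else 0.1


model = FraudModel()


def handle_request(body: str) -> str:
    """Real-time path: one JSON transaction in, one JSON prediction out.

    In practice this function would sit behind an HTTP API endpoint.
    """
    txn = json.loads(body)
    return json.dumps({"fraud_score": model.predict_proba(txn)})


def score_batch(transactions):
    """Batch path: the model is loaded as a package and scores many records."""
    return [model.predict_proba(t) for t in transactions]


def decide(score, threshold=0.5):
    """Business workflow integration: turn a prediction into an action."""
    return "hold_for_review" if score >= threshold else "approve"


response = json.loads(handle_request('{"amount": 2500}'))
print(decide(response["fraud_score"]))
print(score_batch([{"amount": 20}, {"amount": 5000}]))
```

Keeping the decision logic separate from the model call reflects the distinction above: the model produces a score, and the business workflow application decides what to do with it.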
Model monitoring
The ML lifecycle does not end with model deployment. Unlike software, whose behavior is highly deterministic since developers explicitly code its logic, an ML model could behave differently in production from its behavior in model training and validation. This could be caused by changes in the production data characteristics, data distribution, or the potential manipulation of request data. Therefore, model monitoring is an important post-deployment step for detecting model performance degradation (a.k.a model drift) or dataset distribution change in the production environment (a.k.a data drift).
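One simple way to check for data drift is to compare the distribution of a feature in production against its distribution in the training data. The sketch below uses the population stability index (PSI), which is one of several possible drift measures and is my choice for illustration. The data is synthetic, and the bin count and any alerting threshold are assumptions you would need to tune.

```python
import math
import random


def psi(expected, actual, bins=10):
    """Population stability index between two numeric samples.

    Bins are equal-width over the range of the expected (training) sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


random.seed(0)
train = [random.gauss(50, 10) for _ in range(5000)]         # synthetic training feature
prod_same = [random.gauss(50, 10) for _ in range(5000)]     # production, no drift
prod_shifted = [random.gauss(65, 10) for _ in range(5000)]  # production, mean has shifted

psi_same = psi(train, prod_same)
psi_shifted = psi(train, prod_shifted)
print(f"PSI without drift: {psi_same:.4f}")
print(f"PSI with drift:    {psi_shifted:.4f}")
```

The PSI stays close to zero when production data resembles the training data and grows as the distributions diverge, so a monitoring job could raise an alert when the value crosses a chosen threshold.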
Business metric tracking
The actual business impact should be tracked and measured as an ongoing process to ensure the model delivers the expected business benefits. This may involve comparing the business metrics before and after the model deployment, or A/B testing where a business metric is compared between workflows with or without the ML model. If the model does not deliver the expected benefits, it should be re-evaluated for improvement opportunities. This could also mean framing the business problem as a different ML problem. For example, if churn prediction does not help improve customer satisfaction, then consider a personalized product/service offering to solve the problem.
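As a sketch of the A/B testing idea, the following compares a conversion rate between a workflow without the ML model (control) and one with it (treatment) using a two-proportion z-test. The conversion counts are invented, and the z-test is just one common way to check whether an observed difference is likely to be due to chance.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: conversions and customers in each group.
z, p_value = two_proportion_z_test(conv_a=400, n_a=10_000, conv_b=480, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the improvement is unlikely to be noise, but the business decision still depends on whether the size of the lift justifies the cost of running the model.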
ML challenges
Over the years, I have worked on many real-world problems using ML solutions and encountered different challenges faced by different industries during ML adoption.
I often get the same question when working on ML projects: We have a lot of data – can you help us figure out what insights we can generate using ML? I refer to companies with this question as having a business use case challenge. Not being able to identify business use cases for ML is a very big hurdle for many companies. Without a properly identified business problem and its value proposition and benefit, it becomes difficult to initiate an ML project.
In my conversations with different companies across industries, data-related challenges emerge as a frequent issue. These include data quality, data inventory, data accessibility, data governance, and data availability. This problem affects both data-poor and data-rich companies and is often exacerbated by data silos, data security, and industry regulations.
The shortage of data science and ML talent is another major challenge I have heard from many companies. Companies, in general, are having a tough time attracting and retaining top ML talent, which is a common problem across all industries. As ML platforms become more complex and the scope of ML projects increases, the need for other ML-related functions starts to surface. Nowadays, in addition to data scientists, an organization also needs functional roles for ML product management, ML infrastructure engineering, and ML operations management.
Based on my experiences, I have observed that cultural acceptance of ML-based solutions is another significant challenge for broad adoption. There are individuals who perceive ML as a threat to their job functions, and their lack of knowledge in ML makes them hesitant to adopt these new methods in their business workflows.
The practice of ML solutions architecture aims to help solve some of the challenges in ML. In the next section, we will explore ML solutions architecture and its role in the ML lifecycle.
ML solutions architecture
When I initially worked with companies as an ML solutions architect, the landscape was quite different from what it is now. The focus was mainly on data science and modeling, and the problems at hand were small in scope. Back then, most of the problems could be solved using simple ML techniques. The datasets were small, and the infrastructure required was not too demanding. The scope of the ML initiative at these companies was limited to a few data scientists or teams. As an ML architect at that time, I primarily needed to have solid data science skills and general cloud architecture knowledge to get the job done.
In more recent years, the landscape of ML initiatives has become more intricate and multifaceted, necessitating involvement from a broader range of functions and personas at companies. My engagement has expanded to include discussions with business executives about ML strategies and organizational design to facilitate the broad adoption of AI/ML throughout their enterprises. I have been tasked with designing more complex ML platforms, utilizing a diverse range of technologies for large enterprises to meet stringent security and compliance requirements. ML workflow orchestration and operations have become increasingly crucial topics of discussion, and more and more companies are looking to train large ML models with enormous amounts of training data. The number of ML models trained and deployed by some companies has skyrocketed to tens of thousands from a few dozen models in just a few years. Furthermore, sophisticated and security-sensitive customers have sought guidance on topics such as ML privacy, model explainability, and data and model bias. As an ML solutions architect, I’ve noticed that the skills and knowledge required to be successful in this role have evolved significantly.
Trying to navigate the complexities of a business, data, science, and technology landscape can be a daunting task. As an ML solutions architect, I have seen firsthand the challenges that companies face in bringing all these pieces together. In my view, ML solutions architecture is an essential discipline that serves as a bridge connecting the different components of an ML initiative. Drawing on my years of experience working with companies of all sizes and across diverse industries, I believe that an ML solutions architect plays a pivotal role in identifying business needs, developing ML solutions to address these needs, and designing the technology platforms necessary to run these solutions. By collaborating with various business and technology partners, an ML solutions architect can help companies unlock the full potential of their data and realize tangible benefits from their ML initiatives.
The following figure illustrates the core functional areas covered by the ML solutions architecture:
Figure 1.3: ML solutions architecture coverage
In the following sections, we will explore each of these areas in greater detail:
Business understanding: Business problem understanding and transformation using AI and ML.
Identification and verification of ML techniques: Identification and verification of ML techniques for solving specific ML problems.
System architecture of the ML technology platform: System architecture design and implementation of the ML technology platforms.
MLOps: ML platform automation technical design.
Security and compliance: Security, compliance, and audit considerations for the ML platform and ML models.
So, let’s dive in!
Business understanding and ML transformation
The goal of the business workflow analysis is to identify inefficiencies in the workflows and determine if ML can be applied to help eliminate pain points, improve efficiency, or even create new revenue opportunities.
Picture this: you are tasked with improving a call center’s operations. You know there are inefficiencies that need to be addressed, but you’re not sure where to start. That’s where business workflow analysis comes in. By analyzing the call center’s workflows, you can identify pain points such as long customer wait times, knowledge gaps among agents, and the inability to extract customer insights from call recordings. Once you have identified these issues, you can determine what data is available and which business metrics need to be improved. This is where ML comes in. You can use ML to create virtual assistants for common customer inquiries, transcribe audio recordings to allow for text analysis, and detect customer intent for product cross-sell and up-sell. But sometimes, you need to modify the business process to incorporate ML solutions. For example, if you want to use call recording analytics to generate insights for cross-selling or up-selling products, but there’s no established process to act on those insights, you may need to introduce an automated target marketing process or a proactive outreach process by the sales team.
Identification and verification of ML techniques
Once you have come up with a list of ML options, the next step is to determine if the assumption behind the ML approach is valid. This could involve a simple modeling proof of concept (POC) to validate the available dataset and modeling approach, a technology POC using pre-built AI services, or testing of ML frameworks. For example, you might want to test the feasibility of text transcription from audio files using an existing text transcription service or build a customer propensity model for a new product conversion from a marketing campaign.
It is worth noting that ML solutions architecture does not focus on developing new machine learning algorithms, a job best suited for applied data scientists or research data scientists. Instead, ML solutions architecture focuses on identifying and applying ML algorithms to address a range of ML problems such as predictive analytics, computer vision, or natural language processing. Also, the goal of any modeling task here is not to build production-quality models but rather to validate the approach for further experimentation by full-time applied data scientists.
System architecture design and implementation
The most important aspect of the ML solutions architect’s role is the technical architecture design of the ML platform. The platform will need to provide the technical capability to support the different phases of the ML cycle and personas, such as data scientists and operations engineers. Specifically, an ML platform needs to have the following core functions:
Data exploration and experimentation: Data scientists use ML platforms for data exploration, experimentation, model building, and model evaluation. ML platforms need to provide capabilities such as data science development tools for model authoring and experimentation, data wrangling tools for data exploration and wrangling, source code control for code management, and a package repository for library package management.
Data management and large-scale data processing: Data scientists or data engineers will need the technical capability to ingest, store, access, and process large amounts of data for cleansing, transformation, and feature engineering.
Model training infrastructure management: ML platforms will need to provide model training infrastructure for different types of model training using different types of computing resources, storage, and networking configurations. They also need to support different types of ML libraries or frameworks, such as scikit-learn, TensorFlow, and PyTorch.
Model hosting/serving: ML platforms will need to provide the technical capability to host and serve the model to generate predictions in real time, in batch, or both.
Model management: Trained ML models will need to be managed and tracked for easy access and lookup, with relevant metadata.
Feature management: Common and reusable features will need to be managed and served for model training and model serving purposes.
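To illustrate the model management function from the list above, here is a minimal in-memory sketch of a model registry. The class names, fields, and the example storage URIs are all hypothetical; a real platform would back this with durable storage and access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelRecord:
    """Metadata tracked for one registered model version (illustrative fields)."""
    name: str
    version: int
    artifact_uri: str
    metrics: dict
    registered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ModelRegistry:
    """Hypothetical in-memory registry keyed by model name."""

    def __init__(self):
        self._records = {}

    def register(self, name, artifact_uri, metrics):
        # Versions are assigned sequentially per model name.
        version = len(self._records.get(name, [])) + 1
        record = ModelRecord(name, version, artifact_uri, dict(metrics))
        self._records.setdefault(name, []).append(record)
        return record

    def latest(self, name):
        return self._records[name][-1]

    def best(self, name, metric):
        # Assumes that a higher metric value is better.
        return max(self._records[name], key=lambda r: r.metrics[metric])


registry = ModelRegistry()
registry.register("churn", "s3://example-bucket/churn/v1", {"auc": 0.84})
registry.register("churn", "s3://example-bucket/churn/v2", {"auc": 0.81})

print(registry.latest("churn").version)
print(registry.best("churn", "auc").artifact_uri)
```

Looking up models by name, version, and metric is what makes trained models easy to find and compare, which is the purpose of keeping the metadata alongside each artifact.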
ML platform workflow automation
A key aspect of ML platform design is workflow automation and continuous integration/continuous deployment (CI/CD), also known as MLOps. ML is a multi-step workflow that needs to be automated, including data processing, model training, model validation, and model hosting. Infrastructure provisioning automation and self-service is another aspect of automation design. Key components of workflow automation include the following:
Pipeline design and management: The ability to create different automation pipelines for various tasks, such as model training and model hosting.
Pipeline execution and monitoring: The ability to run different pipelines and monitor the pipeline execution status for the entire pipeline and each