Machine Learning Infrastructure and Best Practices for Software Engineers: Take your machine learning software from a prototype to a fully fledged software system

About this ebook

Although creating a machine learning pipeline or developing a working prototype of a software system from that pipeline is straightforward nowadays, the journey toward a professional software system is still extensive. This book provides best practices and recipes that help software engineers transform prototype pipelines into complete software products.
The book begins by introducing the main concepts of professional software systems that leverage machine learning at their core. As you progress, you’ll explore the differences between traditional, non-ML software and machine learning software. The initial best practices will guide you in determining the type of software you need for your product. Subsequently, you will delve into algorithms – their selection, development, and testing – before exploring the infrastructure for machine learning systems and defining best practices for identifying the right data sources and ensuring their quality.
Towards the end, you’ll address the most challenging aspect of large-scale machine learning systems – ethics. By exploring and defining best practices for assessing ethical risks and strategies for mitigation, you will conclude the book where it all began – large-scale machine learning software.

Language: English
Release date: Jan 31, 2024
ISBN: 9781837636945


    Book preview

    Machine Learning Infrastructure and Best Practices for Software Engineers - Miroslaw Staron


    Machine Learning Infrastructure and Best Practices for Software Engineers

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Niranjan Naikwadi

    Publishing Product Manager: Yasir Ali Khan

    Book Project Manager: Hemangi Lotlikar

    Senior Editor: Sushma Reddy

    Technical Editor: Kavyashree K S

    Copy Editor: Safis Editing

    Proofreader: Safis Editing

    Indexer: Hemangini Bari

    Production Designer: Gokul Raj S.T

    DevRel Marketing Coordinator: Vinishka Kalra

    First published: January 2024

    Production reference: 1170124

    Published by

    Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK

    ISBN 978-1-83763-406-4

    www.packtpub.com

    Writing a book with a lot of practical examples requires a lot of extra time, which is often taken from family and friends. I dedicate this book to my family – Alexander, Cornelia, Viktoria, and Sylwia – who always supported and encouraged me, and to my parents and parents-in-law, who shaped me to be who I am.

    – Miroslaw Staron

    Contributors

    About the author

    Miroslaw Staron is a professor of Applied IT at the University of Gothenburg in Sweden with a focus on empirical software engineering, measurement, and machine learning. He is currently editor-in-chief of Information and Software Technology and co-editor of the regular Practitioner’s Digest column of IEEE Software. He has authored books on automotive software architectures, software measurement, and action research. He also leads several projects in AI for software engineering and leads an AI and digitalization theme at Software Center. He has written over 200 journal and conference articles.

    I would like to thank my family for their support in writing this book. I would also like to thank my colleagues from the Software Center program who provided me with the ability to develop my ideas and knowledge in this area – in particular, Wilhelm Meding, Jan Bosch, Ola Söder, Gert Frost, Martin Kitchen, Niels Jørgen Strøm, and several other colleagues. One person who really ignited my interest in this area is of course Mirosław Mirek Ochodek, to whom I am extremely grateful. I would also like to thank the funders of my research, who supported my studies throughout the years. I would like to thank my Ph.D. students, who challenged me and encouraged me to always dig deeper into the topics. I’m also very grateful to the reviewers of this book – Hongyi Zhang and Sushant K. Pandey, who provided invaluable comments and feedback for the book. Finally, I would like to extend my gratitude to my publishing team – Hemangi Lotlikar, Sushma Reddy, and Anant Jaint – this book would not have materialized without you!

    About the reviewers

    Hongyi Zhang is a researcher at Chalmers University of Technology with over five years of experience in the fields of machine learning and software engineering. Specializing in machine learning, edge/cloud computing, and software engineering, his research merges machine learning theory and software applications, driving tangible improvements in industrial machine learning ecosystems.

    Sushant Kumar Pandey is a dedicated post-doctoral researcher at the Department of CSE, Chalmers at the University of Gothenburg, Sweden, who seamlessly integrates academia with industry, collaborating with Volvo Cars in Gothenburg. Armed with a Ph.D. in CSE from the esteemed Indian Institute of Technology (BHU), India, Sushant specializes in the application of AI in software engineering. His research advances technology’s transformative potential. As a respected reviewer for prestigious venues such as IST, KBS, EASE, and ESWA, Sushant actively contributes to shaping the discourse in his field. Beyond research, he leverages his expertise to mentor students, fostering innovation and excellence in the next generation of professionals.

    Table of Contents

    Preface

    Part 1: Machine Learning Landscape in Software Engineering

    1

    Machine Learning Compared to Traditional Software

    Machine learning is not traditional software

    Supervised, unsupervised, and reinforcement learning – it is just the beginning

    An example of traditional and machine learning software

    Probability and software – how well they go together

    Testing and evaluation – the same but different

    Summary

    References

    2

    Elements of a Machine Learning System

    Elements of a production machine learning system

    Data and algorithms

    Data collection

    Feature extraction

    Data validation

    Configuration and monitoring

    Configuration

    Monitoring

    Infrastructure and resource management

    Data serving infrastructure

    Computational infrastructure

    How this all comes together – machine learning pipelines

    References

    3

    Data in Software Systems – Text, Images, Code, and Their Annotations

    Raw data and features – what are the differences?

    Images

    Text

    Visualization of output from more advanced text processing

    Structured text – source code of programs

    Every data has its purpose – annotations and tasks

    Annotating text for intent recognition

    Where different types of data can be used together – an outlook on multi-modal data models

    References

    4

    Data Acquisition, Data Quality, and Noise

    Sources of data and what we can do with them

    Extracting data from software engineering tools – Gerrit and Jira

    Extracting data from product databases – GitHub and Git

    Data quality

    Noise

    Summary

    References

    5

    Quantifying and Improving Data Properties

    Feature engineering – the basics

    Clean data

    Noise in data management

    Attribute noise

    Splitting data

    How ML models handle noise

    References

    Part 2: Data Acquisition and Management

    6

    Processing Data in Machine Learning Systems

    Numerical data

    Summarizing the data

    Diving deeper into correlations

    Summarizing individual measures

    Reducing the number of measures – PCA

    Other types of data – images

    Text data

    Toward feature engineering

    References

    7

    Feature Engineering for Numerical and Image Data

    Feature engineering

    Feature engineering for numerical data

    PCA

    t-SNE

    ICA

    Locally linear embedding

    Linear discriminant analysis

    Autoencoders

    Feature engineering for image data

    Summary

    References

    8

    Feature Engineering for Natural Language Data

    Natural language data in software engineering and the rise of GitHub Copilot

    What a tokenizer is and what it does

    Bag-of-words and simple tokenizers

    WordPiece tokenizer

    BPE

    The SentencePiece tokenizer

    Word embeddings

    FastText

    From feature extraction to models

    References

    Part 3: Design and Development of ML Systems

    9

    Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning)

    Why do we need different types of models?

    Classical machine learning models

    Convolutional neural networks and image processing

    BERT and GPT models

    Using language models in software systems

    Summary

    References

    10

    Training and Evaluating Classical Machine Learning Systems and Neural Networks

    Training and testing processes

    Training classical machine learning models

    Understanding the training process

    Random forest and opaque models

    Training deep learning models

    Misleading results – data leaking

    Summary

    References

    11

    Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders

    From classical ML to GenAI

    The theory behind advanced models – AEs and transformers

    AEs

    Transformers

    Training and evaluation of a RoBERTa model

    Training and evaluation of an AE

    Developing safety cages to prevent models from breaking the entire system

    Summary

    References

    12

    Designing Machine Learning Pipelines (MLOps) and Their Testing

    What ML pipelines are

    ML pipelines

    Elements of MLOps

    ML pipelines – how to use ML in the system in practice

    Deploying models to HuggingFace

    Downloading models from HuggingFace

    Raw data-based pipelines

    Pipelines for NLP-related tasks

    Pipelines for images

    Feature-based pipelines

    Testing of ML pipelines

    Monitoring ML systems at runtime

    Summary

    References

    13

    Designing and Implementing Large-Scale, Robust ML Software

    ML is not alone

    The UI of an ML model

    Data storage

    Deploying an ML model for numerical data

    Deploying a generative ML model for images

    Deploying a code completion model as an extension

    Summary

    References

    Part 4: Ethical Aspects of Data Management and ML System Development

    14

    Ethics in Data Acquisition and Management

    Ethics in computer science and software engineering

    Data is all around us, but can we really use it?

    Ethics behind data from open source systems

    Ethics behind data collected from humans

    Contracts and legal obligations

    References

    15

    Ethics in Machine Learning Systems

    Bias and ML – is it possible to have an objective AI?

    Measuring and monitoring for bias

    Other metrics of bias

    Developing mechanisms to prevent ML bias from spreading throughout the system

    Summary

    References

    16

    Integrating ML Systems in Ecosystems

    Ecosystems

    Creating web services over ML models using Flask

    Creating a web service using Flask

    Creating a web service that contains a pre-trained ML model

    Deploying ML models using Docker

    Combining web services into ecosystems

    Summary

    References

    17

    Summary and Where to Go Next

    To know where we’re going, we need to know where we’ve been

    Best practices

    Current developments

    My view on the future

    Final remarks

    References

    Index

    Other Books You May Enjoy

    Preface

Machine learning has gained a lot of popularity in recent years. The introduction of large language models such as GPT-3 and GPT-4 has only accelerated the field’s development. These large language models have become so powerful that it is almost impossible to train them on a local computer. However, this is not necessary at all: these models can be used to create new tools without any additional training, because they can be steered through the context window and the prompt.

    In this book, my goal is to show how machine learning models can be trained, evaluated, and tested – both in the context of a small prototype and in the context of a fully-fledged software product. The primary objective of this book is to bridge the gap between theoretical knowledge and practical implementation of machine learning in software engineering. It aims to equip you with the skills necessary to not only understand but also effectively implement and innovate with AI and machine learning technologies in your professional pursuits.

    The journey of integrating machine learning into software engineering is as thrilling as it is challenging. As we delve into the intricacies of machine learning infrastructure, this book serves as a comprehensive guide, navigating through the complexities and best practices that are pivotal for software engineers. It is designed to bridge the gap between the theoretical aspects of machine learning and the practical challenges faced during implementation in real-world scenarios.

    We begin by exploring the fundamental concepts of machine learning, providing a solid foundation for those new to the field. As we progress, the focus shifts to the infrastructure – the backbone of any successful machine learning project. From data collection and processing to model training and deployment, each step is crucial and requires careful consideration and planning.

A significant portion of the book is dedicated to best practices. These practices are not just theoretical guidelines; they are derived from real-life experiences and case studies from my research team’s work in this field. These best practices offer invaluable insights into handling common pitfalls and ensuring the scalability, reliability, and efficiency of machine learning systems.

    Furthermore, we delve into the ethics of data and machine learning algorithms. We explore the theories behind ethics in machine learning, look closer into the licensing of data and models, and finally, explore the practical frameworks that can quantify bias in data and models in machine learning.

    This book is not just a technical guide; it is a journey through the evolving landscape of machine learning in software engineering. Whether you are a novice eager to learn, or a seasoned professional seeking to enhance your skills, this book aims to be a valuable resource, providing clarity and direction in the exciting and ever-changing world of machine learning.

    Who this book is for

    This book is meticulously crafted for software engineers, computer scientists, and programmers who seek practical applications of artificial intelligence and machine learning in their field. The content is tailored to impart foundational knowledge on working with machine learning models, viewed through the lens of a programmer and system architect.

The book presupposes familiarity with programming principles, but it does not demand expertise in mathematics or statistics. This approach ensures accessibility to a broader range of professionals and enthusiasts in the software development domain. If you have no prior experience with Python, you will need to pick up the basics of the language, but the material is structured to help you grasp the essentials quickly and comprehensively. Conversely, if you are proficient in Python but not yet seasoned in professional programming, this book serves as a valuable resource for transitioning into the realm of software engineering with a focus on AI and ML applications.

    What this book covers

    Chapter 1

, Machine Learning Compared to Traditional Software, explores where these two types of software systems are most appropriate. We learn about the software development processes that programmers use to create both types of software, and we also learn about the four classical types of machine learning software – rule-based, supervised, unsupervised, and reinforcement learning. Finally, we learn about the different roles of data in traditional and machine learning software.

    Chapter 2

    , Elements of a Machine Learning System, reviews each element of a professional machine learning system. We start by understanding which elements are important and why. Then, we explore how to create such elements and how to work by putting them together into a single machine learning system – the so-called machine learning pipeline.

    Chapter 3

, Data in Software Systems – Text, Images, Code, and Their Annotations, introduces three data types – images, texts, and formatted text (program source code). We explore how each of these types of data can be used in machine learning, how they should be annotated, and for what purpose. Introducing these three types of data also lets us explore different ways of annotating them.

    Chapter 4

    , Data Acquisition, Data Quality, and Noise, dives deeper into topics related to data quality. We go through a theoretical model for assessing data quality and we provide methods and tools to operationalize it. We also look into the concept of noise in machine learning and how to reduce it by using different tokenization methods.

    Chapter 5

, Quantifying and Improving Data Properties, dives deeper into the properties of data and how to improve them. In contrast to the previous chapter, we work on feature vectors rather than raw data. The feature vectors are already a transformation of the data; therefore, we can change such properties as noise or even change how the data is perceived. We focus on the processing of text, which is an important part of many machine learning algorithms nowadays. We start by understanding how to transform data into feature vectors using simple algorithms, such as bag of words.
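As a quick illustration of this kind of transformation, the following sketch turns two short documents into bag-of-words feature vectors using scikit-learn’s CountVectorizer; the example sentences and variable names are illustrative and not taken from the chapter:

# A minimal bag-of-words sketch using scikit-learn (illustrative data, not from the book)
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the build failed after the last commit",
    "the last commit fixed the failing test",
]

vectorizer = CountVectorizer()
feature_vectors = vectorizer.fit_transform(documents)  # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())  # vocabulary learned from the documents
print(feature_vectors.toarray())           # each column counts one word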

    Chapter 6

, Processing Data in Machine Learning Systems, dives deeper into the ways in which data and algorithms are entangled. We talk a lot about data in generic terms, but in this chapter, we explain what kind of data is needed in machine learning systems. We explain that all kinds of data are ultimately used in numerical form – either as a feature vector or as more complex feature matrices. Then, we explain the need to transform unstructured data (e.g., text) into structured data. This chapter lays the foundations for going deeper into each type of data, which is the content of the next few chapters.

    Chapter 7

, Feature Engineering for Numerical and Image Data, focuses on the feature engineering process for numerical and image data. We start by going through typical methods such as Principal Component Analysis (PCA), which we used previously for visualization. We then move on to more advanced methods such as t-distributed Stochastic Neighbor Embedding (t-SNE) and Independent Component Analysis (ICA). We end with the use of autoencoders as a dimensionality reduction technique for both numerical and image data.
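To give a flavor of these techniques, here is a minimal PCA sketch with scikit-learn on synthetic data; the chapter itself works with the book’s own datasets:

# A minimal PCA sketch with scikit-learn (synthetic data, for illustration only)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))        # 100 data points, 10 numerical features

pca = PCA(n_components=2)             # keep the two strongest components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # how much variance each component explains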

    Chapter 8

    , Feature Engineering for Natural Language Data, explores the first steps that made the transformer (GPT) technologies so powerful – feature extraction from natural language data. Natural language is a special kind of data source in software engineering. With the introduction of GitHub Copilot and ChatGPT, it became evident that machine learning and artificial intelligence tools for software engineering tasks are no longer science fiction.
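As a small taste of what tokenizers do, the following sketch loads a pre-trained WordPiece tokenizer via the Hugging Face transformers library; the bert-base-uncased checkpoint is an illustrative choice (it is downloaded on first use) and is not necessarily the one used in the chapter:

# A minimal tokenization sketch using Hugging Face transformers
# (the bert-base-uncased checkpoint is an illustrative choice, downloaded on first use)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "def fibRec(n): return n if n < 2 else fibRec(n-1) + fibRec(n-2)"
tokens = tokenizer.tokenize(text)      # WordPiece sub-word tokens
ids = tokenizer.encode(text)           # token IDs, including special tokens

print(tokens)
print(ids)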

    Chapter 9

, Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning), explores different types of machine learning systems. We start from classical machine learning models such as random forest and move on to convolutional and GPT models, which are known as deep learning models because they are composed of many layers. They use raw data as input, and their first layers perform feature extraction; they are also designed to progressively learn more abstract features as the input data moves through the model. This chapter demonstrates each of these types of models and progresses from classical machine learning to generative AI models.

    Chapter 10

, Training and Evaluating Classical Machine Learning Systems and Neural Networks, goes a bit deeper into the process of training and evaluation. We start with the basic theory behind different algorithms and then show how they are trained. We begin with classical machine learning models, exemplified by decision trees. Then, we gradually move toward deep learning, where we explore both dense neural networks and some more advanced types of networks.

    Chapter 11

, Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders, explores how generative AI models based on GPT and Bidirectional Encoder Representations from Transformers (BERT) work. These models are designed to generate new data based on the patterns that they were trained on. We also look at the concept of autoencoders, training an autoencoder to generate new images based on the data it was trained on.
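As an illustration of the autoencoder idea, here is a minimal PyTorch sketch that learns to reconstruct its input; the layer sizes, the random batch, and the single training step are placeholders rather than the chapter’s actual setup:

# A minimal autoencoder sketch in PyTorch (architecture and sizes are illustrative)
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)                # a batch of flattened images (random, for illustration)
optimizer.zero_grad()
reconstruction = model(x)
loss = loss_fn(reconstruction, x)      # the model learns to reconstruct its own input
loss.backward()
optimizer.step()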

    Chapter 12

, Designing Machine Learning Pipelines (MLOps) and Their Testing, describes how the main goal of MLOps is to bridge the gap between data science and operations teams, fostering collaboration and ensuring that machine learning projects can be effectively and reliably deployed at scale. MLOps helps to automate and optimize the entire machine learning life cycle, from model development to deployment and maintenance, thus improving the efficiency and effectiveness of ML systems in production. In this chapter, we learn how machine learning systems are designed and operated in practice. The chapter shows how pipelines are turned into a software system, with a focus on testing ML pipelines and their deployment at Hugging Face.
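To hint at what such a pipeline and its testing can look like in code, here is a minimal feature-based scikit-learn pipeline with a simple smoke test; the synthetic data and the assertion are illustrative, not the chapter’s actual tests:

# A minimal feature-based pipeline sketch with a simple smoke test
# (synthetic data; a real pipeline would load and validate production data)
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=42)),
])
pipeline.fit(X, y)

# Smoke test: the pipeline must accept a single sample and return a valid class
prediction = pipeline.predict(X[:1])
assert prediction[0] in (0, 1)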

    Chapter 13

, Designing and Implementing Large-Scale, Robust ML Software, explains how to integrate a machine learning model with a graphical user interface programmed in Gradio and with storage in a database. We use two examples of machine learning pipelines – the defect prediction model from the previous chapters and a generative AI model that creates pictures from a natural language prompt.

    Chapter 14

    , Ethics in Data Acquisition and Management, starts by exploring a few examples of unethical systems that show bias, such as credit ranking systems that penalize certain minorities. We also explain the problems with using open source data and revealing the identities of subjects. The core of the chapter, however, is the explanation and discussion on ethical frameworks for data management and software systems, including the IEEE and ACM codes of conduct.

    Chapter 15

, Ethics in Machine Learning Systems, focuses on bias in machine learning systems. We start by exploring and briefly discussing the sources of bias. We then explore ways to spot biases, how to minimize them, and finally, how to communicate potential biases to the users of our system.
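As one concrete example of spotting bias, the following sketch computes the demographic parity difference between two groups on synthetic predictions; the chapter may rely on different metrics and tooling:

# A minimal sketch of one common bias metric, demographic parity difference
# (synthetic predictions; illustrative only)
import numpy as np

predictions = np.array([1, 0, 1, 1, 0, 1, 0, 0])                  # model decisions (1 = positive outcome)
group       = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # protected attribute

rate_a = predictions[group == "A"].mean()   # positive rate for group A
rate_b = predictions[group == "B"].mean()   # positive rate for group B

print(abs(rate_a - rate_b))                 # 0.0 would indicate demographic parity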

    Chapter 16

, Integrating ML Systems in Ecosystems, explains how packaging ML systems into web services allows us to integrate them into workflows in a very flexible way. Instead of compiling or using dynamically linked libraries, we can deploy machine learning components that communicate over HTTP using JSON payloads. In fact, we have already seen this approach when using the GPT-3 model hosted by OpenAI. In this chapter, we explore the possibility of creating our own Docker container with a pre-trained machine learning model, deploying it, and integrating it with other components.
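The following sketch shows the general shape of such a web service using Flask; the model file name, route, and payload format are assumptions for illustration, not the book’s exact code:

# A minimal Flask web service around a pre-trained model
# (model.joblib, the /predict route, and the payload format are assumed for illustration)
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")   # a previously trained and serialized model (assumed to exist)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                       # e.g., {"features": [0.1, 0.2, 0.3]}
    prediction = model.predict([payload["features"]])  # wrap the single sample in a list
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

A client could then send an HTTP POST request with a JSON body such as {"features": [0.1, 0.2, 0.3]} to the /predict endpoint and receive the prediction as JSON.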

    Chapter 17

    , Summary and Where to Go Next, revisits all the best practices and summarizes them per chapter. In addition, we also look into what the future of machine learning and AI may bring to software engineering.

    To get the most out of this book

    In this book, we use Python and PyTorch, so you need to have these two installed on your system. I used them on Windows and Linux, but they can also be used in cloud environments such as Google Colab or GitHub Codespaces (both were tested).

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-Infrastructure-and-Best-Practices-for-Software-Engineers. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The model itself is created one line above, in the model = LinearRegression() line.

    A block of code is set as follows:

def fibRec(n):
    if n < 2:
        return n
    else:
        return fibRec(n-1) + fibRec(n-2)

    Any command-line input or output is written as follows:

    >python app.py

    Best practices

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

Once you’ve read Machine Learning Infrastructure and Best Practices for Software Engineers, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your eBook purchase not compatible with the device of your choice?

    Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

    Follow these simple steps to get the benefits:

    Scan the QR code or visit the link below

    Download a free PDF copy of this book

    https://packt.link/free-ebook/978-1-83763-406-4

    Submit your proof of purchase

    That’s it! We’ll send your free PDF and other benefits to your email directly

Part 1: Machine Learning Landscape in Software Engineering

Traditionally, Machine Learning (ML) was considered to be a niche domain in software engineering. No large software systems used statistical learning in production. This began to change in the 2010s, when recommendation systems started to utilize large quantities of data – for example, to recommend movies, books, or music. With the rise of transformer technologies, the change accelerated. Widely known products such as ChatGPT popularized these techniques and showed that they are no longer niche, but part of mainstream software products and services. Software engineering needs to keep up, and we need to know how to create software based on these modern machine learning models. In this first part of the book, we look at how machine learning changes software development and how we need to adapt to these changes.

    This part has the following chapters:

    Chapter 1

    , Machine Learning Compared to Traditional Software

    Chapter 2

    , Elements of a Machine Learning System

    Chapter 3

, Data in Software Systems – Text, Images, Code, and Their Annotations

    Chapter 4

    , Data Acquisition, Data Quality, and Noise

    Chapter 5

    , Quantifying and Improving Data Properties

    1

    Machine Learning Compared to Traditional Software

Machine learning software is a special kind of software that finds patterns in data, learns from them, and even recreates these patterns on new data. Developing machine learning software is, therefore, focused on finding the right data, matching it with the appropriate algorithm, and evaluating its performance. Traditional software, by contrast, is developed with the algorithm in mind. Based on software requirements, programmers develop algorithms that solve specific tasks and then test them. Data is secondary, although not completely unimportant. Both types of software can co-exist in the same software system, but the programmer must ensure compatibility between them.

    In this chapter, we’ll explore where these two types of software systems are most appropriate. We’ll learn about the software development processes that programmers use to create both types of software. We’ll also learn about the four classical types of machine learning software – rule-based learning, supervised learning, unsupervised learning, and reinforcement learning. Finally, we’ll learn about the different roles of data in traditional and machine learning software – as input to pre-programmed algorithms in traditional software and input to training models in machine learning software.

    The best practices introduced in this chapter provide practical guidance on when to choose each type of software and how to assess the advantages and disadvantages of these types. By exploring a few modern examples, we’ll understand how to create an entire software system with machine learning algorithms at the center.

    In this chapter, we’re going to cover the following main topics:

Machine learning is not traditional software

Probability and software – how well they go together

Testing and evaluation – the same but different

    Machine learning is not traditional software

Although machine learning and artificial intelligence have been around since the 1950s, when Alan Turing introduced the idea, they only became popular with early expert systems such as MYCIN, and our understanding of machine learning systems has changed over time. It was not until the 2010s that we started to perceive, design, and develop machine learning in the same way as we do today (in 2023). In my view, two pivotal moments shaped the landscape of machine learning as we see it today.

The first pivotal moment was the focus on big data in the late 2000s and early 2010s. With the introduction of smartphones, companies started to collect and process increasingly large quantities of data, mostly about our behavior online. One of the companies that perfected this was Google, which collected data about our searches, online behavior, and usage of Google’s operating system, Android. As the volume and velocity of the collected data increased, so did its value and the need for its veracity – together with variety, these make up the five Vs. These five Vs – volume, velocity, value, veracity, and variety – required a new approach to working with data. The classical approach of relational (SQL) databases was no longer sufficient. Relational databases became too slow at handling high-velocity data streams, which gave way to map-reduce algorithms, distributed databases, and in-memory databases. Relational schemas became too constraining for the variety of data, which gave way to NoSQL databases that store documents.

The second pivotal moment was the rise of modern machine learning algorithms – deep learning. Deep learning algorithms are designed to handle unstructured data such as text, images, or music (compared to structured data in the form of tables and matrices). Classical machine learning algorithms, such as regression, decision trees, or random forest, require data in a tabular form. Each row is a data point, and each column is one feature of that data point.
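As a minimal illustration of this tabular form, the following sketch builds a small feature table and trains a classical model on it; the column names and values are made up for the example:

# Classical models expect tabular data: rows are data points, columns are features
# (illustrative data, not from the book)
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "lines_of_code": [120, 340, 56, 980],
    "num_commits":   [3, 12, 1, 40],
    "has_defect":    [0, 1, 0, 1],
})

X = data[["lines_of_code", "num_commits"]]   # feature columns
y = data["has_defect"]                       # label column

model = DecisionTreeClassifier().fit(X, y)
print(model.predict(X.iloc[:1]))             # predict for the first row (one data point)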
