Meri Nova’s Post


I help data and ml enthusiasts land 6-figure jobs | Data Scientist & ML Engineer | Founder @Break Into Data | ADHD + C-PTSD advocate

You can’t call yourself a Machine Learning Engineer if you don’t know how to deploy your models. And no, it is not just about wrapping functions in API endpoints with Flask or BentoML and calling it a day.

When I ran the ML Coding Challenge at Break Into Data, I advised participants to work on smaller problems, which meant working on one model. In reality, however, big tech companies have hundreds or thousands of models running in production at any given moment, and every single model has its own deployment architecture matching its software and product requirements.

So here are 4 main deployment strategies you should know:

1. Batch prediction (Figure 1) - Predictions are generated at a defined frequency, stored in a SQL or in-memory database, and retrieved as needed. For example, Amazon might regenerate each user's top recommended products every 4 hours and show them when the user logs in. You can read more about the architecture here - https://lnkd.in/gAbRCvdX

2. Online prediction (Figure 2) - Rather than waiting hours or days, predictions are generated as soon as they are needed and served to users right away. Online inference also lets us use recent streaming features, such as product or user activity updates from the last 10 minutes. You can read more on the differences between online and batch prediction here - https://lnkd.in/ggV7xGaS

3. Real-time deployment (Figure 3) - One of the hardest deployment architectures: requests must be answered within a matter of milliseconds. Consider stock market predictions or air traffic control, where latency is among the highest priorities. To handle additional parallel requests from other users, you need multi-threaded processes and horizontal scaling by adding additional servers. Learn more here - https://lnkd.in/gQHKyqS6

4. Edge deployment (Figure 4) - The model is deployed directly on the client, such as a local machine, a mobile phone, or an IoT device. This allows offline predictions and the fastest inference, but models need to be lightweight enough to fit on smaller hardware. Check out a CV use case here - https://lnkd.in/g8KSxXTG

If you want to learn more about practical ML engineering, sign up here - merinova.substack.com

#machinelearning #ai
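The difference between the first two strategies can be sketched in a few lines of Python. This is a toy illustration, not anyone's production architecture: the "model" is a stand-in function, the prediction store is a plain dict standing in for a SQL or in-memory database, and the "streaming feature" is a hypothetical recent-clicks list.

```python
# A stand-in "model": in practice this would be a trained ML model.
def model_predict(user_id: int) -> list[int]:
    # Hypothetical recommender: returns top-3 product ids for a user.
    return [(user_id * 7 + k) % 100 for k in range(3)]

# --- Batch prediction -------------------------------------------------
# Predictions are generated on a schedule (e.g. every 4 hours) and
# written to a store (SQL/Redis in production; a dict here).
prediction_store: dict[int, list[int]] = {}

def run_batch_job(user_ids: list[int]) -> None:
    for uid in user_ids:
        prediction_store[uid] = model_predict(uid)

def serve_batch(user_id: int) -> list[int]:
    # At request time we only do a cheap lookup; the model never runs.
    return prediction_store.get(user_id, [])

# --- Online prediction ------------------------------------------------
# The model runs at request time, so it can use fresh features.
def serve_online(user_id: int, recent_clicks: list[int]) -> list[int]:
    base = model_predict(user_id)
    # Boost recently clicked products to the front (toy streaming feature).
    return sorted(base, key=lambda p: p not in recent_clicks)

run_batch_job([1, 2, 3])
print(serve_batch(2))         # precomputed at batch time
print(serve_online(2, [16]))  # computed now, using recent activity
```

The trade-off this sketch shows: batch serving is a cheap lookup but can be hours stale, while online serving pays model latency on every request in exchange for reacting to activity from minutes ago.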

[Figures 1–4: deployment architecture diagrams attached to the post]
Stevance Nikoloski

Data Scientist | University Lecturer | Expert in data science, AI/ML/LLM solutions. We are offering LLMOps platform either on premises or PaaS

3w

Hi Meri Nova, it is a very nice and useful post. But I don't think it is good or ethical to use graphics without referencing them. The real-time deployment architecture diagram is from Aurimas Griciūnas, and you should credit that. https://www.linkedin.com/posts/aurimas-griciunas_genai-llm-machinelearning-activity-7180831680532733952-34ih?utm_source=combined_share_message&utm_medium=member_desktop

Venkata Naga Sai Kumar Bysani

Data Scientist | BCBS Of South Carolina | SQL | Python | AWS | ML | Featured on Times Square, Fox, NBC | MS in Data Science at UConn | Proven track record in driving actionable insights and predictive analytics |

3w

Great breakdown, Meri:)

Shane Butler

DS @ Stripe | Data Science Coach

3w

Absolutely, and it doesn't stop at deployment either. Ensuring your models remain effective and reliable over time involves continuous monitoring and maintenance. Model retraining, managing drift, and implementing feedback loops are critical to maintaining the performance of your deployed models in production. Many of the DS investigations I've been a part of when we need to understand changes in metrics start with: "has anything changed in our ML model performance?"

Anuj Yadav

Technology Leader | X- AVP Technology IXIGO | Microsoft | Oracle | Expedia | MakeMyTrip

3w

I can't agree more! Evaluating the need and the functioning is difficult without understanding how you want your models to learn. Yes, once it's clearly explained, others can help. For example, someone good at K8s and Docker can help you with the setup using a CI/CD platform, but they need clear instructions or detailed information. My $0.02 => Every engineer is a product manager for infrastructure. If not, challenges and late nights are waiting.

Kartik Singhal

Senior Machine Learning Engineer @ Meta

3w

I'd advise anyone trying to improve their MLE skills to save this post and also subscribe! Thanks for sharing, Meri.

David Leon

Performance Optimizations & Algorithm Developer @ Mobileye | Distributed ML Researcher

3w

Nerlnet is an excellent open-source framework for practicing and running experiments with online training in distributed systems. Edge models are part of distributed systems, and indeed they should solve smaller problems that are gathered together to form the solution to a bigger problem. Search for Nerlnet on GitHub and give it a try.


Alex Belov

Meri, wonderful insights on ML deployment strategies! I'm curious, how adaptable are these strategies in projects with continuously evolving data?

Saul Ramirez, Ph.D.

Data Scientist | ML Research Engineer | LLM Wizard

3w

Meri Nova Awesome post, where was this yesterday before my interview when I was asked to describe a system for online predictions 😅

Sravya Madipalli

Senior Manager, Data Science | Ex-Microsoft

3w

Meri Nova, thanks for the breakdown and, more importantly, for giving us great resources to learn more about them.
