🔥 Matt Dancho 🔥’s Post

Helping 7,000+ learn Data Science for Business | Marketing Analytics | Time Series Forecasting | Quantitative Finance || @mdancho84 on Twitter

XGBoost is now the number 1 go-to algorithm in my data science toolkit. But for years, I had no clue what I was doing. In 3 minutes, I’ll share 3 months of research (business case included). Let’s go:

1. XGBoost: The name stands for Extreme Gradient Boosting. It is an advanced implementation of the gradient boosting machine (GBM) algorithm, developed to optimize both computational speed and model performance.

2. Gradient Boosting Machine (GBM): GBMs are an ensemble approach that combines multiple weak learners (typically decision trees) to create a strong predictive model.

3. Difference between GBM and Random Forest: Random Forest also uses decision trees. The difference is how the trees are built. GBM adds weak learners (shorter trees) sequentially, where each one corrects the errors of its predecessors. RF builds strong learners (large trees) independently, so they can be trained in parallel, using Bootstrap Aggregation (Bagging).

4. Performance: XGBoost is an ultra-fast implementation of GBM that offers high efficiency, scalability, the ability to handle sparse data, parallel learning, and regularization to reduce overfitting. XGBoost tends to be more efficient than Random Forest and traditional GBM, and often provides better performance. This is why I like it so much.

5. Business Use Case (Lead Scoring): I use XGBoost in many business cases. Let’s start with the one that made the most impact, a $12,000,000 sales increase. In 2 years of developing a Lead Scoring Model that made my company $12,000,000, I used a number of different algorithms. I started with the most basic: Logistic Regression, to estimate the probability of a customer purchasing. Over time, I improved it with better algorithms and better, more complex features. The final iteration used XGBoost as a key model.

6. Business Use Case (Time Series): I later discovered that I could use XGBoost on time series for forecasting sales demand at the product level. This was a major improvement over less-scalable techniques (ARIMA, ETS) that had to be run iteratively on every product. We had 12,000+ products. XGBoost cut training times from 3 days to about 4 minutes. One thing to watch out for: because it’s tree-based, XGBoost cannot extrapolate a trend, so its predictions stay roughly within the range of target values it saw in training. Differencing may therefore be required.

There you have it: my top 6 concepts on XGBoost.

The next problem you'll face is how to apply data science to business. I'd like to help. I’ve spent 100 hours consolidating my learnings into a free 5-day course, How to Solve Business Problems with Data Science. It comes with:

300+ lines of R and Python code
5 bonus trainings
2 systematic frameworks
1 complete roadmap to avoid mistakes and start solving business problems with data science, TODAY.

👉 Here it is for free: https://lnkd.in/e_EkiuFD
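To make two of the points above concrete (each new tree corrects its predecessors, and tree-based forecasts cannot follow a trend past the training range), here is a minimal sketch in plain Python. It is a toy gradient booster built from depth-1 trees with squared-error loss, not XGBoost itself, and it leaves out regularization, sparse-data handling, and parallelism. The function names (`fit_stump`, `fit_boosted`, `predict`) and the synthetic sales series are assumptions made purely for illustration.

```python
# Toy gradient boosting with depth-1 trees (stumps). Illustration only, NOT XGBoost.

def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error on the residuals."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighboring x values
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]  # (threshold, left_value, right_value)

def fit_boosted(x, y, n_rounds=200, learning_rate=0.3):
    """Add stumps one at a time; each is fit to what the current ensemble gets wrong."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if xi <= t else rv)
                 for xi, p in zip(x, preds)]
    return base, stumps, learning_rate

def predict(model, xi):
    base, stumps, lr = model
    return base + sum(lr * (lv if xi <= t else rv) for t, lv, rv in stumps)

# Synthetic series with a steady upward trend (hypothetical data).
weeks = list(range(20))
sales = [100 + 5 * w for w in weeks]  # last observed value is 195

# 1) Model the sales level directly: the forecast flattens after the last week.
level_model = fit_boosted(weeks, sales)
flat_forecast = predict(level_model, 25)        # true value would be 225
last_seen_fit = predict(level_model, weeks[-1])

# 2) Model week-over-week differences instead, then add them back up.
diffs = [b - a for a, b in zip(sales, sales[1:])]
diff_model = fit_boosted(weeks[1:], diffs)
diff_forecast = sales[-1] + sum(predict(diff_model, w) for w in range(20, 26))

print("level model, week 25:", flat_forecast)
print("level model, week 19:", last_seen_fit)
print("differenced model, week 25:", diff_forecast)
```

Every split threshold lies inside the training weeks, so any future week falls into the same leaves as the last observed week and gets the same prediction. Modeling the differences sidesteps this because the change per week stays within the range the model has already seen.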

Joshua Ebner

AI and data science consulting. Reach out for a consult.

1mo

Boosted trees are consistently one of the best performing ML algorithms, and they're good for solving many problems. But it's worth noting that boosted trees and other ensemble methods are often more difficult to deploy and serve up in production.

So, the question is: what is the last tree doing that the first isn’t doing?

Muhammad Ishtiaq Khan

60K+ on LinkedIn | Leading Big Data Analytics & Automation at Etisalat e& UAE (PTCL & Ufone Group Internal Audit) | Python, R, PowerBI, SQL, DWH & Tableau | Data Science - Machine Learning - Continuous Auditing

1mo

Your insights into its application, especially in lead scoring and time series forecasting, are truly enlightening. The dramatic reduction in training time for 12,000+ products is particularly impressive.

Gene I.

Product Manager (AI / AdTech / e-commerce)

1mo

thanks, that's great reading time this weekend

Adetunji Onafuwa

Data scientist & Machine Learning Engineer @ up2parts GmbH

1mo

Very informative
