
MRA PROJECT

MILESTONE II
Contents
EXPLORATORY DATA ANALYSIS
1. Problem Statement
2. Data Description
3. Univariate Analysis (Numerical & Categorical Variables)
4. Bivariate Analysis
5. Multivariate Analysis
6. Trends – months/years/quarters/days
Contents
MARKET BASKET ANALYSIS (Association Rules)
1. MBA & Association Rules
2. Support, Lift and Confidence - Threshold values
3. KNIME Workflow
4. Associations
5. Interpretations
6. Recommendations
7. Possible Combo Suggestions
Problem Statement EDA

A Grocery Store shared the transactional data with you. Your job is to identify the most popular
combos that can be suggested to the Grocery Store chain after a thorough analysis of the most
commonly occurring sets of menu items in the customer orders. The Store doesn’t have any combo
meals. Can you suggest the best combo meals?
EXPLORATORY DATA ANALYSIS (EDA)
Data Description EDA

Figures: Top 5 rows of data set; Bottom 5 rows of data set; Data Description; Data Info

1. This data set records the products purchased under each Order ID on a given
date in a Grocery store.
2. There are 3 columns and 20641 rows
3. There are no null entries
4. There are 2 data types in the data set
   1. Object
   2. Integer
5. There are 37 unique Products
6. There are 1139 unique Order IDs
7. One of the columns is a Date column
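
The checks listed above (row and column counts, null entries, unique values) could be reproduced along these lines. This is a minimal sketch using only the Python standard library; the column names Date, Order_id and Product and the sample rows are assumptions for illustration, not the actual file.

```python
import csv
import io

# Hypothetical sample in a three-column layout (column names are assumed).
# The last row has an empty Product to show how a null entry is detected.
SAMPLE_CSV = """Date,Order_id,Product
01-01-2018,1,yogurt
01-01-2018,1,pork
01-01-2018,2,poultry
02-01-2018,3,poultry
02-01-2018,3,
"""

def describe(rows):
    """Return row count, column count, null count and unique values per column."""
    columns = list(rows[0].keys())
    return {
        "rows": len(rows),
        "columns": len(columns),
        "nulls": sum(1 for r in rows for c in columns if not r[c]),
        "unique": {c: len({r[c] for r in rows if r[c]}) for c in columns},
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = describe(rows)
print(summary)
```

On the real data set the same summary would be expected to report 20641 rows, 3 columns, no nulls, 37 unique products and 1139 unique Order IDs, as stated above.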
Univariate Analysis EDA

Inferences
1. Poultry has the highest count - 640
2. Hand Soap has the lowest count - 502
3. There are 37 different products
4. No product has a count less than 500
5. Only Poultry has a count greater than 600
6. The other 36 products have a count between 500 and 600
7. The mean count per product is 543
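
Product counts like those above can be obtained with a simple frequency count. The sketch below uses a small hypothetical product list (the real data set has 37 products) and is only meant to show the idea.

```python
from collections import Counter
from statistics import mean

# Hypothetical Product column values, for illustration only.
products = ["poultry", "soap", "poultry", "eggs", "poultry", "eggs"]

counts = Counter(products)

# Product with the highest count and product with the lowest count.
most_common_product, highest = counts.most_common(1)[0]
least_common_product, lowest = min(counts.items(), key=lambda kv: kv[1])

# Mean count per product.
mean_count = mean(list(counts.values()))

print(most_common_product, highest)
print(least_common_product, lowest)
print(mean_count)
```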
Univariate Analysis EDA

Inferences
1. The OrderID ranges from 1 to 1139. Each OrderID denotes a
customer who purchased products from the store
2. There are 4 IDs with the highest purchase count
of 34: 226, 957, 1013 and 1071
3. There are 2 IDs with the lowest count of 3: 1139
and 408
4. The bottom 20 IDs have a count from 3 to 5
5. The top 20 IDs have a count from 32 to 34
Univariate Analysis EDA

Inferences
1. The years 2018 and 2019 have the same number of sales.
2. Sales in 2020 are less than 75% of the sales in 2018 or 2019.
Bi-variate Analysis EDA

Inferences
1. In the year 2018, Cereal sold the most and Sandwich loaves the least.
2. In the year 2019, Poultry sold the most and Hand soap the least.
3. In the year 2020, Dinner rolls sold the most and Sugar the least.
4. Poultry was bought by 640 distinct OrderIDs, making it the most bought
product.
5. There were 533 distinct OrderIDs in sales in the year 2018. This fell by 26 (to 507) in the
year 2019.
6. Only 99 active customers made a purchase in the year 2020. That is about 18.6% of the
customers who were active in the year 2018 and 19.5% of those active in the year 2019.
7. In the year 2020, no product exceeded 100 sales.
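
The count of active customers per year, and one year's count as a percentage of another's, could be computed as sketched below. The (year, order ID) pairs are hypothetical and the helper name share is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical (year, order_id) pairs; repeated pairs mean several products
# bought under the same order.
records = [(2018, 1), (2018, 1), (2018, 2), (2018, 3), (2018, 4),
           (2019, 5), (2019, 6), (2019, 6), (2020, 7)]

# Collect the distinct order IDs seen in each year.
orders_by_year = defaultdict(set)
for year, order_id in records:
    orders_by_year[year].add(order_id)

active = {year: len(ids) for year, ids in orders_by_year.items()}

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

print(active)
print(share(active[2020], active[2018]))
print(share(active[2020], active[2019]))
```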
Multi-variate Analysis EDA
Trends EDA

Inferences
1. Product counts and Order ID counts follow a similar trend.
2. Whether viewed by Year, Quarter, Month or Day, the trend is decreasing.
3. This shows that the store is facing a decrease in sales in terms of both Order IDs and
Products.
4. The most drastic change was from 2019 to 2020.
5. The end of every month records the minimum sales.
6. The end of every year records the minimum sales.
MARKET BASKET ANALYSIS (MBA)
MBA & Association Rules MBA

MBA – Market Basket Analysis


1. Market basket analysis in data mining analyses the
combinations of products that have been bought together.
2. It is a technique for the careful study of
purchases made by customers in a supermarket.
3. It identifies patterns among items that customers
frequently purchase together.
4. Market Basket Analysis is modelled on Association rule
mining.
MBA & Association Rules MBA

Association Rules
1. Association rules are used to predict the likelihood of
products being purchased together.
2. These rules count the frequency of items that occur
together, seeking to find associations that occur far
more often than expected
3. In this problem, using MBA and Association rules will
help find the best combos and recommendations to
improve sales of the grocery store
4. Apriori is an algorithm for frequent item set mining and
association rule learning over relational databases
Support, Confidence & Lift MBA

The Three measures


SUPPORT: the percentage of transactions in the database that contain all the items in the rule (A and B together)

CONFIDENCE: the percentage of transactions containing A that also contain B

LIFT: the ratio of the Confidence of the rule A → B to the Support of B
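
The three measures can be illustrated on a handful of baskets. This is a minimal sketch with hypothetical transactions; the function names are assumptions for illustration and are not part of any KNIME node.

```python
# Hypothetical baskets: each set holds the products of one order.
baskets = [
    {"poultry", "dinner rolls", "spaghetti sauce"},
    {"poultry", "dinner rolls"},
    {"poultry", "juice"},
    {"dinner rolls", "juice"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Rule: poultry <--- [dinner rolls]
a, b = {"dinner rolls"}, {"poultry"}
print(support(a | b, baskets), confidence(a, b, baskets), lift(a, b, baskets))
```

In this toy example the lift is below 1, so buying dinner rolls makes poultry slightly less likely than its baseline frequency.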


KNIME Workflow MBA
Workflow
1 Dataset is grouped by OrderID
2 Product column is converted to Sets
3 Threshold values are provided
4 Consequences, Implies and Items columns are renamed
5 The output is exported into a csv file
Values
Support = 0.05
Confidence = 0.5
Itemset Length = 3
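
For readers without KNIME, the five workflow steps can be approximated in plain Python. This is only a sketch under stated assumptions: it counts itemsets by brute force rather than with Apriori, it is not the KNIME node's implementation, and the function names mine_rules and to_csv and the sample rows are hypothetical. The thresholds are the ones listed above.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

MIN_SUPPORT = 0.05       # threshold values from the slide
MIN_CONFIDENCE = 0.5
MAX_ITEMSET_LENGTH = 3

def mine_rules(rows, min_support=MIN_SUPPORT, min_confidence=MIN_CONFIDENCE,
               max_len=MAX_ITEMSET_LENGTH):
    """rows: iterable of (order_id, product). Returns a list of rule dicts."""
    # Steps 1-2: group by order ID and collect each order's products as a set.
    baskets = defaultdict(set)
    for order_id, product in rows:
        baskets[order_id].add(product)
    n = len(baskets)

    # Count every itemset of up to max_len items (brute force, not Apriori).
    counts = defaultdict(int)
    for items in baskets.values():
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1

    # Step 3: apply the thresholds; each rule recommends a single item.
    rules = []
    for itemset, count in list(counts.items()):
        sup = count / n
        if len(itemset) < 2 or sup < min_support:
            continue
        for item in sorted(itemset):
            antecedent = itemset - {item}
            conf = count / counts[antecedent]
            if conf >= min_confidence:
                item_support = counts[frozenset([item])] / n
                # Step 4: column names follow the output table in this deck.
                rules.append({
                    "Support": round(sup, 3),
                    "Confidence": round(conf, 3),
                    "Lift": round(conf / item_support, 3),
                    "Recommended_item": item,
                    "Item_list": sorted(antecedent),
                })
    return rules

def to_csv(rules):
    """Step 5: serialise the rules as CSV text."""
    fields = ["Support", "Confidence", "Lift", "Recommended_item", "Item_list"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rules)
    return buffer.getvalue()

# Hypothetical (order_id, product) rows, for illustration only.
rows = [(1, "poultry"), (1, "dinner rolls"), (2, "poultry"),
        (2, "dinner rolls"), (3, "poultry"), (4, "juice")]
rules = mine_rules(rows)
print(to_csv(rules))
```

Brute-force counting is acceptable here because baskets are small and the itemset length is capped at 3; on larger data an Apriori-style search that prunes infrequent itemsets early would be the usual choice.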
KNIME Workflow MBA

Output Table – Top 25 Rules


Support Confidence Lift Recommended_item Recommended_with Item_list
0.065 0.507 1.203 poultry <--- [fruits, pork]
0.065 0.503 1.327 soap <--- [sandwich loaves, laundry detergent]
0.066 0.500 1.297 bagels <--- [pork, sugar]
0.066 0.500 1.417 flour <--- [dishwashing liquid/detergent, sandwich loaves]
0.066 0.500 1.331 mixes <--- [butter, hand soap]
0.066 0.510 1.358 individual meals <--- [sandwich loaves, laundry detergent]
0.066 0.500 1.268 waffles <--- [dishwashing liquid/detergent, sandwich loaves]
0.067 0.500 1.280 soda <--- [pasta, pork]
0.067 0.500 1.186 poultry <--- [pasta, pork]
0.067 0.521 1.351 bagels <--- [fruits, pork]
0.067 0.507 1.339 laundry detergent <--- [sandwich bags, sugar]
0.067 0.507 1.202 poultry <--- [sandwich bags, sugar]
0.067 0.503 1.336 juice <--- [spaghetti sauce, flour]
0.067 0.507 1.339 laundry detergent <--- [butter, hand soap]
0.067 0.507 1.297 cheeses <--- [dishwashing liquid/detergent, sandwich loaves]
0.068 0.503 1.274 lunch meat <--- [shampoo, tortillas]
0.068 0.500 1.280 soda <--- [flour, beef]
0.068 0.503 1.294 dinner rolls <--- [shampoo, tortillas]
0.068 0.503 1.339 mixes <--- [shampoo, tortillas]
0.068 0.503 1.349 spaghetti sauce <--- [sandwich loaves, milk]
0.068 0.513 1.366 individual meals <--- [dishwashing liquid/detergent, sandwich loaves]
0.068 0.503 1.368 sandwich bags <--- [butter, pork]
0.068 0.520 1.384 mixes <--- [dishwashing liquid/detergent, sandwich loaves]
0.068 0.503 1.277 waffles <--- [fruits, juice]
0.068 0.500 1.280 cheeses <--- [sandwich loaves, sugar]
KNIME Workflow MBA

Output Table

Top 5 Support Values
Support Confidence Lift Recommended_item Recommended_with Item_list
0.195 0.501 1.189 poultry <--- [dinner rolls]
0.099 0.579 1.490 dinner rolls <--- [spaghetti sauce, poultry]
0.099 0.577 1.368 poultry <--- [dinner rolls, spaghetti sauce]
0.099 0.509 1.364 spaghetti sauce <--- [dinner rolls, poultry]
0.096 0.545 1.447 juice <--- [poultry, aluminum foil]

Top 5 Confidence Values
Support Confidence Lift Recommended_item Recommended_with Item_list
0.076 0.585 1.388 poultry <--- [sandwich loaves, laundry detergent]
0.099 0.579 1.490 dinner rolls <--- [spaghetti sauce, poultry]
0.099 0.577 1.368 poultry <--- [dinner rolls, spaghetti sauce]
0.079 0.573 1.36 poultry <--- [mixes, sugar]
0.087 0.566 1.342 poultry <--- [lunch meat, mixes]

Top 5 Lift Values
Support Confidence Lift Recommended_item Recommended_with Item_list
0.083 0.563 1.498 individual meals <--- [sandwich loaves, lunch meat]
0.099 0.579 1.490 dinner rolls <--- [spaghetti sauce, poultry]
0.078 0.56 1.486 juice <--- [shampoo, spaghetti sauce]
0.083 0.514 1.47 sandwich loaves <--- [cheeses, ketchup]
0.086 0.547 1.467 spaghetti sauce <--- [dinner rolls, juice]
Associations MBA

Associations Identified
1. At a Support value of 0.05 and a Confidence value of 0.5, 1188 rules are formed, and the
lift is greater than 1 for all 1188 rules. This means there is a positive correlation
within each itemset.
2. The higher the Support value, the more frequently the itemset is ordered
3. The higher the Confidence, the more likely the product combo will succeed
4. Lift = 1 → No correlation within itemset
Lift > 1 → Positive correlation within itemset (Optimum Lift)
Lift < 1 → Negative correlation within itemset
Interpretations MBA

1. Since the Lift value is more than 1 for all the rules formed in the analysis, there is a
positive correlation within each itemset.
2. As shown in the table below, Poultry is the most recommended item. It is the recommended item in 225
rules.
3. Butter, hand soap and pork have zero recommendations.
4. Poultry accounts for 18.9% of the total recommendations.
5. The second most recommended item is Soda. It accounts for 5.7% of the total recommendations, which is 13.2
percentage points less than Poultry.
Recommended_item No. of recommends
poultry 225
soda 68
lunch meat 67
yogurt 65
cheeses 63
eggs 62
waffles 54
ice cream 52
dinner rolls 51
dishwashing liquid/detergent 46
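
The percentages quoted for Poultry and Soda can be checked with a short calculation. The sketch assumes that the total number of recommendations equals the 1188 rules reported earlier, since each rule recommends one item; the counts are copied from the table above.

```python
TOTAL_RULES = 1188  # assumed total: one recommended item per rule

# Recommendation counts copied from the table above.
recommends = {"poultry": 225, "soda": 68, "lunch meat": 67}

def share_percent(count, total=TOTAL_RULES):
    """Share of all recommendations, in percent."""
    return 100 * count / total

poultry_share = share_percent(recommends["poultry"])
soda_share = share_percent(recommends["soda"])
gap = poultry_share - soda_share  # difference in percentage points

print(f"{poultry_share:.1f}% {soda_share:.1f}% {gap:.1f} points")
# prints: 18.9% 5.7% 13.2 points
```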
Recommendations MBA

1. An introductory offer for new customers, with a reward-points scheme for the next purchase, can
be implemented.
2. Poultry could be suggested as a combo offer with most food and snack items, such as
dinner rolls and spaghetti sauce. Creating combos with Poultry will be highly beneficial.
3. Offers such as “Buy 2 Soda, get 1 Poultry free” can be used to increase sales of Soda.
4. A 25% to 30% discount on items like Soda, Lunch meat, Yogurt, Cheeses, Eggs, etc. will help
increase their sales.
5. Items that have nil recommendations require no attention.
6. Items with high support, confidence and lift should be given offers such as combo packs, “buy 1
get 1 free” and discounts to increase sales.
Possible Combo Suggestions MBA

1. The best 5 recommendations are shown below. These have the maximum Lift values.
2. Each rule recommends one item for the given combo of items.
3. It means that, for instance, someone who buys Cheeses and Ketchup is more likely to buy
Sandwich Loaves.
4. For each combo of items shown below, offering the respective recommended item alongside it
is expected to result in higher sales.
individual meals <--- [sandwich loaves, lunch meat]
dinner rolls <--- [spaghetti sauce, poultry]
juice <--- [shampoo, spaghetti sauce]
sandwich loaves <--- [cheeses, ketchup]
spaghetti sauce <--- [dinner rolls, juice]
THANK YOU
PUVYA RAVI

