
Naive Bayes

Naive Bayes
• Naive Bayes is a probabilistic supervised
machine learning algorithm that can be used
in a wide variety of classification tasks.
• Typical applications include spam filtering,
document classification, sentiment prediction,
etc. It is based on the work of Thomas Bayes
(1702–61), hence the name.
Why is it called ‘Naive’?
• The name naive is used because the algorithm
assumes that the features that go into the model
are independent of each other.
• That is, changing the value of one feature does
not directly influence or change the value of
any of the other features used in the
algorithm.
Pros
• It is easy and fast to predict the class of a test data set.
• It also performs well in multi-class prediction.
• When the assumption of independence holds, a Naive
Bayes classifier performs better than other models
such as logistic regression, and you need
less training data.
• It performs well with categorical input
variables compared to numerical variable(s). For
numerical variables, a normal distribution is assumed
(a bell curve, which is a strong assumption); see the sketch below.
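Where the inputs are numerical, scikit-learn's GaussianNB applies exactly this normal-distribution assumption. A minimal sketch follows; the toy data is invented for illustration.

```python
# Minimal sketch: Gaussian Naive Bayes on numerical features.
# The toy data below is invented for illustration only.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two numerical features (e.g. height in cm, weight in kg), binary class label
X = np.array([[170, 65], [180, 80], [160, 55], [175, 75],
              [155, 50], [165, 60], [185, 85], [150, 45]])
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])

model = GaussianNB()   # fits a per-class normal distribution to each feature
model.fit(X, y)

print(model.predict([[172, 70]]))        # predicted class
print(model.predict_proba([[172, 70]]))  # class probabilities (see the caution later)
```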
When to use Naive Bayes
• When the data set is labelled
• When the data set is large
• When the attributes are independent
Cons
• If a categorical variable has a category in the test
data set that was not observed in the training
data set, the model will assign it a zero
probability and will be unable to make a
prediction. This is often known as the “zero
frequency” problem. To solve it, we can use a
smoothing technique; one of the simplest
smoothing techniques is Laplace estimation
(see the sketch below).
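A minimal sketch of the zero-frequency problem and of add-one (Laplace) smoothing; the counts below are invented for illustration.

```python
# Sketch of the zero-frequency problem and Laplace (add-one) smoothing.
# The counts are invented for illustration, not from a real data set.

def likelihood(count_value_in_class, count_class, n_values, alpha=0.0):
    """P(feature = value | class), optionally with Laplace smoothing."""
    return (count_value_in_class + alpha) / (count_class + alpha * n_values)

# Suppose the category 'Overcast' was never seen with class 'no' in training
# (0 occurrences out of 5 'no' rows, for a feature with 3 possible values):
print(likelihood(0, 5, n_values=3))            # 0.0 -> wipes out the whole product
print(likelihood(0, 5, n_values=3, alpha=1))   # 1/8 = 0.125 -> small but non-zero
```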
Cons
• On the other hand, Naive Bayes is also known to be
a bad estimator, so the probability outputs
from predict_proba are not to be taken too
seriously.
• Another limitation of Naive Bayes is the
assumption of independent predictors. In real
life, it is almost impossible to get a set of
predictors that are completely independent.
Applications of Naive Bayes Algorithms
• Real-time prediction: Naive Bayes is an eager
learning classifier and it is fast, so it can
be used for making predictions in real time.
• Multi-class prediction: This algorithm is also well
known for its multi-class prediction capability. Here we
can predict the probability of multiple classes of the
target variable.
• Text classification / spam filtering / sentiment
analysis (see the sketch after this list)
• Recommendation systems
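A minimal sketch of spam filtering with a multinomial Naive Bayes model in scikit-learn; the tiny corpus and its labels are invented for illustration.

```python
# Minimal sketch: spam filtering with Multinomial Naive Bayes.
# The tiny corpus and labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "cheap meds online",
         "meeting at 10 am tomorrow", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed the multinomial likelihoods;
# alpha=1.0 is Laplace smoothing for words unseen in a class.
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(texts, labels)

print(clf.predict(["free prize meeting"]))  # -> ['spam'] on this toy corpus
```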
What is Conditional Probability?
• Coin Toss
When you flip a fair coin, there is an equal
chance of getting either heads or tails, so you
can say the probability of getting heads is 50%.
• Fair Dice
Similarly, what would be the probability of getting
a 1 when you roll a die with 6 faces? Assuming
the die is fair, the probability is 1/6 ≈ 0.167.
What is Conditional Probability?
• Playing Cards
• If you pick a card from the deck, can you guess
the probability of getting a queen given that the card
is a spade?
• Well, I have already set a condition that the card
is a spade. So, the denominator (the eligible
population) is 13 and not 52. And since there is
only one queen among the spades, the probability that it is a
queen given the card is a spade is 1/13 ≈ 0.077.
What is Conditional Probability?
• This is a classic example of conditional
probability. So, when you say the conditional
probability of A given B, it denotes the
probability of A occurring given that B has
already occurred.
• Mathematically, the conditional probability of A
given B can be computed as:
P(A|B) = P(A AND B) / P(B)
What is Conditional Probability?
• School Example
• Let’s see a slightly more complicated example. Consider a
school with a total population of 100 persons. These
100 persons can be seen either as ‘Students’ and
‘Teachers’ or as a population of ‘Males’ and
‘Females’.
• Given the tabulation of the 100 people, what
is the conditional probability that a certain member
of the school is a ‘Teacher’ given that he is a ‘Male’?
• To calculate this, you may intuitively filter the
sub-population of 60 males and focus on the
12 (male) teachers.
• So the required conditional probability is
P(Teacher | Male) = 12 / 60 = 0.2.
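The same calculation in a couple of lines of Python; the counts (60 males, 12 male teachers) come from the example above.

```python
# Conditional probability from counts (the school example above).
males = 60          # sub-population given by the condition
male_teachers = 12  # members of that sub-population who are teachers

p_teacher_given_male = male_teachers / males
print(p_teacher_given_male)  # 0.2
```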
Bayes Rule
• The Bayes rule that we use for Naive Bayes
can be derived from the two conditional
probabilities P(A|B) and P(B|A).
Bayes’ theorem
• The probability of an event, based on prior
knowledge of conditions that might be
related to the event.

P(A|B) = P(B|A) * P(A) / P(B), where P(B) is known as the marginal probability
Naive Bayes’ theorem
Example
• Let us say P(Fire) means how often there is fire, and
P(Smoke) means how often we see smoke. Then:
• P(Fire|Smoke) means how often there is fire when
we can see smoke
• P(Smoke|Fire) means how often we can see
smoke when there is fire
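A short sketch that plugs made-up numbers into Bayes’ rule for the fire/smoke example; none of the probabilities below come from the slides.

```python
# Bayes' rule on the fire/smoke example, with made-up numbers.
p_fire = 0.01              # prior: how often there is fire
p_smoke = 0.10             # marginal: how often we see smoke
p_smoke_given_fire = 0.90  # likelihood: how often we see smoke when there is fire

# P(Fire | Smoke) = P(Smoke | Fire) * P(Fire) / P(Smoke)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(p_fire_given_smoke)  # 0.09
```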
Example
• today = (Sunny, Cool, High, True). Play or not?
Naïve Bayes

P(Play = yes | X(today)) = P(X(today) | Play = yes) * P(Play = yes) / P(X(today))

P(Play = no | X(today)) = P(X(today) | Play = no) * P(Play = no) / P(X(today))
Example
• today = (Sunny, Hot, Normal, False). Play or not? (A sketch covering both example queries follows.)
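The sketch below reproduces the hand calculation for both queries, assuming the slides use the standard 14-row ‘play tennis’ weather data set; if the slide tables differ, the counts (and therefore the scores) would change.

```python
# Naive Bayes by hand on the classic "play tennis" weather data.
# Assumption: the standard 14-row data set below matches the slide tables.

data = [  # (Outlook, Temperature, Humidity, Windy, Play)
    ("Sunny", "Hot", "High", False, "No"),      ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),  ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),  ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),  ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),   ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),("Rainy", "Mild", "High", True, "No"),
]

def score(query, label):
    """Unnormalised P(X(today) | Play = label) * P(Play = label)."""
    rows = [r for r in data if r[-1] == label]
    p = len(rows) / len(data)                 # prior P(Play = label)
    for i, value in enumerate(query):         # product of per-feature likelihoods
        p *= sum(r[i] == value for r in rows) / len(rows)
    return p

for query in [("Sunny", "Cool", "High", True), ("Sunny", "Hot", "Normal", False)]:
    s_yes, s_no = score(query, "Yes"), score(query, "No")
    print(query, "->", "Play" if s_yes > s_no else "Don't play",
          round(s_yes, 4), round(s_no, 4))
```

With this data set the first query comes out as “Don’t play” and the second as “Play”.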
Probability Basics
• A probability function P() returns the probability of
an event.
• Probability functions for categorical features are
referred to as probability mass functions.
• Probability functions for continuous features
are known as probability density functions.
• A joint probability refers to the probability of
an assignment of specific values to multiple
different features, e.g. p(a=T, b=T, c=F).
Probability Basics
• Conditional probability refers to the probability of
one feature taking a specific value given that we
already know the value of a different feature, p(a|b).
• A probability distribution is a data structure that
describes the probability of each possible value a
feature can take, e.g. for a coin toss p(H) + p(T) = 1.
• A joint probability distribution is a probability
distribution over more than one feature assignment.
Probability Basics
• Product Rule: P(a, b) = P(a | b) * P(b)
• Chain Rule: P(a, b, c, …) = P(a) * P(b | a) * P(c | a, b) * …
• Bayesian Theorem (dependent features): P(class | features) = P(features | class) * P(class) / P(features)
• Naive Bayesian Theorem (independent features): P(class | x1, …, xn) ∝ P(class) * P(x1 | class) * … * P(xn | class)
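A small numeric check of the product rule and Bayes’ theorem, using a made-up joint distribution over two binary features.

```python
# Numeric sketch of the product rule and Bayes' theorem on a made-up
# joint distribution over two binary features a and b.
joint = {  # P(a, b); the four entries sum to 1
    (True, True): 0.20, (True, False): 0.10,
    (False, True): 0.30, (False, False): 0.40,
}

def marginal(var_index, value):
    return sum(p for k, p in joint.items() if k[var_index] == value)

p_a = marginal(0, True)                        # P(a) = 0.30
p_b = marginal(1, True)                        # P(b) = 0.50
p_a_given_b = joint[(True, True)] / p_b        # conditional from the joint
p_b_given_a = joint[(True, True)] / p_a

print(joint[(True, True)], p_a_given_b * p_b)  # product rule: P(a, b) = P(a|b) P(b)
print(p_a_given_b, p_b_given_a * p_a / p_b)    # Bayes: P(a|b) = P(b|a) P(a) / P(b)
```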
Bayesian Prediction: Example
• The worked example on the slides gives P(t) = 0.3333 and P(f) = 0.6667; since P(t) < P(f), the prediction is FALSE.
