What is Naive Bayes?

In my blog post Bayes and Binomial Theorem, I talk about Bayes' theorem and how it is used to determine, or rather estimate, a conditional probability by turning the condition around.

P(A|B) = P(B|A) * P(A) / P(B)

In other words, you can use P(B|A), the prior probability P(A), and the marginal probability P(B) to calculate P(A|B). This is very powerful because we often have information on the former three probabilities but not on the latter.
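
As a quick, entirely made-up illustration: suppose 1% of all emails are spam (P(A)), 80% of spam emails contain the word 'offer' (P(B|A)), and 10% of all emails contain it (P(B)). Bayes' theorem then gives the probability that an email containing 'offer' is spam:

    # Hypothetical numbers, purely for illustration
    p_spam = 0.01              # P(A): prior probability of spam
    p_word_given_spam = 0.80   # P(B|A): probability of 'offer' given spam
    p_word = 0.10              # P(B): probability of 'offer' in any email

    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(p_spam_given_word)   # 0.08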

Naive Bayes is a classification algorithm that does this with the features of a dataset. In non-math words: we calculate the probability of belonging to class A given feature vector B by multiplying the proportion of feature vector B within class A by the proportion of class A in the population, and then dividing the whole thing by the proportion of vector B in the population. In principle this is a very straightforward calculation, but you can probably tell when it will be hard or impossible to do: if we have many features, it becomes more and more unlikely that a specific feature vector has been seen before. And if all or most feature vectors occur only once in a population, we cannot be confident of probability estimates based on them.
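
To make the counting concrete, here is a minimal sketch of this 'whole feature vector' approach, using a tiny invented dataset of binary feature pairs. With only two features it works, but the counts for any specific combination shrink rapidly as features are added:

    # A toy dataset of (feature vector, class label) pairs; the data is invented
    data = [
        ((1, 0), 'A'), ((1, 0), 'A'), ((1, 1), 'A'),
        ((0, 1), 'B'), ((0, 1), 'B'), ((1, 1), 'B'),
    ]

    def p_class_given_vector(x, c, data):
        # Estimate P(A|B) = P(B|A) * P(A) / P(B) purely by counting
        n = len(data)
        n_c = sum(1 for _, label in data if label == c)
        n_x_and_c = sum(1 for v, label in data if v == x and label == c)
        n_x = sum(1 for v, _ in data if v == x)
        return (n_x_and_c / n_c) * (n_c / n) / (n_x / n)

    print(p_class_given_vector((1, 1), 'A', data))  # 0.5: (1, 1) occurs once per class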

Naive Bayes solves this problem by making an assumption that is almost certainly incorrect, namely that the features are independent of each other. This is where the 'naive' in its name comes from. If we do this, we can treat each feature separately instead of treating the whole feature vector as one. In a large dataset, any particular value of a single feature is far more common than any specific combination of values for all features. In addition to this naive assumption, the denominator P(B) usually does not need to be calculated at all: it is the same for every class, and for classification purposes we do not need absolute probabilities; we just need to know which class gets the largest number out of our calculation.
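
Putting both simplifications together, a minimal Naive Bayes classifier for binary features could look like the sketch below (again with invented data). A real implementation would also add smoothing for feature values never seen in a class, and would sum log-probabilities instead of multiplying to avoid numeric underflow:

    from collections import defaultdict

    def train(data):
        # Count class frequencies and, per class, the frequency of each feature value
        class_counts = defaultdict(int)
        feature_counts = defaultdict(lambda: defaultdict(int))
        for x, c in data:
            class_counts[c] += 1
            for i, value in enumerate(x):
                feature_counts[c][(i, value)] += 1
        return class_counts, feature_counts

    def predict(x, class_counts, feature_counts):
        n = sum(class_counts.values())
        scores = {}
        for c, n_c in class_counts.items():
            # P(A) times the product of P(feature_i = value | A); P(B) is dropped
            score = n_c / n
            for i, value in enumerate(x):
                score *= feature_counts[c][(i, value)] / n_c
            scores[c] = score
        # The scores are not probabilities, but their ranking is all we need
        return max(scores, key=scores.get)

    data = [((1, 0), 'A'), ((1, 0), 'A'), ((1, 1), 'A'),
            ((0, 1), 'B'), ((0, 1), 'B'), ((1, 1), 'B')]
    class_counts, feature_counts = train(data)
    print(predict((1, 0), class_counts, feature_counts))  # 'A'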

It turns out, perhaps surprisingly given the assumption it makes, that Naive Bayes is often a very good classification algorithm, commonly used for problems like email spam detection. However, do not use it for estimating probabilities: the feature independence assumption makes it very inaccurate for those.
