Lab 3: Classification (cont.)

Naive Bayes

Naive Bayes is one of the most famous classification techniques, one of the simplest, and among the easiest to apply.

Like other Bayesian techniques, it simply chooses the most probable class for each record, according to an estimate of the probability of each class given the record whose label we want to predict. The trick, and the simplicity, of Naive Bayes lies in the assumption of conditional independence among the variables, which simplifies that estimation and makes Naive Bayes the standard baseline for classification.
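Formally, under the conditional independence assumption, the probability of a class given a record factors into a product of per-variable probabilities, so the predicted class for a record with variables x_1, ..., x_n can be written as:

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Each factor P(x_i | c) is estimated independently from the training data, which is what makes the method so cheap to train.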

Indeed, we can evaluate the performance of each classifier over a given dataset simply by comparing the classifiers' results with each other, and in particular with the results of Naive Bayes over that dataset.

The nicest property of Naive Bayes is that it has essentially no parameters to tune, and so its performance serves as a comparison baseline: any model is only interesting if it outperforms the one learnt through Naive Bayes.
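As a minimal sketch of using Naive Bayes as a baseline, the snippet below trains a GaussianNB classifier and reports its accuracy. The dataset (iris) and the 70/30 split are illustrative assumptions, not part of the lab's data.

```python
# Minimal sketch: GaussianNB as a comparison baseline.
# Dataset and split choices (iris, 70/30, random_state=42) are illustrative.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

nb = GaussianNB()
nb.fit(X_train, y_train)           # estimates per-class mean/variance per variable
y_pred = nb.predict(X_test)        # picks the most probable class per record

print(f"Naive Bayes baseline accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

Any more elaborate model trained on the same split should be expected to at least match this score before it is worth keeping.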

If we inspect the classes available in the sklearn.naive_bayes package, we see there are more than the GaussianNB estimator. Indeed, there are also MultinomialNB and BernoulliNB, which are adequate when the data distribution is close to a multinomial or a Bernoulli distribution.
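The three estimators share the same fit/score interface, so switching between them is a one-line change. The sketch below runs all three on synthetic count data; the data itself is an illustrative assumption (non-negative counts suit MultinomialNB, while BernoulliNB internally binarises the features).

```python
# Sketch: the three Naive Bayes variants share the same estimator interface.
# The synthetic count data below is purely illustrative.
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
X_counts = rng.poisson(lam=3.0, size=(200, 10))   # non-negative count features
y = (X_counts[:, 0] > 3).astype(int)              # label derived from counts

scores = {}
for clf in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    clf.fit(X_counts, y)
    scores[type(clf).__name__] = clf.score(X_counts, y)
    print(type(clf).__name__, scores[type(clf).__name__])
```

Which variant fits best depends on how the features are distributed: Gaussian for continuous variables, multinomial for counts, Bernoulli for binary indicators.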