Lab 3: Classification (cont.)

KNN

KNN is possibly the second best-known classification technique, and it is also very simple and easy to apply.

It doesn't build an explicit model: whenever we need to classify a new record, it selects the n most similar (closest) records to it, called its neighbors, and assigns the new record the majority class among those neighbors.
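To make the idea concrete, here is a minimal from-scratch sketch of that procedure; the function and variable names, and the choice of Euclidean distance, are illustrative only:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, n=5):
    """Classify x_new as the majority class among its n closest training records."""
    # distance from x_new to every training record (Euclidean, for illustration)
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the n nearest neighbors
    nearest = np.argsort(dists)[:n]
    # majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
```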

Naturally, the number of neighbors to consider, n, is one of the parameters of any KNN implementation. KNeighborsClassifier receives it through the n_neighbors parameter. Another important parameter is the distance function used to choose the neighbors, set through the metric parameter, which accepts 'manhattan', 'euclidean' and 'chebyshev', among others.
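For instance, assuming the train/test splits from the previous steps are stored in trnX, trnY and tstX (hypothetical names), a classifier with 5 neighbors and the Manhattan distance could be trained and applied as follows:

```python
from sklearn.neighbors import KNeighborsClassifier

# trnX, trnY, tstX are assumed to hold the splits created earlier in the lab
knn = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn.fit(trnX, trnY)
prdY = knn.predict(tstX)
```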

Given the importance of these parameters, we need to choose them carefully, which means trying different values and understanding how they impact the quality of the results.
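One possible sketch of such a parameter study, again assuming the trnX, trnY, tstX, tstY splits from earlier steps, is to train one classifier per combination and record its accuracy:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

nvalues = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]    # candidate numbers of neighbors
dists = ['manhattan', 'euclidean', 'chebyshev']  # candidate distance functions

values = {}
for d in dists:
    accuracies = []
    for n in nvalues:
        knn = KNeighborsClassifier(n_neighbors=n, metric=d)
        knn.fit(trnX, trnY)
        accuracies.append(accuracy_score(tstY, knn.predict(tstX)))
    values[d] = accuracies  # one accuracy curve per distance function
```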

Next, we can see the results achieved by a set of parameter combinations.

After the plot, you can see the parameters for which the best results were achieved. Let's now look at the performance of that configuration in terms of other metrics.
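A sketch of that final evaluation, assuming the best combination found above is stored in best_n and best_d (hypothetical names), could report the confusion matrix and per-class metrics:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# best_n, best_d hold the best parameter combination found above (hypothetical names)
best = KNeighborsClassifier(n_neighbors=best_n, metric=best_d)
best.fit(trnX, trnY)
prdY = best.predict(tstX)

print(confusion_matrix(tstY, prdY))       # per-class errors
print(classification_report(tstY, prdY))  # precision, recall and f1 per class
```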