Paras Patidar
I am working on Machine Learning, Python and Django.

Classification Problem

We will work through an example of a classification problem using k-Nearest Neighbors (KNN), which scikit-learn implements as the KNeighborsClassifier class.

K-Nearest Neighbors

-> Basic idea: predict the label of a data point by

    -> Looking at the 'k' closest labeled data points

    -> Taking a majority vote
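The idea above can be sketched from scratch in plain Python. This is a minimal illustration that assumes Euclidean distance and simple majority voting. The function knn_predict and the toy points are hypothetical and are not part of scikit-learn.

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: predict x_new's label by majority vote of its k nearest training points."""
    # Euclidean distance from x_new to every labeled training point
    distances = [math.dist(x, x_new) for x in X_train]
    # Indices of the k closest labeled points
    nearest = sorted(range(len(X_train)), key=lambda i: distances[i])[:k]
    # Majority vote over the labels of those k neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D data: two well-separated groups
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y_train = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(X_train, y_train, (1.2, 1.4), k=3))  # red
print(knn_predict(X_train, y_train, (6.2, 6.6), k=3))  # blue
```

If k is even, two classes can tie. In that case Counter.most_common returns the tied label it counted first, which here is the label of the closest neighbor.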

scikit-learn fit and predict

-> All machine learning models are implemented as Python classes, which:

   -> Implement the algorithm for learning and predicting

   -> Store the information learned from the data

-> Training a model on the data = "fitting" a model on the data

   -> .fit() method

-> To predict the labels of new data

   -> .predict() method
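Here is a minimal sketch of this fit/predict pattern, assuming scikit-learn is installed. The toy points are made up for illustration. In KNeighborsClassifier, the parameter n_neighbors is the k.

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: two features per sample, two classes
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]]
y_train = [0, 0, 0, 1, 1, 1]

# The model is a Python object; hyperparameters such as k are set in the constructor
knn = KNeighborsClassifier(n_neighbors=3)

# .fit() learns from the labeled data and stores what it learned on the object
knn.fit(X_train, y_train)
print(knn.classes_)  # [0 1]

# .predict() labels new, unseen data points
X_new = [[1.2, 1.4], [6.2, 6.6]]
print(knn.predict(X_new))  # [0 1]
```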

Measuring Model Performance

-> In classification problems, accuracy is a commonly used metric.

-> Accuracy is the fraction of correct predictions:

        Accuracy = No. of correct predictions / No. of data points
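As a worked example of the formula, the sketch below uses hypothetical labels for 8 data points. The model gets 6 right, so accuracy = 6 / 8 = 0.75. scikit-learn's accuracy_score computes the same fraction.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions for 8 data points
y_true = [0, 1, 1, 0, 2, 2, 1, 0]
y_pred = [0, 1, 0, 0, 2, 1, 1, 0]

# Accuracy by the formula: correct predictions / number of data points
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(correct, accuracy)  # 6 0.75

# scikit-learn's helper gives the same value
print(accuracy_score(y_true, y_pred))  # 0.75
```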

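This raises two questions:

  1. Which data should be used to compute the accuracy?
  2. How well will the model perform on new, unseen data?

Computing accuracy on the same data used for training gives an overly optimistic estimate, because the model has already seen those labels. To resolve both problems, we can:

Split the data into a training set and a test set: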

-> Fit/Train the classifier on the training set

-> Make predictions on the test set

-> Compare predictions with the known labels
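The three steps above can be sketched with scikit-learn's train_test_split. This minimal illustration assumes that a synthetic dataset from make_classification stands in for real data. The 30% test fraction, the random_state values and k = 5 are arbitrary choices, and the printed accuracy depends on them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic labeled data (a stand-in for any real dataset)
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Hold out 30% of the data as a test set the model never sees while fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit/train the classifier on the training set only
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions on the test set and compare them with the known labels
y_pred = knn.predict(X_test)
test_accuracy = np.mean(y_pred == y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```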

Model Complexity (Important!):

-> Larger k = smoother decision boundary = less complex model

-> Smaller k = more complex model = can lead to overfitting
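One way to see this trade-off is to sweep k and compare training accuracy with accuracy on a held-out test set. The sketch below assumes the noisy two-moons toy dataset from make_moons, and the list of k values is arbitrary. With k = 1 each training point is its own nearest neighbor, so training accuracy is 1.0 while test accuracy is typically lower, which is the overfitting described above. As k grows, the two scores usually move closer together.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class data with a curved true boundary
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

train_acc, test_acc = {}, {}
for k in [1, 3, 5, 9, 15, 25, 51]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc[k] = knn.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc[k] = knn.score(X_test, y_test)     # accuracy on held-out data
    print(f"k={k:>2}  train={train_acc[k]:.2f}  test={test_acc[k]:.2f}")
```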

Let's code:

We are using the Iris dataset in the following code:
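A minimal sketch of the whole workflow on Iris follows, assuming scikit-learn's bundled load_iris loader. The split ratio, random_state and k = 5 are illustrative choices, and this is not necessarily the code the author used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris: 150 flowers, 4 measurements each, 3 species
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test sets (stratified so every species appears in both)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

# Fit k-NN on the training set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict species for the test set and show the first few as names
y_pred = knn.predict(X_test)
print("Predicted species:", iris.target_names[y_pred[:5]])

# .score() returns accuracy on the test set
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```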

Happy Machine Learning!