Model Evaluation Helps You Understand Your Model's Performance
Splitting data into Train and Test Set
- Accuracy = fraction of correct predictions
- Which data should be used to compute accuracy?
- How will the model perform on new data?
We can split the data into a training set and a testing set:
- Fit/train the classifier on the training set
- Make predictions on the test set
- Compare the predictions with the known labels
We always need some unseen data to test our model, so that we can check how well it generalizes. That is why we split our dataset into training and testing sets.
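The split-fit-predict workflow above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the 80/20 ratio, the toy dataset, and the fixed seed are arbitrary choices for the example:

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and carve off a test set of the given ratio."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(len(data) * (1 - test_ratio))
    train = [data[i] for i in indices[:cut]]
    test = [data[i] for i in indices[cut:]]
    return train, test

samples = list(range(100))               # stand-in for 100 labeled examples
train, test = train_test_split(samples, test_ratio=0.2)
print(len(train), len(test))             # 80 20
```

In practice you would fit the classifier on `train`, predict on `test`, and compare the predictions against the known labels of `test`.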
Each ML task has different metrics for evaluation:
- R^2 (R Square)
- Within sum of squared errors
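As a concrete illustration of the first metric, R² can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are invented for the example:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]   # made-up targets
y_pred = [1.1, 1.9, 3.2, 3.8]   # made-up predictions
print(r_squared(y_true, y_pred))  # 0.98: the model explains 98% of the variance
```

An R² of 1 means the predictions match the targets exactly; values near 0 mean the model does no better than predicting the mean.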
You can compute metrics on the training set, but the model already knows that data, so training-set metrics tell you little. Instead, compute metrics on the test set: that data is unknown to the model, and how well it performs on new, unseen data reveals the real metrics. This is why we split our data into training and testing sets.
Sometimes referred to as “testing” data, the holdout subset provides a final estimate of the machine learning model’s performance after it has been trained and validated. Holdout sets should never be used to make decisions about which algorithms to use or for improving or tuning algorithms.
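One common way to honor this rule is to carve the holdout off first and make all tuning decisions on a separate validation split. A sketch in plain Python, where the 60/20/20 ratios and the helper name are arbitrary choices for illustration:

```python
import random

def three_way_split(data, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Split into train / validation / holdout subsets."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = int(len(data) * test_ratio)
    n_val = int(len(data) * val_ratio)
    holdout = [data[i] for i in indices[:n_test]]             # final estimate only
    val = [data[i] for i in indices[n_test:n_test + n_val]]   # tuning decisions
    train = [data[i] for i in indices[n_test + n_val:]]       # fitting
    return train, val, holdout

train, val, holdout = three_way_split(list(range(50)))
print(len(train), len(val), len(holdout))   # 30 10 10
```

The holdout set is touched exactly once, at the very end, so its error estimate is not biased by any of the choices made during tuning.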
This is the fundamental way to understand your model's performance.
The bias-variance trade-off is the point at which adding model complexity (flexibility) starts to add noise. Past this point the training error keeps going down, because the increasingly complex model fits the training set ever more closely, but the test error starts to go up. Beyond the trade-off point, the model begins to overfit.
Let's look at an example:
- Here, the center of the target is a model that perfectly predicts the correct values.
- As we move away from the center, our predictions get worse and worse.
- Now, we repeat our entire model-building process to get a number of separate hits on the target.
- Each hit represents an individual realization of our model, given the chance variability in the training data we gather.
- Sometimes we get a good distribution of training data, so we predict very well and land close to the target (near the red center); other times our training data is full of outliers or non-standard values, resulting in poorer predictions.
- These different realizations result in a scatter of hits on the target.
From the above diagram you can understand that:
- Low Bias - Low Variance results in the best predictions.
- Low Bias - High Variance results in scattered predictions, but centered around the target.
- High Bias - Low Variance results in predictions that are far from the target but clustered in the same place.
- High Bias - High Variance results in the worst predictions.
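These four cases follow from the decomposition MSE = bias² + variance (plus irreducible noise). A small numeric sketch below repeats an estimate many times, like the repeated hits on the target, and checks the identity; the estimator (a sample mean) and all the numbers are invented for illustration:

```python
import random

def bias_variance_mse(estimates, truth):
    """Decompose the empirical MSE of many estimates into bias^2 + variance."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth                                   # systematic error
    variance = sum((e - mean_est) ** 2 for e in estimates) / n  # scatter of the hits
    mse = sum((e - truth) ** 2 for e in estimates) / n
    return mse, bias, variance

rng = random.Random(1)
truth = 5.0
# Each "realization" fits on a fresh noisy sample: here, the mean of 10 noisy draws.
estimates = [sum(rng.gauss(truth, 2.0) for _ in range(10)) / 10 for _ in range(2000)]
mse, bias, variance = bias_variance_mse(estimates, truth)
print(mse, bias, variance)   # mse equals bias^2 + variance (up to floating point)
```

The sample mean is an unbiased estimator, so its hits scatter around the bull's-eye (low bias, some variance); averaging more draws per realization would shrink the variance and tighten the cluster.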
A common temptation for beginners is to keep adding complexity to a model until it fits the training set very well. This yields high accuracy on the training dataset, but when the model meets the test dataset or new data, the accuracy will be poor. In other words, the model overfits the training data and makes large errors on new data, such as the test set.
Let's look at an example of how overfitting shows up from an error standpoint using test data.
We will use a black curve, with some noise points scattered off it, to represent the true shape the data follows.
In the first figure:
- We plot the features (X) against the target variable (Y).
- The black curve is the true shape that the data follows.
- A linear model (yellow line) fits the data with a straight-line relationship.
- A quadratic model (blue line) is a little more complex and fits the data with some curvature.
- A spline (green line) is more complex than the linear and quadratic models, which is not good: it achieves high accuracy on the training data, but as soon as new data or the test set arrives, it produces large errors.
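This effect can be sketched with NumPy polynomial fits. The true curve, the noise pattern, and the degrees below are all invented for the example; the degree-9 polynomial interpolates all 10 training points and plays the role of the overly flexible spline:

```python
import numpy as np

# True relationship: y = x^2. Training targets get deterministic alternating noise.
x_train = np.arange(10, dtype=float)
y_train = x_train ** 2 + np.where(np.arange(10) % 2 == 0, 1.0, -1.0)

x_test = x_train[:-1] + 0.5      # midpoints between training points, unseen during fitting
y_test = x_test ** 2             # noise-free truth at the test points

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 2, 9):         # linear, quadratic, and "spline-like" flexible fit
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(degree, train_err, test_err)
```

Training error falls steadily as the degree rises, and the degree-9 fit passes through every training point almost exactly, yet its error on the unseen midpoints is far worse than the quadratic's: it has fit the alternating noise, not the underlying curve.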
So the conclusion is: do not overfit the model.
In the second figure:
- We plot model complexity (flexibility) against the mean squared error.
- On the training data: the linear model has a large error; the quadratic model reduces it; the spline model has the smallest error, fitting the training data almost perfectly.
- On the test data: the linear model's error is roughly the same as on the training data, and the quadratic model's is also similar, with a slight increase. The problem comes with the spline: we expected it to have the best error, but it gives a higher error than the quadratic model.
Hence we conclude: do not overfit the data, or the model will not perform well on the test set or on new data.
In the third figure:
- It simply shows how mean squared error, bias, and variance behave as model complexity changes.
That covers model evaluation and the bias-variance trade-off. I know it was quite lengthy, but it should clear up all the concepts. This last picture is the conclusion of everything.
This is the graph of model complexity against prediction error:
- Model complexity runs from low to high.
- Toward the left of the graph, model complexity is low and we have high bias and low variance: the model starts to underfit the training and test sets.
- Toward the right of the graph, model complexity is high and we have low bias and high variance: the model starts to overfit the training and test sets.
- So we face the bias-variance trade-off: we must evaluate our model so that it neither overfits nor underfits the data, staying near the center of the graph and giving us the best predictions.
We should spend our time on model evaluation and the bias-variance trade-off to build the best model, one that neither underfits nor overfits the data.
Thank You !