The problem of overfitting
Overfitting happens when a model fits the training set too well but doesn't generalize to new examples it hasn't seen before.
Let's look at a comparison plot to better understand:
photo taken from: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229
overfitting: high variance; fits the training set extremely well but does not fit new examples it hasn't seen before.
underfitting: high bias; does not fit the training set or new examples well.
generalization or good balance: fits both the training set and new examples pretty well.
The goal of machine learning is to find a model that's neither underfitting nor overfitting.
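To make the three cases concrete, here is a minimal sketch that fits polynomials of different degrees to the same noisy data and compares training error with error on unseen examples. The data-generating function, the noise level, the sample sizes, and the degrees (1, 2, 10) are all assumptions chosen for illustration; they are not taken from the plot above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: a quadratic plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # "new examples" the model has not seen

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 2, 10):
    # Least-squares polynomial fit on the training set only.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

Training error can only go down as the degree goes up, because each higher-degree polynomial can reproduce every lower-degree one. What to watch is the gap between the two columns: a high error on both sets suggests underfitting, while a very low training error paired with a noticeably higher test error suggests overfitting.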
Addressing overfitting
A few ways to address overfitting:
collect more data
select and use only a subset of features (more on this in the future)
reduce the size of the parameters using regularization
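As a rough illustration of the last option, the sketch below uses L2 (ridge) regularization, one common form of regularization, on a degree-8 polynomial model. The data, the degree, the `ridge_fit` helper, and the values of the regularization strength `lam` are assumptions made up for this example. For simplicity it penalizes every parameter, including the intercept, which many implementations leave unpenalized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy quadratic data, 15 training examples.
x = rng.uniform(-1.0, 1.0, size=15)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.3, size=15)

degree = 8
# Feature matrix with columns 1, x, x^2, ..., x^8.
X = np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

norms = {}
for lam in (0.0, 0.1, 10.0):
    w = ridge_fit(X, y, lam)
    norms[lam] = float(np.linalg.norm(w))
    print(f"lam = {lam:5.1f}: ||w|| = {norms[lam]:.3f}")
```

With `lam = 0.0` this is ordinary least squares, and the parameters are free to grow as large as needed to chase the noise. Increasing `lam` shrinks the parameter vector, which keeps all the features in the model but limits how much any one of them can bend the fit.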