
What is Overfitting?

In this post we will discuss overfitting: what causes it and how to deal with it.

Overfitting is a term from statistics and machine learning for a learned model that fits its training data too well. You may think there is no such thing as too good a fit, but a model that matches its training data too closely tends to perform poorly on data it has not seen. In machine learning, we want the data to be as good as possible, and we shape it with feature engineering so that a learning algorithm has the best possible chance of making the right decisions.

The signal is the underlying pattern that you want to learn. Noise is the random error in the data that does not follow that pattern. The goal of machine learning is to learn the signal without learning the noise. Overfitting occurs when the model learns the noise.
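To make the signal and noise idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the signal is `sin(x)`, the noise is Gaussian, and the model is a simple nearest-neighbour averager (`knn_predict`). With `k=1` the model copies the nearest training point, so it reproduces the noise in the training set exactly.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def signal(x):
    """The underlying pattern we would like the model to learn (assumed here)."""
    return math.sin(x)

def make_data(n, noise_sd=0.3):
    """Sample n points of signal plus Gaussian noise."""
    xs = [random.uniform(0.0, 6.0) for _ in range(n)]
    ys = [signal(x) + random.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

train_x, train_y = make_data(60)
test_x, test_y = make_data(200)

for k in (1, 9):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    print(f"k={k}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

The `k=1` model has zero training error because it has memorized every noisy point. A large gap between its training error and its error on fresh data is the usual symptom of overfitting.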

Causes of Overfitting

So what causes overfitting?

Primarily complexity: the model is more complex than the data can support. Remember, complex models require large datasets. A dataset needs to be large enough to show clear patterns, and if it is not, the model cannot tell real patterns or trends apart from noise.
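The effect of too much complexity on too little data can be sketched as follows. The six training points are hypothetical, hand-picked to lie near the line y = 2x + 1. One model is a straight line with two parameters. The other is a degree-5 polynomial, built by Lagrange interpolation, which has one parameter per data point. Both `fit_line` and `fit_interpolant` are illustrative helpers, not part of any library.

```python
import math

# Hypothetical training data: roughly y = 2x + 1 with small hand-picked errors.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.3, 2.6, 5.4, 6.7, 9.5, 10.8]

def fit_line(xs, ys):
    """Least-squares straight line: 2 parameters (slope and intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1: one parameter per data point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
interpolant = fit_interpolant(train_x, train_y)

print("train MSE, line:       ", mse(line, train_x, train_y))
print("train MSE, interpolant:", mse(interpolant, train_x, train_y))

# A new point between two training points; the assumed true value is 2 * 4.5 + 1 = 10.
print("line at x=4.5:       ", line(4.5))
print("interpolant at x=4.5:", interpolant(4.5))
```

The complex model passes through every training point, so its training error is zero, yet at x = 4.5 it predicts about 11.09 while the simple line predicts about 10.02. It has fitted the errors in the six points and not the trend behind them.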

So how do we deal with overfitting? First, we can collect more data: take more samples, label more examples, or combine datasets. We can also simplify the model by using one with fewer parameters, or by reducing the number of attributes and features where possible. Finally, we can use methods that penalize complexity, known as regularization methods.
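As a rough illustration of penalizing complexity, here is a sketch of ridge (L2) regularization, one common regularization method, for the simplest possible model: a single weight `w` and no intercept. The data and the `ridge_weight` helper are made up for this example. The function returns the weight that minimizes the squared error plus `penalty * w**2`, which has a closed form in this one-feature case.

```python
import math

def ridge_weight(xs, ys, penalty):
    """Weight w minimizing sum((y - w*x)**2) + penalty * w**2.

    One-feature model without intercept, so the solution is
    sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Hypothetical data lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for penalty in (0.0, 1.0, 10.0):
    w = ridge_weight(xs, ys, penalty)
    print(f"penalty={penalty:>4}: w={w:.4f}")
```

With a penalty of 0 this is ordinary least squares (w = 1.99 here). As the penalty grows the weight shrinks toward zero (about 1.49 at a penalty of 10), which limits how strongly the model can react to noise in any one feature. Choosing the penalty is a trade-off: too large a value underfits.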

There are also ensemble learning methods, where you train several models and combine their results.
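One way to picture this is bagging (bootstrap aggregating), a specific ensemble method that this post does not name, so treat it as one possible example. In the sketch below each model is a 1-nearest-neighbour predictor trained on a random resample of the data, and the ensemble averages their predictions. The data, the helper names, and the choice of 25 models are all assumptions for illustration.

```python
import random

def fit_1nn(xs, ys):
    """Return a predictor that copies the target of the nearest training point."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def bagged_ensemble(xs, ys, n_models, rng):
    """Train n_models 1-NN models, each on a bootstrap resample of the data."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def ensemble_predict(models, x):
    """Combine the models by averaging their individual predictions."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Hypothetical noisy data lying near y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.3, 2.6, 5.4, 6.7, 9.5, 10.8]

rng = random.Random(0)  # seeded generator so the run is repeatable
models = bagged_ensemble(xs, ys, n_models=25, rng=rng)

single = fit_1nn(xs, ys)
print("single 1-NN at x=2.4:", single(2.4))
print("bagged 1-NN at x=2.4:", ensemble_predict(models, 2.4))
```

Because each model sees a slightly different sample, the models disagree mostly where they are fitting noise. Averaging smooths out some of that disagreement, which is why ensembles tend to overfit less than any single member.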
