Overfitting

Overfitting is a modeling error that occurs when a machine learning model learns not only the underlying pattern in the training data but also the noise and outliers. The result is a model that performs exceptionally well on the training set but poorly on unseen data, because it fails to generalize.

Examples of Overfitting:

  • Polynomial Regression: A high-degree polynomial regression model may fit a small set of training points exactly, yet swing wildly between them, producing large errors on new data points.
  • Decision Trees: An overly complex decision tree with many branches may classify the training data perfectly but struggle with new data, because it captures specific details that don’t generalize.
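The polynomial example above can be sketched numerically. The snippet below (an illustrative setup, not taken from the text: the underlying function, noise level, and degrees are assumptions) fits polynomials of increasing degree to ten noisy points with NumPy; the degree-9 fit passes through every training point, so its training error collapses while its error on fresh points grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small noisy training set drawn from an assumed underlying function.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)

# Fresh points from the same process, used to measure generalization.
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=10)

def errors(degree):
    """Return (train MSE, test MSE) for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 3, 9):
    train_err, test_err = errors(degree)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

With 10 points, the degree-9 polynomial has exactly enough freedom to interpolate all of them, which is why its training error vanishes while its test error does not.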

Common Causes of Overfitting:

  • Small Datasets: When the dataset is small, a model may fit the training data too closely, learning the noise rather than the signal.
  • High Model Complexity: Using highly complex models (like deep neural networks) without sufficient data can lead to overfitting, as they have more capacity to learn the noise.