Overfitting and Regularization
Overfitting

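- Overfitting occurs when a model performs well on the training data but poorly on unseen test data: training error is low while test error is high.
- High variance: predictions change substantially when the model is trained on a different sample.
- Low bias: the model is flexible enough to fit the training data very closely.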
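The gap between training and test error can be seen in a small experiment. The sketch below is illustrative only: the straight-line ground truth, the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen for the demonstration, not values from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_line(x):
    # Hypothetical ground truth: a straight line plus Gaussian noise.
    return 2.0 * x + rng.normal(scale=0.3, size=x.shape)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train = np.linspace(-1.0, 1.0, 10)
x_test = rng.uniform(-1.0, 1.0, size=200)
y_train, y_test = noisy_line(x_train), noisy_line(x_test)

# Degree 1 matches the true relationship.
# Degree 9 has enough freedom to pass through all 10 training points.
simple = np.polyfit(x_train, y_train, deg=1)
flexible = np.polyfit(x_train, y_train, deg=9)

for name, coeffs in [("degree 1", simple), ("degree 9", flexible)]:
    print(name,
          "train MSE:", mse(coeffs, x_train, y_train),
          "test MSE:", mse(coeffs, x_test, y_test))
```

The degree-9 fit should show a training error close to zero together with a clearly larger test error, which is the low-bias, high-variance pattern described above.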
How to Fix:
- Adding More Data:
  - Retrain the model on a larger, more diverse dataset so that it generalizes better.
  - Consider data augmentation, which artificially enlarges the dataset by introducing new variations of existing examples.
- Regularization:
  - Add a complexity penalty to the loss so that simpler solutions are preferred.
  - Common techniques include Ridge (L2) and Lasso (L1) regression, which restrict the size of the model's coefficients.
- Removing Features:
  - Simplify the model's input by removing irrelevant or redundant features.
Regularization
Ridge Regression / L2 Regularization
- Shrinkage technique that reduces overfitting by shrinking the model's coefficients towards 0, without setting any of them exactly to 0.
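- The ridge coefficients have the closed-form solution:
- $$\hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$$
Where:
- $X$ is the matrix of predictor variables.
- $y$ is the vector of observed values.
- $\lambda$ is the regularization parameter.
- $I$ is the identity matrix.
- $\hat{\beta}^{\text{ridge}}$ represents the ridge regression coefficients.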
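The closed-form ridge estimate, $(X^\top X + \lambda I)^{-1} X^\top y$, can be sketched in a few lines of NumPy. This is a minimal illustration under some assumptions: the function name `ridge_fit`, the synthetic data, and the coefficient values are hypothetical, and no intercept is fitted (the data is assumed to be centered).

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([3.0, -2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=50)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # lam > 0 shrinks the coefficients

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With a positive regularization parameter the coefficient vector has a smaller norm than the least-squares one, but none of its entries is driven exactly to 0.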
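Pros:
- Reduces overfitting
- Lower model variance
- Computationally cheap, since the coefficients have a closed-form solution
Cons:
- Low interpretability: every feature stays in the model, because no coefficient is set exactly to 0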
Lasso Regression / L1 Regularization
- Shrinkage technique that reduces overfitting by shrinking the model's coefficients towards 0 and setting some of them exactly to 0.
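- Lasso Regression Formula
Objective Function:
- Lasso regression minimizes the following objective function over the coefficients $\beta$:
- $$J(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
- Here $x_i^\top$ is the $i$-th row of the predictor matrix $X$, $y_i$ is the $i$-th observed value, and $\lambda \ge 0$ is the regularization parameter.
Residual Sum of Squares:
- The first term is the residual sum of squares, which measures the difference between the observed and predicted values:
- $$\text{RSS}(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2$$
Regularization Term:
- The second term is the L1 regularization term, which is $\lambda$ times the sum of the absolute values of the coefficients:
- $$\lambda \sum_{j=1}^{p} |\beta_j|$$
Combine Terms:
- Lasso regression combines these two terms to form the objective function that needs to be minimized:
- $$J(\beta) = \text{RSS}(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|$$
Optimization Problem:
- The goal is to find the coefficients $\hat{\beta}^{\text{lasso}}$ that minimize the objective function:
- $$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$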
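Unlike ridge, the Lasso objective has no closed-form solution in general, so it is minimized iteratively. The sketch below assumes the unscaled objective (residual sum of squares plus $\lambda$ times the sum of absolute coefficients) and uses coordinate descent with soft-thresholding, which is one common approach and not something these notes prescribe. The function names, the synthetic data, and the value of `lam` are hypothetical, and no intercept is fitted.

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    # Residual sum of squares plus the L1 penalty.
    residual = y - X @ beta
    return float(residual @ residual + lam * np.sum(np.abs(beta)))

def soft_threshold(value, threshold):
    # Shrinks value towards 0 and returns exactly 0 when |value| <= threshold.
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_sq_norms = np.sum(X ** 2, axis=0)
    for _ in range(n_iters):
        for j in range(n_features):
            # Partial residual: the prediction error with feature j left out.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual
            # Exact minimizer of the objective in coordinate j.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq_norms[j]
    return beta

# Hypothetical synthetic data: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_beta = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ true_beta + rng.normal(scale=0.1, size=200)

beta_hat = lasso_coordinate_descent(X, y, lam=50.0)
print("estimated coefficients:", beta_hat)
print("objective value:", lasso_objective(X, y, beta_hat, lam=50.0))
```

The threshold is `lam / 2` because the squared-error term carries no factor of 1/2 in this form of the objective. Libraries often scale the objective differently, so their regularization parameter is not directly comparable to `lam` here.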
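Pros:
- Reduces overfitting
- Easy to understand
- High interpretability, since fewer features remain in the model
- Built-in feature selection: a coefficient set to 0 drops the corresponding feature
Cons:
- Can have higher variance than Ridge, for example when predictors are strongly correlated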
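Why Lasso selects features while Ridge does not is easiest to see in the special case of an orthonormal design ($X^\top X = I$), where both estimators can be written directly in terms of the least-squares coefficients. The sketch below assumes that special case and the unscaled penalties $\lambda \sum_j \beta_j^2$ (Ridge) and $\lambda \sum_j |\beta_j|$ (Lasso). The coefficient values and `lam` are made up for illustration.

```python
import numpy as np

# Hypothetical least-squares coefficients under an orthonormal design.
beta_ols = np.array([3.0, -1.5, 0.4, -0.2])
lam = 1.0

# Ridge: every coefficient is scaled by the same factor 1 / (1 + lam).
ridge = beta_ols / (1.0 + lam)

# Lasso: every coefficient is moved towards 0 by lam / 2 and clipped at 0.
lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2.0, 0.0)

print("ridge:", ridge, "nonzero:", np.count_nonzero(ridge))
print("lasso:", lasso, "nonzero:", np.count_nonzero(lasso))
```

Ridge rescales all coefficients and keeps every feature, while Lasso removes the two features whose least-squares coefficients are smaller in magnitude than `lam / 2`.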