Understanding Ridge Regression: A Guide to Regularization

As we saw in previous posts, a good way to reduce overfitting is to regularize the model (i.e., to constrain it): the fewer degrees of freedom it has, the harder it will be for it to overfit the data. For example, a simple way to regularize a polynomial model is to reduce the number of polynomial degrees.
For a linear model, regularization is typically achieved by constraining the weights of the model. We will now look at Ridge Regression, Lasso Regression, and Elastic Net, which implement three different ways to constrain the weights.

Ridge Regression

Ridge Regression (also called Tikhonov regularization) is a regularized version of Linear Regression: a regularization term equal to (α/2) ∑_{i=1}^{n} θ_i^2 is added to the cost function.
This forces the learning algorithm to not only fit the data but also keep the model weights as small as possible. Note that the regularization term should only be added to the cost function during training. Once the model is trained, you want to evaluate the model’s performance using the unregularized performance measure.

The hyperparameter α controls how much you want to regularize the model. If α = 0, then Ridge Regression is just Linear Regression. If α is very large, then all weights end up very close to zero and the result is a flat line going through the data’s mean. The equation below gives the Ridge Regression cost function.

Ridge Regression cost function (Mean Squared Error plus the regularization term): J(θ) = MSE(θ) + (α/2) ∑_{i=1}^{n} θ_i^2

Note that the bias term θ_0 is not regularized (the sum starts at i = 1, not 0). If we define w as the vector of feature weights (θ_1 to θ_n), then the regularization term is simply equal to (α/2)(∥w∥_2)^2, where ∥w∥_2 represents the ℓ2 norm of the weight vector. For Gradient Descent, just add αw to the MSE gradient vector (and add nothing to the gradient component of the bias term).
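
The cost function and its gradient can be sketched in a few lines of NumPy. This is only an illustration, assuming the cost is the MSE plus α/2 times the sum of squared feature weights; the function names ridge_cost and ridge_gradient and the toy data are hypothetical, not part of any library. The snippet checks the "add αw to the MSE gradient" rule against a finite-difference estimate.

```python
import numpy as np

def ridge_cost(theta, X_b, y, alpha):
    """MSE plus (alpha / 2) * sum of squared feature weights (bias excluded)."""
    errors = X_b @ theta - y
    mse = np.mean(errors ** 2)
    return mse + 0.5 * alpha * np.sum(theta[1:] ** 2)

def ridge_gradient(theta, X_b, y, alpha):
    """MSE gradient plus alpha * w, with a 0 in the bias position."""
    m = len(y)
    mse_grad = (2 / m) * X_b.T @ (X_b @ theta - y)
    penalty_grad = alpha * np.r_[0.0, theta[1:]]  # bias term is not regularized
    return mse_grad + penalty_grad

# Hypothetical toy data: 20 noisy points around y = 1 + 0.5 x
rng = np.random.default_rng(42)
X = 3 * rng.random((20, 1))
y = 1 + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=20)
X_b = np.c_[np.ones(len(X)), X]  # prepend the bias column x0 = 1

# Compare the analytic gradient with central finite differences
theta = np.array([0.3, -0.7])
alpha, eps = 1.0, 1e-6
numeric_grad = np.array([
    (ridge_cost(theta + eps * e, X_b, y, alpha)
     - ridge_cost(theta - eps * e, X_b, y, alpha)) / (2 * eps)
    for e in np.eye(len(theta))
])
analytic_grad = ridge_gradient(theta, X_b, y, alpha)
print("gradients agree:", np.allclose(numeric_grad, analytic_grad, atol=1e-5))
```

Because the bias position of the penalty gradient is always 0, the bias is free to move to wherever the data needs it, while the feature weights are pulled toward zero.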

Visualize the Ridge Regression

The figure below shows several Ridge models trained on some linear data using different α values. On the left, plain Ridge models are used, leading to linear predictions. On the right, the data is first expanded using PolynomialFeatures(degree=10), then it is scaled using a StandardScaler, and finally the Ridge models are applied to the resulting features: this is Polynomial Regression with Ridge regularization. Note how increasing α leads to flatter (i.e., less extreme, more reasonable) predictions; this reduces the model’s variance but increases its bias.
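
The polynomial setup described above can be sketched as a Scikit-Learn pipeline. The data here is a hypothetical noisy linear dataset, not the one used for the figure, and the α values are chosen only for illustration. Instead of plotting, the snippet prints the ℓ2 norm of the learned weights, which shrinks as α grows.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical noisy linear data (an assumption, not the figure's dataset)
rng = np.random.default_rng(42)
X = 3 * rng.random((20, 1))
y = 1 + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=20)

weight_norms = []
for alpha in (1e-5, 1, 100):
    model = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),  # expand to x, x^2, ..., x^10
        StandardScaler(),                                   # Ridge is sensitive to feature scale
        Ridge(alpha=alpha),
    )
    model.fit(X, y)
    # make_pipeline names each step after its lowercased class name
    norm = np.linalg.norm(model.named_steps["ridge"].coef_)
    weight_norms.append(norm)
    print(f"alpha={alpha:g}: ||w||_2 = {norm:.4f}")
```

Scaling before Ridge matters because the penalty treats all weights alike: without it, features with large raw magnitudes (such as x^10) would be penalized very differently from features with small ones.
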
As with Linear Regression, we can perform Ridge Regression either by computing a closed-form equation or by performing Gradient Descent. The pros and cons are the same. The closed-form solution is given further below, where A is the (n + 1) × (n + 1) identity matrix except with a 0 in the top-left cell, corresponding to the bias term.

Comparison of Ridge Regression models with different alpha values on a linear dataset. The left graph shows models with alpha = 0 (blue), 10 (green dashed), and 100 (red dotted), resulting in linear predictions. The right graph displays models with alpha = 0 (blue), 1e-05 (green dashed), and 1 (red dotted) after polynomial transformation, showing varying degrees of model fitting.

Ridge Regression closed-form solution: θ̂ = (X^T X + αA)^{-1} X^T y

Here is how to perform Ridge Regression with Scikit-Learn using a closed-form solution (a variant of the closed-form equation above that uses a matrix factorization technique by André-Louis Cholesky):

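from sklearn.linear_model import Ridge

# X (shape (m, 1)) and y are assumed to hold the training data
ridge_reg = Ridge(alpha=1, solver="cholesky")  # closed-form solve via Cholesky factorization
ridge_reg.fit(X, y)
ridge_reg.predict([[1.5]])  # prediction for a new instance with x = 1.5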
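
To see what the closed-form solution does, it can also be written directly in NumPy and compared with Scikit-Learn. This is a minimal sketch on hypothetical toy data; ridge_closed_form is an illustrative helper, not a library function. It builds A as the identity matrix with a 0 in the top-left cell and solves (X^T X + αA) θ = X^T y. Scikit-Learn's Ridge minimizes the sum of squared errors plus α times the squared ℓ2 norm of the weights, which is the objective this closed form solves, so the two results should agree.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical toy data: 20 noisy points around y = 1 + 0.5 x
rng = np.random.default_rng(42)
X = 3 * rng.random((20, 1))
y = 1 + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=20)

def ridge_closed_form(X, y, alpha):
    """Solve (X_b^T X_b + alpha * A) theta = X_b^T y, bias left unpenalized."""
    X_b = np.c_[np.ones(len(X)), X]  # prepend the bias column x0 = 1
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0                    # do not regularize the bias term
    # np.linalg.solve is more stable than forming the inverse explicitly
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

theta = ridge_closed_form(X, y, alpha=1.0)
ridge_reg = Ridge(alpha=1.0, solver="cholesky").fit(X, y)
print("NumPy:       ", theta)
print("Scikit-Learn:", ridge_reg.intercept_, ridge_reg.coef_)

# With a very large alpha the slope collapses toward 0
# and the intercept approaches the mean of y (a flat line).
theta_big = ridge_closed_form(X, y, alpha=1e6)
print("alpha=1e6:   ", theta_big, "mean of y:", y.mean())
```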

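And using Stochastic Gradient Descent:

from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(penalty="l2")
sgd_reg.fit(X, y.ravel())  # ravel() flattens a column-vector y into the 1D array fit() expects
sgd_reg.predict([[1.5]])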

The penalty hyperparameter sets the type of regularization term to use. Specifying "l2" indicates that you want SGD to add a regularization term to the cost function equal to half the square of the ℓ2 norm of the weight vector: this is simply Ridge Regression.
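
The strength of that penalty is set by SGDRegressor's alpha parameter. As a rough sketch on hypothetical toy data, with alpha values picked only to make the contrast obvious, a much larger alpha shrinks the learned weight toward zero:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical toy data: 100 noisy points around y = 1 + 0.5 x
rng = np.random.default_rng(42)
X = 3 * rng.random((100, 1))
y = 1 + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=100)

# Same l2 penalty, two very different regularization strengths
weak = SGDRegressor(penalty="l2", alpha=1e-4, random_state=42).fit(X, y)
strong = SGDRegressor(penalty="l2", alpha=10.0, random_state=42).fit(X, y)

print("alpha=1e-4 weight:", weak.coef_)
print("alpha=10   weight:", strong.coef_)
```

Since SGD is stochastic, the exact numbers depend on random_state, but the heavily regularized model should end up with a noticeably smaller weight.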

