Least Absolute Shrinkage and Selection Operator Regression (simply called Lasso Regression) is another regularized version of Linear Regression: just like Ridge Regression, it adds a regularization term to the cost function, but it uses the ℓ1 norm of the weight vector instead of half the square of the ℓ2 norm.

Figure below shows the same thing as another figure but replaces Ridge models with Lasso models and uses smaller α values.

An important characteristic of Lasso Regression is that it tends to completely eliminate the weights of the least important features (i.e., set them to zero). For example, the dashed line in the right plot on Figure above (with α = 10-7) looks quadratic, almost linear: all the weights for the high-degree polynomial features are equal to zero. In other words, Lasso Regression automatically performs feature selection and outputs a sparse model (i.e., with few nonzero feature weights).
You can get a sense of why this is the case by looking at Figure 4-19: on the top-left plot, the background contours (ellipses) represent an unregularized MSE cost function (α = 0), and the white circles show the Batch Gradient Descent path with that cost function. The foreground contours (diamonds) represent the ℓ1 penalty, and the triangles show the BGD path for this penalty only (α → ∞). Notice how the path first reaches θ1 = 0, then rolls down a gutter until it reaches θ2 = 0. On the top-right plot, the contours represent the same cost function plus an ℓ1
penalty with α = 0.5. The global minimum is on the θ2 = 0 axis. BGD first reaches θ2 = 0, then rolls down the gutter until it reaches the global minimum. The two bottom plots show the same
thing but uses an ℓ2 penalty instead. The regularized minimum is closer to θ = 0 than the unregularized minimum, but the weights do not get fully eliminated.

The Lasso cost function is not differentiable at θi = 0 (for i = 1, 2, ⋯, n), but Gradient Descent still works fine if you use a subgradient vector g instead when any θi = 0. Equation 4-11 shows a subgradient vector equation you can use for Gradient Descent with the Lasso cost function.

Here is a small Scikit-Learn example using the Lasso class. Note that you could instead use an SGDRegressor(penalty=”l1″).
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)
lasso_reg.predict([[1.5]])
Conclusion
In conclusion, Lasso Regression proves to be a valuable addition to the data scientist’s toolkit, offering a robust solution to linear regression problems with high-dimensional data and contributing to the ongoing quest for parsimonious and interpretable models in the ever-evolving landscape of machine learning.