LASSO
(1.8 hours to learn)
Summary
The Lasso is a form of regularized linear regression. Unlike ridge regression, it puts an L1 penalty on the weights, which encourages sparsity, i.e. it encourages most of the weights to be exactly zero. The general trick of using L1 norms to encourage sparsity is widely used in machine learning.
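To see the sparsity-inducing effect concretely, here is a minimal NumPy sketch of the Lasso fit by proximal gradient descent (ISTA). All names and the synthetic data are illustrative, not from any resource below; the key step is the soft-thresholding operator, the proximal operator of the L1 penalty, which is what sends coefficients exactly to zero.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    step = n / (np.linalg.norm(X, 2) ** 2)  # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n        # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic problem: 20 features, only the first 3 actually matter.
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat = lasso_ista(X, y, lam=0.1)
print("nonzero coefficients:", int(np.sum(np.abs(w_hat) > 1e-8)))
```

With a moderate penalty, the irrelevant coefficients come out exactly zero rather than merely small, which is the behavior an L2 penalty (ridge) does not produce.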
Context
This concept has the prerequisites:
- linear regression (LASSO is a regularized form of linear regression.)
- optimization problems (LASSO is formulated as an optimization problem.)
- ridge regression (It's useful to understand the Lasso by contrasting it with ridge regression.)
Core resources (read/watch one of the following)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Sections 13.3-13.3.4, pgs. 429-438
Supplemental resources (the following are optional, but you may find them useful)
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 3.1.4, pgs. 144-146
See also
- Ridge regression is another regularized version of linear regression, using an L2 penalty instead of L1.
- The LASSO encourages sparsity of the weight vector. If we believe certain features are likely to be important as a group, we can use group sparsity instead.
- Algorithms for optimizing the LASSO objective include coordinate descent, least angle regression (LARS), and proximal gradient methods.
- Other uses of L1 regularization include compressed sensing and sparse coding.
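The contrast with ridge regression is easiest to see in the special case of an orthonormal design matrix, where both estimators have per-coefficient closed forms. This is a hypothetical illustration (the numbers are made up): ridge shrinks every coefficient multiplicatively, while the Lasso soft-thresholds, zeroing out small coefficients exactly.

```python
import numpy as np

def ridge_shrink(w_ols, lam):
    """Ridge solution for orthonormal X: uniform multiplicative shrinkage."""
    return w_ols / (1.0 + lam)

def lasso_shrink(w_ols, lam):
    """Lasso solution for orthonormal X: soft-thresholding, producing exact zeros."""
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

w_ols = np.array([3.0, 0.4, -0.2, -2.5])  # least-squares coefficients (made up)
print(ridge_shrink(w_ols, 0.5))  # every coefficient shrunk, none exactly zero
print(lasso_shrink(w_ols, 0.5))  # small coefficients set exactly to zero
```

Here ridge leaves all four coefficients nonzero, while the Lasso zeroes the two whose magnitude falls below the penalty level.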