Sunday, August 16, 2015

Why is Regularization Needed in Regression?


Hello Data Ninjas!

This introductory blog post is all about the need for regularization in the Ordinary Least Squares (OLS) method. I have tried to keep the concepts simple, adding only a few small illustrative sketches; in coming posts, I will use more equations and visuals for better comprehension.

Linear regression is the process of fitting a line (or curve) so that the sum of squared differences between the estimated and actual values is minimized (minimization of the squared residuals). This method is also called the Ordinary Least Squares (OLS) method.
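In symbols (a small preview of the notation promised for later posts; this is the standard textbook form, not anything specific to this post), the OLS estimate is the coefficient vector that minimizes the residual sum of squares:

\hat{\beta} \;=\; \arg\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 \;=\; \arg\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2

where y_i is the actual value and ŷ_i = x_i⊤β is the estimated value for observation i.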

However, the Ordinary Least Squares method is not sufficient when the ratio of observations to variables is low. With many variables and few data points, the estimated coefficients have high variance and the model tends to overfit, so prediction accuracy on new data suffers. Methods like Ridge Regression and the Lasso provide a solution to this problem, as the sketch below illustrates.
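Here is a minimal sketch of the problem, assuming scikit-learn and a synthetic dataset (the sample sizes and penalty strength are illustrative choices of mine, not prescriptions). With almost as many variables as observations, OLS typically overfits the training data, while a regularized model holds up better on unseen data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 60 observations, 50 variables: a low observations-to-variables ratio
X, y = make_regression(n_samples=60, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit plain OLS and a ridge-penalized model on the same training data
ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

# Compare prediction error on held-out data; OLS usually fares much worse here
print("OLS   test MSE:", mean_squared_error(y_test, ols.predict(X_test)))
print("Ridge test MSE:", mean_squared_error(y_test, ridge.predict(X_test)))
```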
  
Ridge regression generally yields better predictions than the OLS solution by striking a better compromise between bias and variance: it accepts a little bias in exchange for a large reduction in variance. Its main drawback is that all predictors are kept in the model, so it is not very interesting if you seek a parsimonious model or want to apply some kind of feature selection.
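A quick sketch of that drawback (again with scikit-learn on synthetic data of my own choosing): ridge shrinks every coefficient toward zero, but none of them actually become zero, so no predictor is removed from the model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# 20 predictors, of which only 5 are truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
ridge = Ridge(alpha=10.0).fit(X, y)

# Count how many coefficients the fit left at exactly zero
print("nonzero coefficients:", np.sum(ridge.coef_ != 0), "of", X.shape[1])
# Typically prints "20 of 20": everything is shrunk, nothing is dropped.
```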
To achieve sparsity, the lasso is more appropriate, but it will not necessarily yield good results in the presence of high collinearity (it has been observed that when predictors are highly correlated, the prediction performance of the lasso is dominated by that of ridge regression).
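A companion sketch for the lasso (same synthetic setup; the alpha value is an illustrative assumption): the L1 penalty drives many coefficients exactly to zero, which is what gives the sparse, more parsimonious model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Same setup: 20 predictors, only 5 truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)

# Unlike ridge, the lasso sets some coefficients exactly to zero
print("nonzero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])
# Typically far fewer than 20: uninformative predictors are dropped entirely.
```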
I hope this blog post succeeded in introducing the shortcomings of OLS and the need for regularization in regression models. If so, keep visiting this blog for more insights on Data Analytics & Digital Marketing. You can also subscribe to blog posts using the subscription options available in the right sidebar.