Why does using Gradient Descent over Stochastic Gradient Descent improve performance?

Currently, I'm running two types of logistic regression.

  1. logistic regression with SGD
  2. logistic regression with GD

implemented as follows:

from sklearn.linear_model import SGDClassifier, LogisticRegression

SGD = SGDClassifier(loss='log_loss', max_iter=1000, penalty='l1', alpha=0.001)  # loss='log' on older scikit-learn versions
logreg = LogisticRegression(solver='liblinear', max_iter=100, penalty='l1', C=0.1)

Never mind the hyperparameters; I've used GridSearchCV and tried multiple combinations.

When calculating accuracy, logistic regression with GD performs better than with SGD. I want to understand why this is the case: is using GD instead of SGD one way to mitigate an underfitting model?

Topic sgd gradient-descent logistic-regression python machine-learning

Category Data Science


Gradient Descent should give better results because every update is computed on your whole data set. Stochastic Gradient Descent looks at one sample (or a small batch) at a time, which makes it useful for big data. Batches (or subsets) make it run faster, but the noisier gradient estimates mean it can converge to a poorer solution (e.g. a local minimum).
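As a minimal sketch of that difference (on a synthetic dataset from make_classification, so the exact numbers are only illustrative): LogisticRegression fits on the full training set at every solver step, while SGDClassifier can be updated one small batch at a time via partial_fit.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "GD"-style: every solver iteration uses the entire training set.
logreg = LogisticRegression(solver='liblinear', penalty='l1', C=0.1, max_iter=100)
logreg.fit(X_train, y_train)

# SGD-style: the weights are updated from one small batch at a time.
sgd = SGDClassifier(loss='log_loss', penalty='l1', alpha=0.001)
for X_batch, y_batch in zip(np.array_split(X_train, 100), np.array_split(y_train, 100)):
    sgd.partial_fit(X_batch, y_batch, classes=np.unique(y_train))

print("full-batch accuracy:", logreg.score(X_test, y_test))
print("mini-batch accuracy:", sgd.score(X_test, y_test))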

On Wikipedia you can find the following quotation:

SGD replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in high-dimensional optimization problems, this reduces the computational burden, achieving faster iterations in trade for a lower convergence rate

[Figure: SGD vs GD]


SGD has a regularizing effect and finds a solution faster. GD, on the other hand, looks at the whole data set before taking the next step.

SGD may not reach the optimal global minimum, but GD can. However, GD is not practical with large data.
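A rough way to check that trade-off yourself (on a synthetic dataset, chosen here only for illustration) is to compare both the accuracy and the fit time of the two estimators as the data grows:

import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare the full-batch solver and SGD on the same split: accuracy vs. fit time.
for name, clf in [
    ("GD (liblinear)", LogisticRegression(solver='liblinear', penalty='l1', C=0.1)),
    ("SGD", SGDClassifier(loss='log_loss', penalty='l1', alpha=0.001, max_iter=1000)),
]:
    start = time.time()
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test),
          "fit time:", round(time.time() - start, 2), "s")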
