Why not use linear regression to finetune the last layer of a neural network?
In transfer learning, often only the last layer of the network is retrained, typically using gradient descent. However, the last layer of a common neural network performs only an affine (linear) transformation, so why do we use gradient descent rather than linear (or logistic) regression, which can be solved in closed form or as a convex problem, to finetune the last layer?
Topic finetuning transfer-learning linear-regression neural-network
Category Data Science