Minibatch SGD performs better than Adam for region proposal network training

I am training a region proposal network (RPN) with both minibatch SGD (with momentum) and Adam. The library used is Keras. In both cases the batch size is 5 and the initial learning rate is 0.01. The learning rate decay schedule is also the same for both optimizers.
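For concreteness, the mechanical difference between the two updates can be sketched in plain NumPy (the hyperparameters mirror the question's lr = 0.01 and the usual Keras defaults for momentum and Adam's betas; the quadratic loss is just a toy stand-in, not the RPN loss):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, mu=0.9):
    """One SGD-with-momentum update: velocity accumulates past gradients."""
    v = mu * v - lr * grad
    return w + v, v

def adam_step(w, grad, m, s, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the step is rescaled per-parameter by the square
    root of a running second-moment estimate, so the effective step size
    stays on the order of lr regardless of the gradient's magnitude."""
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    s = b2 * s + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Toy quadratic loss f(w) = 0.5 * w**2, so grad = w; run both for 500 steps
w_sgd, v = 1.0, 0.0
w_adam, m, s = 1.0, 0.0, 0.0
for t in range(1, 501):
    w_sgd, v = sgd_momentum_step(w_sgd, w_sgd, v)
    w_adam, m, s = adam_step(w_adam, w_adam, m, s, t)
print(abs(w_sgd), abs(w_adam))
```

The point of the sketch is that with the same nominal lr = 0.01, the two optimizers take very differently scaled steps: Adam's per-parameter normalization keeps its step near lr even when gradients are small, which is part of why the same learning rate value is not directly comparable between the two.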

The RPN classification loss steadily decreases with SGD with momentum but diverges with Adam. The performance of SGD with momentum is noticeably better after about 500 epochs.

Given that everything else is the same for both optimizers, why does Adam perform worse? Any intuitive explanations would be great.

Topic object-detection mini-batch-gradient-descent keras deep-learning machine-learning

Category Data Science
