Why do most Federated Learning methods use SGD, but not more advanced optimizers such as Adam?
Most federated learning methods I see use SGD-based optimizers. Since more advanced optimizers such as Adam are standard in centralized training, why are they not as commonly used in federated learning?
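To make the pattern I mean concrete, here is a minimal sketch (not from any particular paper) of FedAvg-style training on a toy linear-regression problem: each client runs plain local SGD and the server averages the resulting weights. The data, function names, and hyperparameters are all just for illustration.

```python
import numpy as np

# Toy setup: each client holds its own shard of a linear-regression problem.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + 0.1 * rng.normal(size=100)
    clients.append((X, y))

def local_sgd_update(w, X, y, lr=0.01, epochs=5, batch=10):
    """Plain mini-batch SGD on one client's local data (the usual FedAvg choice)."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # MSE gradient
            w -= lr * grad
    return w

# FedAvg rounds: broadcast the global model, run local SGD, average the results.
w_global = np.zeros(2)
for rnd in range(20):
    local_models = [local_sgd_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_models, axis=0)

print("recovered weights:", w_global)  # should end up close to [2, -3]
```

My question is about replacing that inner SGD loop with something like Adam. Doing so naively raises questions such as what to do with the per-client first/second moment estimates between rounds (reset them, keep them locally, or average them at the server), which is part of what I would like to understand.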
Topic federated-learning machine-learning
Category Data Science