Why do most federated learning works use SGD rather than more advanced optimizers such as Adam?

I see most federated learning methods using SGD-based optimizers on the clients. Since more advanced optimizers such as Adam are common in centralized learning, why are they not as commonly used in federated learning? A minimal sketch of the setup I mean is below.
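For context, here is a minimal sketch (not from any specific paper) of the FedAvg-style setup the question refers to: each client runs a few local SGD steps and the server averages the resulting weights. The model, data loader, and hyperparameters are hypothetical placeholders.

```python
# Hypothetical FedAvg-style round: local SGD on each client, weight averaging on the server.
import copy
import torch
import torch.nn as nn

def client_update(global_model, data_loader, lr=0.01, local_epochs=1):
    """Run local SGD on one client's data and return its updated weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain SGD: no per-parameter state beyond momentum
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(local_epochs):
        for x, y in data_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def server_aggregate(client_states):
    """Average client weights (plain FedAvg with equal client weighting)."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg
```

Swapping `torch.optim.SGD` for `torch.optim.Adam` in `client_update` works locally, but Adam also maintains per-parameter moment estimates on each client, which the simple weight averaging above does not account for; that is the kind of difference the question is asking about.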

Topic: federated-learning, machine-learning

Category: Data Science
