Is the Transformer neural network (TNN) architecture suitable for distributed learning?

I am working with Transformer neural networks (TNNs), and I found that they do not seem to work like other neural networks, which have layers and weights. My question is: can a TNN be used with federated learning, in which we train the model on clients and only send the model weights to the server?

Topic federated-learning

Category Data Science


The Transformer architecture is no different from other architectures in this respect: it has layers and trainable parameters, and it is trained with gradient-descent techniques. Therefore, it can be used in federated learning.
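To make this concrete, here is a minimal sketch of the aggregation step of FedAvg, the standard federated learning algorithm: each client trains locally and uploads its parameters, and the server computes a weighted average. The parameter dicts and client names here are hypothetical stand-ins; in practice they would be the Transformer's full state dict.

```python
def fedavg(client_states, client_sizes):
    """FedAvg aggregation: average client parameter dicts,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    avg = {}
    for name in client_states[0]:
        avg[name] = sum(
            state[name] * n for state, n in zip(client_states, client_sizes)
        ) / total
    return avg

# Toy example: two clients with scalar "parameters" (a real Transformer
# would have tensors for attention, feed-forward and embedding layers).
clients = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
sizes = [100, 300]  # local dataset sizes
global_state = fedavg(clients, sizes)
print(global_state)  # {'w': 2.5, 'b': 1.5}
```

The server then broadcasts `global_state` back to the clients for the next round; nothing in this loop depends on the model being a Transformer rather than, say, an LSTM.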

However, Transformer models are normally very large compared with other architectures such as LSTMs, which poses problems in the federated setting, notably slow and unstable convergence and a high communication cost per round.
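A rough back-of-the-envelope calculation shows why model size matters here: every round, each client uploads its full set of parameters. Assuming float32 parameters (4 bytes each), the per-round upload for a model the size of BERT-base (about 110M parameters) is substantial:

```python
def round_payload_mb(num_params, bytes_per_param=4):
    """Approximate upload size per client per round,
    assuming every parameter is sent as float32."""
    return num_params * bytes_per_param / 1e6

# BERT-base scale: ~110M parameters
print(round_payload_mb(110_000_000))  # 440.0 (MB per client per round)
```

Over many rounds and many clients on constrained networks, this quickly becomes a bottleneck, which is why federated variants of the Transformer often compress or restrict what is communicated.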

You may want to look at variants of the Transformer designed specifically for the federated setting, e.g. Federated Learning with Dynamic Transformer for Text to Speech (INTERSPEECH'21).
