Is the Transformer neural network (TNN) architecture suitable for distributed learning?

I am working with Transformer neural networks (TNNs), and I found that they do not seem to work like other neural networks, which have layers and weights. My question is: can a TNN be used with federated learning, in which we train the model on clients and only send the model weights to the server?

Topic federated-learning

Category Data Science


The Transformer architecture is no different from other architectures in this respect: it has layers and trainable parameters, and it is trained with gradient-descent techniques. Therefore, it can be used in federated learning.
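To make this concrete, here is a minimal sketch of the aggregation step of FedAvg, the standard federated learning algorithm: each client trains locally and uploads its parameters, and the server computes a weighted average. The parameter dicts and client names here are hypothetical stand-ins; in practice they would be the Transformer's full state dict.

```python
def fedavg(client_states, client_sizes):
    """FedAvg aggregation: average client parameter dicts,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    avg = {}
    for name in client_states[0]:
        avg[name] = sum(
            state[name] * n for state, n in zip(client_states, client_sizes)
        ) / total
    return avg

# Toy example: two clients with scalar "parameters" (a real Transformer
# would have tensors for attention, feed-forward and embedding layers).
clients = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
sizes = [100, 300]  # local dataset sizes
global_state = fedavg(clients, sizes)
print(global_state)  # {'w': 2.5, 'b': 1.5}
```

The server then broadcasts `global_state` back to the clients for the next round; nothing in this loop depends on the model being a Transformer rather than, say, an LSTM.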

However, Transformer models are normally very large compared with other architectures such as LSTMs, which poses problems in the federated setting, notably slow and unstable convergence and a high communication cost per round.
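A rough back-of-the-envelope calculation shows why model size matters here: every round, each client uploads its full set of parameters. Assuming float32 parameters (4 bytes each), the per-round upload for a model the size of BERT-base (about 110M parameters) is substantial:

```python
def round_payload_mb(num_params, bytes_per_param=4):
    """Approximate upload size per client per round,
    assuming every parameter is sent as float32."""
    return num_params * bytes_per_param / 1e6

# BERT-base scale: ~110M parameters
print(round_payload_mb(110_000_000))  # 440.0 (MB per client per round)
```

Over many rounds and many clients on constrained networks, this quickly becomes a bottleneck, which is why federated variants of the Transformer often compress or restrict what is communicated.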

You may want to look at variants of the Transformer designed specifically for the federated setting, e.g. Federated Learning with Dynamic Transformer for Text to Speech (INTERSPEECH'21).
