Sequence-to-Sequence Transformer for Neural Machine Translation
I am new to deep learning and am following the tutorial in the Keras documentation, but on a different dataset: Menyo-20k, with about 10071 pairs in total (7051 training pairs, 1510 validation pairs, 1510 test pairs). The highest validation accuracy and test accuracy I have gotten is approximately 0.26. I tried the things listed below:
- Using the following optimizers: SGD, Adam, RMSprop
- Tried different learning rates
- Tried dropout rates of 0.4 and 0.1
- Tried different embedding dimensions and feed-forward network dimensions
- Used early stopping with patience = 3; the model does not go past the 13th epoch

I also ran the model as is, without changing any parameters, and the validation accuracy never reached 0.3. I then changed the different parameters to find out what I am doing wrong, but I can't figure it out. What am I doing wrong? Thank you in advance for your guidance.
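
For reference, here is a minimal sketch of how a patience-based stopping rule behaves, assuming validation loss is the monitored quantity (the post does not say which metric was monitored). It is a simplified pure-Python illustration, not the Keras `EarlyStopping` implementation. The function name `epochs_run_with_patience` and the loss values are hypothetical, made up only to show how training can end at epoch 13 with patience = 3.

```python
def epochs_run_with_patience(val_losses, patience=3):
    """Return the number of epochs run before a patience-based early stop.

    Simplified rule (an assumption, not the Keras source): stop once
    `patience` consecutive epochs fail to improve on the best validation
    loss seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # training stops at the end of this epoch
    return len(val_losses)  # the rule never triggered; every epoch ran


# Hypothetical validation losses, invented for illustration only.
hypothetical_val_losses = [
    3.00, 2.60, 2.40, 2.30, 2.25, 2.20, 2.18, 2.17, 2.16, 2.15,  # improving
    2.20, 2.21, 2.30,  # three epochs without improvement
    2.40,              # never reached
]

stopped_at = epochs_run_with_patience(hypothetical_val_losses, patience=3)
print(stopped_at)
```

In this made-up run the best loss occurs at epoch 10, so stopping at epoch 13 only means the monitored value stopped improving three epochs earlier. It says nothing by itself about whether the optimizer, learning rate, or dropout settings are at fault.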
Topic transformer keras deep-learning language-model nlp
Category Data Science