Training Loss increases, but Validation Loss decreases
I am fine-tuning a T5 transformer model on a sequence-to-sequence task. My program outputs the training and validation loss every 500 optimization steps.
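For reference, a stripped-down sketch of my setup looks roughly like this (using the Hugging Face Seq2SeqTrainer; the checkpoint name, hyperparameters, and the train_dataset / val_dataset variables are simplified placeholders for my actual, already-tokenized data, not my exact values):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

# Placeholder checkpoint; I am fine-tuning a T5 variant.
checkpoint = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Pads inputs and labels per batch for the seq2seq objective.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-finetuned",
    evaluation_strategy="steps",  # run validation every eval_steps
    eval_steps=500,               # validation loss every 500 optimization steps
    logging_steps=500,            # training loss every 500 optimization steps
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # ~85,000 samples (placeholder variable)
    eval_dataset=val_dataset,     # ~10,000 samples (placeholder variable)
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()
```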
However, when I first started training the model, the training loss steeply increased while my validation loss decreased (my training dataset has about 85,000 samples and my validation dataset has about 10,000 samples). Does anyone know why this might be happening? Is this a sign that my model is not learning properly?
Also, does anyone know why my training loss is so much higher than my validation loss, and why its curve looks completely different?
Tags: loss, huggingface, deep-learning, nlp, machine-learning