Unable to debug where torch Adam optimiser is going wrong
I am implementing a training loop in VS Code. I have created an Adam optimizer for an XLM-RoBERTa model as follows:
import torch
from transformers import XLMRobertaForSequenceClassification

xlm_r_model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False,
)
xlm_r_model.to(device)

optimizer = torch.optim.Adam(xlm_r_model.parameters(), lr=LR)
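The rest of the loop is the usual pattern, roughly like this (a simplified sketch; names such as train_dataloader are placeholders for my actual variables):

for batch in train_dataloader:  # placeholder name for my DataLoader
    optimizer.zero_grad()
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    labels = batch["labels"].to(device)
    outputs = xlm_r_model(input_ids=input_ids,
                          attention_mask=attention_mask,
                          labels=labels)
    loss = outputs.loss  # HF models return the loss when labels are passed
    loss.backward()
    optimizer.step()     # execution dies here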
Then, at the following line:
optimizer.step()
VS Code simply terminates the execution, without any error or stack trace.
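Since there is no Python traceback, I suspect the process is being killed at a lower level (a native crash or an out-of-memory kill). One thing I can try is enabling the standard-library faulthandler at the top of the script, to get a dump if the interpreter itself crashes (just a diagnostic sketch; train.py is a placeholder name):

import faulthandler
faulthandler.enable()  # prints a traceback if the interpreter crashes hard

# or, equivalently, run the script from a terminal outside VS Code:
#   python -X faulthandler train.py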
So I debugged to find out exactly where this is happening. I reached this line, which makes the F.adam(...) call. Weirdly, on GitHub, torch.optim.adam does not have this line; the closest matching line seems to be line 150.
This call then goes to torch.optim._functional.adam.
In the above image, params (line 72) in the for loop contains 201 elements, and I am unable to figure out exactly which param is going wrong. When I let it continue running, it does not pause in debug mode when the error occurs; instead, VS Code simply terminates.
Again, I am not able to find this function in the GitHub version of _functional.
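Since I cannot tell from the debugger which of the 201 params is the problem, something I could add just before optimizer.step() is a check over the parameters and gradients (a rough diagnostic sketch, not part of my original training code):

for name, param in xlm_r_model.named_parameters():
    if param.grad is not None and not torch.isfinite(param.grad).all():
        print(f"non-finite gradient in: {name}")
    if not torch.isfinite(param).all():
        print(f"non-finite values in parameter: {name}")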
When I checked several Kaggle notebooks (1, 2, 3, 4) for training XLM-RoBERTa, they use AdamW and the torch_xla package to train on TPUs, something like this:
from transformers import AdamW  # may also be torch.optim.AdamW, depending on the notebook
import torch_xla.core.xla_model as xm

optimizer = AdamW([{'params': model.roberta.parameters(), 'lr': LR},
                   {'params': [param for name, param in model.named_parameters()
                               if 'roberta' not in name], 'lr': 1e-3}],
                  lr=LR, weight_decay=0)
xm.optimizer_step(optimizer)
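As far as I understand, xm.optimizer_step(optimizer) is the TPU-specific part; on a single GPU I assume the equivalent would just be a plain step (my assumption, not taken from those notebooks):

from transformers import AdamW  # or torch.optim.AdamW

optimizer = AdamW(xlm_r_model.parameters(), lr=LR, weight_decay=0)
# ... inside the training loop ...
loss.backward()
optimizer.step()  # no torch_xla involved on a GPU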
Am I missing some context, and is it indeed compulsory to train using AdamW or torch_xla? Or am I making some stupid mistake?