Unable to debug where torch Adam optimiser is going wrong

I was implementing a training loop in VS Code. I created an Adam optimizer for an XLM-RoBERTa model as follows:

from transformers import XLMRobertaForSequenceClassification

xlm_r_model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False,
)
xlm_r_model.to(device)
optimizer = torch.optim.Adam(xlm_r_model.parameters(), lr=LR)
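
For context, the optimizer is used inside a fairly standard loop, roughly like this (a simplified sketch of my loop; train_dataloader and the batch keys are placeholder names, not my exact code):

# Simplified sketch of the training loop; train_dataloader, device and LR
# are defined elsewhere in the notebook, and the batch keys are placeholders.
xlm_r_model.train()
for batch in train_dataloader:
    optimizer.zero_grad()
    outputs = xlm_r_model(
        input_ids=batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        labels=batch["labels"].to(device),
    )
    outputs.loss.backward()
    optimizer.step()  # execution dies here, with no traceback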

Then, at the following line:

optimizer.step()

VS Code simply terminates the execution, without any error stack trace.
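
Since there is no Python traceback at all, I suspect the process dies at the native level (a segfault, or Colab's OOM killer). To try to capture at least something, I thought of enabling faulthandler and synchronous CUDA launches before training (a sketch; I have not confirmed that either actually surfaces this particular crash):

import os
import faulthandler

# Must be set before CUDA is initialized: makes kernel launches synchronous
# so a CUDA error surfaces at the real call site instead of a later op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Dump a low-level traceback if the interpreter dies with a segfault
# (this will not help if the process is SIGKILLed for running out of RAM).
faulthandler.enable()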

So I debugged to find out exactly where this happens, and reached the line in torch.optim.adam that makes the F.adam(...) call.

Weirdly, on GitHub, torch.optim.adam does not have this line; the closest matching line seems to be line 150.

This call then goes into torch.optim._functional.adam.

There, the params list iterated in the for loop (line 72) contains 201 elements, and I am unable to figure out exactly which param is going wrong. When I let it continue, the debugger does not pause when the error occurs; instead VS Code simply terminates.
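
To narrow down which of those 201 params is the culprit, I could add a check like this right before optimizer.step() (a sketch that assumes the failure comes from a non-finite gradient or a device mismatch, which I have not verified):

# Sketch: scan trainable parameters before optimizer.step() and flag
# anything that could make the adam update blow up.
for name, param in xlm_r_model.named_parameters():
    if param.grad is None:
        continue
    if not torch.isfinite(param.grad).all():
        print(f"non-finite gradient in {name}")
    if param.grad.device != param.device:
        print(f"device mismatch for {name}: param on {param.device}, grad on {param.grad.device}")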

Again, I am not able to find this function in the _functional version on GitHub.

When I checked several Kaggle notebooks (1, 2, 3, 4) for training XLM-RoBERTa, they use AdamW and the torch_xla package to train on TPUs, something like this:

import torch_xla.core.xla_model as xm
from transformers import AdamW  # these notebooks take AdamW from transformers

optimizer = AdamW(
    [{'params': model.roberta.parameters(), 'lr': LR},
     {'params': [param for name, param in model.named_parameters()
                 if 'roberta' not in name], 'lr': 1e-3}],
    lr=LR, weight_decay=0)
xm.optimizer_step(optimizer)
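
My understanding is that xm.optimizer_step() is just the TPU-specific replacement for optimizer.step(), so on a GPU the plain call should be equivalent (a sketch of that understanding, using torch.optim.AdamW; I have not tested it here):

# Sketch: same optimizer setup on GPU, without the TPU-specific step wrapper.
optimizer = torch.optim.AdamW(xlm_r_model.parameters(), lr=LR, weight_decay=0)
# ... forward and backward pass as usual ...
optimizer.step()  # plain step instead of xm.optimizer_step(optimizer)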

Am I missing some context, and is it indeed compulsory to train with AdamW or torch_xla? Or am I making some stupid mistake?

PS:

  1. I am running this on Colab; pip there shows torch version 1.10.0+cu111 and Python 3.7.13. I run code-server on Colab through colabcode and debug in browser-based VS Code.

  2. I was able to train BERT with the Adam optimizer earlier.

