Loading a Model with weights and optimizers without creating an instance in PyTorch

Question


Übermensch

May 14, 2022 18:03

I recently downloaded the CamemBERT model to fine-tune it for my task.

Upon unzipping the file, the contents are:

I then loaded the model.pt file with PyTorch:

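import torch
model_saved_at = "path/to/model.pt"  # placeholder: path to the unzipped model.pt
# weights_only=False: the checkpoint also stores non-tensor objects (argparse.Namespace)
model = torch.load(model_saved_at, map_location="cpu", weights_only=False)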

The loaded object turned out to be an OrderedDict with the following keys:

args
model
optimizer_history
extra_state
last_optimizer_state

As the names suggest, most of these entries are OrderedDicts themselves; the exception is args, which is an argparse.Namespace. Calling vars() on it shows that args only holds hyperparameters and values that would normally be passed on the command line.
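A quick way to see this structure is to list each top-level key with the type of its value and to turn args into a plain dict with vars(). The sketch below is stdlib-only and uses a hypothetical stand-in for the loaded checkpoint: the empty OrderedDicts replace the real entries, and the two hyperparameter names and values are illustrative assumptions, not the actual contents of args.

```python
import argparse
from collections import OrderedDict


def summarize_checkpoint(ckpt):
    """Map each top-level checkpoint key to the type name of its value."""
    return {key: type(value).__name__ for key, value in ckpt.items()}


# Hypothetical stand-in for the object returned by torch.load(...).
# The real "model" entry holds tensors; these hyperparameters are illustrative only.
ckpt = OrderedDict(
    args=argparse.Namespace(encoder_embed_dim=768, encoder_layers=12),
    model=OrderedDict(),
    optimizer_history=OrderedDict(),
    extra_state=OrderedDict(),
    last_optimizer_state=OrderedDict(),
)

print(summarize_checkpoint(ckpt))

# vars() converts the argparse.Namespace into an ordinary dict of hyperparameters
hyperparams = vars(ckpt["args"])
print(hyperparams)
```

On the real checkpoint, the same two calls work unchanged on the dict returned by torch.load.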

model['model'] contains the weights I want to load and use as my base model. Here is a small part of it:

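state_dict = model['model']  # only the network weights
for name, value in state_dict.items():
    try:
        print(name, '\t', value.size())
    except AttributeError:  # non-tensor entries have no .size()
        print(name, type(value))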

decoder.sentence_encoder.embed_tokens.weight     torch.Size([32005, 768])
decoder.sentence_encoder.embed_positions.weight      torch.Size([514, 768])
decoder.sentence_encoder.layers.0.self_attn.in_proj_weight   torch.Size([2304, 768])
decoder.sentence_encoder.layers.0.self_attn.in_proj_bias     torch.Size([2304])
decoder.sentence_encoder.layers.0.self_attn.out_proj.weight      torch.Size([768, 768])
decoder.sentence_encoder.layers.0.self_attn.out_proj.bias    torch.Size([768])
decoder.sentence_encoder.layers.0.self_attn_layer_norm.weight    torch.Size([768])
decoder.sentence_encoder.layers.0.self_attn_layer_norm.bias      torch.Size([768])
decoder.sentence_encoder.layers.0.fc1.weight     torch.Size([3072, 768])
decoder.sentence_encoder.layers.0.fc1.bias   torch.Size([3072])
decoder.sentence_encoder.layers.0.fc2.weight     torch.Size([768, 3072])
decoder.sentence_encoder.layers.0.fc2.bias   torch.Size([768])

However, I cannot use load_state_dict() because I have no instance of this class. How am I supposed to load the weights and optimizer parameters without creating an instance? I thought of using sentence.bpe.model, but it is only for tokenization.
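For context, load_state_dict() is a method of nn.Module and matches tensors to parameters by name and shape, so some module instance is always required; it only needs parameter names that line up with the checkpoint keys. The sketch below assumes a hypothetical FeedForward module and a hypothetical helper sub_state_dict(), and uses small random tensors in place of checkpoint['model']. It strips the decoder.sentence_encoder.layers.0. prefix and loads the fc1/fc2 entries; strict=False reports the keys it skipped instead of raising.

```python
import torch
from torch import nn


class FeedForward(nn.Module):
    """Hypothetical module whose parameter names mirror the fc1/fc2 checkpoint keys."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)


def sub_state_dict(state_dict, prefix):
    """Keep only entries under `prefix` and strip it so the names match the module."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}


# Small random stand-in for checkpoint['model']; the real tensors are 768/3072 wide.
dim, hidden = 4, 8
prefix = "decoder.sentence_encoder.layers.0."
fake_weights = {
    prefix + "fc1.weight": torch.randn(hidden, dim),
    prefix + "fc1.bias": torch.randn(hidden),
    prefix + "fc2.weight": torch.randn(dim, hidden),
    prefix + "fc2.bias": torch.randn(dim),
    prefix + "self_attn.in_proj_weight": torch.randn(3 * dim, dim),
}

ffn = FeedForward(dim, hidden)
# strict=False loads the matching names and reports the rest instead of raising
result = ffn.load_state_dict(sub_state_dict(fake_weights, prefix), strict=False)
print(result.missing_keys)     # []
print(result.unexpected_keys)  # ['self_attn.in_proj_weight']
```

Rebuilding the full network this way means re-creating every submodule with matching names. The key layout (args, optimizer_history, extra_state) suggests the checkpoint came from fairseq, so its own model class is the natural target, but that is an inference from the keys, not something the question confirms.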

Topic bert pytorch nlp

Category Data Science

SrJ · Accepted Answer · August 5, 2020 10:01

If you are open to using Hugging Face Transformers for fine-tuning (a very popular option), here is a code sample:

import torch.nn as nn
import transformers

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained CamemBERT encoder, downloaded from the Hugging Face hub
        self.Bert = transformers.CamembertModel.from_pretrained('camembert-base')
        # Task-specific head: 768-dim hidden state -> 1 output
        self.fc0 = nn.Linear(768, 1)

        nn.init.normal_(self.fc0.weight, std=0.1)
        nn.init.normal_(self.fc0.bias, 0.)

    def forward(self, input_ids, attention_mask):
        hid = self.Bert(input_ids, attention_mask=attention_mask)
        hid = hid[0][:, 0]  # last hidden state of the first token (<s>)
        x = self.fc0(hid)
        return x

This is just a sample; change the last layer to suit your task. from_pretrained('camembert-base') downloads the pretrained weights from Hugging Face, so you do not need to provide them yourself. You can install transformers by running the following in a terminal:

pip3 install transformers
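To check what the forward pass does to the tensor shapes without downloading anything, the head can be exercised on its own. In this sketch a random tensor is an assumed stand-in for hid[0] (the encoder's last hidden state, shape batch × sequence length × 768); the batch size and sequence length are arbitrary.

```python
import torch
from torch import nn

# Random stand-in for hid[0]: (batch_size, seq_len, hidden_size).
# 768 matches the hidden size of camembert-base; the other sizes are arbitrary.
batch_size, seq_len, hidden_size = 2, 5, 768
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)

# hid[0][:, 0] keeps the vector of the first token (<s>) of each sequence
first_token = last_hidden_state[:, 0]

# Same head as self.fc0 in the answer: one output per sequence
fc0 = nn.Linear(hidden_size, 1)
logits = fc0(first_token)

print(first_token.shape)  # torch.Size([2, 768])
print(logits.shape)       # torch.Size([2, 1])
```

Using the first token's vector as a sequence summary is the common choice for BERT-style classifiers; for a task with more outputs, only the second argument of nn.Linear changes.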

