Loading a model's weights and optimizer state without creating an instance in PyTorch

I recently downloaded the CamemBERT model to fine-tune it for my own task.

Upon unzipping the file, the contents include a model.pt checkpoint and a sentence.bpe.model file.

Upon loading the model.pt file with PyTorch:

import torch

# model_saved_at is the path to the downloaded model.pt file
model = torch.load(model_saved_at)
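
If the checkpoint was saved on a GPU and you are loading it on a CPU-only machine, you can pass map_location to avoid device errors:

model = torch.load(model_saved_at, map_location='cpu')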

I saw that the loaded model was an OrderedDict containing the following keys:

args
model
optimizer_history
extra_state
last_optimizer_state

As the names suggest, most of them are OrderedDicts themselves, with the exception of args, which is an argparse.Namespace. Using vars() we can see that args only contains hyperparameters and values that are passed from the command line.
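
For example, the stored hyperparameters can be dumped like this:

for key, value in vars(model['args']).items():
    print(key, '=', value)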

model['model'] contains the weights that I want to load and use as my base model. A small part of it is shown below:

for name in model['model'].keys():
    try:
        # most entries are tensors, so size() works
        print(name, '\t', model['model'][name].size())
    except AttributeError:
        # non-tensor entries have no size()
        print(name, type(model['model'][name]))
decoder.sentence_encoder.embed_tokens.weight     torch.Size([32005, 768])
decoder.sentence_encoder.embed_positions.weight      torch.Size([514, 768])
decoder.sentence_encoder.layers.0.self_attn.in_proj_weight   torch.Size([2304, 768])
decoder.sentence_encoder.layers.0.self_attn.in_proj_bias     torch.Size([2304])
decoder.sentence_encoder.layers.0.self_attn.out_proj.weight      torch.Size([768, 768])
decoder.sentence_encoder.layers.0.self_attn.out_proj.bias    torch.Size([768])
decoder.sentence_encoder.layers.0.self_attn_layer_norm.weight    torch.Size([768])
decoder.sentence_encoder.layers.0.self_attn_layer_norm.bias      torch.Size([768])
decoder.sentence_encoder.layers.0.fc1.weight     torch.Size([3072, 768])
decoder.sentence_encoder.layers.0.fc1.bias   torch.Size([3072])
decoder.sentence_encoder.layers.0.fc2.weight     torch.Size([768, 3072])
decoder.sentence_encoder.layers.0.fc2.bias   torch.Size([768])

However, I cannot use load_state_dict() since I have no instance of this class. How am I supposed to load the weights and optimizer parameters without creating an instance? I thought of using sentence.bpe.model, but that is for tokenization purposes.
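
For reference, this is the call I cannot make, since it needs an already-built module whose parameter names match the checkpoint keys (the class name below is a hypothetical placeholder):

net = SomeCamembertArchitecture()    # hypothetical: I have no such class
net.load_state_dict(model['model'])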

Topic bert pytorch nlp

Category Data Science


If you are open to using the Hugging Face transformers library for fine-tuning, which is very popular, here is a code sample:

import torch.nn as nn
import transformers

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # downloads and loads the pretrained CamemBERT weights automatically
        self.bert = transformers.CamembertModel.from_pretrained('camembert-base')
        self.fc0 = nn.Linear(768, 1)  # task-specific head; change as needed

        nn.init.normal_(self.fc0.weight, std=0.1)
        nn.init.zeros_(self.fc0.bias)

    def forward(self, input_ids, attention_mask):
        hid = self.bert(input_ids, attention_mask=attention_mask)
        hid = hid[0][:, 0]  # hidden state of the first (<s>) token
        x = self.fc0(hid)
        return x

You can change the last layer as you need; this is just a sample. from_pretrained downloads the pretrained weights from the Hugging Face hub, so you do not need to supply them yourself. You can install transformers with the following command in a terminal:

pip3 install transformers
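
Once installed, here is a minimal usage sketch (the example sentence and the camembert-base tokenizer are only for illustration):

import torch
from transformers import CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained('camembert-base')
net = Model()

enc = tokenizer("J'aime le camembert !", return_tensors='pt')
with torch.no_grad():
    out = net(enc['input_ids'], attention_mask=enc['attention_mask'])
print(out.shape)  # torch.Size([1, 1])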
