Loading a Model with weights and optimizers without creating an instance in PyTorch
I recently downloaded the CamemBERT model to fine-tune it for my own task.
After unzipping the file, the contents are:
Upon loading the model.pt file using PyTorch:
import torch
model_saved_at = "model.pt"  # path to the extracted checkpoint (adjust as needed)
model = torch.load(model_saved_at)
I saw that model was in OrderedDict format, containing the following keys:
args
model
optimizer_history
extra_state
last_optimizer_state
Most of these values are OrderedDicts themselves, with the exception of args, which is an instance of argparse.Namespace. Using vars(), we can see that args only contains some hyperparameters and values that would normally be passed on the command line.
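To make that structure concrete, here is a small standard-library-only sketch. The `checkpoint` dict is a hypothetical stand-in with the same top-level keys as model.pt and made-up placeholder values (not CamemBERT's real hyperparameters), and `describe_checkpoint` is a hypothetical helper. It lists the type of each entry and uses vars() to turn the Namespace into a plain dict.

```python
import argparse
from collections import OrderedDict

# Hypothetical stand-in for the loaded checkpoint: same top-level keys as
# model.pt, but with placeholder contents (NOT the real CamemBERT values).
checkpoint = {
    "args": argparse.Namespace(lr=1e-4, max_positions=512),
    "model": OrderedDict(),
    "optimizer_history": OrderedDict(),
    "extra_state": OrderedDict(),
    "last_optimizer_state": OrderedDict(),
}


def describe_checkpoint(ckpt):
    """Hypothetical helper: map each top-level key to its value's type name."""
    return {key: type(value).__name__ for key, value in ckpt.items()}


summary = describe_checkpoint(checkpoint)
for key, type_name in summary.items():
    print(f"{key:22s} {type_name}")

# vars() converts an argparse.Namespace into a plain dict of its attributes,
# which is how the command-line hyperparameters stored in 'args' can be read.
hyperparams = vars(checkpoint["args"])
print(hyperparams)
```

On the real file, the same two steps would be applied to the dict returned by torch.load.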
model['model'] contains the weights that I want to load and use as my base model.
A small part of it is shown below:
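# Print each parameter name with its shape; non-tensor entries have no .size()
for ans in model['model'].keys():
    try:
        print(ans, '\t', model['model'][ans].size())
    except AttributeError:
        print(ans, type(model['model'][ans]))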
decoder.sentence_encoder.embed_tokens.weight torch.Size([32005, 768])
decoder.sentence_encoder.embed_positions.weight torch.Size([514, 768])
decoder.sentence_encoder.layers.0.self_attn.in_proj_weight torch.Size([2304, 768])
decoder.sentence_encoder.layers.0.self_attn.in_proj_bias torch.Size([2304])
decoder.sentence_encoder.layers.0.self_attn.out_proj.weight torch.Size([768, 768])
decoder.sentence_encoder.layers.0.self_attn.out_proj.bias torch.Size([768])
decoder.sentence_encoder.layers.0.self_attn_layer_norm.weight torch.Size([768])
decoder.sentence_encoder.layers.0.self_attn_layer_norm.bias torch.Size([768])
decoder.sentence_encoder.layers.0.fc1.weight torch.Size([3072, 768])
decoder.sentence_encoder.layers.0.fc1.bias torch.Size([3072])
decoder.sentence_encoder.layers.0.fc2.weight torch.Size([768, 3072])
decoder.sentence_encoder.layers.0.fc2.bias torch.Size([768])
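These shapes already encode most of the architecture. The sketch below uses a hypothetical helper, `infer_dims`, on a plain dict of shapes copied from this listing. It reads off the vocabulary size, hidden size, number of position embeddings and feed-forward width, checks that in_proj_weight stacks three 768-row blocks (the query, key and value projections fused into one matrix), and counts distinct layer indices. With only layer 0 in the excerpt it reports one layer; run on all keys of model['model'] it would count every layer.

```python
import re

# Shapes copied from the listing above (the excerpt only shows layer 0).
shapes = {
    "decoder.sentence_encoder.embed_tokens.weight": (32005, 768),
    "decoder.sentence_encoder.embed_positions.weight": (514, 768),
    "decoder.sentence_encoder.layers.0.self_attn.in_proj_weight": (2304, 768),
    "decoder.sentence_encoder.layers.0.fc1.weight": (3072, 768),
}


def infer_dims(shapes, prefix="decoder.sentence_encoder."):
    """Hypothetical helper: derive model dimensions from parameter shapes."""
    vocab_size, hidden = shapes[prefix + "embed_tokens.weight"]
    max_positions = shapes[prefix + "embed_positions.weight"][0]
    qkv_rows = shapes[prefix + "layers.0.self_attn.in_proj_weight"][0]
    ffn_dim = shapes[prefix + "layers.0.fc1.weight"][0]

    # Collect the distinct N in keys of the form "<prefix>layers.N.<...>"
    layer_pattern = re.compile(re.escape(prefix) + r"layers\.(\d+)\.")
    layer_ids = set()
    for key in shapes:
        match = layer_pattern.match(key)
        if match:
            layer_ids.add(int(match.group(1)))

    return {
        "vocab_size": vocab_size,
        "hidden": hidden,
        "max_positions": max_positions,
        "qkv_blocks": qkv_rows // hidden,  # 2304 // 768 == 3 (Q, K, V fused)
        "ffn_dim": ffn_dim,
        "num_layers": len(layer_ids),
    }


dims = infer_dims(shapes)
print(dims)
```

To use it on the real checkpoint, the `shapes` dict could be built as `{k: tuple(v.size()) for k, v in model['model'].items()}`.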
However, I cannot use load_state_dict() since I have no instance of the class these weights belong to. How am I supposed to load the weights and optimizer parameters without creating an instance? I thought of using sentence.bpe.model, but it is only meant for tokenization.
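Because load_state_dict() is a method of an nn.Module instance, the weights can only be loaded into some module whose parameter names and shapes match the checkpoint keys. The sketch below illustrates that mechanism under stated assumptions: `TinyEncoder` is a hypothetical module that reproduces only the two embedding tables (it is not the class that originally produced the checkpoint), `strip_prefix` is a hypothetical helper, and `fake_weights` is a zero-filled stand-in for model['model'] using shapes from the listing. With strict=False, keys the module does not have are reported instead of raising.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

PREFIX = "decoder.sentence_encoder."


class TinyEncoder(nn.Module):
    """Hypothetical module matching only the embedding part of the key names."""

    def __init__(self, vocab_size=32005, max_positions=514, hidden=768):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden)
        self.embed_positions = nn.Embedding(max_positions, hidden)


def strip_prefix(state_dict, prefix):
    """Keep only keys starting with `prefix`, with that prefix removed."""
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state_dict.items()
        if key.startswith(prefix)
    )


# Zero-filled stand-in for model['model'], using shapes from the listing.
fake_weights = OrderedDict([
    (PREFIX + "embed_tokens.weight", torch.zeros(32005, 768)),
    (PREFIX + "embed_positions.weight", torch.zeros(514, 768)),
    (PREFIX + "layers.0.fc1.bias", torch.zeros(3072)),
])

encoder = TinyEncoder()
# strict=False reports missing/unexpected keys instead of raising on them.
result = encoder.load_state_dict(strip_prefix(fake_weights, PREFIX), strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

Against the real file, `fake_weights` would be replaced by `model['model']` as loaded above. Note that strict=False only tolerates missing or extra keys; a shape mismatch on a shared key still raises an error. Optimizer state works the same way: `optimizer.load_state_dict(...)` also needs an optimizer instance of the matching type, built over the matching parameters. On recent PyTorch releases, where torch.load defaults to weights_only=True, a checkpoint containing an argparse.Namespace may be rejected; passing weights_only=False is only advisable for files you trust.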