Here's what you should do:
Prepare your dataset: Follow instructions similar to those described in the paper to preprocess your dataset. This will be your major task; once it is done, you only have to fine-tune the model. If you don't have a dataset, you can use the dataset used in this research paper, which can be downloaded from here.
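As a concrete starting point, here is a minimal sketch of the preprocessing, assuming the loader expects parallel train.src / train.tgt files with one WordPiece-tokenized example per line (this matches the preprocessed datasets linked in the readme, but verify the exact file names and format in biunilm/seq2seq_loader.py):
from pytorch_pretrained_bert.tokenization import BertTokenizer

# do_lower_case must match the model variant you plan to fine-tune (cased here)
tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)

def write_tokenized(pairs, src_path, tgt_path):
    # pairs: iterable of (source_text, target_text) raw strings
    with open(src_path, 'w', encoding='utf-8') as f_src, \
         open(tgt_path, 'w', encoding='utf-8') as f_tgt:
        for source_text, target_text in pairs:
            f_src.write(' '.join(tokenizer.tokenize(source_text)) + '\n')
            f_tgt.write(' '.join(tokenizer.tokenize(target_text)) + '\n')

# example: write_tokenized(my_pairs, 'train.src', 'train.tgt')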
Download the pre-trained model, or start from the provided fine-tuned model checkpoint (from the link). You will have to check which version of the model works best for your dataset. If you select a model fine-tuned for the summarization task and your dataset is similar to the CNN/DailyMail [37] or Gigaword [36] datasets, you can skip fine-tuning.
Fine-tune the model: In this step, you will use the command given in the readme of the GitHub repository. Note that some parameters must be set according to the language model you downloaded in the previous step, and you can change the number of epochs in the following command based on the size of your dataset. Also note that this step requires a GPU: the repository readme recommends 2 or 4 V100-32G GPU cards for fine-tuning the model (see the note after the command if your GPUs are smaller or fewer).
DATA_DIR=/{path_of_preprocessed_dataset}/
OUTPUT_DIR=/{path_of_fine-tuned_model}/
MODEL_RECOVER_PATH=/{path_of_pre-trained_model}/unilmv1-large-cased.bin
export PYTORCH_PRETRAINED_BERT_CACHE=/{tmp_folder}/bert-cased-pretrained-cache
export CUDA_VISIBLE_DEVICES=0,1,2,3
python biunilm/run_seq2seq.py --do_train --fp16 --amp --num_workers 0 \
--bert_model bert-large-cased --new_segment_ids --tokenized_input \
--data_dir ${DATA_DIR} \
--output_dir ${OUTPUT_DIR}/bert_save \
--log_dir ${OUTPUT_DIR}/bert_log \
--model_recover_path ${MODEL_RECOVER_PATH} \
--max_seq_length 192 --max_position_embeddings 192 \
--trunc_seg a --always_truncate_tail --max_len_a 0 --max_len_b 64 \
--mask_prob 0.7 --max_pred 48 \
--train_batch_size 128 --gradient_accumulation_steps 1 \
--learning_rate 0.00003 --warmup_proportion 0.1 --label_smoothing 0.1 \
--num_train_epochs 30
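If your GPUs are smaller or fewer than the readme recommends, you can usually keep the same effective batch size by increasing gradient accumulation. In pytorch_pretrained_bert-style training scripts, --train_batch_size is normally divided by --gradient_accumulation_steps for each forward pass; verify that run_seq2seq.py behaves this way before relying on it. For example, replacing the corresponding line above with the following processes a quarter of the batch per forward pass while keeping the effective batch size at 128:
--train_batch_size 128 --gradient_accumulation_steps 4 \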
Evaluate your model: Use biunilm/decode_seq2seq.py to decode (i.e., predict outputs for the evaluation dataset) and use the provided evaluation script to score the trained model.
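For reference, the decoding call looks roughly like the sketch below, adapted from the summarization example in the repository readme. Treat the flag values and the checkpoint naming pattern as placeholders, and confirm the exact flag names in the readme or in decode_seq2seq.py before running:
EVAL_SPLIT=test
MODEL_RECOVER_PATH=${OUTPUT_DIR}/bert_save/model.{epoch_number}.bin
python biunilm/decode_seq2seq.py --fp16 --amp --bert_model bert-large-cased \
  --new_segment_ids --mode s2s --tokenized_input \
  --input_file ${DATA_DIR}/${EVAL_SPLIT}.src --split ${EVAL_SPLIT} \
  --model_recover_path ${MODEL_RECOVER_PATH} \
  --max_seq_length 192 --max_tgt_length 64 \
  --batch_size 64 --beam_size 5 --length_penalty 0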
Use the trained model: To use this model to make a prediction, you can simply write your own Python code to:
- load the PyTorch pre-trained model using the pytorch_pretrained_bert library, as done in the decode_seq2seq.py file
- tokenize your input (see the sketch after this list)
- predict the output and detokenize it
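For the tokenization step, a minimal sketch using the same pytorch_pretrained_bert tokenizer (assuming the cased large model; do_lower_case has to match the model you downloaded):
from pytorch_pretrained_bert.tokenization import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)
src_tokens = tokenizer.tokenize("Your input text goes here.")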
Here is the logic which you can use:
# load the fine-tuned weights into the decoder (see decode_seq2seq.py for the full argument list)
model = BertForSeq2SeqDecoder.from_pretrained(long_list_of_arguments)
# collate a list of preprocessed instances into padded input tensors
batch = seq2seq_loader.batch_list_to_batch_tensors(input_batch)
input_ids, token_type_ids, position_ids, input_mask, mask_qkv, task_idx = batch
# run the decoder; traces holds the generated token ids
traces = model(input_ids, token_type_ids, position_ids, input_mask, task_idx=task_idx, mask_qkv=mask_qkv)
Note that this is not the complete logic; it just shows how the GitHub repository code loads the saved model and uses it to make predictions. Use traces to convert the predicted ids back to tokens and detokenize the output tokens (as done in the code here). The detokenization step is necessary because the input sequence was split into subword units by WordPiece.
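As a rough sketch of that post-processing, assuming traces is (or contains) a tensor of predicted token ids as in the non-beam-search path of decode_seq2seq.py (with beam search it is a dict and you need to pull the predicted sequence out of it first):
from pytorch_pretrained_bert.tokenization import BertTokenizer

# same tokenizer that was used to prepare the input
tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)

# take the first example in the batch and map ids back to WordPiece tokens
output_ids = traces[0].tolist()
output_tokens = tokenizer.convert_ids_to_tokens(output_ids)

# stop at the first sentinel token, then merge "##" sub-tokens into whole words
cleaned = []
for tok in output_tokens:
    if tok in ('[SEP]', '[PAD]'):
        break
    cleaned.append(tok)
summary = ' '.join(cleaned).replace(' ##', '')
print(summary)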
For reference, here is the code that loads the pre-trained model. You can go through the decoding loop to understand the logic and adapt it to your case. I hope this helps.