Select best answer from several existing ones for a question

Question

dokondr

March 11, 2020 10:27

After analyzing questions on a forum, a human support team has created a set of general answers that can be used to give basic replies on the forum.

I am trying to build a system that:

Selects the best answer from this set for a given question. How can this be done?
Estimates how acceptable the selected answer is. Which metrics should be used?

I don't think that using document embeddings such as doc2vec to measure the similarity between a question and an answer solves the problem. Are there other ideas?

Update 1

In my case I don't have a labeled data set of good answers to train a model on. My problem is an unsupervised learning problem.

Topic question-answering nlp

Category Data Science

Sonu · Accepted Answer · March 11, 2020 07:51

This is a multiple-choice question answering problem. I can see you have already tried gensim, doc2vec, etc. You can try a PyTorch-based transformer solution. Here is the link: multiple-choice. You can put your data in SWAG format and remove --do_train from the command below to run prediction on your dataset.

It has been trained on the SWAG dataset and has given decent accuracy.

If it works well for you, good; otherwise you may want to fine-tune it. For fine-tuning, keep --do_train in the command, as below:

# Training on 4 Tesla V100 (16 GB) GPUs.
# Drop --do_train to run evaluation/prediction only.
export SWAG_DIR=/path/to/swag_data_dir
python ./examples/run_multiple_choice.py \
  --model_type roberta \
  --task_name swag \
  --model_name_or_path roberta-base \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$SWAG_DIR" \
  --learning_rate 5e-5 \
  --num_train_epochs 3 \
  --max_seq_length 80 \
  --output_dir models_bert/swag_base \
  --per_gpu_eval_batch_size=16 \
  --per_gpu_train_batch_size=16 \
  --gradient_accumulation_steps 2 \
  --overwrite_output_dir
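
Putting your own data "in SWAG format" could look roughly like the sketch below. It is only an illustration: it assumes the column layout of the public SWAG CSV files (an unnamed index column, then video-id, fold-ind, startphrase, sent1, sent2, gold-source, ending0..ending3, label). The helper names `to_swag_rows` and `write_swag_csv`, the "forum" source tag, and the choice to split the answer set into groups of four (padding the last group by repeating its final answer) are assumptions made for this example, not part of the script above. Since there is no labeled data, the label column holds a placeholder.

```python
import csv

# Assumed to match the header of the public SWAG train.csv / val.csv files.
SWAG_COLUMNS = ["", "video-id", "fold-ind", "startphrase", "sent1", "sent2",
                "gold-source", "ending0", "ending1", "ending2", "ending3", "label"]


def to_swag_rows(questions, answers, placeholder_label=0):
    """Build SWAG-style rows: one row per question and group of 4 candidate answers."""
    rows = []
    for q_idx, question in enumerate(questions):
        for start in range(0, len(answers), 4):
            group = list(answers[start:start + 4])
            # SWAG rows carry exactly 4 endings; pad a short final group
            # by repeating its last answer (an assumption of this sketch).
            group += [group[-1]] * (4 - len(group))
            rows.append([
                len(rows),          # running row index (unnamed first column)
                f"q{q_idx}",        # stands in for video-id
                start,              # stands in for fold-ind: offset of this answer group
                question,           # startphrase
                question,           # sent1: the context, here the forum question
                "",                 # sent2: left empty, each ending is a full answer
                "forum",            # gold-source (hypothetical tag)
                *group,             # ending0..ending3
                placeholder_label,  # no gold label available, placeholder only
            ])
    return rows


def write_swag_csv(path, rows):
    """Write the rows with the assumed SWAG header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(SWAG_COLUMNS)
        writer.writerows(rows)
```

With more than four general answers, each question produces several rows, so the per-group predictions still have to be compared with each other to pick one overall answer. The file name the script expects inside $SWAG_DIR is not stated above, so check the script before writing the file.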
