How to process list-type questions in a Question Answering task

How should we generate question-answer-context triplets for questions with multiple answer strings, and how should we measure performance on them?

For a question with a single answer, we generate one question-answer-context triplet and calculate the EM/F1 score, then take the average score over the whole training set as the overall performance.
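For context, this is roughly the single-answer, SQuAD-style evaluation I have in mind. A minimal sketch, with illustrative function names that are not from any particular library:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD convention)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted span and a gold span."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def evaluate(predictions, golds):
    """Overall performance: average the per-example scores across the dataset."""
    em = sum(exact_match(p, g) for p, g in zip(predictions, golds)) / len(golds)
    f1 = sum(token_f1(p, g) for p, g in zip(predictions, golds)) / len(golds)
    return {"EM": em, "F1": f1}
```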

For a list-type question, is it correct to generate multiple triplets, one per candidate answer string, as separate records in the training set, even though they would share the same context and question? When calculating performance, should we first combine the predicted answers from the separate triplets (which share the same question and context) and then compare them with the 'true' answer list of the question? Or should we just average the scores as with other records in the training set that have different questions, the same way we calculate performance for questions with a single answer? The two options are sketched in code below.
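To make the two options concrete, here is a hedged sketch assuming each record is a dict with hypothetical keys `question`, `gold`, and `prediction` (one record per triplet). Option A scores every triplet independently and averages; Option B first groups the predicted strings per question and compares the set against the full gold answer list:

```python
from collections import defaultdict

def normalize(text):
    # Placeholder normalization; in practice, use the same SQuAD-style
    # normalization as in the single-answer sketch above.
    return " ".join(text.lower().split())

def per_triplet_average(records):
    """Option A: treat each (question, gold answer, prediction) triplet as its own
    record, score it like a single-answer question (EM here), and average."""
    scores = [float(normalize(r["prediction"]) == normalize(r["gold"])) for r in records]
    return sum(scores) / len(scores)

def set_level_f1(records):
    """Option B: combine answers per question first, then compute
    precision/recall/F1 between the predicted set and the gold answer list."""
    preds, golds = defaultdict(set), defaultdict(set)
    for r in records:
        preds[r["question"]].add(normalize(r["prediction"]))
        golds[r["question"]].add(normalize(r["gold"]))
    f1s = []
    for q in golds:
        tp = len(preds[q] & golds[q])  # answers both predicted and in the gold list
        if tp == 0:
            f1s.append(0.0)
            continue
        precision = tp / len(preds[q])
        recall = tp / len(golds[q])
        f1s.append(2 * precision * recall / (precision + recall))
    return sum(f1s) / len(f1s)
```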

Topic: question-answering, deep-learning, nlp

Category: Data Science
