Distractor Generation for Multiple Choice Questions

I'm currently working on generating distractors for multiple-choice questions. The training set consists of a question, an answer, and 3 distractors, and I need to predict 3 distractors for each item in the test set. I have gone through many research papers on this, but my problem is unusual: the questions and answers come from a reading comprehension passage (usually a long story), yet the passage itself is not provided, nor is any supporting text for the question. Moreover, the answers and distractors are not single words but full sentences. The papers I read mostly assume some kind of supporting text; even the SciQ dataset includes supporting text, so the problem I'm working on is different.

This research paper is the one I thought came closest to what I wanted, and I'm planning to implement it. Below is an excerpt from the paper describing the approach, which the authors say worked better than their NN models.

We solve DG as the following ranking problem: Problem. Given a candidate distractor set $D$ and an MCQ dataset $M = \{(q_i, a_i, \{d_{i1}, \ldots, d_{ik}\})\}_{i=1}^{N}$, where $q_i$ is the question stem, $a_i$ is the key, and $D_i = \{d_{i1}, \ldots, d_{ik}\} \subseteq D$ are the distractors associated with $q_i$ and $a_i$, find a point-wise ranking function $r: (q_i, a_i, d) \to [0, 1]$ for $d \in D$, such that distractors in $D_i$ are ranked higher than those in $D - D_i$.

My questions are: a) From what I understood, the above says we first create one big pool containing all the distractors in the dataset, and then learn a pointwise ranking function that scores every (question, answer, distractor) triple? So if we have $n$ questions and $d$ candidate distractors, we get an $n \times d$ matrix where the pointwise function values range between 0 and 1, and a question's own distractors should be ranked higher than the rest. Right?
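
To make my understanding concrete, here is a minimal sketch of how I imagine the training data would be assembled under this formulation. The field names (`question`, `answer`, `distractors`) and the negative-sampling step are my own assumptions, not from the paper:

```python
import random

# toy MCQ dataset: each item has a question stem, a key, and 3 gold distractors
dataset = [
    {"question": "Why did the fox visit the farm?",
     "answer": "It was looking for food.",
     "distractors": ["It wanted to sleep.",
                     "It was chased by dogs.",
                     "It liked the farmer."]},
    # ... more items
]

# candidate pool D = union of all distractors in the dataset
candidate_pool = sorted({d for item in dataset for d in item["distractors"]})

# pointwise training triples: label 1 for a question's own distractors,
# label 0 for (sampled) distractors belonging to other questions
triples = []
for item in dataset:
    gold = set(item["distractors"])
    for d in gold:
        triples.append((item["question"], item["answer"], d, 1))
    negatives = [d for d in candidate_pool if d not in gold]
    for d in random.sample(negatives, min(len(negatives), 10)):
        triples.append((item["question"], item["answer"], d, 0))
```

Scoring every candidate for every question would give the full $n \times d$ matrix; the negative sampling is just to keep training tractable.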

To learn the ranking function, we investigate two types of models: feature-based models and NN-based models.

Feature-based Models: Given a tuple $(q, a, d)$, a feature-based model first transforms it to a feature vector $\phi(q, a, d) \in \mathbb{R}^d$ with the function $\phi$. We design the following features for DG, resulting in a 26-dimensional feature vector:

  • Emb Sim. Embedding similarity between q and d and the similarity between a and d.
  • POS Sim. Jaccard similarity between a and d’s POS tags.
  • ED. The edit distance between a and d.
  • Token Sim. Jaccard similarities between q and d’s tokens, a and d’s tokens, and q and a’s tokens.
  • Length. a and d’s character and token lengths and the difference of lengths.
  • Suffix. The absolute and relative length of a and d’s longest common suffix.
  • Freq. Average word frequency in a and d.
  • Single. Singular/plural consistency of a and d.
  • Wiki Sim.
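
To check my understanding, here is a rough sketch of how I would compute some of these features for sentence-level a and d. The `embed` function is a placeholder for whatever sentence embedding I end up using (not something the paper specifies); the rest relies on NLTK:

```python
import numpy as np
import nltk  # assumes the 'punkt' tokenizer and POS tagger data are downloaded

def embed(text):
    # placeholder: replace with averaged word vectors or a sentence encoder
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(50)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def jaccard(s1, s2):
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def longest_common_suffix(a, d):
    n = 0
    while n < min(len(a), len(d)) and a[-(n + 1)] == d[-(n + 1)]:
        n += 1
    return n

def features(q, a, d):
    q_tok, a_tok, d_tok = (nltk.word_tokenize(t.lower()) for t in (q, a, d))
    a_pos = [tag for _, tag in nltk.pos_tag(a_tok)]
    d_pos = [tag for _, tag in nltk.pos_tag(d_tok)]
    return {
        "emb_sim_qd": cosine(embed(q), embed(d)),     # Emb Sim
        "emb_sim_ad": cosine(embed(a), embed(d)),
        "pos_sim": jaccard(a_pos, d_pos),             # POS Sim
        "edit_dist": nltk.edit_distance(a, d),        # ED
        "tok_sim_qd": jaccard(q_tok, d_tok),          # Token Sim
        "tok_sim_ad": jaccard(a_tok, d_tok),
        "tok_sim_qa": jaccard(q_tok, a_tok),
        "len_a": len(a), "len_d": len(d),             # Length (characters)
        "len_diff": abs(len(a) - len(d)),
        "suffix_abs": longest_common_suffix(a, d),    # Suffix
        "suffix_rel": longest_common_suffix(a, d) / max(len(a), len(d)),
    }
```

None of these operations assumes single-word inputs, which is why I suspect they carry over to sentences, but I would like confirmation.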

My question: will this feature-generation idea apply to both word-level distractors and sentence-level distractors? (The paper claims it does.)

Apart from all of this, I also have simpler questions, such as: should I remove stopwords here?

I'm new to NLP, so any suggestions about which state-of-the-art (SOTA) implementation would work here would be very helpful. Thanks in advance.
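
For reference, here is the direction I am currently considering for the ranking step itself, reusing `features` and `triples` from the sketches above. The choice of logistic regression is my own assumption (the paper evaluates several rankers); any pointwise scorer that outputs a probability would fit the same pattern:

```python
from sklearn.linear_model import LogisticRegression

# X: feature vectors phi(q, a, d); y: 1 if d is a gold distractor for (q, a)
X = [list(features(q, a, d).values()) for q, a, d, _ in triples]
y = [label for *_, label in triples]

ranker = LogisticRegression(max_iter=1000).fit(X, y)

def top_k_distractors(q, a, pool, k=3):
    # score every candidate in the pool with the pointwise ranker, keep k best
    scores = ranker.predict_proba(
        [list(features(q, a, d).values()) for d in pool])[:, 1]
    ranked = sorted(zip(pool, scores), key=lambda t: t[1], reverse=True)
    return [d for d, _ in ranked[:k]]
```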

Tags: text-generation, deep-learning, nlp, python, machine-learning
