Merging models: using a named entity recognition model to annotate data on a different dataset

Let's say we have two models, Ma and Mb, trained on different datasets for a Named Entity Recognition task. The datasets, A and B, contain different documents and were annotated with different variables (entity types) to recognize. For example:

  • Model A has been trained on dataset A with variables A_NAME, A_SURNAME, A_TITLE
  • Model B has been trained on dataset B with variables B_ORG, B_COUNTRY, B_ADDRESS

We now want a model Mc that detects all six variables together. The documents in Dataset A and Dataset B are different, and although both datasets contain instances of all six variables, each one was manually annotated only for its own three. So we cannot simply reuse the manually annotated data, and we don't have the resources to annotate a Dataset C.

My questions about how to solve this:

  • Is this a known problem that already has a name in the literature? I have found that 'partial labeling' often refers to a different kind of problem.
  • What would be a good solution for this problem?
  • Would it be OK to just use Ma and Mb to annotate a common set of documents with all six variables, and then train a new model Mc on this automatically annotated data?
  • What would be the problems with doing so?
  • Any relevant papers?
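
To make the third question concrete, here is a minimal sketch of the merging step I have in mind. It assumes each model returns character-offset spans with a confidence score; the `Span` type, the `merge_predictions` helper, the example sentence, and all the scores are hypothetical, invented only for illustration. When predictions from Ma and Mb overlap, the sketch keeps the higher-scoring span, which is just one possible conflict rule and not necessarily a good one.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """One predicted entity (hypothetical output format for Ma and Mb)."""
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    label: str
    score: float  # model confidence, assumed comparable across models


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_predictions(spans_a: List[Span], spans_b: List[Span]) -> List[Span]:
    """Greedy merge: visit spans from highest to lowest score and keep
    a span only if it does not overlap one that was already kept."""
    kept: List[Span] = []
    for span in sorted(spans_a + spans_b, key=lambda s: s.score, reverse=True):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


text = "Dr. Jane Doe works at Acme Corp in France."

# Made-up predictions standing in for the output of Ma and Mb.
pred_a = [
    Span(0, 3, "A_TITLE", 0.97),
    Span(4, 8, "A_NAME", 0.95),
    Span(9, 12, "A_SURNAME", 0.93),
    Span(22, 26, "A_SURNAME", 0.55),  # Ma wrongly tags "Acme"
]
pred_b = [
    Span(22, 31, "B_ORG", 0.90),
    Span(35, 41, "B_COUNTRY", 0.96),
]

merged = merge_predictions(pred_a, pred_b)
for span in merged:
    print(span.label, repr(text[span.start:span.end]))
```

The merged spans would then serve as automatically generated training data for Mc. Comparing raw scores across two separately trained models assumes they are calibrated in the same way, which is probably not true in general, and any entity that both models miss silently becomes a negative example.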

Topic annotation text-classification machine-learning

Category Data Science
