Merging models: using a named entity recognition model to annotate data on a different dataset

Question

Ed.

September 28, 2021, 12:12

Let's say we have two trained models, Ma and Mb, which were trained on different datasets for a Named Entity Recognition task. The datasets, A and B, contain different documents and are annotated with different variables (entity types) to recognize. For example:

Model A has been trained on dataset A with variables A_NAME, A_SURNAME, A_TITLE
Model B has been trained on dataset B with variables B_ORG, B_COUNTRY, B_ADDRESS

We now want a model Mc that detects all six variables together. However, the documents in Dataset A and Dataset B are different, and even though both contain instances of all the variables, each dataset is annotated only for its own three. We therefore cannot reuse the manually annotated data as it stands, and we don't have the resources to annotate a Dataset C.

My questions about how to solve this:

Is this a known problem that already has a name in the literature? I have found that 'partial labeling' often refers to a different kind of problem.
What would be a good solution for this problem?
Would it be OK to just use Ma and Mb to annotate a common set of documents with all six variables, and then train a new model Mc on this automatically annotated data?
What would be the problems with doing so?
Are there any relevant papers?
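
To make the idea in the third question concrete, here is a minimal sketch of what I have in mind. It is not a tested pipeline: `Span`, `keyword_tagger`, `merge_annotations` and `silver_annotate` are hypothetical names, the toy keyword taggers merely stand in for Ma and Mb, and resolving overlaps by keeping the higher-scoring span is an assumption (scores from two separately trained models are not necessarily comparable).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    label: str    # e.g. "A_NAME" or "B_ORG"
    score: float  # model confidence (assumed to be available)


# Hypothetical interface: a tagger maps raw text to predicted spans.
Tagger = Callable[[str], List[Span]]


def keyword_tagger(lexicon: Dict[str, Tuple[str, float]]) -> Tagger:
    """Toy stand-in for a trained model: tags the first match of each keyword."""
    def tag(text: str) -> List[Span]:
        spans = []
        for word, (label, score) in lexicon.items():
            i = text.find(word)
            if i >= 0:
                spans.append(Span(i, i + len(word), label, score))
        return spans
    return tag


def overlaps(a: Span, b: Span) -> bool:
    return a.start < b.end and b.start < a.end


def merge_annotations(spans_a: List[Span], spans_b: List[Span]) -> List[Span]:
    """Union of both predictions; on overlap keep the higher-scoring span.

    Assumption: scores of Ma and Mb are comparable, which may not hold.
    """
    kept: List[Span] = []
    for span in sorted(spans_a + spans_b, key=lambda s: s.score, reverse=True):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


def silver_annotate(docs: List[str], model_a: Tagger, model_b: Tagger):
    """Automatically annotated ("silver") training data for Mc."""
    return [(doc, merge_annotations(model_a(doc), model_b(doc))) for doc in docs]


if __name__ == "__main__":
    ma = keyword_tagger({"Ada": ("A_NAME", 0.95), "Lovelace": ("A_SURNAME", 0.9)})
    mb = keyword_tagger({"Acme": ("B_ORG", 0.8), "France": ("B_COUNTRY", 0.9)})
    for doc, spans in silver_annotate(["Ada Lovelace works at Acme in France"], ma, mb):
        print(doc, [(doc[s.start:s.end], s.label) for s in spans])
```

Mc would then be trained on the output of `silver_annotate`, so any error Ma or Mb makes on the new documents ends up in Mc's training labels.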

Topic annotation text-classification machine-learning

Category Data Science
