Merging models: using a named entity recognition model to annotate data on a different dataset
Let's say we have two models, Ma and Mb, each trained on a different dataset for a Named Entity Recognition task. The two datasets, A and B, contain different documents and are annotated with different variables (entity types) to recognize. For example:
- Model A has been trained on dataset A with variables A_NAME, A_SURNAME, A_TITLE
- Model B has been trained on dataset B with variables B_ORG, B_COUNTRY, B_ADDRESS
We now want a single model Mc that detects all six variables. The documents in Dataset A and Dataset B are different, and although both datasets contain instances of all the variables, each one is manually annotated only for its own three. So we cannot simply reuse the existing manual annotations to train Mc, and we don't have the resources to annotate a new Dataset C.
My questions regarding how to solve this would include:
- Is this a known problem that already has a name in the literature? I have found 'partial labeling', but it often seems to refer to a different kind of problem.
- What would be a good solution to this problem?
- Would it be OK to just use Ma and Mb to annotate a common set of documents with all six variables and then train a new model Mc on this automatically annotated data?
- What would be the problems with doing so?
- Any relevant papers?
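To make the third question concrete, this is roughly the pipeline I have in mind. It is only a sketch under assumptions of mine: each model is wrapped as a hypothetical function that takes a text and returns character-level spans with a confidence score, the scores of Ma and Mb are assumed to be comparable, and the overlap rule (keep the higher-scoring span) and the `min_score` threshold are arbitrary choices, not something taken from a paper.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Span(NamedTuple):
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    label: str    # e.g. "A_NAME" or "B_ORG"
    score: float  # model confidence (assumed comparable across Ma and Mb)


# Hypothetical interface: a trained model wrapped as text -> predicted spans.
Predictor = Callable[[str], List[Span]]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_spans(spans_a: Sequence[Span], spans_b: Sequence[Span],
                min_score: float = 0.5) -> List[Span]:
    """Combine predictions from both models into one non-overlapping set."""
    # Drop low-confidence predictions so they do not become training labels.
    candidates = [s for s in list(spans_a) + list(spans_b)
                  if s.score >= min_score]
    kept: List[Span] = []
    # Greedy resolution: visit spans from most to least confident and keep
    # a span only if it does not collide with one that was already kept.
    for span in sorted(candidates, key=lambda s: s.score, reverse=True):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


def silver_annotate(texts: Sequence[str], model_a: Predictor,
                    model_b: Predictor,
                    min_score: float = 0.5) -> List[Tuple[str, List[Span]]]:
    """Label every text with both models; the result is training data for Mc."""
    return [(t, merge_spans(model_a(t), model_b(t), min_score)) for t in texts]
```

The output of `silver_annotate` would then be converted to whatever format the trainer for Mc expects. My worry with this design is that any entity both models miss or score below `min_score` ends up labeled as "not an entity", so Mc would learn the errors of Ma and Mb.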
Topic annotation text-classification machine-learning
Category Data Science