Would there be any reason to pretrain BERT on specific texts?
So the official BERT English model is trained on Wikipedia and BookCorpus (source).
Now, for example, let's say I want to use BERT for movie tag recommendation. Is there any reason for me to pretrain a new BERT model from scratch on a movie-related dataset?
Could my model become more accurate because it was trained on movie-related texts rather than general texts? Is there an example of such usage?
To be clear, the question is about the importance of the context (not the size) of the dataset.
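For concreteness, this is roughly what I have in mind: pretraining a randomly initialized BERT with the masked-language-modeling objective on movie-related text. The sketch below uses the Hugging Face transformers and datasets libraries; movie_corpus.txt, the hyperparameters, and the reuse of the existing bert-base-uncased tokenizer are placeholders/simplifications, not a definitive setup.

```python
from datasets import load_dataset
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Reuse the existing WordPiece tokenizer for simplicity; a true from-scratch
# setup would also train a new vocabulary on the movie corpus.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Randomly initialized BERT, i.e. pretraining "from scratch" rather than
# continuing from the Wikipedia/BookCorpus checkpoint.
config = BertConfig()
model = BertForMaskedLM(config)

# Placeholder corpus: one movie-related document per line.
dataset = load_dataset("text", data_files={"train": "movie_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Mask 15% of tokens, as in the original BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-movies-mlm",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized,
)
trainer.train()
```

The resulting checkpoint would then be fine-tuned on the actual tag-recommendation task, the same way one would fine-tune the official bert-base-uncased model.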
Topic pretraining bert transfer-learning language-model
Category Data Science