Generation of medical institution names: training corpora?
My question is closely related to this one: Generation of institution names. I need to produce 'fake' names of medical institutions, specifically as data for unit tests. Simple tools like Faker do not work well for this task, so I am interested in a more sophisticated solution, possibly involving one or more NER models. Where can I get text corpora for training such a model? The texts must contain (human-)recognizable names of medical institutions, preferably in several languages. I have seen suggestions that this could be done by scraping PubMed or other web sources. Are there any concrete examples or how-tos?
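As a rough illustration of the scraping idea, here is a minimal sketch of how candidate institution names might be filtered out of raw text once it has been downloaded. It rests on several assumptions: that the input is a list of affiliation-like strings (for example, author affiliations), that institution names appear as comma- or semicolon-separated segments, and that a small keyword list is enough for a first pass. The function name `extract_institution_candidates`, the keyword pattern, and the sample strings are all hypothetical and only there for illustration.

```python
import re

# Hypothetical keyword pattern (an assumption, not an exhaustive list);
# extend it for each language that should be covered.
KEYWORD_PATTERN = re.compile(
    r"\b(?:hospital|clinic|medical cent(?:er|re)|klinikum|krankenhaus|hôpital)\b",
    re.IGNORECASE,
)


def extract_institution_candidates(affiliations):
    """Return unique segments that contain a medical-institution keyword.

    Each input string is split on commas and semicolons; a segment is kept
    if it matches KEYWORD_PATTERN. First-seen order is preserved.
    """
    seen = set()
    candidates = []
    for line in affiliations:
        for segment in re.split(r"[,;]", line):
            name = segment.strip().rstrip(".")
            if not name or name in seen:
                continue
            if KEYWORD_PATTERN.search(name):
                seen.add(name)
                candidates.append(name)
    return candidates


if __name__ == "__main__":
    # Made-up sample strings standing in for scraped affiliation text.
    sample = [
        "Department of Cardiology, Example University Hospital, Springfield, USA.",
        "Clinical Research Unit; Klinikum Musterstadt, Germany",
    ]
    for candidate in extract_institution_candidates(sample):
        print(candidate)
```

A keyword filter like this would only produce a noisy seed list; the resulting names could then serve as training examples or as a gazetteer for an NER model, which is the part the question is really about.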
Topic text-generation named-entity-recognition python
Category Data Science