Spacy custom POS tagging for medical concepts

We are a group of doctors trying to use spaCy's linguistic features, especially part-of-speech tagging, to show relationships between medical concepts, for example:

'Femoral artery pseudoaneurysm', as in:

femoral artery ['Anatomical Location'] and pseudoaneurysm ['Pathology']

We are new to NLP and spaCy. Can someone with experience in both explain whether this is a good approach for showing these relationships in medical documents? If not, what are the alternatives?

Many thanks!

Topic spacy nlp machine-learning

Category Data Science


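Based on the example, it looks like you need more than simple POS tagging. POS tags describe grammatical categories (noun, adjective, verb), whereas labels such as 'Anatomical Location' and 'Pathology' are semantic categories, which is closer to named entity recognition. Thankfully there is a whole subdomain of NLP devoted to biomedical data, and there are many tools available which can help with this kind of task: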

  • If the data consists of biomedical research papers, you will find a lot of resources related to the MEDLINE and PubMed Central databases.
  • cTAKES is another annotation system, more specialized for clinical text.
  • SciSpacy is a set of spaCy models specialized for biomedical text. It can also link medical terms to UMLS concepts.

The last one seems particularly appropriate in your case. Biomedical text presents a lot of specific difficulties that general-domain models do not handle well.
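To make the difference from POS tagging concrete, here is a minimal plain-Python sketch of the kind of output you are after: text spans paired with concept labels. It is only an illustration under stated assumptions. The `GAZETTEER` dictionary and the `label_concepts` function are hypothetical names invented for this example, and a simple dictionary lookup is not how SciSpacy or cTAKES work internally (they rely on trained models and large vocabularies such as UMLS).

```python
# Hypothetical mini-gazetteer (an assumption for illustration only):
# lowercase term -> concept label.
GAZETTEER = {
    "femoral artery": "Anatomical Location",
    "pseudoaneurysm": "Pathology",
}


def label_concepts(text, gazetteer=GAZETTEER):
    """Return (span_text, label) pairs using greedy longest-match lookup."""
    tokens = text.split()
    # Longest term length in tokens, so we know how far ahead to look.
    max_len = max(len(term.split()) for term in gazetteer)
    results = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so multi-word terms
        # are matched as a whole before any shorter span is tried.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).strip(".,;:")
            if candidate.lower() in gazetteer:
                match = (candidate, gazetteer[candidate.lower()], n)
                break
        if match:
            results.append((match[0], match[1]))
            i += match[2]  # skip past the matched span
        else:
            i += 1  # no concept starts here, move on
    return results


print(label_concepts("Femoral artery pseudoaneurysm"))
# [('Femoral artery', 'Anatomical Location'), ('pseudoaneurysm', 'Pathology')]
```

A lookup like this breaks down quickly on real clinical text (abbreviations, misspellings, negation, overlapping terms), which is the main reason to start from a dedicated biomedical tool rather than hand-built rules.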

Note that there are probably more tools and resources; this is a very active domain.

(disclaimer: I recycled a large part of an older answer)
