How to improve my imbalanced-data NLP model?

I want to predict a probability for each patient's health and get the top 10 most ill patients in a hospital. For each day I have each patient's condition notes, medical notes, diagnosis notes, and lab notes. Current approach: vectorize all the notes using spaCy's scispacy model and sum the vectors grouped by patient ID and day (200 columns); take the unit vectors of those sums (200 columns); apply a moving-average function to the vectors grouped …
Category: Data Science
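
A minimal sketch of the pipeline described in this question, assuming a pandas DataFrame of notes and one of the scispacy models that ships 200-dimensional vectors (e.g. en_core_sci_md); the column names, window size, and example data are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import spacy

# scispacy model with 200-d word vectors (assumed installed); doc.vector is the
# mean of the token vectors, so multiplying by len(doc) recovers the sum.
nlp = spacy.load("en_core_sci_md")

# Illustrative schema: one row per note.
notes = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "day": ["2021-01-01", "2021-01-02", "2021-01-01"],
    "text": ["chest pain and dyspnea", "pain improving", "elevated troponin"],
})

def note_vector(text: str) -> np.ndarray:
    """Summed token vector for one note (200 columns)."""
    doc = nlp(text)
    return doc.vector * len(doc)

notes["vec"] = notes["text"].apply(note_vector)

# Sum note vectors per patient and day, then L2-normalise to unit vectors.
daily = (notes.groupby(["patient_id", "day"])["vec"]
              .apply(lambda vs: np.sum(np.stack(list(vs)), axis=0))
              .reset_index())
daily["unit_vec"] = daily["vec"].apply(lambda v: v / (np.linalg.norm(v) + 1e-12))

# Moving average over each patient's daily vectors (the 3-day window is a guess).
def smooth(group: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    group = group.sort_values("day").copy()
    mat = np.stack(list(group["unit_vec"]))
    group["smoothed_vec"] = list(pd.DataFrame(mat).rolling(window, min_periods=1).mean().values)
    return group

daily = daily.groupby("patient_id", group_keys=False).apply(smooth)
```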

How to build vocabulary file for NLP embeddings efficiently?

I am currently building various word embeddings for my NLP project, ranging from Word2Vec and ELMo to LINE. I am looking to train ELMo using AllenNLP, a Python package for NLP, following the tutorial here. To improve efficiency during training, the tutorial recommends supplying a vocab file for the entire corpus, similar to this; a snippet is shown below. (FYI: the first 3 tokens represent the end-of-sentence, start-of-sentence and unknown tokens; tokens that appear more frequently will be at the start of …
Category: Data Science
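
A rough sketch of building such a vocabulary file with plain Python, assuming a whitespace-tokenised corpus with one sentence per line; the special-token strings follow AllenNLP's ELMo convention but should be checked against the tutorial:

```python
from collections import Counter

# Count token frequencies over the whole corpus (swap in your own tokenizer if needed).
counts = Counter()
with open("corpus.txt", encoding="utf-8") as f:
    for line in f:
        counts.update(line.split())

# First three lines: end-of-sentence, start-of-sentence, unknown (exact strings assumed).
special_tokens = ["</S>", "<S>", "@@UNKNOWN@@"]

with open("vocab.txt", "w", encoding="utf-8") as out:
    for tok in special_tokens:
        out.write(tok + "\n")
    # Remaining tokens in descending order of frequency, as the tutorial suggests.
    for tok, _ in counts.most_common():
        out.write(tok + "\n")
```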

AllenNLP installation issue - No matching distribution found for torchvision<0.9.0,>=0.8.1

As per the demos, we are expected to install AllenNLP using the following command: pip install allennlp==2.1.0 allennlp-models==2.1.0 But it always throws an error: ERROR: Could not find a version that satisfies the requirement torchvision<0.9.0,>=0.8.1 (from allennlp==2.1.0) (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.3.0, 0.4.1, 0.5.0, 0.9.0, 0.9.1, 0.10.0) ERROR: No matching distribution found for torchvision<0.9.0,>=0.8.1 (from allennlp==2.1.0) I have tried installing torchvision separately, but the version demanded by AllenNLP is not available at all. As can be seen …
Topic: allennlp
Category: Data Science

Getting Word Embeddings for Sentences using the Longformer model?

I am new to Huggingface and have a few basic questions. This post might be helpful to others who are starting to use the Longformer model from Huggingface. Objective: create sentence/document embeddings using the Longformer model. We don't have labels in our dataset, so we want to do clustering on the output of the generated embeddings. Please let me know if the code is correct. Environment info: transformers version: 3.0.2; Platform: ; Python version: Python 3.6.12 :: Anaconda, Inc.; PyTorch version (GPU?): 1.7.1; Tensorflow version (GPU?): …
Category: Data Science
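
A hedged sketch of one common way to get document embeddings from Longformer with the Hugging Face transformers API: mean-pool the last hidden states, giving global attention to the first token. The checkpoint name and pooling strategy are assumptions, and the exact keyword arguments (e.g. global_attention_mask) may differ on older releases such as 3.0.2:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

# Checkpoint name is an assumption; any Longformer checkpoint works the same way.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")
model.eval()

def document_embedding(text: str) -> torch.Tensor:
    """Mean-pooled last hidden state as a single document vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
    # Give the first (<s>) token global attention, as the Longformer docs suggest.
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1
    with torch.no_grad():
        outputs = model(**inputs, global_attention_mask=global_attention_mask)
    hidden = outputs.last_hidden_state              # (1, seq_len, hidden_size)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

emb = document_embedding("Patient presented with chest pain ...")
# Stack the per-document vectors and feed them to k-means or any other clusterer.
```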

How does the character convolution work in ELMo?

When I read the original ELMo paper (https://arxiv.org/pdf/1802.05365.pdf), I'm stumped by the following line: The context insensitive type representation uses 2048 character n-gram convolutional filters followed by two highway layers (Srivastava et al., 2015) and a linear projection down to a 512 representation. The Srivastava citation only seems to relate to the highway layer concept. So, what happens prior to the biLSTM layer(s) in ELMo? As I understand it, one-hot encoded vectors (so, 'raw text') are passed to a convolutional …
Category: Data Science
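
As a reading of that passage, the following toy PyTorch module mirrors the described pipeline: character embeddings, a bank of character n-gram convolutions (the widths and filter counts below sum to 2048, in the spirit of the paper but not guaranteed to match its exact configuration), max-pooling over the character axis, two highway layers, and a linear projection to 512 dimensions. This context-insensitive token representation is what then feeds the biLSTM layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Highway(nn.Module):
    """y = g * relu(W1 x) + (1 - g) * x, with gate g = sigmoid(W2 x)."""
    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        g = torch.sigmoid(self.gate(x))
        return g * F.relu(self.transform(x)) + (1 - g) * x

class CharCNNTokenEncoder(nn.Module):
    def __init__(self, n_chars=262, char_dim=16, output_dim=512,
                 filters=((1, 32), (2, 32), (3, 64), (4, 128),
                          (5, 256), (6, 512), (7, 1024))):  # 2048 filters in total
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, kernel_size=width)
            for width, n_filters in filters)
        total = sum(n for _, n in filters)
        self.highways = nn.Sequential(Highway(total), Highway(total))
        self.proj = nn.Linear(total, output_dim)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) character ids, padded to >= 7 chars
        x = self.char_emb(char_ids).transpose(1, 2)           # (n_words, char_dim, len)
        pooled = [torch.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        x = torch.cat(pooled, dim=-1)                         # (n_words, 2048)
        return self.proj(self.highways(x))                    # (n_words, 512)

encoder = CharCNNTokenEncoder()
print(encoder(torch.randint(0, 262, (4, 10))).shape)  # torch.Size([4, 512])
```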

SpaCy vs AllenNLP?

I have used a little of both spaCy and AllenNLP in my NLP projects. I like them both, as they work very well with PyTorch (my DL framework of choice!). But I still cannot decide which one to master in the long term so that I can increase the pace of my NLP projects in the future. Can someone please share their experience, or suggest the differences between these two libraries or their pros and cons?
Category: Data Science

NLP approaches to infer Processes from Text

I would like to use NLP techniques to infer a process out of raw text. For example, given a sentence like: Recruitment is about attracting and selecting the right person for the job. I want to get the following process: Attracting the right person. I noticed that a strong first step is to use spaCy, tokenizing the texts and filtering them for NOUNS. But from this point on, I'm completely blank. Someone suggested something named "Semantic …
Category: Data Science
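
Beyond filtering for nouns, one possible next step is to use spaCy's dependency parse to pair verbs (including gerunds like "attracting") with their object phrases; the model name and the dobj heuristic below are illustrative, and the output depends entirely on how the sentence is parsed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any spaCy pipeline with a parser works

def extract_steps(text: str):
    """Heuristic: pair each verb with the subtree of its direct object."""
    doc = nlp(text)
    steps = []
    for token in doc:
        if token.pos_ == "VERB":
            for child in token.children:
                if child.dep_ == "dobj":
                    phrase = " ".join(t.text for t in child.subtree)
                    steps.append(f"{token.lemma_.capitalize()} {phrase}")
    return steps

print(extract_steps(
    "Recruitment is about attracting and selecting the right person for the job."))
# Prints verb + object phrases such as "Select the right person ..." (parse-dependent).
```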

How to use regularizer in AllenNLP?

Apologies if this sounds a bit lame. I am trying to use AllenNLP for my NLP tasks and would like to use regularization to reduce overfitting. However, in all the online tutorials the regularizers are set to None, and after many attempts I still couldn't figure out how to use the regularizer. If I use the example in the official tutorial (https://github.com/titipata/allennlp-tutorial), what if I want to add a regularizer for the LSTM and feedforward layers? class AcademicPaperClassifier(Model): …
Category: Data Science
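
In AllenNLP 1.x/2.x the usual route is a RegularizerApplicator passed to the Model constructor, which matches parameter names by regex. A minimal sketch follows; the regexes are guesses for typical encoder/feedforward parameter names:

```python
from allennlp.nn.regularizers import L2Regularizer, RegularizerApplicator

# Each entry pairs a regex over parameter names with a regularizer; inspect
# model.named_parameters() to find the real names in your model.
regularizer = RegularizerApplicator([
    ("encoder.*weight", L2Regularizer(alpha=1e-3)),                # LSTM / encoder weights
    ("classifier_feedforward.*weight", L2Regularizer(alpha=1e-3)), # feedforward weights
])

# Pass it through to the Model base class, e.g. (constructor signature assumed):
#   model = AcademicPaperClassifier(vocab, ..., regularizer=regularizer)
# During training the trainer adds model.get_regularization_penalty() to the loss.
```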
