I have a JSON file (tweets.json) that contains tweets (sentences) along with the name of the author.

Objective 1: Get the most frequent entities from the tweets.
Objective 2: Find out the sentiment/polarity of each author towards each of the entities.

Sample Input: Assume we have only 3 tweets:
Tweet1 by Author1: Pink Pearl Apples are tasty but Empire Apples are not.
Tweet2 by Author2: Empire Apples are very tasty.
Tweet3 by Author3: Pink Pearl Apples are not tasty.
Sample …
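A minimal sketch of one possible approach, assuming spaCy for entity extraction and TextBlob for sentence-level polarity; the "author"/"text" field names in tweets.json are assumptions, and for product-like names such as "Pink Pearl Apples" you may need noun chunks instead of named entities:

    import json
    from collections import Counter, defaultdict

    import spacy
    from textblob import TextBlob

    nlp = spacy.load("en_core_web_sm")

    # assumed structure: [{"author": "Author1", "text": "Pink Pearl Apples are ..."}, ...]
    with open("tweets.json") as f:
        tweets = json.load(f)

    entity_counts = Counter()
    author_entity_polarity = defaultdict(list)

    for tweet in tweets:
        doc = nlp(tweet["text"])
        for ent in doc.ents:  # entities found by spaCy
            entity_counts[ent.text] += 1
            # crude per-entity sentiment: polarity of the sentence containing the entity
            polarity = TextBlob(ent.sent.text).sentiment.polarity
            author_entity_polarity[(tweet["author"], ent.text)].append(polarity)

    print(entity_counts.most_common(5))  # Objective 1
    for (author, entity), scores in author_entity_polarity.items():
        print(author, entity, sum(scores) / len(scores))  # Objective 2

Note that sentence-level polarity gives both apple varieties the same score for Tweet1, so a finer-grained (aspect-based) sentiment method would be needed to separate them.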
For a university project, I need to send text in Spanish via SMS. As these have a cost, I am trying to compress this text, albeit in an inefficient way. The idea is to first generate a permutation of two-character codes drawn from many alphabets (Finnish, Cyrillic, etc.), and to assign each code a word of more than two characters (so that the replacement actually compresses it). Then I take each word in a sentence and assign it its associated …
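A minimal sketch of the codebook idea described above; the codebook contents are made-up examples, and in practice it would be generated from the most frequent words of the corpus:

    # hypothetical codebook: frequent Spanish words -> two-character codes from other alphabets
    codebook = {
        "porque": "бш",
        "también": "дφ",
        "entonces": "юλ",
    }
    decodebook = {code: word for word, code in codebook.items()}

    def compress(sentence):
        # replace each word that has an associated code; leave the rest untouched
        return " ".join(codebook.get(w, w) for w in sentence.split())

    def decompress(sentence):
        # assumes the codes never collide with real words in the text
        return " ".join(decodebook.get(w, w) for w in sentence.split())

    msg = compress("no fui porque también llovió")
    print(msg, "->", decompress(msg))

One thing worth checking: any character outside the GSM-7 alphabet (Cyrillic, Greek, etc.) forces the SMS into UCS-2 encoding, which lowers the per-message limit from 160 to 70 characters, so the saving from shorter words may be partly offset.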
What is sentence embedding? How would you do sentence embedding for a sentence like: "How old are you?" How do you use word embedding to create a sentence embedding?
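A minimal sketch of the simplest word-embedding-based sentence embedding, which is just mean pooling of the word vectors; spaCy's en_core_web_md is an assumption here, any pretrained word vectors would do:

    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_md")  # a model that ships with word vectors

    doc = nlp("How old are you?")
    # sentence embedding = average of the word vectors of the tokens
    sentence_vector = np.mean([token.vector for token in doc], axis=0)
    print(sentence_vector.shape)  # e.g. (300,)

(doc.vector gives the same average directly; dedicated models such as sentence-transformers usually produce better sentence embeddings than plain averaging.)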
When learning Add-1 smoothing, I found that somehow we are adding 1 to each word in our vocabulary, but not considering start-of-sentence and end-of-sentence as two words in the vocabulary. Let me give an example to explain. Example: Assume we have a corpus of three sentences: "John read Moby Dick", "Mary read a different book", and "She read a book by Cher". After training our bi-gram model on this corpus of three sentences, we need to evaluate the probability of …
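For concreteness, a small sketch of add-1 (Laplace) smoothed bigram probabilities over the corpus from the question. Whether <s> and </s> are counted in V is exactly the choice being asked about; the code below shows one possible convention (counting every observed token type, markers included), not the only one:

    from collections import Counter

    sentences = [
        "John read Moby Dick",
        "Mary read a different book",
        "She read a book by Cher",
    ]

    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    # one convention: V includes every observed token type, here including <s> and </s>
    V = len(unigrams)

    def p_add1(w_prev, w):
        # add-1 smoothed bigram probability P(w | w_prev)
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

    print(p_add1("<s>", "John"))  # P(John | <s>)
    print(p_add1("read", "a"))    # P(a | read)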
I have been reading about both these techniques for finding the root of a word, but how do we decide which one to prefer? Is "Lemmatization" always better than "Stemming"?
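A quick sketch comparing the two with NLTK, just to make the difference concrete (the word list and POS tags are arbitrary examples):

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("wordnet")  # resource needed by the lemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "running", "meeting"]:
        print(word,
              "| stem:", stemmer.stem(word),
              "| lemma:", lemmatizer.lemmatize(word, pos="v"))

The stem is often not a real word (e.g. "studi"), while the lemma is a dictionary form, but the lemmatizer needs the right part of speech and is slower, so neither is always better.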
To my current knowledge, GloVe word vectors are optimized so that their dot product satisfies w_i · w_j = log P(i|j), with the probability computed from a co-occurrence matrix. However, the dot product is a commutative operation, whilst the log probability isn't. Is this issue being addressed in GloVe? Am I missing something?
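For reference, the objective actually minimized in the GloVe paper uses two sets of vectors (word vectors w and separate context vectors \tilde{w}) plus bias terms, rather than a single symmetric dot product (written from memory, so check the paper for the exact notation):

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

The bias b_i absorbs the log of the marginal count of word i, and \tilde{b}_j is added to restore symmetry, which is how the paper reconciles the symmetric dot product with the asymmetric conditional probability.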
I have a large pool of scanned county documents. I need to extract information like the document title, borrower name & address, lender name & address, etc. The text is like this, e.g.: the deed of trust, between abc llc, a limited company, whose address is XXXXXX, herein called "borrower", and xyz, whose address is XXXXX, herein called "lender". I used a named entity recognition method to extract the names, and it works well, but how would I know which name is the borrower and which one is the lender? Can …
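A minimal rule-based sketch for the role question: once NER has found the names, the surrounding boilerplate ('herein called "borrower"') can be matched with a regex to decide which name plays which role. The pattern below is tuned to the example wording only, so treat it as an assumption:

    import re

    text = ('the deed of trust, between abc llc, a limited company, whose address is XXXXXX, '
            'herein called "borrower", and xyz, whose address is XXXXX, herein called "lender".')

    # capture the party name that precedes each 'herein called "<role>"' clause
    pattern = re.compile(
        r'\b(?:between|and)\s+(?P<name>.+?),.*?herein called\s+"(?P<role>\w+)"',
        re.IGNORECASE,
    )

    for m in pattern.finditer(text):
        print(m.group("role"), "->", m.group("name"))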
I'm working on a project (court-related). At a certain point, I have to extract the reason for the legal compensation. For instance, take these sentences (from a court report): "Order mister X to pay EUR 5000 for compensation for unpaid wages" and "To cover damages, mister X must pay EUR 4000 to mister Y". I want to make an algorithm that can extract the motive of the legal compensation from such a sentence. For the first sentence Order mister …
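A rough sketch of one way to start: match the amount, then take the phrase introduced by "for compensation for" / "To cover" as the motive. The patterns are fitted to the two example sentences only and are an assumption, not a general solution:

    import re

    sentences = [
        "Order mister X to pay EUR 5000 for compensation for unpaid wages",
        "To cover damages, mister X must pay EUR 4000 to mister Y",
    ]

    # order matters: try the more specific pattern first
    patterns = [
        r"pay\s+EUR\s+\d+\s+for compensation for\s+(?P<motive>.+)$",
        r"^To cover\s+(?P<motive>[^,]+),",
    ]

    for s in sentences:
        for p in patterns:
            m = re.search(p, s)
            if m:
                print(s, "->", m.group("motive"))
                break

For anything less formulaic, a dependency parse (e.g. with spaCy) of the complement of "pay"/"cover" would be a more robust starting point than raw regexes.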
When I try to get word embeddings for a sentence using Bio_ClinicalBERT, for a sentence of 8 words I get 11 token IDs (+ start and end tokens), because "embeddings" is an out-of-vocabulary word/token that gets split into em, bed, ding, s. I would like to know if there are any aggregation strategies available that make sense, apart from taking the mean of these vectors.

    from transformers import AutoTokenizer, AutoModel

    # download and load model
    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") …
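One common aggregation besides a plain mean is to pool the subword vectors back to word level using the fast tokenizer's word alignment (assuming a fast tokenizer is available for this checkpoint); mean pooling per word is sketched below, but max pooling or taking only the first subtoken are also used:

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

    sentence = "patient embeddings are useful"
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_subtokens, hidden_size)

    word_ids = enc.word_ids()  # maps each subtoken to a word index (None for [CLS]/[SEP])
    word_vectors = {}
    for idx, wid in enumerate(word_ids):
        if wid is not None:
            word_vectors.setdefault(wid, []).append(hidden[idx])

    # mean-pool the subtoken vectors of each word
    pooled = [torch.stack(vecs).mean(dim=0) for _, vecs in sorted(word_vectors.items())]
    print(len(pooled), pooled[0].shape)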
This is my CNN model. I am doing text classification on mental health social media data, and the model is overfitting, as the validation loss is much greater than the training loss. There are three columns (Text, Title, label) and 7 classes in the dataset:

depression: 256140
Anxiety: 85916
bipolar: 41262
mentalhealth: 39161
BPD: 37996
schizophrenia: 17388
autism: 7110

I am providing my model and its history. For this particular issue I need an interpretation of the model's behaviour and a solution. Here is my …
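Given the heavy class imbalance (depression alone has roughly 36x the samples of autism), one standard thing to try alongside stronger regularization is class weighting. A sketch with scikit-learn, using the counts from the question; passing the resulting dictionary to Keras via model.fit(..., class_weight=...) is the assumed usage:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # class counts from the question
    counts = {
        "depression": 256140, "Anxiety": 85916, "bipolar": 41262,
        "mentalhealth": 39161, "BPD": 37996, "schizophrenia": 17388, "autism": 7110,
    }

    # expand the counts into one integer label per example
    labels = np.concatenate([np.full(n, i) for i, n in enumerate(counts.values())])
    weights = compute_class_weight(class_weight="balanced",
                                   classes=np.unique(labels), y=labels)
    class_weight = dict(zip(range(len(counts)), weights))
    print(class_weight)

    # then: model.fit(X_train, y_train, validation_data=..., class_weight=class_weight)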
I'm wondering if it is possible to convert a parse tree such as (ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) back into "My dog also likes eating sausage." with Stanford CoreNLP or otherwise.
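If the bracketed parse is available as a string, NLTK can read it back in, and the sentence is just the tree's leaves; a small sketch (the closing brackets missing from the snippet above are restored here):

    from nltk import Tree

    parse = ("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
             "(VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .)))")
    tree = Tree.fromstring(parse)
    print(" ".join(tree.leaves()))  # My dog also likes eating sausage .

Detokenizing (re-attaching the final period, handling quotes, etc.) needs an extra step, e.g. NLTK's TreebankWordDetokenizer.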
In my current NLP work, I am extracting triples using the triple extraction functions in the Stanford NLP and spaCy libraries. I am looking for a good method to evaluate how good the extraction has been. Any suggestions?
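Without a gold standard there is no automatic score, so a common route is to hand-annotate a sample of sentences with the expected triples and compute precision/recall/F1 of the extractor against them. A minimal sketch (the gold and predicted triples below are placeholders):

    def triple_scores(gold, predicted):
        """Exact-match precision/recall/F1 over sets of (subject, relation, object) triples."""
        gold, predicted = set(gold), set(predicted)
        tp = len(gold & predicted)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    gold = [("dog", "likes", "sausage"), ("Mary", "read", "book")]
    predicted = [("dog", "likes", "sausage"), ("Mary", "read", "a book")]
    print(triple_scores(gold, predicted))  # exact match is strict; partial-credit variants are common too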
I've been exploring closed-domain question answering implementations trained on the SQuAD 2.0 dataset. Ideally, such a model should not answer questions whose answers are not contained in the context text corpus. But while implementing such models using the Haystack repo or the FARM repo, I'm finding that they always answer these questions even when they shouldn't. Is there any implementation available that takes into account the fact that it shouldn't answer a question when it doesn't find a suitable answer? References: …
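For what it's worth, the plain transformers question-answering pipeline exposes a flag for SQuAD 2.0-style "no answer" handling; a sketch (deepset/roberta-base-squad2 is just one SQuAD 2.0 checkpoint, and how reliably the null answer fires still depends on the model):

    from transformers import pipeline

    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    context = "The Eiffel Tower is located in Paris and was completed in 1889."
    result = qa(question="Who painted the Mona Lisa?",
                context=context,
                handle_impossible_answer=True)  # allow an empty answer when nothing fits
    print(result)  # an empty answer string signals that no answer was found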
I have an attribute that is the description of an operation (i.e., the description of a building consent), and I need to translate this into a mathematical operation. I need to find out the number of new dwellings that are going to be built, and I have to ignore any other operation. I am not sure how to tackle this problem. I can do regex and do lots of searches, but there should be a smarter way (is there?) using machine learning/text …
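As a baseline before reaching for machine learning, a regex that pulls the dwelling count out of the consent description can go a long way; a sketch (the descriptions below are invented examples, not real consent texts):

    import re

    descriptions = [
        "Construct 4 new dwellings and demolish existing garage",
        "Erect three dwellings with associated carparking",
        "Alterations to existing dwelling",  # no new dwellings -> should be ignored
    ]

    WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

    def new_dwellings(text):
        m = re.search(r"\b(\d+|one|two|three|four|five)\s+(?:new\s+)?dwellings\b",
                      text, re.IGNORECASE)
        if not m:
            return 0
        value = m.group(1).lower()
        return int(value) if value.isdigit() else WORD_NUMBERS[value]

    for d in descriptions:
        print(d, "->", new_dwellings(d))

If the phrasing is too varied for patterns like this, the same idea (extracted counts, keywords) can instead provide labels or features for a supervised text classifier.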
Most features created by the NERFeatureFactory are strings, e.g. from usePrev, useNext, useNGrams, etc. From my understanding, that's too many tokens to fit in a dictionary or to use embeddings. I don't see how the UNKNOWN embedding would bring any value, given that most features are not known words. I've been looking at the code on GitHub but haven't figured it out yet. Example: for "I love New York!", the token "love" produces features such as love-I-W-PW, love-New-W-NW, #lo#, #ov#, #ve#, etc.
I have a list of words in this format:

chem, chemistry
chemi, chemistry
chm, chemistry
chmstry, chemistry

Here, the first column is an abbreviated form of the actual word in the second column. I need to apply NLP (in Python 3) so that when a model is trained on this dataset and I give 'chmty' as input, it will give 'chemistry' as output. I don't want string similarity techniques; I want to build an NLP model.
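One model-based option (rather than a pure string-similarity lookup) is to treat it as classification over character n-grams: vectorize the abbreviation with a character-level TF-IDF and train a classifier whose classes are the full words. A sketch with scikit-learn; the "biology" rows are invented just so there is more than one class, and a real dataset would be needed for this to generalize:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    training_pairs = [
        ("chem", "chemistry"), ("chemi", "chemistry"),
        ("chm", "chemistry"), ("chmstry", "chemistry"),
        ("bio", "biology"), ("biol", "biology"), ("blgy", "biology"),
    ]
    abbreviations, full_words = zip(*training_pairs)

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-gram features
        LogisticRegression(max_iter=1000),
    )
    model.fit(abbreviations, full_words)

    print(model.predict(["chmty"]))  # should map to 'chemistry'

A character-level sequence-to-sequence model is the heavier-weight alternative if abbreviations of unseen words also need to be expanded.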
There are many fast advancements in the NLP field (BERT, RoBERTa, ALBERT, XLNet, ...), and no one can check the news or papers daily. Is there any way or site that keeps track of all these new developments and possibly provides a link to the code? For example, if someone needs to use text summarization, the suggested approach would be X, and so on.
For the Stanford NER 3-class model, Location, Person, and Organization recognizers are available. Is it possible to add additional classes to this model? For example, Sports as one class to tag sport names. Or, if not, is there any model to which I can add additional classes? Note: I didn't exactly mean adding "sports" as a class; I was wondering whether there is a possibility of adding a custom class to that model. If it's not possible in Stanford, is it possible …
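As far as I know, you can't simply append a class to the pretrained 3-class model; a new class generally means training a model on data annotated with that class, either with Stanford NER's CRFClassifier training or with another toolkit. A rough sketch of the latter with spaCy 3, where the SPORT label and the two training sentences are made up (a real model would need far more annotated data):

    import spacy
    from spacy.training import Example

    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    ner.add_label("SPORT")

    TRAIN_DATA = [
        ("I played cricket yesterday", {"entities": [(9, 16, "SPORT")]}),
        ("She loves watching football", {"entities": [(19, 27, "SPORT")]}),
    ]

    optimizer = nlp.initialize()
    for epoch in range(20):
        losses = {}
        for text, annotations in TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer, losses=losses)

    doc = nlp("We watched cricket all afternoon")
    print([(ent.text, ent.label_) for ent in doc.ents])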
Sorry if the title isn't self-explanatory; here is a detailed version. I created a date parser to extract dates from resumes. The ultimate goal is to find how many years of work experience a candidate has, based on the resume. The parser can catch dates in all formats, like:

MM/DD/YY - MM/DD/YY
MM/DD/YYYY - MM/DD/YYYY
Apr 09 - Jul 11
03/09 - 07/11
2007 - 2010
etc.

The way the parser works is that it first extracts all the …
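Once the ranges are extracted, the "years of experience" part is mostly interval arithmetic: normalize each range to start/end dates, merge overlaps so parallel or duplicated jobs are not double-counted, and sum the durations. A sketch assuming the parser has already normalized the extracted ranges into date pairs (the example values mirror the formats listed above):

    from datetime import date

    # (start, end) pairs as already normalized by the date parser (assumption)
    ranges = [
        (date(2009, 4, 1), date(2011, 7, 1)),  # Apr 09 - Jul 11
        (date(2009, 3, 1), date(2011, 7, 1)),  # 03/09 - 07/11 (overlaps the range above)
        (date(2007, 1, 1), date(2010, 1, 1)),  # 2007 - 2010   (also overlaps)
    ]

    # merge overlapping employment periods
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    total_days = sum((end - start).days for start, end in merged)
    print(round(total_days / 365.25, 1), "years of experience")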