Entity Recognition in Stanford NLP using Python
I am using Stanford Core NLP using Python. I have taken the code from here.
This is the code:
from stanfordcorenlp import StanfordCoreNLP
import logging
import json
class StanfordNLP:
def __init__(self, host='http://localhost', port=9000):
self.nlp = StanfordCoreNLP(host, port=port,
timeout=30000 , quiet=True, logging_level=logging.DEBUG)
self.props = {
'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,relation,sentiment',
'pipelineLanguage': 'en',
'outputFormat': 'json'
}
def word_tokenize(self, sentence):
return self.nlp.word_tokenize(sentence)
def pos(self, sentence):
return self.nlp.pos_tag(sentence)
def ner(self, sentence):
return self.nlp.ner(sentence)
def parse(self, sentence):
return self.nlp.parse(sentence)
def dependency_parse(self, sentence):
return self.nlp.dependency_parse(sentence)
def annotate(self, sentence):
return json.loads(self.nlp.annotate(sentence, properties=self.props))
@staticmethod
def tokens_to_dict(_tokens):
tokens = defaultdict(dict)
for token in _tokens:
tokens[int(token['index'])] = {
'word': token['word'],
'lemma': token['lemma'],
'pos': token['pos'],
'ner': token['ner']
}
return tokens
if __name__ == '__main__':
sNLP = StanfordNLP()
text = r'China on Wednesday issued a $50-billion list of U.S. goods including soybeans and small aircraft for possible tariff hikes in an escalating technology dispute with Washington that companies worry could set back the global economic recovery.The country\'s tax agency gave no date for the 25 percent increase...'
ANNOTATE = sNLP.annotate(text)
POS = sNLP.pos(text)
TOKENS = sNLP.word_tokenize(text)
NER = sNLP.ner(text)
PARSE = sNLP.parse(text)
DEP_PARSE = sNLP.dependency_parse(text)
I am only interested in Entity Recognition which is being saved in the variable NER. The command NER is giving the following result:
The same thing if I run on Stanford Website, the output for NER is:
There are 2 problems with my Python Code:
1. '$' and '50-billion' should be combined and named a single entity.
Similarly, I want '25' and 'percent' as a single entity as it is showing in the online stanford output.
2. In my output, 'Washington' is shown as State and 'China' is shown as Country. I want both of them to be shown as 'Loc' as in the stanford website output. The possible solution to this problem lies in the documentation .
But I don't know which model am I using and how to change the model.
Topic stanford-nlp nlp python
Category Data Science