How does Stanford CRF encode NER string features?
Most features created by the `NERFeatureFactory` are strings, e.g. those produced by `usePrev`, `useNext`, `useNGrams`, etc. From my understanding, that is far too many distinct strings to fit in a dictionary or to map to embeddings, and I don't see how an `UNKNOWN` embedding would add any value, given that most of these features are not known words. I've been looking at the code on GitHub but haven't figured it out yet.
For example, in the sentence

> I love New York!

the token `love` yields string features such as `love-I-W-PW`, `love-New-W-NW`, `#lo#`, `#ov#`, `#ve#`, etc.
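To make the question concrete, here is a minimal sketch of the kind of feature strings I mean. This is my own illustration, not the actual `NERFeatureFactory` code, and the function name and feature formats are assumptions based on the example above:

```python
# Hypothetical sketch (NOT the actual NERFeatureFactory implementation):
# build string-valued features for one token, in the style of
# usePrev / useNext / useNGrams.
def token_features(tokens, i, n=2):
    w = tokens[i]
    feats = [w]                                  # the word itself
    if i > 0:
        feats.append(f"{w}-{tokens[i-1]}-W-PW")  # word + previous word
    if i + 1 < len(tokens):
        feats.append(f"{w}-{tokens[i+1]}-W-NW")  # word + next word
    # character n-grams, delimited with '#'
    feats += ["#" + w[j:j + n] + "#" for j in range(len(w) - n + 1)]
    return feats

tokens = ["I", "love", "New", "York", "!"]
print(token_features(tokens, 1))
# → ['love', 'love-I-W-PW', 'love-New-W-NW', '#lo#', '#ov#', '#ve#']
```

Each token produces many such strings, and the vocabulary of possible strings is effectively unbounded, which is why I don't see how a fixed dictionary or embedding table could cover them.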