How does Stanford CRF encode NER string features?
Most features created by the `NERFeatureFactory` are strings, e.g. those produced by `usePrev`, `useNext`, `useNGrams`, etc. From my understanding, that is far too many distinct strings to fit in a dictionary or to map to embeddings, and I don't see how an `UNKNOWN` embedding would add any value, given that most of these features are not known words. I've been looking at the code on GitHub but haven't figured it out yet.
For example, in the sentence

> I love New York!

the token `love` yields string features such as `love-I-W-PW`, `love-New-W-NW`, `#lo#`, `#ov#`, `#ve#`, etc.
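To make the question concrete, here is a minimal sketch of the kind of feature strings I mean. This is my own illustration, not the actual `NERFeatureFactory` code, and the function name and feature formats are assumptions based on the example above:

```python
# Hypothetical sketch (NOT the actual NERFeatureFactory implementation):
# build string-valued features for one token, in the style of
# usePrev / useNext / useNGrams.
def token_features(tokens, i, n=2):
    w = tokens[i]
    feats = [w]                                  # the word itself
    if i > 0:
        feats.append(f"{w}-{tokens[i-1]}-W-PW")  # word + previous word
    if i + 1 < len(tokens):
        feats.append(f"{w}-{tokens[i+1]}-W-NW")  # word + next word
    # character n-grams, delimited with '#'
    feats += ["#" + w[j:j + n] + "#" for j in range(len(w) - n + 1)]
    return feats

tokens = ["I", "love", "New", "York", "!"]
print(token_features(tokens, 1))
# → ['love', 'love-I-W-PW', 'love-New-W-NW', '#lo#', '#ov#', '#ve#']
```

Each token produces many such strings, and the vocabulary of possible strings is effectively unbounded, which is why I don't see how a fixed dictionary or embedding table could cover them.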