What is ChunkParserI in nltk.chunk? What exactly is it used for?

from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree
from nltk.corpus import gazetteers

class LocationChunker(ChunkParserI):
    def __init__(self):
        # All location names in the gazetteers corpus
        self.locations = set(gazetteers.words())
        # Largest number of spaces found in any location name
        self.lookahead = 0
        for loc in self.locations:
            nwords = loc.count(' ')
            if nwords > self.lookahead:
                self.lookahead = nwords

What is ChunkParserI in nltk.chunk, and what exactly is it used for? Also, please explain the code. What is the difference between chunking and parsing?


Parsing is the process of decomposing a string into its constituent symbols (if the string is a word or a sequence of characters) or its syntactic components (if the string is a meaningful piece of text such as a short story, a scientific abstract, or a sentence). In an NLP context, "parsing" usually refers to the latter interpretation.

Chunking (in an NLP context) is a specific form of parsing that extracts groups of words, so-called "chunks". These chunks are "meaningful short phrases from the sentence (tagged with Part-of-Speech). Chunks are thus made up of words and the kinds of words are defined using the part-of-speech tags. One can even define a pattern or words that can't be a part of chunk and such words are known as chinks" [1]. Chinks can be defined with chunking rules.

I assume the code you posted comes from "Natural Language Processing: Python and NLTK" by Hardeniya et al. [2]. According to that book, the LocationChunker class "starts by constructing a set of all locations in the gazetteers corpus. Then, it finds the maximum number of words in a single location string so it knows how many words it must look ahead when parsing a tagged sentence." (cf. Chapter 5, p. 319)
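As for ChunkParserI itself: it is the interface that NLTK chunkers implement (the trailing "I" marks an interface), and a subclass is expected to provide a parse() method that turns a tagged sentence into a chunk tree. The following is a rough, NLTK-free sketch of the lookahead idea quoted above. The small LOCATIONS set and the helper names max_lookahead and iob_locations are hypothetical stand-ins, and the matching loop is my own simplification rather than the book's implementation.

```python
# Hypothetical stand-in for set(gazetteers.words()); the real corpus is far larger.
LOCATIONS = {"New York", "San Francisco", "Salt Lake City", "Texas"}


def max_lookahead(locations):
    """Return the largest number of spaces in any location name."""
    return max((loc.count(" ") for loc in locations), default=0)


def iob_locations(tagged_sent, locations):
    """Attach an IOB location tag to each (word, pos) pair, longest match first."""
    lookahead = max_lookahead(locations)
    words = [word for word, _ in tagged_sent]
    iob = ["O"] * len(words)
    i = 0
    while i < len(words):
        matched = 0
        # Try the longest candidate span first, then shrink it word by word.
        for extra in range(min(lookahead, len(words) - i - 1), -1, -1):
            if " ".join(words[i:i + extra + 1]) in locations:
                matched = extra + 1
                break
        if matched:
            iob[i] = "B-LOCATION"
            for j in range(i + 1, i + matched):
                iob[j] = "I-LOCATION"
            i += matched
        else:
            i += 1
    return [(word, pos, tag) for (word, pos), tag in zip(tagged_sent, iob)]


sentence = [("With", "IN"), ("San", "NNP"), ("Francisco", "NNP"),
            ("being", "VBG"), ("far", "RB"), ("from", "IN"), ("Texas", "NNP")]
triples = iob_locations(sentence, LOCATIONS)
for triple in triples:
    print(triple)
```

In an NLTK chunker, a parse() method could hand (word, pos, iob) triples like these to conlltags2tree, which is presumably why the posted code imports it.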
