Unable to resolve TypeError using RegexpTokenizer.tokenize from NLTK
I want to tokenize text data but cannot proceed because of a TypeError, and I do not know how to fix it. For context: the columns 'Resolution code', 'Resolution Note', 'Description', and 'Short description' all contain English text. Here is the code I have written:
# Removal of stop words
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r'\w+')
stop_words = set(stopwords.words('english'))

dfclean_imp_netc = pd.DataFrame()

# Tokenize each text column into a list of words
for column in ['Resolution code', 'Resolution Note', 'Description', 'Short description']:
    dfimpnetc[column] = dfimpnetc[column].apply(tokenizer.tokenize)

# Drop stop words from each token list
for column in ['Resolution code', 'Resolution Note', 'Description', 'Short description']:
    dfclean_imp_netc[column] = dfimpnetc[column].apply(lambda vec: [word for word in vec if word not in stop_words])

# Copy the cleaned columns back into the original frame
dfimpnetc['Resolution Note'] = dfclean_imp_netc['Resolution Note']
dfimpnetc['Description'] = dfclean_imp_netc['Description']
dfimpnetc['Short description'] = dfclean_imp_netc['Short description']
dfimpnetc['Resolution code'] = dfclean_imp_netc['Resolution code']
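For what it's worth, here is a minimal, self-contained sketch of the same steps (the toy DataFrame and its values are made up for illustration). It runs cleanly when every cell is a string, which makes me suspect the TypeError comes from non-string values such as NaN in my real columns, since tokenizer.tokenize only accepts strings:

import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r'\w+')
stop_words = set(stopwords.words('english'))

# Hypothetical sample data standing in for my real columns
df = pd.DataFrame({'Description': ['The server is down again',
                                   'Reset the user password']})

# Cast to str first so NaN/float cells cannot reach the tokenizer;
# lowercase because the NLTK stop word list is lowercase
tokens = df['Description'].astype(str).str.lower().apply(tokenizer.tokenize)
cleaned = tokens.apply(lambda vec: [w for w in vec if w not in stop_words])
print(cleaned)

If missing values are indeed the cause, casting with .astype(str) as above, or filling them beforehand with .fillna(''), should avoid the error.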
Tags: tokenization, nltk, python
Category: Data Science