Is there an nltk.corpus for data-science-related words?
I scraped job descriptions from the internet, ran them through the usual NLP preprocessing steps, and got to the point where I computed:
import nltk
# lemmatized_list holds the lemmatized tokens from the job descriptions
freq = nltk.FreqDist(lemmatized_list)
most_freq_words = freq.most_common(100)
which outputs:
[('data', 179),
('experience', 86),
('work', 78),
('business', 71),
('team', 59),
('learn', 56),
('model', 49),
('skills', 47),
('science', 41),
('use', 41),
('build', 39),
('machine', 37),
('ability', 36),.....
and so on. My problem is that I do not want to count general words like "experience" and "work"; I only want keywords related to data science. I'm guessing there is a corpus of data science terms that I could use, similar to how I use the stop word corpus to exclude words. Is there a way to do this? Thanks!
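To make the idea concrete, here is a minimal sketch of the filtering I have in mind. It assumes a hand-built set of terms, since I have not found a ready-made one; `DS_TERMS` and `keyword_counts` are hypothetical names, and the short `lemmatized_list` is only a stand-in for my real tokens. It uses `collections.Counter` so that it runs without NLTK; `nltk.FreqDist` is a `Counter` subclass, so `most_common` is called the same way.

```python
from collections import Counter

# Hypothetical, hand-picked vocabulary (an assumption, not an NLTK corpus).
DS_TERMS = {"data", "model", "science", "machine", "learn", "statistics", "python", "sql"}


def keyword_counts(tokens, vocabulary):
    """Count only the tokens that appear in the given vocabulary."""
    return Counter(token for token in tokens if token in vocabulary)


# Small stand-in for the real lemmatized token list.
lemmatized_list = ["data", "experience", "work", "model", "data",
                   "team", "machine", "learn", "data"]

freq = keyword_counts(lemmatized_list, DS_TERMS)
most_freq_words = freq.most_common(100)
print(most_freq_words)
```

This is the inverse of stop word removal: tokens are kept when they are in the set rather than dropped. The same pattern with `not in` would exclude general words such as "experience" and "work" instead.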