Is there a process flow to follow for text analytics?
I am trying to draw up a process flow (like a template) to follow when working on text analytics projects. So far, I've come up with this.
Text Analytics Steps
- Data Collection
  - Acquire data
  - Convert data into plain text
  - Remove duplicate entries
- Text Parsing and Extracting Features
  - Tokenization
  - Parsing
  - Remove HTML characters
  - Decode complex symbols to UTF-8
  - Spell check
  - Apostrophe look-up
  - Remove punctuation marks
  - Remove expressions / emojis
  - Split attached words
  - Slang look-up
  - Remove URLs
  - Lemmatization / Stemming (normalization of tokens)
  - Part-of-speech tagging
- Text Filtering
  - Remove start-words
  - Remove stop-words
  - Remove irrelevant words based on frequency
- Text Transformation
  - Bag-of-words representation
  - TF-IDF
- Text Mining / Analysis (whichever analysis is needed)
  - Text Categorization
  - Text Classification (supervised)
  - Topic Modeling (unsupervised)
  - Text Clustering
  - Similarity Analysis
  - Sentiment Analysis
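To make the ordering concrete, here is a minimal sketch of how a few of the steps above might chain together, using only the Python standard library. The function names (`preprocess`, `tf_idf`, `run_pipeline`), the tiny stop-word list, and the regexes are all placeholders assumed for illustration. A real project would more likely use a library tokenizer, a proper stop-word list, and lemmatization.

```python
import html
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "it"}

TAG_RE = re.compile(r"<[^>]+>")        # crude HTML tag matcher
URL_RE = re.compile(r"https?://\S+")   # crude URL matcher
TOKEN_RE = re.compile(r"[a-z]+")       # keeps letters only; drops digits and punctuation


def preprocess(text):
    """Clean one document and return its filtered tokens."""
    text = html.unescape(text)          # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)        # remove HTML tags
    text = URL_RE.sub(" ", text)        # remove URLs
    tokens = TOKEN_RE.findall(text.lower())  # tokenize and strip punctuation
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop-words


def tf_idf(token_lists):
    """Return one {term: weight} dict per document (plain tf * log(N/df))."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))    # count each term once per document
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1        # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def run_pipeline(raw_docs):
    """Deduplicate, preprocess, then transform to TF-IDF."""
    unique_docs = list(dict.fromkeys(raw_docs))  # drop exact duplicates, keep order
    token_lists = [preprocess(doc) for doc in unique_docs]
    return token_lists, tf_idf(token_lists)


docs = [
    "<p>The cat sat &amp; purred.</p> See https://example.com",
    "The dog barked.",
    "The dog barked.",
]
tokens, weights = run_pipeline(docs)
```

With this unsmoothed formula, a term that occurs in every document gets a weight of zero, which is one reason library implementations usually apply a smoothed IDF. The resulting weights would then feed whichever analysis step is needed (classification, clustering, and so on).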
Is this flow in the right order of steps?
What are the steps/sub-steps that I am missing?
Could this process flow serve as a template or go-to flow chart for any text analytics project?
Edit: Updated process flow
Topic text-filter text-mining nlp
Category Data Science