Which ML method for multiclass (non-binary) text classification should I choose (from SparkML)?

I am working on a fairly large dataset that will be processed on a cluster, which is why I am using PySpark.

A few representative records of this dataset have the following structure:

+----------+------------+--------------------+--------------------+--------------------+
|         0|  07/29/2013|       Consumer Loan|        Vehicle loan|Managing the loan...|
|         1|  07/29/2013|Bank account or s...|    Checking account|Using a debit or ...|
|         2|  07/29/2013|Bank account or s...|    Checking account|Account opening, ...|

After some preprocessing and data cleansing, I would like to build and train a model that classifies the issues (the Issue column) into categories that are not yet known. I have read some articles about TF-IDF, but I am not sure whether it is suitable for this case.

Topic pyspark classification machine-learning

Category Data Science


If you want to categorise your text using machine learning techniques, you first have to extract fixed-length features from the text, because an ML model needs them for training. You can do that with bag of words, TF-IDF, or averaged word vectors. If you are using deep-learning-based models, you can use an LSTM with word vectors, or CNNs.
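To make the "fixed-length features" idea concrete, here is a minimal plain-Python sketch of hashed TF-IDF. It is only an illustration under assumptions, not the Spark implementation: the example texts are made up, `NUM_FEATURES` and the helper names are hypothetical, and `zlib.crc32` stands in for the hash function (Spark uses MurmurHash3). In Spark ML the corresponding stages would be `HashingTF` and `IDF` from `pyspark.ml.feature`, and the smoothed IDF formula below follows the one given in the Spark documentation.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 32  # hypothetical vector length, kept small for readability


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def hashed_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a fixed-length term-frequency vector (hashing trick)."""
    vec = [0.0] * num_features
    for token, count in Counter(tokens).items():
        # crc32 is a deterministic stand-in hash chosen for this sketch.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vec[index] += count
    return vec


def fit_idf(tf_vectors):
    """Smoothed IDF per feature index: log((n_docs + 1) / (doc_freq + 1))."""
    n_docs = len(tf_vectors)
    num_features = len(tf_vectors[0])
    weights = []
    for j in range(num_features):
        doc_freq = sum(1 for vec in tf_vectors if vec[j] > 0)
        weights.append(math.log((n_docs + 1) / (doc_freq + 1)))
    return weights


def tfidf(tf_vector, idf_weights):
    """Scale each term frequency by the IDF weight of its feature index."""
    return [tf * w for tf, w in zip(tf_vector, idf_weights)]


# Made-up example texts, NOT taken from the dataset in the question.
issues = [
    "cannot repay my car loan",
    "debit card was charged twice",
    "bank closed my checking account",
]

tf_vectors = [hashed_tf(tokenize(text)) for text in issues]
idf_weights = fit_idf(tf_vectors)
features = [tfidf(vec, idf_weights) for vec in tf_vectors]

# Every text, whatever its length, ends up as a vector of the same size.
for text, vec in zip(issues, features):
    print(len(vec), text)
```

Every text ends up as a vector of the same length regardless of how many words it contains, which is what a classifier needs as input. The trade-off of hashing is that different words can collide in the same index, so a larger vector length reduces collisions at the cost of memory.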
