Understanding text conversion into SVM input
In Support Vector Machines, when used for sentiment analysis, text gets converted into a set of data points. How does this happen, usually?
Topic svm nlp libsvm machine-learning
Category Data Science
Well, the text itself doesn't become the data points directly. Say we are doing sentence-level opinion mining: features are extracted from each sentence, and which features to use varies from case to case. A common choice is the bag-of-words model, in which each distinct word in the corpus becomes a feature and its value is the number of times that word appears in the sentence. Those frequency vectors are your data points.
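A minimal sketch of the bag-of-words step described above, using only the standard library (the function name and the example sentences are my own, not from any particular library):

```python
from collections import Counter

def bag_of_words(sentences):
    """Build a shared vocabulary over all sentences, then map each
    sentence to a word-frequency vector -- these vectors are the
    data points an SVM would train on."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; value = frequency in this sentence
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

vocab, X = bag_of_words(["the movie was great", "the plot was bad bad"])
# vocab: ['bad', 'great', 'movie', 'plot', 'the', 'was']
# X[0]:  [0, 1, 1, 0, 1, 1]
# X[1]:  [2, 0, 0, 1, 1, 1]
```

Each row of `X` could then be fed to an SVM trainer (e.g. in LIBSVM's sparse `index:value` format) along with a sentiment label.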
Text can be converted to data via concept clusters (after stemming and stop-word removal), or to counts (frequencies) via character n-grams. A character n-gram is a contiguous sequence of n characters: 1-grams are single letters (a through z), 2-grams are pairs (aa through zz), 3-grams are triples (aaa through zzz), and so on up to about 5-grams (aaaaa through zzzzz). Beyond 5-grams, the data becomes sparse and less informative. A dataset can thus be constructed in which rows represent documents, columns represent n-grams, and each value is the total number of occurrences of that n-gram in that document.
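A small sketch of character n-gram counting as described above (the helper name and sample document are illustrative, not from any specific toolkit):

```python
from collections import Counter

def char_ngrams(text, n):
    """Count the character n-grams in one document.
    Non-letters are dropped and case is folded, matching the
    a-to-z tabulation described above."""
    cleaned = "".join(c for c in text.lower() if c.isalpha())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

bigrams = char_ngrams("Banana bandana", 2)
# bigrams["an"] == 4, bigrams["ba"] == 2, bigrams["na"] == 3
```

Running this over every document in a corpus and aligning the counts by n-gram yields the documents-by-n-grams matrix the answer describes.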
FYI - character n-grams have proven to be one of the most effective techniques for identifying a document's language from its characters alone.
Regarding SVMs, focus on the SVM literature.