Using the Datumbox Machine Learning Framework for website classification - guidelines?
A short while ago, I came across this ML framework that has implemented several different algorithms ready for use. The site also provides a handy API that you can access with an API key.
I have need of the framework to solve a website classification problem where I basically need to categorize several thousand websites based on their HTML content. As I don't want to be bound to their existing API, I wanted to use the framework to implement my own.
However, besides some introductory-level data mining courses and associated reading, I know very little as to what exactly I would need to use. Specifically, I'm at a loss as to what exactly I need to do to train the classifier and then model the data.
The framework already includes some classification algorithms like NaiveBayes, which I know is well suited to the task of text classification, but I'm not exactly sure how to apply it to the problem.
Can anyone give me a rough guidelines as to what exactly I would need to do to accomplish this task?
Topic java classification machine-learning
Category Data Science