Using the Datumbox Machine Learning Framework for website classification - guidelines?

A short while ago I came across this ML framework, which implements several different algorithms that are ready to use. The site also provides a handy API that you can access with an API key.

I need the framework for a website classification problem: I have to categorize several thousand websites based on their HTML content. As I don't want to be tied to their hosted API, I'd like to use the framework directly to build my own classifier.
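Whatever classifier you end up using, the first step is turning each page's HTML into plain words that it can count. Here is a minimal sketch of that step under some loud assumptions: the class name `HtmlTokenizer` is hypothetical and not part of Datumbox, and it strips tags with regular expressions, which is fine for a first pass but weaker than a real HTML parser (HTML entities such as `&amp;` are not decoded, for example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Datumbox class): turns raw HTML into lowercase word tokens.
 * Assumption: regex-based tag stripping is good enough for a first experiment.
 */
class HtmlTokenizer {
    // Drop <script> and <style> blocks entirely, including their contents.
    private static final Pattern SCRIPT_STYLE =
            Pattern.compile("(?is)<(script|style)[^>]*>.*?</\\1>");
    // Remove any remaining tags, e.g. <div class="x">.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    // Split on anything that is not a letter or a digit.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    static List<String> tokenize(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        for (String t : NON_WORD.split(text.toLowerCase(Locale.ROOT))) {
            if (t.length() > 1) { // skip empty strings and single characters
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><h1>Football scores</h1><p>Latest match results</p></body></html>";
        System.out.println(tokenize(html));
        // prints [football, scores, latest, match, results]
    }
}
```

The resulting token lists are what a text classifier such as Naive Bayes is trained on; for real crawled pages a dedicated HTML parser would handle malformed markup and entities more robustly.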

However, apart from some introductory data mining courses and the associated reading, I know very little about what I would actually need to use. Specifically, I'm at a loss as to how to train the classifier and then apply it to the data.

The framework already includes some classification algorithms, such as NaiveBayes, which I know is well suited to text classification, but I'm not sure how to apply it to this problem.

Can anyone give me some rough guidelines on what I would need to do to accomplish this task?

You can use the framework's text classification class for this task. First, decide how you are going to classify the websites (e.g. sports sites, health and wealth sites, etc.). Then collect some labelled training data, train the classifier on it, and you're done.
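To make those steps concrete, here is a minimal sketch of what training and using a text classifier involves, written in plain Java rather than against the Datumbox API (whose exact classes and method signatures are not reproduced here). `NaiveBayesTextClassifier` is a hypothetical name; it implements multinomial Naive Bayes with add-one (Laplace) smoothing over already-tokenized documents, and the tiny "sports"/"health" training set is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Datumbox's API): multinomial Naive Bayes for tokenized text.
 * Add-one smoothing avoids zero probabilities for words unseen in a category.
 */
class NaiveBayesTextClassifier {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWordsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled training document, e.g. ("sports", [football, match, ...]). */
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            vocabulary.add(token);
        }
        totalWordsPerClass.merge(label, tokens.size(), Integer::sum);
    }

    /** Returns the label with the highest posterior log-probability. */
    String classify(List<String> tokens) {
        if (totalDocs == 0) throw new IllegalStateException("classifier has not been trained");
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocabulary.size();
        for (Map.Entry<String, Integer> e : docsPerClass.entrySet()) {
            String label = e.getKey();
            // log P(class): fraction of training documents carrying this label
            double score = Math.log(e.getValue() / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWordsPerClass.get(label);
            for (String token : tokens) {
                if (!vocabulary.contains(token)) continue; // ignore words never seen in training
                int c = counts.getOrDefault(token, 0);
                // log P(word | class) with add-one smoothing
                score += Math.log((c + 1.0) / (total + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesTextClassifier nb = new NaiveBayesTextClassifier();
        nb.train("sports", Arrays.asList("football", "match", "score", "league"));
        nb.train("sports", Arrays.asList("tennis", "match", "player"));
        nb.train("health", Arrays.asList("diet", "exercise", "doctor"));
        nb.train("health", Arrays.asList("sleep", "diet", "vitamins"));
        System.out.println(nb.classify(Arrays.asList("match", "score")));  // sports
        System.out.println(nb.classify(Arrays.asList("diet", "doctor"))); // health
    }
}
```

In practice you would train it on the tokens of many labelled pages per category and hold some pages back to check accuracy before classifying the rest. Working in log-probabilities keeps the per-word products from underflowing on long pages.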
