Text classification with Weka (unlimited dependent variable values)
Our dataset has two attributes, citizen and nric. The rule is: if citizen is US, the result should be the nric value; otherwise it should be Non-US.
Could you please suggest which algorithm in Weka I should use and, most importantly, how to define this dataset in ARFF format?
Note that nric can be any arbitrary text value. There is no fixed set of values for nric or result.
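To make the intended mapping unambiguous, here is the rule written as a small Python function. This is only a sketch of the behaviour I want a model to reproduce; the name `expected_result` is my own and is not part of Weka.

```python
def expected_result(citizen: str, nric: str) -> str:
    """Return the nric value for US citizens, otherwise the fixed label 'Non-US'."""
    # Hypothetical helper that only restates the rule from the question.
    return nric if citizen == "US" else "Non-US"


if __name__ == "__main__":
    # A few (citizen, nric) pairs taken from the tables in this question.
    for citizen, nric in [("US", "US123"), ("CA", "CA332"), ("US", "US777")]:
        print(citizen, nric, expected_result(citizen, nric))
```

The point is that the output is either a copy of an input field or a constant, so the set of possible result values is open-ended.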
Train dataset
citizen | nric | result |
---|---|---|
US | US123 | US123 |
CA | CA332 | Non-US |
US | US223 | US223 |
US | US776 | US776 |
DE | DE112 | Non-US |
SG | SG762 | Non-US |
MM | MM001 | Non-US |
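For reference, this is roughly how I would try to write the training table as ARFF, assuming all three columns are declared with the ARFF `string` type because their values are not known in advance. The relation name `citizen_nric_train` and the helper `to_arff` are placeholders of mine, and I am not sure whether a Weka classifier will accept a string-valued class attribute, which is part of what I am asking.

```python
# Training rows copied from the table above: (citizen, nric, result).
TRAIN_ROWS = [
    ("US", "US123", "US123"),
    ("CA", "CA332", "Non-US"),
    ("US", "US223", "US223"),
    ("US", "US776", "US776"),
    ("DE", "DE112", "Non-US"),
    ("SG", "SG762", "Non-US"),
    ("MM", "MM001", "Non-US"),
]


def to_arff(relation: str, rows) -> str:
    """Build ARFF text with three string attributes (an assumption, see above)."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute citizen string",
        "@attribute nric string",
        "@attribute result string",
        "",
        "@data",
    ]
    # Quote every value so that text such as Non-US is read as a single token.
    for row in rows:
        lines.append(",".join(f"'{value}'" for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("citizen_nric_train", TRAIN_ROWS))
```

If citizen or result had a known, closed set of values, they could be declared as nominal attributes instead, for example `@attribute citizen {US,CA,DE,SG,MM}`, but that does not fit nric or result here.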
Test dataset
citizen | nric | result |
---|---|---|
US | US777 | US777 |
JP | JP919 | Non-US |
IN | IN010 | Non-US |
Topic machine-learning-model weka classification
Category Data Science