Color name prediction

Given data:

R    G    B    Color
0    0    0    Black
255  255  255  White
255  0    0    Red
0    255  0    Lime
0    0    255  Blue
255  255  0    Yellow
0    255  255  Cyan_Aqua
...

Can we predict the color name given an RGB input? For example, 224, 255, 255 = light_cyan. The goal is to generate logical names, not random ones. For instance, if the data contains "green", the closest match with a lighter hue would be named "light green". If yes, …
Topic: weka
Category: Data Science
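
One way to picture the naming rule in the question is to match the input to the known color with the closest hue and then add a "light" or "dark" prefix based on lightness. The sketch below is only an illustration under assumptions: the function names, the HLS color space, and the two thresholds are hypothetical choices, not something taken from the question or from Weka.

```python
import colorsys

# Reference colors copied from the table in the question.
KNOWN_COLORS = {
    "Black": (0, 0, 0),
    "White": (255, 255, 255),
    "Red": (255, 0, 0),
    "Lime": (0, 255, 0),
    "Blue": (0, 0, 255),
    "Yellow": (255, 255, 0),
    "Cyan_Aqua": (0, 255, 255),
}


def to_hls(rgb):
    """Convert a 0-255 RGB triple to (hue, lightness, saturation) in 0-1."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hls(r, g, b)


def hue_distance(h1, h2):
    """Distance between two hues on the color wheel (hue wraps around at 1.0)."""
    d = abs(h1 - h2)
    return min(d, 1.0 - d)


def predict_color_name(rgb, tolerance=0.1, min_saturation=0.1):
    """Name an RGB triple after a known color, prefixed with light_/dark_.

    tolerance and min_saturation are arbitrary illustrative thresholds.
    """
    hue, lightness, saturation = to_hls(rgb)
    references = {name: to_hls(value) for name, value in KNOWN_COLORS.items()}
    if saturation < min_saturation:
        # Grey-ish input: hue carries no information, so compare lightness
        # against the achromatic references (Black and White).
        candidates = {n: v for n, v in references.items() if v[2] == 0.0}
        base = min(candidates, key=lambda n: abs(candidates[n][1] - lightness))
    else:
        # Colored input: pick the reference with the closest hue.
        candidates = {n: v for n, v in references.items() if v[2] > 0.0}
        base = min(candidates, key=lambda n: hue_distance(candidates[n][0], hue))
    difference = lightness - references[base][1]
    if difference > tolerance:
        return "light_" + base.lower()
    if difference < -tolerance:
        return "dark_" + base.lower()
    return base.lower()


print(predict_color_name((224, 255, 255)))
```

For the example input 224, 255, 255 the closest hue is Cyan_Aqua and the input is lighter, so the sketch returns a "light_" name. A plain nearest-neighbor search in RGB space would instead pick White for that input, which is why hue is used for matching here.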

Multivariate analysis

Can we use Weka for multivariate data analysis, that is, when there is more than one dependent variable (without using factor analysis to reduce the number of variables associated with the dependent variable)? Thank you.
Topic: weka
Category: Data Science

Increasing minNumObj increases accuracy in decision tree

I have been using a J48 classifier in Weka and have noticed that increasing minNumObj (the minimum number of instances per leaf) leads to a small accuracy increase.

-M   Result.     Size  Num Leaves
2    73.8281 %   39    20
3    74.2188 %   39    20
4    74.4792 %   37    19
5    74.6094 %   25    13
6    74.2188 %   23    12
7    74.2188 %   23    12
8    74.349 %    23    12
9    75.2604 %   29    15
10   75.5208 %   29    15
11   …
Category: Data Science

Text classification with Weka (unlimited dependent variable values)

In our dataset we have two attributes, citizen and nric. The rule is: if citizen is US, the result should be the nric value; otherwise it should be Non-US. Could you please suggest which algorithm in Weka I should use and, most importantly, how to define this dataset in ARFF format? Note that nric can be any random text value; there is no fixed set of values for nric and result.

Train dataset:

citizen  nric   result
US       US123  US123
CA       CA332  …
Category: Data Science

ValueError for Chi2 Python

I am running feature selection using Chi2 code on some data: the diabetes dataset and the HR dataset from Kaggle. On the diabetes data all is good, because the values are all numeric and are converted to float. But the HR data has string values such as "Job Title", so Python understandably cannot convert them to float. My question is: is there a way I could run such code on non-numeric data to derive …
Category: Data Science
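
The chi-square statistic itself does not require numeric features: it can be computed from counts of (feature value, class) pairs, so string categories such as job titles can be used directly. The sketch below is a minimal pure-Python illustration of that contingency-table statistic. It is not the code the asker ran, and the column names and values in the usage lines are made up.

```python
from collections import Counter


def chi_square_statistic(feature_values, class_labels):
    """Pearson chi-square statistic for a categorical feature vs. a class label."""
    n = len(feature_values)
    # Observed counts for every (feature value, class) combination.
    observed = Counter(zip(feature_values, class_labels))
    feature_totals = Counter(feature_values)
    class_totals = Counter(class_labels)
    statistic = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in class_totals.items():
            # Count expected if the feature and the class were independent.
            expected = value_count * label_count / n
            statistic += (observed[(value, label)] - expected) ** 2 / expected
    return statistic


# Hypothetical toy data: job title perfectly predicts whether someone left.
job_title = ["Engineer", "Engineer", "Manager", "Manager"]
left_company = ["yes", "yes", "no", "no"]
print(chi_square_statistic(job_title, left_company))
```

A larger statistic suggests a stronger association between the feature and the class. Libraries that only accept numeric input generally need the strings encoded first (for example as one-hot columns); which encoding fits depends on the library, so check its documentation.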

What does random seed value mean in Weka?

I am using Weka to classify a dataset, and there is an option in the classifier evaluation settings (random seed for XVAL/% split). What does this option mean, and what is the seed value? Also, what is the effect of changing the value of this option from one to two, three, or other values? I read that the value of the seed is the starting point, but what is the difference if it is the starting point (seed value) …
Category: Data Science
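
As a rough illustration of what a seed does in cross-validation, the sketch below shuffles instance indices with a seeded random number generator and deals them into folds. It is a simplified stand-in, not Weka's actual procedure (which may, for instance, also stratify by class), and the function name is hypothetical.

```python
import random


def cross_validation_folds(n_instances, n_folds, seed):
    """Shuffle instance indices with a seeded generator and deal them into folds."""
    indices = list(range(n_instances))
    # The seed fixes the shuffle order, so the same seed always yields the same folds.
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


folds_a = cross_validation_folds(20, 5, seed=1)
folds_b = cross_validation_folds(20, 5, seed=1)
folds_c = cross_validation_folds(20, 5, seed=2)
print(folds_a == folds_b)  # same seed, same partition
print(folds_a[0], folds_c[0])  # a different seed typically gives a different partition
```

Under this view the seed value has no meaning of its own: it only selects which of the many possible shuffles is used. Changing it usually changes which instances land in which fold, so the reported accuracy can shift slightly while the experiment stays reproducible for any fixed seed.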

Performance of various algorithms on a problem: what can be deduced about the data and the problem?

Hi, I am currently trying to apply various algorithms to a classification problem to assess which could be better, and then to fine-tune the best ones from that first pass. I am a beginner, so I use Weka for now. I have a basic understanding of ML concepts but have not yet gone into the details of the algorithms. I observed that on my problem, RBF networks performed vastly worse than IBk and other K methods. From what I read about RBF networks, …
Category: Data Science

Is the difference between my model's training accuracy and cross-validation accuracy considered overfitting?

I used Weka to determine my training accuracy and cross-validation accuracy. It showed a training accuracy of 84.9167 % and a cross-validation accuracy of 83.9167 %. I also used sklearn to determine both, which gave 83.5% on training and 82.67% on cross-validation. Is the difference between training accuracy and cross-validation accuracy enough to consider my model overfit?
Category: Data Science

The Differences Between Weka Random Forest and Scikit-Learn Random Forest

I have used both the Weka random forest and the sklearn random forest in my research, but I have realised that they use different methods to combine the predictions of the base learners (i.e. decision trees) to make the final prediction. To predict the class of an instance, the Weka random forest uses a majority vote, which predicts the class of the instance as the class predicted by the majority of the decision trees. The class probability of the instance is computed as the fraction of …
Category: Data Science
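
To make the contrast concrete, the sketch below compares the majority vote described in the excerpt with the other common combination rule, averaging each tree's class probabilities and picking the class with the highest mean. The per-tree probabilities are invented for illustration, and neither function is taken from Weka or scikit-learn.

```python
from collections import Counter


def majority_vote(tree_probabilities):
    """Each tree votes for its most probable class; the most-voted class wins."""
    votes = Counter(max(probs, key=probs.get) for probs in tree_probabilities)
    return votes.most_common(1)[0][0]


def average_probabilities(tree_probabilities):
    """Average the class probabilities over all trees; the highest mean wins."""
    n_trees = len(tree_probabilities)
    classes = tree_probabilities[0].keys()
    mean = {c: sum(probs[c] for probs in tree_probabilities) / n_trees for c in classes}
    return max(mean, key=mean.get)


# Hypothetical output of three trees for one instance.
trees = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.6, "b": 0.4},
    {"a": 0.1, "b": 0.9},
]
print(majority_vote(trees))          # two of three trees prefer "a"
print(average_probabilities(trees))  # the mean probability favours "b"
```

The two rules agree on most instances but can disagree when a minority of trees is very confident, as in this toy case, which is one reason the same data can give slightly different results in two random forest implementations.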

Interpreting evaluation metrics with threshold/cutoff

I was doing churn prediction for a company. I got the following results by applying 3 classifiers:

Model                    Accuracy  AUC
Logistic Regression      0.671     0.736
Decision Tree (pruned)   0.681     0.665
Decision Tree unpruned   0.623     0.627

Now, I want to know two things: Which model has better accuracy for a cutoff of 0.9? Since logistic regression has the highest AUC, in my opinion logistic regression is better. Which model is the best in terms of ranking the predictions according to …
Category: Data Science
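
Accuracy is tied to one particular cutoff, while AUC summarizes ranking quality over all cutoffs, so accuracy at a cutoff of 0.9 cannot be read off a table like the one above; it has to be recomputed from the predicted scores. The sketch below shows that recomputation on invented scores and labels; the function name is hypothetical.

```python
def accuracy_at_cutoff(scores, labels, cutoff):
    """Accuracy when every score >= cutoff is predicted as the positive class (1)."""
    predictions = [1 if score >= cutoff else 0 for score in scores]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical predicted churn probabilities and true labels (1 = churned).
scores = [0.95, 0.80, 0.60, 0.30, 0.10]
labels = [1, 1, 1, 0, 0]

print(accuracy_at_cutoff(scores, labels, 0.5))
print(accuracy_at_cutoff(scores, labels, 0.9))
```

In this toy data the ranking is perfect, yet raising the cutoff from 0.5 to 0.9 lowers the accuracy because fewer true positives clear the threshold. The same comparison would have to be run per model on the real scores to answer the question about a 0.9 cutoff.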

Attribute selection from dataset

I work on a dataset with numeric values. The class labels also have numeric values. I merged the 6 numeric class labels into one, which contains values like happy_pleased. I want to load the new .arff file into Weka, but I have a problem with the class @attribute, as I declare it as nominal. Which is the right type to declare the class label? I tried nominal but nothing happened.
Category: Data Science
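
For reference, a nominal attribute in ARFF is declared by listing its allowed values in braces, and every class value that appears in the @data section has to be in that list. The sketch below generates such a file from Python. Apart from happy_pleased, which comes from the question, the relation name, attribute names, and rows are hypothetical.

```python
def build_arff(relation, numeric_attributes, class_name, rows):
    """Return ARFF text with numeric attributes and a nominal class as the last column."""
    # Collect the distinct class values; a nominal attribute must list all of them.
    class_values = sorted({row[-1] for row in rows})
    nominal_spec = "{" + ",".join(class_values) + "}"
    lines = ["@relation " + relation, ""]
    for name in numeric_attributes:
        lines.append("@attribute " + name + " numeric")
    lines.append("@attribute " + class_name + " " + nominal_spec)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical rows: two numeric features followed by the merged class label.
rows = [
    (5.1, 3.5, "happy_pleased"),
    (4.9, 3.0, "sad_lonely"),
]
print(build_arff("emotions", ["feature_1", "feature_2"], "emotion", rows))
```

Values that contain spaces or commas would need quoting in a real file; the labels used here avoid that by joining words with underscores.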

Parameter optimization algorithms in Weka

In Weka, I used the Grid and Random search parameter tuning algorithms, but unfortunately their performance (in terms of prediction accuracy) is worse than when we use the ML algorithms (Support Vector Regression, Linear Regression, etc.) without any optimization algorithm. I wonder how that is possible? I mean, one algorithm (Grid or Random search) could perform better or worse than the other, but both perform worse than using no parameter optimization at all. I even tried …
Category: Data Science

Analysis of Alternating Decision Tree on Weka

I am applying the AD Tree algorithm and looking at the tree visualization of the output. I can't understand the values in the decision nodes (-0.4, 0.541, -0.882, ...). How are these calculated, and how is the root node's score calculated? Are the predicate conditions (< 127.5, ...) formed by an entropy splitting mechanism? Any help is appreciated; I cannot find any document on analysing AD Tree output.
Category: Data Science

Optimum minimum number of instances in Weka's J48

There is a parameter named minNumObj in the options of the J48 tree algorithm in Weka. This parameter sets the minimum number of instances in a leaf. Does this parameter have an optimum value that depends on the number of instances or attributes? Or should I set the value that achieves the highest classification success? For example, I have 6500 instances and 9 attributes, and I got the highest success when minNumObj=100.
Category: Data Science

I need help with PCA results using the WEKA tool

I'm working on an experiment using the KDD'99 Cup dataset, which has 42 features. The paper I'm comparing with concludes that 3 features, with precision ..%, are the best subset to identify attack X. In my experiment, I applied 4 different classifiers via PCA. How do I compare them in order to decide on the number of features used in my experiment? How do I explain my features in order to say that n features give higher precision?
Category: Data Science

Opening Weka from the command line

I'm very new to computer coding and data science in general. I'm trying to open Weka and use it on a practice csv data set called weather.arff. The manual I have says to type this into the command line: java weka.classifiers.j48.J48 -t weather.arff. When I open the command line it goes to C:\User\Admin. When I type in the above on the command line, it gives me the message: Error: Could not find or load main class weka.classifiers.j48.J48. Might someone give me some advice …
Topic: weka
Category: Data Science

How to interpret PCA rankings in Weka

I am struggling to understand what the rankings in Weka represent, i.e. the coefficients for each attribute in the ranking. What is the Weka PCA output telling me with these rankings, and how does it help me select attributes? Right now it makes no sense to me how they are ranked. My data set is 31000 rows with 13 attributes. My class is Income_brack (> or <= 50k).
Category: Data Science
