weka - Geeks Mental

Color name prediction

gammay

June 3, 2022 18:10

Given data:

R    G    B    Color
0    0    0    Black
255  255  255  White
255  0    0    Red
0    255  0    Lime
0    0    255  Blue
255  255  0    Yellow
0    255  255  Cyan_Aqua
...

Can we predict the color given an RGB input? For example, 224, 255, 255 = light_cyan. The goal is to generate logical names, not random ones. For instance, if the data contains "green", the closest match with a lighter hue would be named "light green". If yes, …

Topic: weka

Category: Data Science
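One way to approach the color-naming question above is to match an input to the reference color with the nearest hue and then compare lightness to choose a "light" or "dark" prefix. The sketch below is only an illustration under assumptions: the palette holds just the rows shown in the question, and the function names, the saturation threshold, and the lightness tolerance are all hypothetical choices, not part of the original post.

```python
import colorsys

# Reference rows taken from the question; a real palette would be larger.
PALETTE = {
    "Black": (0, 0, 0),
    "White": (255, 255, 255),
    "Red": (255, 0, 0),
    "Lime": (0, 255, 0),
    "Blue": (0, 0, 255),
    "Yellow": (255, 255, 0),
    "Cyan_Aqua": (0, 255, 255),
}

def to_hls(rgb):
    """Convert 0-255 RGB to (hue, lightness, saturation), each in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hls(r, g, b)

def hue_gap(h1, h2):
    """Distance between two hues on the color wheel (hue wraps around at 1.0)."""
    d = abs(h1 - h2)
    return min(d, 1.0 - d)

def describe(rgb, sat_threshold=0.1, light_tolerance=0.05):
    """Name rgb after a reference color, prefixed with light_/dark_ if needed.

    Assumption: greys are matched by lightness, other colors by hue.
    """
    h, l, s = to_hls(rgb)
    chromatic = s >= sat_threshold
    # Only compare against reference colors of the same kind (grey vs. colored).
    candidates = {}
    for name, ref in PALETTE.items():
        ref_h, ref_l, ref_s = to_hls(ref)
        if (ref_s >= sat_threshold) == chromatic:
            candidates[name] = (ref_h, ref_l)
    if chromatic:
        name = min(candidates, key=lambda n: hue_gap(h, candidates[n][0]))
    else:
        name = min(candidates, key=lambda n: abs(l - candidates[n][1]))
    base_l = candidates[name][1]
    if abs(l - base_l) <= light_tolerance:
        return name
    return ("light_" if l > base_l else "dark_") + name

print(describe((224, 255, 255)))
```

With this palette the input 224, 255, 255 is named light_Cyan_Aqua. A plain Euclidean nearest neighbor in RGB space would instead pick White for that input, which is why the sketch compares hue first.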

Multivariate analysis

sanjeewa dayarathne

May 22, 2022 05:02

Can we use Weka for multivariate data analysis, that is, when we have more than one variable as the dependent variable (without using factor analysis to reduce the number of variables associated with the dependent variable)? Thank you.

Topic: weka

Category: Data Science

Increasing minNumObj increasing accuracy in decision tree

Gooze_Berry

May 16, 2022 18:02

I have been using a J48 classifier in Weka and have noticed that increasing minNumObj (the minimum number of instances per leaf) leads to a small accuracy increase.

-M   Result      Size  Num Leaves
2    73.8281 %   39    20
3    74.2188 %   39    20
4    74.4792 %   37    19
5    74.6094 %   25    13
6    74.2188 %   23    12
7    74.2188 %   23    12
8    74.349 %    23    12
9    75.2604 %   29    15
10   75.5208 %   29    15
11   …

Topic: weka data-mining machine-learning

Category: Data Science

Is my model overfitting? Weka Random Forest

Amben Uchiha

May 15, 2022 01:00

I have the following result from Weka. Looking at it, I noticed that the ROC area is above 90 and the correctly classified instances figure is 85%. Is this a sign of overfitting?

Topic: weka random-forest

Category: Data Science

How do I simultaneously select multiple values for k-means in WEKA?

Abdulaziz Ghalib

May 13, 2022 14:00

I have tried WEKA's Experimenter. However, it's for classification. I'm looking for a way to apply the k-means algorithm to the same dataset with multiple 'k' values. Is there any option in WEKA's GUI that allows this?

Topic: weka clustering

Category: Data Science

Text classification with Weka (unlimited dependent variable values)

Sue Nile

March 8, 2022 11:52

In our dataset we have 2 attributes, citizen and nric. The rule is: if citizen is US, then the result should be the nric value, otherwise Non-US. Could you please suggest which algorithm in Weka I should use and, most importantly, how to define this dataset in ARFF format? Note that nric can be any random text value. There is no fixed value set for nric and result.

Train dataset:

citizen  nric   result
US       US123  US123
CA       CA332  …

Topic: machine-learning-model weka classification

Category: Data Science

ValueError for Chi2 Python

FredNina

February 15, 2022 10:49

I am running feature selection using Chi2 code on some data: the diabetes dataset and the HR dataset from Kaggle. Running the code on the diabetes data works well because the values are all numeric and are therefore converted to float. But the HR data has string values such as "Job Title", so Python understandably cannot convert them to float. My question is, is there a way I could run such code on non-numeric data to derive …

Topic: chi-square-test weka python

Category: Data Science
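For string-valued columns such as "Job Title" in the question above, one common option is to skip the float conversion entirely and compute Pearson's chi-square statistic on a contingency table of category counts against the class. The sketch below uses only the standard library; the column values and the function name are hypothetical, and it is not the poster's code or any library's built-in scorer.

```python
from collections import Counter

def chi_square(feature, target):
    """Pearson chi-square statistic between two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, target))   # joint counts per (category, class)
    feature_totals = Counter(feature)          # row totals
    target_totals = Counter(target)            # column totals
    stat = 0.0
    for f, f_count in feature_totals.items():
        for t, t_count in target_totals.items():
            expected = f_count * t_count / n   # count expected under independence
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: job title versus whether the employee left.
job = ["Manager", "Manager", "Engineer", "Engineer"]
left_related = ["yes", "yes", "no", "no"]      # fully determined by job title
left_unrelated = ["yes", "no", "yes", "no"]    # independent of job title

print(chi_square(job, left_related))
print(chi_square(job, left_unrelated))
```

A larger statistic indicates a stronger association between the feature and the class, so features can be ranked by it without turning the strings into numbers.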

In Weka, how to draw learning curve evaluated on both test and training set?

modeller

February 8, 2022 02:44

This is just for finding the overfitting gap. After initial research, I can only find a method to draw a learning curve using evaluation on the test set. However, I could not evaluate on the training set and overlay the two learning curves.

Topic: weka machine-learning

Category: Data Science

What does random seed value mean in Weka?

Ahmad Sarairah

February 5, 2022 23:06

I am using Weka to classify a dataset, and there is an option in the classifier evaluation settings (random seed for XVAL/% split). What does this option mean, and what is the seed value? Also, what is the effect of changing the value of this option from one to two, three, or other values? I read that the value of the seed is the starting point, but what is the difference if it is the starting point (seed value) …

Topic: weka classification data-mining

Category: Data Science
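The role of a seed in the question above can be illustrated outside Weka. The sketch below is not Weka's implementation; it only mimics the general idea that a seeded pseudo-random generator shuffles the instances before they are dealt into cross-validation folds. The function name and the fold-dealing scheme are assumptions made for illustration.

```python
import random

def k_fold_indices(n_instances, k, seed):
    """Shuffle instance indices with a seeded RNG, then deal them into k folds."""
    rng = random.Random(seed)        # the seed fixes the pseudo-random sequence
    indices = list(range(n_instances))
    rng.shuffle(indices)
    # Fold i takes every k-th index of the shuffled order, starting at position i.
    return [indices[i::k] for i in range(k)]

folds_a = k_fold_indices(10, 5, seed=1)
folds_b = k_fold_indices(10, 5, seed=1)
folds_c = k_fold_indices(10, 5, seed=2)

print(folds_a == folds_b)  # same seed: identical partition on every run
print(folds_a == folds_c)  # another seed: usually a different partition
```

Under this view the seed value itself carries no meaning; changing it from one to two simply selects a different, equally valid shuffle, so reported accuracies may shift slightly between seeds while staying reproducible for any fixed seed.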

Various algorithms performance in a problem and what can be deduced about data and problem?

Ando Jurai

February 3, 2022 14:55

Hi, I am currently trying to apply various algorithms to a classification problem to assess which could be better, and then to fine-tune the best ones from that first pass. I am a beginner, so I use Weka for now. I understand basic ML concepts but am not yet familiar with the details of the algorithms. I observed that on my problem, RBF networks performed vastly worse than IBk and other K methods. From what I read about RBF networks, …

Topic: weka random-forest dimensionality-reduction k-means machine-learning

Category: Data Science

Is the difference between my model's training accuracy and cross-validation accuracy considered overfitting?

Amben Uchiha

February 1, 2022 04:28

I used Weka to determine my training accuracy and cross-validation accuracy. It showed that my training accuracy is 84.9167 % and my cross-validation accuracy is 83.9167 %. I also used sklearn to determine the same two figures, and it gave me 83.5% training accuracy and 82.67% cross-validation accuracy. Is the difference between training accuracy and cross-validation accuracy enough to consider my model overfit?

Topic: weka scikit-learn

Category: Data Science

The Differences Between Weka Random Forest and Scikit-Learn Random Forest

David Tian

January 5, 2022 21:31

I have used both Weka random forest and sklearn random forest in my research, but I have realised that they use different methods to combine the predictions of the base learners (i.e. decision trees) to make the final prediction. To predict the class of an instance, Weka random forest uses majority vote, which predicts the class of the instance as the class predicted by the majority of the decision trees. The class probability of the instance is computed as the fraction of …

Topic: weka random-forest scikit-learn

Category: Data Science
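The two ways of combining base learners mentioned in the question above, majority voting and averaging class probabilities, can disagree on the same instance. The sketch below shows this with hypothetical per-tree probabilities; it makes no claim about which library uses which scheme, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical class probabilities from three trees for a single instance.
tree_probs = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.4, "B": 0.6},
    {"A": 0.4, "B": 0.6},
]

def majority_vote(probs):
    """Each tree votes for its most probable class; the most voted class wins."""
    votes = Counter(max(p, key=p.get) for p in probs)
    return votes.most_common(1)[0][0]

def average_probabilities(probs):
    """Average the per-tree probabilities, then pick the most probable class."""
    classes = probs[0].keys()
    mean = {c: sum(p[c] for p in probs) / len(probs) for c in classes}
    return max(mean, key=mean.get), mean

print(majority_vote(tree_probs))
print(average_probabilities(tree_probs))
```

Here two of the three trees vote for B, so majority voting returns B, while the one very confident tree pulls the averaged probability of A above 0.5, so averaging returns A.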

Interpreting evaluation metrics with threshold/cutoff

khair

October 19, 2021 17:05

I was doing churn prediction for a company. I got the following results by applying 3 classifiers.

Model                     Accuracy  AUC
Logistic Regression       0.671     0.736
Decision Tree (pruned)    0.681     0.665
Decision Tree unpruned    0.623     0.627

Now, I want to know two things: Which model has better accuracy for a cutoff of 0.9? As logistic regression has the highest AUC, in my opinion Logistic Regression is better. Which model is the best in terms of ranking the predictions according to …

Topic: auc weka evaluation data-mining machine-learning

Category: Data Science
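The distinction raised in the question above, accuracy at a given cutoff versus AUC, can be made concrete with a small calculation. AUC is the fraction of (positive, negative) pairs in which the positive instance receives the higher score, so it measures ranking and does not depend on any cutoff, whereas accuracy changes with the cutoff. The labels and scores below are hypothetical and unrelated to the poster's models.

```python
def auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_at(labels, scores, cutoff):
    """Accuracy when every score at or above the cutoff is predicted positive."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical churn labels (1 = churned) and predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]

print(auc(labels, scores))
print(accuracy_at(labels, scores, 0.5))
print(accuracy_at(labels, scores, 0.9))
```

On this toy data the AUC stays at 8/9 whatever cutoff is chosen, while accuracy drops from 5/6 at a cutoff of 0.5 to 4/6 at a cutoff of 0.9. A model with the highest AUC is therefore not guaranteed to have the highest accuracy at one particular cutoff.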

Attribute selection from dataset

andrikoulas

September 26, 2021 16:02

I work on a dataset with numeric values. The class labels also have numeric values. I merged the 6 numeric class labels into one, which contains values such as happy_pleased. I want to load the new .arff file into Weka, but I have a problem with the class @attribute when I declare it as nominal. Which is the right type to declare the class label as? I tried nominal, but nothing happened.

Topic: weka dataset data-mining

Category: Data Science

Parameters optimization algorithms in Weka

Khan

March 9, 2021 23:02

In Weka, I used the Grid and Random search parameter tuning algorithms, but unfortunately their performance (in terms of prediction accuracy) is worse than when we use the ML algorithms (Support Vector Regression, Linear Regression, etc.) without any optimization algorithm. I wonder how this is possible? I mean, one algorithm (Grid or Random search) should perform better or worse than the other, but both perform worse than using no parameter optimization at all. I even tried …

Topic: hyperparameter-tuning grid-search weka

Category: Data Science

Analysis of Alternating Decision Tree on Weka

SCodes

February 7, 2021 04:03

I am applying the AD Tree algorithm, and this is the tree visualization of the output: I can't understand the values in the decision nodes (-0.4, 0.541, -0.882...). How are these calculated, and how is the root node's score calculated? Are the predicate conditions (<127.5..) formed by an entropy splitting mechanism? This is an image of the output: Any help is appreciated; I cannot find any document on analysing AD Tree output!

Topic: weka decision-trees classification visualization algorithms

Category: Data Science

Optimum minimum number of instances in weka's j48

Mertkan SIMSEK

December 29, 2020 23:42

There is a parameter named minNumObj in the options of the J48 tree algorithm in Weka. This parameter sets the minimum number of instances in a leaf. Does this parameter have an optimum value relative to the number of instances or attributes? Or should I set the value that achieves the highest classification accuracy? For example, I have 6500 instances and 9 attributes, and I got the best result with minNumObj=100.

Topic: weka decision-trees

Category: Data Science

I need help in PCA results using WEKA Tool

Marwa A.

December 21, 2020 02:07

I'm working on an experiment using the KDD'99 Cup dataset, which has 42 features. The paper I'm comparing with concludes that 3 features, with precision ..%, are the best subset to identify attack X. In my experiment, I applied 4 different classifiers through PCA. How do I compare them in order to decide the number of features used in my experiment? How do I explain my features in order to say that n features give higher precision?

Topic: pca weka feature-extraction feature-selection

Category: Data Science

opening Weka from command line

Katie Melosto

December 9, 2020 08:05

I'm very new to computer coding and data science in general. I'm trying to open Weka and use it on a practice csv data set called weather.arff. The manual I have says to type this into the command line:

java weka.classifiers.j48.J48 -t weather.arff

When I open the command line, it starts in C:\User\Admin. When I type in the above, it gives me this message:

Error: Could not find or load main class weka.classifiers.j48.J48

Might someone give me some advice …

Topic: weka

Category: Data Science

How to interpret PCA rankings in Weka

Fraser Gilbert

October 12, 2020 18:01

I am struggling to understand what the rankings in Weka represent, i.e. the coefficients for each attribute in the ranking. What is the PCA output in Weka telling me with these rankings? And how does this help me select attributes? Right now it makes no sense to me how they are ranked. My data set is 31000 rows with 13 attributes. My class is Income_brack > or <= 50k.

Topic: pca weka feature-selection machine-learning

Category: Data Science
