h2o - Geeks Mental

why is H2O using only a part of the data?

Ben

May 17, 2022 08:19

I have this dataframe:
> head(df_clas_sn)
   country serial_no_of_generator_1 serial_no_of_generator_2 serial_no_of_generator_3 unit_type
11 Germany                       XY                       01                     0620      ORiP
12   India                       XY                       01                     0631      ORiP
13 Germany                       XY                       02                     0683      ORiP
14 Germany                       XZ                       02                     0735      KRIT
15 England                       XY                       03                     0844      KRIT
16 Germany                       XZ                       05                     0243      ORiP
   position_in_unit hours_balance status_code
11                Y          2771           1
12               DE          3783           1
13                G          1267           1
14               DE          7798           1
15                G          1136           1
16                M          6197           1
with these dimensions:
> dim(df_clas_sn)
[1] …

Topic: h2o confusion-matrix r

Category: Data Science

How to extract the sample split (values) of decision tree leaves (terminal nodes) using the h2o library

Sapiens

April 27, 2022 03:03

Sorry for a long story, but it is a long story. :) I am using the h2o library for Python to build a decision tree and to extract the decision rules out of it. I am using some data for training where labels get TRUE and FALSE values. My final goal is to extract the significant path (leaf) of the tree where the number of TRUE cases significantly exceeds that of FALSE ones. treemodel=H2OGradientBoostingEstimator(ntrees = 3, max_depth = maxDepth, distribution="bernoulli") …

Topic: h2o prediction data decision-trees python

Category: Data Science

Running H2O in databricks

physics

March 29, 2022 03:04

I am trying to run H2O in Databricks. However, when I run hc = pysparkling.H2OContext.getOrCreate(spark), I get the following error: java.lang.AbstractMethodError. Does anyone know what the problem could be?

Topic: h2o data-science-model pyspark

Category: Data Science

AutoML for categorical feature encoding

The Great

January 15, 2022 17:34

I have an input dataset with more than 100 variables, of which around 80% are categorical. Some variables, like gender or country, can be one-hot encoded, but I also have a few variables whose values have an inherent order, such as rating (very good, good, bad, etc.). Is there any AutoML approach that can do this encoding based on the variable type? For example, I would like to provide the below …

Topic: automl h2o deep-learning neural-network machine-learning

Category: Data Science
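The distinction raised in the encoding question above (one-hot for unordered variables, integer codes for ordered ones) can be sketched in plain Python. This is only an illustration, not H2O's or any AutoML tool's behavior: the column names, the sample rows, and the rating order in ORDINAL_LEVELS are assumptions made up for the example, and the order of an ordinal variable has to be supplied by hand because it cannot be inferred from the strings.

```python
# Hypothetical column names and level order, chosen only for illustration.
ORDINAL_LEVELS = {"rating": ["bad", "good", "very good"]}  # lowest to highest
NOMINAL_COLUMNS = ["gender", "country"]


def encode_rows(rows):
    """One-hot encode nominal columns and integer-encode ordinal columns."""
    # Collect the observed categories of each nominal column, in sorted order.
    categories = {
        col: sorted({row[col] for row in rows}) for col in NOMINAL_COLUMNS
    }
    encoded = []
    for row in rows:
        out = {}
        # One indicator (0/1) feature per observed category.
        for col, levels in categories.items():
            for level in levels:
                out[f"{col}={level}"] = 1 if row[col] == level else 0
        # A single integer feature that preserves the declared order.
        for col, levels in ORDINAL_LEVELS.items():
            out[col] = levels.index(row[col])
        encoded.append(out)
    return encoded


# Made-up sample rows.
rows = [
    {"gender": "F", "country": "India", "rating": "very good"},
    {"gender": "M", "country": "Germany", "rating": "bad"},
]
encoded = encode_rows(rows)
```

The ordinal column stays a single feature, so a tree can split on "rating at least good", while each nominal column expands into one indicator per category.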

Plotting Deviance Residuals and Leverage of GLM Model Using H2O

lostwanderer

January 15, 2022 02:27

Is it possible to plot the deviance residuals and leverage (e.g. Cook's distance) of every observation fitted in a GLM model using H2O? From H2O's documentation, it seems that it only calculates the sum of all deviance residuals and cannot output the residual for each observation.

Topic: h2o glm python

Category: Data Science

H2O Python H2OModelSelectionEstimator

lostwanderer

January 13, 2022 02:31

I want to try H2O's Model Selection function in Python, but I cannot load the library for some reason. The following code failed: from h2o.estimators import H2OModelSelectionEstimator. Error message: cannot import name 'H2OModelSelectionEstimator' from 'h2o.estimators'. Other H2O estimators, like H2OGeneralizedLinearEstimator, work fine for me, though. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/model_selection.html

Topic: h2o glm python

Category: Data Science

Multi-class classification: unbalanced data, good testing results, poor prediction results

Swap

September 24, 2021 15:06

I have an unbalanced dataset with 11 classes, where one class is 30% and the rest are between 5% and 12%. I am not a hardcore programmer, so I am using the product from https://www.h2o.ai/. I used GBM and DRF with the option to balance the classes, and the training results are great (98-99% precision and recall) according to the confusion matrix. However, when I test on the validation set, the only class where I get decent accuracy is the class …

Topic: h2o multiclass-classification class-imbalance classification

Category: Data Science
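The question above reads precision and recall off a confusion matrix. A minimal sketch of that calculation follows; it does not use H2O, the 3-class matrix is made-up illustrative data, and the rows-are-true-classes layout is an assumption that should be checked against whatever tool produced the matrix.

```python
def per_class_metrics(confusion):
    """Return a (precision, recall) pair for each class.

    Assumes confusion[i][j] is the number of samples whose true class
    is i and whose predicted class is j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        true_positives = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                           # row sum
        precision = true_positives / predicted_k if predicted_k else 0.0
        recall = true_positives / actual_k if actual_k else 0.0
        metrics.append((precision, recall))
    return metrics


# Made-up 3-class confusion matrix, for illustration only.
confusion = [
    [8, 2, 0],
    [1, 3, 1],
    [0, 1, 4],
]
metrics = per_class_metrics(confusion)
```

Computing these per class on both the training and the validation frame makes it easier to see which classes lose accuracy outside the training data.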

Which decision tree algorithm does H2O use?

wordsforthewise

May 3, 2021 17:13

Does H2O's plain random forest use CART, C4.5, C5.0, or something else? I cannot find this information. sklearn's docs say they use a modified version of CART, and I assume H2O also uses something like CART.

Topic: h2o decision-trees

Category: Data Science

How can I prevent overfitting?

user8419142

April 10, 2021 21:48

I hope this finds you well! I am trying to build a model to classify customers by propensity to buy, but I cannot get rid of overfitting. My approach is the following: I created the training dataset with an unbalanced approach, and it now has a target-1 rate of 6%, with 6,755 rows and 252 columns in total. The test dataset, on the other hand, has 313,587 rows, and target 1 is only 34 of the cases (really low %). …

Topic: h2o overfitting classification r machine-learning

Category: Data Science

Which loss functions does h2o.gbm use by default?

user111690

February 10, 2021 23:03

The GBM implementation of the h2o package only allows the user to specify a loss function via the distribution argument, which defaults to multinomial for categorical response variables and gaussian for numerical response variables. According to the documentation, the loss functions are implied by the distributions. But I need to know which loss functions are used, and I can't find that anywhere in the documentation. I'm guessing it's the MSE for gaussian and cross-entropy for multinomial - does anybody here …

Topic: h2o loss-function gbm

Category: Data Science
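For reference, the two losses guessed at in the question above can be written out per observation. This is a sketch of the textbook definitions only; it makes no claim about what h2o.gbm computes internally, and the function names and sample values are made up for illustration.

```python
import math


def squared_error(y_true, y_pred):
    """Squared error for one numeric observation (MSE is its mean)."""
    return (y_true - y_pred) ** 2


def cross_entropy(true_class, probs):
    """Multiclass cross-entropy (log loss) for one observation.

    probs holds the predicted probability of each class;
    true_class is the index of the observed class.
    """
    return -math.log(probs[true_class])


# Made-up example values.
se = squared_error(3.0, 2.5)             # (3.0 - 2.5)^2
ce = cross_entropy(1, [0.2, 0.5, 0.3])   # -log(0.5)
```

Averaging squared_error over all rows gives the MSE, and averaging cross_entropy gives the multiclass log loss.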

h2o much faster than neuralnet (in R)

user110645

January 21, 2021 10:00

I’m a novice to machine learning. I've been trying out different neural network implementations in R, including the neuralnet package and the deeplearning function of the h2o package. For neuralnet, the default setting is one hidden layer with one hidden neuron. With this setting, the model takes several minutes to fit to my data. In the h2o package, the default is two layers with 200 neurons each, and the model takes only a few seconds. How is this possible? Are …

Topic: h2o neural-network r machine-learning

Category: Data Science

H2O deep learning model performance

user979974

April 3, 2020 18:01

I am discovering H2O deep learning and would like your point of view on the performance of my model on a classification problem. Do you think my model is overfitting? dl_fit2 <- h2o.deeplearning(x = predictors, y = response, training_frame = train, validation_frame = valid, epochs = 200, score_validation_samples=10000, score_duty_cycle=0.025, activation = "RectifierWithDropout", hidden = c(80, 10, 80), hidden_dropout_ratios = c(0.2, 0.2, 0.2), loss = "CrossEntropy", rate=0.01, rate_annealing=2e-6, adaptive_rate = FALSE, momentum_start = 0.2, momentum_ramp = 1e7, momentum_stable = …

Topic: h2o deep-learning

Category: Data Science

Modelling in python and scoring in MATLAB?

mlee_jordan

July 7, 2019 15:01

I have model objects that are either pickled Python objects or H2O POJOs. Is it possible to load those objects and do the scoring in MATLAB?

Topic: h2o matlab python

Category: Data Science
