I have two databases with around 60,000 samples each. Both have the same features (same column names), which represent particular things as text or as categories (encoded as numbers). Each sample within a database is assumed to refer to a distinct thing, but some objects are represented in both databases, albeit with somewhat different values in the same-named columns (such as different free-text descriptions, or classification under another category). The aim is to train a machine learning model …
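A common way to frame this kind of cross-database matching is record linkage: build pairwise similarity features between candidate record pairs from the two databases, then train the model on those pairs. The column names (`description`, `category`) below are hypothetical placeholders for the asker's schema, and `difflib` is just one cheap string-similarity choice; this is a sketch of the feature-construction step, not a definitive implementation.

```python
from difflib import SequenceMatcher

def pair_features(rec_a: dict, rec_b: dict) -> dict:
    """Turn one candidate record pair into similarity features.

    rec_a / rec_b use hypothetical column names; replace with the real schema.
    """
    return {
        # Fuzzy similarity of the free-text descriptions (0.0 .. 1.0).
        "desc_sim": SequenceMatcher(
            None, rec_a["description"], rec_b["description"]
        ).ratio(),
        # Exact agreement on the (numeric) category code.
        "same_category": int(rec_a["category"] == rec_b["category"]),
    }

a = {"description": "red mountain bike 26 inch", "category": 3}
b = {"description": "mountain bike, red, 26in", "category": 4}
feats = pair_features(a, b)
```

A classifier trained on such features (label: same object or not) can then score every candidate pair across the two databases.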
I'm clustering objects over many different descriptors. I chose a hierarchical clustering method (specifically the average-linkage algorithm with Euclidean distances) because I wanted to use bootstrap values to give statistical significance to my clusters. I used pvclust (in Python; it should be equivalent to the R package pvclust). The package calculates both bootstrap probability (BP) values and approximately unbiased (AU) p-values. The results are shown in this dendrogram. I don't know how to interpret the fact that AU values are relatively high while …
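For reference, the underlying tree (before any bootstrap resampling) is plain average linkage on Euclidean distances, which can be reproduced in Python with SciPy; the random data below is only illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# 20 illustrative objects described by 5 descriptors each.
X = rng.normal(size=(20, 5))

# Average linkage on Euclidean distances -- the same tree pvclust
# resamples when it computes BP and AU values.
Z = linkage(X, method="average", metric="euclidean")

# Cut the tree into (at most) 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
```

pvclust then refits this kind of tree on many bootstrap resamples of the descriptors: BP is the raw fraction of resamples in which a cluster reappears, while AU corrects that fraction via multiscale bootstrap, so the two can legitimately disagree.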
I'm having trouble generating univariate time series forecasts with Azure Automated Machine Learning (I know...). What I'm doing: I have about 5 years' worth of monthly observations in a dataframe that looks like this:

date        target_value
2015-02-01  123
2015-03-01  456
2015-04-01  789
...         ...

I want to forecast target_value based on past values of target_value, i.e. univariate forecasting, like ARIMA for instance. So I am setting up the AutoML forecast like this: # that's the dataframe as shown above …
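Whatever the AutoML configuration ends up looking like, it helps to confirm in pandas that the series really has a regular month-start frequency, since Azure AutoML tries to infer the frequency from the time column; the values below are the illustrative ones from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2015-02-01", "2015-03-01", "2015-04-01"]),
    "target_value": [123, 456, 789],
})

# Month-start ("MS") is the frequency AutoML should infer for this data.
inferred = pd.infer_freq(pd.DatetimeIndex(df["date"]))
```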
I am doing research on Google's NLP AutoML: what methodologies, techniques, models, feature selection, hyperparameter optimization, etc. they have used. I could not find any paper on how Google built their NLP AutoML. Can anyone guide me on that? How do I find Google's research in that field for academic purposes? Any paper you may have will help. Thanks.
We're training a binary classifier in AutoML, and one of the features consists of browser versions. Currently these versions are provided to the model "normalized", according to the percentile the current observation's browser version falls into. For example, if the percentiles of some specific browser's versions are:

percentile  version
p25         34
p50         45
p75         53
p99         70

then an observation with said browser and version=54 would be represented as:

p25  p50  p75  p99
1    1    1    0

My question …
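The encoding described above can be sketched as a simple thresholding step; the thresholds are the hypothetical per-browser percentiles from the example, not real data.

```python
# Hypothetical percentile thresholds for one browser's version numbers.
thresholds = {"p25": 34, "p50": 45, "p75": 53, "p99": 70}

def encode_version(version: int) -> list[int]:
    """1 for every percentile threshold the version exceeds, else 0."""
    return [int(version > t) for t in thresholds.values()]

# Version 54 is above the p25/p50/p75 thresholds but below p99.
encoded = encode_version(54)
```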
Is there any AutoML tool that can try different feature engineering approaches (encoding, feature selection based on importance, etc.)? I have been manually trying different encoding techniques for categorical variables and find it very time-consuming (each time I change the encoding, I run the model and repeat the same procedure). Is there any AutoML solution that can minimize our data preprocessing and feature engineering efforts? Of course, I understand the importance of domain inputs, but I don't think I would …
I know how to specify feature selection methods and the list of algorithms used in Auto-Sklearn 2.0:

mdl = autosklearn.classification.AutoSklearn2Classifier(
    include={
        'classifier': ["random_forest", "gaussian_nb", "libsvm_svc", "adaboost"],
        'feature_preprocessor': ["no_preprocessing"]
    },
    exclude=None)

I know that Auto-Sklearn uses Bayesian optimisation (SMAC), but I would like to specify the hyperparameters in Auto-Sklearn. For example, I want to specify random_forest with n_estimators = 1000 only, or an MLP with hidden_layer_sizes = 100 only. Any idea how to do that?
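Auto-sklearn searches over its own built-in configuration space, so pinning one hyperparameter to a single value is not what it is primarily designed for. One workaround, sketched here with plain scikit-learn rather than auto-sklearn's API, is a one-point "search" that fixes the value you care about while keeping the familiar fit/evaluate workflow:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# A one-point grid pins n_estimators to a single value (50 here to keep the
# example fast; the question's 1000 works the same way), so the "search"
# degenerates to fitting exactly that configuration.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50]},
    cv=3,
)
grid.fit(X, y)
best_n = grid.best_params_["n_estimators"]
```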
I have an input dataset with more than 100 variables, where around 80% of the variables are categorical in nature. While some variables like gender, country, etc. can be one-hot encoded, I also have a few variables which have an inherent order in their values, such as rating (very good, good, bad, etc.). Is there any AutoML approach we can use to do this encoding based on the variable type? For example, I would like to provide the below …
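Outside of a full AutoML tool, the per-variable-type encoding described above can be expressed directly with scikit-learn's ColumnTransformer; the tiny dataframe and the category order are illustrative stand-ins for the real columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],                  # nominal: no order
    "rating": ["bad", "good", "very good", "good"],  # ordinal: has order
})

# Nominal columns get one-hot vectors; ordered columns get an explicit
# user-supplied ordinal mapping (bad=0 < good=1 < very good=2).
pre = ColumnTransformer([
    ("nominal", OneHotEncoder(), ["gender"]),
    ("ordinal", OrdinalEncoder(categories=[["bad", "good", "very good"]]),
     ["rating"]),
])

encoded = pre.fit_transform(df)
if hasattr(encoded, "toarray"):  # densify if the one-hot part came back sparse
    encoded = encoded.toarray()
```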
Given a lengthy sequence of binary integers (0 or 1), I would like to be able to predict the next likely integer based on the previous sequence. Example dataset: 1 1 1 0 0 0 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 …
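One simple baseline for this kind of task is an order-k Markov (n-gram) model: count which bit most often follows each length-k context, and predict the most frequent follower of the sequence's last k bits. A minimal sketch (the short `bits` list is a prefix of the example data):

```python
from collections import Counter, defaultdict

def predict_next(seq: list[int], k: int = 3) -> int:
    """Predict the next bit as the most frequent follower of the
    last-k context (an order-k Markov / n-gram model)."""
    followers = defaultdict(Counter)
    for i in range(len(seq) - k):
        context = tuple(seq[i : i + k])
        followers[context][seq[i + k]] += 1
    context = tuple(seq[-k:])
    if context in followers:
        return followers[context].most_common(1)[0][0]
    # Unseen context: fall back to the overall majority bit.
    return Counter(seq).most_common(1)[0][0]

bits = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
nxt = predict_next(bits)
```

An LSTM or transformer can replace the counting model, but this baseline is worth beating first.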
I am trying to design an algorithm that, based on training data, automatically detects the ML problem type: regression or classification. Needless to say, it is impossible to design such an algorithm that will be correct in 100% of cases; the goal is to find a heuristic that is wrong in 10% of cases or fewer. The first obvious, naive idea would be to assign a regression model to data where at least 80% of the target values are unique. Yet …
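The naive heuristic can be written in a few lines; the 0.8 threshold and the dtype check are the assumptions under discussion, not a validated rule.

```python
import numpy as np

def guess_task_type(y, unique_ratio_threshold: float = 0.8) -> str:
    """Naive heuristic: non-numeric targets -> classification;
    numeric targets with mostly unique values -> regression."""
    y = np.asarray(y)
    if not np.issubdtype(y.dtype, np.number):
        return "classification"
    # Numeric targets where >= 80% of values are unique look like regression.
    if len(np.unique(y)) / len(y) >= unique_ratio_threshold:
        return "regression"
    return "classification"
```

Integer-coded class labels with many categories, or discretized regression targets, are exactly the cases where this rule starts to fail.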
I am exploring different AutoML libraries in Python and found EvalML from Alteryx. When I try to use this tutorial, I get an interesting result. I was trying to add recall to my metrics, but this library does not seem to have it. Printing evalml.objectives.utils.get_core_objective_names() I get ['expvariance', 'maxerror', 'medianae', 'mse', 'mae', 'r2', 'root mean squared error', 'mcc multiclass', 'log loss multiclass', 'auc weighted', 'auc macro', 'auc micro', 'precision weighted', 'precision macro', 'precision micro', 'f1 weighted', 'f1 macro', 'f1 micro', 'balanced accuracy multiclass', …
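Even if recall is missing from the library's core objective names, it can always be computed outside the library on the pipeline's predictions. A minimal sketch with scikit-learn, where the two hand-written arrays stand in for `y_test` and the output of the EvalML pipeline's `predict`:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 1]   # stand-in for y_test
y_pred = [1, 0, 0, 1, 0, 1]   # stand-in for pipeline.predict(X_test)

rec = recall_score(y_true, y_pred)  # TP / (TP + FN)
```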
This is a subjective question on using Vertex AI/AutoML in practice. I posted it on Stack Overflow and it was closed; I hope it is within scope here. I'm using Google's Vertex AI/AutoML tabular dataset models to learn a regression problem on structured data with human-engineered features. It's a score/ranking problem, and the training target values are either 0 or 1. Our constructed features are often correlated, sometimes the same data point normalized along different dimensions, e.g. number of …
I followed the instructions from this article about creating a code-free machine learning pipeline. I already had a working pipeline offline using the same data in TPOT (AutoML). I uploaded my data to AWS to try their AutoML offering. I followed the exact steps described in the article and uploaded my _train and _test CSV files, both with a column named 'target' that contains the target value. The following error message was returned as the failure reason: AlgorithmError: …
I'm using Microsoft Azure AutoML to try to generate models for time series forecasting, but I keep getting an error:

Error: Could not determine the data set time frequency. All series in the data set have one row and no freq parameter was provided. Please provide the freq (forecast frequency) parameter or review the time_series_id_column_names setting to decrease the number of time series.

My data set looks like:

   Date                 Temp
1  2019-05-07 13:51:00  25.19
2  2019-05-07 13:51:58  25.14
3  2019-05-07 13:53:00  25.14
4  2019-05-07 13:54:00  25.14
5  2019-05-07 13:55:00  25.1
…
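The timestamps above are roughly, but not exactly, one minute apart (13:51:58 breaks the pattern), which is typically why frequency inference fails. One possible fix, sketched in pandas before handing the data to AutoML, is to resample onto an explicit regular grid (gaps become NaN and can then be interpolated):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-05-07 13:51:00", "2019-05-07 13:51:58",
        "2019-05-07 13:53:00", "2019-05-07 13:54:00",
        "2019-05-07 13:55:00",
    ]),
    "Temp": [25.19, 25.14, 25.14, 25.14, 25.1],
})

# Force a clean 1-minute grid so a frequency can be inferred (or passed
# explicitly as the freq parameter). The 13:52 bin has no data -> NaN.
regular = df.set_index("Date").resample("1min").mean()
```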
I'm following this tutorial to try Azure Machine Learning AutoML Forecasting. Among the several parameters we can submit to the AutoML experiment, we have these:

target_lags;
target_rolling_window_size;

Can you explain with an example how the several forecasting algorithms work when these two parameters are set? Thank you.

automl_advanced_settings = {
    'time_column_name': time_column_name,
    'max_horizon': max_horizon,
    'target_lags': 12,
    'target_rolling_window_size': 4,
}

automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             experiment_timeout_hours=0.3,
                             training_data=train,
                             label_column_name=target_column_name,
                             compute_target=compute_target,
                             enable_early_stopping=True,
                             n_cross_validations=3,
                             verbosity=logging.INFO,
                             **automl_advanced_settings)
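Conceptually, target_lags and target_rolling_window_size make AutoML derive extra regressors from the target's own history. The pandas sketch below is illustrative (not Azure's internal code): it shows what a 12-step lag and a 4-step rolling mean of the target look like as features.

```python
import pandas as pd

y = pd.Series(range(1, 25), name="target")  # 24 illustrative monthly values

features = pd.DataFrame({
    "target": y,
    # target_lags=12: the target value observed 12 periods earlier.
    "lag_12": y.shift(12),
    # target_rolling_window_size=4: a statistic over the last 4 known
    # values, shifted by 1 so the current value never leaks into its
    # own feature.
    "roll_mean_4": y.shift(1).rolling(4).mean(),
})
```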
I was recently introduced to an AutoML library based on genetic programming called TPOT, thanks to @Noah Weber. I have a few questions. 1) When we have AutoML, why do people still usually spend time on feature selection, preprocessing, etc.? I mean, those steps do at least reduce the search space/feature space. 2) I mean, at the least, these tools reduce our work to some extent, and we can start from the output of the AutoML solution and tune further if required. We don't …
I built an NLP sentence classifier which uses word-embedding vectors as features. The training dataset is big (100k sentences), and every sentence has 930 features. I found the best model using an AutoML library (auto-sklearn); the training required 40 GB of RAM and 60 hours. The best model is an ensemble of the top N models found by the library. Occasionally, I need to add some data to the training set and update the training. Since this AutoML …
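One common pattern for this situation (a sketch of the general idea, not necessarily what auto-sklearn exposes for every setup) is to keep the pipeline configuration the expensive search found and simply refit it on the enlarged dataset, skipping the search entirely. With plain scikit-learn objects, standing in for the discovered best pipeline, that looks like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
X_new, y_new = rng.normal(size=(20, 10)), rng.integers(0, 2, 20)

# Stand-in for the best pipeline found by the (expensive) AutoML search.
best_pipeline = make_pipeline(StandardScaler(), LogisticRegression())
best_pipeline.fit(X_old, y_old)

# Cheap update: refit the *same* configuration on old + new data,
# instead of re-running the whole 60-hour search.
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])
best_pipeline.fit(X_all, y_all)
preds = best_pipeline.predict(X_all)
```

auto-sklearn also documents a refit method for retraining the found models on new data, which follows the same idea.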
I have a text classifier model built on AutoML Natural Language. It currently does a great job classifying text into the set of labels it was trained on (one of those labels is "Uncategorized"). Now I'd like the model to start classifying some of the "Uncategorized" text into additional new labels. I have new data to train the model on the new labels. How do I go about this, given that I don't want …