Estimating class prevalence in unlabelled data after predicting labels with a binary classifier

I'm looking to estimate the prevalence of 1's (i.e. the rate of positive labels) in a very large dataset that I have. However, I would like to report this percentage as a 95% credible interval rather than as a point estimate, taking the model's uncertainties into account. These are the steps I'm hoping to perform: Train a binary classifier on labelled training data. Use a labelled test set to estimate the specificity and sensitivity of …
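One way to carry the test-set uncertainty through to the prevalence estimate is a Rogan-Gladen style correction with Beta posteriors for sensitivity, specificity, and the observed positive rate. A minimal sketch with made-up counts (none of these numbers come from the question):

import numpy as np

rng = np.random.default_rng(0)

# Test-set confusion counts (placeholders)
tp, fn = 180, 20          # positives: correctly / incorrectly classified
tn, fp = 450, 50          # negatives: correctly / incorrectly classified

# Observed positive-prediction rate on the large unlabelled set (placeholder)
n_pred_pos, n_unlabelled = 30_000, 100_000

n_draws = 100_000
# Beta posteriors (uniform priors) for sensitivity, specificity, and the raw rate
sens = rng.beta(tp + 1, fn + 1, n_draws)
spec = rng.beta(tn + 1, fp + 1, n_draws)
p_obs = rng.beta(n_pred_pos + 1, n_unlabelled - n_pred_pos + 1, n_draws)

# Rogan-Gladen correction: true prevalence = (observed + spec - 1) / (sens + spec - 1)
prev = np.clip((p_obs + spec - 1) / (sens + spec - 1), 0, 1)

lo, hi = np.percentile(prev, [2.5, 97.5])
print(f"95% credible interval for prevalence: [{lo:.3f}, {hi:.3f}]")

The percentile interval of the simulated draws then reflects both the sampling noise on the test set and the size of the unlabelled set.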
Category: Data Science

Model Undetermined Number of Labels

I'm looking for tutorials on how to build a TensorFlow model that generates predictions from an input, for example generating sentences from a paragraph, where the loss is computed by comparing against ground-truth labels. Or generating a number of predictions for objects found in an image. The main idea is having an undetermined number of predictions or labels.
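For the variable-length text case, one common workaround (sketched below under my own assumptions about shapes and vocabulary size) is to pad the target sequences to a fixed maximum and mask the padded positions out of the loss:

import tensorflow as tf

VOCAB, MAX_LEN, PAD_ID = 5000, 32, 0

def masked_loss(y_true, y_pred):
    # y_true: (batch, MAX_LEN) integer ids, y_pred: (batch, MAX_LEN, VOCAB) logits
    loss = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, y_pred, from_logits=True)
    mask = tf.cast(tf.not_equal(y_true, PAD_ID), loss.dtype)
    # Average only over the real (non-padding) positions
    return tf.reduce_sum(loss * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32)
x = tf.keras.layers.Embedding(VOCAB, 128)(inputs)
x = tf.keras.layers.LSTM(128, return_sequences=True)(x)
outputs = tf.keras.layers.Dense(VOCAB)(x)   # one prediction per position

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss=masked_loss)

Object detection models handle the same issue differently (a fixed budget of candidate boxes matched to a variable number of ground-truth objects), but the padding-plus-masking idea is the common thread.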
Category: Data Science

Multiple activation functions with TensorFlow estimator DNNClassifier

I just want to know whether it is possible to use tf.estimator.DNNClassifier with multiple different activation functions. I mean, could I use a DNNClassifier estimator that uses different activation functions for different layers? For example, if I have a three-layer model, could I use a sigmoid function for the first layer, a ReLU function for the second one, and a tanh function for the last one? I would like to know if it isn't possible to do it …
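As far as I can tell, DNNClassifier exposes only a single activation_fn that is shared by all hidden layers, so mixing activations usually means dropping down to tf.keras instead. A minimal sketch (the layer sizes here are my own placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="sigmoid"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])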
Category: Data Science

How to preprocess an ordered categorical variable to feed a machine learning algorithm?

I have a categorical variable that measures the income of a family: A: no income; B: up to $500; C: $500-$700; …; P: $5000-$6000; Q: more than $6000. It seems odd to me that I have to create dummies for this variable, since it's ordered. I wonder if it's better to map the values: {'A': 0, 'B': 1, …, 'Q': 17} so I can feed these values to the algorithm as integers. What's the proper way of preprocessing …
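For an ordered variable like this, an explicit mapping or sklearn's OrdinalEncoder with the categories listed in order is the usual approach. A small sketch (the rows are made up and the letter-to-bracket map is abbreviated):

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"income": ["A", "C", "B", "Q", "P"]})

# Option 1: an explicit map, which makes the intended order unambiguous
order = {"A": 0, "B": 1, "C": 2, "P": 15, "Q": 16}   # fill in the full A..Q map
df["income_ord"] = df["income"].map(order)

# Option 2: OrdinalEncoder with the categories listed in their natural order
enc = OrdinalEncoder(categories=[["A", "B", "C", "P", "Q"]])
df["income_ord2"] = enc.fit_transform(df[["income"]]).ravel()
print(df)

Tree-based models usually do fine with such integer codes; for linear models, keep in mind the encoding assumes equal spacing between adjacent brackets.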
Category: Data Science

Prediction issue with xgboost custom loss

I have an issue with xgboost custom objectives: I cannot manage to get consistent forecasts. In other words, the scale of my forecasts is not in line with the values I would like to predict. I have tried many custom losses, but I always get the same issue.

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_regression

n_samples_train = 500
n_samples_test = 100
n_features = 200
X, y = make_regression(n_samples_train, n_features, noise=10)
X_test, y_test …
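A frequent culprit in this situation is that with a custom objective xgboost returns raw, untransformed scores starting from base_score (which defaults to 0.5), so on a regression target whose mean is far from 0.5 the forecasts can look off-scale unless base_score is set sensibly or many more rounds are used. A minimal sketch of a custom squared-error objective under my own assumptions, not the asker's exact code:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=200, noise=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels          # gradient of 0.5 * (pred - label)^2
    hess = np.ones_like(preds)     # second derivative
    return grad, hess

params = {"max_depth": 4, "eta": 0.1, "base_score": float(np.mean(y))}
booster = xgb.train(params, dtrain, num_boost_round=200, obj=squared_error_obj)
preds = booster.predict(dtrain)    # raw scores, now roughly on the target's scale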
Category: Data Science

What enables transformers or very deep models "plan" ahead for sequential decision making?

I was watching this amazing lecture by Oriol Vinyals. On one slide, there is a question asking whether very deep models plan. Transformer models, or models employed in applications like dialogue generation, do not have an explicit planning component, yet they behave as if they had already planned the dialogue. Dr. Vinyals mentioned that there are papers on "how transformers are building up knowledge to answer questions or do all sorts of very interesting analyses". Can anyone please point me to a few …
Category: Data Science

How do you do 1-vs-rest classifiers in XGBoost Library (Not Sklearn)?

I am working with a very large dataset that would benefit from using training continuation with the xgb_model parameter in xgb.train(). The label (Y) of the dataset has 4 classes and is highly imbalanced, so I would like to generate per-label PR curves to evaluate its performance, and would thus need to treat each class as its own binary problem using a one-vs-rest classifier. After a lot of reading I haven't found an equivalent to sklearn's OneVsRestClassifier in …
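A rough sketch of one way to hand-roll one-vs-rest with the native API: one binary booster per class, each of which can be continued later through xgb_model. The class count, the parameters, and the helper names train_ovr/predict_ovr are my own, not from the question:

import numpy as np
import xgboost as xgb

def train_ovr(X, y, num_class=4, prev_boosters=None, num_boost_round=100):
    boosters = []
    for k in range(num_class):
        d_k = xgb.DMatrix(X, label=(y == k).astype(int))
        params = {"objective": "binary:logistic", "eval_metric": "aucpr"}
        warm_start = None if prev_boosters is None else prev_boosters[k]
        boosters.append(
            xgb.train(params, d_k, num_boost_round=num_boost_round,
                      xgb_model=warm_start))
    return boosters

def predict_ovr(boosters, X):
    d = xgb.DMatrix(X)
    # One probability column per class; rows need not sum to 1
    return np.column_stack([b.predict(d) for b in boosters])

The per-class probability columns can then be fed directly to precision_recall_curve for the per-label PR curves.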
Category: Data Science

Neural network / machine learning approach to model specific sequencing-classification problem in industry

I am working on a project that involves developing a machine learning/deep learning model for an application in a roll-to-roll industry. For a long time I have been looking for similar problems as a way to get some guidance, but I was never able to find anything related. Basically, the problem can be seen as follows: an industrial machine is producing a roll of some material, which tends to have visible defects throughout the roll. I have already available a machine …
Category: Data Science

Random Forest Classifier Output

I used a RandomForestClassifier for my prediction model, but the printed output is either 0 or in decimals. What do I need to do for my model to show me 0's and 1's instead of decimals? Note: I used feature importance and removed the least important columns; still, the accuracy is the same and the output hasn't changed much. Also, I have my estimators equal to 1000. Do I increase or decrease this? Edit: target col: 1 0 0 1; output col: 0.994 …
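Decimals like 0.994 usually come from predict_proba() (or from accidentally using a regressor); predict() on a RandomForestClassifier returns hard 0/1 labels. A quick sketch with stand-in data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(n_estimators=1000, random_state=0).fit(X, y)

print(clf.predict(X[:5]))           # hard labels: an array of 0s and 1s
print(clf.predict_proba(X[:5]))     # probabilities: decimals, one column per class
print((clf.predict_proba(X[:5])[:, 1] >= 0.5).astype(int))  # thresholded by hand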
Category: Data Science

Hyper-parameter tuning of Naive Bayes Classifier

I'm fairly new to machine learning. I'm aware of the concept of hyper-parameter tuning of classifiers, and I've come across a couple of examples of this technique. However, I'm trying to use sklearn's Naive Bayes classifier for a task and I'm not sure about the parameter values that I should try. What I want is something like this, but for a GaussianNB() classifier and not an SVM:

from sklearn.model_selection import GridSearchCV
C = [0.05, 0.1, 0.2, 0.3, 0.25, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
gamma = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
kernel = ['rbf', 'linear']
hyper = {'kernel': kernel, 'C': C, 'gamma': gamma}
gd = GridSearchCV(estimator=svm.SVC(), param_grid=hyper, verbose=True)
gd.fit(X, Y)
print(gd.best_score_)
print(gd.best_estimator_) …
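GaussianNB has far fewer knobs than an SVM; var_smoothing is the one usually searched. A minimal sketch with stand-in data and a commonly used log-spaced grid (the grid values are my own guess, not a recommendation from the question):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

X, Y = make_classification(n_samples=300, random_state=0)   # stand-in data

param_grid = {"var_smoothing": np.logspace(-12, 0, 13)}
gd = GridSearchCV(estimator=GaussianNB(), param_grid=param_grid,
                  cv=5, verbose=True)
gd.fit(X, Y)
print(gd.best_score_)
print(gd.best_estimator_)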
Category: Data Science

Class token in ViT and BERT

I'm trying to understand the architecture of the ViT paper, and I noticed they use a CLASS token like in BERT. To the best of my understanding, this token is used to gather knowledge of the entire class, and is then solely used to predict the class of the image. My question is: why does this token exist as input in all the transformer blocks, and why is it treated the same as the word/patch tokens? Treating the class token …
Category: Data Science

Intuitively, why do Non-monotonic Activations Work?

The swish/SiLU activation is very popular, and many would argue it has dethroned ReLU. However, it is non-monotonic, which seems to go against popular intuition (at least on this site: example 1, example 2). Reading the swish paper, the justification that the authors give is that non-monotonicity "increases expressivity and improves gradient flow... [and] may also provide some robustness to different initializations and learning rates." The authors provide an image to back up this claim, but at best this argument …
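For what it's worth, here is a tiny numerical sketch (my own, not from the paper) of the dip that makes swish non-monotonic:

import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))   # x * sigmoid(x)

x = np.linspace(-6, 6, 10_001)
y = swish(x)
i = np.argmin(y)
print(f"minimum ~{y[i]:.4f} at x ~{x[i]:.3f}")    # a dip around x ~ -1.28
print("monotonic?", bool(np.all(np.diff(y) >= 0)))  # False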
Category: Data Science

Combine multiple duplicate categorical variables into a single one for multiple linear regression

I am trying to create a regression model that predicts the box-office success of a movie, with one of the explanatory variables being the actors who appear in the film. My problem is that I decided to use the first 4 billed actors, but in the model this is treated as 4 separate variables (Actor 1, Actor 2, Actor 3, Actor 4). For example, Jack Nicholson is the lead in "As Good as It Gets", so he would …
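One common workaround, sketched below with made-up rows, is to treat the four billing slots as a single set-valued "cast" feature and binarize it, so each actor gets one indicator column regardless of billing position:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    "actor_1": ["Jack Nicholson", "Helen Hunt"],
    "actor_2": ["Helen Hunt", "Jack Nicholson"],
    "actor_3": ["Greg Kinnear", "Cuba Gooding Jr."],
    "actor_4": ["Cuba Gooding Jr.", "Greg Kinnear"],
})

cast_sets = df[["actor_1", "actor_2", "actor_3", "actor_4"]].values.tolist()
mlb = MultiLabelBinarizer()
cast_dummies = pd.DataFrame(mlb.fit_transform(cast_sets),
                            columns=mlb.classes_, index=df.index)
print(cast_dummies)   # one 0/1 column per actor, shared across all four slots

The indicator columns can then be joined back onto the other regressors, and the model no longer cares which billing slot an actor happened to occupy.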
Category: Data Science

R-Squared for real valued label under non linear regression learner

Below are my questions about R-squared for a real-valued label under a non-linear regression learner. This may be a broad problem; if there is no easy answer, could you give me some references? Firstly, for a real-valued label, apart from R-squared, is there any good metric to evaluate the quality of a fit? I know that a small MSE, MAE, etc. usually means a good fit. However, they may not be as intuitive as a ratio like R-squared (how small is …
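For reference, a short sketch with stand-in data showing that R-squared is just the residual sum of squares normalised by the label's variance, which is what makes it read as a unit-free ratio rather than an absolute error:

import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)   # some imperfect fit

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)          # manual R^2
print(r2_score(y_true, y_pred))     # the same value from sklearn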
Category: Data Science

Using survival analysis models with uncensored data for time-to-event prediction

Are there any advantages to using survival analysis models like Cox's proportional hazards model with uncensored data over simple linear regression or other classic ML models? I have data with recurrent events and I am trying to predict the time of the next event. The data contain about 2000 different subjects and about 60 events per subject. The percentage of censored data (the last event of each subject) is small, and I don't think it plays a big role in the prediction.
Category: Data Science

Which supervised ML model to use for exam/grade prediction?

So I plan on making a mobile app that will let students predict their final grades based on their mock exam results. I can train my model with previous years' results. X: 5 mock results; Y: final grade obtained. However, I have the issue that sometimes, or most of the time, the user may be using the app while not having taken ALL the mock exams yet; they may want to see if they are on track and use it once …
Category: Data Science

How to interpret a specific feature importance?

Apologies for a very case specific question. I have a dataset of genes, with which I am using machine learning to predict if a gene causes a disease. One of the features I have is a beta value (which is the effect size of the gene's impact on the disease), and I'm not sure how best to interpret and use this feature. I condense the beta values from the variant level to the gene level, so a gene is left …
Category: Data Science

Merging two datasets with different features for machine learning prediction

I'm trying to create a model that predicts real estate prices with xgboost, and my question is: can I combine two datasets to do it? First dataset: 13 features. Second dataset: 100 features. The difference between the two datasets is that the first dataset contains real estate transactions from 2018 to 2021 with features like area and region, and the second also contains transactions, but from 2011 to 2016 and with more features like …
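A sketch of the two usual options (the column names and rows below are invented): either keep only the columns shared by both datasets, or take the union of columns and let xgboost treat the features missing from one period as NaN:

import pandas as pd

old = pd.DataFrame({"area": [120, 80], "region": ["A", "B"],
                    "rooms": [4, 2], "price": [300_000, 150_000]})   # 2011-2016, richer features
new = pd.DataFrame({"area": [95, 60], "region": ["A", "C"],
                    "price": [280_000, 120_000]})                    # 2018-2021, fewer features

shared = ["area", "region", "price"]   # columns present in both datasets

# Option 1: intersect the columns, so both periods are described the same way
combined_shared = pd.concat([old[shared], new[shared]], ignore_index=True)

# Option 2: union of columns; the missing features become NaN, which xgboost
# handles natively, though the model may simply learn the era from the NaN pattern
combined_all = pd.concat([old, new], ignore_index=True, sort=False)

Either way, adding the transaction year as a feature is worth considering, since prices from 2011-2016 and 2018-2021 sit in different market conditions.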
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.