Very low probability in naive Bayes classifier

I have some training data (TRAIN) and some test data (TEST). Each row of each table contains an observed class (X) and several binary columns (Y). I'm using a Python script that is intended to predict the probability (Pr) of X given Y in the test data, based on the training data, using a Bernoulli naive Bayes classifier. Here is my script: https://stackoverflow.com/questions/55187516/look-up-bernoullinb-probability-in-dataframe It works on the dummy data that is included with the script. On the real …
Category: Data Science
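
As a point of reference, here is a minimal sketch of this kind of lookup with scikit-learn's BernoulliNB, using small made-up frames in place of TRAIN and TEST (the column names Y1-Y3 are illustrative):

# Minimal sketch with dummy data: binary features Y1-Y3 and an observed class X.
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

train = pd.DataFrame({'X':  [0, 0, 1, 1],
                      'Y1': [0, 1, 1, 1],
                      'Y2': [1, 0, 0, 1],
                      'Y3': [0, 0, 1, 0]})
test = pd.DataFrame({'Y1': [1, 0], 'Y2': [0, 1], 'Y3': [1, 0]})

feature_cols = ['Y1', 'Y2', 'Y3']
clf = BernoulliNB()
clf.fit(train[feature_cols], train['X'])

# One probability column per class, ordered as in clf.classes_
proba = pd.DataFrame(clf.predict_proba(test[feature_cols]), columns=clf.classes_)
print(proba)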

XGBClassifier's predictions are not probabilities with objective='binary:logistic'

I am using XGBoost's XGBClassifier, a binary 0-1 target, and I am trying to define a custom metric function. It supposedly receives an array of predictions and a DMatrix with the training set, according to the XGBoost Tutorials. I have used objective='binary:logistic' in order to get probabilities, but the prediction values passed to the custom metric function are not between 0 and 1. They can be, for example, between -3 and 5, and the range of values seems to grow …
Category: Data Science
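
A common explanation is that the values handed to a custom metric are raw margin scores rather than transformed probabilities, so one workaround (sketched below; my_logloss and the toy data are illustrative, not from the question) is to apply the logistic transform inside the metric before scoring:

# Sketch: map raw margin scores back to [0, 1] inside the custom metric.
import numpy as np
import xgboost as xgb
from scipy.special import expit  # numerically stable sigmoid

def my_logloss(preds, dtrain):
    y = dtrain.get_label()
    p = np.clip(expit(preds), 1e-15, 1 - 1e-15)  # margins -> probabilities
    return 'my_logloss', float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Quick check on toy data; `raw_margins` stands in for what the metric receives.
X = np.random.rand(100, 3)
y = (np.random.rand(100) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)
raw_margins = np.random.randn(100)
print(my_logloss(raw_margins, dtrain))

Depending on the XGBoost version, the custom metric is passed via the feval or custom_metric argument, and whether it sees margins or probabilities can differ, so it is worth checking the range of preds first.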

Using softmax for multilabel classification (as per Facebook paper)

I came across this paper by some Facebook researchers where they found that using a softmax and CE loss function during training led to improved results over sigmoid + BCE. They do this by changing the one-hot label vector such that each '1' is divided by the number of labels for the given image (e.g. from [0, 1, 1, 0] to [0, 0.5, 0.5, 0]). However, they do not mention how this could then be used in the inference stage, …
Category: Data Science
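
A small illustration (not the paper's code) of the label transformation and of a softmax cross-entropy evaluated against such soft targets, in plain NumPy:

# Turn a multi-hot label vector into a soft target that sums to 1, then compute
# softmax cross-entropy against model logits (illustrative values only).
import numpy as np

def soft_targets(multi_hot):
    multi_hot = np.asarray(multi_hot, dtype=float)
    return multi_hot / multi_hot.sum()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_cross_entropy(logits, targets):
    return -np.sum(targets * np.log(softmax(logits) + 1e-12))

labels = [0, 1, 1, 0]                      # two positive labels
targets = soft_targets(labels)             # -> [0., 0.5, 0.5, 0.]
logits = np.array([-1.0, 2.0, 1.5, -0.5])  # hypothetical model outputs
print(targets, soft_cross_entropy(logits, targets))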

Log odds vs Log probability

The log-odds has a linear relationship with the independent variables, which is why the log-odds can be written as a linear equation in them. What about the log of the probability? How is it related to the independent variables? Is there a way to check the relationship?
Category: Data Science
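
One way to see the difference: if the log-odds is linear in a predictor $x$, then the log-probability is that same linear term minus a softplus correction, so it is not linear in general:

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x \;\Rightarrow\; p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad \log p = (\beta_0 + \beta_1 x) - \log\left(1 + e^{\beta_0 + \beta_1 x}\right).$$

For very negative $\beta_0 + \beta_1 x$ the softplus term vanishes and $\log p$ is approximately linear; elsewhere it is curved, which can be checked empirically by plotting $\log \hat{p}$ against $x$.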

Sales forecasting for a store's unseen items

I am working on a sales forecasting problem. I am able to provide the algorithm with data about which items were sold and which were not. How can I provide the algorithm with information about items that are not present in the store? Is there any way to encode this information in the data, or are there other algorithms that accept this kind of information? Currently, I am using neural networks and random forests to forecast sales.
Category: Data Science
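
One simple encoding (a sketch with illustrative column names, not a prescription): give every store-item-period row an explicit availability flag, so the model can distinguish "not stocked" from "stocked but sold zero":

# Rows with in_store == 0 carry the "item not present" information explicitly;
# neural networks and random forests can then use the flag as an ordinary feature.
import pandas as pd

sales = pd.DataFrame({
    'store':      [1, 1, 1],
    'item':       ['A', 'B', 'C'],
    'in_store':   [1, 1, 0],      # item C was not present in the store
    'units_sold': [5, 0, 0],
})
print(sales)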

Analysis of the probability distribution of each feature, and machine learning

While I know that probability distributions are used for hypothesis testing, confidence interval construction, etc., and they certainly have many roles in statistical analysis, it is not obvious to me how probability distributions come in handy for machine learning problems. ML algorithms are expected to pick up distributions from the dataset automatically. I wonder whether probability distributions have any place in solving ML problems better? Put shortly, how could statistical techniques related to probability distributions benefit …
Category: Data Science

Compare cross validation values of Bernoulli NB and Multinomial NB

I'm testing Multinomial NB and Bernoulli NB on my dataset, and I'm using the cross-validation score to better understand which of the two algorithms works better. This is the first classifier:

from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

clf_multinomial = MultinomialNB()
clf_multinomial.fit(X_train, y_train)
y_predicted = clf_multinomial.predict(X_test)
score = clf_multinomial.score(X_test, y_test)
scores = cross_val_score(clf_multinomial, X_train, y_train, cv=5)
print(scores)
print(score)

And these are the scores:

[0.75 0.875 0.66666667 0.95833333 0.86956522]
0.8637666498061035

This is the second classifier:

from sklearn.naive_bayes import BernoulliNB

clf_multivariate = BernoulliNB() …
Category: Data Science
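
To make the two sets of scores directly comparable, one reasonable approach (a sketch with generated data standing in for the question's X_train and y_train) is to evaluate both classifiers on identical folds:

# Score both classifiers on the same folds so the comparison is fair.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

X_train, y_train = make_classification(n_samples=200, n_features=10,
                                        n_informative=5, random_state=0)
X_train = np.abs(X_train)  # MultinomialNB requires non-negative features

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for clf in (MultinomialNB(), BernoulliNB()):
    scores = cross_val_score(clf, X_train, y_train, cv=cv)
    print(type(clf).__name__, scores.round(3), scores.mean().round(3))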

How to understand the Binomial Theorem and the Recursion Rule?

In this video from edX, the instructor explains the binomial theorem as: Binomial Theorem: when you calculate $(a + b)^n = a^n + C_1 a^{n-1} b + C_2 a^{n-2} b^2 + \dots + C_{n-1} a b^{n-1} + b^n$, the coefficients are $C_1, C_2, \dots, C_{n-1}$. The coefficients of $(a + b)^4$ are $1, 4, 6, 4, 1$. From the formula above, you get after some manipulation: $(a + b)^n = a^n + \binom{n}{1} a^{n-1} b^1 + \binom{n}{2} a^{n-2} b^2 + \dots +$ (n …
Category: Data Science
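
The recursion rule referred to in the title is presumably Pascal's rule, $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$; a tiny sketch that computes the coefficients this way:

# Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k), with C(n, 0) = C(n, n) = 1.
def binom(n, k):
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)

# Coefficients of (a + b)^4 -> [1, 4, 6, 4, 1]
print([binom(4, k) for k in range(5)])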

Text similarity for badly written text

Consider the following scenario: suppose two lists of words $L_{1}$ and $L_{2}$ are given. $L_{1}$ contains only badly written phrases (like '4ge' instead of 'age' or 'blwe' instead of 'blue', etc.). On the other hand, each element of $L_{2}$ is a well-written version of an element of $L_{1}$. Here is an example: $$L_{1}=[...,dqta \ 5ciencc,...,s7ack \ exch9nge,...],$$ $$L_{2}=[...,stack \ exchange,...,data \ science,...].$$ Problem: is there any strategy to try to predict which element $w^{\prime}$ in $L_{2}$ is the syntactically correct counterpart …
Category: Data Science
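
A simple baseline (a sketch, not a full answer) is fuzzy string matching: rank every candidate in $L_{2}$ by its character-level similarity to the noisy phrase and pick the best one. Python's standard difflib is enough to illustrate:

# Match each noisy phrase to its closest well-written candidate using
# difflib's similarity ratio (standard library only).
from difflib import SequenceMatcher

L1 = ['dqta 5ciencc', 's7ack exch9nge']
L2 = ['stack exchange', 'data science']

def best_match(noisy, candidates):
    return max(candidates, key=lambda c: SequenceMatcher(None, noisy, c).ratio())

for w in L1:
    print(w, '->', best_match(w, L2))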

Machine Learning for conditional density estimation

Suppose I have a set of examples $X = (x_1,x_2,..,x_n)$ with continuous numeric targets $Y = (y_1,y_2,..,y_n)$. While it is standard to use regression models to make point predictions of $y_i$ as $f(x_{i}) = \hat{y}_i$, I am interested in predicting a density function for $y_{i}$. What I want is analogous to the use of probabilities in classification instead of hard predictions (e.g. predict vs predict_proba in Scikit-learn), but for continuous regression problems. Specifically, a different density function (e.g. in the …
Category: Data Science
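
One concrete way to get a predictive density rather than a point estimate (a sketch under a Gaussian assumption; quantile regression or mixture density networks are alternatives) is a model that returns a mean and a standard deviation per input, such as scikit-learn's GaussianProcessRegressor:

# Predict a Gaussian density (mean + standard deviation) for each test point.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)

mean, std = gpr.predict(np.array([[2.5], [7.0]]), return_std=True)
print(mean, std)   # parameters of a Normal density for each test point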

Odds vs Likelihood

Odds is the chance of an event occurring against the event not occurring. Likelihood is the probability of a set of parameters being supported by the data in hand. In logistic regression, we use log odds to convert a probability-based model to a likelihood-based model. In what way are odds & likelihood related? And can we call odds a type of conditional probability?
Category: Data Science
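
As a compact reference for how the two notions enter logistic regression: the odds describe a single event's probability, while the likelihood is a function of the parameters given all observed outcomes, and the model ties the two together by making the log-odds linear in the predictors:

$$\text{odds}(p) = \frac{p}{1-p}, \qquad \log\frac{p_i}{1-p_i} = x_i^\top\beta, \qquad L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1-p_i)^{1-y_i}.$$

Note that odds are a transformation of a probability rather than a conditional probability themselves (they can exceed 1), whereas the likelihood is not a probability of the parameters at all but a function of them.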

Modeling the influence of events order on probability

The case is to model whether the sequence of events influences the probability of a binary target variable. We have, for example, five different events which occur in time (events A, B, C, D, E). They can occur in order from 1 to 5. I would like to check whether the order of their occurrence influences the target variable. My first idea was to convert the time of occurrence into numbers from 1 to 5 and then, for example, use logistic regression. Do you know …
Category: Data Science
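
A minimal sketch of that first idea (illustrative data; one column per event holding its position 1-5 in the sequence, fed to a logistic regression):

# Encode the position (1-5) at which each event occurred as features and fit a
# logistic regression on the binary target.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'order_A': [1, 2, 3, 1], 'order_B': [2, 1, 5, 4],
    'order_C': [3, 4, 1, 2], 'order_D': [4, 3, 2, 5],
    'order_E': [5, 5, 4, 3],
    'target':  [1, 0, 1, 0],
})

model = LogisticRegression().fit(df.drop(columns='target'), df['target'])
print(dict(zip(df.columns[:-1], model.coef_[0].round(3))))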

Is there any way to artificially create a probability calibration for data coming from another model?

I have predictions that come from a survival model. This model gives me very low probabilities, and I am not sure whether they reflect the real probability of the phenomenon. For example, I calculate $P\left( T\leq t+d \middle| T>t \right)$ and the probabilities are very low (with $d=180$). To summarize, I need these probabilities to be, on average, another number (let's say $0.2$). Is it possible to create an artificial calibration with only this number (the desired average) as the …
Category: Data Science
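
One purely mechanical option (a heuristic sketch, not a statistically justified recalibration) is to shift every prediction by a constant in log-odds space, choosing the shift so the average probability equals the desired value:

# Shift all predictions by a constant c in log-odds space so the mean hits 0.2.
# This only re-centres the scores; it does not make them well calibrated.
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit

p = np.array([0.001, 0.004, 0.0002, 0.01, 0.03])   # illustrative model outputs
target_mean = 0.2

c = brentq(lambda c: expit(logit(p) + c).mean() - target_mean, -20, 20)
p_shifted = expit(logit(p) + c)
print(c, p_shifted.mean())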

Best metric to evaluate model probabilities

I'm trying to create an ML model for a binary classification problem with a balanced dataset, and I care mostly about probabilities. Searching the web, I only find advice to use AUC or log-loss scores; there is no advice to use the Brier score as an evaluation metric. Can I use the Brier score as an evaluation metric, or are there some pitfalls with it? As I understand it, if I use the log-loss score as the evaluation metric, the "winner" model will …
Category: Data Science
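
Both log loss and the Brier score are proper scoring rules, so nothing prevents reporting them side by side; a small sketch with made-up predictions:

# Compare log loss and Brier score for two probability forecasts on the same labels.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

y_true = np.array([0, 0, 1, 1, 1, 0])
p_model_a = np.array([0.1, 0.3, 0.8, 0.7, 0.9, 0.2])
p_model_b = np.array([0.4, 0.4, 0.6, 0.6, 0.7, 0.4])

for name, p in [('A', p_model_a), ('B', p_model_b)]:
    print(name, 'log loss:', round(log_loss(y_true, p), 4),
          'Brier:', round(brier_score_loss(y_true, p), 4))

The main practical difference is that log loss punishes confident mistakes much more heavily (it is unbounded as a predicted probability approaches the wrong extreme), while the Brier score is bounded between 0 and 1.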

Incorrect example of applying Bayes theorem

I have been reading the book "The Data Science Design Manual" (by Steven S. Skiena), and I came across an example of how Bayes' theorem can be applied that confused me and made me suspect it might be wrong. The example is the following: $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$ Suppose A is the event that person x is actually a terrorist, and B is the result of a feature-based classifier that decides if x looks like a terrorist. …
Category: Data Science
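
The usual resolution involves the base rate. With illustrative numbers (not the book's), say $P(A) = 10^{-6}$, $P(B\mid A) = 0.99$, and $P(B\mid \neg A) = 0.01$; then

$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid \neg A)\,P(\neg A)} = \frac{0.99 \times 10^{-6}}{0.99 \times 10^{-6} + 0.01\,(1 - 10^{-6})} \approx 10^{-4},$$

so even a very accurate classifier yields a tiny posterior when the prior $P(A)$ is tiny.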

What algorithms can handle probabilistic targets?

I have a classification problem where I want to use probabilities instead of classes to train my model, so that it learns to output probabilities. In my dataset, I have instances where the probabilities of two classes are almost equal, and I would like the model to be able to learn these subtleties instead of me just providing the class for each instance. Is there an ML model that can handle this? Thanks!
Category: Data Science
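
One generic workaround that works with most scikit-learn classifiers (a sketch, and only one of several options; neural networks trained with cross-entropy also accept soft targets directly) is to expand each instance into one row per class, weighted by that class's probability:

# Train on soft labels by duplicating each row once per class and weighting it
# by that class's probability (any estimator accepting sample_weight works).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2], [0.8], [0.5], [0.9]])
p_class1 = np.array([0.1, 0.9, 0.55, 0.95])   # soft targets for class 1

X_expanded = np.vstack([X, X])
y_expanded = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
w_expanded = np.concatenate([p_class1, 1 - p_class1])

clf = LogisticRegression().fit(X_expanded, y_expanded, sample_weight=w_expanded)
print(clf.predict_proba(X)[:, 1].round(3))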

Aggregated probability based on multiple predictions on independent samples using the same classifier

I have an understanding question regarding the interpretation of an aggregation of a machine learning classifier. Let's assume I have trained a binary classifier and it was validated with an accuracy of 70% (the dataset is always balanced). My question is: if this probability seems too low to me, and I were to look for ways to improve it without any readjustments to the classifier, would the following idea be valid? The classifier predicts three independent samples (always with …
Category: Data Science
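
For reference, the arithmetic behind the majority-vote idea, assuming the three predictions really are independent and each is correct with probability 0.7 (an assumption that rarely holds exactly for samples scored by the same model):

# Probability that at least 2 of 3 independent 70%-accurate predictions are correct.
p = 0.7
p_majority = p**3 + 3 * p**2 * (1 - p)
print(p_majority)   # 0.784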

How to deal with features in pairwise comparison models?

I am working on a dataset of ATP (Association of Tennis Professionals, men only) tennis matches over several years. I want to predict the outcome of tennis matches, and one way to do that is with a Bradley-Terry model, which is a probability model. I am asking how to do the feature selection or feature engineering (I am not talking about domain-knowledge FE) or preprocessing that must be applied before training the model.
Category: Data Science
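
One standard way to bring match-level features into a Bradley-Terry-style model (a sketch with made-up features such as ranking and recent win rate) is to model the log-odds of player 1 winning as a linear function of the feature differences, which is just a logistic regression without an intercept:

# Bradley-Terry with covariates: logistic regression on feature differences.
import numpy as np
from sklearn.linear_model import LogisticRegression

# [ranking, recent win rate] for player 1 and player 2 in each match (made up)
p1 = np.array([[3, 0.71], [15, 0.62], [40, 0.55], [8, 0.66]])
p2 = np.array([[10, 0.64], [5, 0.70], [12, 0.60], [60, 0.48]])
y = np.array([1, 0, 1, 1])               # 1 if player 1 won

X_diff = p1 - p2                          # the model only sees the differences
clf = LogisticRegression(fit_intercept=False).fit(X_diff, y)  # keeps symmetry
print(clf.predict_proba(X_diff)[:, 1].round(3))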

The mean and standard deviation of the sampled data aren't the same as those of the input I provided

I have a log-normal mean and standard deviation. After I converted them to the underlying normal distribution's parameters mu and sigma, I sampled from the log-normal distribution; however, when I take the mean and standard deviation of this sampled data, I don't get the values I plugged in at first. This only happens when the log-normal mean is much smaller than the log-normal standard deviation; otherwise it works. How do I prevent this from happening and get the input …
Category: Data Science
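
For reference, the standard conversion from a target log-normal mean m and standard deviation s to the underlying normal parameters, plus a sampling check (illustrative numbers; when s is much larger than m the distribution is extremely heavy-tailed, so sample moments converge very slowly, which matches the behaviour described in the question):

# sigma^2 = ln(1 + (s/m)^2), mu = ln(m) - sigma^2 / 2
import numpy as np

m, s = 1.0, 10.0
sigma2 = np.log(1 + (s / m) ** 2)
mu = np.log(m) - sigma2 / 2

rng = np.random.default_rng(0)
x = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=1_000_000)
print(x.mean(), x.std())   # compare against m and s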
