Very low probability in naive Bayes classifier

I have some training data (TRAIN) and some test data (TEST). Each row of each table contains an observed class (X) and several binary columns (Y). I'm using a Python script that is intended to predict the probability (Pr) of X given Y in the test data, based on the training data, using a Bernoulli naive Bayes classifier. Here is my script: https://stackoverflow.com/questions/55187516/look-up-bernoullinb-probability-in-dataframe It works on the dummy data that is included with the script. On the real …
Category: Data Science
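
As a point of reference, here is a minimal sketch of this kind of lookup with scikit-learn's BernoulliNB, using small made-up frames in place of TRAIN and TEST (the column names Y1-Y3 are illustrative):

# Minimal sketch with dummy data: binary features Y1-Y3 and an observed class X.
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

train = pd.DataFrame({'X':  [0, 0, 1, 1],
                      'Y1': [0, 1, 1, 1],
                      'Y2': [1, 0, 0, 1],
                      'Y3': [0, 0, 1, 0]})
test = pd.DataFrame({'Y1': [1, 0], 'Y2': [0, 1], 'Y3': [1, 0]})

feature_cols = ['Y1', 'Y2', 'Y3']
clf = BernoulliNB()
clf.fit(train[feature_cols], train['X'])

# One probability column per class, ordered as in clf.classes_
proba = pd.DataFrame(clf.predict_proba(test[feature_cols]), columns=clf.classes_)
print(proba)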

XGBClassifier's predictions are not probabilities with objective='binary:logistic'

I am using XGBoost's XGBClassifier, a binary 0-1 target, and I am trying to define a custom metric function. It supposedly receives an array of predictions and a DMatrix with the training set, according to the XGBoost Tutorials. I have used objective='binary:logistic' in order to get probabilities, but the prediction values passed to the custom metric function are not between 0 and 1. They can be, for example, between -3 and 5, and the range of values seems to grow …
Category: Data Science
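
A common explanation is that the values handed to a custom metric are raw margin scores rather than transformed probabilities, so one workaround (sketched below; my_logloss and the toy data are illustrative, not from the question) is to apply the logistic transform inside the metric before scoring:

# Sketch: map raw margin scores back to [0, 1] inside the custom metric.
import numpy as np
import xgboost as xgb
from scipy.special import expit  # numerically stable sigmoid

def my_logloss(preds, dtrain):
    y = dtrain.get_label()
    p = np.clip(expit(preds), 1e-15, 1 - 1e-15)  # margins -> probabilities
    return 'my_logloss', float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Quick check on toy data; `raw_margins` stands in for what the metric receives.
X = np.random.rand(100, 3)
y = (np.random.rand(100) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)
raw_margins = np.random.randn(100)
print(my_logloss(raw_margins, dtrain))

Depending on the XGBoost version, the custom metric is passed via the feval or custom_metric argument, and whether it sees margins or probabilities can differ, so it is worth checking the range of preds first.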

Using softmax for multilabel classification (as per Facebook paper)

I came across this paper by some Facebook researchers where they found that using a softmax and CE loss function during training led to improved results over sigmoid + BCE. They do this by changing the one-hot label vector such that each '1' is divided by the number of labels for the given image (e.g. from [0, 1, 1, 0] to [0, 0.5, 0.5, 0]). However, they do not mention how this could then be used in the inference stage, …
Category: Data Science
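
A small illustration (not the paper's code) of the label transformation and of a softmax cross-entropy evaluated against such soft targets, in plain NumPy:

# Turn a multi-hot label vector into a soft target that sums to 1, then compute
# softmax cross-entropy against model logits (illustrative values only).
import numpy as np

def soft_targets(multi_hot):
    multi_hot = np.asarray(multi_hot, dtype=float)
    return multi_hot / multi_hot.sum()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_cross_entropy(logits, targets):
    return -np.sum(targets * np.log(softmax(logits) + 1e-12))

labels = [0, 1, 1, 0]                      # two positive labels
targets = soft_targets(labels)             # -> [0., 0.5, 0.5, 0.]
logits = np.array([-1.0, 2.0, 1.5, -0.5])  # hypothetical model outputs
print(targets, soft_cross_entropy(logits, targets))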

Log odds vs Log probability

The log-odds has a linear relationship with the independent variables, which is why the log-odds can be written as a linear equation in them. What about the log of the probability? How is it related to the independent variables? Is there a way to check the relationship?
Category: Data Science
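
One way to see the difference: if the log-odds is linear in a predictor $x$, then the log-probability is that same linear term minus a softplus correction, so it is not linear in general:

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x \;\Rightarrow\; p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad \log p = (\beta_0 + \beta_1 x) - \log\left(1 + e^{\beta_0 + \beta_1 x}\right).$$

For very negative $\beta_0 + \beta_1 x$ the softplus term vanishes and $\log p$ is approximately linear; elsewhere it is curved, which can be checked empirically by plotting $\log \hat{p}$ against $x$.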

Sales forecasting for a store's unseen items

I am working on a sales forecasting problem. I am able to provide the algorithm with data about which items were sold and which were not. How can I provide the algorithm with information about items that are not present in the store? Is there any way to encode this information in the data, or are there other algorithms that accept this kind of information? Currently, I am using neural networks and random forests to forecast sales.
Category: Data Science
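
One simple encoding (a sketch with illustrative column names, not a prescription): give every store-item-period row an explicit availability flag, so the model can distinguish "not stocked" from "stocked but sold zero":

# Rows with in_store == 0 carry the "item not present" information explicitly;
# neural networks and random forests can then use the flag as an ordinary feature.
import pandas as pd

sales = pd.DataFrame({
    'store':      [1, 1, 1],
    'item':       ['A', 'B', 'C'],
    'in_store':   [1, 1, 0],      # item C was not present in the store
    'units_sold': [5, 0, 0],
})
print(sales)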

Analysis of the probability distribution of each feature, and machine learning

While I know that probability distributions are used for hypothesis testing, confidence interval construction, etc., and they certainly have many roles in statistical analysis, it is not obvious to me how probability distributions come in handy for machine learning problems. ML algorithms are expected to pick up distributions from the dataset automatically. I wonder whether probability distributions have any place in solving ML problems better? Put shortly, how could statistical techniques related to probability distributions benefit …
Category: Data Science

Compare cross validation values of Bernoulli NB and Multinomial NB

I'm testing Multinomial NB and Bernoulli NB on my dataset, and I'm using the cross-validation score to better understand which of the two algorithms works better. This is the first classifier:

from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

clf_multinomial = MultinomialNB()
clf_multinomial.fit(X_train, y_train)
y_predicted = clf_multinomial.predict(X_test)
score = clf_multinomial.score(X_test, y_test)
scores = cross_val_score(clf_multinomial, X_train, y_train, cv=5)
print(scores)
print(score)

And these are the scores:

[0.75 0.875 0.66666667 0.95833333 0.86956522]
0.8637666498061035

This is the second classifier:

from sklearn.naive_bayes import BernoulliNB

clf_multivariate = BernoulliNB() …
Category: Data Science
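
To make the two sets of scores directly comparable, one reasonable approach (a sketch with generated data standing in for the question's X_train and y_train) is to evaluate both classifiers on identical folds:

# Score both classifiers on the same folds so the comparison is fair.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

X_train, y_train = make_classification(n_samples=200, n_features=10,
                                        n_informative=5, random_state=0)
X_train = np.abs(X_train)  # MultinomialNB requires non-negative features

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for clf in (MultinomialNB(), BernoulliNB()):
    scores = cross_val_score(clf, X_train, y_train, cv=cv)
    print(type(clf).__name__, scores.round(3), scores.mean().round(3))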

How to understand the Binomial Theorem and the Recursion Rule?

In this video from edX, the instructor explains the binomial theorem as: Binomial Theorem: when you calculate $(a + b)^n = a^n + C_1 a^{n-1} b + C_2 a^{n-2} b^2 + \dots + C_{n-1} a b^{n-1} + b^n$, the coefficients are $C_1, C_2, \dots, C_{n-1}$. The coefficients of $(a + b)^4$ are $1, 4, 6, 4, 1$. From the formula above, you get after some manipulation: $(a + b)^n = a^n + \binom{n}{1} a^{n-1} b^1 + \binom{n}{2} a^{n-2} b^2 + \dots +$ (n …
Category: Data Science
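
The recursion rule referred to in the title is presumably Pascal's rule, $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$; a tiny sketch that computes the coefficients this way:

# Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k), with C(n, 0) = C(n, n) = 1.
def binom(n, k):
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)

# Coefficients of (a + b)^4 -> [1, 4, 6, 4, 1]
print([binom(4, k) for k in range(5)])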

Text similarity for badly written text

Consider the following scenario: suppose two lists of words $L_{1}$ and $L_{2}$ are given. $L_{1}$ contains only badly written phrases (like '4ge' instead of 'age' or 'blwe' instead of 'blue', etc.). On the other hand, each element of $L_{2}$ is a well-written version of an element of $L_{1}$. Here is an example: $$L_{1}=[...,dqta \ 5ciencc,...,s7ack \ exch9nge,...],$$ $$L_{2}=[...,stack \ exchange,...,data \ science,...].$$ Problem: is there any strategy to try to predict which element $w^{\prime}$ in $L_{2}$ is the syntactically correct counterpart …
Category: Data Science
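
A simple baseline (a sketch, not a full answer) is fuzzy string matching: rank every candidate in $L_{2}$ by its character-level similarity to the noisy phrase and pick the best one. Python's standard difflib is enough to illustrate:

# Match each noisy phrase to its closest well-written candidate using
# difflib's similarity ratio (standard library only).
from difflib import SequenceMatcher

L1 = ['dqta 5ciencc', 's7ack exch9nge']
L2 = ['stack exchange', 'data science']

def best_match(noisy, candidates):
    return max(candidates, key=lambda c: SequenceMatcher(None, noisy, c).ratio())

for w in L1:
    print(w, '->', best_match(w, L2))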

Machine Learning for conditional density estimation

Suppose I have a set of examples $X = (x_1,x_2,..,x_n)$ with continuous numeric targets $Y = (y_1,y_2,..,y_n)$. While it is standard to use regression models to make point predictions of $y_i$ as $f(x_{i}) = \hat{y}_i$, I am interested in predicting a density function for $y_{i}$. What I want is analogous to the use of probabilities in classification instead of hard predictions (e.g. predict vs predict_proba in Scikit-learn), but for continuous regression problems. Specifically, a different density function (e.g. in the …
Category: Data Science
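
One concrete way to get a predictive density rather than a point estimate (a sketch under a Gaussian assumption; quantile regression or mixture density networks are alternatives) is a model that returns a mean and a standard deviation per input, such as scikit-learn's GaussianProcessRegressor:

# Predict a Gaussian density (mean + standard deviation) for each test point.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)

mean, std = gpr.predict(np.array([[2.5], [7.0]]), return_std=True)
print(mean, std)   # parameters of a Normal density for each test point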

Odds vs Likelihood

Odds is the chance of an event occurring against the event not occurring. Likelihood is the probability of a set of parameters being supported by the data in hand. In logistic regression, we use log odds to convert a probability-based model to a likelihood-based model. In what way are odds & likelihood related? And can we call odds a type of conditional probability?
Category: Data Science
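
As a compact reference for how the two notions enter logistic regression: the odds describe a single event's probability, while the likelihood is a function of the parameters given all observed outcomes, and the model ties the two together by making the log-odds linear in the predictors:

$$\text{odds}(p) = \frac{p}{1-p}, \qquad \log\frac{p_i}{1-p_i} = x_i^\top\beta, \qquad L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1-p_i)^{1-y_i}.$$

Note that odds are a transformation of a probability rather than a conditional probability themselves (they can exceed 1), whereas the likelihood is not a probability of the parameters at all but a function of them.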

Modeling the influence of events order on probability

The case is to model whether the sequence of events influences the probability of a binary target variable. We have, for example, five different events which occur in time (events A, B, C, D, E). They can occur in order from 1 to 5. I would like to check whether the order of their occurrence influences the target variable. My first idea was to convert the time of occurrence into numbers from 1 to 5 and then, for example, use logistic regression. Do you know …
Category: Data Science
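
A minimal sketch of that first idea (illustrative data; one column per event holding its position 1-5 in the sequence, fed to a logistic regression):

# Encode the position (1-5) at which each event occurred as features and fit a
# logistic regression on the binary target.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'order_A': [1, 2, 3, 1], 'order_B': [2, 1, 5, 4],
    'order_C': [3, 4, 1, 2], 'order_D': [4, 3, 2, 5],
    'order_E': [5, 5, 4, 3],
    'target':  [1, 0, 1, 0],
})

model = LogisticRegression().fit(df.drop(columns='target'), df['target'])
print(dict(zip(df.columns[:-1], model.coef_[0].round(3))))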

Is there any way to artificially create a probability calibration for data coming from another model?

I have predictions that come from a survival model. This model gives me very low probabilities, and I am not sure whether they reflect the real probability of the phenomenon. For example, I calculate $P\left( T\leq t+d \middle| T>t \right)$ and the probabilities are very low (with $d=180$). To summarize, I need these probabilities to be, on average, another number (let's say $0.2$). Is it possible to create an artificial calibration with only this number (the desired average) as the …
Category: Data Science
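
One purely mechanical option (a heuristic sketch, not a statistically justified recalibration) is to shift every prediction by a constant in log-odds space, choosing the shift so the average probability equals the desired value:

# Shift all predictions by a constant c in log-odds space so the mean hits 0.2.
# This only re-centres the scores; it does not make them well calibrated.
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit

p = np.array([0.001, 0.004, 0.0002, 0.01, 0.03])   # illustrative model outputs
target_mean = 0.2

c = brentq(lambda c: expit(logit(p) + c).mean() - target_mean, -20, 20)
p_shifted = expit(logit(p) + c)
print(c, p_shifted.mean())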

Best metric to evaluate model probabilities

I'm trying to create an ML model for a binary classification problem with a balanced dataset, and I care mostly about probabilities. Searching the web, I only find advice to use AUC or log-loss scores; there is no advice to use the Brier score as an evaluation metric. Can I use the Brier score as an evaluation metric, or are there some pitfalls with it? As I understand it, if I use the log-loss score as the evaluation metric, the "winner" model will …
Category: Data Science
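
Both log loss and the Brier score are proper scoring rules, so nothing prevents reporting them side by side; a small sketch with made-up predictions:

# Compare log loss and Brier score for two probability forecasts on the same labels.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

y_true = np.array([0, 0, 1, 1, 1, 0])
p_model_a = np.array([0.1, 0.3, 0.8, 0.7, 0.9, 0.2])
p_model_b = np.array([0.4, 0.4, 0.6, 0.6, 0.7, 0.4])

for name, p in [('A', p_model_a), ('B', p_model_b)]:
    print(name, 'log loss:', round(log_loss(y_true, p), 4),
          'Brier:', round(brier_score_loss(y_true, p), 4))

The main practical difference is that log loss punishes confident mistakes much more heavily (it is unbounded as a predicted probability approaches the wrong extreme), while the Brier score is bounded between 0 and 1.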

Incorrect example of applying Bayes theorem

I have been reading the book "The Data Science Design Manual" (by Steven S. Skiena), and I came across an example of how Bayes' theorem can be applied that confused me and made me suspect it might be wrong. The example is the following: $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$ Suppose A is the event that person x is actually a terrorist, and B is the result of a feature-based classifier that decides if x looks like a terrorist. …
Category: Data Science
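
The usual resolution involves the base rate. With illustrative numbers (not the book's), say $P(A) = 10^{-6}$, $P(B\mid A) = 0.99$, and $P(B\mid \neg A) = 0.01$; then

$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid \neg A)\,P(\neg A)} = \frac{0.99 \times 10^{-6}}{0.99 \times 10^{-6} + 0.01\,(1 - 10^{-6})} \approx 10^{-4},$$

so even a very accurate classifier yields a tiny posterior when the prior $P(A)$ is tiny.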

What algorithms can handle probabilistic targets?

I have a classification problem where I want to use probabilities instead of classes to train my model, so that it learns to output probabilities. In my dataset, I have instances where the probabilities of two classes are almost equal, and I would like the model to be able to learn these subtleties instead of me just providing the class for each instance. Is there an ML model that can handle this? Thanks!
Category: Data Science
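
One generic workaround that works with most scikit-learn classifiers (a sketch, and only one of several options; neural networks trained with cross-entropy also accept soft targets directly) is to expand each instance into one row per class, weighted by that class's probability:

# Train on soft labels by duplicating each row once per class and weighting it
# by that class's probability (any estimator accepting sample_weight works).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2], [0.8], [0.5], [0.9]])
p_class1 = np.array([0.1, 0.9, 0.55, 0.95])   # soft targets for class 1

X_expanded = np.vstack([X, X])
y_expanded = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
w_expanded = np.concatenate([p_class1, 1 - p_class1])

clf = LogisticRegression().fit(X_expanded, y_expanded, sample_weight=w_expanded)
print(clf.predict_proba(X)[:, 1].round(3))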

Aggregated probability based on multiple predictions on independent samples using the same classifier

I have an understanding question regarding the interpretation of an aggregation of a machine learning classifier. Let's assume I have trained a binary classifier and it was validated with an accuracy of 70% (the dataset is always balanced). My question is: if this probability seems too low to me, and I were to look for ways to improve it without any readjustments to the classifier, would the following idea be valid? The classifier predicts three independent samples (always with …
Category: Data Science
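
For reference, the arithmetic behind the majority-vote idea, assuming the three predictions really are independent and each is correct with probability 0.7 (an assumption that rarely holds exactly for samples scored by the same model):

# Probability that at least 2 of 3 independent 70%-accurate predictions are correct.
p = 0.7
p_majority = p**3 + 3 * p**2 * (1 - p)
print(p_majority)   # 0.784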

How to deal with features in pairwise comparison models?

I am working on a dataset of ATP (Association of Tennis Professionals, men only) tennis matches over several years. I want to predict the outcome of tennis matches, and one way to do that is with a Bradley-Terry model, which is a probability model. I am asking how to do the feature selection or feature engineering (I am not talking about domain-knowledge FE) or preprocessing that must be applied before training the model.
Category: Data Science
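
One standard way to bring match-level features into a Bradley-Terry-style model (a sketch with made-up features such as ranking and recent win rate) is to model the log-odds of player 1 winning as a linear function of the feature differences, which is just a logistic regression without an intercept:

# Bradley-Terry with covariates: logistic regression on feature differences.
import numpy as np
from sklearn.linear_model import LogisticRegression

# [ranking, recent win rate] for player 1 and player 2 in each match (made up)
p1 = np.array([[3, 0.71], [15, 0.62], [40, 0.55], [8, 0.66]])
p2 = np.array([[10, 0.64], [5, 0.70], [12, 0.60], [60, 0.48]])
y = np.array([1, 0, 1, 1])               # 1 if player 1 won

X_diff = p1 - p2                          # the model only sees the differences
clf = LogisticRegression(fit_intercept=False).fit(X_diff, y)  # keeps symmetry
print(clf.predict_proba(X_diff)[:, 1].round(3))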

The mean and standard deviation of the sampled data aren't the same as those of the input I provided

I have a log-normal mean and standard deviation. After I converted them to the underlying normal distribution's parameters mu and sigma, I sampled from the log-normal distribution; however, when I take the mean and standard deviation of this sampled data, I don't get the values I plugged in at first. This only happens when the log-normal mean is much smaller than the log-normal standard deviation; otherwise it works. How do I prevent this from happening and get the input …
Category: Data Science
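
For reference, the standard conversion from a target log-normal mean m and standard deviation s to the underlying normal parameters, plus a sampling check (illustrative numbers; when s is much larger than m the distribution is extremely heavy-tailed, so sample moments converge very slowly, which matches the behaviour described in the question):

# sigma^2 = ln(1 + (s/m)^2), mu = ln(m) - sigma^2 / 2
import numpy as np

m, s = 1.0, 10.0
sigma2 = np.log(1 + (s / m) ** 2)
mu = np.log(m) - sigma2 / 2

rng = np.random.default_rng(0)
x = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=1_000_000)
print(x.mean(), x.std())   # compare against m and s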
