Best practices for scoring hundreds of models on the same massive dataset?

I have 500+ models predicting various things and a massive database of 400m+ individuals with about 5,000 possible independent variables. Currently, my scoring process takes about 5 days: it chunks the 400m+ records into 100k-person pieces, spins up n threads, each with a particular subset of the 500+ models, and runs this way until all records are scored for all models. Each thread is a Python process which submits R code (i.e. loads an …
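For illustration, a minimal sketch of the chunk-and-parallelize pattern described above, using Python's multiprocessing on toy data; the 100k chunk size comes from the question, while the row/feature/model counts and the in-process matrix multiply are invented stand-ins for the real database pull and R calls:

    from multiprocessing import Pool
    import numpy as np

    CHUNK_SIZE = 100_000                               # 100k-person pieces, as above
    N_ROWS, N_FEATURES, N_MODELS = 1_000_000, 20, 5    # toy stand-ins for 400m/5,000/500+
    WEIGHTS = np.random.default_rng(0).normal(size=(N_MODELS, N_FEATURES))

    def score_chunk(start):
        # The real pipeline would pull rows [start, start + CHUNK_SIZE) from the
        # database and hand them to R; deterministic random data stands in here.
        rows = np.random.default_rng(start).normal(
            size=(min(CHUNK_SIZE, N_ROWS - start), N_FEATURES))
        return start, rows @ WEIGHTS.T                 # one score per row per model

    if __name__ == "__main__":
        with Pool(processes=4) as pool:                # worker processes, not threads
            for start, scores in pool.imap_unordered(
                    score_chunk, range(0, N_ROWS, CHUNK_SIZE)):
                print(start, scores.shape)             # persist the scores here

Because each chunk is independent, the same pattern scales out to a cluster scheduler just as well as to a local process pool.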
Topic: scoring
Category: Data Science

Ranking algorithm based on a handful of features

I am trying to determine an apt algorithm for a ranking problem that I am working on. I have social media metrics (engagement, sentiment, audience size, etc.) for several brands and am looking for a ranking/classification algorithm to rank them. I am not sure whether I have a dependent variable or label class for classical classification algorithms. The data is aggregated by brand, and the algorithm needs to rank the brands based on the metrics. Any ideas would …
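Without a label this is an unsupervised problem, and a common baseline is to min-max normalize each metric and rank brands by a weighted sum. A sketch with pandas, where the metric names, values, and weights are all invented for illustration:

    import pandas as pd

    # Toy brand-level metrics; names and numbers are invented.
    df = pd.DataFrame(
        {"engagement": [120, 340, 90],
         "sentiment": [0.6, 0.2, 0.9],
         "audience": [10_000, 250_000, 4_000]},
        index=["brand_a", "brand_b", "brand_c"])
    weights = {"engagement": 0.5, "sentiment": 0.3, "audience": 0.2}  # assumed

    normalized = (df - df.min()) / (df.max() - df.min())    # scale each metric to [0, 1]
    score = sum(normalized[col] * w for col, w in weights.items())
    print(score.rank(ascending=False).sort_values())        # rank 1 = top brand

The weights encode business judgment; if a trusted ranking ever becomes available, a supervised learning-to-rank model could replace them.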
Category: Data Science

R in production

Many of us are very familiar with using R in reproducible but very much targeted, ad-hoc analysis. Given that R is currently one of the best collections of cutting-edge scientific methods from world-class experts in each particular field, and given that plenty of libraries exist for data I/O in R, it seems very natural to extend its applications into production environments for live decision making. Therefore my questions are: has anyone here gone into production with pure R (I know of …
Category: Data Science

Ranking ATM based on Utilization and Economic Data (Scoring/Rank Model)

I have sample data for around 10 ATM locations along with their utilization counts (deposits, withdrawals, and others) for the past 3 months. I am planning to collect additional data, such as nearby places of commercial interest and other spots where there might be demand for cash. The data is collected within approximately 300 meters of each ATM, i.e., places of commercial interest near the ATM. I would like to build a 'Scoring/Rank Model' which can take all these inputs …
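One simple way to let the data choose the weights, rather than fixing them by hand, is to regress utilization on the location features and rank ATMs by fitted demand. The sketch below invents all feature names and values, and with only 10 ATMs the fit is illustrative at best:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    n_atms = 10
    # Invented location features: nearby shops, foot-traffic index, median income.
    X = rng.normal(size=(n_atms, 3))
    utilization = rng.poisson(lam=200, size=n_atms)   # toy 3-month counts

    model = LinearRegression().fit(X, utilization)
    scores = model.predict(X)            # higher fitted value = more expected demand
    print(np.argsort(-scores))           # ATM indices, best first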
Category: Data Science

Scikit-learn with a custom scoring function using a 'feature'

I am trying to use a new metric called 'SERA' (Squared Error Relevance Area) as a custom scoring function for imbalanced regression, as described in this paper: https://link.springer.com/article/10.1007/s10994-020-05900-9. Here is what the paper says, in brief. To calculate SERA, a user-defined quantity known as 'relevance' is required for each feature-label pair. Relevance varies from 0 to 1: 0 for not relevant and 1 for highly relevant. This is the procedure for the calculation of SERA. Relevance …
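Going by the definition in the excerpt (squared error accumulated as the relevance cutoff sweeps from 0 to 1, with SERA the area under that curve), a sketch of the metric might look like the following; the toy targets, predictions, and relevance values are invented:

    import numpy as np

    def sera(y_true, y_pred, relevance, steps=101):
        # SER(t): squared error summed over samples whose relevance >= t.
        # SERA: area under SER(t) for t in [0, 1], via the trapezoidal rule.
        ts = np.linspace(0.0, 1.0, steps)
        sq_err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
        ser = np.array([sq_err[relevance >= t].sum() for t in ts])
        return ((ser[:-1] + ser[1:]) / 2).sum() * (ts[1] - ts[0])

    y_true = np.array([1.0, 5.0, 10.0, 50.0])
    y_pred = np.array([1.5, 4.0, 12.0, 30.0])
    rel = np.array([0.0, 0.1, 0.4, 1.0])     # invented per-sample relevance
    print(sera(y_true, y_pred, rel))

Since relevance is derived from the target, one way to fit this into make_scorer is to recompute relevance from y_true inside the wrapped function instead of passing it as a separate argument.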
Category: Data Science

Approaches for matching leads to salesmen

I'm starting to tackle a new problem where we are trying to optimally match new leads (prospective customers) for our product to our sales representatives, in the hope of improving bottom-line metrics like conversion rate, average sale price, etc. We have a bunch of data from the leads when they fill out their info on web forms, and from 3rd-party data providers we use to enrich the core web-form data (we try and pull their soft credit score, …
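If a per-pair value can be modeled first (say, predicted conversion probability times expected sale price), the matching step itself can be posed as an assignment problem. A sketch with SciPy, where the value matrix and the per-rep capacity are invented:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(2)
    # value[i, j]: modeled value of routing lead i to rep j (invented numbers).
    value = rng.uniform(size=(6, 3))      # 6 leads, 3 sales reps

    capacity = 2                          # let each rep take up to 2 leads
    tiled = np.tile(value, (1, capacity)) # duplicate rep columns, one per slot
    lead_idx, rep_slot = linear_sum_assignment(tiled, maximize=True)
    assignment = {int(i): int(j) % value.shape[1] for i, j in zip(lead_idx, rep_slot)}
    print(assignment)                     # lead -> rep, maximizing total value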
Category: Data Science

Develop a Scorecard Model with Orange 3.30

I'm a big fan of Orange 3.30; I've been developing some collection strategies and some other CLI work in my current job, and everything has been OK with all the decisions I've been making. But now that we have some historical data to support it, it is time to create our behaviour score, and that requires a scorecard model. I've been reading a lot about Orange 3.30, but nothing seems to approach what I need. The main goal …
Category: Data Science

How can I adapt the accuracy metric for multiclass classification?

I have a multiclass problem with, e.g., 4 classes. I would like a custom metric to assess the model in which predictions are penalized less only when class 3 is predicted as class 2 or class 2 is predicted as class 3 (i.e. the classes in the middle). How can I do this by adapting the sklearn accuracy_score metric or similar? E.g. comparing: predicted_labels = [1,3,0,0,2..] actual = [0,0,2,1,3,3...]
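One way to express this is a credit matrix: exact hits score 1, the 2↔3 confusions earn partial credit (the 0.5 below is an assumption), and every other miss scores 0. The result can be wrapped with sklearn.metrics.make_scorer if it needs to plug into model selection:

    import numpy as np

    def soft_accuracy(y_true, y_pred, n_classes=4, discount=0.5):
        # Like accuracy_score, but confusing class 2 with class 3 (either
        # direction) earns `discount` instead of 0.
        credit = np.eye(n_classes)
        credit[2, 3] = credit[3, 2] = discount
        return credit[np.asarray(y_true), np.asarray(y_pred)].mean()

    actual    = [0, 0, 2, 1, 3, 3]
    predicted = [1, 3, 0, 0, 2, 3]
    print(soft_accuracy(actual, predicted))   # 1 hit + 1 half-credit over 6 = 0.25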
Category: Data Science

Right way to compare model scores for Next Best Action

I have around 15 classification models for different products, built in different ways (some are RF, some are gradient boosting; some were downsampled one way, others another; some are built on 12 months of history, some on 24), and I have to compare their scores to choose which product to offer. All models have target 1 for "customer bought the product" and 0 for "customer didn't buy the product". I have read about this …
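Raw scores from models trained on different samples, algorithms, and windows are generally not on a common scale, so calibrating each model's output into a probability first is one standard fix. A sketch with scikit-learn on toy data; in practice this would be repeated once per product model:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)

    # Cross-validated isotonic calibration maps raw scores onto probabilities,
    # so P(buy) becomes comparable across differently built models.
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(random_state=0), method="isotonic", cv=5)
    calibrated.fit(X, y)
    print(calibrated.predict_proba(X)[:5, 1])

Once every model emits a calibrated P(buy), the next-best-action choice can compare expected values directly, for example probability times product margin.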
Category: Data Science

How to interpret Sum of Squared Error in a classification task

I am working on an ANN. I have 2497 training examples, each of which is a vector of length 128, so the input size is 128. The number of neurons in the hidden layer is 64 and the number of output neurons is 6 (since there are six classes). My target vector looks something like this: [0 1 0 0 0 0]. This means that the example belongs to class 2. I have used sigmoid as the activation at all layers and sum of squared …
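Concretely, for one example the sum of squared error is the squared gap between the six sigmoid outputs and the one-hot target, summed over the output neurons; the output values below are invented:

    import numpy as np

    # One training example from the question: class 2 of 6, one-hot encoded.
    target = np.array([0, 1, 0, 0, 0, 0])
    output = np.array([0.10, 0.70, 0.05, 0.05, 0.05, 0.05])  # assumed sigmoid outputs

    sse = np.sum((target - output) ** 2)   # summed over the 6 output neurons
    print(sse)                             # 0 only for a perfect prediction

Averaged over all examples this is the quantity training drives down; it says nothing directly about accuracy, which has to be measured separately by taking the argmax of the outputs.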
Category: Data Science

Scikit-learn make_scorer custom metric problem for multiclass classification

I was doing a churn analysis using:

    randomcv = RandomizedSearchCV(estimator=clf, param_distributions=params_grid,
                                  cv=kfoldcv, n_iter=100, n_jobs=-1,
                                  scoring='roc_auc')

and everything was fine, but then I tried it with a custom scoring function this way:

    def gain_fn(y_true, y_prob):
        tp = np.where((y_prob >= 0.02) & (y_true == 1), 40000, 0)
        fp = np.where((y_prob >= 0.02) & (y_true == 0), -1000, 0)
        return np.sum([tp, fp])

    scorer_fn = make_scorer(gain_fn, greater_is_better=True, needs_proba=True)
    randomcv = RandomizedSearchCV(estimator=clf, param_distributions=params_grid,
                                  cv=kfoldcv, n_iter=100, n_jobs=-1,
                                  scoring=scorer_fn)

but I need to make a calculation, inside of gain_fn, with …
Category: Data Science

Random forest model scoring

We are using the random forest algorithm but are having some trouble understanding the scoring method it uses. Take, for example, the following confusion matrices (CMs) of the test set:

Threshold 45, CM: [[67969 48031] [3321 11120]], precision: 0.18799344051632602
Threshold 50, CM: [[77642 38358] [4785 9656]], precision: 0.2011080101632834
Threshold 55, CM: [[88825 27175] [6796 7645]], precision: 0.2195577254445159
Threshold 60, CM: [[100411 15589] [9629 4812]], precision …
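The reported numbers line up with the layout sklearn's confusion_matrix uses, [[TN, FP], [FN, TP]], so precision = TP / (TP + FP). A quick check, assuming that layout:

    # Confusion matrices from above, assumed to be [[TN, FP], [FN, TP]].
    cms = {45: [[67969, 48031], [3321, 11120]],
           50: [[77642, 38358], [4785, 9656]],
           55: [[88825, 27175], [6796, 7645]],
           60: [[100411, 15589], [9629, 4812]]}

    for threshold, ((tn, fp), (fn, tp)) in cms.items():
        precision = tp / (tp + fp)   # reproduces the reported values
        recall = tp / (tp + fn)      # falls as the threshold rises
        print(threshold, round(precision, 4), round(recall, 4))

So raising the threshold trades recall for precision, which is the usual threshold behaviour rather than anything specific to random forests.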
Category: Data Science

Having trouble scaling scores of logistic regression

I am constructing a credit scorecard using logistic regression, similar to the one shown here. However, when trying to convert the coefficients of the logistic regression into score representation (by scaling the values using the provided formula), I am getting numbers that don't make much sense. Formula used for calculating scores: $Score_i = (\beta_i \times WoE_i + \frac{\alpha}{n}) \times Factor + \frac{Offset}{n}$, where $\beta_i$ is the coefficient of the logistic regression (of variable $i$), $WoE_i$ is the weight of evidence of the corresponding …
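For reference, a sketch of that formula end to end, deriving Factor and Offset from the common points-to-double-the-odds convention; the 600-points/50:1-odds/20-PDO targets and all coefficients below are invented:

    import numpy as np

    pdo, base_score, base_odds = 20, 600, 50      # assumed scaling targets
    factor = pdo / np.log(2)
    offset = base_score - factor * np.log(base_odds)

    beta = np.array([0.8, -1.2])    # toy logistic regression coefficients
    woe = np.array([0.35, -0.10])   # WoE of the applicant's bins
    alpha, n = -2.5, len(beta)      # intercept and number of variables

    points = (beta * woe + alpha / n) * factor + offset / n
    print(points, points.sum())     # per-variable points and the total score

A common source of nonsense scores is a sign-convention mismatch: the WoE definition (good over bad versus bad over good) has to agree with the direction of the fitted coefficients.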
Category: Data Science

Data science tools for easing the participation of a business into their scoring system

I'm working in a small company. The company sells products on a website, and a Python script runs every day to assign a score to each product based on a set of parameters (Google Analytics events, similar products' popularity, price, etc.). The problem is that the scoring outcome is not satisfying, and requiring developers to edit this script arbitrarily, based on business people's assumptions, is time-consuming and not a proper way to achieve what the business …
Category: Data Science

Scoring samples after repeated clusterings

I want to assign a score to all points in a group that I cluster several times. I want the score to indicate how consistently each point is grouped with the same individuals across runs. I suppose this idea already exists; however, I didn't find anything, only scores on global clusters, such as the mutual information score. I had an idea: for a point x, count each point y that is in the same cluster as x in two clusterings, or …
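A sketch of exactly that pairwise idea: stack the labels from every run, count how often each pair of points lands in the same cluster, and score each point by how consistent its pairings are (always together and always apart both count as stable). The label matrix is invented:

    import numpy as np

    # Rows = points, columns = repeated clustering runs; entries = cluster labels.
    labels = np.array([[0, 1, 2],
                       [0, 1, 2],
                       [0, 1, 0],
                       [1, 0, 1]])

    n, runs = labels.shape
    same = np.zeros((n, n))
    for r in range(runs):
        same += labels[:, r][:, None] == labels[:, r][None, :]
    same /= runs                           # fraction of runs where i, j co-cluster

    # A pair that is always together or always apart scores 1; a pair that
    # co-clusters in half of the runs is maximally unstable and scores 0.5.
    consistency = np.maximum(same, 1 - same)
    np.fill_diagonal(consistency, np.nan)  # ignore self-pairs
    print(np.nanmean(consistency, axis=1)) # one stability score per point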
Category: Data Science

Why is the F-measure preferred for classification tasks?

Why is the F-measure usually used for (supervised) classification tasks, whereas the G-measure (or Fowlkes–Mallows index) is generally used for (unsupervised) clustering tasks? The F-measure is the harmonic mean of the precision and recall; the G-measure (or Fowlkes–Mallows index) is the geometric mean of the precision and recall. The different means are: F1 (harmonic) $= 2\cdot\frac{precision\cdot recall}{precision + recall}$, Geometric $= \sqrt{precision\cdot recall}$, Arithmetic $= \frac{precision + recall}{2}$. The reason I ask is that I need …
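A quick numerical illustration of how the three means diverge once precision and recall are far apart (the 0.9/0.3 pair is invented):

    import numpy as np

    precision, recall = 0.9, 0.3

    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    g = np.sqrt(precision * recall)                     # geometric mean
    arithmetic = (precision + recall) / 2

    print(round(f1, 3), round(g, 3), round(arithmetic, 3))  # 0.45 0.52 0.6

The harmonic mean always sits lowest, so the F-measure punishes an imbalance between precision and recall hardest of the three.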
Category: Data Science

Standardizing binary decision with other scales (Like 1-5)

In the company I work for there are 2 different evaluation metrics for a song: yes/no (equivalent to like/dislike) and a 1-5 scale. Customers can use both to rate songs they like. I would like to create a model that predicts the next songs a user would probably like. Currently, I'm ignoring the binary data. I wonder if there's a good way of utilizing the binary data as labeled data (and not as a feature). I've thought about two possible solutions: …
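One simple option is to map the yes/no votes onto pseudo-ratings and then standardize per user, so both signal types end up on one scale; the 4.5/1.5 mapping below is purely an assumption:

    import pandas as pd

    ratings = pd.DataFrame({
        "user": ["u1", "u1", "u2", "u2", "u2"],
        "song": ["s1", "s2", "s1", "s3", "s4"],
        "kind": ["scale", "binary", "scale", "binary", "binary"],
        "value": [4.0, 1.0, 2.0, 0.0, 1.0],   # binary: 1 = like, 0 = dislike
    })

    # Assumed mapping: like -> 4.5, dislike -> 1.5, near the ends of the 1-5 scale.
    mask = ratings["kind"] == "binary"
    ratings.loc[mask, "value"] = ratings.loc[mask, "value"].map({1.0: 4.5, 0.0: 1.5})

    # Standardize per user so enthusiastic and reserved raters become comparable.
    ratings["z"] = ratings.groupby("user")["value"].transform(
        lambda v: (v - v.mean()) / v.std())
    print(ratings)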
Category: Data Science

What is the proper way to bin variables for calculating WoE during credit scoring?

I have read this article about developing a credit scorecard in Python, where it is stated that when binning the continuous variables, it needs to be ensured that:

1. Each bin has at least 5% of the observations.
2. Each bin is non-zero for both good and bad loans.
3. The WoE is distinct for each category; similar groups should be aggregated or binned together, because bins with similar WoE have almost the same …
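A sketch of computing WoE per bin and checking the first rule, using the usual WoE = ln(%good / %bad) convention; the data and the choice of equal-frequency binning are invented:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    df = pd.DataFrame({"income": rng.normal(50_000, 15_000, 1_000),
                       "bad": rng.integers(0, 2, 1_000)})  # toy loan outcomes

    df["bin"] = pd.qcut(df["income"], q=5)        # equal-frequency bins (assumed)

    grouped = df.groupby("bin", observed=True)["bad"].agg(["count", "sum"])
    grouped["good"] = grouped["count"] - grouped["sum"]
    dist_good = grouped["good"] / grouped["good"].sum()
    dist_bad = grouped["sum"] / grouped["sum"].sum()
    grouped["woe"] = np.log(dist_good / dist_bad)  # WoE per bin

    print((grouped["count"] / len(df) >= 0.05).all())  # rule 1: >= 5% per bin
    print(grouped[["count", "woe"]])

Rules 2 and 3 translate into similar checks: no zero cells in the good/bad counts, and adjacent bins with nearly equal WoE merged before scoring.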
Topic: scoring
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.