data-science-model

What to do when one feature has very large importance/weight?

Daria

June 3, 2022 07:27

I am new to data science and am currently trying to predict customer churn for a company that offers subscription-based booking management software. Its customers are gyms. I have a small, imbalanced dataset of historical data (False: 670, True: 230) with 2 numerical predictors, age (days since subscription) and number of active days in the last month (days on which a customer (gym) had bookings), and 1 categorical predictor, logo (boolean: whether a customer uploaded a logo in the software). The predictors have the following …

Topic: data-science-model churn logistic-regression classification

Category: Data Science

Neural network is not giving the expected output after training in Python

VASIH

June 1, 2022 18:01

My neural network is not giving the expected output after training in Python. Is there any error in the code? Is there any way to reduce the mean squared error (MSE)? I tried to train (run the program) the network repeatedly, but it is not learning; instead, it gives the same MSE and output. Here is the data I used: https://drive.google.com/open?id=1GLm87-5E_6YhUIPZ_CtQLV9F9wcGaTj2
Here is my code: #load and evaluate a saved model from numpy import loadtxt from tensorflow.keras.models import load_model …

Topic: data-science-model ai neural-network python machine-learning

Category: Data Science

Keras: very low accuracy that saturates after a few epochs of training

SOUHARDHYA PAUL

May 30, 2022 05:00

I am very new to data science and jumped directly into TensorFlow models. I have worked through the examples provided on the website before, but this is my first project using it. I am building a cricket score predictor using Keras and TensorFlow. I have a CSV dataset of player details with the columns "striker", "non_striker", "bowler", "run_per_ball", "run_per_ball_avg", and "ball_count". "ball_count" and "run_per_ball" are the labels of the model and the rest are features. I have a total of 51555 rows …

Topic: data-science-model keras tensorflow neural-network machine-learning

Category: Data Science

Reviewing a paper - common practice

StatsSorceress

May 29, 2022 16:53

I've been asked to review a paper in which the authors compare their new model (let's call it Model A) to other models (B, C, and D), and conclude theirs is superior on some metric (I know, big surprise!). Here's the problem: in my research, my supervisors always instructed me to code up the competing models and compare my model that way. The paper I'm reviewing, by contrast, just quotes results from previous literature. To clarify, here's what I would …

Topic: data-science-model

Category: Data Science

ML, Statistics and Mathematics

ranit.b

May 25, 2022 07:00

I have just started getting my feet wet in ML, and every time I try to delve deeper into the concepts/code, I run into the mathematics and its cryptic notation. Coming from a computer science background, I do understand a bit of it, but most of it goes over my head. Take, for example, the formulae below from this page: I really want to understand them, but I get confused and give up every time. Can you please suggest how to start with …

Topic: data-science-model mathematics statistics machine-learning

Category: Data Science

error while running lasso.py

ksn

May 24, 2022 15:08

The following error is generated while running lasso.py. Can anybody help fix it? Here is the code: from cvxpy import * import numpy as np import cvxopt from multiprocessing import Pool # Problem data. n = 10 m = 5 A = cvxopt.normal(n,m) b = cvxopt.normal(n) gamma = Parameter(nonneg=True) # Construct the problem. x = Variable(m) objective = Minimize(sum_squares(A*x - b) + gamma*norm(x, 1)) p = Problem(objective) # Assign a value to gamma and find the …

Topic: data-science-model anaconda optimization classification python

Category: Data Science
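For the lasso.py question above: the error message itself is not shown, so this does not diagnose the cvxpy/cvxopt failure. As a library-independent reference, here is a minimal sketch of the same objective, sum_squares(A x − b) + gamma·‖x‖₁, solved by cyclic coordinate descent with soft-thresholding in pure Python. It assumes small dense problems (like n = 10, m = 5 in the question); the function names are hypothetical and this is not the cvxpy solver.

```python
import random

def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(A, b, gamma, n_iter=200):
    """Minimize sum((A x - b)**2) + gamma * sum(|x_j|) by cyclic coordinate descent.

    A is a list of n rows with m entries each; b is a list of n values.
    """
    n, m = len(A), len(A[0])
    x = [0.0] * m
    residual = list(b)  # residual = b - A x, and x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the fit
            # Correlation of column j with the residual that excludes x_j's own contribution
            rho = sum(A[i][j] * (residual[i] + A[i][j] * x[j]) for i in range(n))
            # The quadratic term carries no 1/2 factor, hence the gamma / 2 threshold
            new_xj = soft_threshold(rho, gamma / 2.0) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x

# Usage with random data shaped like the question (n = 10 rows, m = 5 columns)
rng = random.Random(0)
A_demo = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(10)]
b_demo = [rng.gauss(0, 1) for _ in range(10)]
x_demo = lasso_cd(A_demo, b_demo, gamma=1.0)
```

Larger gamma drives more entries of x to exactly zero, which is the behavior the cvxpy script explores by sweeping gamma.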

What's the right input for gpt-2 in NLP

yuqiong11

May 24, 2022 10:59

I'm fine-tuning a pre-trained GPT-2 for text summarization. The dataset contains 'text' and 'reference summary'. My question is how to add special tokens to get the right input format. Currently I'm thinking of doing it like this: example1 <BOS> text <SEP> reference summary <EOS>, example2 <BOS> text <SEP> reference summary <EOS>, ..... Is this correct? If so, a follow-up question is whether the maximum token length (i.e. 1024 for GPT-2) also applies to the concatenated length of the text and reference summary. Any comment …

Topic: openai-gpt transformer data-science-model nlp

Category: Data Science
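For the GPT-2 question above: GPT-2 has 1024 position embeddings, so the limit covers the whole concatenated sequence, text and summary together. A minimal sketch of building one `<BOS> text <SEP> summary <EOS>` example, assuming the text and summary are already tokenized into id lists. The special-token ids are hypothetical placeholders (in practice they come from the tokenizer after registering the new tokens), and masking prompt positions with -100 follows the common PyTorch `ignore_index` convention rather than anything stated in the question.

```python
# Hypothetical special-token ids; real ids come from the tokenizer after
# registering <BOS>, <SEP> and <EOS> as special tokens.
BOS_ID, SEP_ID, EOS_ID = 50257, 50258, 50259
IGNORE_INDEX = -100  # label value conventionally skipped by the loss

def build_example(text_ids, summary_ids, max_len=1024):
    """Return (input_ids, labels) for <BOS> text <SEP> summary <EOS>.

    The full concatenation must fit in max_len, so the source text is
    truncated first and the summary is kept whole when possible.
    """
    n_special = 3
    summary_ids = summary_ids[: max_len - n_special]  # guard against oversized summaries
    budget = max_len - n_special - len(summary_ids)
    text_ids = text_ids[:budget]
    input_ids = [BOS_ID] + text_ids + [SEP_ID] + summary_ids + [EOS_ID]
    # Labels align position-by-position with input_ids; only the summary
    # and EOS contribute to the loss, the prompt part is masked out.
    prompt_len = 1 + len(text_ids) + 1
    labels = [IGNORE_INDEX] * prompt_len + summary_ids + [EOS_ID]
    return input_ids, labels
```

Truncating the article rather than the summary is a design choice: the model should always see a complete target to learn from.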

Unable to generate useful insights from high-cardinality data

dark_rush

May 24, 2022 06:21

I'm working on CRM data. I did some cleaning and encoding and ran a decision tree classifier, from which I plotted a feature_importance graph. From that, I found that the Sales person column is one of the most important features; it is a high-cardinality column (around 1300+ categories, one per salesperson). Now I'm trying to generate some insights from this column with respect to the target column (binary values). In general, how can I create insights from such a large categorical column? P.S.: Other columns are …

Topic: data-science-model data-analysis visualization python machine-learning

Category: Data Science
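For the high-cardinality salesperson question above, one common way to summarize such a column against a binary target is a per-category positive rate shrunk toward the global rate (often called smoothed target or mean encoding). A minimal sketch, assuming the column and target are plain Python lists; the smoothing strength `m` is an assumed tuning value, not something from the question.

```python
from collections import defaultdict

def smoothed_target_rates(categories, targets, m=20.0):
    """Per-category positive rate shrunk toward the global rate.

    rate_c = (positives_c + m * global_rate) / (count_c + m)
    A salesperson with only 2 deals (both won) is pulled strongly toward
    the global rate instead of being scored as 100%.
    """
    global_rate = sum(targets) / len(targets)
    counts = defaultdict(int)
    positives = defaultdict(int)
    for c, y in zip(categories, targets):
        counts[c] += 1
        positives[c] += y
    rates = {c: (positives[c] + m * global_rate) / (counts[c] + m) for c in counts}
    # Ranked view: (category, smoothed rate, number of rows), highest rate first
    return sorted(((c, rates[c], counts[c]) for c in counts), key=lambda t: t[1], reverse=True)
```

The row count kept alongside each rate shows how much data backs it, which matters when most of the 1300+ categories have only a handful of records.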

Handling IP addresses as features when creating machine learning model

taga

May 21, 2022 11:52

I'm working on an ML model for fraud detection, and two of the features I have are sender_IP_address and receiver_IP_address. I think these are very important features that cannot be ignored. My question is, how can I handle this kind of feature? My dataset has around 100k rows and 80 columns. I know that an IP address is categorical data and that I can use OneHotEncoder (for example), but across those 100k rows I have around 70k unique IP addresses (one IP …

Topic: data-science-model machine-learning

Category: Data Science
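For the IP address question above, instead of one-hot encoding 70k raw addresses, a frequently used alternative is to derive coarser, lower-cardinality features. A minimal sketch using Python's standard `ipaddress` module, assuming both columns hold valid IPv4 strings (IPv6 would need different prefix lengths); the choice of a /24 grouping and the feature names are illustrative assumptions.

```python
import ipaddress

def ip_features(sender, receiver):
    """Derive lower-cardinality features from a pair of IPv4 address strings.

    Assumes valid IPv4 input; ipaddress raises ValueError otherwise.
    """
    s = ipaddress.IPv4Address(sender)
    r = ipaddress.IPv4Address(receiver)
    s_net24 = ipaddress.IPv4Network(f"{sender}/24", strict=False)
    r_net24 = ipaddress.IPv4Network(f"{receiver}/24", strict=False)
    return {
        # Coarser groupings: far fewer unique values than raw addresses
        "sender_subnet24": str(s_net24.network_address),
        "receiver_subnet24": str(r_net24.network_address),
        "sender_is_private": s.is_private,
        "receiver_is_private": r.is_private,
        # Relationship between the two endpoints
        "same_subnet24": s_net24 == r_net24,
        # Numeric form that tree models can split into address ranges
        "sender_as_int": int(s),
    }
```

Per-IP frequency counts (how often an address appears in the data) are another common low-cardinality feature that could be added alongside these.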

ValueError: continuous is not supported

ruchi yadav

May 20, 2022 07:18

I am working on a regression problem and building a model using a Random Forest Regressor, but when I try to get the accuracy I get ValueError: continuous is not supported. train=pd.read_csv(r"C:\Users\DELL\OneDrive\Documents\BigMart data\Train.csv") test=pd.read_csv(r"C:\Users\DELL\OneDrive\Documents\BigMart data\Test.csv") df=pd.concat([train,test]) df.head() After data preprocessing and visualization, I tried to build the model. Please help with the error.

Topic: data-science-model regression pandas machine-learning

Category: Data Science
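For the "continuous is not supported" question above: scikit-learn's classification metrics such as `accuracy_score` only accept discrete labels and raise this error when given continuous targets, so a regressor should be scored with regression metrics instead (`RandomForestRegressor.score` returns R², and `sklearn.metrics` provides `r2_score`, `mean_absolute_error` and `mean_squared_error`). A minimal pure-Python sketch of what those metrics compute; the function name is hypothetical.

```python
import math

def regression_report(y_true, y_pred):
    """Regression metrics to use in place of classification accuracy."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined (division by zero) if all y_true are equal
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```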

Finding the worst affected industry due to COVID in terms of unemployment

NAS

May 19, 2022 01:03

My goal is to find the industries worst affected by COVID-19 in terms of unemployment. For this task, I have county-level monthly unemployment rate time-series data and business distribution data. The business distribution data contains the number of establishments in each county by industry (Manufacturing: 121, Accommodation and Food Services: 564, Construction: 32, etc.). The unemployment rate data gives the monthly unemployment rate in each county. From this data, what would your …

Topic: data-science-model distribution correlation

Category: Data Science

Combining two separate confusion-matrix results from two separate machine learning models to increase the overall true positive accuracy

George Nicholson

May 18, 2022 14:57

If it is possible to add two confusion-matrix results together to get a better final prediction, what are the steps involved? We have calculated the two confusion matrices below, from naive Bayes and from a decision tree, and want to increase the true positive totals and lessen the false negatives.

Topic: data-science-model python machine-learning

Category: Data Science
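For the confusion-matrix question above: two confusion matrices cannot be added into a better classifier, because each one only summarizes counts. What can be combined is the two models' per-row predictions. A minimal sketch of one such rule, an OR ensemble that flags a row as positive if either model does; this tends to raise true positives and lower false negatives at the cost of more false positives. The function names are hypothetical, and other rules (AND, or averaging predicted probabilities) trade off differently.

```python
def confusion(y_true, y_pred):
    """Return (TN, FP, FN, TP) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def combine_or(pred_a, pred_b):
    """Positive if either model predicts positive: TP cannot drop, FP may rise."""
    return [int(a == 1 or b == 1) for a, b in zip(pred_a, pred_b)]
```

The combined predictions must be scored on the same labeled rows both models were evaluated on, then compared with each model's own confusion matrix.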

massively imbalanced data

ahman

May 18, 2022 08:23

I am dealing with gas turbine time-series data with 200K+ records (one every minute for 6 months), and I am trying to detect faults early (0, or 1 = fault). The issues with the data are: 1. The fault occurred only 5 times (identified by observing sudden shutdowns), which makes the data hugely imbalanced. 2. (Unsupervised) There is no binary output. I used 2 of the variables as my output and used them for binary clustering (k-means), but the result is not very good, as there are false …

Topic: data-science-model prediction unsupervised-learning machine-learning

Category: Data Science

model.fit vs model.evaluate gives different results?

ptn77

May 18, 2022 03:04

The following is a small snippet of the code. I'm trying to understand the results of model.fit on the train and test datasets vs. the model.evaluate results. I'm not sure whether they don't match up or whether I'm just not reading the results correctly. batch_size = 16 img_height = 127 img_width = 127 channel = 3 #RGB train_dataset = image_dataset_from_directory(Train_data_dir, shuffle=True, batch_size=batch_size, image_size=(img_height, img_width), class_names = class_names) ##Transfer learning code from mobilenetV2/imagenet here to create model initial_epochs = …

Topic: transfer-learning data-science-model keras evaluation accuracy

Category: Data Science

List of the main statistical models

william _druk

May 17, 2022 13:13

I am not able to find a list of the main statistical models. Is it possible to divide statistical models into categories such as supervised (regression, classification) vs. unsupervised (clustering), or is that something used in the field of machine learning but not for categorizing statistical models? Thank you

Topic: data-science-model statistics

Category: Data Science

Two-step optimization of a credit card limit

Juan Esteban de la Calle

May 17, 2022 13:06

I have a problem similar to the one in the title, but not the same; the problem in the title lets me explain the dynamics of what I need. I have to determine the optimal value of a variable called QUOTA or LIMIT for a credit card. The goal of the model is to let me minimize the probability of default, given this variable and others that characterize my customer. What is the best way to determine …

Topic: data-science-model logistic-regression optimization

Category: Data Science

In Python, how can I transfer/remove duplicate columns from one dataset, such that the rows and columns of all datasets would be equal?

JERE.tech

May 17, 2022 08:48

I've been trying to improve my random decision tree model for the Titanic challenge on Kaggle by introducing a validation dataset, and I've now hit the roadblock shown in the images below (Validation Dataset, Test Dataset). After inspecting these datasets with the .info() method, I found that the validation dataset contains 178 and 714 non-null floats, while the test dataset contains a mix of 178 and 419 non-null floats and integers. Further, the datasets contain duplicate rows, which I …

Topic: data-science-model random-forest classification python

Category: Data Science

NameError: name 'librosa' is not defined

ali hayen

May 16, 2022 13:17

I'm working on Arabic speech recognition using the Wav2Vec XLSR model. While fine-tuning the model, I get the error shown in the picture below. I can't understand what the problem with librosa is; it's already installed!

Topic: data-science-model speech-to-text anaconda library deep-learning

Category: Data Science

How to count words in a dataframe?

Jasmine

May 13, 2022 12:55

I would like to count how many males and females gave a particular answer (e.g. Biking / Cycling). Below is the sample data:

Topic: data-science-model jupyter descriptive-statistics python machine-learning

Category: Data Science
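For the word-counting question above, the sample data is not shown, so this is a minimal sketch under stated assumptions: each row is a (gender, answer) pair, and one answer may list several activities joined by a separator (assumed here to be ";"). In pandas, a column with exactly one answer per row could instead be tabulated with `pd.crosstab(df["Gender"], df["Answer"])`, where both column names are hypothetical.

```python
from collections import Counter

def count_by_gender(rows, activity, sep=";"):
    """Count respondents of each gender whose answer includes `activity`.

    rows: iterable of (gender, answer) pairs. An answer may hold several
    activities joined by `sep` (an assumption about the data layout).
    """
    counts = Counter()
    for gender, answer in rows:
        options = [opt.strip() for opt in answer.split(sep)]
        if activity in options:  # exact match, so "Biking / Cycling" is one option
            counts[gender] += 1
    return counts
```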

How to build an unbiased predictive ML model when the event is rare compared to the total number of records?

Mashrafi Iqra

May 13, 2022 11:03

I am trying to build a model that will predict the communication loss of a wireless device. For now I am using RandomForestClassifier with Device and Location as the features. I am getting 99% for both the train score and the test score, so I am pretty sure the model is giving biased results. One reason might be that records of communication loss events are very few compared to records with no communication loss. Some …

Topic: data-science-model data machine-learning

Category: Data Science
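For the communication-loss question above: with a rare event, a 99% score can match what a model achieves by always predicting "no loss", so accuracy should be compared with the majority-class baseline and with recall on the rare class. A minimal sketch, assuming 0/1 labels with communication loss as the positive class; the function name is hypothetical.

```python
def imbalance_check(y_true, y_pred, positive=1):
    """Compare accuracy with the majority-class baseline and per-class recall."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    n_pos = sum(t == positive for t in y_true)
    baseline = max(n_pos, n - n_pos) / n  # accuracy of always predicting the majority class
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    recall_pos = tp / n_pos if n_pos else float("nan")
    recall_neg = tn / (n - n_pos) if n - n_pos else float("nan")
    return {
        "accuracy": accuracy,
        "majority_baseline": baseline,
        "recall_positive": recall_pos,
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
    }
```

If accuracy is close to the baseline and positive-class recall is low, the model has mostly learned the class ratio rather than the event.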
