I am new to machine learning and started solving the Titanic survivor problem on Kaggle. While solving the problem using logistic regression I used various models having polynomial features with degree $2, 3, 4, 5, 6$. Theoretically the accuracy on the training set should increase with the degree, however it started decreasing after degree $2$. The graph is shown below.
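One detail worth noting, since it can explain a non-monotonic training accuracy: scikit-learn's LogisticRegression is L2-regularized by default (C=1.0) and its solver may stop before converging once the polynomial expansion gets large. A minimal sketch of checking this, using a synthetic stand-in for the prepared Titanic features (the X/y below are placeholders, not your actual preprocessing):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# synthetic stand-in for the prepared Titanic features and labels
X, y = make_classification(n_samples=800, n_features=8, random_state=0)

for degree in (2, 3, 4, 5, 6):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        # large C ~ almost no regularization; raise max_iter so the solver converges
        LogisticRegression(C=1e6, max_iter=10000),
    )
    model.fit(X, y)
    print(degree, model.score(X, y))   # training accuracy per degree

With regularization effectively turned off and the solver allowed to converge, training accuracy is much less likely to drop at higher degrees.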
My dataset contains multiple columns with first name, last name, etc. I want to use a classifier model such as Isolation Forest later. Word embedding techniques are preferably used for longer text sequences, not for single-word strings as in this case, so I don't think those techniques would work correctly here. Additionally, label encoding or label binarization may not be suitable ways to work with names, because of the many different values on the one side …
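One option sometimes used for high-cardinality string columns such as names is frequency encoding, which replaces each name by how often it occurs before feeding the result to Isolation Forest. A rough sketch with a made-up dataframe (column names and values here are hypothetical):

import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.DataFrame({
    'first_name': ['anna', 'john', 'anna', 'mary'],
    'last_name': ['smith', 'doe', 'brown', 'smith'],
    'amount': [10.0, 250.0, 12.5, 9.0],
})

# replace each name with its relative frequency in the column
for col in ['first_name', 'last_name']:
    df[col + '_freq'] = df[col].map(df[col].value_counts(normalize=True))

features = df[['first_name_freq', 'last_name_freq', 'amount']]
iso = IsolationForest(random_state=0).fit(features)
print(iso.predict(features))   # -1 = outlier, 1 = inlier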
I have discrete values in the target variable (exactly 13 different values in total). When I give that as input to a random forest classifier, it gives an error that the input is continuous. And if I give it to a regressor, it predicts values in between the discrete values. How can I treat this problem?
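If those 13 values are stored as floats, scikit-learn's label check flags the target as continuous; casting the values to strings (or mapping them to integer codes) before fitting lets the classifier treat each value as a class. A small sketch with made-up data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 5)                                  # made-up features
y = np.random.choice(np.linspace(0.5, 6.5, 13), size=100)   # 13 distinct numeric levels

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y.astype(str))                   # each level becomes a class label
preds = clf.predict(X[:3]).astype(float)    # convert back if numeric values are needed
print(preds)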
For the purposes of a quite big project I am doing text mining on some documents. My steps are quite common: converting everything to lower case, tokenization, stop lists and stop words, lemmatization, stemming, and some other steps like removing symbols. Then I prepare a bag of words, make a DTF and classify into 3 classes with SVM and Naive Bayes. But the accuracy I get is not too high (50-60%). I think that may be because in the array of words after all the steps …
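For reference, a compact scikit-learn version of that kind of pipeline, which makes it easy to cross-validate the 50-60% figure; the documents and labels below are placeholders, and TfidfVectorizer stands in for the bag-of-words / DTF step:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["first example document", "another short text",
        "one more sample", "and a fourth one"] * 25      # placeholder documents
labels = [0, 1, 2, 0] * 25                               # placeholder 3-class labels

for clf in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words='english', ngram_range=(1, 2)),
        clf,
    )
    scores = cross_val_score(pipe, docs, labels, cv=5)
    print(type(clf).__name__, scores.mean())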
I am using the CART classification technique, dividing a dataset into train and test sets. I have been using misclassification error, KS by rank ordering, AUC and Gini as MPMs (model performance measures). The problem I am facing is that the MPM values are quite far apart. I have tried minsplit values anywhere from 20 to 1400 and minbucket values from 5 to 100 but couldn't get the expected results. I have also tried oversampling/undersampling through the ROSE package but …
Let's say I have trained a classifier that classifies images of animals into 10 different classes. And let's say that I have 20 different images of a particular animal, and because I know the photographer, I know with certainty that all 20 images are of the same animal. So I use my classifier to make a prediction on what animal it is and get 20 predictions, one for each image. The model predicts all the images to be a dog …
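One common way to turn those 20 per-image predictions into a single answer is to average the predicted class probabilities and take the argmax. A sketch assuming a scikit-learn-style classifier with predict_proba (the model, features and class names below are stand-ins; the same idea applies to softmax outputs of a neural network):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def combined_prediction(model, image_features, class_names):
    # average the predicted class probabilities over all images of the same animal
    probs = model.predict_proba(image_features)   # shape (n_images, n_classes)
    mean_probs = probs.mean(axis=0)
    return class_names[int(np.argmax(mean_probs))], mean_probs

# stand-in 10-class problem; pretend the first 20 rows are the 20 photos of one animal
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

label, probs = combined_prediction(model, X[:20], class_names=np.arange(10))
print(label, probs.round(2))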
I am trying to build a classifier for a specific card dataset, let's say cards or no cards. I am using MobileNet trained on the ImageNet dataset as my classifier and further training it on my dataset. I am able to train it, and its performance is quite good on the dataset. Let's say my card has four different regions of interest as shown below: It is able to perfectly recognize the above-passed image as a card. But I am …
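For context, a rough outline of the MobileNet transfer-learning setup being described, assuming TensorFlow/Keras with a binary card / no-card head; the directory path, image size and epoch count are placeholders:

import tensorflow as tf

base = tf.keras.applications.MobileNet(weights='imagenet', include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False                                    # keep the ImageNet features frozen at first

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),    # MobileNet expects inputs in [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),       # card vs. no card
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# hypothetical folder containing card/ and no_card/ subdirectories
train_ds = tf.keras.utils.image_dataset_from_directory('cards_dataset/train',
                                                       image_size=(224, 224))
model.fit(train_ds, epochs=5)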
I have the code below outputting the accuracy. How can I output the F1-score instead?

clf.fit(data_train, target_train)
preds = clf.predict(data_test)
# accuracy for the current fold only
r2score = clf.score(data_test, target_test)
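One way, assuming clf, data_test and target_test are the same objects as above, is sklearn.metrics.f1_score; note that for a multi-class target the average argument has to be set explicitly:

from sklearn.metrics import f1_score

preds = clf.predict(data_test)
# average='binary' (the default) for two classes; 'macro' or 'weighted' for multi-class
f1 = f1_score(target_test, preds, average='weighted')
print(f1)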
As part of a group project at university, we are given a series of videos of cell cultures over a 24 hour period. A number of these cells (the "knockout" cells) have had a particular gene removed, which is often absent or mutated in malignancy. We are using a blob detection algorithm to identify the cell centers and radii and further processing to match cells frame-to-frame to build up individual paths, which we then use to calculate various features. We …
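For context, this is roughly what the blob-detection step could look like with scikit-image's Laplacian-of-Gaussian detector; the frame below is synthetic, and the sigma/threshold values are placeholders to tune against real footage:

import numpy as np
from skimage.draw import disk
from skimage.feature import blob_log

# synthetic grayscale frame with two bright "cells"
frame = np.zeros((200, 200))
for center in [(60, 60), (140, 120)]:
    rr, cc = disk(center, 12)
    frame[rr, cc] = 1.0

blobs = blob_log(frame, min_sigma=5, max_sigma=20, num_sigma=10, threshold=0.1)
blobs[:, 2] *= np.sqrt(2)            # convert sigma to an approximate radius
for y, x, r in blobs:
    print(f"cell centre ({x:.0f}, {y:.0f}), radius {r:.1f}")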
I keep trying to run a new set of data through my KNN classifier but receive the message:

ValueError: query data dimension must match training data dimension

I then used:

x_new = pd.read_csv('NewFeaturePractice.csv', names=attributes)
x_new = x_new.values.reshape(52,84)

(which is the shape of the training data) but then receive:

ValueError: cannot reshape array of size 672 into shape (52,84)

The second data set doesn't have the same number of rows as the first, meaning that even …
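If 84 is the number of feature columns the classifier was fit on, only that dimension has to match; the number of rows is allowed to differ, so the reshape doesn't need to hard-code 52. A sketch under that assumption (knn is a hypothetical name for the fitted classifier):

x_new = pd.read_csv('NewFeaturePractice.csv', names=attributes)
x_new = x_new.values.reshape(-1, 84)   # keep 84 feature columns, let the row count vary
print(x_new.shape)                     # 672 values -> (8, 84)
preds = knn.predict(x_new)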
If I have a dataset of $(x, y)$ and target $f$, how do I learn a model based on that dataset that allows me to insert a value of $f$ and get the optimal conditions $(x, y)$ that correspond to it? Thanks in advance.
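One simple baseline is to invert the mapping: fit a multi-output regressor that predicts $(x, y)$ from $f$, keeping in mind that if several $(x, y)$ pairs produce the same $f$ the model will return something like their average. A sketch with made-up data:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# made-up dataset: conditions (x, y) and the resulting target f
conditions = np.random.rand(500, 2)                   # columns: x, y
f = conditions[:, 0] ** 2 + 0.5 * conditions[:, 1]    # stand-in for the real response

# inverse model: predict (x, y) from f
inverse_model = RandomForestRegressor(n_estimators=200, random_state=0)
inverse_model.fit(f.reshape(-1, 1), conditions)

print(inverse_model.predict([[0.8]]))   # candidate (x, y) for a desired f = 0.8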
I have a dataset in the format:

Keywords                                            Disease/Drugs
bradycardia, insomnia, hypotension, hearinglos...   NSAIDS Poisoning
vomiting, nausea, diarrhea, seizure, edema, an...   NSAIDS Poisoning
pancreatitis, gi, symptoms, restlessness, leuk...   Chronic abacavir use (Nucleoside Analog Revers..
ards, apnea, hepatotoxicity, dyspnea, pulmonar...   Chronic stavudine and didanosine use (Nucleosi...

There is a lot of data, but it is all in this format. I converted the above data by splitting the Keywords column on "," and exploding it, creating new rows:

Keywords       Disease/Drugs
bradycardia    NSAIDS Poisoning
insomnia       NSAIDS Poisoning
pancreatitis   Chronic stavudine …
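For reference, the split-and-explode step itself can be done directly in pandas (the rows below are just a truncated stand-in for the real data):

import pandas as pd

df = pd.DataFrame({
    'Keywords': ['bradycardia, insomnia, hypotension', 'vomiting, nausea, diarrhea'],
    'Disease/Drugs': ['NSAIDS Poisoning', 'NSAIDS Poisoning'],
})

df['Keywords'] = df['Keywords'].str.split(',')   # turn the string into a list of keywords
df = df.explode('Keywords')                      # one row per keyword
df['Keywords'] = df['Keywords'].str.strip()
print(df)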
Consider a setup of one-dimensional data $\in \mathbb{R}^1$, where the hypothesis space $H$ is parametrized by $\{p, q\}$ and $x$ is classified as $1$ iff $p < x < q$. What will $\mathrm{VC}(H)$ be? Here's my approach: since the data is 1D, we can represent the hypothesis space on a number line. We will consider 2 points, try all possibilities, and see if they can all be classified correctly. Assume the data points are $d_1$ and $d_2$. Case 1: $p <$ …
So, I'm trying to work with decision trees on the Iris dataset. I've noticed by trying out different parameters (max_depth, number of leaves, etc.) that some of the classes are easier to predict (most of the trees give the same prediction for them). How do I justify this, and is there a way to visualize it based on different trees?
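One way to look at this is to combine per-class metrics with a plot of the fitted tree; on Iris, the tree usually separates one class (setosa) with a single split, which is why most parameter settings agree on it. A quick sketch with scikit-learn:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# per-class precision/recall shows which classes the tree finds easy
print(classification_report(iris.target, clf.predict(iris.data),
                            target_names=iris.target_names))

# visualize the fitted tree itself
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()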
I was wondering how image classifier networks perform on images that are not photographs. For example, if you were to feed a drawing of a car or a face to an image classifier that was only trained on photos, would the network still be able to classify the image correctly? Furthermore, what if you were to feed more and more abstract drawings into the network? As humans, we are able to recognize objects even in abstract forms (e.g., modern art) …
Betting markets offer betting lines for football matches, where you can bet over or under x offsides for a team. For example, for one match they can offer U4.5 offsides with odds 2.0/2.0 (let's assume there's no rake). For other matches, where for certain reasons there will be a lower likelihood of offsides during the match, they could offer U2.5 offsides at the same odds. Hence, I want to create a model (I have the data) which can give probabilities which can be …
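One common modelling route for this kind of over/under line is to estimate the expected number of offsides for the match (e.g. with a Poisson or negative-binomial regression on the match features) and then read the over/under probability off that count distribution. A minimal sketch, assuming offside counts are roughly Poisson and using a made-up expected value:

from scipy.stats import poisson

expected_offsides = 3.2                            # hypothetical model output for one team/match
p_under_4_5 = poisson.cdf(4, expected_offsides)    # P(offsides <= 4), i.e. "under 4.5"
p_over_4_5 = 1 - p_under_4_5

print(round(p_under_4_5, 3), round(p_over_4_5, 3))
print(round(1 / p_under_4_5, 2))                   # fair decimal odds for the under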
Could anyone point me to a blog or other content that talks about creating credit scorecards without logistic regression models? Instead, if we use an ensemble technique such as random forest, how can we create scorecards? Essentially, how do we create a scorecard model from a classifier that is difficult to interpret, like a random forest? Thanks in advance.
What I know: precision $= \frac{TP}{TP+FP}$ and recall $= \frac{TP}{TP+FN}$. What the book says: "A model that declares every record positive has high recall but low precision." I understand that if the number of predicted positives is high, precision will be low. But how will recall be high if the number of predicted positives is high? "A model that assigns a positive class to every test record that matches one of the positive records in the training set has very high precision but low recall." I am not able to properly …
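A quick worked example with made-up counts may help here: suppose the test set has $20$ actual positives and $80$ actual negatives, and the model declares every record positive. Then $TP = 20$, $FP = 80$, $FN = 0$, so
$$\text{Recall} = \frac{TP}{TP+FN} = \frac{20}{20+0} = 1, \qquad \text{Precision} = \frac{TP}{TP+FP} = \frac{20}{20+80} = 0.2.$$
Recall is high because predicting everything as positive means no actual positives are missed ($FN = 0$); precision is low because most of those predictions are wrong.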
I'm working with several estimators of all kinds. I then want to stack these estimators, and it is best if they have low correlation between them. I suppose that the correlation method depends on the type of the dependent variable, whether it's categorical or numerical. In my case it's categorical, and the estimators are classifiers. How can I compute the correlation between two estimators?
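For classifiers, one common proxy for "correlation" is how often two models agree on a held-out set, e.g. Cohen's kappa between their predictions (or the correlation between their predicted probabilities). A sketch with synthetic data and two hypothetical base estimators:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

preds_a = RandomForestClassifier(random_state=0).fit(X_train, y_train).predict(X_test)
preds_b = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

# kappa close to 1 means the two classifiers make nearly identical predictions
print(cohen_kappa_score(preds_a, preds_b))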