So I plan on making a mobile app that lets students predict their final grade based on their mock exam results. I can train my model with previous years' results: X = the 5 mock results, Y = the final grade obtained. However, I have the issue that sometimes, or most of the time, the user may be using the app without having taken ALL the mock exams yet; they may want to see if they are on track and use it once …
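One common way around the missing-mocks problem is to train on a summary of whichever mocks are available rather than on a fixed-length vector. A minimal sketch under that assumption (the historical data and the single-feature linear model below are made up purely for illustration):

```python
# Hypothetical sketch: fit final_grade ≈ a * mean(mocks) + b on complete
# historical records, then predict from the mean of whichever mocks the
# student has actually taken so far. All numbers are invented examples.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, pure Python."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Historical data: 5 mock scores per student plus the final grade.
history = [([60, 62, 65, 63, 70], 68),
           ([40, 45, 50, 48, 55], 52),
           ([80, 78, 85, 90, 88], 90)]

xs = [sum(mocks) / len(mocks) for mocks, _ in history]
ys = [final for _, final in history]
a, b = fit_line(xs, ys)

def predict(mocks_taken):
    """Works with any non-empty subset of the 5 mocks."""
    return a * (sum(mocks_taken) / len(mocks_taken)) + b
```

An alternative is to train one model per number-of-mocks-completed, which keeps per-mock information at the cost of five models.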
In a data classification problem (with supervised learning), what should the ideal difference between training set accuracy and testing set accuracy be? What would be an acceptable range? Is a difference of 5% between the accuracies of the training and testing sets okay, or does it signify overfitting?
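There is no universal standard for this gap; it depends on dataset size, variance, and the metric. As a purely illustrative rule of thumb (the 5% threshold below is an assumption, not an official cutoff):

```python
# Illustrative only: compare train and test accuracy and treat a large gap
# as a *possible* sign of overfitting. The threshold is an assumption.

def overfit_gap(train_acc, test_acc, threshold=0.05):
    gap = round(train_acc - test_acc, 6)
    return gap, gap > threshold

gap_a, flag_a = overfit_gap(0.97, 0.92)   # 5-point gap: borderline
gap_b, flag_b = overfit_gap(0.99, 0.80)   # 19-point gap: likely overfitting
```

In practice, cross-validation variance across folds is a more reliable signal than any single train/test gap.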
I am going to build a machine learning algorithm to identify fake tweets. The dataset contains a huge number of retweets, which I think might be an issue. Given that the focus is on original tweets, do you think it is better to remove all the retweets? Thank you,
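If you do decide to drop them, a minimal filtering sketch, assuming retweets are marked either by a boolean field in your data or by the conventional "RT @" text prefix (the tweet records below are made-up examples):

```python
# Drop retweets before training, keeping only original tweets.
# Field names here are assumptions about the dataset's schema.

tweets = [
    {"text": "Breaking: something happened", "is_retweet": False},
    {"text": "RT @news: Breaking: something happened", "is_retweet": True},
    {"text": "My own take on the story", "is_retweet": False},
]

def is_retweet(t):
    return t.get("is_retweet") or t["text"].startswith("RT @")

originals = [t for t in tweets if not is_retweet(t)]
```

Note that retweet counts can still be useful as a feature (popularity signal) even if the retweet texts themselves are removed as duplicates.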
I have two databases with around 60,000 samples each. Both have the same features (same column names), which represent particular things as text or categories (turned into numbers). Each sample within a database is assumed to refer to a different particular thing. But some objects are represented in both databases, yet with somewhat different values in the same-name columns (like different free-text descriptions, or being classified under another category). The aim is to train a machine learning model …
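Before training, it may help to flag the cross-database duplicates explicitly (a record-linkage step). A minimal sketch using fuzzy matching on the free-text column, where the records and the 0.8 threshold are assumptions for illustration:

```python
import difflib

# Flag likely cross-database duplicates by fuzzy-matching the free-text
# description columns; pairs above the threshold are treated as the same
# underlying object. All records below are invented examples.

db1 = [{"id": 1, "desc": "red sports car, two doors"},
       {"id": 2, "desc": "wooden dining table"}]
db2 = [{"id": "a", "desc": "red sport car with two doors"},
       {"id": "b", "desc": "office chair, black"}]

def similarity(a, b):
    return difflib.SequenceMatcher(None, a, b).ratio()

matches = [(r1["id"], r2["id"])
           for r1 in db1 for r2 in db2
           if similarity(r1["desc"], r2["desc"]) > 0.8]
```

At 60,000 rows per side, an all-pairs comparison is expensive; blocking on a shared categorical column first keeps it tractable.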
I'm handling a very conventional supervised classification task with three (mutually exclusive, non-ordinal) target categories: class1 class2 class2 class1 class3 and so on. Actually, in the raw dataset the categories are already represented as integers rather than strings as in my example, but randomly assigned ones: 5 99 99 5 27 I'm wondering whether it is required/recommended to re-assign zero-based sequential integers to the classes as labels instead of the ones above, like this: 0 1 1 0 2 …
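The re-mapping itself is a one-liner either way; what matters is that it is consistent and invertible (the particular order 5→0, 27→1, 99→2 below is arbitrary). A minimal sketch of what scikit-learn's `LabelEncoder` does internally:

```python
# Re-map arbitrary integer class labels (5, 99, 27) to zero-based
# sequential ones, keeping the inverse mapping for decoding predictions.

raw = [5, 99, 99, 5, 27]
classes = sorted(set(raw))              # [5, 27, 99]
to_index = {c: i for i, c in enumerate(classes)}

encoded = [to_index[c] for c in raw]    # zero-based sequential labels
decoded = [classes[i] for i in encoded] # recovers the original labels
```

Tree-based models genuinely don't care which integers you use, but some frameworks (e.g. losses that index into a logits array) require labels in `0..K-1`, so the zero-based form is the safer default.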
I'm using a dataset containing about 1.5M documents. Each document comes with some keywords describing its topics (thus it is multi-labelled). Each document belongs to several authors (not just one author per document). I want to find out the topics each author is interested in by looking at the documents they write. I'm currently looking at an LDA variation (Labeled LDA, proposed by D. Ramage: https://www.aclweb.org/anthology/D/D09/D09-1026.pdf). I'm using all the documents in my dataset to train a model and using the model to …
I am new to Machine Learning and Data Science. By spending some time online, I was able to understand the perceptron learning rule fairly well, but I am still clueless about how to apply it to a set of data. For example, we may have the following values of $x_1$, $x_2$ and $d$ respectively: \begin{align}&(0.6 , 0.9 , 0)\\ &(-0.9 , 1.7 , 1)\\ &(0.1 , 1.4 , 1)\\ &(1.2 , 0.9 , 0)\end{align} I can't think of how to …
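A minimal sketch of running the perceptron learning rule on exactly those four samples: weights and bias start at zero, the activation is a step function, and after each misclassified sample we update $w \leftarrow w + \eta(d - y)x$ and $b \leftarrow b + \eta(d - y)$ (the learning rate 0.1 is an arbitrary choice):

```python
# Perceptron learning rule applied to the four (x1, x2, d) samples
# from the question. The data is linearly separable, so this converges.

data = [((0.6, 0.9), 0), ((-0.9, 1.7), 1), ((0.1, 1.4), 1), ((1.2, 0.9), 0)]

def step(z):
    return 1 if z >= 0 else 0

w = [0.0, 0.0]
b = 0.0
lr = 0.1
for epoch in range(100):
    errors = 0
    for (x1, x2), d in data:
        y = step(w[0] * x1 + w[1] * x2 + b)
        if y != d:                       # update only on mistakes
            w[0] += lr * (d - y) * x1
            w[1] += lr * (d - y) * x2
            b += lr * (d - y)
            errors += 1
    if errors == 0:                      # a clean pass means convergence
        break
```

On this data the loop converges after a couple of epochs to a line separating the two $d=1$ points from the two $d=0$ points.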
I am working on a relation extraction and classification problem. The data is in the form of text files and is imbalanced. I want to use the focal loss function to address the class imbalance in the data. My question is: can focal loss be utilized for an extraction and classification task to increase accuracy? Focal loss has been applied to object detection and image classification tasks; the link is below. I want to use it on text …
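Focal loss itself is just a reweighting of cross-entropy and is not tied to images, so nothing stops it from being dropped into a text classifier that outputs probabilities. A minimal binary sketch following Lin et al.'s formulation (the `alpha=0.25`, `gamma=2.0` defaults are the ones commonly quoted from the paper):

```python
import math

# Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
# Easy, well-classified examples are down-weighted by (1 - p_t)^gamma,
# which is the mechanism that helps with class imbalance.

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """p: predicted probability of the positive class; y: 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct example contributes far less than a hard one:
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

In a deep learning framework you would implement the same expression on logits for numerical stability, but the weighting logic is identical.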
I have a database holding 10-ish features that describe different breeds of dogs. They are mostly categorical features, but some provide ranges of values. Here's a demo representation of the database, showing the mixture:

|Breed|Min_Height|Max_Height|Min_Weight|Max_Weight|sub_cat|is_friendly|
|-----|----------|----------|----------|----------|-------|-----------|
|Dober|20        |20        |40        |52        |sport  |FALSE      |
|Pood |15        |25        |35        |45        |water  |TRUE       |

...

As you can see, the data is mixed and the ranges have some overlap from entry to entry. Say I receive an input of:

|height|weight|sub_cat|is_friendly|
|------|------|-------|-----------|
|16    |43 …
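One simple baseline before reaching for a learned model: score each breed by how many of the query's fields fall inside the stored ranges or match the categorical values, and return the best scorer. The equal-weight scoring scheme below is an assumption for illustration:

```python
# Rank breeds against a query: +1 for each range the query falls inside
# and each categorical field that matches. Records mirror the demo table.

breeds = [
    {"breed": "Dober", "min_h": 20, "max_h": 20, "min_w": 40, "max_w": 52,
     "sub_cat": "sport", "is_friendly": False},
    {"breed": "Pood", "min_h": 15, "max_h": 25, "min_w": 35, "max_w": 45,
     "sub_cat": "water", "is_friendly": True},
]

def score(b, q):
    s = 0
    s += b["min_h"] <= q["height"] <= b["max_h"]   # height in range
    s += b["min_w"] <= q["weight"] <= b["max_w"]   # weight in range
    s += b["sub_cat"] == q["sub_cat"]              # category match
    s += b["is_friendly"] == q["is_friendly"]      # boolean match
    return s

query = {"height": 16, "weight": 43, "sub_cat": "water", "is_friendly": True}
best = max(breeds, key=lambda b: score(b, query))
```

A learned classifier would effectively tune those weights, but the interval-membership encoding (is the value inside [min, max]?) is a reasonable way to featurize the ranges either way.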
I'm a beginner and I have a question. Can clustering results based on probability be used for supervised learning? I have manufacturing data with 80,000 rows. It is not labeled, but I know that the defect rate is 7.2%. Can the result of clustering, with hyperparameters tuned based on the defect rate, be applied to supervised learning? Is there a paper on this? Is this method a big problem from a data perspective? When using this method, what is the verification …
The response variable in a regression problem, $Y$, is modeled using a data matrix $X$. In notation, this means: $Y \sim X$. However, $Y$ can be separated into different components that can be modeled independently: $$Y = Y_1 + Y_2 + Y_3$$ Under what conditions would $M$, the overall prediction, perform better or worse than $M_1 + M_2 + M_3$, the sum of the individual models? To provide more background, the model used is a GBM. I was surprised …
I am trying to design an algorithm that takes in a new user with variables such as department, location, job_role, etc., and I want a machine learning algorithm to decide what software and hardware this new user would need. I am racking my brain thinking about how I could get this to work. I could use a supervised learning approach and train a model on a dataset of already employed users and the software and hardware they use; however, the variables in …
I’m using supervised learning with an LSTM network to predict forex prices. To achieve this I’m using the deeplearning4j library, but I have doubts about several points of my implementation. I turned off the mini-batch feature, then created many trading indicators from the forex data. The point is to provide random chunks of data to the neural network on every epoch and to ensure that after every epoch the network state is cleared. To achieve this I created a dataset iterator …
I'm using supervised learning on monthly activity data to predict when a customer buys a particular product. This product is typically bought infrequently and at the moment my target variable is whether the customer buys the product in the next twelve months. Assume that for every customer I get a set of features every month, $x_1,x_2,\ldots,x_n$. The goal is to use these features to predict whether $y=0$ or $y=1$ ($y$ is 1 if the customer did buy the product in …
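A concrete way to set this up is to build the label per customer-month from the purchase history: $y=1$ at month $m$ if a purchase occurs anywhere in months $m+1,\ldots,m+12$. A minimal sketch (the toy purchase history is invented for illustration):

```python
# Build a 12-month-forward binary target from a customer's purchase months.
# Each month's row of features x_1..x_n would be paired with labels[m].

def make_labels(purchase_months, n_months, horizon=12):
    """purchase_months: set of month indices in which the customer bought."""
    labels = []
    for m in range(n_months):
        future = range(m + 1, m + 1 + horizon)
        labels.append(1 if any(f in purchase_months for f in future) else 0)
    return labels

# Example: a customer who buys in month 14 of a 24-month history.
y = make_labels({14}, 24)
```

One caveat with this construction: consecutive months share most of their 12-month windows, so rows are highly correlated and the train/test split should be done by customer (or by time), not by row.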
Disclaimer: Mathematicians, please don't be mad at me for the use of some of the terminology in this post. I am an engineer. :-) Background: I am currently working on a problem where I have to generate a time-series sequence of a process in which n actors are moving in a 2D space, the process being learned by some machine learning model M. But I don't know if this is even possible. BTW, I have never worked with …
I am currently working on an LBSN (location-based social network) system and I need to predict users' age and gender. Every time a user enters a venue, the system creates a "check-in" with the user, the venue and the datetime. Every venue is categorized using Foursquare Venue Categories. The system generates a weighted concept hierarchy to represent the interest level between a user and a venue category. Is it possible to predict the user's age and gender using the …
Let's say I have 100 values in my dataset and split it 80% train / 20% test. When predicting the last value, is the prediction based on the previous 99 values (80 train + 19 already-predicted values) or only on the original 80 train values? For example, if a kd-tree is used, is every data point inserted into the tree during prediction? Is it possible to use kNN for the following scenario? I have 20 train values; when I add a new observation, I …
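With standard implementations the answer is "only the original train values": `fit` freezes the training set, and `predict` does not insert queries into the tree. The incremental behaviour asked about has to be built explicitly. A minimal sketch of that variant (1-D data and the self-labelling scheme are assumptions for illustration):

```python
# "Incremental" k-NN: each observed point is appended to the training set
# after prediction, so later predictions see all earlier points -- unlike
# a standard fit/predict workflow, where the training set is frozen.

def knn_predict(train, x, k=3):
    """train: list of (value, label) pairs; x: scalar query."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lbl for _, lbl in nearest]
    return max(set(labels), key=labels.count)   # majority vote

train = [(i, 0) for i in range(10)] + [(i, 1) for i in range(50, 60)]
stream = [12, 13, 55]
for x in stream:
    y = knn_predict(train, x)
    train.append((x, y))      # the new observation becomes training data
```

Note the risk in this scheme: a wrong self-assigned label is fed back as ground truth, so errors can compound over the stream.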
I have been doing anomaly detection recently. One of the methods is using an AE (autoencoder) model to learn the pattern of normal samples, and determining a sample to be abnormal if it doesn't match that pattern. I train the AE without labels, but we need to use labels to determine which samples are normal or abnormal. I am wondering what kind of training this is: supervised, semi-supervised, or unsupervised learning?
My current dataset has a shape of 5300 rows by 160 columns, with a numeric target variable in the range [641, 3001]. That's no big dataset, but it should in general be enough for decent regression quality. The columns are features from different consecutive process steps. The project goal is to predict the numerical variable, with the objective of being very precise in the range up to 1200, which covers 115 rows (2.1%). For target values above 1200 the precision can be lower …
I have a large batch of email data that I want to analyse. In order to do that, I need to first prepare the data, as the messages are quite often >80% noise. Generally speaking, my dataset's structure is nowhere near that of the ENRON dataset. I need to get rid of signatures, headers and, most importantly, automatically appended legal / security disclaimers. I have been doing some research and so far I've seen two supervised learning approaches to this …
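Before (or alongside) a supervised approach, a rule-based first pass often removes most of the boilerplate. A rough sketch that cuts everything from a signature delimiter or a disclaimer keyword onwards; the patterns are assumptions and would need tuning on the actual corpus:

```python
import re

# Rule-based email cleanup: find the earliest occurrence of any noise
# marker and truncate the body there. Patterns are illustrative guesses.

CUT_PATTERNS = [
    r"(?m)^--\s*$",                           # conventional signature delimiter
    r"(?i)this e-?mail .{0,40}confidential",  # typical legal disclaimer opener
    r"(?im)^disclaimer\b",
]

def strip_noise(body):
    cut = len(body)
    for pat in CUT_PATTERNS:
        m = re.search(pat, body)
        if m:
            cut = min(cut, m.start())         # earliest marker wins
    return body[:cut].rstrip()

msg = ("Hi team, the meeting moved to 3pm.\n"
       "-- \n"
       "John Doe\n"
       "This email and any attachments are confidential.")
clean = strip_noise(msg)
```

Since appended disclaimers are usually identical across a whole organisation's mail, counting exact duplicate trailing paragraphs across the batch is another cheap, corpus-specific way to find them before training anything.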