Model Undetermined Number of Labels

I'm looking for tutorials on how to build a TensorFlow model that generates predictions from input (for example, generating sentences from a paragraph), where the loss is determined by comparing the predictions to ground-truth labels. Another example is generating a number of predictions for objects found in an image. The main idea is having an undetermined number of predictions or labels.
Category: Data Science

Prediction issue with xgboost custom loss

I have an issue with xgboost custom objectives: I cannot get consistent forecasts. In other words, the scale of my forecasts is not in line with the values I would like to predict. I tried many custom losses, but I always get the same issue. import numpy as np import pandas as pd import xgboost as xgb from sklearn.datasets import make_regression n_samples_train = 500 n_samples_test = 100 n_features = 200 X, y = make_regression(n_samples_train, n_features, noise=10) X_test, y_test …
Category: Data Science

Random Forest Classifier Output

I used a RandomForestClassifier for my prediction model, but the printed output is either 0 or a decimal. What do I need to do for my model to output 0s and 1s instead of decimals? Note: I used feature importance and removed the least important columns; the accuracy is still the same and the output hasn't changed much. Also, I have the number of estimators set to 1000. Should I increase or decrease this? Edit: target col 1 0 0 1 output col 0.994 …
Category: Data Science
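
One possible reading of the random forest question above is that the decimals are class probabilities (or regressor outputs) rather than class labels. A minimal sketch, assuming a 0.5 cutoff and made-up scores, of turning such scores into 0/1 labels; `to_labels` is a hypothetical helper, not part of scikit-learn:

```python
def to_labels(scores, threshold=0.5):
    """Map each score in [0, 1] to a hard 0/1 label."""
    return [1 if s >= threshold else 0 for s in scores]


# Illustrative scores only; they are not taken from the question's data.
scores = [0.994, 0.0, 0.31, 0.76]
labels = to_labels(scores)
print(labels)
```

In scikit-learn, `RandomForestClassifier.predict` already returns class labels, while `predict_proba` returns probabilities, so decimals usually mean that probabilities or a regressor are being printed.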

Merging two datasets with different features for machine learning prediction

I'm trying to create a machine learning model that predicts real estate prices with xgboost. My question is: can I combine two datasets to do it? First dataset: 13 features. Second dataset: 100 features. The difference between the two datasets is that the first contains real estate transactions from 2018 to 2021 with features like area and region, while the second also contains transactions, but from 2011 to 2016 and with more features like …
Category: Data Science

Very low probability in naive Bayes classifier 1

I have some training data (TRAIN) and some test data (TEST). Each row of each table contains an observed class (X) and some columns of binary (Y). I'm using a Python script that is intended to predict the probability (Pr) of X given Y in the test data based on the training data. It uses a Bernoulli naive Bayes classifier. Here is my script: https://stackoverflow.com/questions/55187516/look-up-bernoullinb-probability-in-dataframe It works on the dummy data that is included with the script. On the real …
Category: Data Science

Help with Time Series prediction

I'm a complete n00b to both this stackexchange and ML so please don't flame me too bad. I am trying to make a prediction from time series data. I have about 10 years' worth of 1-minute resolution price data for the S&P500. What I'd like to do is treat each DAY in the data as its own series to predict what the price movement will be for the last 15 minutes of market hours. I've looked through several books, some …
Category: Data Science

How to make predictions of multiple input samples at once in tf 2 with keras

I am quite confused about the output of model.predict. When I validate my model on around 6000 samples after training, I use the following pseudo code: model.fit(...) predictions = model.predict(val_set) len(predictions) # == len(val_set) Result: a tensor array of shape=(len(tensor_array),14) (one prediction for each input sample). In production I currently use the following code to OCR image numbers: model = tf.keras.models.load_model('number_ocr_v2') def predict(list_images): global model print("length:") print(len(list_images)) #predictions = model.predict(list_images) # <- Same result predictions = model.predict_on_batch(list_images) print(len(predictions)) print(predictions) console Output: …
Category: Data Science

Predicting Customer Activity Absence

Could you please assist me with the following question? I have a customer activity dataframe that looks like this: It contains at least 500,000 customers and a "timeseries" of 42 months. The ones and zeroes represent customer activity. If a customer was active during a particular month then there will be a 1, if not, a 0. I need to determine those customers who most likely (+ probability) will not be active during the next 6 months (2018 July-December). Could you …
Category: Data Science
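
One way to frame the customer activity question above as supervised learning is to cut each history into a feature window and a 6-month outcome window. A minimal sketch, assuming a 36-month feature window and made-up histories; `make_example` and its parameters are hypothetical names, not from the question:

```python
def make_example(history, feature_months=36, horizon=6):
    """Split one customer's 0/1 monthly history into features and a label.

    The label is 1 when the customer is inactive in every month of the
    horizon that follows the feature window, otherwise 0.
    """
    if len(history) < feature_months + horizon:
        raise ValueError("history is shorter than feature window plus horizon")
    features = history[:feature_months]
    future = history[feature_months:feature_months + horizon]
    label = 1 if sum(future) == 0 else 0
    return features, label


# Two made-up customers with 42 months of activity each.
active = [1] * 42
lapsed = [1] * 30 + [0] * 12
examples = [make_example(h) for h in (active, lapsed)]
print([label for _, label in examples])
```

Any probabilistic classifier trained on such examples would then give the "(+ probability)" part when applied to the most recent window of each customer.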

How to Predict/Forecast street's Traffic based on previous values?

I have a dataset which has the following 5 columns: date, hour, day_of_week, street_id, counts. My dataset has information about the number of cars that each street (same city) has in a given hour of a certain date, and I want to predict the traffic count that a certain street has in a given hour of a certain date. I think I could use certain variables depending on the day and hour that I want to predict, for example, if …
Category: Data Science

Why are predictions from my LSTM Neural Network lagging behind true values?

I am running an LSTM neural network in R using the keras package, in an attempt to do time series prediction of Bitcoin. The issue I'm running into is that while my predicted values seem to be reasonable, for some reason, they are "lagging" or "behind" the true values. Right below is some of my code, and farther down I have some graphs to show you what I mean. My model code: batch_size = 2 model <- keras_model_sequential() model%>% layer_lstm(units=22, …
Category: Data Science
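
A possible explanation for the lag described above is that the network has learned something close to "repeat the last observed value". One way to check is to compare its error against a naive persistence baseline. The sketch below is in Python rather than the question's R, and every number in it is made up for illustration:

```python
def mean_abs_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(series):
    """Predict each value as the previously observed value."""
    return series[:-1]


series = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up prices
model_preds = [100.5, 101.5, 101.5, 105.5]    # hypothetical model output for series[1:]

actual = series[1:]
baseline_mae = mean_abs_error(actual, persistence_forecast(series))
model_mae = mean_abs_error(actual, model_preds)
print(baseline_mae, model_mae)
```

If the model's error is not clearly lower than the baseline's, the predictions carry little information beyond the previous value, which shows up in a plot as a shifted copy of the true series.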

Time Series Classification for loan data

I have multiple columns for loan installment repayment. As there is a field for month of repayment, I want to predict if the customer is going to pay next month's installment or not. As I have multiple variables and target variable as installment paid (Y/N), despite repayment being dependent on time variable, i.e., installments paid in past months, I'm looking to solve this problem with time series classification. Any references will be appreciated.
Category: Data Science

How would I check the validity of covariates in my linear model on several hundred datasets?

I have this linear model with predictors that I need to prove are statistically significant and pass the necessary lm assumptions. I know for a single dataset, I can use various LM tests, but the problem is I have several hundred datasets which cannot be combined. Coefficients may be different for each dataset, but I just need to prove (or disprove) that the covariates can be used for lm across all models. I'm assuming I shouldn't run tests on each LM …
Category: Data Science

Logistic Regression for prediction

I would like to ask about the theoretical approach of using Logistic Regression for customer data and more specifically Churn Prediction (in BigQuery and Python). I have my customer data for an online shop and I would like to predict if the customer will churn based on some characteristics. I have created my dataset and the Churn label (based on the hypothesis that if the customer hasn't bought something in the last year then it is assumed that the customer …
Category: Data Science

prediction using LSTM

I have training data from 2015-2017 and testing data from 2018. My data is a multivariate time series with multiple variables. I want to predict 2019 data using the 2018 test data. Is it possible? I am confused about how long short-term memory (LSTM) neural networks work and what they actually do. Does my problem come under multivariate multi-step forecasting or multivariate single-step forecasting?
Topic: prediction
Category: Data Science

Looking for a ML algorithm to predict a path based on millions of data

I have a dataset with the following data format:
3 -> a -> b -> c -> d -> ikd
a -> c -> 3 -> dk -> 2 -> l2i
Each row represents a path from start to end. Let's take the first row as an example. The start point is 3 and the endpoint is ikd. I have millions of rows like that. And each row may have a different length. What I want to do is let users …
Category: Data Science
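
As a simple starting point for path data like the rows above, one could count how often each node follows another and use the counts as a first-order transition model. This is only a sketch under that assumption; the helper names are hypothetical, and the excerpt does not say what the final goal is:

```python
from collections import Counter, defaultdict


def parse_path(row):
    """Split 'a -> b -> c' into ['a', 'b', 'c']."""
    return [node.strip() for node in row.split("->")]


def count_transitions(rows):
    """Count, for every node, which nodes directly follow it."""
    transitions = defaultdict(Counter)
    for row in rows:
        nodes = parse_path(row)
        for current, nxt in zip(nodes, nodes[1:]):
            transitions[current][nxt] += 1
    return transitions


def most_likely_next(transitions, node):
    """Return the most frequent successor of node, or None if unseen."""
    followers = transitions.get(node)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


rows = [
    "3 -> a -> b -> c -> d -> ikd",
    "a -> c -> 3 -> dk -> 2 -> l2i",
]
transitions = count_transitions(rows)
print(most_likely_next(transitions, "b"))
```

With millions of rows the same counting can be done in one streaming pass, and longer contexts (the last two or three nodes instead of one) are a natural extension.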

Time series prediction using ARIMA vs LSTM

The problem that I am dealing with is predicting time series values. I am looking at one time series at a time and, based on for example 15% of the input data, I would like to predict its future values. So far I have come across two models: LSTM (long short-term memory; a class of recurrent neural networks) and ARIMA. I have tried both and read some articles on them. Now I am trying to get a better sense on …
Category: Data Science

massively imbalanced data

I am dealing with time series data with 200K+ records (one every minute for 6 months) from a gas turbine, and I am trying to detect faults early (0 or 1 = fault). The issues with the data are: 1. The fault occurred only 5 times (identified by observing sudden shutdowns), which makes the data hugely imbalanced. 2. (Unsupervised) There is no binary output. I used 2 of the variables as my output and used them for binary clustering (k-means), but the results are not very good as there are false …
Category: Data Science

Get negative predicted value in Support Vector Regression (SVR)

I am predicting Covid-19 cases using SVR and getting negative values, although the number of Covid-19 cases can never be negative. The input features I used are a mobility factor (which contains negative values) and daily Covid-19 cases. The kernel I used is RBF. Can anyone explain why I am getting negative values? Does the independent variable (mobility) that I used influence that?
Category: Data Science
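
SVR is an unconstrained regressor, so nothing in the model itself keeps its output above zero. A common workaround, sketched here with made-up numbers, is to train on log-transformed counts and clip after transforming back; the helper names and the "model outputs" are hypothetical:

```python
import math


def to_log_scale(counts):
    """Transform non-negative counts with log(1 + x) before training."""
    return [math.log1p(c) for c in counts]


def from_log_scale(values):
    """Invert log1p and clip at zero so results are never negative."""
    return [max(0.0, math.expm1(v)) for v in values]


daily_cases = [0, 12, 150]           # made-up training targets
targets = to_log_scale(daily_cases)  # what the regressor would be fit on

raw_predictions = [-0.3, 2.5, 5.0]   # hypothetical model outputs on the log scale
cases = from_log_scale(raw_predictions)
print(cases)
```

The clipping step matters because `expm1` of a negative log-scale prediction is still slightly below zero.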

Where should I find electrolytic capacitor ageing data

I am trying to find a dataset on the ageing of electrolytic capacitors, but I have not been able to find one that includes the ripple current and the voltage needed to calculate the Equivalent Series Resistance (a useful parameter for checking degradation). I have looked on the typical sites (kaggle, dataworld...) but found none. Could someone recommend a site? Thank you!
Category: Data Science

How does an RNN differ from a CBOW model

CBOW: We are trying to predict the next word based on the context (defined as a certain window of words around the target word). An RNN can also be used for predicting the next word in a sequence, where each time the input is the present input and the recent past (i.e. output of the previous step). I am not able to understand how the RNN's approach is somehow better, because I could define a very large window for CBOW, and …
Category: Data Science
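
One concrete difference relevant to the question above is word order: CBOW averages its context vectors, so reordering the window changes nothing, whereas a recurrent update depends on the order of its inputs. A toy sketch, assuming 2-dimensional made-up vectors and arbitrary scalar weights (`w_in`, `w_rec`) in place of learned matrices:

```python
import math


def cbow_context(vectors):
    """Average the context vectors, as CBOW does; order is ignored."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rnn_state(vectors, w_in=0.5, w_rec=0.9):
    """Toy recurrent update: h = tanh(w_in * x + w_rec * h), element-wise."""
    h = [0.0] * len(vectors[0])
    for x in vectors:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
    return h


context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # made-up word vectors
reordered = list(reversed(context))

print(cbow_context(context) == cbow_context(reordered))
print(rnn_state(context) == rnn_state(reordered))
```

Enlarging the CBOW window adds more words to the average but still discards their order, which is one reason a larger window is not equivalent to a recurrent model.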
