What Non-linearities are best in Denoising RNN Autoencoders and where should they go?

I’m using a denoising RNN autoencoder for a project involving motion capture data. This is my first time using autoencoder architectures, and I was wondering which non-linearities should be placed in these models and where they should go. This is my model as it stands: class EncoderRNN(nn.Module): def __init__(self, input_size, hidden_size, num_layers): super(EncoderRNN, self).__init__() self.input_size = input_size self.hidden_size = hidden_size self.num_layers = num_layers self.rnn_enc = nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True) self.relu_enc = nn.ReLU() def forward(self, x): pred, hidden …
Category: Data Science

Classification or Regression approach?

I have a dataset with input variables x and a target y that is a proportion between 0 and 100%, i.e. between 0 and 1. My goal is to predict whether a sample falls in the group y ∈ [0, 0.25), [0.25, 0.5) or [0.5, 1]. I am wondering whether I should use a classification model and number these groups with 3 labels [0, 1, 2], or perform a regression to obtain a specific value (e.g. 0.18, i.e. 18%) and assign the group afterwards. Which approach should be used / would yield better …
Category: Data Science
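If you go the regression route, the grouping step afterwards is just a lookup against the interval boundaries. A minimal sketch, assuming the target is a proportion in [0, 1] and the groups are the three half-open intervals from the question; the `predictions` values are made up for illustration, and which approach actually performs better remains an empirical question for your data:

```python
import bisect

# Group boundaries from the question: [0, 0.25) -> 0, [0.25, 0.5) -> 1, [0.5, 1] -> 2
BOUNDARIES = [0.25, 0.5]

def to_group(y):
    """Map a proportion in [0, 1] to its group label (0, 1 or 2)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError(f"expected a proportion in [0, 1], got {y}")
    # bisect_right sends a value equal to a boundary into the upper group,
    # which matches the half-open intervals [0, 0.25) and [0.25, 0.5).
    return bisect.bisect_right(BOUNDARIES, y)

# Hypothetical regression outputs, grouped after the fact.
predictions = [0.18, 0.25, 0.49, 0.5, 0.93]
print([to_group(p) for p in predictions])  # [0, 1, 1, 2, 2]
```

With a regressor you can also report how close a prediction sits to a boundary, which a 3-class classifier does not give you directly.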

r or r+1 in Temporal Difference Learning?

This is probably a very simple question for most of you, but I have seen different formulations of the TD learning update in many different papers and can't really wrap my head around them. Just as an example: in the English Wikipedia (https://en.wikipedia.org/wiki/Temporal_difference_learning) you see the value being updated based on the immediate reward, whereas in the German Wikipedia (https://de.wikipedia.org/wiki/Temporal_Difference_Learning) it is updated based on the reward in the upcoming trial. This is not the same equation since the …
Category: Data Science
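For reference, here is tabular TD(0) written in the convention of Sutton and Barto's textbook, V(S_t) ← V(S_t) + α[R_{t+1} + γV(S_{t+1}) − V(S_t)], where R_{t+1} is the reward produced by the transition out of S_t. Other texts call that same reward r_t, so the two notations can describe the same update. Whether the two Wikipedia articles differ only in this indexing is an assumption here. A minimal sketch with a dict as the value table:

```python
def td0_update(V, s, reward, s_next, alpha=0.1, gamma=0.99, terminal=False):
    """One tabular TD(0) update of the state-value table V (a dict).

    `reward` is the reward received on the transition s -> s_next. Sutton &
    Barto write it as R_{t+1}; other texts index the same quantity as r_t.
    """
    # Bootstrap from the next state's value unless the episode ended there.
    next_value = 0.0 if terminal else V.get(s_next, 0.0)
    td_target = reward + gamma * next_value
    td_error = td_target - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error

V = {"A": 0.0, "B": 1.0}
err = td0_update(V, "A", reward=1.0, s_next="B", alpha=0.5, gamma=0.9)
print(V["A"])  # approximately 0.95
```

The subscript only matters when you store rewards in an array: decide once whether `rewards[t]` is the reward that follows `states[t]`, and index consistently.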

How to manually calculate the gradient that will propagate back over the network using the REINFORCE algorithm?

I am trying to implement the REINFORCE policy-gradient algorithm for deep reinforcement learning in C++, and in my case there is no "autograd" method like in PyTorch, so I have to calculate the gradient manually. Let's imagine a scenario where the state space size is 4 and the action space size is 2 (CartPole). I also collected the following data for 3 steps: action probability (softmax): [0.21, 0.34, 0.45], [0.91, 0.01, 0.08], [0.50, 0.30, 0.20] sampled action (one-hot encoded): …
Category: Data Science
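For a softmax policy, the derivative of −log π(a) with respect to the logits is π − one_hot(a), so the REINFORCE loss −Σ_t G_t log π(a_t | s_t) sends G_t · (π_t − one_hot(a_t)) back into the layer that produced the logits; from there you backpropagate through the network by hand as usual. A minimal sketch in Python (easy to port to C++); the probabilities are the ones from the question, while the actions and rewards are hypothetical because they are cut off:

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_logit_grads(probs, actions, returns):
    """Gradient of L = -sum_t G_t * log pi(a_t | s_t) w.r.t. the logits.

    For a softmax output, d(-log pi(a)) / d logits = pi - one_hot(a), so each
    step contributes G_t * (pi_t - one_hot(a_t)). Subtracting lr * gradient
    raises the logit of the chosen action when G_t > 0.
    """
    grads = []
    for p, a, g in zip(probs, actions, returns):
        grads.append([g * (p_k - (1.0 if k == a else 0.0)) for k, p_k in enumerate(p)])
    return grads

# Probabilities from the question; actions and rewards are hypothetical.
probs = [[0.21, 0.34, 0.45], [0.91, 0.01, 0.08], [0.50, 0.30, 0.20]]
actions = [2, 0, 1]
rewards = [1.0, 1.0, 1.0]
G = discounted_returns(rewards, gamma=0.99)  # approximately [2.9701, 1.99, 1.0]
for g in reinforce_logit_grads(probs, actions, G):
    print(g)
```

Many implementations also average these per-step gradients over the episode or normalize the returns first; both are optional variance-reduction choices, not part of the basic estimator.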

1D Convolution on multiple channels of varying length

Every data point in my dataset consists of 3 time series. The data in each time series is discretized into equal time bins, but the 3 series were measured over different lengths: series 1 has 10 bins, series 2 has 5 bins and series 3 only a single bin. So an example data point looks like this: series 1: 1, 4, 1, 7, 3, 7, 3, 7, 3, 1; series 2: 9, 6, 4, 7, 1; series 3: 4. I would like to run two 1D convolutional layers …
Category: Data Science
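One common workaround, not necessarily the best one for this data, is to right-pad the shorter series to the length of the longest and carry a mask, so the three series can be stacked as channels of one 1D convolution. A minimal pure-Python sketch of that idea (the zero pad value and the example kernel are arbitrary choices for illustration):

```python
def pad_channels(series, pad_value=0.0):
    """Right-pad each series to the longest length; return (padded, mask).

    The mask marks real bins with 1 and padding with 0, so later layers (or a
    masked pooling step) can ignore the padded positions.
    """
    max_len = max(len(s) for s in series)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in series]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in series]
    return padded, mask

def conv1d_valid(x, kernel):
    """Multi-channel 1D convolution (cross-correlation), one filter, no padding.

    x: C lists of length L; kernel: C lists of length K. Output length L - K + 1.
    """
    length, k = len(x[0]), len(kernel[0])
    return [
        sum(kernel[c][j] * x[c][t + j] for c in range(len(x)) for j in range(k))
        for t in range(length - k + 1)
    ]

# The example data point from the question.
point = [[1, 4, 1, 7, 3, 7, 3, 7, 3, 1], [9, 6, 4, 7, 1], [4]]
padded, mask = pad_channels(point)
out = conv1d_valid(padded, [[1, 0, -1], [0, 0, 0], [0, 0, 0]])
print(out)
```

The catch is visible here: series 3 becomes one real value followed by nine padded zeros, so the convolution mostly sees padding on that channel. Running a separate convolutional branch per series and concatenating the pooled outputs avoids this.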

Response variable at the group level, independent variables at the entity level

I have a dataset of entities, each of which belongs to a particular group (e.g. the entities are schools and the groups are school districts). I also have many auxiliary variables for each entity. However, for my response variable I only have information at the group level (i.e. the response variable is only available at the district level, while the regressors are at the school level). Can someone recommend an algorithm or class of algorithms that might be appropriate here? I'd rather not aggregate all the auxiliary information up to …
Category: Data Science

How to change/adapt loss function while using "class" incremental learning

I am a beginner in class-incremental learning and am trying to understand the general concept. In class-incremental learning, we have a model that can classify between classes A, B, and C. Using data from another class D, we want to apply class-incremental learning to obtain a model that can predict among classes A, B, C and D without retraining from scratch on the data for all classes. My question is, since our softmax (that we use to …
Category: Data Science
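One widely used way to adapt the loss, in the spirit of Learning without Forgetting, is to grow the softmax head to four outputs and add a distillation term that keeps the new model's outputs on the old classes close to those of the frozen old model. A minimal pure-Python sketch for a single example; the logits, `lam` and temperature `T` are hypothetical values, and some formulations additionally scale the distillation term by T², which is omitted here:

```python
import math

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard cross-entropy over ALL classes (old + new)."""
    return -math.log(softmax(logits)[target])

def distillation_loss(new_logits_old_classes, old_logits, T=2.0):
    """Cross-entropy between the frozen old model's softened outputs and the
    new model's softened outputs, restricted to the old classes."""
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits_old_classes, T)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))

def incremental_loss(new_logits, old_logits, target, n_old, lam=1.0, T=2.0):
    """Total loss = CE on the true label + lam * distillation on old classes.

    new_logits: outputs of the expanded head (A, B, C, D).
    old_logits: outputs of the frozen old model (A, B, C) on the same input.
    """
    ce = cross_entropy(new_logits, target)
    kd = distillation_loss(new_logits[:n_old], old_logits, T)
    return ce + lam * kd

# Hypothetical logits for one training example of the new class D (index 3).
new_logits = [1.0, 0.5, -0.2, 2.0]
old_logits = [1.2, 0.4, -0.1]
print(incremental_loss(new_logits, old_logits, target=3, n_old=3))
```

`lam` trades plasticity (learning D) against stability (not forgetting A, B, C); setting it to zero reduces to plain fine-tuning, which is what typically causes catastrophic forgetting.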

Furthering brewing education

I've been through a bunch of books, resources and articles and I'm wondering if there are any specific classes/fields that I can audit. There aren't any brewing programs near me in the formal sense so I'm looking for college-level classes or something that would benefit my homebrewing and further a potential career.
Category: Mac

Individual models give similar distributions on the test set, whereas ensembling gives a better result but a very different distribution

I am working on a binary classification problem with imbalanced data (17% positive class). The problem is as follows: my three individual models, when predicting on the test set (for which I don't have the labels), give a predicted distribution quite similar to that of the training set. But ensembling these models, while giving a slightly better result (F1-score), drastically changes the distribution on the test set, from ~20% to 5%. My question is: I am torn between choosing the best individual …
Category: Data Science
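One possible explanation, assuming the ensemble averages the models' probabilities and keeps a 0.5 threshold (the question does not say how the models are combined): when the models disagree about which samples are positive, the averaged probabilities fall below the threshold even though each model alone flags a similar fraction. A minimal sketch with hypothetical probabilities for four test samples:

```python
def positive_rate(probs, threshold=0.5):
    """Fraction of samples predicted positive at the given threshold."""
    return sum(p >= threshold for p in probs) / len(probs)

def soft_vote(model_probs):
    """Average the positive-class probabilities of several models, sample by sample."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]

# Hypothetical test-set probabilities from three models (4 samples).
model_probs = [
    [0.90, 0.55, 0.10, 0.30],
    [0.20, 0.55, 0.10, 0.70],
    [0.80, 0.20, 0.60, 0.40],
]
for probs in model_probs:
    print(positive_rate(probs))               # 0.5 for each model
print(positive_rate(soft_vote(model_probs)))  # 0.25 for the averaged ensemble
```

If this is what is happening, re-tuning the ensemble's decision threshold on a labelled validation set, rather than reusing the individual models' threshold, is the usual next step.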

Using average precision as a metric for an imbalanced problem (learning curve example)

I have an imbalanced problem (2% target class) and therefore need an appropriate metric, so I chose average_precision. My code: cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42) train_sizes, train_scores, test_scores = learning_curve(estimator, X, y, cv=cv, n_jobs=2, train_sizes=train_sizes, scoring='average_precision') train_scores_mean = np.mean(train_scores, axis=1) test_scores_mean = np.mean(test_scores, axis=1) plt.grid() However, when I do this, I get a pretty poor result. What can I do differently? Should I undersample? I care about the predicted probabilities, so I am wondering what the best approach is to …
Category: Data Science
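When judging whether an average precision score is "poor", it helps to know what it measures: AP = Σ_n (R_n − R_{n−1}) P_n, the precision at each threshold weighted by the gain in recall. A random ranking scores roughly the positive rate, so with 2% positives the no-skill baseline is about 0.02, not 0.5 as with ROC AUC. A minimal pure-Python sketch of that definition, assuming binary labels and no tied scores:

```python
def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over thresholds taken at each score.

    Assumes binary labels (1 = positive) and no tied scores; tied scores would
    need to be grouped into a single threshold.
    """
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # approximately 0.8333
```

Because AP depends only on the ranking of scores, undersampling changes it only if it changes which samples the model ranks highest; it will, however, distort the predicted probabilities themselves unless they are recalibrated afterwards.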

How to approach predicting semantic similarity between two phrases

I need pointers to the latest research, tools, and techniques for predicting semantic similarity between two phrases. Problem statement: given two propositions A and B, with A known to be true, predict whether B agrees with A, contradicts it, or is neutral. Examples: A = X is greater than Y, B = Y is greater than X, Result = contradict; A = X is greater than Y, B = Y is less than X, Result = agree; A = …
Category: Data Science

Linear Regression

I'm starting to learn machine learning, and one of the first things mentioned is linear regression. Basically, we have a bunch of data points and we want to fit a line such that the errors between the line and the actual data points are minimized. I understand the theory and why we would use, for example, gradient search methods to find the global minimum point. What I don't understand …
Category: Data Science
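Since the rest of the question is cut off, here is only an illustration of the two routes it mentions. For a line y = w·x + b with mean squared error, the loss is a convex quadratic, so the minimum gradient descent converges to is the same global minimum the closed-form least-squares solution gives. A minimal sketch on toy data lying exactly on y = 2x + 1 (the learning rate and step count are hand-picked for this data):

```python
def fit_closed_form(xs, ys):
    """Ordinary least squares for y = w * x + b via the closed-form solution."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / sxx
    return w, y_mean - w * x_mean

def fit_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the mean squared error (1/n) * sum (w*x + b - y)^2 by gradient descent."""
    n = len(xs)
    w = b = 0.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # toy data on the line y = 2x + 1
print(fit_closed_form(xs, ys))       # (2.0, 1.0)
print(fit_gradient_descent(xs, ys))  # very close to (2.0, 1.0)
```

The closed form is exact but needs all the data at once; gradient descent generalizes to models where no closed form exists, which is why it is introduced alongside linear regression.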

Tableau: Dealing with Date values

I'm a noob at Tableau and have a quick question. My data doesn't explicitly list the date (there is no column labelled "date"). However, there are columns with the following headers: "Income 2015", "Income 2016", "Income 2017" (the cells in these columns contain financial data, not dates). Is there a rule I can write, or something I can do manually, that will let me create dates (by year) based on these header names? …
Category: Data Science
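What is needed here is a wide-to-long reshape: one row per (record, year) with the year taken from the header. Tableau's data source page offers a Pivot option on selected columns that turns them into a field of former header names and a field of values, from which a calculated field can extract the year. To show the reshape itself outside Tableau, a minimal Python sketch; the "Company" column and the numbers are hypothetical:

```python
import re

def wide_to_long(rows, prefix="Income"):
    """Turn columns like 'Income 2015' into (other columns, Year, value) records.

    rows: list of dicts, one per record; non-matching columns are copied through.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}\s+(\d{{4}})$")
    long_rows = []
    for row in rows:
        base = {k: v for k, v in row.items() if not pattern.match(k)}
        for col, value in row.items():
            m = pattern.match(col)
            if m:
                long_rows.append({**base, "Year": int(m.group(1)), prefix: value})
    return long_rows

rows = [{"Company": "X", "Income 2015": 100, "Income 2016": 120, "Income 2017": 90}]
for r in wide_to_long(rows):
    print(r)
```

Once the data is long, the year is an ordinary column that can be converted to a date and used on a time axis.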

Colab cannot connect to the GPU from a Python file

I am trying to run a GitHub deep learning repository in Colab, but I cannot get the Python files to use the Colab GPU. I can connect to the GPU when writing a script in the Colab notebook, e.g. when I run this code from a notebook cell: import os, torch print('Torch', torch.__version__, 'CUDA', torch.version.cuda) print('Device:', torch.device('cuda:0')) print(torch.cuda.is_available()) I get: Torch 1.4.0 CUDA 10.1 Device: cuda:0 True but when I run it from a file called myExample.py e.g. using !python myExample.py …
Category: Data Science
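A hedged debugging step, assuming the cause is an environment difference between the notebook kernel and the `!python` subprocess (the question does not confirm this): collect the same facts in both places and compare. This is only a diagnostic, not a fix. Paste the function into a notebook cell and call it, then save the same code as a script and run it with `!python`:

```python
import os
import shutil
import sys

def environment_report():
    """Collect facts that commonly differ between a notebook kernel and a subprocess."""
    report = {
        "python": sys.executable,                   # interpreter running this code
        "python_on_path": shutil.which("python"),   # what a bare `python` resolves to
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch
        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
        report["cuda_available"] = None
    return report

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the interpreters, torch versions, or `CUDA_VISIBLE_DEVICES` differ between the two runs, that narrows the problem down; also check whether the repository's own code sets the device or that variable before `torch` is imported.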

Knowing the joint probability distribution over the feature-label space

I am taking Cornell's CS4780 course, "Machine Learning for Intelligent Systems", and am referring to its first lecture. The professor explains that we have a sample $D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\} \sim P$, where each $(X_i, y_i)$ is a feature-label pair. There is a joint distribution over the feature-label space, denoted by $P$. We never have access to $P$; only God knows $P$. What we want to do in this supervised …
Category: Data Science
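A small simulation can make the lecture's setting concrete: if we pick a toy joint distribution P ourselves ("play God"), we can draw D i.i.d. from it and check that the sample's relative frequencies approach P as n grows. A learner only ever sees D, never the table P. A minimal sketch with a made-up discrete P over X ∈ {0, 1} and y ∈ {0, 1}:

```python
import random
from collections import Counter

# A toy joint distribution P(X, y). In a real problem nobody knows these
# numbers; here we choose them so the sample can be compared against P.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def sample_dataset(n, seed=0):
    """Draw D = [(x_1, y_1), ..., (x_n, y_n)] i.i.d. from P."""
    rng = random.Random(seed)
    pairs = list(P)
    return rng.choices(pairs, weights=[P[p] for p in pairs], k=n)

def empirical_joint(D):
    """Estimate P from the sample by relative frequencies."""
    counts = Counter(D)
    return {pair: counts[pair] / len(D) for pair in P}

D = sample_dataset(100_000)
print(empirical_joint(D))  # each entry close to the corresponding value in P
```

With continuous, high-dimensional features this kind of direct frequency estimate stops being practical, which is why supervised learning aims to learn a predictor from D rather than to recover P itself.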

How to use a learning curve in practice

CONTEXT: I have some simulated data with which I built and trained a model. During training I had the luxury of a large number of samples, so my model takes advantage of that by being fairly complex. However, since this model is trained on simulated data, it must be retrained once the real data is collected. Real data is hard to obtain, and we want to know how many data samples are needed before the model starts …
Category: Data Science
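A common heuristic for this, offered here as an assumption rather than a guarantee: validation error often decays roughly like a power law in the number of training samples, error ≈ a·n^(−b). Fitting that curve to errors measured on subsets of the real data collected so far gives a rough extrapolation of how many samples reach a target error. A minimal sketch; the measured errors below are hypothetical, and the model ignores any irreducible error floor, so treat the extrapolated number with caution:

```python
import math

def fit_power_law(ns, errors):
    """Fit error ~ a * n**(-b) by least squares on log(error) = log(a) - b*log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope  # (a, b)

def samples_needed(a, b, target_error):
    """Solve a * n**(-b) = target_error for n."""
    return (a / target_error) ** (1.0 / b)

# Hypothetical validation errors measured at a few training-set sizes.
ns = [100, 200, 400, 800]
errors = [0.20, 0.14, 0.10, 0.071]
a, b = fit_power_law(ns, errors)
print(round(samples_needed(a, b, target_error=0.05)))
```

Re-fitting as each new batch of real data arrives tells you whether the curve is flattening earlier than the power law predicts.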

Sklearn Decision Tree as weak learner in Adaboost not working properly

I'm trying to implement the AdaBoost algorithm with an sklearn decision tree as the weak learner: at each step I want to choose one feature with one threshold to classify all samples. I have 1400 long feature vectors and want to label them 1 or -1. The features are words from movie ratings and the label represents "bad" or "good". At some of the iterations, the decision tree decides on a feature, threshold 0.5, and classifies all samples as -1 (no …
Category: Data Science
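One plausible cause, not confirmed by the excerpt: sklearn's `DecisionTreeClassifier` picks its split by impurity reduction (Gini or entropy), not by weighted misclassification error, so a depth-1 tree can choose a split whose two leaves both predict the majority class. It is also worth checking that the AdaBoost weights are passed via `fit(X, y, sample_weight=w)`. To make the loop visible, here is a generic discrete AdaBoost sketch in pure Python with a stump that directly minimizes weighted error; it is not a reconstruction of your code, and the toy data is hypothetical:

```python
import math

def fit_stump(X, y, w):
    """Pick (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] > thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] > thr else -polarity

def adaboost(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # weak learner is no better than chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / err)
        preds = [stump_predict(stump, row) for row in X]
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical binary word-presence features; the label follows feature 1.
X = [[0, 1], [1, 1], [0, 0], [1, 0]]
y = [1, 1, -1, -1]
ens = adaboost(X, y, n_rounds=3)
print([predict(ens, row) for row in X])
```

The `err >= 0.5` check is where a stump that labels everything -1 would stop the loop if it is no better than chance under the current weights.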

How to explain the connection between the input layer and H1 of this CNN Architecture?

I am currently reading the paper by LeCun et al. on handwritten zip code recognition. It contains a figure visualizing the CNN architecture, but I do not really understand how the connection between the input layer and layer H1 makes sense. If there are 12 kernels of size 5x5, shouldn't layer H1 be 12x144? Or is there some downsampling taking place here too?
Topic: cnn learning
Category: Data Science
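The arithmetic follows from the standard output-size formula floor((n + 2p − k) / s) + 1. Assuming the 16x16 input implied by the 12x144 in the question, stride 1 with no padding gives 12x12 per map. If I recall the paper correctly, H1's 5x5 receptive fields are placed two pixels apart (stride 2), giving 12 maps of 8x8, so the downsampling comes from the stride rather than a separate pooling layer. The padding of 2 below is an assumption chosen so that 16x16 maps to 8x8; check the paper's text for how the border is handled:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# 16x16 input, 5x5 kernels, stride 1, no padding: 12x12 per map (the 12 x 144 in the question).
side = conv_output_size(16, 5)
print(side, 12 * side * side)  # 12 1728

# Kernels applied two pixels apart (stride 2), input extended by 2 pixels per side (assumed).
side = conv_output_size(16, 5, stride=2, padding=2)
print(side, 12 * side * side)  # 8 768
```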

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.