I'm using a denoising RNN autoencoder for a project involving motion capture data. This is my first time using autoencoder architectures, and I was wondering which non-linearities should be placed in these models and where they should go. This is my model as it stands:

    import torch.nn as nn

    class EncoderRNN(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers):
            super(EncoderRNN, self).__init__()
            self.input_size = input_size
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.rnn_enc = nn.RNN(input_size=input_size, hidden_size=hidden_size,
                                  num_layers=num_layers, batch_first=True)
            self.relu_enc = nn.ReLU()

        def forward(self, x):
            pred, hidden …
I have a dataset with x variables and a target y (between 0 and 100%, i.e. between 0 and 1). My goal is to predict whether a sample falls in one of the y groups [0, 25%), [25%, 50%) or [50%, 100%]. I am wondering whether I should use a classification model and encode these groups as 3 labels [0, 1, 2], or perform a regression to obtain a specific value (e.g. 0.18, or 18%) and derive the group afterwards. Which approach should be used/yield better …
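
For the "regress first, group later" route, the grouping step itself is a simple binning of the predicted value. A minimal sketch, assuming NumPy and made-up target values (the array `y` and the name `edges` are illustrative, not from the question):

```python
import numpy as np

# Hypothetical targets (or regression outputs) in [0, 1]; values are made up.
y = np.array([0.18, 0.25, 0.49, 0.50, 0.93, 1.00])

# Interior bin edges for the groups [0, 0.25), [0.25, 0.50), [0.50, 1.00].
edges = np.array([0.25, 0.50])

# np.digitize counts how many edges are <= each value,
# which is exactly the group index 0, 1 or 2.
labels = np.digitize(y, edges)
print(labels)
```

The same `labels` array could also serve as the class target if the problem is treated as 3-class classification instead.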
This is probably a very simple question for most of you, but I have seen different formulations of the TD learning update in many papers and can't really wrap my head around it. Just as an example: in the English Wikipedia (https://en.wikipedia.org/wiki/Temporal_difference_learning) the value is updated based on the immediate reward, whereas in the German Wikipedia (https://de.wikipedia.org/wiki/Temporal_Difference_Learning) it is updated based on the reward in the upcoming trial. This is not the same equation since the …
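
One common source of such differences is the indexing convention for the reward rather than the update itself. As a sketch, assuming the tabular TD(0) update with step size $\alpha$ and discount factor $\gamma$ (symbols introduced here, not taken from the question):

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$$

Here $r_{t+1}$ denotes the reward received on the transition from $s_t$ to $s_{t+1}$. Some texts label that same reward $r_t$, which gives

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_t + \gamma V(s_{t+1}) - V(s_t) \right]$$

Under that reading the two forms describe the same quantity with different subscripts. Whether this fully explains the two Wikipedia versions depends on how each article defines its reward index.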
I have a binary classification task for time series data. Every 14 rows in my CSV belong to one time slot. How should I prepare this data for use in an LSTM? In other words, how do I feed the model with this data?
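
LSTM layers in the common frameworks take input shaped (samples, timesteps, features), so one option is to treat each block of 14 rows as a single sequence. A minimal sketch, assuming the CSV has already been loaded into a 2-D array whose row count is a multiple of 14 (the array contents, the 5 feature columns and the labels are all made up):

```python
import numpy as np

# Hypothetical CSV contents: 3 time slots x 14 rows each, 5 feature columns.
rows_per_slot = 14
n_slots, n_features = 3, 5
table = np.arange(n_slots * rows_per_slot * n_features, dtype=float)
table = table.reshape(-1, n_features)          # shape (42, 5), like a loaded CSV

# Group consecutive blocks of 14 rows into one sequence each:
# (n_slots * 14, n_features) -> (n_slots, 14, n_features)
X = table.reshape(-1, rows_per_slot, n_features)

# One binary label per time slot (made-up values).
y = np.array([0, 1, 0])
print(X.shape, y.shape)
```

This relies on the rows of each time slot being contiguous and in time order in the file.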
I am trying to implement the deep reinforcement learning policy gradient method REINFORCE in C++. In my case there is no "autograd" method like in PyTorch, so I have to calculate the gradient manually. Let's imagine that I have a scenario where the state space size is 4 and the action space size is 2 (CartPole). I also collected the following data for 3 steps: action probability (softmax): [0.21, 0.34, 0.45], [0.91, 0.01, 0.08], [0.50, 0.30, 0.20] sampled action (one-hot encoded): …
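
For a softmax policy, the gradient of log π(a|s) with respect to the logits has the closed form one_hot(a) − probs, which is the piece an autograd system would otherwise supply. A minimal NumPy sketch of that step, using the probabilities listed above; the sampled actions and returns are hypothetical, since the question's values are cut off:

```python
import numpy as np

# Action probabilities copied from the question (softmax outputs for 3 steps).
probs = np.array([[0.21, 0.34, 0.45],
                  [0.91, 0.01, 0.08],
                  [0.50, 0.30, 0.20]])

# Hypothetical sampled actions and discounted returns (not from the question).
actions = np.array([2, 0, 1])
returns = np.array([1.0, 0.5, 0.25])

one_hot = np.eye(probs.shape[1])[actions]

# Softmax identity: d log pi(a|s) / d logits = one_hot(a) - probs
grad_log_pi = one_hot - probs

# REINFORCE loss L = -sum_t G_t * log pi(a_t|s_t),
# so dL/dlogits at step t is G_t * (probs_t - one_hot_t).
grad_loss_logits = returns[:, None] * (probs - one_hot)
print(grad_loss_logits)
```

`grad_loss_logits` would then be propagated back through the network layers by the chain rule, which is straightforward to port to C++ since it only involves element-wise arithmetic.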
Every datapoint in my dataset consists of 3 time series. The data in the time series is discretized into equal time bins, but the 3 time series were measured for varying lengths. Time series 1 has 10 bins, series 2 has 5 bins and series 3 only a single bin. So an example datapoint looks like this: series 1: 1, 4, 1, 7, 3, 7, 3, 7, 3, 1; series 2: 9, 6, 4, 7, 1; series 3: 4. I would like to run two 1D convolutional layers …
I have a dataset of entities which each belong to a particular group (i.e. entity=schools and group=school district). I also have lots of auxiliary variables on each entity. However, for my response variable I only have information at the group level (i.e. response variable only at the district level, but regressor variables at the school level). Can someone recommend an algorithm or class of algorithms that might be appropriate here? I'd rather not aggregate all the auxiliary information up to …
I am a beginner in class-incremental learning and am trying to understand the general concept. In class-incremental learning, we have a model that can classify among classes A, B, and C. Using data from another class D, we want to apply class-incremental learning to obtain a model that can predict among A, B, C and D without retraining from scratch on all data from all classes. My question is, since our softmax (that we use to …
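
One way the output layer is often handled in this setting is to grow it by one unit while keeping the existing weights. A minimal NumPy sketch of that idea, assuming a linear classification head on 8-dimensional features (the sizes, `W_old`, `b_old` and the random values are all hypothetical, and this does not cover how the extended model is then trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained head for classes A, B, C on 8-dimensional features.
n_features = 8
W_old = rng.normal(size=(3, n_features))
b_old = rng.normal(size=3)

# Add one output row for the new class D; the old rows are copied unchanged.
W_new = np.vstack([W_old, rng.normal(scale=0.01, size=(1, n_features))])
b_new = np.append(b_old, 0.0)

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = rng.normal(size=n_features)
p = softmax(W_new @ x + b_new)   # probabilities over A, B, C, D
print(p.shape)
```

The softmax itself has no parameters, so it simply normalises over four logits instead of three once the head has been extended.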
I've been through a bunch of books, resources and articles and I'm wondering if there are any specific classes/fields that I can audit. There aren't any brewing programs near me in the formal sense so I'm looking for college-level classes or something that would benefit my homebrewing and further a potential career.
I have a dataset for NER that is tagged using the BILOU tagging scheme; an example is below: Minjun B-Person is O from O South B-Location Korea I-Location . O I wish to visualize how a model learns, for example if I use an LSTM with a CRF: how does this model learn the context of the current sentence?
I am working on a binary classification problem with unbalanced data (17% positive class). The problem is as follows: my three individual models, when predicting on the test set (for which I don't have the labels), give a distribution quite similar to that of the training set. But ensembling these models, while giving a slightly better result (F1-score), drastically changes the distribution on the test set, going from ~20% to 5%. My question is: I am confused between choosing the best individual …
I have an imbalanced problem (2% target class) and therefore need an appropriate metric - so I chose average_precision. My code:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import StratifiedKFold, learning_curve

    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=2,
        train_sizes=train_sizes, scoring='average_precision')
    train_scores_mean = np.mean(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    plt.grid()

However, when I do this, I get a pretty poor result. What can I do differently? Should I undersample? I care about the probabilities, so I am wondering what the best approach is to …
I need pointers on the latest research, tools, and techniques for predicting semantic similarity between two phrases. Problem statement: given two propositions A and B, with A known to be true, predict whether B agrees, contradicts, or is neutral. Examples:
A = X is greater than Y | B = Y is greater than X | Result = contradict
A = X is greater than Y | B = Y is less than X | Result = agree
A = …
I'm starting to learn machine learning and one of the first things that is mentioned is the usage of a linear regression method. Basically, we have a bunch of data points and we want to fit a line such that the errors we get from the line and the actual data points are minimized. I understand the theory and why we would use, for example, something like gradient search methods to find the global minimum point. What I don't understand …
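
The line-fitting idea described above can be sketched in a few lines. This is a minimal illustration, assuming plain batch gradient descent on the mean squared error; the data, learning rate and iteration count are made up for the example:

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0      # slope and intercept, initialised at zero
lr = 0.05            # learning rate (an arbitrary choice for this sketch)

for _ in range(5000):
    err = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))
```

Because the squared-error loss of a linear model is convex, the minimum the loop settles into is the global one, and on this noise-free data `w` and `b` approach the slope and intercept used to generate `y`.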
I'm a noob at Tableau and have a quick question. I have data that doesn't explicitly list the date (there is no column labelled "date"). However, there are columns with the following headers: "Income 2015", "Income 2016", "Income 2017" (the cells within these columns contain financial data and do not have a "date"). Is there a way I can write a rule, or do it manually, that will allow me to create dates (by year) based on these header names? …
I am trying to run a GitHub deep learning repository in Colab, but I cannot get the Python files to use the Colab GPU. I can connect to the GPU when writing a script in the Colab notebook, e.g. when I run this code from a notebook cell:

    import os, torch
    print('Torch', torch.__version__, 'CUDA', torch.version.cuda)
    print('Device:', torch.device('cuda:0'))
    print(torch.cuda.is_available())

I get:

    Torch 1.4.0 CUDA 10.1
    Device: cuda:0
    True

but when I run it from a file called myExample.py, e.g. using !python myExample.py …
I am taking the Cornell course CS4780, "Machine Learning for Intelligent Systems", and my question refers to the first lecture. The professor explains that we have a sample $D = \{(X_1,y_1),(X_2,y_2), \ldots,(X_n,y_n)\} \sim P$, where $(X_i, y_i)$ is a feature-label pair. There is a joint distribution over the feature-label space, denoted by $P$. We never have access to $P$; only God knows $P$. What we want to do in this supervised …
CONTEXT: I have some simulated data on which I built and trained a model. During training, I had the benefit of a large number of samples, so my model takes advantage of that by being fairly complex. Yet since this model is trained on simulated data, in practice the model must be trained again once the real data is collected. Obtaining that real data is hard, and we want to know how many data samples are needed before the model starts …
I'm trying to implement the AdaBoost algorithm with an sklearn decision tree as the weak learner - at each step I want to choose one feature with one threshold to classify all samples. I have 1400 long feature vectors and want to label them 1 or -1. The features are words from movie ratings and the label represents "bad" or "good". At some of the iterations, the decision tree decides on a feature, threshold 0.5, and classifies all samples as -1 (no …
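
The "one feature, one threshold" weak learner described above is a decision stump, and a single boosting round can be written out by hand to see how the weighted error drives the choice. A minimal NumPy sketch, not the sklearn implementation; `best_stump`, the toy features and the labels are hypothetical:

```python
import numpy as np

def best_stump(X, y, w):
    """Return (weighted error, feature index, threshold, polarity) of the best stump."""
    best = (np.inf, None, None, None)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                # Predict +polarity where the feature exceeds thr, -polarity elsewhere.
                pred = np.where(X[:, j] > thr, polarity, -polarity)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

# Toy binary word-presence features (made up); labels are +1 / -1.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 0]])
y = np.array([1, 1, -1, -1, -1])
w = np.full(len(y), 1.0 / len(y))            # uniform initial sample weights

err, j, thr, polarity = best_stump(X, y, w)
pred = np.where(X[:, j] > thr, polarity, -polarity)

eps = 1e-10                                   # guards against division by zero
alpha = 0.5 * np.log((1 - err + eps) / (err + eps))   # weight of this stump
w = w * np.exp(-alpha * y * pred)             # up-weight misclassified samples
w = w / w.sum()                               # renormalise to sum to 1
print(err, j, thr, polarity, w)
```

Note that a threshold at or above a feature's maximum makes the stump predict a single label for every sample, which is one way a constant prediction can come out as the lowest weighted error in some round.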
I am currently reading the paper by LeCun et al. on handwritten zip code recognition. The figure below visualizes the CNN architecture, but I do not really understand how the connection between layer H1 and the input layer makes sense. If there are 12 kernels of size 5x5, shouldn't layer H1 be 12x144? Or is there any downsampling taking place here too?
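
The size of each feature map follows from the usual convolution output formula, floor((n + 2p − k) / s) + 1, so the effect of a stride can be checked directly. A minimal sketch, assuming a 16x16 input (the size for which a 5x5 kernel at stride 1 gives 12x12 = 144 units); the stride of 2 and padding of 2 in the second call are assumptions for illustration, not values read from the figure:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# Assumed input width/height and the 5x5 kernel from the question.
n, k = 16, 5

dense = conv_output_size(n, k, stride=1, padding=0)    # no downsampling
strided = conv_output_size(n, k, stride=2, padding=2)  # kernel moved 2 pixels at a time

# Units per feature map in each case.
print(dense * dense, strided * strided)
```

With 12 kernels, the first case gives 12 maps of `dense * dense` units each, while a strided convolution yields fewer units per map without a separate pooling layer.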