Forecasting on multivariate time series containing quaternions

I have a multivariate time series containing 3D position data $(x, y, z)$ and orientation data (as quaternions) obtained from motion sensors. My goal is to forecast the future position/orientation, and for this I'm looking into using sequence models, especially LSTMs. A quaternion has 4 elements, one of them denoting the real/scalar part (say $q_w$) and the other three denoting the imaginary/vector part (say $q_x, q_y, q_z$). So my time series has 7 columns in total. My question: Considering that quaternion elements …
Category: Data Science
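
The excerpt above is cut off before the actual question, so the following is only a sketch of one common preprocessing step for quaternion columns, not an answer to it. A unit quaternion $q$ and its negation $-q$ encode the same rotation, so a raw sensor series can flip sign between consecutive samples without any physical change. The helper names `normalize` and `make_sign_continuous` are hypothetical, and the component order $(q_w, q_x, q_y, q_z)$ follows the question.

```python
import math

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero quaternion cannot be normalized")
    return tuple(c / norm for c in q)

def make_sign_continuous(quats):
    """Flip q to -q whenever that brings it closer to the previous sample.

    q and -q describe the same rotation, so no orientation changes;
    only artificial sign jumps are removed from the series.
    """
    result = []
    for q in quats:
        q = normalize(q)
        if result:
            dot = sum(a * b for a, b in zip(result[-1], q))
            if dot < 0.0:
                q = tuple(-c for c in q)
        result.append(q)
    return result

# Made-up samples: the second one has flipped sign relative to its neighbours.
series = [(1.0, 0.0, 0.0, 0.0), (-0.99, -0.1, 0.0, 0.0), (0.98, 0.2, 0.0, 0.0)]
cleaned = make_sign_continuous(series)
```

After this step the four quaternion columns vary smoothly over time, which is usually easier for a sequence model to fit than a series with sign jumps.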

Regression sequence output loss function

I am fairly new to deep learning, and I have the following task. Based on an audio sequence of shape (200, 1024), I have to predict two sequences of shape (200, 1) of continuous values (e.g. 0.5687) that represent the emotion at each timestep (valence "v" and arousal "a"). So I've created the following LSTM:
    inputs_audio = Input(shape=(200, 1024))
    audio_net = LSTM(256, return_sequences=True)(inputs_audio)
    audio_net = LSTM(256, return_sequences=True)(audio_net)
    audio_net = LSTM(256, return_sequences=False)(audio_net)
    audio_net = Dropout(0.3)(audio_net)
    final_model = audio_net
    target_names = …
Category: Data Science

Sequence models word2vec

I am working with a dataset of more than 100,000 records. This is what the data looks like:
    email_id  cust_id  campaign_name
    123       4567     World of Zoro
    123       4567     Boho XYz
    123       4567     Guess ABC
    234       5678     Anniversary X
    234       5678     World of Zoro
    234       5678     Fathers day
    234       5678     Mothers day
    345       7890     Clearance event
    345       7890     Fathers day
    345       7890     Mothers day
    345       7890     Boho XYZ
    345       7890     Guess ABC
    345       7890     Sale
I am trying to understand the campaign …
Category: Data Science

Train an LSTM on separate sequences of different lengths

My case is the following: I want to train a sequential classifier to recognize which action is being performed, given sensor observations. My data consists of 10 executions of an assembly task, one by each of 10 different people. So, basically, each person performed the same task and I have the sensor measurements for each millisecond. That means that for each person I have a really big data set with the corresponding measurements and the labels (which action is being performed) for each millisecond. …
Category: Data Science
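
Since the title above mentions separate sequences of different lengths, one widely used option is to pad every recording to a common length and keep a mask that marks the real timesteps. The sketch below assumes scalar measurements for brevity; `pad_with_mask` is a hypothetical helper, not a library function.

```python
def pad_with_mask(sequences, pad_value=0.0):
    """Right-pad variable-length sequences; return (padded, mask).

    mask[i][t] is 1 for a real timestep and 0 for padding, so a loss
    function can ignore the padded positions.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask

# Three made-up recordings of different lengths.
recordings = [[0.1, 0.2, 0.3], [0.5], [0.7, 0.8]]
padded, mask = pad_with_mask(recordings)
```

With millisecond-level recordings, padding everything to the longest sequence can be wasteful; cutting each recording into fixed-length chunks first is a common alternative.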

Advantages of CNN vs. LSTM for sequence data like text or log-files

When do you tend to use a CNN rather than an LSTM (or the other way round) in classification or generation tasks on sequential data like text or log data? What are the reasons for the decision and what does it depend on? Are there any papers or statistics that confirm this? I'm thinking of data like Linux log entries or short sentences of fewer than 20 words/tokens. Personally I would almost always use LSTM but I'm curious if CNN wouldn't …
Category: Data Science

LSTM for multivariate sequence classification

How can I train a multivariate, multiclass sequence classifier using an LSTM in Keras? I have 50,000 sequences, each 100 timepoints long. At every timepoint, I have 3 features (so the width is 3). I have 4 classes and I want to build a classifier to determine the class of each sequence. What is the best way to do so? I saw many guides for univariate sequence classification but none for multivariate, and I don't know how to apply this …
Category: Data Science

Modeling the influence of events order on probability

The task is to model whether the order of events influences the probability of a binary target variable. We have, for example, five different events which occur in time (events A, B, C, D, E). They can occur in any order from 1 to 5. I would like to check if the order of their occurrence influences the target variable. My first idea was to convert the time of occurrence into numbers from 1 to 5 and then, for example, use logistic regression. Do you know …
Category: Data Science
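
The first idea in the question above, converting the order of occurrence into numbers from 1 to 5, could look like the following sketch. The function name `order_to_ranks` and the example order are made up for illustration.

```python
EVENTS = ["A", "B", "C", "D", "E"]

def order_to_ranks(observed_order):
    """Map an observed event order to one rank feature per event.

    The returned list holds, for A..E in turn, the 1-based position at
    which that event occurred.
    """
    position = {event: i + 1 for i, event in enumerate(observed_order)}
    return [position[event] for event in EVENTS]

# C happened first, then A, E, B and finally D.
features = order_to_ranks(["C", "A", "E", "B", "D"])
```

Each row of ranks can then be fed to a logistic regression. Note that a raw rank enters the model as a number, so the fitted effect of each event's position is forced to be monotone; one-hot encoding each (event, position) pair relaxes that assumption at the cost of more parameters.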

What is the formal category of problem described by identifying consecutive occurrences of attributes in records?

Apologies for the garbled title, I'd really need to know the answer to the question before I could phrase it properly... Let's imagine I've got a data set of football (soccer if you prefer) match results. Let's further imagine that each result has the following attributes: Date, Venue, Team, Opponent, Home Team Goals, Away Team Goals, Result. Then let's consider a future match, for which we know some attributes but not all (obviously, because it hasn't happened yet): Date - W …
Category: Data Science

Classification when the classification of the previous items matters

I have a classification problem to solve that seems to be common, but I am struggling to find the name of this task and the best way to model it. Suppose I have a series of events that are sequential in time.
    2 Jan - I matched with a nice girl on Tinder - ACTION_TYPE = SOCIAL_EVENT
    5 Jan - I met with her, it was nice - ACTION_TYPE = SOCIAL_EVENT
    8 Jan - I just got accepted to …
Category: Data Science

Given daily sequence of events with only event ID labels (alphanum strings), what algorithms can be used to detect sequences that are outliers?

For example, the data might be something like this:
    Sequence 1: ["ABC", "AAA", "ZZ123", "RRZZZ45", "AABBCC"]
    Sequence 2: ["CBA", "AAA", "YY123", "LMNOP", "AABBCC"]
    Sequence 3: ["ABC", "AAA", "ZZ123", "RRZZZ45", "AABBCC"]
    ...
    Sequence N: ["DEF", "AAA", "ZZ123", "YYZZZ45", "AABBCC"]
Sequences 1 and 3 are the same, but sequences 2 and N are different. In the data set, there will be thousands of these sequences every day. Additional questions: How could I calculate a similarity (or difference) measure between sequences with sequences of …
Category: Data Science
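
One candidate for the difference measure asked about above is the edit (Levenshtein) distance computed over whole event IDs rather than characters. This is a sketch of that swapped-in technique, not something proposed in the question; `edit_distance` is a hypothetical helper and the three sequences are copied from the excerpt.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of event IDs.

    Each whole ID counts as one symbol, so "ZZ123" vs "YY123" costs 1.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]

seq1 = ["ABC", "AAA", "ZZ123", "RRZZZ45", "AABBCC"]
seq2 = ["CBA", "AAA", "YY123", "LMNOP", "AABBCC"]
seq3 = ["ABC", "AAA", "ZZ123", "RRZZZ45", "AABBCC"]

d_same = edit_distance(seq1, seq3)  # identical sequences
d_diff = edit_distance(seq1, seq2)  # three IDs differ
```

A sequence whose distance to most other sequences of the day is unusually large would then be a candidate outlier; what counts as "unusually large" is a threshold that has to be chosen from the data.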

Predict status of upcoming project milestones with intermediate activities

I have data on 100+ projects. Each project has about 175 sequential activities from start to end. There are approximately 7 key milestones among those 175 activities that we want to predict. The data is completely categorical (every activity status is one of R, A, G, B, GR). So we want to predict the status of those 7 milestones (R, A, G), say after every 25 activities. The projects are civil works projects where the sequential activities are requirements gathering, review, approvals, high level design, …
Category: Data Science

1D Sequence Classification using Circular Dilated Convolutional Neural Networks

I am working on a multiclass classification task on long 1D sequences. The sequence length may vary between $512$ and $512 \cdot 60$ timesteps, a slice of $100$ timesteps might look like this: What is the best current approach, in terms of model architecture, to training a deep learning model that minimizes the cross-entropy loss? I have read some papers using a CNN LSTM for this task, but are there any better suited architectures? I have considered Dilated Causal …
Category: Data Science

ML Modeling approach for Event data

I have these two datasets (image below). The one on the left shows events and the one on the right shows the alarm data. Goal: Using the two datasets, after any number of events, an alarm can be triggered. I'd like to predict when the next alarm will be raised. Approach: I am a bit confused about the approach, though. This is like time-series data. Is using an RNN the best approach, or are there other approaches? Thanks
Category: Data Science

1D Sequence Classification

Cross-post from https://stackoverflow.com/questions/71752744/1d-sequence-classification I am working with a long sequence (~60 000 timesteps) classification task with continuous input domain. The input has the shape (B, L, C) where B is the batch size, L is the sequence length (i.e. timesteps) and C is the number of features where each feature is continuous (i.e. values like 0.6, 0.2, 0.5, 1.3, etc.). Since the sequence is very long, I can't directly apply an RNN or Transformer Encoder layer without exceeding memory limits. …
Category: Data Science

Running out of memory when training Keras LSTM model for binary classification on image sequences

I'm trying to come up with a Keras model based on LSTM layers that would do binary classification on image sequences. The input data has the following shape: (sample_number, timesteps, width, height, channels) where one example would be (1200, 100, 100, 100, 3). So it's a 5D tensor equivalent to video data.
    timesteps is equal to 100 -> each sample (image sequence) has 100 frames
    channels is equal to 3 -> RGB data
Here's a minimal workable example: import numpy …
Category: Data Science

Algorithm for segmentation of sequence data

I have a large sequence of vectors of length N. I need some unsupervised learning algorithm to divide these vectors into M segments. For example: K-means is not suitable, because it puts similar elements from different locations into a single cluster. Update: The real data looks like this: Here, I see 3 clusters: [0..50], [50..200], [200..250] Update 2: I used modified k-means and got this acceptable result: Borders of clusters: [0, 38, 195, 246]
Category: Data Science
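
The question above does not show its "modified k-means", so the following is a different, swapped-in technique: a dynamic-programming sketch that only ever produces contiguous segments, which is the property plain k-means lacks. It works on scalars for brevity (for vectors, the segment cost would be summed over dimensions), and `segment` is a hypothetical helper.

```python
def segment(values, n_segments):
    """Split a 1-D sequence into contiguous segments by dynamic programming.

    Minimizes the total within-segment sum of squared deviations from the
    segment mean. Returns the borders [0, b1, ..., len(values)].
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(values)")
    # Prefix sums give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):  # squared error of values[i:j], with j > i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting values[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i
    # Walk the back-pointers from the end to recover the borders.
    borders = [n]
    for k in range(n_segments, 0, -1):
        borders.append(back[k][borders[-1]])
    return borders[::-1]

# Made-up data with three flat regimes of lengths 5, 8 and 4.
data = [0.0] * 5 + [10.0] * 8 + [3.0] * 4
borders = segment(data, 3)
```

The run time grows as O(M * N^2) for N points and M segments, which is fine for a few hundred points like the example in the question but slow for very long sequences.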

What's an appropriate datastore for variable length sequence data for PyTorch consumption?

I have a large number of sequences - potentially hundreds of thousands - each consisting of between 100 and 10,000 items, which each consist of about 5 floats. I need a datastore that can rapidly serve these up in batches for PyTorch training. I also need to be able to rapidly write new sequences to the store. It's like an experience replay buffer for reinforcement learning, but I want to store every single run. These sequences should each have some …
Category: Data Science

Sequence Embedding using embedding layer: how does the network architecture influence it?

I want to obtain a dense vector representation of protein sequences so that I can meaningfully represent them in an embedding space. We can consider them as sequences of letters, in particular there are 21 unique symbols which are the amino acids (for example: MNTQILVFIACVLIEAKGDKICL). My approach is to use a sequence embedding that can be learned as a part of a deep learning model (built with Python using Keras libraries), that is a classifier (supervised) neural network which I …
Category: Data Science

Predict indices of text using deep learning

I want to predict the start and end indices of text where a certain type of propaganda technique is used like smears, name-calling, loaded language etc. Some examples from the dataset are: ['THERE ARE ONLY TWO GENDERS\n\nFEMALE \n\nMALE\n', 'This is not an accident!', "SO BERNIE BROS HAVEN'T COMMITTED VIOLENCE EH?\n\nPOWER COMES FROM THE BARREL OF A GUN, COMRADES.\n\nWHAT ABOUT THE ONE WHO SHOT CONGRESSMAN SCALISE OR THE DAYTON OHIO MASS SHOOTER?\n"] [[[0, 41]], [], [[47, 83], [3, 14], [33, 41], …
Category: Data Science

Many to one LSTM, where some sequence values are known at prediction step

I have a time series problem, which I am modelling with an RNN (using LSTMs). The input contains a sequence of values x_0 to x_4, for predictions at positions n-k (where k is a configurable parameter - the length of the sequence). I.e. the input shape is (k, 4). This is a regression problem, where the 4 (correlated) sequences are mapped into a prediction of the n+1th position, y_(n+1). However, this is an unusual problem where all values for x_1 …
Topic: lstm rnn sequence
Category: Data Science
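
For the many-to-one setup described above, training examples are typically built by sliding a window of length k over the series. The sketch below assumes 4 feature values per timestep, matching the stated input shape (k, 4), and pairs each window ending at position n with the target at n+1; `make_windows` and the toy data are hypothetical.

```python
def make_windows(features, targets, k):
    """Pair each window of k consecutive feature rows with the next target.

    The window covers rows n-k+1 .. n and is paired with targets[n + 1].
    """
    pairs = []
    for n in range(k - 1, len(features) - 1):
        window = features[n - k + 1 : n + 1]
        pairs.append((window, targets[n + 1]))
    return pairs

# Toy series: 6 timesteps with 4 features each, and one target per timestep.
rows = [[t, t + 0.1, t + 0.2, t + 0.3] for t in range(6)]
ys = [10 * t for t in range(6)]
pairs = make_windows(rows, ys, k=3)
```

Each `window` here is a list of k rows with 4 values, i.e. one input of shape (k, 4).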
