New to PyTorch and the PyTorch Forecasting library, I am trying to predict multiple targets using the Temporal Fusion Transformer model. I have 7 targets in a list as my targets variable. I'm using MultiLoss as my loss function, with a list of 7 CrossEntropy loss functions (one per target variable). In the problem I'm trying to model, there are 7 possible outcomes per time step and I'm trying to find which option is most likely. I'm looking for a …
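For the setup described above, a minimal sketch of wiring one CrossEntropy loss per target with pytorch_forecasting; the dataset and model construction are elided, and `n_targets = 7` simply mirrors the question:

```python
from pytorch_forecasting.metrics import CrossEntropy, MultiLoss

n_targets = 7  # one categorical outcome series per target
loss = MultiLoss([CrossEntropy() for _ in range(n_targets)])
# e.g. tft = TemporalFusionTransformer.from_dataset(training, loss=loss, ...)
```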
Are there any advantages to using survival analysis models, like Cox's proportional hazards model, with uncensored data over simple linear regression or other classic ML models? I have data with recurrent events and I am trying to predict the time of the next event. The data contains about 2,000 different subjects and about 60 events per subject. The percentage of censored data (the last event of each subject) is small, and I don't think it plays a big role in the prediction.
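For reference, a minimal sketch of a Cox fit with the lifelines library; the toy columns `duration` (time to next event), `observed` (0 = censored), and `covariate` are illustrative assumptions, not the poster's data:

```python
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "duration": [5.0, 12.0, 7.0, 9.0],   # toy gap times between events
    "observed": [1, 1, 1, 0],            # the last event per subject is censored
    "covariate": [0.3, 1.2, 0.7, 0.5],
})
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="observed")
cph.print_summary()
```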
I would like to cluster/group the curves in the attached picture with Python. The data is already normalized, and my approach would be to use DTW (dynamic time warping) to calculate the distance and, with that feature, use a clustering algorithm (like k-means or DBSCAN) to classify them. Do I pick out one trajectory as a starting curve to compare the other curves to, or do I calculate an 'average' curve of all curves and use that as the starting …
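One option that sidesteps choosing a starting curve: tslearn's TimeSeriesKMeans computes DTW distances and barycenters internally. A minimal sketch, with random placeholder data standing in for the normalized curves:

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

X = np.random.rand(30, 100, 1)  # 30 curves x 100 samples (placeholder data)
model = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = model.fit_predict(X)   # cluster assignment per curve
```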
I have sales data which is seasonal and has no trend. The frequency of this series is 15 minutes. I don't know how to compute the exact period of seasonality: whether it is daily, weekly, monthly, or yearly. But, from plotting it, I think there is a yearly pattern. I tried removing seasonality before forecasting by lagging the series by a year and differencing the two, but even the result has a yearly pattern. Code with what …
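One way to estimate the dominant period from the data rather than by eye is a periodogram; a sketch assuming `y` holds the 15-minute sales series:

```python
import numpy as np
from scipy.signal import periodogram

# y: the 15-minute sales series (assumed defined)
freqs, power = periodogram(y, detrend="linear")
dominant = freqs[np.argmax(power[1:]) + 1]  # skip the zero frequency
period = int(round(1 / dominant))           # period in 15-minute steps
print(period)                               # e.g. 96 = daily, 672 = weekly
```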
I just started to use recurrent neural networks (RNNs) with Keras for time-series forecasting and I found this tutorial, Forecasting with RNN. I have difficulties understanding how to build the training data, both regarding the syntax and the format of the input data. Here is the code:

    import pandas as pd
    import numpy as np
    import tensorflow as tf
    from tensorflow import keras
    from matplotlib import pyplot as plt

    # Read the data for the parameters from a csv file …
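Since the question is about building the training data, here is a minimal sketch of the usual sliding-window construction for a Keras RNN; the window length of 10 is an arbitrary assumption:

```python
import numpy as np

def make_windows(series, window=10):
    """Turn a 1-D series into (samples, timesteps, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])  # past `window` values as input
        y.append(series[i + window])    # the next value as target
    return np.asarray(X)[..., np.newaxis], np.asarray(y)
```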
Good day. I have ~50 sample trajectories (time series) showing reactor temperature over time for a given process. In addition, I have a reference signal of the ideal trajectory for this process. I would like to synchronize all the sample trajectories with the reference trajectory. Performing DTW with one sample signal and the reference produces new signals along a common axis (as it should). My question is: how can I perform this synchronization of all sample trajectories with the reference simultaneously? Such …
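One simple approach is to align each sample to the reference in a loop and project it onto the reference's time axis via the warping path; a sketch using tslearn's dtw_path (the last-value-wins projection below is one of several reasonable choices, not the only one):

```python
import numpy as np
from tslearn.metrics import dtw_path

def align_to_reference(samples, reference):
    aligned = []
    for s in samples:
        path, _ = dtw_path(s, reference)        # optimal warping path of (i, j) pairs
        warped = np.full(len(reference), np.nan)
        for i, j in path:
            warped[j] = s[i]                    # map sample values onto the reference axis
        aligned.append(warped)
    return np.array(aligned)                    # all rows share the reference time axis
```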
Suppose I own a store that sells a variety of apples, and I have the following stats each month:

- Report Date
- Type of Apple (TA)
- Quantity Available (QA)
- Quantity Sold in the Past 30 Days (QS30)
- Quantity Shipping In (QSI)
- Quantity Needed to Order (QN)

Let's make the following assumptions/givens: there are three types of apples: red apples, green apples, and yellow apples. T(1) denotes the first month and T(60) denotes the 60th month. QA@T(i+1) = QA@T(i) + QSI@T(i) …
I'm a complete n00b to both this Stack Exchange and ML, so please don't flame me too badly. I am trying to make a prediction from time series data. I have about 10 years' worth of 1-minute resolution price data for the S&P 500. What I'd like to do is treat each DAY in the data as its own series to predict what the price movement will be for the last 15 minutes of market hours. I've looked through several books, some …
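A minimal sketch of the per-day split with pandas, assuming `df` has a DatetimeIndex and a 'close' column, and a 16:00 market close (both assumptions):

```python
import pandas as pd

# df: 1-minute bars with a DatetimeIndex and a 'close' column (assumed defined)
days = {day: grp["close"] for day, grp in df.groupby(df.index.date)}
# candidate target: the move over the last 15 minutes of each session
last_15 = {day: s.between_time("15:45", "16:00") for day, s in days.items()}
```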
I am trying to forecast some sales data with monthly values. I have been trying some classical models as well as ML models like XGBoost. My data, with a feature set, looks like this, with a length of 110 months, and I am trying to forecast the next 12 months. When it comes to XGBoost, I've been spending time mostly on hyperparameter optimization with grid search and also state-of-the-art packages like Optuna. My current best set of parameters looks like this: parameters …
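For context, a minimal sketch of an Optuna study around XGBoost with an order-preserving split; the parameter ranges and the `X`, `y` names are illustrative assumptions, not the poster's actual setup:

```python
import optuna
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# X: feature DataFrame, y: target Series (assumed defined)
def objective(trial):
    params = {
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
    }
    scores = []
    for tr, te in TimeSeriesSplit(n_splits=4).split(X):
        model = xgb.XGBRegressor(**params).fit(X.iloc[tr], y.iloc[tr])
        scores.append(mean_absolute_error(y.iloc[te], model.predict(X.iloc[te])))
    return sum(scores) / len(scores)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
```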
This time series contains time frames, each of which is 8K (frequencies) × 151 (time samples) per 0.5 s (overall ~1.2288 million samples per half second). I need to find anomalies based on the different rows (frequencies) and report which rows (frequencies) are anomalous, using an unsupervised learning method. Do you have an idea of which statistical parameter is most useful for this: the mean, max, min, median, variance, or any other parameter of these 151 samples? Which parameter should I use? (I …
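One simple unsupervised option: summarize each frequency row with one of the listed statistics and flag rows whose value is a robust outlier. A sketch using the row variance and a MAD-based z-score (the 3.5 threshold is a common convention, not a given):

```python
import numpy as np

def anomalous_rows(frame, thresh=3.5):
    """frame: (n_freqs, 151) array; returns indices of outlier rows."""
    stat = frame.var(axis=1)                     # per-row summary statistic
    med = np.median(stat)
    mad = np.median(np.abs(stat - med)) + 1e-12  # avoid division by zero
    z = 0.6745 * (stat - med) / mad              # robust z-score
    return np.where(np.abs(z) > thresh)[0]
```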
Even though a time series consists only of numbers, finding an abstract fixed-dimensional vector representation would be interesting for classification/clustering purposes. As we can learn and find abstract representations/embeddings of text/images, can we do something similar with time series? Finding such methods could result in better clustering and related tasks than traditional approaches using statistical measures like Pearson correlation, etc. All thoughts are welcome.
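One common route is an autoencoder whose bottleneck becomes the embedding; a Keras sketch in which all sizes (100 timesteps, a 32-dim code) are arbitrary assumptions:

```python
from tensorflow import keras

timesteps, n_features, dim = 100, 1, 32
inp = keras.Input(shape=(timesteps, n_features))
code = keras.layers.LSTM(dim)(inp)                       # fixed-dim embedding
x = keras.layers.RepeatVector(timesteps)(code)
x = keras.layers.LSTM(dim, return_sequences=True)(x)
out = keras.layers.TimeDistributed(keras.layers.Dense(n_features))(x)

autoencoder = keras.Model(inp, out)
encoder = keras.Model(inp, code)   # use this after training for clustering
autoencoder.compile(optimizer="adam", loss="mse")
```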
TL;DR: Are there one-sided decomposition alternatives to the naive seasonal_decompose from statsmodels? Are there approaches to adapt intrinsically two-sided algorithms (like STL from statsmodels) to forecasting applications? I'm attempting to perform time-series forecasting. For this, I want to decompose a time series into trend and seasonal parts. I picked the STL implementation from statsmodels to handle this. I gravitated towards STL instead of seasonal_decompose, since even the docs, down at the bottom, encourage more sophisticated approaches. I noticed, however, that the decomposition is …
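On adapting STL to forecasting: statsmodels ships STLForecast, which decomposes with STL, models the deseasonalized series, and adds the seasonal component back for the forecast. A sketch assuming a monthly series `y` (the ARIMA order is an arbitrary choice):

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.forecasting.stl import STLForecast

# y: a monthly series (assumed defined)
stlf = STLForecast(y, ARIMA, model_kwargs={"order": (1, 1, 0)}, period=12)
res = stlf.fit()
forecast = res.forecast(12)  # the seasonal component is re-added automatically
```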
I want to perform k-fold cross-validation in a setting where I have a training dataset consisting of a sequential time series that is fully benign, and a test dataset (also a sequential time series) which contains labeled anomalies. I already took a look at this post, but as my data is sequential, the answer doesn't work out. I am especially stuck on the fact that for k-fold cross-validation, you use (k-1)/k parts of your data for training and 1/k parts …
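For comparison, the standard order-preserving alternative to vanilla k-fold is sklearn's TimeSeriesSplit, which always trains on the past and validates on the future:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # stand-in for the benign training sequence
for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(X):
    print(train_idx, val_idx)     # each validation block follows its training block
```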
Ok I had a typo in this question before which I have now corrected: my database (df_e) looks like this:

    0,Country,Latitude,Longitude,Altitude,Date,H2,Year,month,dates,a_diffH,H2a
    1,IN,28.58,77.2,212,1964-09-15,-57.6,1964,9,1964-09-15,-3.18,-54.42
    2,IN,28.58,77.2,212,1963-09-15,-120.0,1963,9,1963-09-15,-3.18,-116.82
    3,IN,28.58,77.2,212,1964-05-15,28.2,1964,5,1964-05-15,-3.18,31.38
    ...

and I would like to save the data from the 9th month of the years 1963 and 1964 into a new df. For this I use the command:

    df.loc[df_e['H2a'].isin(['1963-09-15', '1964-09-15'])]

But the result is:

    Empty DataFrame
    Columns: [Country, Latitude, Longitude, Altitude, Date, H2, Year, month, dates, a_diffH, H2a]
    Index: []

Where is my mistake?
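A hedged note on the likely intended filter: in the sample rows, the date strings live in the 'dates' (and 'Date') columns, while 'H2a' holds numeric values, so matching date strings against 'H2a' can never succeed. A sketch of the presumably intended lookup:

```python
# filter on the column that actually contains the date strings (assumed: 'dates')
sept_df = df_e.loc[df_e['dates'].isin(['1963-09-15', '1964-09-15'])]
```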
There are a few hundred time series from a large set of different locations (irregularly distributed) with the following properties:

- ordered factor (5 levels)
- between 5 and 25 observations per series
- lots of missing values within each series
- temporal and spatial autocorrelation
- (unknown) temporal frequency

The objective is to spatially cluster the time series based on their similarity (of observed value per point in time). What would be adequate methods? The analysis will be carried out in R.
I have a collection of time series data with around 2 years of daily data points. I am thinking of a way to increase the number of data points so that the neural network gets a better understanding of the fluctuations in the data. My hypothesis is to cluster time series that follow similar distributions, in order to increase the number of data points fed into the neural network. Is this …
Is there a machine learning model (something like an LSTM or 1D-CNN) that takes two time series of variable length as input and outputs a binary classification (True/False: whether the two time series have the same label)? The data would look something like the following:

    date        value  label
    2020-01-01  2      0   # first input time series
    2020-01-02  1      0   # first input time series
    2020-01-03  1      0   # first input time series
    2020-01-01  3      1   # second input time series
    2020-01-03  1 …
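This is essentially a Siamese setup; a Keras sketch in which a shared LSTM encodes each (zero-padded, masked) series and a small head scores whether the pair matches. All layer sizes are arbitrary assumptions:

```python
import tensorflow as tf
from tensorflow import keras

encoder = keras.Sequential([
    keras.layers.Masking(mask_value=0.0),  # tolerates variable length via padding
    keras.layers.LSTM(32),
])
a = keras.Input(shape=(None, 1))  # first series, any length
b = keras.Input(shape=(None, 1))  # second series, any length
diff = keras.layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([encoder(a), encoder(b)])
out = keras.layers.Dense(1, activation="sigmoid")(diff)  # P(same label)

model = keras.Model([a, b], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```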
I'm working on forecasting daily volumes and have used a time series model to check for data stationarity. However, I'm struggling to forecast the data with 90% accuracy. Right now the variation is extremely high and I'm just unable to bring it down. I've used the log method to transform my data. Please find below the link to a folder which contains the ipynb and csv files: https://drive.google.com/drive/folders/1QUJkTucLPIf2vjo2mRmoBU6be083dYpQ?usp=sharing Any help will be highly appreciated. Thanks, Rahul
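One detail worth checking with the log transform: forecasts made on the log scale must be back-transformed before accuracy is measured. A tiny sketch with placeholder data (np.expm1 inverts np.log1p):

```python
import numpy as np

volumes = np.array([120.0, 150.0, 90.0])  # placeholder daily volumes
log_volumes = np.log1p(volumes)           # variance-stabilizing transform
log_forecast = log_volumes[-1]            # stand-in for a model's log-scale forecast
forecast = np.expm1(log_forecast)         # back to the original scale before scoring
```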
I am currently playing around with different CNN and LSTM model architectures for my multivariate time series classification problem. I can achieve validation accuracy of better than 50%. I would like to lock down an exact architecture at some stage instead of experimenting endlessly. In order to decide on one, I also want to tune my hyperparameters. Question: How do I balance the need to experiment with different models, such as a standalone CNN and a CNN with LSTM, against hyperparameter tuning? …
I work with two datasets. The first dataset contains fluor values measured every minute. The second dataset contains certain events and their times. We know that these events cause peaks in fluor values shortly before and shortly after the event time. A simplified reproducible example in R: here I provide a simplified version of the R code I use to relate the fluor values to events. I have a series of fluor values measured every minute. Next I have a …