Forecasting on multivariate time series containing quaternions

I have a multivariate time series containing 3D position data $(x,y,z)$ and orientation data (as quaternions) obtained from motion sensors. My goal is to forecast the future position/orientation, and for this I'm looking into using sequence models, especially LSTMs. A quaternion has 4 elements, one of them denoting the real/scalar part (say $q_w$) and the other three denoting the imaginary/vector part (say $q_x, q_y, q_z$). So my time series has 7 columns in total. My question: Considering that quaternion elements …
Category: Data Science

Polynomial regression with two variables. How can I find expressions to describe the coefficients?

I'm not sure if this is an appropriate place for this question, so please feel free to redirect me if it is not. I just moved it from Super User, where there didn't seem to be many similar questions. Please also feel free to suggest tags. I'm trying to modify part of some old code. It uses regression to describe the relationship between two variables (described as "a fourth order power series in X and y"). I know very little …
Category: Data Science

Which dataset for multivariate time series forecasting

I'm trying to forecast real estate prices. This is not a prediction but a forecast, such as the price of an apartment in 2023 or 2024. How should my dataset be structured? Can I use a dataset covering 2018 to 2021 with 13 columns? You can find the dataset here: https://www.kaggle.com/datasets/mrdaniilak/russia-real-estate-20182021 The columns include Date, area, kitchen_are, nb_rooms. Please note that every row is a new house, independent of the others; I obtained this dataset by scraping an ads website …
Category: Data Science

Getting vague results using VAR time series forecasting in Python

Firstly, I am a beginner in the field of Data Science and have tried to implement some time series models for wind speed forecasting. I am aware that some regression models might give better results, but my aim is still to solve this with VAR. I tried to implement multivariate time series forecasting with VAR in Python. To start with, I followed the code in this article: https://towardsdatascience.com/simple-multivariate-time-series-forecasting-7fa0e05579b2 However, the forecasted value …
Category: Data Science

Confidence intervals in multivariate linear regression

I am fitting my data to a multivariate linear regression $Y = XB + \Xi$, where the response is bivariate, $Y\in R^{n\times 2}$, and the predictor is univariate but elevated to the projective plane to account for the intercept, $X\in R^{n\times 2}$. Finding the best fit reduces to $\hat B = (X^T X)^{-1}X^T Y$. But I am interested in finding a $0.7$ confidence region around $\hat B$. How do I do that?
Category: Data Science
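
As a minimal sketch of the estimator quoted in the question above, the following NumPy snippet computes $\hat B = (X^T X)^{-1}X^T Y$ for a bivariate response and a univariate predictor augmented with an intercept column, using the row-wise convention $Y = XB + \text{noise}$ so that the shapes match. The synthetic data, the coefficients `B_true` and the function name `fit_multivariate_ols` are assumptions made for illustration only. The snippet stops at the point estimate and does not construct the confidence region the question asks about.

```python
import numpy as np

def fit_multivariate_ols(x, Y):
    """Least-squares estimate for Y = X B + noise, with X = [1, x]."""
    X = np.column_stack([np.ones_like(x), x])  # n x 2 design matrix
    # For a full-column-rank X, lstsq returns the same solution as
    # (X^T X)^{-1} X^T Y without forming the inverse explicitly.
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ B_hat
    return X, B_hat, residuals

# Hypothetical synthetic data (not from the question)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
B_true = np.array([[0.5, -1.0],   # intercepts of the two responses
                   [2.0, 3.0]])   # slopes of the two responses
Y = np.column_stack([np.ones(n), x]) @ B_true + 0.1 * rng.normal(size=(n, 2))

X, B_hat, residuals = fit_multivariate_ols(x, Y)
```

Each column of `B_hat` holds the intercept and slope for one response, so the two responses can be read off independently.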

Multivariate time series forecasting

I want to forecast the 'output' time series. I have past values of the correlated time series 'output', 'capacity' and 'load'. I also know the near-future values of 'capacity' and 'load'. See picture. I'm looking for a solution to this problem in Python. All variables have the same unit, man-hours per hour (mh/h). For your interest: the output is the work that is finished in a skill group based on the baseline. …
Category: Data Science

How to build multiple variable regression having a mix of numerical & categorical features?

There is a need to estimate Annual Average Daily Traffic Volume (AADT). We have a bunch of data about vehicles' speeds over several years. It has been noticed that AADT depends on the average number of such samples during some time, so a regression model $Y = f(x_1)$ could help in estimating the AADT. The problem is that there are other features affecting the dependency, which are both numerical $(x_2, \dots, x_k)$ and categorical $(c_1 = \text{data provider}, c_2 = \text{road class}, \dots, c_m)$. …
Category: Data Science

Getting a balanced sample across many variables

Let’s say each element in my population has several attributes. Let’s call them A, B, C, D, E, F. Let’s say, for simplicity, each attribute has 10 values (but it could be any number between 2 and 30). Now I want to get a sample such that the distribution is the same across all features. So, for example, if the whole population has about 15% of people in feature A with value 1, my sample should have the same. What should …
Category: Data Science

Can the dependency between variables be deduced from data? And if so, how?

I have a data set $X$ that consists of $m$ vectors $\vec{x}$ of $n$ real-valued components. Each vector component lies within a corresponding predefined interval of valid values, which is the same for all vectors in $X$. The assumption is that there exists a dependency graph between the components of each vector, which is also the same for all vectors; for example, the value of the component $x_k$ (maybe) depends on the values of both components $x_p$ and $x_q$ for …
Category: Data Science

Multivariate data preprocessing

I am trying to understand how multivariate data preprocessing works, but there are some questions on my mind. For example, I can do data smoothing, transformation (Box-Cox, differentiation) and noise removal on univariate data (for any machine learning problem, not only time series forecasting). But what if one variable is noisy and the other is not? Or one is smooth and another is not (I will need to apply a sliding window average to one variable but not the other …
Category: Data Science

Sampling trying to keep as much multivariate variance as possible

I was wondering whether anyone has considered a sampling technique that aims to keep as much of the variance as possible (e.g. as many unique values, or very widely distributed continuous variables). The benefit might be that it allows code to be developed around the sample while really working with the edge cases in the data. You can then always take a representative sample later. So, I am wondering if people have tried to sample for maximum variance before …
Category: Data Science

Interpretation of PCA/FAMD results

I wrote some code for a mixed PCA (FAMD, factor analysis of mixed data), where I have a dataset with some categorical variables and some numerical variables. This is my example code in R: library(dplyr) library(PCAmixdata) data <- starwars db_quali <- as.data.frame(starwars[,4:6]) db_quanti <- as.data.frame(starwars[,2:3]) pca_table <- PCAmix(X.quanti = db_quanti, X.quali = db_quali, rename.level=TRUE, graph = TRUE) Gender <- factor(data$gender) par(xpd=TRUE,mar=rep(8,4)) plot(pca_table ,choice="ind",label=FALSE, posleg=xy.coords(2,-10), main="Observations", coloring.ind = Gender) and the output graph is: How does this method calculate the coordinate …
Category: Data Science

Getting mean and covariance matrix for multivariate normal from keras model

I have a dataset that has 6 input features and 5 output features. I want to use a Keras sequential model to estimate the mean vector and covariance matrix from any row of input features, assuming the output features follow a multivariate normal distribution. That is, for any row of 6 input features in my dataset, I want to get a mean vector of 5 values and a $5 \times 5$ covariance matrix. sample=pd.DataFrame({'X1':[1,2,3,4,5,6], 'X2':[1,3,1,5,2,7], 'X3':[3,0,0,7,5,0], 'X4':[0,4,3,2,5,8], 'X5':[9,7,0,2,4,5], 'X6':[1,1,8,7,0,0], 'Y1':[0.5,1.2,6.3,4.5,1.5,6.6], 'Y2':[6.1,4.3,2.1,1.5,4.2,8.7], …
Category: Data Science

Should I concat multiple stock timeseries datasets into one?

I have several time series datasets of stock data, with fundamental indicators. I would like to build a model that selects stocks for buy and hold. I understand that to perform this task I have two options: Train a model for each stock: I understand that this is the most practical option; however, the amount of data for each model will be very small (each dataset has fewer than 1000 lines). Putting all the data together in a single dataset: …
Category: Data Science

Two variables polynomial fit with Python

I have two NumPy arrays (the first is 2D, the second 1D) of the form $X = [[x_1,y_1],[x_2,y_2],[x_3,y_3],...]$ and $Z = [z_1,z_2,z_3,...]$. I would like to fit them, as I expect them to follow a polynomial law, $z = A xy + B x + C y + D$ (the model is separately linear in $x$ and $y$). So I would like a function which takes the two arrays and returns the coefficients $A, B, C$ and $D$. Is there any way to do …
Category: Data Science
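
One possible way to obtain the coefficients described above is ordinary least squares on a design matrix with columns $xy$, $x$, $y$ and $1$, since the model is linear in $A, B, C, D$. The sketch below assumes NumPy is available. The function name `fit_bilinear` and the synthetic, noise-free data are hypothetical and only serve to illustrate the idea.

```python
import numpy as np

def fit_bilinear(X, Z):
    """Fit z = A*x*y + B*x + C*y + D by linear least squares.

    X is an (n, 2) array of [x, y] pairs, Z is an (n,) array of z values.
    Returns the array [A, B, C, D].
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    x, y = X[:, 0], X[:, 1]
    # One column per term of the model, in the order A, B, C, D
    design = np.column_stack([x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, Z, rcond=None)
    return coeffs

# Hypothetical noise-free data generated from known coefficients
rng = np.random.default_rng(1)
X_demo = rng.uniform(-2.0, 2.0, size=(50, 2))
Z_demo = (1.5 * X_demo[:, 0] * X_demo[:, 1]
          - 0.5 * X_demo[:, 0]
          + 2.0 * X_demo[:, 1]
          + 4.0)

A, B, C, D = fit_bilinear(X_demo, Z_demo)
```

Because the unknowns enter linearly, no iterative optimiser or starting guess is needed; with noisy data the same call returns the least-squares estimate.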

TensorFlow Probability Implementation of Automatic Differentiation Variational Inference with Mixtures

In this paper, the authors suggest using the following loss instead of the traditional ELBO in order to train what is basically a Variational Autoencoder with a Gaussian Mixture Model instead of a single normal distribution: $$ \mathcal{L}_{SIWAE}^T(\phi)=\mathbb{E}_{\{z_{kt}\sim q_{k,\phi}(z|x)\}_{k=1,t=1}^{K,T}}\left[\log\frac{1}{T}\sum_{t=1}^T\sum_{k=1}^K\alpha_{k,\phi}(x)\frac{p(x|z_{k,t})r(z_{kt})}{q_\phi(z_{kt}|x)}\right] $$ They also provide the following code, which is supposed to be a TensorFlow Probability implementation: def siwae(prior, likelihood, posterior, x, T): q = posterior(x) z = q.components_dist.sample(T) z = tf.transpose(z, perm=[2, 0, 1, 3]) loss_n = tf.math.reduce_logsumexp( (-tf.math.log(T) + …
Category: Data Science

How to find mixing ratios in a mixture model with known parameters?

This question does not ask for a formal solution or rephrasing, but for a practical implementation. That is why I am asking here and not on [Cross Validated](https://stats.stackexchange.com). Let us assume I have $y$ observations and a mixture model of $g$ normally distributed components with mixing ratios $\lambda$, and I know their parameters $\theta$. How can I estimate only the ratios $\lambda$ and not the parameters $\theta$? So far I have only managed to estimate the entire mixture model, meaning …
Category: Data Science
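
One common way to approach the problem described above is to run expectation-maximisation while holding the component parameters fixed, so that only the mixing ratios are updated. The sketch below is a minimal illustration under that assumption, not a method taken from the question. The function names `normal_pdf` and `estimate_mixing_ratios`, the component parameters and the synthetic sample are all hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def estimate_mixing_ratios(y, means, stds, n_iter=500, tol=1e-10):
    """EM updates for the mixing ratios only; means and stds stay fixed."""
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Component densities are computed once because theta never changes.
    dens = normal_pdf(y[:, None], means[None, :], stds[None, :])  # (n, g)
    lam = np.full(means.size, 1.0 / means.size)  # start from equal ratios
    for _ in range(n_iter):
        weighted = dens * lam
        # E-step: responsibility of each component for each observation
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step restricted to the ratios: average responsibility
        new_lam = resp.mean(axis=0)
        converged = np.max(np.abs(new_lam - lam)) < tol
        lam = new_lam
        if converged:
            break
    return lam

# Hypothetical sample drawn from a mixture with known parameters
rng = np.random.default_rng(2)
true_lam = np.array([0.2, 0.5, 0.3])
means = np.array([-4.0, 0.0, 5.0])
stds = np.array([1.0, 0.8, 1.5])
labels = rng.choice(3, size=5000, p=true_lam)
y_demo = rng.normal(means[labels], stds[labels])

lam_hat = estimate_mixing_ratios(y_demo, means, stds)
```

The estimated ratios sum to one by construction, since each row of responsibilities sums to one before averaging.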

Meaning of the covariance matrix?

I wonder about the heavy usage of the covariance matrix across all kinds of machine learning tools. So far, for me, the covariance is just a preliminary step to get to the correlation. And while there is an obvious reason for the correlation itself, I wonder why I encounter the covariance so often, and why in general it is used so much. What are the purposes of the covariance matrix?
Category: Data Science

MLE for Poisson conditioned on multivariate Gaussian?

I am writing some Python code to fit 2D Gaussians to fluorescent emitters on a dark background to determine the subpixel-resolution (x, y) position of the fluorescent emitter. The crude, pixel-resolution (x, y) locations of the pixels are stored in a list xy. The height of the Gaussian represents the predicted pixel intensity at that location. Each 2D Gaussian has 5 parameters, and my end goal is to find the optimal value of those 5 parameters for each peak using …
Category: Data Science
