How to find average lag time with variance & confidence of two time series

I have two time series, one a consequence of the other, and I would like to find the average time delay between a change in the independent variable and the response of the dependent variable. Additionally, I would like to quantify the variance of that lag and attach a confidence level to it. I am unsure how to go about this in a statistically valid way, but I am using Python. Currently I have used np.diff(np.sign(np.diff(df))) to isolate …
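A common starting point (a sketch, not the asker's code; the series and true_lag below are illustrative) is the cross-correlation of the two mean-removed series: the lag at which it peaks estimates the average delay.

```python
import numpy as np
from scipy import signal

# Hypothetical example: y is x delayed by `true_lag` steps plus noise.
rng = np.random.default_rng(0)
true_lag = 12
x = rng.normal(size=500)
y = np.roll(x, true_lag) + 0.1 * rng.normal(size=500)

# The peak of the full cross-correlation estimates the delay of y behind x.
corr = signal.correlate(y - y.mean(), x - x.mean(), mode='full')
lags = signal.correlation_lags(len(y), len(x), mode='full')
lag_estimate = lags[np.argmax(corr)]
print(lag_estimate)  # close to 12
```

For the variance and confidence level, one option is to repeat the estimate on block-bootstrap resamples of the paired series and take percentiles of the resulting lag distribution.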
Category: Data Science

Model prediction on meshgrid in python

Suppose I have data with two independent variables $X_1$, $X_2$ and one dependent variable $y$, as follows: $X_1$: $x_{1,1}$, $x_{1,2}$, $x_{1,3}$; $X_2$: $x_{2,1}$, $x_{2,2}$, $x_{2,3}$; $y$: $y_1$, $y_2$, $y_3$. I built a machine learning model that performs well. Now I want to generate predictions not just for the test data but for all possible combinations of the test values. For example, if our test data looks like $X_1$: $a$, $b$, $c$ and $X_2$: $p$, $q$, $r$, then I want predictions …
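A minimal sketch of the usual approach, assuming a scikit-learn-style fitted estimator `model` (hypothetical) and stand-in test columns: build every pairwise combination with np.meshgrid, flatten, and predict once on the stacked grid.

```python
import numpy as np

# Stand-ins for the test columns a, b, c and p, q, r.
X1_test = np.array([1.0, 2.0, 3.0])
X2_test = np.array([10.0, 20.0, 30.0])

# All pairwise combinations of the two test columns.
g1, g2 = np.meshgrid(X1_test, X2_test)
grid = np.column_stack([g1.ravel(), g2.ravel()])  # shape (9, 2)

# predictions = model.predict(grid)  # assuming a fitted estimator `model`
```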
Category: Data Science

Is it possible to implement a vectorized version of a Maxout activation function?

I want to implement an efficient and vectorized Maxout activation function using Python and numpy. Here is the paper in which "Maxout Network" was introduced (by Goodfellow et al.). For example, if k = 2:

```python
def maxout(x, W1, b1, W2, b2):
    return np.maximum(np.dot(W1.T, x) + b1, np.dot(W2.T, x) + b2)
```

where x is an N*D matrix. Suppose k is an arbitrary value (say 5). Is it possible to avoid for loops when calculating each wx + b? I couldn't come up with any …
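One way to drop the loop (a sketch under the assumption that the k weight matrices are stacked into one 3-D array of shape (k, D, M) and x has shape (N, D)) is a single einsum over the piece dimension followed by a max:

```python
import numpy as np

def maxout(x, W, b):
    # einsum computes all k affine maps at once: z has shape (k, N, M)
    z = np.einsum('kdm,nd->knm', W, x) + b[:, None, :]
    return z.max(axis=0)  # elementwise max over the k pieces, shape (N, M)

# Illustrative usage with made-up sizes:
x = np.random.rand(8, 10)      # N=8 samples, D=10 features
W = np.random.rand(5, 10, 3)   # k=5 pieces, M=3 output units
b = np.random.rand(5, 3)
out = maxout(x, W, b)          # shape (8, 3)
```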
Category: Data Science

Python: calculate the weighted average correlation coefficient

I am calculating the volatility (standard deviation) of returns of a portfolio of assets using the variance-covariance approach. Correlation coefficients and asset volatilities have been estimated from historical returns. Now what I'd like to do is compute the average correlation coefficient, that is, the single common correlation coefficient across all asset pairs that reproduces the same overall portfolio volatility. I could of course take an iterative approach, but was wondering if there was something simpler / out of the box …
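There is in fact a closed form: under a common correlation $\rho$, the portfolio variance is $\sigma_p^2 = \sum_i w_i^2\sigma_i^2 + \rho\sum_{i\ne j} w_iw_j\sigma_i\sigma_j$, so $\rho$ can be solved for directly. A sketch with made-up weights, volatilities, and correlations:

```python
import numpy as np

# Hypothetical inputs: weights, volatilities, estimated correlation matrix.
w = np.array([0.5, 0.3, 0.2])
vol = np.array([0.20, 0.15, 0.10])
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
cov = np.outer(vol, vol) * corr

port_var = w @ cov @ w             # portfolio variance under the estimated matrix
own_var = np.sum((w * vol) ** 2)   # sum_i w_i^2 sigma_i^2
cross = (w @ vol) ** 2 - own_var   # sum over i != j of w_i w_j sigma_i sigma_j
rho_avg = (port_var - own_var) / cross
```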
Category: Data Science

Proper data shape and model architecture for recognizing highs and lows in a chart

I am using a Keras LSTM model to try to pinpoint the highs and lows (relative high points and low points) in a chart; I need the actual coordinates of those highs and lows, not just an image. The training process runs without errors, but the prediction output is completely unrelated to the training output. What I've done so far: I created the output data by feeding the input data to SciPy's argrelextrema algorithm. For …
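For context, a minimal sketch of how argrelextrema is typically used to label the targets (the series and the `order` value below are illustrative, not the asker's data):

```python
import numpy as np
from scipy.signal import argrelextrema

rng = np.random.default_rng(0)
prices = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.normal(size=200)

order = 5  # how many neighbors on each side a point must beat
highs = argrelextrema(prices, np.greater, order=order)[0]  # indices of relative maxima
lows = argrelextrema(prices, np.less, order=order)[0]      # indices of relative minima

# Integer targets per time step: 0 = neither, 1 = high, 2 = low
labels = np.zeros(len(prices), dtype=int)
labels[highs] = 1
labels[lows] = 2
```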
Category: Data Science

Pre-process data images before training OneClassSVM and decrease number of features

I want to train a OneClassSVM() using sklearn, and I have a set of around 800 images in my training set. I am using opencv to read the images, resize them to constant dimensions (960x540), and then add them to a numpy array. The images are RGB and thus have 3 channels. For that, I am reshaping the numpy array after reading all the images:

```python
# Assume X is my numpy array which contains all the images before reshaping
# Now I reshape …
```
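One common way to cut the feature count before the SVM (a sketch with a small stand-in array; the real set would have shape (800, 540, 960, 3)) is to flatten each image to a row and run PCA first:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.random((40, 54, 96, 3))   # small stand-in for the real image array
X_flat = X.reshape(len(X), -1)    # one row per image

pca = PCA(n_components=0.95)      # keep 95% of the variance
X_reduced = pca.fit_transform(X_flat)

clf = OneClassSVM(gamma='auto').fit(X_reduced)
```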
Category: Data Science

Different approaches of creating the test set

I came across different approaches to creating a test set. Theoretically it's quite simple: just pick some instances at random, typically 20% of the dataset, and set them aside. Below are the approaches. The naive way of creating the test set is:

```python
def split_train_test(data, test_set_ratio):
    # create shuffled indices
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_set_ratio)
    test_set_indices = shuffled_indices[:test_set_size]
    train_set_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_set_indices], data.iloc[test_set_indices]
```

The above splitting mechanism works, but if the program is run again and again, it will generate a different …
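A common fix (a sketch, assuming each row carries a stable identifier column) is to decide test membership from a hash of the id, so the same rows land in the test set on every run, even as the dataset grows:

```python
import numpy as np
import pandas as pd
from zlib import crc32

def is_in_test_set(identifier, test_ratio):
    # a row is in the test set if the hash of its id falls in the
    # lowest test_ratio fraction of the 32-bit range
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32

def split_train_test_by_id(data, test_ratio, id_column):
    in_test = data[id_column].apply(lambda id_: is_in_test_set(id_, test_ratio))
    return data.loc[~in_test], data.loc[in_test]

# Illustrative usage on a made-up frame with an `id` column:
df = pd.DataFrame({'id': range(100), 'value': np.random.rand(100)})
train_set, test_set = split_train_test_by_id(df, 0.2, 'id')
```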
Category: Data Science

Identifying this dataset for sanitising

I am a beginner here, starting with data science for analytics. I am trying to figure out what dataset this is and how to read it from Python. I have an idea of the steps but am not sure how to code it in Python (a rough sketch follows the list):

1. Open and read the file.
2. Search for keywords based on another file.
3. If a keyword is found, search for "Term" from that line up and copy the value of "id:", which is below it.
4. If more than one keyword …
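A rough sketch under heavy assumptions (the file format is unknown here): the data file is plain text, the keywords sit one per line in a hypothetical keywords.txt, and each matching record has an "id:" line somewhere below the match.

```python
# Load the keyword list from the second file (assumed one keyword per line).
with open('keywords.txt') as f:
    keywords = [line.strip() for line in f if line.strip()]

with open('data.txt') as f:
    lines = f.readlines()

found_ids = []
for i, line in enumerate(lines):
    if any(keyword in line for keyword in keywords):
        # scan downward from the match for the id: value
        for later in lines[i:]:
            if later.strip().startswith('id:'):
                found_ids.append(later.split('id:', 1)[1].strip())
                break
```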
Category: Data Science

Unable to pass X_train and y_train to my regressor variable; I got a ValueError

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('housing.csv')
data.drop('ocean_proximity', axis=1, inplace=True)
data.head()
```

```
   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0    -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0
1    -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0
2    -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0
3    -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0
4    -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0
```

…
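The actual traceback is cut off, but a guess at the usual culprit with this dataset (hedged, not the asker's confirmed error): housing.csv has missing values in total_bedrooms, and scikit-learn estimators raise a ValueError when they see NaNs. A sketch of the fix, continuing from the `data` frame above:

```python
from sklearn.linear_model import LinearRegression

# Drop (or impute) the rows with missing total_bedrooms before fitting.
data = data.dropna(subset=['total_bedrooms'])

X_train = data.drop('median_house_value', axis=1)
y_train = data['median_house_value']

regressor = LinearRegression().fit(X_train, y_train)
```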
Category: Data Science

Slice NumPy arrays differently along axes (without looping)

I am trying to analyze a temporal signal sampled by a 2D sensor. Effectively, this means integrating the signal values for each sensor pixel (array row/column coordinate) at the times each pixel is active. Since the start time and duration that each pixel is active are different, I effectively need to slice the signal at different values along each row and column.

```python
# Here is the setup for the problem
import numpy as np

def signal(t):
    return np.sin(t/2) * np.exp(-t/8)

t = …
```
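A loop-free pattern for "a different slice per pixel" (a sketch; the start/stop arrays below are made up) is to build a boolean mask by broadcasting the per-pixel windows against a shared time axis, then reduce along it:

```python
import numpy as np

t = np.linspace(0, 20, 100)
sig = np.sin(t / 2) * np.exp(-t / 8)

# Hypothetical per-pixel active windows as indices into t.
rng = np.random.default_rng(0)
start = rng.integers(0, 50, size=(4, 4))
stop = start + rng.integers(10, 40, size=(4, 4))

# mask[i, j, k] is True while pixel (i, j) is active at time index k.
idx = np.arange(len(t))
mask = (idx >= start[..., None]) & (idx < stop[..., None])

# Integrate the signal over each pixel's own window in one shot.
integrated = (sig * mask).sum(axis=-1) * (t[1] - t[0])  # shape (4, 4)
```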
Category: Data Science

How to solve this ValueError: Dimensions must be equal

I'm trying to train an autoencoder model with colored image samples, but I got this error:

```
ValueError: Dimensions must be equal, but are 476 and 480 for
'{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](model_4/conv2d_28/BiasAdd, IteratorGetNext:1)'
with input shapes: [?,476,476,1], [?,480,480,3].
```

although I have checked the dimensions of the test and training sets and all are (480,480,3).

```python
from matplotlib import image, pyplot
import cv2

IMG_HEIGHT = 480
IMG_WIDTH = 480

def prepro_resize(input_img):
    oimg = cv2.imread(input_img, cv2.COLOR_BGR2RGB)
    return cv2.resize(oimg, (IMG_HEIGHT, IMG_WIDTH), interpolation=cv2.INTER_AREA)

x_train_ = [(prepro_resize(x_train[i])).astype('float32')/255.0 for i in range(len(x_train))]
x_test_ …
```
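The model code is cut off, but the output shape [?, 476, 476, 1] points at convolutions without padding and a single-channel final layer, not at the data. A sketch of a decoder that keeps the output at (480, 480, 3) via 'same' padding and 3 final filters (an assumed architecture, not the asker's):

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(480, 480, 3))
x = layers.Conv2D(32, 3, activation='relu', padding='same')(inp)
x = layers.MaxPooling2D(2, padding='same')(x)   # 480 -> 240
x = layers.Conv2D(32, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D(2)(x)                   # 240 -> 480
out = layers.Conv2D(3, 3, activation='sigmoid', padding='same')(x)  # 3 channels

autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer='adam', loss='mse')
```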
Category: Data Science

OpenCV warpAffine error during image augmentation using Albumentations

I have been trying to do image augmentation using a library called Albumentations, but I got an error from OpenCV while transforming the images. I ran the code below on Kaggle's notebook. The dataset is called "Intel Image Classification" on Kaggle; it has 6 classes, and each image is 150 * 150 * 3.

```python
import numpy as np
import tensorflow as tf
import albumentations as a

train_data = tf.keras.utils.image_dataset_from_directory(
    x_train_path, seed=123, image_size=(150, 150), batch_size=128)

x_train_path = "../input/intel-image-classification/seg_train/seg_train"

transforms = Compose([
    a.Rotate(limit=40),
    …
```
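Since the traceback is cut off, one diagnostic worth trying (an assumption: OpenCV's warpAffine is picky about dtype and layout, expecting uint8 or float32 HWC arrays) is to apply the transform to a single correctly-typed image outside the tf.data pipeline:

```python
import numpy as np
import albumentations as a

# Isolate the augmentation from the pipeline with one synthetic image.
transforms = a.Compose([a.Rotate(limit=40)])

img = np.random.randint(0, 256, size=(150, 150, 3), dtype=np.uint8)
augmented = transforms(image=img)["image"]
print(augmented.shape)  # (150, 150, 3)
```

If this runs, the problem is likely the tensors coming out of image_dataset_from_directory (float32 batches, not single uint8 images) rather than Albumentations itself.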
Category: Data Science

How to solve MemoryError problem

I've created and normalized my colored image dataset of 3716 samples with size 493*491 as x_train; its type is list. I'm trying to convert it into a numpy array as follows:

```python
from matplotlib import image
import numpy as np
import cv2

def prepro_resize(input_img):
    oimg = image.imread(input_img)
    return cv2.resize(oimg, (IMG_HEIGHT, IMG_WIDTH), interpolation=cv2.INTER_AREA)

x_train_ = [(prepro_resize(x_train[i])).astype('float32')/255.0 for i in range(len(x_train))]
x_train_ = np.array(x_train_)  # L1
# print(x_train_.shape)
```

but I get the following error when L1 runs:

```
MemoryError: Unable to allocate 10.1 GiB for an array with …
```
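The arithmetic matches the error: 3716 × 493 × 491 × 3 float32 values is roughly 10.1 GiB, so the list and the array cannot both live in RAM. One workaround (a sketch reusing the question's `prepro_resize` and `x_train` names, and assuming IMG_HEIGHT/IMG_WIDTH match the 493 × 491 target) is to write images straight into a disk-backed memmap instead of materializing the list first:

```python
import numpy as np

n, h, w, c = 3716, 493, 491, 3
x_mm = np.lib.format.open_memmap('x_train.npy', mode='w+',
                                 dtype=np.float32, shape=(n, h, w, c))
for i in range(n):
    # each image is normalized and flushed to disk, never held all at once
    x_mm[i] = prepro_resize(x_train[i]).astype('float32') / 255.0
```

Alternatively, keeping the array as uint8 and normalizing per batch at training time cuts the footprint by 4x.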
Category: Data Science

Are there any graph embedding algorithms like this already?

I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is in the readme file and the examples. The question is: am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embedding generation? Yes, I'm aware there are many algorithms for this based on random walks, but this one is purely deterministic linear algebra, and it is quite simple from my perspective. In short, the algorithm …
Category: Data Science

How to run list comprehensions on GPU?

Is there a way to run complex list comprehensions like the following on a GPU?

```python
[[x[index] if x[index] > len(x) else x[index] - 1 for x in slice]
 if len(slice) == 1 else slice
 for slice, index in zip(slices, indices)]
```

To what degree is it possible? Do I have to convert it to some kind of numpy expression, and if so, which part is specifically possible or necessary? The goal is performance optimization on large data lists/arrays.
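The general pattern (a sketch under simplifying assumptions: the ragged slices are padded into one 2-D array, and the names below are illustrative) is to replace per-element if/else with array-wide operations, which is what GPU libraries like CuPy can execute; CuPy mirrors the NumPy API:

```python
import cupy as cp

PAD = -1
padded = cp.asarray([[5, 9, PAD], [2, 7, 4]])  # padded stand-in for `slices`
indices = cp.asarray([0, 2])

vals = padded[cp.arange(len(padded)), indices]  # x[index] per row
lengths = (padded != PAD).sum(axis=1)           # effective row lengths
out = cp.where(vals > lengths, vals, vals - 1)  # elementwise if/else
```

Python-level comprehensions themselves never run on the GPU; only the array operations they are rewritten into do.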
Category: Data Science

Integration of NLP and Angular application

I'm doing a small POC in which I've trained my machine learning model (Naive Bayes) and saved it in ".pkl" (pickle) format. Now my next task is to develop a web application that asks the user to enter text for text classification analysis. This newly entered "TEXT" will be the test input, which can be fed to the Naive Bayes model built in the earlier stage to make a prediction on the "text" taken …
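The usual bridge between a pickled model and an Angular front end (a sketch, not the asker's code; the path and route names are hypothetical, and model.predict assumes a fitted sklearn pipeline that includes the vectorizer) is a small HTTP endpoint the app can POST text to:

```python
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open('model.pkl', 'rb') as f:    # hypothetical path to the saved model
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    text = request.json['text']            # text sent by the Angular app
    label = model.predict([text])[0]       # assumes vectorizer inside the pipeline
    return jsonify({'label': str(label)})
```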
Category: Data Science

Tensor dot product with rank one tensor from vector

I'm trying to compute an inner product between tensors in numpy. I have a vector $x$ of shape $(n,)$ and a tensor $y$ with $d$ axes of length $n$ each, $d > 1$, and would like to compute $\langle y, x^{\otimes d} \rangle$. That is, I want to compute the sum $$\langle y, x^{\otimes d} \rangle= \sum_{i_1,\dots,i_d\in\{1,\dots,n\}}y[i_1, \dots, i_d]\,x[i_1]\cdots x[i_d].$$ A working implementation I have uses a function to first compute $x^{\otimes d}$ and then uses np.tensordot:

```python
def d_fold_tensor_product(x, d) -> np.ndarray:
    """
    …
```
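The $n^d$ temporary can be avoided entirely (a sketch): contracting one axis of $y$ with $x$ at a time computes the same sum without ever materializing $x^{\otimes d}$.

```python
import numpy as np

def inner_with_tensor_power(y: np.ndarray, x: np.ndarray) -> float:
    # repeatedly contract the last axis of y with x; after ndim steps
    # a 0-d array (the scalar inner product) remains
    out = y
    for _ in range(y.ndim):
        out = np.tensordot(out, x, axes=([-1], [0]))
    return float(out)

# Quick check against the naive construction for d = 3, n = 4:
rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = rng.normal(size=(4, 4, 4))
naive = np.sum(y * np.einsum('i,j,k->ijk', x, x, x))
assert np.isclose(inner_with_tensor_power(y, x), naive)
```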
Topic: numpy python
Category: Data Science

How to create a complex Gaussian random noise with a specific covariance matrix

I am trying to generate complex Gaussian white noise with zero mean whose covariance matrix equals a specific matrix, which is assumed to be given. Let i be a point on the grid of the x axis, where there are N points on the axis. The problem is to generate a complex-valued random value at each point (call the value at point i $y_i$) that obeys a Gaussian distribution …
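A standard recipe (a sketch with an assumed example covariance, since the real one is given elsewhere): draw i.i.d. unit-variance circularly-symmetric complex normals $w$ and color them with the Cholesky factor $L$ of the covariance $C$, so that $\mathbb{E}[yy^H] = LL^H = C$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Hypothetical Hermitian positive-definite covariance for illustration.
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
C = A @ A.conj().T + N * np.eye(N)

L = np.linalg.cholesky(C)
# Unit-variance complex white noise: real and imaginary parts each N(0, 1/2).
w = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
y = L @ w   # colored complex Gaussian noise with covariance C
```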
Category: Data Science

Why does Fourier transform extrapolation go to extremes at the edges but not in the middle, and how can it be fixed?

Why does Fourier transform extrapolation go to extremes at the edges but not in the middle, and how can it be fixed with Python?

```python
# Code to create the Fourier transform
data_FT = dataset_ex_df[['Date', 'GS']]
close_fft = np.fft.fft(np.asarray(data_FT['GS'].tolist()))
fft_df = pd.DataFrame({'fft': close_fft})
fft_df['absolute'] = fft_df['fft'].apply(lambda x: np.abs(x))
fft_df['angle'] = fft_df['fft'].apply(lambda x: np.angle(x))

plt.figure(figsize=(14, 7), dpi=100)
fft_list = np.asarray(fft_df['fft'].tolist())
for num_ in [3, 6, 9, 100]:
    fft_list_m10 = np.copy(fft_list)
    fft_list_m10[num_:-num_] = 0
    plt.plot(np.fft.ifft(fft_list_m10), label='Fourier transform with {} components'.format(num_))
plt.plot(data_FT['GS'], label='Real')
plt.xlabel('Days')
plt.ylabel('USD')
plt.title('Figure 3: Goldman Sachs (close) stock …
```
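A sketch of one common mitigation (an assumed diagnosis): the FFT treats the series as periodic, so any mismatch between the first and last values behaves like a jump discontinuity at the boundary, producing violent swings at the two ends after low-pass filtering while the middle stays well behaved. Removing a linear trend before filtering, then adding it back, usually tames the edges; the series below is a stand-in for the close prices.

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))   # stand-in for data_FT['GS']

n = len(series)
idx = np.arange(n)
trend = np.polyval(np.polyfit(idx, series, 1), idx)

fft = np.fft.fft(series - trend)           # transform the detrended residual
fft[10:-10] = 0                            # keep only the lowest components
smoothed = np.fft.ifft(fft).real + trend   # restore the trend afterwards
```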
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.