I have two time series, one a consequence of the other, and I would like to find the average time delay it takes the dependent variable to respond to the independent variable. Additionally, I would like to find the variance associated with that lag time, with a confidence interval for it. I am unsure how to go about this in a statistically valid way, but I am using Python. Currently I have used np.diff(np.sign(np.diff(df))) to isolate …
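One statistically grounded starting point is the cross-correlation between the two series. A minimal sketch, assuming two equally sampled 1-D arrays `x` (independent) and `y` (dependent):

```python
import numpy as np

def estimate_lag(x, y):
    # standardize so the correlation peak is scale-free
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    corr = np.correlate(y, x, mode="full")   # cross-correlation at every lag
    lags = np.arange(-len(x) + 1, len(x))    # lag (in samples) for each entry
    return lags[np.argmax(corr)]             # positive result: y trails x
```

For the uncertainty, one common route is bootstrapping: resample blocks of the paired series, re-estimate the lag each time, and take percentiles of the resulting lag distribution as the confidence interval.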
Suppose I have data with two independent variables $X_1$, $X_2$ and one dependent variable, say $y$, as follows: $X_1$: $x_{1,1}$, $x_{1,2}$, $x_{1,3}$; $X_2$: $x_{2,1}$, $x_{2,2}$, $x_{2,3}$; $y$: $y_1$, $y_2$, $y_3$. I built a machine learning model that performs well. Now I want to generate predictions not just for the test data but for all possible combinations of the test data. For example, if our test data looks like $X_1$: $a$, $b$, $c$; $X_2$: $p$, $q$, $r$, then I want predictions …
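A sketch of the combination step, assuming a fitted scikit-learn-style `model` (the names `X1`, `X2`, and `model` are placeholders):

```python
import numpy as np
from itertools import product

X1 = [1.0, 2.0, 3.0]          # observed test values of X_1 (stand-ins for a, b, c)
X2 = [4.0, 5.0, 6.0]          # observed test values of X_2 (stand-ins for p, q, r)

grid = np.array(list(product(X1, X2)))   # all 3 x 3 = 9 (X_1, X_2) pairs
# predictions = model.predict(grid)      # one prediction per combination
```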
I want to implement an efficient and vectorized Maxout activation function using Python and NumPy. Here is the paper in which the "Maxout Network" was introduced (by Goodfellow et al.). For example, if k = 2:

```python
def maxout(x, W1, b1, W2, b2):
    return np.maximum(np.dot(W1.T, x) + b1, np.dot(W2.T, x) + b2)
```

where x is an N×D matrix. Suppose k is an arbitrary value (say 5). Is it possible to avoid for loops when calculating each wx + b? I couldn't come up with any …
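One way to avoid the loop, sketched under the assumption that the k weight matrices are stacked into a single array `W` of shape (k, D, M) and the biases into `b` of shape (k, M):

```python
import numpy as np

def maxout(x, W, b):
    # x: (N, D); W: (k, D, M); b: (k, M)
    z = np.einsum('nd,kdm->knm', x, W) + b[:, None, :]  # all k affine maps: (k, N, M)
    return z.max(axis=0)                                # element-wise max over the k pieces
```

np.einsum computes all k matrix products in one call, so the only reduction left is the max over the leading axis.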
I am calculating the volatility (standard deviation) of the returns of a portfolio of assets using the variance-covariance approach. Correlation coefficients and asset volatilities have been estimated from historical returns. What I'd like to do now is compute the average correlation coefficient, that is, the single common correlation coefficient between all asset pairs that reproduces the same overall portfolio volatility. I could of course take an iterative approach, but was wondering if there was something simpler / out of the box …
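There is in fact a closed form: portfolio variance splits into own-variance terms and cross terms, and the implied common correlation is the ratio of what the cross terms actually contribute to what they would contribute at $\rho = 1$. A sketch, assuming weights `w`, volatilities `sigma`, and the correlation matrix `rho` as NumPy arrays:

```python
import numpy as np

def average_correlation(w, sigma, rho):
    cov = rho * np.outer(sigma, sigma)       # covariance matrix from vols + correlations
    var_p = w @ cov @ w                      # actual portfolio variance
    own = np.sum((w * sigma) ** 2)           # sum of w_i^2 sigma_i^2 terms
    ws = w * sigma
    cross = np.sum(np.outer(ws, ws)) - own   # sum over i != j of w_i w_j sigma_i sigma_j
    return (var_p - own) / cross             # common rho matching the portfolio variance
```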
I am using a Keras LSTM model to try to pinpoint the relative highs and lows in a chart (I need the actual coordinates of those highs and lows, not just an image). The training process runs without errors, but the prediction output is completely unrelated to the training output. What I've done so far: I created the output data by feeding the input data to SciPy's argrelextrema. For …
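For reference, a minimal sketch of the labelling step described above, assuming a 1-D price array (the data here is synthetic):

```python
import numpy as np
from scipy.signal import argrelextrema

prices = np.sin(np.linspace(0, 20, 200)) + np.random.normal(0, 0.05, 200)
highs = argrelextrema(prices, np.greater, order=5)[0]  # indices of relative highs
lows = argrelextrema(prices, np.less, order=5)[0]      # indices of relative lows
```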
I want to train a OneClassSVM() using sklearn, and I have a set of around 800 images in my training set. I am using OpenCV to read the images, resize them to constant dimensions (960x540), and then add them to a NumPy array. The images are RGB and thus have three dimensions (height, width, channels). For that, I am reshaping the NumPy array after reading all the images:

```python
# Assume X is my numpy array which contains all the images before reshaping
# Now I reshape …
```
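For context, scikit-learn estimators expect a 2-D (n_samples, n_features) array, so each image has to be flattened into one row. A sketch under that assumption (random data standing in for the real images):

```python
import numpy as np
from sklearn.svm import OneClassSVM

X = np.random.rand(8, 540, 960, 3).astype(np.float32)  # stand-in for the image stack
X_flat = X.reshape(len(X), -1)                         # (n_samples, 540*960*3)
clf = OneClassSVM(gamma='auto').fit(X_flat)
```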
I came across different approaches to creating a test set. Theoretically, it's quite simple: just pick some instances randomly, typically 20% of the dataset, and set them aside. Below are the approaches. The naive way of creating the test set is:

```python
def split_train_test(data, test_set_ratio):
    # create shuffled indices
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_set_ratio)
    test_set_indices = shuffled_indices[:test_set_size]
    train_set_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_set_indices], data.iloc[test_set_indices]
```

The above splitting mechanism works, but if the program is run again and again, it will generate a different …
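One common remedy, sketched here under the assumption that each row has a stable numeric identifier column: hash the identifier and send an instance to the test set when its hash falls in the bottom `test_ratio` of the hash range, so the assignment is deterministic across runs.

```python
from zlib import crc32
import numpy as np

def test_set_check(identifier, test_ratio):
    # stable: the same id always hashes to the same bucket
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32

def split_train_test_by_id(data, test_ratio, id_column):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio))
    return data.loc[~in_test_set], data.loc[in_test_set]
```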
I am a beginner here, starting with data science for analytics. I am trying to figure out what data set this is and how to read it from Python. I have an idea of the steps but am not sure how to code them in Python (see the rough sketch after this list):

1. Open & read the file.
2. Search for keywords based on another file.
3. If a keyword is found, search for Term from that line up and copy the value of id:, which is below it.
4. If more than one keyword …
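A rough sketch of steps 1-3, with heavy assumptions since the file format isn't shown (the file names and the "Term" / "id:" prefixes are all placeholders):

```python
with open("keywords.txt") as f:                       # hypothetical keyword file
    keywords = [line.strip() for line in f if line.strip()]

with open("data.txt") as f:                           # hypothetical data file
    lines = f.read().splitlines()

for i, line in enumerate(lines):
    if any(kw in line for kw in keywords):
        # scan upward for the nearest "Term" line, downward for the "id:" line
        term = next((l for l in lines[i::-1] if l.startswith("Term")), None)
        ident = next((l for l in lines[i:] if l.startswith("id:")), None)
        print(term, ident)
```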
I am trying to analyze a temporal signal sampled by a 2D sensor. Effectively, this means integrating the signal values for each sensor pixel (array row/column coordinate) at the times each pixel is active. Since the start time and duration that each pixel is active are different, I effectively need to slice the signal for different values along each row and column.

```python
# Here is the setup for the problem
import numpy as np

def signal(t):
    return np.sin(t/2) * np.exp(-t/8)

t = …
```
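One loop-free way to handle the per-pixel windows, sketched with hypothetical `start` and `duration` arrays over a shared time grid: broadcast a boolean active-time mask against the time axis and integrate with a masked sum.

```python
import numpy as np

t = np.linspace(0, 20, 500)                       # shared time grid
sig = np.sin(t / 2) * np.exp(-t / 8)              # the signal from the setup
start = np.random.uniform(0, 5, (4, 4))           # hypothetical per-pixel start times
duration = np.random.uniform(2, 8, (4, 4))        # hypothetical per-pixel durations

# (rows, cols, 1) against (time,) broadcasts to (rows, cols, time)
active = (t >= start[..., None]) & (t < (start + duration)[..., None])
integrated = (sig * active).sum(axis=-1) * (t[1] - t[0])  # per-pixel integral
```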
I'm trying to train an autoencoder model with colored image samples, but I got this error:

```
ValueError: Dimensions must be equal, but are 476 and 480 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](model_4/conv2d_28/BiasAdd, IteratorGetNext:1)' with input shapes: [?,476,476,1], [?,480,480,3].
```

although I have checked the dimensions of the test and training sets; all are (480,480,3).

```python
from matplotlib import image, pyplot
import cv2

IMG_HEIGHT = 480
IMG_WIDTH = 480

def prepro_resize(input_img):
    oimg = cv2.imread(input_img, cv2.COLOR_BGR2RGB)
    return cv2.resize(oimg, (IMG_HEIGHT, IMG_WIDTH), interpolation=cv2.INTER_AREA)

x_train_ = [(prepro_resize(x_train[i])).astype('float32')/255.0 for i in range(len(x_train))]
x_test_ …
```
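The error says the model's output is (476, 476, 1) while the targets are (480, 480, 3): convolutions without padding shave off pixels, and a final layer with one filter drops the channels. A hedged sketch of an autoencoder whose output shape matches (480, 480, 3) inputs, using padding='same' throughout and 3 filters in the last layer (the architecture here is illustrative, not the asker's):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(480, 480, 3))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(inp)   # 480 -> 480
x = layers.MaxPooling2D(2, padding='same')(x)                      # 480 -> 240
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D(2)(x)                                      # 240 -> 480
out = layers.Conv2D(3, 3, activation='sigmoid', padding='same')(x) # 3 channels out
model = models.Model(inp, out)
model.compile(optimizer='adam', loss='mse')
```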
I have been trying to do image augmentation using a library called Albumentations, but I got an error from OpenCV while transforming the images. I ran the code below in a Kaggle notebook. The dataset is Kaggle's "Intel Image Classification"; it has 6 classes, and each image is 150 * 150 * 3.

```python
import numpy as np
import tensorflow as tf
import albumentations as a

x_train_path = "../input/intel-image-classification/seg_train/seg_train"

train_data = tf.keras.utils.image_dataset_from_directory(
    x_train_path,
    seed=123,
    image_size=(150, 150),
    batch_size=128)

transforms = a.Compose([
    a.Rotate(limit=40),
    …
```
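For context, Albumentations operates on NumPy uint8 arrays rather than tensors, so one common pattern is to wrap the transform with tf.numpy_function when mapping it over a tf.data pipeline. A sketch under that assumption:

```python
import numpy as np
import tensorflow as tf
import albumentations as a

transforms = a.Compose([a.Rotate(limit=40)])

def aug_fn(image):
    # albumentations expects a NumPy array and returns a dict
    return transforms(image=image.astype(np.uint8))["image"].astype(np.float32)

def process(image, label):
    image = tf.numpy_function(aug_fn, [image], tf.float32)
    image.set_shape((150, 150, 3))   # numpy_function loses static shape info
    return image, label

# train_data = train_data.unbatch().map(process).batch(128)
```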
I've created and normalized my colored image dataset of 3716 samples of size 493*491 as x_train; its type is list. I'm trying to convert it into a numpy array as follows:

```python
from matplotlib import image
import numpy as np
import cv2

def prepro_resize(input_img):
    oimg = image.imread(input_img)
    return cv2.resize(oimg, (IMG_HEIGHT, IMG_WIDTH), interpolation=cv2.INTER_AREA)

x_train_ = [(prepro_resize(x_train[i])).astype('float32')/255.0 for i in range(len(x_train))]
x_train_ = np.array(x_train_)  # L1
# print(x_train_.shape)
```

but I get the following error when L1 runs:

```
MemoryError: Unable to allocate 10.1 GiB for an array with …
```
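One way to cut the peak memory roughly in half, sketched assuming `x_train` and `prepro_resize` as defined above: preallocate a single float32 array and fill it in place, so the full dataset never exists twice (once as a list of per-image arrays, once as the converted array).

```python
import numpy as np

first = prepro_resize(x_train[0]).astype(np.float32) / 255.0
x_train_ = np.empty((len(x_train),) + first.shape, dtype=np.float32)  # one allocation
x_train_[0] = first
for i in range(1, len(x_train)):
    x_train_[i] = prepro_resize(x_train[i]).astype(np.float32) / 255.0
```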
I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is done in the readme file and the examples. The question is: Am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embeddings generation? Yes, I'm aware there are many algorithms for this based on random walks, but this one is pure deterministic linear algebra and it is quite simple, from my perspective. In short, the algorithm …
Is there a way to run complex list comprehensions like the following on a GPU?

```python
[[x[index] if x[index] > len(x) else x[index] - 1 for x in slice]
 if len(slice) == 1 else slice
 for slice, index in zip(slices, indices)]
```

To what degree is it possible? Do I have to convert it to some kind of NumPy comprehension (and if so, what part specifically is possible/necessary)? The goal is performance optimization on large data lists/arrays.
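NumPy (and CuPy, its GPU drop-in) has no comprehension construct, so the usual move is to express the per-element branch with np.where. A loose sketch of that idea on simplified stand-in data (the arrays here are hypothetical, and the ragged-slices case would first need padding to a rectangle):

```python
import numpy as np            # swap for `import cupy as np` to run on the GPU

x = np.array([[5, 1, 7], [2, 9, 4]])                 # stand-in "slices", padded rectangular
idx = np.array([2, 0])                               # one index per row
vals = np.take_along_axis(x, idx[:, None], axis=1).ravel()
out = np.where(vals > x.shape[1], vals, vals - 1)    # vectorized if/else branch
```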
I'm doing a small POC in which I've trained my machine learning model (Naive Bayes) and saved it in ".pkl" (pickle) format. Now my next task is to develop a web application that asks the user to enter text for text-classification analysis. This newly taken (from the user) "TEXT" will be the testing data, which can be fed to the Naive Bayes model that I built in the earlier stage to make a prediction on the "text" taken …
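A minimal sketch of that serving step, assuming the pickle holds a full scikit-learn pipeline (vectorizer + Naive Bayes) and treating the file name model.pkl and the /predict route as placeholders:

```python
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("model.pkl", "rb") as f:       # hypothetical file name
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    text = request.form["text"]          # text entered by the user
    label = model.predict([text])[0]     # pipeline vectorizes + classifies
    return jsonify({"prediction": str(label)})

# app.run(debug=True)
```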
I'm trying to compute an inner product between tensors in numpy. I have a vector $x$ of shape (n,) and a tensor $y$ of shape d*(n,) with d > 1 and would like to compute $\langle y, x^{\otimes d} \rangle$. That is, I want to compute the sum $$\langle y, x^{\otimes d} \rangle = \sum_{i_1,\dots,i_d\in\{1,\dots,n\}} y[i_1, \dots, i_d]\, x[i_1]\cdots x[i_d].$$ A working implementation I have uses a function to first compute $x^{\otimes d}$ and then uses np.tensordot:

```python
def d_fold_tensor_product(x, d) -> np.ndarray:
    """
    …
```
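One way to avoid materializing $x^{\otimes d}$ (which has $n^d$ entries) is to contract one axis at a time. A sketch, assuming $y$ has shape (n,) repeated d times:

```python
import numpy as np

def inner_product(y, x):
    out = y
    while out.ndim > 0:
        out = np.tensordot(out, x, axes=([-1], [0]))  # contract one axis per step
    return out

# check against an explicit contraction for d = 3, n = 4
x = np.random.rand(4)
y = np.random.rand(4, 4, 4)
assert np.isclose(inner_product(y, x), np.einsum('ijk,i,j,k->', y, x, x, x))
```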
I am trying to generate complex Gaussian white noise with zero mean, whose covariance matrix is a specific matrix that is assumed to be given. Assume $i$ is a point on a grid along the x-axis, where there are N points on the axis. The problem is to generate a complex-valued random noise value at each point (call the random value at point $i$ $y_i$) that obeys a Gaussian distribution …
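The standard construction, sketched assuming a given Hermitian positive-definite covariance matrix C (the C built below is only an illustrative placeholder): draw unit-variance circularly symmetric complex normals and color them with the Cholesky factor of C.

```python
import numpy as np

N = 8
# hypothetical positive-definite covariance, decaying with grid distance
dist = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
C = np.eye(N) + 0.5 * np.exp(-dist)

L = np.linalg.cholesky(C)
z = (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)  # E[z z^H] = I
y = L @ z    # zero mean, E[y y^H] = L L^H = C
```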