Discrete wavelet transform - DWT (beginner)

I recently stumbled upon this article: https://www.bportugal.pt/sites/default/files/anexos/papers/wp201612_0.pdf. In the paper they use the DWT, and I am having trouble understanding how to construct it. Does anyone have a guide on where to start learning wavelets and slowly move on to the DWT? I am a beginner in this area, so I am really trying to understand it from step 0. I will apply the math in Python. I have checked the other threads and the questions are too specific, so let's try to …
Category: Data Science
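
As a possible "step 0" for the question above, here is a minimal sketch of a single-level Haar transform, the simplest DWT, written in plain NumPy. The function names `haar_dwt` and `haar_idwt` are made up for this illustration, and this is not necessarily the wavelet or the procedure used in the linked paper.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2 != 0:
        raise ValueError("signal length must be even for this sketch")
    even, odd = x[0::2], x[1::2]
    # Scaled pairwise sums give the smooth (low-pass) part.
    approx = (even + odd) / np.sqrt(2.0)
    # Scaled pairwise differences give the detail (high-pass) part.
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level, rebuilding the original samples."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x
```

Applying `haar_dwt` again to the approximation coefficients gives the next, coarser level, which is how a multi-level decomposition is built up.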

Which error metric is best for financial returns?

I am trying to predict price changes, i.e. values that are mostly close to zero (positive and negative). In my backtest I predict only one period per test iteration. In each iteration I would like to know which model was the best by comparing y_predict with y_test (both of size 1). Which regression metric would be best in this case, one that captures both the size of the error and the sign of the value? Example: y_true = 0.03 …
Category: Data Science
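
One way to read the requirement above (small error, but also the right sign) is to scale the absolute error when the predicted sign is wrong. The sketch below is only an illustration of that idea: the name `direction_aware_error` and the `penalty` factor are assumptions, not an established metric.

```python
def direction_aware_error(y_true, y_pred, penalty=2.0):
    """Absolute error, multiplied by `penalty` when the signs disagree.

    `penalty` is a hypothetical tuning knob: 1.0 gives the plain absolute error.
    """
    error = abs(y_true - y_pred)
    # A strictly negative product means one value is positive and the other negative.
    wrong_side = (y_true * y_pred) < 0
    return error * penalty if wrong_side else error
```

With `y_true = 0.03`, a prediction of `0.01` and a prediction of `-0.01` are scored differently even beyond their raw distance, because the second one is on the wrong side of zero.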

Data source for financial data mining

I plan to do data modeling in the financial area for my master's dissertation. I am thinking of finding the connection between certain company or country characteristics (x values) and their creditworthiness (here I am still looking for a y variable such as credit score, bankruptcy occurrence, etc.). Do you know in which databases I could find the required input? Company data would be great; however, country data will most likely be more accessible. Then I could also do …
Category: Data Science

How to measure COVID impact by analysing customers' credit card transactions

I want to know how I can identify whether a customer is in financial distress due to the COVID situation using their credit card transactions. I have daily customer transactions up to the current date. Any thoughts or ideas would be really helpful. For example, my thought is that a sudden increase in credit card utilization can be an indicator that the person is in financial distress, as it could be a flag to potential lenders or creditors that they are having trouble …
Category: Data Science

Should I concat multiple stock timeseries datasets into one?

I have several time series datasets of stock data, with fundamental indicators. I would like to build a model that selects stocks for buy and hold. I understand that to perform this task I have two options:
Train a model for each stock: I understand that this is the most practical approach; however, the amount of data for each model will be very small (each dataset has fewer than 1000 lines).
Put all the data together in a single dataset: …
Category: Data Science

Workflow for stock prediction in machine learning

I'm trying to find the best workflow for a stock prediction problem. My idea is as follows: I will use classification and regression at the same time. Classification (-1; 0; 1). Regression (float) => at the end I will classify the output, just like the classification, to make a decision (if the float is really close to 0, I will treat it as zero). The pipeline of the classification and regression will be …
Category: Data Science

Persistence and stationarity together

I am trying to analyse a time series. I want to get only quantitative results (so I'm excluding things like "looking at this plot we can note..." or "as you can see in the chart ..."). In my job, I analyse stationarity and persistence. First, I run an ADF test and get "stationary" or "non-stationary" as the result. Then, I need to work on persistence. To do so, I use the ACF. My question is: suppose I got a "non-stationary" time series. Is it …
Category: Data Science

Automate autocorrelation analysis in Python

I'm trying to automate my autocorrelation study in Python. My question is: is it possible? Let me explain. I have a time series and I have just learnt how to interpret the autocorrelation plot. Given that I need to examine a hundred time series, is it possible to get a result from the data (and not look at plots at all)? Here's my whole Python code, which returns some graphs (for just one time series). What do you …
Category: Data Science
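
A plot-free version of the autocorrelation check asked about above could look roughly like this: compute the sample ACF and report the lags that fall outside the usual approximate 95% white-noise band of ±1.96/√n. This is a minimal NumPy sketch; the function names `autocorrelation` and `significant_lags` are invented here, `max_lag` must be smaller than the series length, and a constant series (zero variance) is not handled.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = [1.0]
    for k in range(1, max_lag + 1):
        # Covariance between the series and itself shifted by k steps.
        acf.append(np.dot(x[k:], x[:-k]) / denom)
    return np.array(acf)

def significant_lags(series, max_lag=20):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    n = len(series)
    bound = 1.96 / np.sqrt(n)
    acf = autocorrelation(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]
```

Looping `significant_lags` over a hundred series then yields one list of lags per series instead of a hundred plots.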

Change target variable value to reflect better affordability

Context: I am working on a regression problem trying to predict affordability. My dataset contains daily installments repaying a purchase in the form of a contract; essentially, a minimum daily rate the customer has to pay for their purchase. Using this data I want to predict the affordability of each customer. My target variable is the daily rate of the last purchase, and the features take into account all the payments and similar purchases up to that point in time. In this …
Category: Data Science

Legal issues with Machine Learning

I'm confused about why people claim that the current legal system cannot handle any wrongdoings of algorithms that involve Machine Learning and Artificial Intelligence. The claim is that it is impossible to find who is liable for the wrongdoing. This claim seems strange: isn't it obvious that the company that developed the algorithm is liable for any issues that the algorithm caused? Can someone explain where the current legal system/framework/laws break down when it comes to any harm caused …
Category: Data Science

How do I get multiple CSV files (CSV file names will be column names) from a folder into a pandas DataFrame?

I think the title describes my problem well enough. I have 100 tickers in a folder. I have read all the CSV files into one list, but I want to see the ticker names, which are the CSV file names. How can I get all the tickers like that? I'm doing it manually, but I need a simpler way. import pandas as pd import os #BİST100-2010.01.01-2020.07.01/ is my folder. log_total = [] for file in os.listdir('BİST100-2010.01.01-2020.07.01/'): …
Category: Data Science
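
For the question above, one possible approach is to take the file name without its extension as the ticker and use it as the column name. The sketch below assumes every CSV has a `Date` column and a `Close` column; those column names and the helper name `load_ticker_folder` are assumptions for illustration, since the original files are not shown.

```python
from pathlib import Path

import pandas as pd

def load_ticker_folder(folder, value_column="Close", date_column="Date"):
    """Read every CSV in `folder` into one DataFrame, one column per file.

    The column name is the file name without ".csv" (the ticker).
    """
    series = {}
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path, parse_dates=[date_column], index_col=date_column)
        # path.stem is the file name without its extension, e.g. "AKBNK".
        series[path.stem] = frame[value_column]
    # The dict keys become the column names; rows are aligned on the date index.
    return pd.concat(series, axis=1)
```

Called as `load_ticker_folder('BİST100-2010.01.01-2020.07.01/')`, this would return a frame whose columns are the ticker names, provided the assumed column names match the files.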

What's the best way to do classification based on two given datasets (annual data and daily data)?

I want to do binary classification based on two given datasets. One is annual statistical data of a company and has the label I should be able to predict, like this:
company_id | year | annual sales | something else... | label
0 | 2017 | 2000320 | ... | 0
0 | 2018 | 4002530 | ... | 0
0 | 2019 | 800050 | ... | 1
1 | 2017 | 1024380 | ... | 1
1 | 2018 …
Category: Data Science

"Up or down but not sideways" bimodal time series prediction - what is the best way to model it?

Say I have a time series (e.g. bitcoin price). I want to predict tomorrow's price, specifically tomorrow's % change in price from today. Let's say this is Gaussian distributed, with the mean at 0%. If the market is trending up, the price prediction should be higher (e.g. +3.1%). If the market is trending down, the price prediction should be lower (e.g. -5.4%). If the market is trending sideways, the price prediction should be neutral (e.g. 0%). However, there are times …
Category: Data Science

Linear regression of time series data with heteroskedasticity

I am trying to find out if stock market movements, on average and in extreme conditions, affect gold prices. I am following the regression model proposed by Baur and McDermott (2010) which is given as:
$R_{asset,t}=a+b_tR_{stock,t}+\epsilon_t$
$b_t=c_0+c_1D(R_{stock}q_{10})+c_2D(R_{stock}q_{5})+c_3D(R_{stock}q_{1})$
$h_t=\omega+\alpha\epsilon_{t-1}^2+\beta h_{t-1}$
All models are estimated simultaneously with maximum likelihood methods as mentioned in their published paper which I do not know how to apply. Below is what I have done: reg <- read.csv(file = "MVreturnsqreg.csv") The csv file contains time series of …
Category: Data Science

Data Transformation for Machine Learning Regression Task

I am performing an ML regression task, using XGBoost Regressor. I am using financial time series data, namely the Close price of the EUR/USD exchange rate, which I will transform into geometric log returns which will be my predictor variable. Also, I am using a technical analysis library which uses the open, high, low, and close prices to create additional features, e.g. Bollinger Bands, ATR, moving averages, etc. When viewing the distribution of, let's say, the Bollinger Bands it looks …
Category: Data Science

Algorithm for distributing volume in 1min stock intervals

Context: I have historical 1min prices for stocks, including premarket. However, when importing real-time data, the standard practice in the financial data industry is to give only OHLC (open, high, low, close) prices and 0 volume for 1min intervals. But they do provide the total amount of pre-market volume. Example: AAPL 1min data from yahoo finance.
Open High ... Adj Close Volume
Datetime
[...] [...]
2021-07-20 09:25:00 143.420000 143.460000 ... 143.400000 0
2021-07-20 09:26:00 143.410000 143.430000 ... 143.395000 0
2021-07-20 …
Category: Data Science

Regression or classification: which is better for financial market price prediction?

I want to use a model to trade in financial markets. I have several features, such as MACD, RSI, and other common indicators, and my goal is to make a tradeable prediction at every time point. My target can be either the yield a fixed time later, e.g. 30 min, yt = close(t+ws) - close(t), or the future price direction, which can only be 1 (price goes up in the future) or -1 (price goes down in the future). This is the difference between regression and classification. …
Category: Data Science
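
The two candidate targets described in the question above can be built from the same close series. The sketch below is a rough NumPy illustration: `make_targets` is a made-up name, `ws` is the look-ahead window in bars, and mapping a zero price change to -1 is an assumption made here because the question only allows the labels 1 and -1.

```python
import numpy as np

def make_targets(close, ws):
    """Build regression and classification targets from close prices.

    Regression target: yt = close(t + ws) - close(t).
    Classification target: 1 if yt > 0, otherwise -1 (zero change mapped to -1,
    which is an assumption of this sketch).
    The last `ws` time points have no future value and are dropped.
    """
    close = np.asarray(close, dtype=float)
    if not 0 < ws < close.size:
        raise ValueError("ws must be between 1 and len(close) - 1")
    y_reg = close[ws:] - close[:-ws]
    y_cls = np.where(y_reg > 0, 1, -1)
    return y_reg, y_cls
```

Both targets line up with the first `len(close) - ws` rows of the feature matrix, so the same features can be used to compare a regressor and a classifier.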

Steps to fit a Machine learning model for prediction of up and down market movement

I have around 5 years of data of an index containing many features on a daily basis. I want to classify whether the index will move up or down the next trading day (up or down movement is determined by next day open/close price). I am using an SVM classifier for this classification. What could be some essential steps which need to be followed? I suppose since I am using financial data, there would be some deviation from the traditional …
Category: Data Science

Optimize Yahoo Finance Code for Analysis

I am trying to analyze a number of companies using financial data I gathered from Yahoo Finance. I am also using the yfinance API to get some more details about each company using functions. Since I am trying to do this for a number of companies, each iteration needs to be quick. Currently, one company takes about 3 seconds. Is it because of the API calls or requests? Can I make the code below more efficient? import pandas as pd import …
Category: Data Science

How does the GAN based prediction in K. Zhang et al. (2018) improve performance?

In Stock Market Prediction Based on Generative Adversarial Network by K. Zhang et al., the authors feed financial data (X0...Xt) into an LSTM to predict Xt+1. Then, they evaluate whether the series (X0...Xt+1) is real or not (where Xt+1 is either the predicted value or the one appearing in the data). This is done in a GAN system, with the predictor trained to create an Xt+1 so that the series would fool the discriminator. What is the benefit of training the …
Category: Data Science
