Algorithm for distributing volume across 1-minute stock intervals
Context: I have historical 1-minute prices for stocks, including pre-market. However, when importing real-time data, the standard practice in the financial data industry is to provide only OHLC (open, high, low, close) prices and a volume of 0 for the pre-market 1-minute intervals. Providers do, however, give the total pre-market volume.
Example: AAPL 1-minute data from Yahoo Finance.
Open High ... Adj Close Volume
Datetime
[...] [...]
2021-07-20 09:25:00 143.420000 143.460000 ... 143.400000 0
2021-07-20 09:26:00 143.410000 143.430000 ... 143.395000 0
2021-07-20 09:27:00 143.410000 143.650000 ... 143.410000 0
2021-07-20 09:28:00 143.625000 143.625000 ... 143.500000 0
2021-07-20 09:29:00 143.500000 143.560000 ... 143.560000 0
2021-07-20 09:30:00 143.460007 144.029999 ... 143.990005 2946764
2021-07-20 09:31:00 143.990005 144.009995 ... 143.309998 587213
2021-07-20 09:32:00 143.320007 143.580002 ... 143.535004 667389
2021-07-20 09:33:00 143.550003 143.710007 ... 143.570007 509797
2021-07-20 09:34:00 143.554993 143.589996 ... 143.210007 421908
The total pre-market volume is given one minute before the market open (09:30).
My question is: what algorithm can I use to distribute the total volume across the 1-minute intervals? I know that volume is closely linked to the size of the 1-minute candle (price.high - price.low).
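One way to check how strong (and how non-linear) that link is would be a log-log regression on regular-session candles, where the real volume is known. The sketch below assumes a power-law form, volume ≈ c * (high - low) ** alpha, which is only a modelling assumption; `fit_range_exponent` is a hypothetical helper name and the numbers are synthetic, not market data.

```python
import numpy as np

def fit_range_exponent(high, low, volume):
    """Least-squares fit of log(volume) = alpha * log(high - low) + log(c).

    Returns (alpha, log_c). Candles with zero range or zero volume are
    skipped because their logarithm is undefined.
    """
    rng = np.asarray(high, dtype=float) - np.asarray(low, dtype=float)
    vol = np.asarray(volume, dtype=float)
    mask = (rng > 0) & (vol > 0)
    alpha, log_c = np.polyfit(np.log(rng[mask]), np.log(vol[mask]), 1)
    return alpha, log_c

# Synthetic candles built to follow volume = 1000 * range ** 1.5 exactly.
ranges = np.array([0.05, 0.10, 0.20, 0.40])
low = np.full(4, 100.0)
high = low + ranges
volume = 1000.0 * ranges ** 1.5

alpha, log_c = fit_range_exponent(high, low, volume)
print(alpha)
```

An exponent close to 1 would suggest that a linear rule is adequate; a value well away from 1 would confirm that the relationship is not linear.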
My solutions so far are:
Proportional allocation: I compute the average size and average volume of the 1-minute candles (the average volume is the total pre-market volume divided by the number of candles). For each candle I compute a ratio (candle size / average candle size) and scale the average volume by it, so a below-average candle gets less volume. Problem: this works reasonably well, but it is primitive. The relationship between size and volume is not that linear.
Reinforcement learning model: the agent would choose between two actions for each candle: add 1,000 units of volume or not. It would loop until no volume is left from the total. I would score each episode by the absolute difference between the real volume (from historical data) and the volume the agent assigned. I believe it would capture the nuances of the relationship between volume and candle size better.
Machine learning model with a constraint: the problem with a predictive model is that I don't know how to impose the constraint of a limited total volume to distribute among the 1-minute candles. I'm quite familiar with ML algorithms, but I don't know how to solve this problem.
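To make the first idea concrete, here is a minimal sketch of the ratio rule, generalized with an exponent so that it does not have to be linear. `distribute_volume`, `alpha` and `eps` are hypothetical names, and the prices are made up. With `alpha = 1` it reduces to the ratio rule described above. Normalizing the weights so that they sum to 1 is also one simple way to enforce the fixed total: a predictive model could output a score per candle, and the scores could be normalized into shares in the same way.

```python
import numpy as np

def distribute_volume(high, low, total_volume, alpha=1.0, eps=1e-9):
    """Split total_volume across candles in proportion to (high - low) ** alpha.

    The result is an integer array that sums exactly to total_volume.
    eps keeps zero-range candles from producing a zero denominator.
    """
    rng = np.asarray(high, dtype=float) - np.asarray(low, dtype=float)
    weights = (rng + eps) ** alpha
    shares = weights / weights.sum()          # shares sum to 1
    raw = shares * total_volume
    alloc = np.floor(raw).astype(np.int64)
    # Largest-remainder rounding: hand the leftover units to the candles
    # with the biggest fractional parts so the total is conserved.
    leftover = max(int(total_volume) - int(alloc.sum()), 0)
    order = np.argsort(-(raw - alloc), kind="stable")
    alloc[order[:leftover]] += 1
    return alloc

# Made-up pre-market candles and a made-up total volume.
high = [10.2, 10.5, 10.1]
low = [10.0, 10.0, 10.0]
alloc = distribute_volume(high, low, total_volume=1000, alpha=1.0)
print(alloc, alloc.sum())
```

The exponent could be tuned against historical candles where the true per-minute volume is known, for example by minimizing the absolute difference mentioned in the reinforcement learning idea.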
I suspect this problem is too convoluted for a Stack Exchange post. If you have any advice on where I could discuss things like this in more depth, that would be super helpful.
Thanks in advance.
Topic finance reinforcement-learning machine-learning
Category Data Science