Algorithm for distributing volume across 1min stock intervals

Context: I have historical 1min data for stocks (prices and volume), including pre-market. However, when importing real-time data, the usual practice among financial data providers is to give only OHLC (open, high, low, close) prices and 0 volume for the pre-market 1min intervals. What they do provide is the total pre-market volume.

Example: AAPL 1min data from Yahoo Finance:

                           Open        High  ...   Adj Close   Volume
Datetime      
[...]                                       [...]                     
2021-07-20 09:25:00  143.420000  143.460000  ...  143.400000        0
2021-07-20 09:26:00  143.410000  143.430000  ...  143.395000        0
2021-07-20 09:27:00  143.410000  143.650000  ...  143.410000        0
2021-07-20 09:28:00  143.625000  143.625000  ...  143.500000        0
2021-07-20 09:29:00  143.500000  143.560000  ...  143.560000        0
2021-07-20 09:30:00  143.460007  144.029999  ...  143.990005  2946764
2021-07-20 09:31:00  143.990005  144.009995  ...  143.309998   587213
2021-07-20 09:32:00  143.320007  143.580002  ...  143.535004   667389
2021-07-20 09:33:00  143.550003  143.710007  ...  143.570007   509797
2021-07-20 09:34:00  143.554993  143.589996  ...  143.210007   421908

The total pre-market volume is given one minute before the market opens (09:30).

My question is: what algorithm can I use to distribute the total pre-market volume across the 1min intervals? I know that volume is closely linked to the size of each 1min candle (high - low).

The solutions I have considered so far:

  1. I compute the average candle size and the average volume per 1min candle (total pre-market volume divided by the number of candles). For each candle I compute a ratio, candle size / average candle size, and scale the average volume by it, so a below-average candle gets less volume. Problem: this solution works more or less, but it is primitive; the relationship between candle size and volume is not that linear.

  2. Create a reinforcement learning model: the agent would choose between two actions for each candle, either adding 1,000 shares of volume or not. It would loop until no volume is left from the total. I would score each episode by the absolute difference between the real volume (from historical data) and the volume the agent allocated. I believe it would capture the nuances of the relationship between volume and candle size better.

  3. Create a machine learning model with a constraint: the problem with a predictive model is that I don't know how to impose the constraint that a fixed total volume has to be distributed across the 1min candles. I'm quite familiar with ML algorithms, but I don't know how to solve this problem.
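Approach 1 can be written more compactly. Multiplying the average volume (T/n) by size / average size (s_i / (S/n)) gives T * s_i / S, where S is the sum of all candle sizes, so the split always adds up exactly to the pre-market total T. Below is a minimal sketch, assuming the pre-market highs and lows are available as arrays; the function name `allocate_by_range` is hypothetical, and the equal-split fallback for completely flat candles is an assumption of mine. Rounding uses largest remainders so the integer volumes still sum to the total.

```python
import numpy as np

def allocate_by_range(high, low, total_volume):
    """Hypothetical helper: split total_volume in proportion to each candle's range.

    avg_volume * (size / avg_size) simplifies to total_volume * size / sum(size),
    so the allocation sums to total_volume.
    """
    size = np.asarray(high, dtype=float) - np.asarray(low, dtype=float)
    if size.sum() <= 0:
        # Assumption: if every candle is flat, fall back to an equal split.
        weights = np.full(size.shape, 1.0 / size.size)
    else:
        weights = size / size.sum()
    raw = weights * total_volume
    # Round down, then give the leftover shares to the candles with the
    # largest fractional remainders so the integers still sum to the total.
    alloc = np.floor(raw).astype(np.int64)
    leftover = int(total_volume - alloc.sum())
    order = np.argsort(raw - alloc)[::-1]
    alloc[order[:leftover]] += 1
    return alloc

# Made-up candles for illustration (not taken from the AAPL table above).
print(allocate_by_range([10.2, 10.1, 10.4], [10.0, 10.0, 10.0], 1000))
```

This is still the linear rule from approach 1; it only makes the "sum must equal the total" property explicit.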
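For approach 3, one way to sidestep the constraint is to have the model predict shares rather than volumes: any non-negative weights, divided by their sum and multiplied by the known pre-market total, add up to that total by construction. The sketch below assumes a hypothetical one-parameter model, weight = size ** alpha (alpha = 1 is the linear rule of approach 1), and picks alpha on historical days, where the true per-minute volume is known, by grid search against the absolute-error metric proposed in approach 2. The names `allocate_power` and `fit_alpha` and the grid range are my own assumptions; a richer model (for example a softmax over several per-candle features) would keep the same normalisation step.

```python
import numpy as np

def allocate_power(size, total_volume, alpha, eps=1e-9):
    """Hypothetical model: weight_i = (size_i + eps) ** alpha, normalised to sum to 1.

    Normalising makes every prediction sum to total_volume, which is one way
    to impose the fixed-total constraint. alpha < 1 flattens the split,
    alpha > 1 concentrates volume in the largest candles.
    """
    w = (np.asarray(size, dtype=float) + eps) ** alpha
    return total_volume * w / w.sum()

def fit_alpha(days, alphas=np.linspace(0.0, 3.0, 61)):
    """Pick the alpha with the lowest mean absolute error over historical days.

    `days` is assumed to be a list of (candle_size, true_volume) array pairs
    built from historical pre-market data.
    """
    def loss(alpha):
        errs = [np.abs(allocate_power(s, v.sum(), alpha) - v).sum() for s, v in days]
        return np.mean(errs)

    losses = [loss(a) for a in alphas]
    return float(alphas[int(np.argmin(losses))])
```

Because the constraint is built into the model's output, the fitting step can use any loss; a gradient-based optimiser would replace the grid search once the model has more than one parameter.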

I suspect this problem is too convoluted for a Stack Exchange post; if you have any advice on where I can discuss things like this in more depth, that would be super helpful.

Thanks in advance.

Topic finance reinforcement-learning machine-learning
