Deep Reinforcement Learning for dynamic pricing
I am trying to implement a Deep Q-Network (DQN) model for dynamic pricing in logistics. So far I have defined:
State space: origin, destination, shipment type, customer, product type, shipment commodity, available capacity, etc.
Action space: the price itself, which can range from 0 to infinity; the agent needs to determine the price directly.
Reward signal: rewards could be based on similar offers made to other customers, seasonality, and remaining capacity.
I am planning to use a multi-layer perceptron (MLP) that takes the state as input and outputs the price.
I am not sure how to define the reward function. Could you help me define a mathematical formula for the reward, given that the action is the price?
-- UPDATE --
The part of the state that evolves over time is the remaining capacity. For example, the capacity is 10,000 kg at the initial time step and decreases over time as shipments are booked; once the capacity is used up and no more shipments can be accepted, the episode ends.
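To make the episode dynamics concrete, here is a minimal sketch of the capacity bookkeeping described above. The class name `CapacityEpisode`, the per-kg price, the `accepted` flag, and especially the revenue-based reward are all assumptions made for illustration; the reward line is only a placeholder, not a settled definition.

```python
class CapacityEpisode:
    """Tracks remaining capacity over one episode (hypothetical helper)."""

    def __init__(self, initial_capacity_kg=10_000.0):
        self.initial_capacity_kg = initial_capacity_kg
        self.remaining_kg = initial_capacity_kg

    def reset(self):
        """Start a new episode with full capacity."""
        self.remaining_kg = self.initial_capacity_kg
        return self.remaining_kg

    def step(self, price_per_kg, weight_kg, accepted):
        """Apply one quoted shipment; return (remaining_kg, reward, done)."""
        reward = 0.0
        # A shipment is booked only if the customer accepts and it fits.
        if accepted and weight_kg <= self.remaining_kg:
            self.remaining_kg -= weight_kg
            # ASSUMPTION: placeholder reward equal to the booked revenue.
            reward = price_per_kg * weight_kg
        # The episode ends once no capacity is left.
        done = self.remaining_kg <= 0.0
        return self.remaining_kg, reward, done


env = CapacityEpisode()
env.reset()
print(env.step(price_per_kg=2.0, weight_kg=4_000.0, accepted=True))
print(env.step(price_per_kg=1.5, weight_kg=6_000.0, accepted=True))
```

In this sketch a rejected quote leaves the capacity unchanged and yields zero reward, so the agent only earns something when the quoted price is accepted.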
The agent will have to find an optimal price based on the following rewards.
Topic deepmind dqn tensorflow reinforcement-learning deep-learning
Category Data Science