How to write a reward function that optimizes for profit and revenue?

Question


JimDoe

17 October 2021, 15:56

So I want to write a reward function for a reinforcement learning model which picks products to display to a customer. Each product has a profit margin %.

Higher-priced products have a higher profit margin but a lower probability of being purchased. Lower-priced products have a lower profit margin but a higher probability of being purchased.

The goal is to maintain an AVERAGE margin of 5% for ALL products sold, while maximizing the total revenue.

What's the best way to write this reward function?

Topic reward reinforcement-learning machine-learning

Category Data Science

Neil Slater · Accepted Answer · 17 October 2021, 15:56

Your goals include two criteria that interact and may conflict. It is not possible to write a single reward function that solves this perfectly. You first have to decide on the relative importance of the two goals. As one of them is effectively a constraint, you need to decide how hard you want to apply it.

As the revenue is easy to measure, and is already a natural expression of what that part of the optimisation is supposed to achieve, you can start by using an arbitrary scaling for revenue that makes the numbers simple for your approximator - e.g. a neural network. Having numbers in the thousands or millions is not great because the error values could be really large during training, so I would try to scale this part of the reward down by some order of magnitude, depending on the values you are expecting.
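As a rough illustration of the scaling idea, here is a minimal Python sketch. The function name `scaled_revenue_reward` and the default `revenue_scale` of 1000 are assumptions made for the example, not values from the question; you would pick the scale from the typical sale prices in your own data.

```python
def scaled_revenue_reward(sale_price: float, revenue_scale: float = 1000.0) -> float:
    """Revenue part of the reward, divided by an assumed order of magnitude.

    revenue_scale is a hypothetical constant: choose it so that typical
    rewards land roughly in the range 0 to 1 for your price data.
    """
    if revenue_scale <= 0:
        raise ValueError("revenue_scale must be positive")
    # A display that results in no sale would simply pass sale_price=0.0.
    return sale_price / revenue_scale


if __name__ == "__main__":
    print(scaled_revenue_reward(250.0))  # a £250 sale with the default scale
```

Dividing by a constant does not change which policy is optimal for revenue alone; it only keeps the targets in a range that is easier for the approximator to fit.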

Following that, you have to decide how to add in some reward factor for the gross profit margin. There are lots of ways to do this, because the constraint you have been given is not "natural". It is something that a business owner or analyst has determined will result in an acceptable overall net profit margin, which is related to, but not the same as, the gross profit margin goal you have been given. This is not unexpected: net profit margin is the real goal of the company, but it is much more complicated to work out than gross profit margin per sale.

I can think of two additional rewards that you could add in order to represent the goal of meeting the gross profit margin target:

1. As it has been phrased as a constraint, you will want negative rewards for sales that leave the average gross profit margin below 5% and positive rewards for sales that leave it above 5%. You may be able to simplify that down to +1 or -1 per sale, depending on which side of the line your average margin currently is.
2. As an individual sale may not move this average by much, you may want to add a third reward (alongside the revenue reward and the one above), centred on the 5% mark, that is simply the amount by which an individual sale is above or below 5%. For example, an item sold at £104 with a cost of £100 has a 4% margin (measured against cost) and would score a reward of -1. This option is a form of "reward shaping". There is a chance it could be counter-productive, but bear it in mind in case short-term learning does not steer sales in the right direction.
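The two additional rewards can be sketched together with the scaled revenue term as follows. This is only one possible reading, under stated assumptions: margin is measured as a percentage of cost (so that the £104/£100 example scores -1), the running average is total profit over total cost, and the names `SalesTracker`, `sale_reward`, `w_constraint`, `w_shaping` and `revenue_scale` are all hypothetical, as are their default values.

```python
from dataclasses import dataclass

TARGET_MARGIN_PCT = 5.0  # the 5% average margin target from the question


def margin_pct(price: float, cost: float) -> float:
    """Margin as a percentage of cost (assumption; swap the denominator
    to price if your business defines margin that way)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return 100.0 * (price - cost) / cost


@dataclass
class SalesTracker:
    """Running totals over all sales so far (hypothetical helper)."""
    total_cost: float = 0.0
    total_profit: float = 0.0

    def record(self, price: float, cost: float) -> None:
        self.total_cost += cost
        self.total_profit += price - cost

    def average_margin_pct(self) -> float:
        if self.total_cost == 0:
            return 0.0
        return 100.0 * self.total_profit / self.total_cost


def sale_reward(tracker: SalesTracker, price: float, cost: float,
                revenue_scale: float = 100.0,
                w_constraint: float = 1.0,
                w_shaping: float = 0.1) -> float:
    """Reward for one completed sale; the weights are arbitrary starting points."""
    tracker.record(price, cost)
    revenue_term = price / revenue_scale
    # +1 or -1 depending on which side of 5% the running average now sits.
    on_target = tracker.average_margin_pct() >= TARGET_MARGIN_PCT
    constraint_term = 1.0 if on_target else -1.0
    # Shaping: percentage points above or below 5% for this sale alone.
    shaping_term = margin_pct(price, cost) - TARGET_MARGIN_PCT
    return revenue_term + w_constraint * constraint_term + w_shaping * shaping_term


if __name__ == "__main__":
    tracker = SalesTracker()
    print(sale_reward(tracker, price=104.0, cost=100.0))
```

A display that ends without a sale would receive a reward of 0 and would not call `sale_reward`. The relative sizes of the three terms are exactly the thing that has to be tuned.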

There are several other ways that you could construct a reward system. The key thing to bear in mind is that all rewards that you are adding from different sources need to be scaled to work together and express the goal of your agent. This is something you will need to establish through trial and error. You may be able to get a feel for the behaviour your weightings are encouraging by working through some examples from your data.
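One way to work through examples is to compute the expected reward of displaying each product under a few candidate weightings and see which product the weights favour. The sketch below does this for two made-up products; the prices, costs and purchase probabilities are purely illustrative and not taken from any real data, and `expected_reward`, `preferred_product` and `w_margin` are hypothetical names. Margin is again assumed to be measured as a percentage of cost.

```python
def expected_reward(price: float, cost: float, p_buy: float, w_margin: float,
                    target_pct: float = 5.0, revenue_scale: float = 100.0) -> float:
    """Expected reward of displaying one product.

    Reward on a sale = scaled revenue + w_margin * (margin % minus target %).
    No sale gives 0, so the expectation is p_buy times the sale reward.
    """
    margin_gap = 100.0 * (price - cost) / cost - target_pct
    return p_buy * (price / revenue_scale + w_margin * margin_gap)


# Illustrative products only: (price, cost, assumed purchase probability).
products = {
    "budget": (50.0, 49.0, 0.30),     # low margin, sells often
    "premium": (200.0, 180.0, 0.05),  # high margin, sells rarely
}


def preferred_product(w_margin: float) -> str:
    """Name of the product with the highest expected reward for this weight."""
    return max(products, key=lambda name: expected_reward(*products[name], w_margin))


if __name__ == "__main__":
    for w in (0.0, 0.05, 1.0):
        print(w, preferred_product(w))
```

With these invented numbers, a zero margin weight favours the budget product and a large margin weight favours the premium one, which is the kind of shift in behaviour you would be looking for when choosing weights.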

High weights on meeting the 5% constraint may reduce revenue through lack of sales (because all offered items may be more expensive). Low weights on the constraint may have the business operating at a loss overall (as it makes sales that cost the company more in overheads than the smaller profit margins can make up for). However, there is no mathematically correct answer to this unless you can somehow model the relationship to net profit margin well enough to use that as the goal instead.
