How do I compute state transition probabilities for a Q-learning algorithm that controls a traffic light system in Python?
I am trying to build a Q-learning algorithm to control a traffic light system. I represent the state as a matrix:
```
state = [[no. of cars on up, no. of cars on down], [no. of cars on left, no. of cars on right]]
```
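Because a list of lists is not hashable, one possible simplification (an assumption, not the asker's code) is to convert the matrix into a tuple of tuples. That is hashable, so it can be used directly as a dictionary key instead of storing `hash(...)` values and searching them with `hashed_matrix.index(...)`. `to_state_key` and `q_table` below are hypothetical names.

```python
def to_state_key(state):
    """Convert a 2x2 state matrix [[up, down], [left, right]] (a list of lists
    or a 2-D NumPy array) into a hashable tuple of tuples of ints."""
    return tuple(tuple(int(x) for x in row) for row in state)


# state = [[cars on up, cars on down], [cars on left, cars on right]]
state = [[3, 1], [0, 2]]
key = to_state_key(state)

# Hypothetical Q-table: one Q-value per action, looked up directly by state
q_table = {key: [0.0, 0.0]}
```

A dict lookup is O(1), while `list.index` scans the whole list on every update.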
However, the environment is stochastic: after cars are allowed to move through one road, new cars may also enter. For every 4-second interval, I set the arrival probabilities on each road as follows:
- P(0 cars enter a road) = 0.5
- P(1 car enters a road) = 0.3
- P(2 cars enter a road) = 0.2
Since there are 4 roads, and assuming arrivals on the roads are independent, the probability of getting one particular state is the product of the per-road probabilities:
P(state) = P(state[0][0]) × P(state[0][1]) × P(state[1][0]) × P(state[1][1])
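The product formula above returns a single number, not a matrix, as long as its argument is read as the number of *newly arriving* cars on each road rather than the queue lengths themselves. A minimal sketch under that assumption (the function name `arrival_prob` is hypothetical) computes this product and checks that the probabilities of all 3^4 = 81 arrival combinations sum to 1. If queues are never capped, P(next_state | state, action) is the probability of the arrivals needed to go from the queues left after the green phase to `next_state`.

```python
import itertools
import math

# Per-road probabilities of 0, 1 or 2 arriving cars per 4 seconds (from the question)
ARRIVAL_PROB = {0: 0.5, 1: 0.3, 2: 0.2}


def arrival_prob(arrivals):
    """Probability of one combination of arrivals on the four roads.

    arrivals = [[up, down], [left, right]] counts of NEW cars (0, 1 or 2).
    Assumes the roads are independent, so the joint probability is the
    product of the four per-road probabilities.
    """
    return math.prod(ARRIVAL_PROB[n] for row in arrivals for n in row)


p = arrival_prob([[0, 1], [2, 0]])  # 0.5 * 0.3 * 0.2 * 0.5

# Sanity check: summing over all 81 possible arrival combinations gives 1
total = sum(
    arrival_prob([[a, b], [c, d]])
    for a, b, c, d in itertools.product(ARRIVAL_PROB, repeat=4)
)
```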
This is my training update:
```python
import numpy as np

# Look up the reward and the current Q-value for (old_state, action)
reward = reward_dictionary[(hash(totuple(old_state)), action_index)]
old_q_value = q_values_table[hashed_matrix.index(hash(totuple(old_state))), 0, action_index]
# TD error: reward + discount * max over next actions of Q(state, a) - Q(old_state, action)
temporal_difference = (reward + (discount_factor * np.max(q_values_table[hashed_matrix.index(hash(totuple(state))), 0])) - old_q_value)
new_q_value = old_q_value + get_transition_prob(state) * (learning_rate * temporal_difference)
q_values_table[hashed_matrix.index(hash(totuple(old_state))), 0, action_index] = new_q_value
```
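For comparison, here is a sketch of the textbook tabular Q-learning update, Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]. In that standard form no transition probability appears: the next state s' is sampled from the environment, so over many updates the stochastic arrivals are averaged in automatically. Multiplying the step by `get_transition_prob(state)` instead shrinks the learning rate by a state-dependent factor. The sketch assumes two actions and stores Q-values in a dict keyed by a hashable state tuple instead of `hashed_matrix.index(...)`; `q_update`, `q_values` and `N_ACTIONS` are hypothetical names.

```python
from collections import defaultdict

N_ACTIONS = 2  # hypothetical: 0 = up/down green, 1 = left/right green

# Q-table keyed by a hashable state (tuple of tuples); unseen states start at 0
q_values = defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(old_state, action, reward, next_state,
             learning_rate=0.1, discount_factor=0.9):
    """Standard tabular Q-learning update for one observed transition.

    next_state is whatever the environment actually produced, so the
    transition probabilities are accounted for by sampling, not by an
    explicit P(next_state | old_state, action) factor.
    """
    best_next = max(q_values[next_state])
    td_error = reward + discount_factor * best_next - q_values[old_state][action]
    q_values[old_state][action] += learning_rate * td_error
    return q_values[old_state][action]
```

Each call uses one sampled transition `(old_state, action, reward, next_state)`; likely transitions simply occur more often during training.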
In the Q-value table, each state matrix is represented by its hash value. I don't know how to write the get_transition_prob(state) function: if I pass the state in as a matrix, it doesn't make sense, since it would be like multiplying a matrix by a constant. Could you please help me write a correct transition function for the state that uses these probabilities, and show me how to combine it with my main Q-learning algorithm? Please also tell me whether I need to add the transition probability to my testing function as well. I would be very grateful for any help, since this is my research question and it is due the day after tomorrow.
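One common way to bring the arrival probabilities into Q-learning is to use them to *simulate* the environment: a step function samples arrivals from the distribution, and the resulting `(state, action, reward, next_state)` tuples feed the ordinary update. At test time the agent then just acts greedily on the learned Q-values, with no probabilities involved. The sketch below is an assumption throughout: `step`, `greedy_action`, `MAX_DEPARTURES`, `MAX_QUEUE`, the action encoding and the reward (negative number of waiting cars) are all hypothetical choices, not part of the question.

```python
import random

# Per-road arrival distribution for one 4-second step (from the question)
ARRIVAL_COUNTS = [0, 1, 2]
ARRIVAL_PROBS = [0.5, 0.3, 0.2]
MAX_DEPARTURES = 2   # hypothetical: cars that can cross per green road per step
MAX_QUEUE = 10       # hypothetical cap so the Q-table stays finite


def step(state, action, rng):
    """Simulate one 4-second step of the intersection.

    state  -- ((up, down), (left, right)) queue lengths
    action -- 0: up/down green, 1: left/right green (hypothetical encoding)
    Returns (next_state, reward), where reward = -(cars still waiting).
    """
    queues = [list(row) for row in state]
    # Cars leave only on the two roads that currently have the green light
    for i in range(2):
        queues[action][i] = max(0, queues[action][i] - MAX_DEPARTURES)
    # New cars arrive independently on all four roads (the stochastic part)
    for row in queues:
        for i in range(2):
            arrivals = rng.choices(ARRIVAL_COUNTS, weights=ARRIVAL_PROBS)[0]
            row[i] = min(MAX_QUEUE, row[i] + arrivals)
    next_state = tuple(tuple(row) for row in queues)
    reward = -sum(sum(row) for row in next_state)
    return next_state, reward


def greedy_action(q_values, state, n_actions=2):
    """Testing: act greedily on the learned Q-values; no probabilities needed."""
    values = q_values.get(state, [0.0] * n_actions)
    return max(range(n_actions), key=lambda a: values[a])


rng = random.Random(42)
next_state, reward = step(((3, 1), (0, 2)), 0, rng)
```

During training, each `step` result would be passed to a standard Q-learning update; during testing, only `greedy_action` is called, so the transition probabilities never need to be added to the testing function.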
Topic q-learning reinforcement-learning python statistics machine-learning
Category Data Science