Cannot see what the "notation abuse" is, mentioned by author of book

Question

Cannot see what the "notation abuse" is, mentioned by author of book

cinqS

December 4, 2017, 23:15

From Sutton and Barto, Reinforcement Learning: An Introduction (second edition draft), equation 3.4 on page 38:

The probabilities given by the four-argument function p completely characterize the dynamics of a finite MDP. From it, one can compute anything else one might want to know about the environment, such as the state-transition probabilities (which we denote, with a slight abuse of notation, as a three-argument function):

$p(s' \mid s, a) \doteq \Pr\{S_t = s' \mid S_{t-1} = s, A_{t-1} = a\} = \sum_{r \in R} p(s', r \mid s, a)$

The authors say this is done "with a slight abuse of notation." Where exactly is the abuse? I don't see anything improper in the expression.

Thank you.

Topic notation reinforcement-learning

Category Data Science

FatihAkici · Accepted Answer · December 4, 2017, 23:15

The mathematical expression itself is perfectly valid. The abuse lies in the fact that the function $p$, which is first defined in equation 3.2 as follows:

The function $p: S \times R \times S \times A \rightarrow [0,1]$ is an ordinary deterministic function of four arguments...

is redefined slightly differently just two lines after this definition (equation 3.4), as a three-argument function $p: S \times S \times A \rightarrow [0,1]$.

Had the authors used $p$ to denote the general probability measure, there would be no abuse. In their notation, however, $p$ is an ordinary deterministic function, while the probability measure is written $\Pr$. Reusing the same name $p$ for two slightly different functions is where the "innocent" abuse of notation comes from.
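To make the distinction concrete, here is a minimal Python sketch. The toy states, action, rewards, and the names `p4`, `p3`, and `marginalize_rewards` are all made up for illustration and do not come from the book. The four-argument and three-argument functions are stored as two different tables with different key shapes, and the second is obtained from the first by summing over rewards, as in equation 3.4.

```python
from collections import defaultdict

# Hypothetical toy dynamics (not from the book).
# Keys are (s_next, r, s, a); values are p(s_next, r | s, a).
p4 = {
    ("s1", 0.0, "s0", "go"): 0.5,
    ("s1", 1.0, "s0", "go"): 0.25,
    ("s0", 0.0, "s0", "go"): 0.25,
}

def marginalize_rewards(four_arg_table):
    """Sum the four-argument table over r to get p(s_next | s, a)."""
    three_arg_table = defaultdict(float)
    for (s_next, r, s, a), prob in four_arg_table.items():
        # Drop r from the key and accumulate its probability mass.
        three_arg_table[(s_next, s, a)] += prob
    return dict(three_arg_table)

# Keys are now (s_next, s, a): a different function, even though the
# book writes both as "p".
p3 = marginalize_rewards(p4)
```

In code one would naturally give the two tables different names, as done here. The book keeps the single letter $p$ for both and relies on the number of arguments to tell them apart, which is exactly the abuse being acknowledged.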
