There is confusion because there is no consensus in the community. I think the definitions are well explained in your references, but let me phrase them in my own words in the hope that this clarifies your question.
Notice that the two regrets and the two pseudo-regrets below are not the same things.
In the book of Lattimore and Szepesvári, they start with a definition that is tailored to the stochastic MAB setup. In fact, $\mu^{\star}$ is well defined because each arm has an associated distribution and thus an associated mean $\mu_a$; we can then set $\mu^{\star}=\max_{a\in\mathcal{A}}\mu_a$. After that, we take $\mu^{\star}$ as the comparator and define the regret as an expected value, which is just a real number.
$$ \text{Regret: }T\mu^{\star}- \mathbb{E}\Bigg[\sum_{t=1}^{T} X_{a_t,t}\Bigg] $$
On the other hand, as mentioned in the reply of Valentas, both the random regret and the pseudo-regret are random variables.
$$ \text{Random regret: } T\mu^{\star}- \sum_{t=1}^{T} X_{a_t,t}\\
\text{Pseudo-regret: } T\mu^{\star}- \sum_{t=1}^{T}\mu_{a_t} $$
The expectation of both of these is the regret defined above (they coincide by the tower rule, because the conditional mean of $X_{a_t,t}$ given the choice $a_t$ is $\mu_{a_t}$). While both are random variables that vary from realization to realization, the difference is that in the pseudo-regret we evaluate the performance of the learner by the expected reward of the pulled arm, while in the random regret we use the actual realized feedback.
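To make this concrete, here is a minimal simulation sketch (my own construction, not from the book), assuming a two-armed Bernoulli bandit with hypothetical means $0.6$ and $0.4$ and a simple $\varepsilon$-greedy learner. Across runs, the random regret fluctuates much more than the pseudo-regret, but averaging either one approximates the same regret $T\mu^{\star}-\mathbb{E}\big[\sum_{t=1}^{T} X_{a_t,t}\big]$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.6, 0.4])          # hypothetical arm means; mu_star = 0.6
T, n_runs, eps = 1000, 1000, 0.1   # horizon, Monte Carlo runs, exploration rate

random_regrets, pseudo_regrets = [], []
for _ in range(n_runs):
    counts, sums = np.zeros(2), np.zeros(2)
    realized, pulled_means = 0.0, 0.0
    for t in range(T):
        # epsilon-greedy: explore with probability eps (or if an arm is unpulled),
        # otherwise play the arm with the best empirical mean
        if rng.random() < eps or counts.min() == 0:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(sums / counts))
        x = float(rng.random() < mu[a])    # Bernoulli reward X_{a_t, t}
        counts[a] += 1
        sums[a] += x
        realized += x                      # accumulates sum_t X_{a_t, t}
        pulled_means += mu[a]              # accumulates sum_t mu_{a_t}
    random_regrets.append(T * mu.max() - realized)       # random regret
    pseudo_regrets.append(T * mu.max() - pulled_means)   # pseudo-regret

# Both averages estimate the same quantity: T*mu_star - E[sum_t X_{a_t, t}]
print(np.mean(random_regrets), np.mean(pseudo_regrets))
# ...but the random regret has a much larger variance across runs
print(np.std(random_regrets), np.std(pseudo_regrets))
```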
In the paper of Bubeck and Cesa-Bianchi, they first define the regret as a random variable.
$$\text{Regret: } \max_{a\in\mathcal{A}} \sum_{t=1}^{T} X_{a,t} - \sum_{t=1}^{T} X_{a_t,t}$$
On the other hand, both the pseudo-regret and the expected regret are real numbers. The difference is in how we define the "best" action, or the comparator.
- In expected regret, the "best" action depends on the randomness and is itself a random variable, because the maximum is inside the expectation (a toy example after these two definitions makes the difference concrete).
$$\text{Expected regret: } \mathbb{E}\Bigg[\max_{a\in\mathcal{A}} \sum_{t=1}^{T} X_{a,t} - \sum_{t=1}^{T} X_{a_t,t}\Bigg]$$
- In pseudo-regret, the "best" action is only defined with respect to the expectation and is not random. The randomness only plays a role once the "deterministic" comparator is fixed.
$$\text{Pseudo-regret: } \max_{a\in\mathcal{A}}\mathbb{E}\Bigg[ \sum_{t=1}^{T} X_{a,t} - \sum_{t=1}^{T} X_{a_t,t}\Bigg]$$
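A toy case (my own, not from the paper) makes the gap visible: take $T=1$, two arms with $X_{1,1}, X_{2,1}$ i.i.d. Bernoulli$(1/2)$, and a learner that always plays arm $1$. Then
$$\max_{a\in\mathcal{A}}\mathbb{E}\big[X_{a,1}-X_{1,1}\big]=0, \qquad \mathbb{E}\Big[\max_{a\in\mathcal{A}} X_{a,1}-X_{1,1}\Big]=\mathbb{P}(X_{1,1}=0,\,X_{2,1}=1)=\tfrac14,$$
so the pseudo-regret is $0$ while the expected regret is $1/4$: putting the maximum inside the expectation lets the comparator adapt to each realization.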
If we consider the stochastic MAB setup again, we get the following.
- The pseudo-regret is exactly what we referred to as the regret in the first part. In fact, $$\mathbb{E}\Bigg[ \sum_{t=1}^{T} X_{a,t} - \sum_{t=1}^{T} X_{a_t,t}\Bigg]=\mathbb{E}\Bigg[ \sum_{t=1}^{T} \mu_a - \sum_{t=1}^{T} \mu_{a_t}\Bigg]=T\mu_a - \mathbb{E}\Bigg[\sum_{t=1}^{T} \mu_{a_t}\Bigg],$$ and since only the first term depends on $a$, maximizing over $a$ turns it into $T\mu^{\star}$.
- The expected regret is in general larger because, for each realization, we compare against the action that maximizes the realized reward. If I remember correctly, we cannot avoid $\sqrt{T}$ expected regret even in the simple stochastic MAB case. This is in sharp contrast with the $\log T$ instance-dependent bound that we can get for the regret / pseudo-regret (the small simulation after this list illustrates where the $\sqrt{T}$ comes from).
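To illustrate the $\sqrt{T}$ phenomenon, here is a small Monte Carlo sketch (again my own construction, not from the paper), assuming two Bernoulli arms that share the same mean $1/2$ and a learner that always plays arm $0$. That learner is optimal in expectation, so its pseudo-regret is exactly $0$, yet its expected regret grows because $\mathbb{E}\big[\max_a \sum_t X_{a,t}\big]$ exceeds $T\mu^{\star}$ by roughly $\sqrt{T/(4\pi)}$ (a Gaussian approximation of $\mathbb{E}[(S_2-S_1)^+]$).

```python
import numpy as np

rng = np.random.default_rng(1)
n_runs = 100_000   # Monte Carlo runs

for T in [100, 400, 1600, 6400]:
    # Two Bernoulli(1/2) arms; the learner always plays arm 0, which is optimal
    # in expectation, so its pseudo-regret is exactly 0 for every T.
    # Since the learner's choices never depend on the feedback, each per-arm
    # total sum_t X_{a,t} is Binomial(T, 1/2) and can be sampled directly.
    S = rng.binomial(T, 0.5, size=(n_runs, 2))           # S[run, a] = sum_t X_{a, t}
    expected_regret = (S.max(axis=1) - S[:, 0]).mean()   # E[max_a sum_t X_{a,t} - sum_t X_{a_t,t}]
    print(T, round(expected_regret, 2), round((T / (4 * np.pi)) ** 0.5, 2))
```

The second and third printed columns should agree closely and grow like $\sqrt{T}$, while the pseudo-regret of this learner stays at $0$ for every horizon.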
For more context:
The notion of regret is a general one in online learning; it applies to problems like bandits, online convex optimization, learning in games, and probably many others. The essential idea is to compare the cumulative performance of the learner against some type of comparator. Beyond this intuition, I am not sure whether there is any consensus in the community. The comparator can be a fixed action (external regret), can change over time (dynamic regret), or can even be a mapping from the learner's actions to alternative actions (swap regret), while the environment can be stochastic, adversarial (oblivious or adaptive), modeled by a game, and so on. I think this diversity is the main source of confusion.
As mentioned above, the definition you give here is very specific to the stochastic MAB setup and is only a special case of what can be referred to as regret.