What enables transformers or very deep models to "plan" ahead for sequential decision making?

I was watching this amazing lecture by Oriol Vinyals. One slide poses the question of whether very deep models plan. Transformer models, or models used in applications like dialogue generation, have no explicit planning component, yet they behave as if they had already planned out the dialogue. Dr. Vinyals mentioned that there are papers analyzing how transformers build up knowledge to answer questions, along with other very interesting analyses. Can anyone please point me to a few such works?
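To make the "no planning component" point concrete, here is a minimal sketch of the decoding loop such models use. It assumes the Hugging Face `transformers` library and the public `gpt2` checkpoint (neither is mentioned in the lecture, and the prompt is made up): each step is a single greedy next-token prediction, with no explicit search or lookahead over future tokens.

```python
# Minimal sketch: greedy autoregressive decoding with a Transformer LM.
# Assumes the Hugging Face `transformers` library and the `gpt2` checkpoint;
# the prompt below is purely illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "User: Can you book me a table for two tonight?\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(30):                      # generate 30 tokens, one at a time
        logits = model(input_ids).logits     # shape: (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()     # greedy: pick the best next token only
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Nothing in this loop reasons about future tokens, which is exactly why analyses of what the forward pass computes internally (the kind of papers asked about above) are interesting.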

Topics: transformer, reinforcement-learning, deep-learning, neural-network, machine-learning

Category: Data Science
