Is it possible to use structured (tabular) data as a reinforcement learning environment?
I want to do an RL project in which the agent learns to drop duplicate rows from a tabular dataset. I couldn't find any examples of RL being used that way, even after checking whether RL-based recommendation systems use a user-item interaction matrix as in collaborative filtering.
I am wondering whether this is really possible and how to define the problem (e.g. if it is episodic, would an episode terminate once the agent has iterated over all data samples?).
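To make the question concrete, here is a minimal plain-Python sketch of the episodic framing I have in mind. Everything in it is my own assumption rather than an established approach: the class name `DedupEnv`, the two actions (keep/drop), the +1/-1 reward, the observation (current row plus a "seen before" flag), and the classic Gym-style `reset`/`step` interface. It also assumes that "duplicate" means an exact match with an earlier row, so the ground truth for the reward is known.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple                       # one table row as a tuple of hashable values
Obs = Optional[Tuple[Row, bool]]  # (current row, was it seen earlier?) or None

KEEP, DROP = 0, 1                 # hypothetical action encoding


class DedupEnv:
    """Hypothetical episodic environment: one step per row of the table."""

    def __init__(self, rows: List[Row]):
        if not rows:
            raise ValueError("need at least one row")
        self.rows = list(rows)
        self.reset()

    def reset(self) -> Obs:
        self.idx = 0              # index of the row shown to the agent
        self.seen = set()         # rows already visited (ground truth for duplicates)
        self.kept: List[Row] = [] # rows the agent decided to keep
        return self._obs()

    def _obs(self) -> Obs:
        if self.idx >= len(self.rows):
            return None           # no observation after the last row
        row = self.rows[self.idx]
        return row, row in self.seen

    def step(self, action: int):
        row = self.rows[self.idx]
        is_dup = row in self.seen
        # Assumed reward: +1 for dropping a duplicate or keeping a new row, else -1.
        reward = 1.0 if (action == DROP) == is_dup else -1.0
        if action == KEEP:
            self.kept.append(row)
        self.seen.add(row)
        self.idx += 1
        done = self.idx >= len(self.rows)  # episode ends after the last row
        return self._obs(), reward, done, {}


def run_episode(env: DedupEnv, policy: Callable[[Obs], int]):
    """Roll out one episode and return (total reward, rows kept)."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total, env.kept


if __name__ == "__main__":
    table = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
    # Oracle policy that simply trusts the "seen before" flag.
    print(run_episode(DedupEnv(table), lambda obs: int(obs[1])))
```

With the "seen before" flag in the observation the task is trivial, so a real version would presumably have to hide it and make the agent infer duplicates from the row contents, which is the part I am unsure how to set up.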
Can someone please give me an idea of how to approach this and, if it is possible, point me to a reference?