Reinforcement Learning: End Effector Moves Toward Camera and Stops Learning

I am working on training a 3-finger jaw gripper. The environment I set up is this:

  • UR10 arm with a 3-finger gripper
  • PyBullet for simulation
  • Stable Baselines with DDPG
  • Observation space: an RGB image stacked with a depth map and a segmentation mask
  • Action space: dx, dy, dz added to the current end-effector (wrist) position, alpha, beta, gamma as the end-effector orientation angles, and the joint positions of the fingers
  • Reward 1: (1 - (end-effector distance from object) / (some max distance)) * 10
  • Reward 2: while all three fingers stay in contact with the object, the reward is (height of object) * 30
  • Reward 3: once the object reaches a certain height I add another 1000 and end the episode (the full reward logic is sketched just after this list)
  • Termination 2: after 1000 time steps (I will reduce this to 300)
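To make the reward structure concrete, here is a rough sketch of that logic in one function (simplified, not my exact implementation; MAX_DIST and LIFT_HEIGHT stand in for the "some max distance" and "certain height" above):

```python
import numpy as np

# Sketch of the reward logic described above -- constants are placeholders.
MAX_DIST = 1.0      # "some max distance" used to normalize Reward 1
LIFT_HEIGHT = 0.3   # the "certain height" that triggers Reward 3

def compute_reward(ee_pos, obj_pos, n_fingers_in_contact):
    """Return (reward, done) for one simulation step."""
    obj_height = obj_pos[2]

    # Reward 3: object lifted past the target height -> bonus and episode end
    if obj_height >= LIFT_HEIGHT:
        return 1000.0, True

    # Reward 2: all three fingers in contact -> reward scales with object height
    if n_fingers_in_contact == 3:
        return obj_height * 30.0, False

    # Reward 1: shaped distance reward, approaching 10 as the gripper nears the object
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(obj_pos))
    return (1.0 - dist / MAX_DIST) * 10.0, False
```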

My problem: after training for 125,000 timesteps (approx. 10 hours), the robot, instead of maximizing its reward by moving close to the object, moves directly toward the camera (which is away from the object) and stays there, collecting approximately 6.5 reward each step instead of the 10 it could get by moving closer to the object.
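For scale, my rough arithmetic assuming the episode runs the full 1000 steps: parked in front of the camera the agent still collects about 6.5 × 1000 ≈ 6500 per episode, versus about 10 × 1000 = 10000 (plus the contact and lifting bonuses) if it hovered at the object.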

Here is a picture of the behavior: [image: the end effector parked in front of the camera, away from the object]

What could be the issue? This is my first try at reinforcement learning; I spent about two weeks just learning the basics and then setting up the environment. I am fairly clueless here, as the reward function looks good enough to me.

Here is the code for the environment and the training setup.
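In outline, the environment and training code look like this (a simplified sketch: the URDF path, link and joint indices, image size, and noise parameters are placeholders, and I am assuming the Stable Baselines 2 DDPG API; it reuses the compute_reward helper sketched above):

```python
import gym
import numpy as np
import pybullet as p
import pybullet_data
from stable_baselines import DDPG
from stable_baselines.ddpg.policies import CnnPolicy
from stable_baselines.common.noise import OrnsteinUhlenbeckActionNoise

class GraspEnv(gym.Env):
    """Simplified skeleton of the grasping environment (names are placeholders)."""

    IMG_SIZE = 84
    EE_LINK = 7                  # end-effector (wrist) link index -- placeholder
    FINGER_LINKS = [8, 9, 10]    # finger link/joint indices -- placeholder

    def __init__(self):
        # 5 channels: RGB (3) + depth (1) + segmentation mask (1)
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(self.IMG_SIZE, self.IMG_SIZE, 5), dtype=np.uint8)
        # dx, dy, dz, alpha, beta, gamma, and 3 finger joint targets
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(9,), dtype=np.float32)
        p.connect(p.DIRECT)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())

    def reset(self):
        p.resetSimulation()
        p.setGravity(0, 0, -9.8)
        self.robot = p.loadURDF("ur10_3finger.urdf")             # placeholder URDF
        self.obj = p.loadURDF("cube_small.urdf", [0.6, 0.0, 0.05])
        self.steps = 0
        return self._get_obs()

    def step(self, action):
        # Position deltas and orientation from the action
        dx, dy, dz = 0.05 * action[:3]
        orn = p.getQuaternionFromEuler(action[3:6] * np.pi)
        ee_pos = np.array(p.getLinkState(self.robot, self.EE_LINK)[0])
        target = (ee_pos + [dx, dy, dz]).tolist()
        # IK to the new wrist pose (assumes the first len(joints) joints are the arm)
        joints = p.calculateInverseKinematics(self.robot, self.EE_LINK, target, orn)
        for i, q in enumerate(joints):
            p.setJointMotorControl2(self.robot, i, p.POSITION_CONTROL, q)
        for j, q in zip(self.FINGER_LINKS, action[6:9]):
            p.setJointMotorControl2(self.robot, j, p.POSITION_CONTROL, q)
        p.stepSimulation()
        self.steps += 1

        # Contacts between finger links and the object
        obj_pos = p.getBasePositionAndOrientation(self.obj)[0]
        contact_links = {c[3] for c in p.getContactPoints(self.robot, self.obj)}
        n_contact = len(contact_links & set(self.FINGER_LINKS))

        ee_now = p.getLinkState(self.robot, self.EE_LINK)[0]
        reward, done = compute_reward(ee_now, obj_pos, n_contact)  # helper sketched above
        done = done or self.steps >= 1000                          # Termination 2
        return self._get_obs(), reward, done, {}

    def _get_obs(self):
        # Stack RGB, depth, and segmentation into one 5-channel image
        w, h, rgb, depth, seg = p.getCameraImage(self.IMG_SIZE, self.IMG_SIZE)
        rgb = np.reshape(rgb, (h, w, 4))[:, :, :3].astype(np.uint8)
        depth = (np.reshape(depth, (h, w, 1)) * 255).astype(np.uint8)
        seg = np.reshape(seg, (h, w, 1)).astype(np.uint8)
        return np.concatenate([rgb, depth, seg], axis=2)

# Training setup (Stable Baselines 2 API)
env = GraspEnv()
n_actions = env.action_space.shape[0]
noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=0.2 * np.ones(n_actions))
model = DDPG(CnnPolicy, env, action_noise=noise, verbose=1)
model.learn(total_timesteps=125000)
```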

Tags: reward, openai-gym, reinforcement-learning, python

