What are "belief maps" and "affinity maps"?

While evaluating NVIDIA's Deep Object Pose Estimation (DOPE), I came across the terms "belief maps" and "affinity maps". I haven't been able to find a satisfying explanation online of what these terms mean.


In computer vision, pose estimation means determining an object's position and orientation relative to the camera.

Belief Maps

A team from NVIDIA proposed "Deep Object Pose Estimation" (DOPE), which has two main components:

  • Detect objects in the image and generate 2D keypoint mappings

  • Recover the 6-DoF pose (3D position plus 3D orientation) from those 2D keypoints

From what I can see, "belief maps" are generated by the first component of their system and describe elements of the 2D keypoint mappings that are then projected to 6D by the second component. From the article:

The feedforward network takes as input an RGB image of size w×h×3 and branches to produce two different outputs, namely, belief maps and vector fields. There are nine belief maps, one for each of the projected 8 vertices of the 3D bounding boxes, and one for the centroids. Similarly, there are eight vector fields indicating the direction from each of the 8 vertices to the corresponding centroid... to enable the detection of multiple instances of the same type of object.
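
To make the idea concrete, here is a minimal sketch of what a single belief map might look like: a 2D grid in which each cell holds the confidence that a given keypoint lies at that pixel. The Gaussian shape, the `sigma` value, the grid size and the function names `make_belief_map` and `find_peak` are all assumptions chosen for illustration; this is not code from the DOPE implementation.

```python
import math

def make_belief_map(width, height, keypoint, sigma=2.0):
    """Return a height x width grid with a Gaussian bump centred on keypoint (x, y)."""
    kx, ky = keypoint
    return [
        [math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def find_peak(belief_map):
    """Return (x, y, value) of the strongest response in the map."""
    best = (0, 0, belief_map[0][0])
    for y, row in enumerate(belief_map):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best

# The article describes nine maps: 8 bounding-box vertices + 1 centroid.
# Here we build just one, for a hypothetical vertex projected to pixel (12, 7).
belief = make_belief_map(width=20, height=15, keypoint=(12, 7))
x, y, confidence = find_peak(belief)
print(x, y, round(confidence, 3))  # prints: 12 7 1.0
```

In a real network the map is predicted rather than constructed, and an image with several instances of an object would show several peaks in the same map, which is why a separate association step is needed.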

Affinity Fields

I've not come across affinity maps, but the DOPE article cites affinity fields as inspiration for its approach, specifically those in "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields". That article aims to solve the problem of attributing limbs to the right person when an image or video frame contains more than one person.

Part affinity fields are "an explicit nonparametric representation of the keypoints association that encodes both position and orientation of human limbs", which you can see a demo of in their video (which is pretty cool).

So, bringing it back to the first article, affinity fields are analogous to the vector fields mentioned above. They're used to associate body parts with a particular person in the part affinity fields article, or to associate the vertices found in the belief maps with the centroid of a particular object in the DOPE article.
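
The association step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: each detected vertex carries the vector sampled from the vector field at its location (assumed to point toward its centroid), and the vertex is assigned to whichever detected centroid lies closest to that direction, within an angular threshold. The function name `assign_vertices` and the 30-degree threshold are hypothetical.

```python
import math

def assign_vertices(vertices, centroids, max_angle_deg=30.0):
    """Assign each vertex to the centroid its field vector points toward most closely.

    vertices:  list of ((x, y), (dx, dy)): peak location and the field vector
               sampled there (assumed to point from the vertex to its centroid).
    centroids: list of (x, y) centroid peaks.
    Returns a list with one centroid index (or None) per vertex.
    """
    assignments = []
    for (vx, vy), (dx, dy) in vertices:
        field_angle = math.atan2(dy, dx)
        best_index, best_diff = None, math.radians(max_angle_deg)
        for i, (cx, cy) in enumerate(centroids):
            to_centroid = math.atan2(cy - vy, cx - vx)
            # Smallest absolute difference between the two angles, in [0, pi].
            diff = abs(math.atan2(math.sin(field_angle - to_centroid),
                                  math.cos(field_angle - to_centroid)))
            if diff < best_diff:
                best_index, best_diff = i, diff
        assignments.append(best_index)
    return assignments

# Two object instances of the same type, in image coordinates (y grows downward).
centroids = [(10, 10), (40, 10)]
vertices = [
    ((5, 5), (1, 1)),      # points down-right, toward centroid 0
    ((45, 5), (-1, 1)),    # points down-left, toward centroid 1
    ((25, 30), (0, 1)),    # points away from both centroids
]
print(assign_vertices(vertices, centroids))  # prints: [0, 1, None]
```

The third vertex is left unassigned because its vector does not point at either centroid within the threshold, which is how spurious peaks can be discarded.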

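So belief maps are the per-keypoint confidence maps output by their deep CNN (one each for the eight bounding-box vertices and the centroid), and vector/affinity fields help associate those vertices with the particular object instance that has been detected.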
