In computer vision, pose estimation is the detection of an object's orientation and position.
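To make "orientation and position" concrete, a 6-DoF pose is usually written as a rotation plus a translation. Here's a minimal sketch (the axis, angle, and coordinates are made up purely for illustration):

```python
import numpy as np

# A rigid 6-DoF pose: 3 degrees of freedom for orientation (here a rotation
# about the camera z-axis) and 3 for position (a translation vector).
# All numbers below are illustrative, not taken from either paper.

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

R = rotation_z(np.pi / 4)          # orientation: 45 degrees about z
t = np.array([0.1, -0.2, 0.8])     # position: 0.8 m in front of the camera

# Transform a point from the object's frame into the camera frame.
point_object = np.array([0.05, 0.0, 0.0])
point_camera = R @ point_object + t
print(point_camera)
```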
Belief Maps
A team from Nvidia has proposed "Deep Object Pose Estimation", which has two main components.
From what I can see, "belief maps" are generated by the first component of their system and describe the 2D image locations of an object's keypoints, which the second component then uses to estimate the 6D pose. From the article:
The feedforward network takes as input an RGB image of size w×h×3 and branches to produce two different outputs, namely, belief maps and vector fields. There are nine belief maps, one for each of the projected 8 vertices of the 3D bounding boxes, and one for the centroids. Similarly, there are eight vector fields indicating the direction from each of the 8 vertices to the corresponding centroid... to enable the detection of multiple instances of the same type of object.
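Purely as an illustration of those output shapes (this is a hypothetical head, not the paper's actual architecture, and the backbone feature size is assumed), the two branches might look like:

```python
import torch
import torch.nn as nn

# Hypothetical two-branch output head: 9 belief maps (8 bounding-box vertices
# + 1 centroid) and 8 vector fields, each with an x and y component, i.e.
# 16 channels. Channel counts follow the quote above; everything else is assumed.
class PoseHead(nn.Module):
    def __init__(self, in_channels: int = 128):
        super().__init__()
        self.belief = nn.Conv2d(in_channels, 9, kernel_size=1)    # 9 belief maps
        self.affinity = nn.Conv2d(in_channels, 16, kernel_size=1) # 8 vector fields * (x, y)

    def forward(self, features):
        return self.belief(features), self.affinity(features)

features = torch.randn(1, 128, 50, 50)         # feature map from some backbone
beliefs, vector_fields = PoseHead()(features)
print(beliefs.shape)        # torch.Size([1, 9, 50, 50])
print(vector_fields.shape)  # torch.Size([1, 16, 50, 50])
```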
Affinity Fields
I've not come across "affinity maps", but affinity fields are what the first article cites as inspiration for its approach to pose estimation, specifically the paper "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields". That article aims to solve the problem of attributing limbs to the right person when a video contains more than one person.
Affinity fields are "an explicit nonparametric representation of the keypoints association that encodes both position and orientation of human limbs", which you can see a demo of in their video (which is pretty cool).
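As a rough sketch of how an affinity field gets used (a simplified, discrete version of the line-integral scoring in that paper, with a toy field and coordinates I've made up), you score a candidate limb by how well the field lines up with the segment between its two keypoints:

```python
import numpy as np

# Score a candidate limb: sample points along the segment between two keypoints
# and average the dot product between the field vector at each point and the
# segment's unit direction. A field that points along the limb scores ~1.
def limb_score(paf_x, paf_y, p1, p2, num_samples=10):
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    direction = p2 - p1
    norm = np.linalg.norm(direction)
    if norm == 0:
        return 0.0
    direction /= norm
    scores = []
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = np.round(p1 + u * (p2 - p1)).astype(int)
        scores.append(paf_x[y, x] * direction[0] + paf_y[y, x] * direction[1])
    return float(np.mean(scores))

# Toy field pointing uniformly to the right, so a horizontal candidate scores ~1.
paf_x = np.ones((100, 100))
paf_y = np.zeros((100, 100))
print(limb_score(paf_x, paf_y, (10, 50), (60, 50)))  # ~1.0 (aligned with field)
print(limb_score(paf_x, paf_y, (50, 10), (50, 60)))  # ~0.0 (perpendicular)
```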
So bringing it back to the first article, affinity fields are analogous to the vector fields mentioned above. They're used to associate body parts with a particular person in the second article, or to associate the vertices in the belief maps with a centroid representing an object instance in the first.
So belief maps are heatmaps marking where the deep CNN believes each keypoint (vertex) lies in the image, and the vector/affinity fields help associate those vertices with the particular object or person that's been detected.
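Here's a toy sketch of that association step (not the papers' exact post-processing; the coordinates and the helper function are mine): read the predicted direction out of the vector fields at each detected vertex, then assign the vertex to whichever candidate centroid it points towards most closely.

```python
import numpy as np

# Assign a detected vertex to a centroid: compare the direction predicted by the
# vector fields at the vertex with the actual direction to each candidate
# centroid, and pick the best match by cosine similarity. All values are invented.
def assign_vertex(vertex, predicted_dir, centroids):
    vertex = np.asarray(vertex, float)
    predicted_dir = np.asarray(predicted_dir, float)
    predicted_dir /= np.linalg.norm(predicted_dir)
    best, best_score = None, -np.inf
    for i, c in enumerate(np.asarray(centroids, float)):
        to_centroid = c - vertex
        to_centroid /= np.linalg.norm(to_centroid)
        score = float(predicted_dir @ to_centroid)  # cosine similarity
        if score > best_score:
            best, best_score = i, score
    return best, best_score

# Two object instances (two centroids); the vertex's field vector points
# roughly toward the first one, so it gets assigned there.
centroids = [(40.0, 40.0), (120.0, 30.0)]
vertex = (30.0, 60.0)
predicted_dir = (0.45, -0.9)   # direction read from the vector fields at `vertex`
print(assign_vertex(vertex, predicted_dir, centroids))  # -> (0, ~1.0)
```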