Shannon entropy is a common measure of uncertainty over a fixed set of choices, as are the measures Brian Spiering has provided.
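As a minimal sketch, entropy over a (hypothetical) distribution of candidate entities for a mention could quantify its ambiguity; the distributions below are made-up examples, not from any particular linker:

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a probability distribution over candidate entities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A near-certain mention vs. a maximally ambiguous one (illustrative numbers).
print(shannon_entropy([0.97, 0.02, 0.01]))  # low entropy: easy to disambiguate
print(shannon_entropy([0.34, 0.33, 0.33]))  # near log2(3) bits: hard
```

Higher entropy over the candidate set indicates a mention the model finds harder to disambiguate.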

Regarding your question -- "some approach to compare named entities regarding how difficult to disambiguate?" -- note that how difficult an entity is to disambiguate is entirely context- and domain-dependent. To get a truly useful answer, you would need to provide more specifics about how your system will be used.


Entity linking is typically framed as supervised machine learning, so many of the common performance metrics could be used. In particular, a confusion matrix identifies where one label was predicted but the ground truth was different. Confusion matrices can be computed as raw counts or normalized; the normalized values give an estimate of each label's "ambiguity degree" relative to the other labels in the current dataset.
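A row-normalized confusion matrix can be built with a few lines of plain Python; the `Paris_FR` / `Paris_TX` labels below are hypothetical examples:

```python
from collections import Counter

def normalized_confusion(y_true, y_pred, labels):
    """Row-normalized confusion matrix: each row sums to 1 and estimates
    how often a gold label is confused with each predicted label."""
    counts = Counter(zip(y_true, y_pred))
    matrix = {}
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix[t] = {p: counts[(t, p)] / row_total if row_total else 0.0
                     for p in labels}
    return matrix

# Hypothetical gold vs. predicted entity labels for six mentions of "Paris".
gold = ["Paris_FR", "Paris_FR", "Paris_TX", "Paris_TX", "Paris_FR", "Paris_TX"]
pred = ["Paris_FR", "Paris_TX", "Paris_TX", "Paris_FR", "Paris_FR", "Paris_TX"]
m = normalized_confusion(gold, pred, ["Paris_FR", "Paris_TX"])
```

The off-diagonal mass in a row is a direct read of that entity's ambiguity degree within this dataset.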

Other classification measures such as F-score, precision, and recall could also be used. In particular, low precision for a label would suggest the model has trouble disambiguating that entity from nearby text. "Cheap and easy entity evaluation" goes into more technical details.

Inter-rater reliability could also be used; the raters could be different humans or different models. If the joint probability of agreement between raters is low, those entities could be regarded as difficult to disambiguate.
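The joint probability of agreement is simply the fraction of mentions on which two raters picked the same entity; the Wikidata-style IDs below are illustrative:

```python
def joint_agreement(rater_a, rater_b):
    """Joint probability of agreement: fraction of mentions where two
    raters (human or model) chose the same entity."""
    assert len(rater_a) == len(rater_b) and rater_a
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Two raters linking four mentions of "Paris" (hypothetical IDs).
rater_a = ["Q90", "Q90", "Q830149", "Q90"]
rater_b = ["Q90", "Q830149", "Q830149", "Q90"]
print(joint_agreement(rater_a, rater_b))  # prints 0.75
```

Note that raw joint agreement does not correct for chance; chance-corrected statistics such as Cohen's kappa are a common refinement.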

Performance also depends on the relative value you assign to an exact match versus a partial match of an entity span.
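One simple way to encode that trade-off is a scoring function that awards full credit for an exact span match and configurable partial credit for overlap; this is a sketch, not a standard scheme, and `partial_credit` is an assumed parameter:

```python
def span_match(gold_span, pred_span, partial_credit=0.5):
    """Score 1.0 for an exact (start, end) span match, partial_credit for
    any overlap, and 0.0 otherwise. Spans are half-open [start, end)."""
    gs, ge = gold_span
    ps, pe = pred_span
    if (gs, ge) == (ps, pe):
        return 1.0
    if ps < ge and gs < pe:  # spans overlap
        return partial_credit
    return 0.0

print(span_match((0, 5), (0, 5)))  # exact match: 1.0
print(span_match((0, 5), (3, 8)))  # overlap: 0.5
print(span_match((0, 5), (6, 9)))  # disjoint: 0.0
```

Raising or lowering `partial_credit` expresses how much your application values boundary-imperfect links.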
