Shannon entropy is a common measure of uncertainty over a fixed set of choices, as are the measures Brian Spiering has provided.
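As a minimal sketch, entropy over a (hypothetical) distribution of candidate entities for a mention could quantify its ambiguity; the distributions below are made-up examples, not from any particular linker:

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a probability distribution over candidate entities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A near-certain mention vs. a maximally ambiguous one (illustrative numbers).
print(shannon_entropy([0.97, 0.02, 0.01]))  # low entropy: easy to disambiguate
print(shannon_entropy([0.34, 0.33, 0.33]))  # near log2(3) bits: hard
```

Higher entropy over the candidate set indicates a mention the model finds harder to disambiguate.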

Regarding your question -- "some approach to compare named entities regarding how difficult to disambiguate?" -- note that how difficult an entity is to disambiguate is entirely context- and domain-dependent. To get a truly useful answer, you would need to provide more specifics about how your system will be used.


Entity linking is typically framed as supervised machine learning, so many of the common performance metrics could be used. In particular, a confusion matrix identifies where one label was predicted but the ground truth was different. Confusion matrices can be computed as raw counts or normalized; the normalized values give an estimate of each label's "ambiguity degree" relative to the other labels in the current dataset.
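A row-normalized confusion matrix can be built with a few lines of plain Python; the `Paris_FR` / `Paris_TX` labels below are hypothetical examples:

```python
from collections import Counter

def normalized_confusion(y_true, y_pred, labels):
    """Row-normalized confusion matrix: each row sums to 1 and estimates
    how often a gold label is confused with each predicted label."""
    counts = Counter(zip(y_true, y_pred))
    matrix = {}
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix[t] = {p: counts[(t, p)] / row_total if row_total else 0.0
                     for p in labels}
    return matrix

# Hypothetical gold vs. predicted entity labels for six mentions of "Paris".
gold = ["Paris_FR", "Paris_FR", "Paris_TX", "Paris_TX", "Paris_FR", "Paris_TX"]
pred = ["Paris_FR", "Paris_TX", "Paris_TX", "Paris_FR", "Paris_FR", "Paris_TX"]
m = normalized_confusion(gold, pred, ["Paris_FR", "Paris_TX"])
```

The off-diagonal mass in a row is a direct read of that entity's ambiguity degree within this dataset.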

Other classification measures such as F-score, precision, and recall could also be used. In particular, low precision for a label would suggest the model has trouble disambiguating that entity from nearby text. "Cheap and easy entity evaluation" goes into more technical details.

Inter-rater reliability could also be used; the raters could be different humans or different models. If the joint probability of agreement between raters is low, those entities could be regarded as difficult to disambiguate.
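The joint probability of agreement is simply the fraction of mentions on which two raters picked the same entity; the Wikidata-style IDs below are illustrative:

```python
def joint_agreement(rater_a, rater_b):
    """Joint probability of agreement: fraction of mentions where two
    raters (human or model) chose the same entity."""
    assert len(rater_a) == len(rater_b) and rater_a
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Two raters linking four mentions of "Paris" (hypothetical IDs).
rater_a = ["Q90", "Q90", "Q830149", "Q90"]
rater_b = ["Q90", "Q830149", "Q830149", "Q90"]
print(joint_agreement(rater_a, rater_b))  # prints 0.75
```

Note that raw joint agreement does not correct for chance; chance-corrected statistics such as Cohen's kappa are a common refinement.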

Performance also depends on the relative value you assign to an exact match versus a partial match of an entity span.
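One simple way to encode that trade-off is a scoring function that awards full credit for an exact span match and configurable partial credit for overlap; this is a sketch, not a standard scheme, and `partial_credit` is an assumed parameter:

```python
def span_match(gold_span, pred_span, partial_credit=0.5):
    """Score 1.0 for an exact (start, end) span match, partial_credit for
    any overlap, and 0.0 otherwise. Spans are half-open [start, end)."""
    gs, ge = gold_span
    ps, pe = pred_span
    if (gs, ge) == (ps, pe):
        return 1.0
    if ps < ge and gs < pe:  # spans overlap
        return partial_credit
    return 0.0

print(span_match((0, 5), (0, 5)))  # exact match: 1.0
print(span_match((0, 5), (3, 8)))  # overlap: 0.5
print(span_match((0, 5), (6, 9)))  # disjoint: 0.0
```

Raising or lowering `partial_credit` expresses how much your application values boundary-imperfect links.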
