Using BERT for coreference resolution, what's the loss function?

I'm working on using BERT for coreference resolution, following the highly-cited paper BERT for Coreference Resolution: Baselines and Analysis (https://arxiv.org/pdf/1908.09091.pdf). I have the following questions; the details can't easily be found in the paper, so I hope you can help me out.

What’s the input? Is it the antecedents plus the paragraph? What’s the output? Mention clusters, antecedents? More importantly, what’s the loss function?

For comparison, in another highly-cited paper by Clark et al. that uses reinforcement learning, it is very clear what the reward function is: https://cs.stanford.edu/people/kevclark/resources/clark-manning-emnlp2016-deep.pdf

Topic: bert, nlp

Category: Data Science


For comparison, NER is approached as a sequence-labeling problem: at the end of the network, there is a per-token categorical distribution over tags, estimated by a softmax and trained with cross-entropy loss.
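A minimal sketch of what that looks like in PyTorch (the shapes, tag count, and random tensors are illustrative assumptions, not the paper's setup):

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_dim, num_tags = 2, 8, 768, 5

# Stand-in for contextual token embeddings from an encoder such as BERT.
token_embeddings = torch.randn(batch_size, seq_len, hidden_dim)

# Linear classification head producing per-token tag logits.
tag_head = nn.Linear(hidden_dim, num_tags)
logits = tag_head(token_embeddings)            # (batch, seq_len, num_tags)

# Gold tag indices for each token (random here, just for illustration).
gold_tags = torch.randint(0, num_tags, (batch_size, seq_len))

# Token-level cross-entropy over the flattened sequence; the softmax is
# applied internally by CrossEntropyLoss.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits.view(-1, num_tags), gold_tags.view(-1))
print(loss.item())
```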

The paper you are specifically asking about builds on the End-to-end Neural Coreference Resolution paper (Lee et al., 2017), which does something trickier. They explicitly consider all spans as candidate entity mentions and, for each span, compute a probability distribution over its possible antecedents. Nevertheless, once they have these probabilities, the loss is in principle still a cross-entropy: the negative marginal log-likelihood of all correct antecedents (cf. their code).
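As a hedged sketch of that objective (the span-ranking loss of Lee et al., 2017, which the BERT paper reuses): for each span, score every earlier span as a candidate antecedent, prepend a fixed 0-score dummy antecedent ε meaning "no antecedent", take a softmax over these scores, and minimize the negative log of the probability mass assigned to the gold antecedents. The tensors and gold assignments below are toy stand-ins, not the paper's real scoring functions:

```python
import torch

num_spans = 4

# antecedent_scores[i, j]: score that span j is an antecedent of span i.
# Only j < i is valid; the rest is masked to -inf.
antecedent_scores = torch.randn(num_spans, num_spans)
mask = torch.tril(torch.ones(num_spans, num_spans), diagonal=-1).bool()
antecedent_scores = antecedent_scores.masked_fill(~mask, float("-inf"))

# Prepend the dummy antecedent epsilon with a fixed score of 0.
dummy = torch.zeros(num_spans, 1)
scores = torch.cat([dummy, antecedent_scores], dim=1)  # (num_spans, 1 + num_spans)

# gold[i, j] = True if column j is a correct antecedent of span i.
# Column 0 is the dummy; column j + 1 corresponds to span j.
gold = torch.zeros_like(scores, dtype=torch.bool)
gold[0, 0] = True   # span 0 has no antecedent
gold[1, 0] = True   # span 1 has no antecedent either
gold[2, 2] = True   # span 2 corefers with span 1
gold[3, 0] = True   # span 3 starts a new (or no) cluster

# Marginal log-likelihood of all correct antecedents:
#   loss = -sum_i log sum_{j in GOLD(i)} P(j | i)
log_probs = torch.log_softmax(scores, dim=1)
gold_log_probs = log_probs.masked_fill(~gold, float("-inf"))
loss = -torch.logsumexp(gold_log_probs, dim=1).sum()
print(loss.item())
```

Spans whose only correct choice is the dummy ε are how the model learns to reject non-mentions, which is why no separate mention-detection loss is needed.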
