Using BERT for coreference resolution, what's the loss function?
I'm working on using BERT for coreference resolution, following the highly-cited paper BERT for Coreference Resolution: Baselines and Analysis (https://arxiv.org/pdf/1908.09091.pdf). I have the following questions; the details aren't easy to find in the paper, and I hope you can help me out.
What's the input? Is it the antecedents plus the paragraph? What's the output? Mention clusters, antecedents? Most importantly, what's the loss function?
For comparison, another highly-cited paper by Clark et al., which uses reinforcement learning, is very clear about its reward function: https://cs.stanford.edu/people/kevclark/resources/clark-manning-emnlp2016-deep.pdf
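From what I can piece together, the BERT paper builds on the span-ranking coreference model of Lee et al., which trains by maximizing the marginal log-likelihood of all correct antecedents for each mention (summing probability over every gold antecedent, since any of them is an acceptable answer). Here's a minimal sketch of my understanding of that loss for a single mention; the function name and the plain-Python setup are my own, and the dummy "no antecedent" candidate at index 0 is part of the model as I understand it:

```python
import math

def marginal_antecedent_loss(antecedent_scores, gold_mask):
    """Marginal log-likelihood loss for one mention (my reading of
    Lee et al.'s span-ranking objective, which I believe the BERT
    paper reuses -- please correct me if that's wrong).

    antecedent_scores: scores s(i, j) for each candidate antecedent
        of mention i, with index 0 being the dummy "no antecedent".
    gold_mask: booleans, True where the candidate is a gold
        antecedent (or True at index 0 if the mention has none).
    """
    # log of the softmax normalizer over all candidates
    log_norm = math.log(sum(math.exp(s) for s in antecedent_scores))
    # log of the total (unnormalized) probability mass on gold antecedents
    log_gold = math.log(sum(math.exp(s)
                            for s, g in zip(antecedent_scores, gold_mask) if g))
    # negative marginal log-likelihood
    return log_norm - log_gold

# If every candidate were gold, the marginal probability is 1 and loss is 0:
print(marginal_antecedent_loss([0.0, 2.0, -1.0], [True, True, True]))
# With a single gold antecedent, the loss is its negative log-softmax score:
print(marginal_antecedent_loss([0.0, 2.0, -1.0], [False, True, False]))
```

In the actual model this would be summed over all mentions and computed with PyTorch's `logsumexp` for numerical stability; the sketch above just shows the shape of the objective I'm asking about.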