How to interpret integrated gradients in an NLP toxic text classification use-case?

I am trying to understand how integrated gradients work in the NLP case.

Let $F: \mathbb{R}^{n} \rightarrow [0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input, and $x' \in \mathbb{R}^{n}$ a reference (baseline). We consider the straight-line segment connecting $x'$ to $x$ and compute the gradient of $F$ at every point along it. The IG method integrates these gradients along the segment and scales the result by the difference $x - x'$. Thus, IG in the $i$-th dimension is given by the following formula:

$$ IG_{i}(x) = \left(x_{i} - x'_{i}\right) \int_{\alpha=0}^{1} \frac{\partial F\left(x' + \alpha\left(x - x'\right)\right)}{\partial x_{i}} \, d\alpha $$
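In practice the integral is approximated by a Riemann sum over a number of steps along the path. Here is a minimal NumPy sketch of that approximation; the `integrated_gradients` helper and the toy sigmoid model are hypothetical, purely for illustration. The final print also checks the completeness property, i.e. that the attributions sum to roughly $F(x) - F(x')$:

```python
import numpy as np

def integrated_gradients(grad_f, x, x_ref, steps=50):
    """Riemann-sum (midpoint) approximation of IG along the segment x' -> x.

    grad_f: function returning the gradient of F at a point (assumed helper).
    """
    alphas = (np.arange(steps) + 0.5) / steps        # midpoints in (0, 1)
    path = x_ref + alphas[:, None] * (x - x_ref)     # points on the segment
    grads = np.stack([grad_f(p) for p in path])      # gradient at each point
    return (x - x_ref) * grads.mean(axis=0)          # (x - x') * average gradient

# Toy model: F(x) = sigmoid(w . x), whose gradient is w * F(x) * (1 - F(x))
w = np.array([1.0, -2.0, 0.5])
F = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
grad_F = lambda x: w * F(x) * (1.0 - F(x))

x = np.array([1.0, 1.0, 1.0])
x_ref = np.zeros(3)                                  # zero baseline
ig = integrated_gradients(grad_F, x, x_ref)
print(ig, ig.sum(), F(x) - F(x_ref))                 # sum(IG) ~= F(x) - F(x')
```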

The advantage that IG has over other existing attribution methods is that it satisfies two axioms: sensitivity (if the input and the baseline differ in exactly one feature and the network assigns them different predictions, that feature receives a non-zero attribution) and implementation invariance (two networks that compute the same function receive identical attributions, regardless of how they are implemented).

Now consider the NLP case, where $x$ may be (the embedding of) a text, $F$ a toxicity classifier, and $x'$ the reference. But what kind of reference should that be: a non-toxic text, or a toxic one?
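For what it's worth, the common convention (used in the original IG paper and in libraries such as Captum) seems to be a neutral, information-free baseline rather than a counter-example: for text, a sequence of the same length as the input whose positions are all the padding token (or all-zero embedding vectors). A minimal PyTorch sketch of that idea, where the token ids and `PAD_ID` are made up for illustration:

```python
import torch

PAD_ID = 0                                           # assumed padding-token id
input_ids = torch.tensor([[101, 2017, 2024, 2019, 10041, 102]])  # toy token ids
baseline_ids = torch.full_like(input_ids, PAD_ID)    # same shape, all PAD: "neutral" reference

emb = torch.nn.Embedding(30522, 16)                  # toy embedding table
x = emb(input_ids)                                   # embeddings of the actual text
x_ref = emb(baseline_ids)                            # embeddings of the neutral baseline
# IG is then computed between x and x_ref in embedding space, and the
# per-dimension attributions are summed per token to get one score per word.
```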

Tags: explainable-ai, gradient, gradient-descent, neural-network, nlp
