Methods for learning with noisy labels
I am looking for a specific deep learning method that can train a neural network model with both clean and noisy labels.
More precisely, I would like the method to be able to leverage the noisy data as well, for instance by not fully "trusting" it, by weighting samples, or by deciding whether to use a specific sample for learning at all (a rough sketch of the kind of weighting I have in mind follows the details below). But primarily, I am looking for inspiration.
Details:
- My task is sequence-to-sequence NLP.
- I have both clean pairs of sequences (clean_input, clean_output) and noisy pairs (noisy_input, noisy_output).
- I know for certain which samples in my data are noisy, and if possible, I would like the desired method to make use of this information.
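
To make the "weighting samples" idea concrete, here is a rough sketch of the kind of per-sample loss re-weighting I have in mind, assuming a PyTorch seq2seq model that outputs token logits. The weight values, the padding convention, and the shapes are placeholders of mine, not a specific published method.

```python
# Sketch: down-weight the loss of pairs flagged as noisy instead of
# discarding them. All constants and shapes here are illustrative.
import torch
import torch.nn as nn

PAD_ID = 0          # assumed padding token id
CLEAN_WEIGHT = 1.0  # fully trust clean pairs
NOISY_WEIGHT = 0.3  # hypothetical "trust" factor for noisy pairs


def weighted_seq2seq_loss(logits, targets, is_noisy):
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len);
    is_noisy: (batch,) bool tensor marking which pairs are noisy."""
    per_token = nn.functional.cross_entropy(
        logits.transpose(1, 2),  # (batch, vocab, seq_len), as cross_entropy expects
        targets,
        ignore_index=PAD_ID,
        reduction="none",
    )  # -> (batch, seq_len)
    mask = (targets != PAD_ID).float()
    per_sample = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    weights = torch.where(
        is_noisy,
        torch.full_like(per_sample, NOISY_WEIGHT),
        torch.full_like(per_sample, CLEAN_WEIGHT),
    )
    return (weights * per_sample).mean()


# Toy usage with random tensors standing in for real model outputs.
if __name__ == "__main__":
    batch, seq_len, vocab = 4, 7, 32
    logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
    targets = torch.randint(1, vocab, (batch, seq_len))
    is_noisy = torch.tensor([False, False, True, True])
    loss = weighted_seq2seq_loss(logits, targets, is_noisy)
    loss.backward()
    print(float(loss))
```

A fixed weight is of course the crudest option; the methods I am hoping to find would ideally learn such weights, or decide per sample whether to use it at all.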
I am happy to give more information about my use case if needed.
Edit: Noisy vs. negative examples
> First, I wouldn't use the word "noisy" here, because if you know which instances are "wrong", then these are not noise; they are negative examples.
My view is that the data I have are noisy examples, but not "negative" ones. To illustrate with a German-to-English machine translation example:
clean (equivalent meaning)
DE Wenn es um die Medien geht, lebt Amerika in einem Paralleluniversum.
EN Regarding media, the US are living in a parallel universe.
noisy (meaning overlap)
DE Wenn es um die Medien geht, lebt Amerika in einem Paralleluniversum.
EN Regarding media, the US are weird.
negative (unrelated)
DE Wenn es um die Medien geht, lebt Amerika in einem Paralleluniversum.
EN Is Math related to science?
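
To be concrete about what the training code would see, here is roughly how such pairs could be stored with an explicit quality flag (the field names are hypothetical); the boolean derived at the end is what a weighting scheme like the sketch above would consume.

```python
# Hypothetical storage of parallel pairs with an explicit quality flag.
samples = [
    {
        "src": "Wenn es um die Medien geht, lebt Amerika in einem Paralleluniversum.",
        "tgt": "Regarding media, the US are living in a parallel universe.",
        "quality": "clean",
    },
    {
        "src": "Wenn es um die Medien geht, lebt Amerika in einem Paralleluniversum.",
        "tgt": "Regarding media, the US are weird.",
        "quality": "noisy",
    },
]

# The per-sample flag that a weighting scheme (as sketched above) would use.
is_noisy = [s["quality"] == "noisy" for s in samples]
```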
Tags: sequence-to-sequence, noise, deep-learning, nlp