How to train a machine learning algorithm when each sample has multiple, possibly conflicting labels
I am facing the following problem and suspect there is a simple approach to it that I just don't see at the moment. Any help or advice is highly appreciated.
So, I have the following situation:
I asked people to label about 1000 data points (each one twice) on a 5-point scale whose points are not equidistant. The data points are texts, which were assessed with regard to several qualitative characteristics (such as comprehensibility). As expected, the labelers did not always agree in their assessments. However, the inter-rater reliability analysis showed "substantial" agreement (according to Landis and Koch).
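For reference, here is a minimal pure-Python sketch of how such an agreement figure can be computed. It assumes the reliability statistic was Cohen's kappa for two raters, since the Landis and Koch bands are usually applied to kappa. The function names `cohens_kappa` and `landis_koch` and the toy ratings are hypothetical. The optional "linear"/"quadratic" weights treat the five categories as equally spaced, which this scale is not, so the unweighted variant may be the safer one here.

```python
def cohens_kappa(r1, r2, categories, weights=None):
    """Cohen's kappa for two raters who labelled the same items.

    weights: None (unweighted), "linear" or "quadratic" (weighted kappa;
    these assume equally spaced categories).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    n, k = len(r1), len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    # Observed joint distribution of (rater 1 label, rater 2 label).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n

    # Marginal label distribution of each rater.
    p1 = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 for identical categories, up to 1 for maximal disagreement.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * obs[i][j] for i, j in cells)
    expected = sum(disagreement(i, j) * p1[i] * p2[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is 0")
    return 1 - observed / expected


def landis_koch(kappa):
    """Verbal band for a kappa value following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy ratings of eight texts by two raters on a 1-5 scale.
rater_a = [1, 2, 3, 3, 4, 5, 2, 4]
rater_b = [1, 2, 3, 4, 4, 5, 2, 3]
kappa = cohens_kappa(rater_a, rater_b, categories=[1, 2, 3, 4, 5])
print(round(kappa, 3), landis_koch(kappa))  # 0.68 substantial
```

In this toy example the raters agree on 6 of 8 items (0.75) against an expected chance agreement of 0.21875, which gives kappa = 0.68.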
Now I want to use the labelled data as input for machine learning algorithms (e.g. SVM and Random Forest). The challenge is how to prepare the data beforehand, since some samples currently have two different labels.
Averaging the differing labels does not seem reasonable to me. Are there standard procedures for adjusting such a data set in advance?
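Two commonly used alternatives to averaging can be sketched in plain Python. The sketch assumes each item is stored as a feature vector plus its two labels; that layout, the toy feature values, and the function names are all hypothetical. Option 1 discards items on which the raters disagree. Option 2 keeps every rating as its own training row, with weights that sum to 1 per item. Neither is claimed to be the best choice here: dropping disagreements loses data, typically the hardest cases, while one row per rating keeps the ambiguity visible to the classifier.

```python
# Hypothetical data: (feature vector, (label from rater 1, label from rater 2)).
items = [
    ([0.1, 0.3], (2, 2)),
    ([0.5, 0.9], (3, 4)),
    ([0.7, 0.2], (5, 5)),
]


def agreed_only(items):
    """Option 1: keep only the items on which all raters agree."""
    X, y = [], []
    for features, labels in items:
        if len(set(labels)) == 1:
            X.append(features)
            y.append(labels[0])
    return X, y


def one_row_per_rating(items):
    """Option 2: one training row per (item, rating) pair.

    Each row gets weight 1/len(labels), so every item contributes a total
    weight of 1 and a disagreement becomes a split vote instead of an average.
    `groups` records the source item of each row.
    """
    X, y, weights, groups = [], [], [], []
    for item_id, (features, labels) in enumerate(items):
        for label in labels:
            X.append(features)
            y.append(label)
            weights.append(1 / len(labels))
            groups.append(item_id)
    return X, y, weights, groups


X1, y1 = agreed_only(items)
X2, y2, w2, g2 = one_row_per_rating(items)
print(y1)  # [2, 5]
print(y2)  # [2, 2, 3, 4, 5, 5]
print(g2)  # [0, 0, 1, 1, 2, 2]
```

With scikit-learn, the weights could be passed through the `sample_weight` argument of `SVC.fit` or `RandomForestClassifier.fit`. The group ids could go to `GroupKFold` so that both rows of one text always land in the same fold. Otherwise the same text could appear in both training and test data.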
Thanks a lot in advance!