How to aggregate data inserted by users to avoid outliers?

I'm developing a new application based on machine learning. In this application users can insert new data to improve the prediction system. As you may guess, users could insert data that doesn't make sense, creating outliers that may harm the prediction accuracy. I'm pretty new to this field, so I would like to ask: do you know any strategy to mitigate this? Maybe by implementing a voting or aggregating system? In that case, do you have any hints, or could you please direct me to some theoretical topics on this?

Topic aggregation data outlier

Category Data Science


The most basic solution would be to make your data conform to some rules, rejecting any submission that breaks them. But that may not be possible.
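If rules can be written down, a check of this kind could look like the sketch below. The field names (`age`, `height_cm`) and their ranges are made up for illustration; your own rules depend on what your users submit.

```python
def is_valid(record, rules):
    """Return True only if every rule accepts the record."""
    return all(rule(record) for rule in rules)

# Hypothetical rules for a record with 'age' and 'height_cm' fields.
rules = [
    lambda r: isinstance(r.get("age"), (int, float)) and 0 <= r["age"] <= 120,
    lambda r: isinstance(r.get("height_cm"), (int, float)) and 30 <= r["height_cm"] <= 250,
]

submissions = [
    {"age": 34, "height_cm": 172},   # plausible
    {"age": -5, "height_cm": 180},   # negative age breaks the first rule
    {"age": 40},                     # missing field breaks the second rule
]

# Keep only the submissions that pass every rule.
accepted = [r for r in submissions if is_valid(r, rules)]
```

Only the first submission survives here. Rules like these catch obviously impossible values, but not values that are plausible on their own and still wrong in context.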

You can measure the similarity of the new data to the data you already have. If the new data differs from the existing data by more than a certain threshold, it probably doesn't make sense.
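One possible way to do this, as a minimal sketch, is to score each point by its mean distance to its k nearest neighbours in the existing data and to set the threshold from the scores of the existing data itself. The choice of Euclidean distance, `k = 5`, the 99% quantile, and the synthetic data are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def knn_score(points, reference, k, exclude_self=False):
    """Mean Euclidean distance from each point to its k nearest reference points."""
    # Pairwise distances, shape (len(points), len(reference)).
    d = np.linalg.norm(points[:, None, :] - reference[None, :, :], axis=2)
    d.sort(axis=1)
    # When scoring the reference set against itself, skip the zero self-distance.
    start = 1 if exclude_self else 0
    return d[:, start:start + k].mean(axis=1)

# Synthetic stand-in for the data you already trust.
rng = np.random.default_rng(0)
existing = rng.normal(0.0, 1.0, size=(500, 2))

k = 5
# Threshold: the score that 99% of existing points stay below.
existing_scores = knn_score(existing, existing, k, exclude_self=True)
threshold = np.quantile(existing_scores, 0.99)

# Two hypothetical user submissions: one typical, one far from everything.
new = np.array([[0.1, -0.2], [8.0, 8.0]])
is_outlier = knn_score(new, existing, k) > threshold
```

Here the second submission is flagged and the first is not. In practice the features should be put on comparable scales first (for example by standardizing them), otherwise one feature with a large range dominates the distance.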

You can also use the data directly for training and then find the harmful examples using techniques like TracIn, but this may not be readily usable.

Your best bet is the second method: measuring similarity and applying a threshold.
