ML for data processing. What are the options?
I am currently working on improving one stage of a data processing pipeline. The source data has a large number of fields and is normalized into a simpler entity. As a result, a given destination field's value may be copied from any of several input fields, depending on the context.
My idea was to predict a binary source-destination matrix that associates each possible source field with each possible destination field: an entry is 1 if the destination field should be copied from that source field, and 0 otherwise.
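To make the idea concrete, here is a minimal sketch of what such a matrix could look like for a single record. The field names and sample values are hypothetical, and the model that would predict the matrix from context is not shown; this only illustrates how the target matrix could be derived from an already-normalized example and then applied.

```python
from typing import Dict, List, Optional

# Hypothetical field names, for illustration only.
SOURCE_FIELDS = ["name_a", "name_b", "phone_home", "phone_work"]
DEST_FIELDS = ["name", "phone"]


def mapping_matrix(source: Dict[str, Optional[str]],
                   dest: Dict[str, Optional[str]]) -> List[List[int]]:
    """Return M with M[i][j] = 1 if source field i holds the same
    non-null value as destination field j, else 0."""
    return [
        [int(source.get(s) is not None and source.get(s) == dest.get(d))
         for d in DEST_FIELDS]
        for s in SOURCE_FIELDS
    ]


def apply_matrix(source: Dict[str, Optional[str]],
                 matrix: List[List[int]]) -> Dict[str, Optional[str]]:
    """For each destination field, copy the value of the first source
    field flagged in its column (None if no source is flagged)."""
    out = {}
    for j, d in enumerate(DEST_FIELDS):
        out[d] = next(
            (source.get(s) for i, s in enumerate(SOURCE_FIELDS) if matrix[i][j]),
            None,
        )
    return out


# One labelled example: raw record plus its known normalized form.
source = {"name_a": "", "name_b": "Ada Lovelace",
          "phone_home": None, "phone_work": "555-0100"}
dest = {"name": "Ada Lovelace", "phone": "555-0100"}

M = mapping_matrix(source, dest)       # training target for this record
restored = apply_matrix(source, M)     # what a predicted matrix would produce
```

A learned model would take features of the source record as input and output one such matrix per record, which is a multi-label classification problem over source-destination pairs.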
Has this problem been tackled before? Is there anything in the scientific literature worth noting?
Topic matrix data preprocessing data-cleaning machine-learning
Category Data Science