ML for data processing. What are the options?

Question


fstab

September 8, 2021 02:17

I am currently working on improving a stage in a data processing pipeline. The source data has a large number of fields and is being normalized into a simpler entity. This means that, in many cases, a destination field's value may be copied from different input fields depending on the context.

My idea was to predict a binary sources-by-destinations matrix that associates each possible source field with the possible destination fields.
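
To make the idea concrete, here is a minimal sketch of what such a target matrix could look like. The field names and the example mapping are hypothetical, made up purely for illustration; they are not taken from the actual pipeline.

```python
# Hypothetical field names, for illustration only.
SOURCES = ["cust_name", "full_name", "contact_email", "email_addr"]
DESTINATIONS = ["name", "email"]


def mapping_to_matrix(mapping):
    """Encode {destination: source} as a binary sources x destinations matrix."""
    matrix = [[0] * len(DESTINATIONS) for _ in SOURCES]
    for dest, src in mapping.items():
        matrix[SOURCES.index(src)][DESTINATIONS.index(dest)] = 1
    return matrix


# One record's mapping: "name" was copied from "full_name",
# and "email" from "contact_email".
target = mapping_to_matrix({"name": "full_name", "email": "contact_email"})
for src, row in zip(SOURCES, target):
    print(f"{src:15s} {row}")
```

Each row is a source field and each column a destination field; a 1 marks "this destination is copied from this source" for the given record.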

I was wondering: is this a problem that has been tackled before? Is there anything in the scientific literature that is worth noting?

Topic matrix data preprocessing data-cleaning machine-learning

Category Data Science

Brian Spiering · Accepted Answer · September 6, 2021 14:01

What you are describing could be modeled as a bipartite graph: one set of nodes (the source fields) connects to another set of nodes (the destination fields). Deciding which source feeds which destination then becomes a bipartite graph matching problem.
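
As a rough sketch of the matching idea, the snippet below runs a standard augmenting-path algorithm (Kuhn's algorithm) for maximum bipartite matching in plain Python. The field names and candidate edges are hypothetical. It also assumes a one-to-one assignment, which may not hold if one source field can feed several destinations.

```python
def max_bipartite_matching(candidates):
    """Kuhn's augmenting-path algorithm.

    `candidates` maps each destination field to the source fields it may be
    copied from. Returns {source: destination}, each side used at most once.
    """
    matched = {}  # source -> destination currently assigned to it

    def try_assign(dest, visited):
        for src in candidates[dest]:
            if src in visited:
                continue
            visited.add(src)
            # Take a free source, or re-route the destination that holds it.
            if src not in matched or try_assign(matched[src], visited):
                matched[src] = dest
                return True
        return False

    for dest in candidates:
        try_assign(dest, set())
    return matched


# Hypothetical candidate edges, for illustration only.
candidates = {
    "name": ["full_name", "cust_name"],
    "email": ["contact_email"],
    "display_name": ["full_name"],
}
matching = max_bipartite_matching(candidates)
print(matching)
```

Here `display_name` can only come from `full_name`, so the algorithm re-routes `name` to `cust_name` to make room for it.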

As the two sets of nodes grow, exact matching quickly becomes intractable; at that point, approximate nearest neighbor search might be useful.
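
For reference, the snippet below shows the exact (brute-force) nearest neighbor search that an approximate method would speed up. It assumes each field has already been embedded as a vector; the 3-dimensional embeddings here are hypothetical and hand-made for illustration, and how to obtain real ones is left open.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical embeddings of field names, made up for illustration.
source_vecs = {
    "full_name": [0.9, 0.1, 0.0],
    "contact_email": [0.1, 0.9, 0.1],
    "signup_date": [0.0, 0.1, 0.9],
}
dest_vecs = {"name": [1.0, 0.0, 0.0], "email": [0.0, 1.0, 0.0]}


def nearest_source(dest):
    """Exact search: compare the destination against every source."""
    return max(source_vecs, key=lambda s: cosine(source_vecs[s], dest_vecs[dest]))


nearest = {d: nearest_source(d) for d in dest_vecs}
print(nearest)
```

The linear scan in `nearest_source` costs one comparison per source field for every destination; an approximate nearest neighbor index would replace that scan once the number of fields makes it too slow.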
