Fuzzily join two large sets of postal addresses
I have two tables of postal address information: one has about 2 million records, the other roughly 40 million. The data quality is quite bad, and the two tables are not fully compatible with each other (each set follows different conventions, some fields are cut off in an impractical way... in other words, Real World Data). They may not be the largest datasets around, but relative to the available hardware they are non-trivial (I cannot simply spin up a lot of …
Category:
Data Science