Solution for discovering/inferring the usage and meaning of tables in an unknown database?
This is a situation I have run into often recently: customers give me a database with many tables that they do not fully understand themselves, then ask me to build a model that predicts future revenue, classifies which users may be valuable, or something similar.
To be honest, extracting useful data from an unknown database is exhausting.
For example, I need to figure out:
- which table is the user table, the product table, the transaction table, ...
- which columns can be used to join tables (there may be no foreign keys defined)
- what relationship may exist between two tables
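To make the join question concrete, here is a minimal sketch of one check I can imagine: for every pair of columns in different tables, measure what fraction of the distinct values in one column also appear in the other. This is only an assumption about how such a search could work, not an established method. The table and column names (`tbl_a`, `tbl_b`, `uid`, ...), the helper names and the 0.9 threshold are all hypothetical, and the tables are assumed to be already loaded into memory as plain Python lists.

```python
from itertools import permutations


def containment(child_values, parent_values):
    """Fraction of distinct non-null child values that also occur in parent."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def join_candidates(tables, threshold=0.9):
    """Return (child_column, parent_column, score) for likely join pairs.

    tables: {table_name: {column_name: [values, ...]}}
    Values are compared with plain Python equality, so "1" and 1 do not match.
    """
    found = []
    # Ordered pairs, because containment is not symmetric.
    for (t1, cols1), (t2, cols2) in permutations(tables.items(), 2):
        for c1, v1 in cols1.items():
            for c2, v2 in cols2.items():
                score = containment(v1, v2)
                if score >= threshold:
                    found.append((f"{t1}.{c1}", f"{t2}.{c2}", score))
    return sorted(found, key=lambda r: -r[2])


# Hypothetical toy data standing in for two undocumented tables.
tables = {
    "tbl_a": {"id": [1, 2, 3, 4], "name": ["ann", "bob", "cy", "dee"]},
    "tbl_b": {
        "row": [10, 11, 12, 13, 14],
        "uid": [1, 1, 2, 4, 4],
        "amount": [9.5, 20.5, 3.5, 7.25, 1.5],
    },
}
candidates = join_candidates(tables)
print(candidates)
```

On real data this would need sampling, since comparing every column pair gets expensive quickly, and small integer columns (status codes, counters) will produce false matches that still have to be reviewed by hand.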
I have thought that a user_id column may have the following features:
- it is unique in the user-related table, but can be duplicated in the transaction table
- its length is probably below some bound
- it gives a one-to-many relationship between the user table and the transaction table.
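The features above could be checked roughly like this. It is only a sketch under my own assumptions: the function names, the `max_len=20` bound and the sample values are hypothetical, and the two columns are assumed to be available as plain Python lists.

```python
from collections import Counter


def is_unique(values):
    """True if the column has at least one non-null value and no duplicates."""
    non_null = [v for v in values if v is not None]
    return len(non_null) > 0 and len(set(non_null)) == len(non_null)


def max_length(values):
    """Longest string representation among the non-null values."""
    return max((len(str(v)) for v in values if v is not None), default=0)


def relationship(left_values, right_values):
    """Classify the cardinality between two columns by how often keys repeat."""
    left_max = max(Counter(v for v in left_values if v is not None).values(), default=0)
    right_max = max(Counter(v for v in right_values if v is not None).values(), default=0)
    if left_max <= 1 and right_max <= 1:
        return "one-to-one"
    if left_max <= 1:
        return "one-to-many"
    if right_max <= 1:
        return "many-to-one"
    return "many-to-many"


def looks_like_user_id(user_col, txn_col, max_len=20):
    """Combine the three heuristics; max_len is an arbitrary assumed bound."""
    return (
        is_unique(user_col)
        and max_length(user_col) <= max_len
        and relationship(user_col, txn_col) == "one-to-many"
    )


# Hypothetical sample: four users, five transactions.
user_ids = [1, 2, 3, 4]
txn_user_ids = [1, 1, 2, 4, 4]
print(relationship(user_ids, txn_user_ids))
print(looks_like_user_id(user_ids, txn_user_ids))
```

These checks only rule candidates out. Any auto-increment primary key would pass them too, which is part of why this prior feels weak to me.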
But this prior knowledge is weak, and the lack of samples is another problem.
I searched for competitions and papers on this topic but found nothing.
Topic inference relational-dbms databases
Category Data Science