What fits in a Data Description Report / Data Exploration Report?

So I am trying to get familiar with CRISP-DM and found the terms "Data Description Report" and "Data Exploration Report", which seem oddly vague in their definitions. So far I have only found this: https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_data_description_report.htm

But this seems rather short to me. Is there an example of a Data Description Report anywhere? If not, is there a systematic methodology you personally use to record your findings while trying to understand data?



Generally speaking, each data set may have a different structure and may relate to different business aspects.
Because of that, I think it's hard to generalize the description/exploration steps.

Yet, these are my two cents:

Data description:
* Data sources - how the data was created, generated, or collected.
* Data shape - number of rows and columns.
* Data types (per column) - numeric, string, other.
* Missing values - how many, where, and why?
* Time frames - if relevant.
* Entities (geographical, markets, population segments, devices) - what identifies a row in the data (if applicable).
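
As a rough illustration of how the mechanical parts of that list (shape, types, missing values, time frame) could be recorded, here is a minimal sketch using only the Python standard library. The sample rows, the column names, and the `describe` helper are all made up for the example; data sources and the "why" behind missing values still have to be written up by hand.

```python
from datetime import date

# Hypothetical sample rows; in practice these would come from your own source.
rows = [
    {"device_id": "a1", "market": "DE", "day": date(2020, 1, 1), "revenue": 10.5},
    {"device_id": "a2", "market": "FR", "day": date(2020, 1, 2), "revenue": None},
    {"device_id": "a3", "market": None, "day": date(2020, 1, 3), "revenue": 7.0},
]


def describe(rows, time_column=None):
    """Collect surface facts about a table given as a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    report = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        report["columns"][col] = {
            # Type names observed among the non-missing values
            "types": sorted({type(v).__name__ for v in present}),
            # Number of rows where this column is None
            "missing": len(values) - len(present),
        }
    if time_column:
        times = [r[time_column] for r in rows if r.get(time_column) is not None]
        # Earliest and latest value of the time column, if any
        report["time_frame"] = (min(times), max(times)) if times else None
    return report


report = describe(rows, time_column="day")
for name, facts in report["columns"].items():
    print(name, facts)
```

The resulting dictionary can be pasted into the report more or less as a table; a data-frame library would give the same numbers with less code.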

Data exploration:
* Counts per entity (grouping keys).
* Breakdown of categorical variables.
* Value ranges and distributions.
* Simple correlations.
* Storytelling (visualizations that depict key aspects of how the data relates to the business).
* Missing data deep dive.
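
The numeric items in that list can be sketched in a few lines. The following is only an illustration with invented data and hypothetical helper names (`counts_per_key`, `numeric_summary`, `pearson`, `missing_by_key`); it uses the standard library and skips the visualization part entirely.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample rows for illustration only.
rows = [
    {"market": "DE", "visits": 10, "revenue": 20.0},
    {"market": "DE", "visits": 20, "revenue": 40.0},
    {"market": "FR", "visits": 30, "revenue": 60.0},
    {"market": "FR", "visits": 40, "revenue": None},
]


def counts_per_key(rows, key):
    """Number of rows per value of a grouping key or categorical column."""
    return Counter(r[key] for r in rows)


def numeric_summary(rows, col):
    """Range and central tendency of a numeric column, ignoring missing values."""
    values = [r[col] for r in rows if r[col] is not None]
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "median": median(values)}


def pearson(rows, x, y):
    """Pearson correlation over rows where both columns are present."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


def missing_by_key(rows, key, col):
    """Where are the missing values? Count missing `col` per value of `key`."""
    return Counter(r[key] for r in rows if r[col] is None)


market_counts = counts_per_key(rows, "market")
revenue_stats = numeric_summary(rows, "revenue")
visits_vs_revenue = pearson(rows, "visits", "revenue")
missing_revenue = missing_by_key(rows, "market", "revenue")
```

Breaking missing values down by a grouping key is one simple way to start the "deep dive": it shows whether the gaps are spread evenly or concentrated in one entity.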


Describe the data that has been acquired, including the format of the data, the quantity of data (for example, the number of records and fields in each table), the identities of the fields, and any other surface features which have been discovered. Evaluate whether the data acquired satisfies the relevant requirements.

Remember that this step is about collecting information on the raw tables you acquired through your queries.
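
The last sentence of the quoted definition, evaluating whether the acquired data satisfies the requirements, could be made explicit with a small check like the one below. This is only a sketch: `check_requirements`, `required_fields`, and `min_rows` are hypothetical names, and the actual requirements would come from your own project.

```python
def check_requirements(rows, required_fields, min_rows):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, found {len(rows)}")
    # Union of all field names that appear in any row
    present = set().union(*(r.keys() for r in rows)) if rows else set()
    for field in required_fields:
        if field not in present:
            problems.append(f"missing field: {field}")
    return problems


# Hypothetical raw table with one record and two fields.
raw_table = [{"id": 1, "market": "DE"}]
problems = check_requirements(raw_table, required_fields=["id", "revenue"], min_rows=2)
for p in problems:
    print(p)
```

The list of problems (or a note that there were none) can go straight into the report.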
