What are the metrics to evaluate Data Quality?

Pluviophile

January 7, 2022, 08:35

Data quality refers to the overall utility of one or more datasets, measured by how easily they can be processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system.

What techniques and metrics can be used to evaluate the quality of a dataset that is being considered for building an ML or statistical model?

Some measures I thought to consider:

Columns where nearly all values are identical (Stability)
Columns with missing values (Missing)
Columns with complete values (Completeness)
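
To make these three measures concrete, here is a minimal sketch in Python with pandas. The function name `column_quality_report` and the example data are hypothetical, and the definition of stability used here (the share of the most frequent non-null value in a column) is my assumption about what "nearly all values are identical" could mean, not an established standard.

```python
import pandas as pd


def column_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing ratio, completeness and stability (hypothetical helper)."""
    n_rows = len(df)
    rows = []
    for col in df.columns:
        series = df[col]
        non_null = series.dropna()
        # Missing: fraction of rows that are null in this column.
        missing = float(series.isna().sum()) / n_rows if n_rows else 0.0
        # Stability (assumed definition): share of the most frequent
        # non-null value; 1.0 means the column is effectively constant.
        if len(non_null):
            stability = float(non_null.value_counts(normalize=True).iloc[0])
        else:
            stability = float("nan")
        rows.append(
            {
                "column": col,
                "missing": missing,
                "completeness": 1.0 - missing,  # complement of the missing ratio
                "stability": stability,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Made-up example data, for illustration only.
example = pd.DataFrame(
    {
        "country": ["US", "US", "US", "US"],
        "age": [31, None, 45, None],
        "city": ["NYC", "LA", None, "SF"],
    }
)
report = column_quality_report(example)
print(report)
```

In this made-up example, `country` is fully complete but has a stability of 1.0, so it carries no information for a model, while `age` is only half complete.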

Topic data-quality data-mining machine-learning

Category Data Science
