What are the metrics to evaluate Data Quality?

Data quality refers to the overall utility of a dataset as a function of how easily it can be processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system.

What techniques and metrics can be used to evaluate the quality of a dataset before it is used to build an ML or statistical model?

Some measures I have considered so far:

  • Columns where nearly all values are identical (Stability)
  • Columns with missing values (Missing)
  • Columns with complete values (Completeness)
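
The three measures listed above can be computed per column. The following is a minimal sketch, assuming a tabular dataset loaded into a pandas DataFrame. The helper name `column_quality_report` is hypothetical, and defining stability as the share of non-missing values equal to the most frequent value is one possible reading of "nearly all values are identical", not a standard definition.

```python
import pandas as pd


def column_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column stability, missing and completeness ratios (hypothetical helper)."""
    n_rows = len(df)
    rows = []
    for col in df.columns:
        series = df[col]
        non_null = series.dropna()

        # Stability (assumed definition): share of non-missing values
        # that equal the single most frequent value in the column.
        if len(non_null) > 0:
            stability = non_null.value_counts().iloc[0] / len(non_null)
        else:
            stability = float("nan")

        # Missing: share of rows with no value; completeness is its complement.
        missing = series.isna().sum() / n_rows if n_rows else float("nan")

        rows.append(
            {
                "column": col,
                "stability": float(stability),
                "missing": float(missing),
                "completeness": float(1.0 - missing),
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Illustrative data only; the column names and values are made up.
df = pd.DataFrame(
    {
        "country": ["US", "US", "US", "US"],
        "age": [31, None, 45, 28],
        "city": ["NYC", "LA", None, None],
    }
)
report = column_quality_report(df)
print(report)
```

Columns could then be flagged against thresholds chosen for the problem at hand, for example a stability above 0.95 or a completeness below 0.5. Those cut-offs are illustrative and depend on the dataset and the model.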

