What are the metrics to evaluate Data Quality?
Data quality refers to the overall utility of a dataset as a function of how easily it can be processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system.
What techniques and metrics can be used to evaluate the quality of a dataset before using it to build an ML or statistical model?
Some measures I have considered so far:
- Columns where nearly all values are identical (Stability)
- Columns with missing values (Missing)
- Columns with complete values (Completeness)
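To make the three measures above concrete, here is a rough sketch of how they might be computed per column with pandas. The function name `column_quality_report`, the sample data, and the exact definitions are my own assumptions: "missing" is taken as the fraction of null cells, "completeness" as its complement, and "stability" as the share of the most frequent non-null value.

```python
import pandas as pd


def column_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column quality measures (hypothetical helper, assumes a non-empty frame)."""
    report = pd.DataFrame(index=df.columns)
    # Fraction of null cells in each column.
    report["missing"] = df.isna().mean()
    # Complement of the missing fraction.
    report["completeness"] = 1.0 - report["missing"]
    # Share of the most frequent non-null value; NaN if the column is entirely null.
    report["stability"] = [
        df[col].value_counts(normalize=True).iloc[0]
        if df[col].notna().any()
        else float("nan")
        for col in df.columns
    ]
    return report


# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {
        "country": ["US", "US", "US", "US"],  # every value identical
        "age": [30, None, 41, None],          # half the values missing
        "id": [1, 2, 3, 4],                   # every value distinct
    }
)

report = column_quality_report(df)
print(report)
```

A stability close to 1 flags a near-constant column, and a high missing fraction flags a sparsely populated one. Any cut-off for dropping a column would be a judgment call that depends on the dataset and the model.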
Topic data-quality data-mining machine-learning
Category Data Science