How to measure statistical similarity or discrepancy between a dataset and a distribution?

Is there any way to measure statistical similarity or discrepancy between a dataset and a distribution? I have done some research, but most of the methods I found are intended to describe the discrepancy between data and data, or between distribution and distribution. That is to say, they always compare two things of the same kind. What I am looking for is a method that can measure the discrepancy between a dataset and a distribution. It would be nice if the method were easy to implement or already had an existing programming implementation. I would very much appreciate any ideas.



The easiest one that comes to mind is the maximum vertical distance between the two CDFs: the empirical CDF of the data and the CDF of the theoretical distribution. This distance is the test statistic of the one-sample Kolmogorov-Smirnov test (often just called KS), and the p-value is computed from it. The null hypothesis is that the data come from the theorized distribution, and the alternative hypothesis is that they do not, so you can even get a p-value for distribution equality, if that is a requirement.
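As a rough illustration, here is a minimal Python sketch of that distance, assuming NumPy and SciPy are available. The helper name ks_distance, the simulated sample, and the choice of a standard normal as the theorized distribution are all made up for this example; scipy.stats.kstest is the existing implementation and is used here as a cross-check.

```python
import numpy as np
from scipy import stats


def ks_distance(sample, cdf):
    """Maximum vertical distance between the empirical CDF of `sample`
    and a theoretical CDF given as a callable (hypothetical helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    theo = cdf(x)
    # The empirical CDF is a step function that jumps at every data point,
    # so the largest gap occurs either just after a jump (d_plus)
    # or just before one (d_minus).
    d_plus = np.max(np.arange(1, n + 1) / n - theo)
    d_minus = np.max(theo - np.arange(0, n) / n)
    return max(d_plus, d_minus)


# Simulated data, purely for illustration (assumption: not from the question).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

# Theorized distribution: standard normal (also an assumption).
theorized = stats.norm(loc=0.0, scale=1.0)

d_manual = ks_distance(sample, theorized.cdf)

# One-sample KS test from SciPy: returns the same distance plus a p-value.
result = stats.kstest(sample, theorized.cdf)

print("manual distance:", d_manual)
print("scipy statistic:", result.statistic)
print("p-value:", result.pvalue)

# A sample shifted away from the theorized distribution gives a larger distance.
d_shifted = ks_distance(sample + 1.0, theorized.cdf)
print("distance for shifted sample:", d_shifted)
```

One caveat: the p-value is only calibrated when the theorized distribution is fully specified in advance. If its parameters are estimated from the same data, the distance is still a usable discrepancy measure, but the p-value will tend to be too large.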
