Data Analytics: how to read an ECDF graph

Hi there, my question is about how to read ECDF graphs. I am still quite unsure what the jumps or zig-zags in the graph mean, and what is happening where the line is horizontal. I would be grateful if someone could explain how I am supposed to read this graph and what information I can get from it. Thank you.



Empirical Cumulative Distribution Function (ECDF) Plot

From Wikipedia,

In statistics, an empirical distribution function (commonly also called an empirical Cumulative Distribution Function, eCDF) is the distribution function associated with the empirical measure of a sample.

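The empirical CDF is a step function whose values on the vertical Y-axis run from $0$ to $1$: it equals $0$ to the left of the smallest observation and reaches $1$ at the largest. The step function increases by $1/N$ at each observation in your dataset of $N$ observations, so a value that occurs $k$ times produces a single jump of $k/N$.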
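As a minimal sketch of that definition, the ECDF at a point $x$ is the proportion of observations less than or equal to $x$, that is $\hat{F}_N(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{x_i \le x\}$. The sample and the helper names `ecdf` and `ecdf_at` below are made up for illustration.

```python
def ecdf(sample):
    """Return the sorted values and the ECDF height reached at each one."""
    xs = sorted(sample)
    n = len(xs)
    # After the i-th smallest observation the curve has climbed to i/n.
    heights = [(i + 1) / n for i in range(n)]
    return xs, heights


def ecdf_at(sample, x):
    """Proportion of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)


data = [2.1, 3.4, 1.7, 5.0, 4.2]  # made-up sample, N = 5
xs, heights = ecdf(data)
for x, h in zip(xs, heights):
    print(f"x = {x}: ECDF = {h:.1f}")

print(ecdf_at(data, 1.0))  # left of the smallest observation
print(ecdf_at(data, 5.0))  # at the largest observation
```

With five distinct values every step has height $1/5$, and the curve is exactly $0$ before the first observation and exactly $1$ from the last one onward.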

Use an empirical cumulative distribution function plot to display the data points in your sample from lowest to highest against their percentiles. These graphs are meant for numeric data, most often continuous variables, and allow you to derive percentiles and other distribution properties. This function is also known as the empirical CDF or ECDF. It’s empirical because it is built from your observed values and their corresponding percentiles rather than from a theoretical model.

[Image: empirical CDF plot with a fitted normal CDF]

Understanding plot structure:

Empirical CDF plots typically contain the following elements:

  • Y-axis representing a percentile scale.
  • X-axis representing the data values.
  • Stepped function displaying the cumulative distribution observed in the sample.
  • A fitted cumulative distribution based on parameters estimated from the sample.

In the plot, the blue stepped line is the empirical CDF and the green curve is the fitted CDF for the normal distribution.
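To make the comparison between the stepped line and the fitted curve concrete, here is a rough sketch using only Python's standard library. It fits a normal distribution from the sample mean and standard deviation and reports the largest vertical gap between the two curves. The data are made up, and the gap calculation assumes the sample has no tied values.

```python
from statistics import NormalDist

data = [4.8, 5.1, 5.5, 5.9, 6.0, 6.3, 6.8, 7.2]  # made-up sample, no ties

# Fit a normal distribution using the sample mean and standard deviation.
fitted = NormalDist.from_samples(data)

xs = sorted(data)
n = len(xs)

largest_gap = 0.0
for i, x in enumerate(xs, start=1):
    model = fitted.cdf(x)
    # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
    gap = max(abs(i / n - model), abs((i - 1) / n - model))
    largest_gap = max(largest_gap, gap)

print(f"fitted mean = {fitted.mean:.2f}, stdev = {fitted.stdev:.2f}")
print(f"largest vertical gap = {largest_gap:.3f}")
```

A small gap means the steps hug the fitted curve closely. A large gap at some value of x shows where the sample departs from the assumed distribution.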

Use an empirical CDF plot to assess the following features of your dataset:

  • Find percentiles and proportions for data ranges.
  • Identify where most values occur (the curve is steepest there).
  • Assess the range of your data.
  • Compare sample distributions.
  • Determine how well your data follow a fitted distribution.
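The first two uses in the list above can be sketched in a few lines. Reading a proportion means going from an x value up to the curve and across to the Y-axis. Reading a percentile means going from a Y value across to the curve and down to the X-axis. The sample and the helper names `proportion_between` and `percentile_from_ecdf` are illustrative assumptions, and the percentile rule used here (smallest observation whose ECDF height is at least p) is one common convention among several.

```python
import math


def proportion_between(sample, low, high):
    """Share of observations in the interval (low, high]."""
    return sum(1 for v in sample if low < v <= high) / len(sample)


def percentile_from_ecdf(sample, p):
    """Smallest observed value whose ECDF height is at least p (0 < p <= 1)."""
    xs = sorted(sample)
    k = max(math.ceil(p * len(xs)), 1)  # 1-based rank of the value we need
    return xs[k - 1]


data = [12, 15, 15, 18, 21, 24, 24, 24, 30, 35]  # made-up sample, N = 10

print(proportion_between(data, 15, 24))  # share of values above 15 and up to 24
print(percentile_from_ecdf(data, 0.5))   # median read off the ECDF
print(max(data) - min(data))             # range: where the curve leaves 0 and reaches 1
```

In this sample half of the observations fall in the interval from 15 (exclusive) to 24 (inclusive), which on the plot is the rise of the curve between those two x values.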

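The jumps in the ECDF appear because the empirical distribution is discrete: it places all of its probability on the values actually observed, even when the underlying random variable is continuous. The points on the x-axis where the jumps happen are the observed values. The height of each jump is the proportion of the sample equal to that value: $1/N$ for a value seen once and $k/N$ for a value seen $k$ times. A horizontal segment means that no observations fall in that interval, so the cumulative proportion does not change.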
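A short sketch with a made-up sample shows how repeated values change the jump heights and how a gap in the data produces a horizontal segment.

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 5, 5]  # made-up sample, N = 8, no observation equals 4
n = len(data)

# Jump height at each observed value = (number of times it occurs) / N.
counts = Counter(data)
jumps = {value: counts[value] / n for value in sorted(counts)}

cumulative = 0.0
for value, jump in jumps.items():
    cumulative += jump
    print(f"x = {value}: jump = {jump:.3f}, ECDF after jump = {cumulative:.3f}")
```

The value 3 occurs three times, so its jump is three times as tall as the jump at 1. Between 3 and 5 the curve stays flat because nothing was observed there.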
