Empirical Cumulative Distribution Function: Definition

The Empirical Cumulative Distribution Function (ECDF) is a Non-Parametric Function that describes the sample data of an unknown Cumulative Distribution Function (CDF) without using Parametric Estimators such as the Mean or the Standard Deviation. It makes only the following minimal assumptions:

—The form of the distribution that describes the Sample data of a Random Variable is unknown.

—The observations are random and independent.

Empirical Cumulative Distribution Function: Its Symbol

It must be clarified that in the Literature, the ECDF is symbolized both as $\hat{F}_n(t)$ and as $\hat{F}_n(x)$. Both symbols are equivalent: you can substitute $x$ for $t$ (or vice versa) on both sides of the equation.

Empirical Cumulative Distribution Function: Statistical Definition

If we hypothesize that we have a Random Variable $X$ with a sample of data $X_1, \dots, X_n$, then we are searching for the Empirical CDF $\hat{F}_n(t)$ that describes, for every given space / step $t$ (e.g. $t = 0$, or $t = 1$), the proportion of the sample data of this Random Variable that is lower than or equal to $t$:

$$\hat{F}_n(t) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le t)$$

where $I(X_i \le t)$ is the Indicator function, which equals $1$ when $X_i \le t$ and $0$ otherwise.

Specifically, at a given step $t$, the Indicator adds to the sum:

i) A value of $0$ when no sample value equals $t$ ($\nexists\, X_i = t$).

ii) A value of $1$ when the value $t$ appears exactly once in the dataset.

iii) One more for each duplicated value of $t$ in the dataset, so a value that appears $k$ times adds $k$.

Therefore, the Empirical Function gives a weight of $1/n$ to each observation, and it is also named a “Step Function”, because its visual representation creates steps. When there are identical / duplicated values, the weight given to that point of the CDF is increased by $1/n$ for every identical / duplicated value at the given step $t$, so a value that appears $k$ times creates a step of height $k/n$.
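The weighting rule can be checked with a short Python sketch. It assumes the sample is the one used in the Table further below, listed in sorted order (the original unsorted order is not given), and `jump_sizes` is a hypothetical helper name chosen for illustration.

```python
from collections import Counter


def jump_sizes(sample):
    """Return the height of the ECDF step at each unique sample value.

    Each observation carries a weight of 1/n, so a value that appears
    k times in the sample produces a step of height k/n.
    """
    n = len(sample)
    counts = Counter(sample)
    return {value: counts[value] / n for value in sorted(counts)}


# Sorted sample taken from the Table in this article (n = 8).
sample = [0, 2, 2, 4, 5, 5, 5, 6]
print(jump_sizes(sample))
```

The value $5$ appears three times, so its step is $3/8$ high, and all the step heights together add up to $1$.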

In a simpler (not formally valid, for illustration purposes only) way, the Empirical Function can also be written as:

$$\hat{F}_n(t) = \frac{\text{number of sample values } X_i \le t}{n}$$

Symbol Explanation

$\nexists$ means “does not exist”.

$\sum$ indicates the addition of all the results after the math operations inside that symbol have been completed.

$n$ is the total number of Sample values, excluding the Missing values.

$X_i$ is the dataset value indicated by the index $i$, e.g. the 1st, the 2nd.

$t$ is the space / step; the steps are defined by your sample data.

$\hat{F}_n(t)$ is the natural estimator of the Real (Unknown) CDF $F(t)$.
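The definition translates almost word for word into Python. This is a minimal sketch under one assumption that the article does not make: missing values are stored as `None`. They are dropped before counting, so $n$ only counts the non-missing observations. The function name `ecdf` is illustrative.

```python
def ecdf(sample, t):
    """Evaluate the ECDF at t as (1/n) * sum of I(X_i <= t).

    Assumption for this sketch: missing values are stored as None and
    are excluded, so n counts only the non-missing observations.
    """
    values = [x for x in sample if x is not None]
    n = len(values)
    # The indicator I(X_i <= t) is 1 when x <= t and 0 otherwise.
    return sum(1 if x <= t else 0 for x in values) / n


sample = [0, 2, None, 2, 4, 5, 5, 5, 6]  # one hypothetical missing value
print(ecdf(sample, 2))  # 3 of the 8 non-missing values are <= 2
```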

Empirical Cumulative Distribution Function (ECDF): Table

The following Table / Figure graphs help to explain the order of math operations that take place inside the Empirical Function when it calculates the value of the Empirical Cumulative Distribution Function for every given space / step:

i) Take a Random Variable $X$ that contains a sample of $n = 8$ values.

ii) Then, order these values in ascending order, from the lowest one to the highest one:

—$0, 2, 2, 4, 5, 5, 5, 6$

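iii) Then, for each given space / step $t = 0, 1, \dots, 6$,

iv) count the number of values that are identical to $t$, in order to calculate the Indicator count for every given step:

—$t = 0$: 1 value ($0$)

—$t = 1$: 0 values

—$t = 2$: 2 values ($2, 2$)

—$t = 3$: 0 values

—$t = 4$: 1 value ($4$)

—$t = 5$: 3 values ($5, 5, 5$)

—$t = 6$: 1 value ($6$)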
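Steps iii) and iv) amount to counting how often each step value occurs. A minimal sketch, assuming integer steps $t = 0, \dots, 6$ as in this example (non-integer data would need the unique sample values as steps instead):

```python
from collections import Counter

# Sorted sample from the Table (n = 8); integer steps t = 0..6 assumed.
sample = [0, 2, 2, 4, 5, 5, 5, 6]
counts = Counter(sample)

# Counter returns 0 for a missing key, so steps with no matching value get 0.
per_step = {t: counts[t] for t in range(min(sample), max(sample) + 1)}
print(per_step)  # {0: 1, 1: 0, 2: 2, 3: 0, 4: 1, 5: 3, 6: 1}
```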

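v) For the function to earn its “Cumulative” term, add the result found for the Indicator at every previous step to the next step. In that way, you calculate the $\sum_{i=1}^{n} I(X_i \le t)$ term of the function:

—$t = 0$: $1$

—$t = 1$: $1 + 0 = 1$

—$t = 2$: $1 + 0 + 2 = 3$

—$t = 3$: $1 + 0 + 2 + 0 = 3$

—$t = 4$: $1 + 0 + 2 + 0 + 1 = 4$

—$t = 5$: $1 + 0 + 2 + 0 + 1 + 3 = 7$

—$t = 6$: $1 + 0 + 2 + 0 + 1 + 3 + 1 = 8$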
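The running total in step v) is a cumulative sum. In Python, `itertools.accumulate` does this directly. The list below is copied from the counts of step iv).

```python
from itertools import accumulate

# Indicator counts for t = 0..6, taken from step iv).
per_step_counts = [1, 0, 2, 0, 1, 3, 1]

# Each entry is the sum of all counts up to and including that step.
cumulative = list(accumulate(per_step_counts))
print(cumulative)  # [1, 1, 3, 3, 4, 7, 8]
```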

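vi) The final step is to divide the above results by the total number of Sample values of the Random Variable, which is $n = 8$ for every step. These are the values of the Empirical Cumulative Distribution Function, which are placed on the $y$ axis; the steps $t$ are placed on the $x$ axis. The results are the following:

—$\hat{F}_8(0) = 1/8 = 0.125$

—$\hat{F}_8(1) = 1/8 = 0.125$

—$\hat{F}_8(2) = 3/8 = 0.375$

—$\hat{F}_8(3) = 3/8 = 0.375$

—$\hat{F}_8(4) = 4/8 = 0.5$

—$\hat{F}_8(5) = 7/8 = 0.875$

—$\hat{F}_8(6) = 8/8 = 1$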
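Step vi) can be reproduced with exact fractions, so that no rounding hides the $k/8$ structure. This sketch uses Python's standard `fractions.Fraction` and the cumulative sums from step v).

```python
from fractions import Fraction

cumulative = [1, 1, 3, 3, 4, 7, 8]  # results of step v)
n = 8  # total number of sample values

# Divide each cumulative count by n to get the ECDF value at that step.
ecdf_values = [Fraction(c, n) for c in cumulative]
for t, value in enumerate(ecdf_values):
    print(f"t = {t}: {value} = {float(value):.3f}")
```

`Fraction` reduces automatically, so $4/8$ prints as `1/2` and $8/8$ as `1`.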

Empirical Cumulative Distribution Function (ECDF): Conclusion

To interpret these results, note that the ECDF uses the total number of observations $n$ as the denominator. Therefore, its results always lie between $0$ and $1$, so they are easy to express as percentages ($0\%$ to $100\%$). So, what can the Empirical Cumulative Distribution Function results tell us about the values of the Random Variable $X$?

—$12.5\%$ of the values are EQUAL to (or lower than) $0$.

—$37.5\%$ of the values are equal to or lower than $2$.

—$50\%$ of the values are equal to or lower than $4$.

—$87.5\%$ of the values are equal to or lower than $5$.

—$100\%$ of the values are equal to or lower than $6$.
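The five statements can also be generated from the data. This is a minimal sketch, assuming the same sorted sample as the Table. It produces one statement per unique value, and that is why there are five and not seven.

```python
from collections import Counter

sample = [0, 2, 2, 4, 5, 5, 5, 6]  # sorted sample from the Table
n = len(sample)
counts = Counter(sample)

running = 0
statements = []
for value in sorted(counts):  # one statement per unique value
    running += counts[value]
    share = 100 * running / n
    statements.append(f"{share:.1f}% of values are equal to or lower than {value}")

print("\n".join(statements))
```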

The last column is rounded to two decimals.

X values at step t | Step t (x axis) | Indicator: how many $X_i$ equal t | Cumulative sum explanation | $\sum I(X_i \le t)$, t = 0 to 6 | ECDF value $\hat{F}_8(t) = \frac{1}{n}\sum I(X_i \le t)$, n = 8 (y axis) |
---|---|---|---|---|---|
0 | 0 | 1 | 1 | $\sum I(X_i \le 0) = 1$ | 0.13 |
(none) | 1 | 0 | 1+0 | $\sum I(X_i \le 1) = 1$ | 0.13 |
2, 2 | 2 | 2 | 1+0+2 | $\sum I(X_i \le 2) = 3$ | 0.38 |
(none) | 3 | 0 | 1+0+2+0 | $\sum I(X_i \le 3) = 3$ | 0.38 |
4 | 4 | 1 | 1+0+2+0+1 | $\sum I(X_i \le 4) = 4$ | 0.50 |
5, 5, 5 | 5 | 3 | 1+0+2+0+1+3 | $\sum I(X_i \le 5) = 7$ | 0.88 |
6 | 6 | 1 | 1+0+2+0+1+3+1 | $\sum I(X_i \le 6) = 8$ | 1.00 |
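The numeric columns of the Table can be rebuilt in one pass. This sketch assumes integer steps and uses `decimal` with `ROUND_HALF_UP`, because the Table shows $0.125$ as $0.13$. Python's built-in `round` and `:.2f` formatting would give `0.12` instead, since they round half to even. The helper name `ecdf_table` is hypothetical.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def ecdf_table(sample):
    """Return (t, count at t, cumulative count, ECDF rounded half up) rows."""
    n = len(sample)
    counts = Counter(sample)
    rows = []
    running = 0
    for t in range(min(sample), max(sample) + 1):  # integer steps assumed
        running += counts[t]
        value = (Decimal(running) / Decimal(n)).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        rows.append((t, counts[t], running, value))
    return rows


sample = [0, 2, 2, 4, 5, 5, 5, 6]
rows = ecdf_table(sample)
for t, count, cumulative, value in rows:
    print(f"{t} | {count} | {cumulative} | {value}")
```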

The following Graph Figure presents the Empirical Cumulative Distribution Function (ECDF) as the points formed by:

—The unique values placed on the $x$ axis, which you can find in the first column of the Table.

—The values placed on the $y$ axis, which you can find in the last column of the Table.

—For that reason, there are only 5 “Steps”, because only 5 unique values exist.

—The points that do not exist among the values of the variable, like $t = 1$ and $t = 3$, are filled in by extending the previous step horizontally. The ECDF stays constant between consecutive unique values, so no linear interpolation takes place. That is:

—Each “Step” covers all the steps $t$ until the next “Step” appears, that is, until the next unique value.

—Therefore, $t = 1$ gets its $y$-axis value by extending the step / point of $t = 0$ until the next real existing value of the variable, that is, until $t = 2$. BUT $t = 2$ itself is not covered by that step, because the ECDF jumps there.

—In this case, $t = 1$ has the same $y$-axis value as $t = 0$, that is, $\hat{F}_8(1) = \hat{F}_8(0) = 0.125$.
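The horizontal extension between unique values falls out naturally if the ECDF is evaluated by binary search on the sorted sample. This is a sketch using Python's standard `bisect.bisect_right`, which returns how many sorted values are $\le t$. The factory name `make_ecdf` is illustrative. Any $t$ between $0$ and $2$ (including non-integer ones such as $1.99$) gets the value of the step at $0$, and the jump happens exactly at $2$.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return a step function F(t) = (number of values <= t) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(t):
        # bisect_right gives the count of ordered values that are <= t.
        return bisect_right(ordered, t) / n

    return F


F = make_ecdf([0, 2, 2, 4, 5, 5, 5, 6])
for t in [0, 1, 1.99, 2, 3, 4.5, 6, 7]:
    print(t, F(t))
```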