What Is the Empirical Cumulative Distribution Function in Statistics?

Empirical Cumulative Distribution Function: Definition
The Empirical Cumulative Distribution Function ( ECDF ) is a Non-Parametric Function that describes sample data drawn from an Unknown Cumulative Distribution Function ( CDF ), without using Parametric Estimators such as the Mean or the Standard Deviation. It makes very few assumptions:

—The form of the distribution that describes the Sample data of a Random Variable is Unknown
—The observations are random and independent

Empirical Cumulative Distribution Function: Its Symbol
It must be clarified that in the Literature, the ECDF is symbolized both as \widehat{F}_{n} \left ( x \right ) and as \widehat{F}_{n} \left ( t \right ) . The two Symbols are equivalent: the argument is only a placeholder, so you can substitute t for x, or vice versa, as long as you do so on both sides of the equation.

Empirical Cumulative Distribution Function: Statistical Definition
—Suppose that we have a random variable  X
—with a sample of n observations (e.g. n=10)  X_{1}, \ldots, X_{n}
—then we are looking for the Empirical CDF  \widehat{F}_{n}
—that describes, for a given point / step t
e.g. for t_{0}, or for every step from t_{0} to t_{10},
—the proportion of the sample data of this Random Variable  X
—that is Lower than or Equal to t, as an estimate of the Unknown CDF:
\widehat{F}_{n} \left ( t \right ) \approx F \left ( t \right ) = P \left ( X \leq t \right ).

 \widehat{F}_{n} \left ( t \right ) =\frac{1}{n}\sum_{i=1}^{n}I_{X_{i}\leq t} where:

 I_{X_{i}\leq t}= \begin{cases}   1 & X_{i}\leq t \\  0 & X_{i} > t  \end{cases}

Specifically, the indicator  I is evaluated once for every observation X_{i}:
i) It takes a value of 0 when X_{i} is greater than t. When \nexists X_{i} \leq t, every indicator is 0 and therefore \widehat{F}_{n} \left ( t \right ) = 0.
ii) It takes a value of 1 when X_{i} is lower than or equal to t.
iii) Duplicated X values are counted separately: a value that appears k times in the dataset and is lower than or equal to t adds k to the sum.

Therefore, the Empirical Function  \widehat{F}_{n} gives a weight of  \frac{1}{n} to each observation, and for that reason it is also called a “Step Function”: its visual representation creates steps. If an X value is Duplicated, so that it appears k times in the dataset, then the step at that value has a height of \frac{k}{n} instead of \frac{1}{n}.

In a simpler, informal notation (for illustration purposes only), the Empirical Function can also be written as: ECDF = \frac{\sum I}{n}
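
As a minimal sketch of the definition above, the formula can be computed directly in Python. The function name `ecdf` and the sample values are hypothetical, chosen only for illustration, and the sample is assumed to contain no missing values.

```python
def ecdf(sample, t):
    """Return the share of sample values that are lower than or equal to t."""
    n = len(sample)
    # Each observation contributes 1 if X_i <= t, otherwise 0.
    indicators = [1 if x <= t else 0 for x in sample]
    return sum(indicators) / n


# Hypothetical sample, used only for illustration.
sample = [3, 1, 4, 1, 5]

print(ecdf(sample, 0))  # no value is <= 0
print(ecdf(sample, 1))  # the duplicated value 1 is counted twice
print(ecdf(sample, 5))  # every value is <= 5
```

Because the duplicated value 1 contributes two indicators, the result for t=1 is 2/5 rather than 1/5.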

Symbol Explanation
\nexists means “Does Not Exist”
\sum indicates the addition of all the results after the math operations inside that symbol have been completed.
n is the total number of Sample values, excluding the Missing values.
X_{i} is the dataset value at position i, e.g. the 1st, the 2nd.
The t space / steps are defined by your sample data.
I_{X_{i}\leq t} is the indicator, equal to 1 when X_{i}\leq t and 0 otherwise. \widehat{F}_{n} is the natural estimator of the Real (Unknown) CDF.

Empirical Cumulative Distribution Function (ECDF): Table
The following Table / Figure graphs help to understand the order of the math operations that take place inside the Empirical Function, in order to calculate the values of the Empirical Cumulative Distribution Function for every given t space / step:

i) We start with a Random Variable X that has a sample of n=8 values

Empirical Cumulative Distribution function_ECDF_0

ii) Then, we must sort these values in Ascending order, from the Lowest one to the Highest one:
0,2,2,4,5,5,5,6

Empirical Cumulative Distribution function_ECDF_2

iii) Then you must go through each given t space / step,
iv) and Count the Number of Values that are identical to the t value at that step, in order to calculate the Indicator count I for every given t step:
—for t=0 => I=1
—for t=1 => I=0
—for t=2 => I=2
—for t=3 => I=0
—for t=4 => I=1
—for t=5 => I=3
—for t=6 => I=1
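
The counting in steps iii) and iv) can be sketched in Python as follows. This is only an illustration: the variable names are hypothetical, and the t steps are assumed to be the integers from 0 to 6.

```python
from collections import Counter

values = [0, 2, 2, 4, 5, 5, 5, 6]  # the sorted sample from step ii)
counts = Counter(values)

# For every t step from 0 to 6, count the sample values identical to t.
# A Counter returns 0 for a t that never appears in the sample.
per_step = {t: counts[t] for t in range(0, 7)}

for t, count in per_step.items():
    print(f"for t={t} => I={count}")
```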

Empirical Cumulative Distribution function_ECDF

v) For the Function to live up to its “Cumulative” name, you must add the Indicator counts I of all the previous t steps to the count of the current t step. In that way, you calculate the “\sum I” term of the function.

t = 0 => \sum I = 1
t = 1 => \sum I = 1 + 0 = 1
t = 2 => \sum I = 1 + 0 + 2= 3
t = 3 => \sum I = 1 + 0 + 2 + 0 = 3
t = 4 => \sum I = 1 + 0 + 2 + 0 + 1 = 4
t = 5 => \sum I = 1 + 0 + 2 + 0 + 1 + 3 = 7
t = 6 => \sum I = 1 + 0 + 2 + 0 + 1 + 3 + 1 = 8
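
The running totals above amount to a cumulative sum. A short sketch with Python's standard `itertools.accumulate` follows; the list name `per_step_counts` is hypothetical and simply holds the I values for t = 0 to 6.

```python
from itertools import accumulate

per_step_counts = [1, 0, 2, 0, 1, 3, 1]  # I for t = 0, 1, ..., 6

# accumulate() adds every previous count to the current one.
running_totals = list(accumulate(per_step_counts))

print(running_totals)  # [1, 1, 3, 3, 4, 7, 8]
```

The last running total equals n=8, which is a quick check that no sample value was skipped.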

Empirical Cumulative Distribution function_ECDF_52

vi) The final step is to divide the above \sum I results by the total number of Sample values of the Random variable, which is n=8, for every t step. The resulting values are the values of the Empirical Cumulative Distribution Function \widehat{F}_{n} \left ( t \right ), which are placed on the Y axis, while the t steps are placed on the X axis. The ECDF results, rounded to two decimals, are the following ones:

t = 0 => \frac{ \sum I}{n} = 1 / 8 = 0.13
t = 1 => \frac{ \sum I}{n} = 1 / 8 = 0.13
t = 2 => \frac{ \sum I}{n} = 3 / 8 = 0.38
t = 3 => \frac{ \sum I}{n} = 3 / 8 = 0.38
t = 4 => \frac{ \sum I}{n} = 4 / 8 = 0.50
t = 5 => \frac{ \sum I}{n} = 7 / 8 = 0.88
t = 6 => \frac{ \sum I}{n} = 8 / 8 = 1.00
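
The division can be sketched the same way. The variable names are hypothetical. The printed values keep three decimals, so 1 / 8 appears as 0.125 where the list above shows it rounded to 0.13.

```python
running_totals = [1, 1, 3, 3, 4, 7, 8]  # sum of I for t = 0, 1, ..., 6
n = 8                                    # total number of sample values

# Divide every running total by n to get the ECDF value at each t step.
ecdf_values = [total / n for total in running_totals]

for t, value in enumerate(ecdf_values):
    print(f"t = {t} => {value:.3f}")
```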

Empirical Cumulative Distribution function_ECDF_52

Empirical Cumulative Distribution Function (ECDF): Conclusion
In order to interpret these Results, note that the ECDF uses the total number of observations as the denominator. This produces Results that lie in the interval \left [ 0, 1 \right ], which makes them easy to express as percentages \%. So, what can the Empirical Cumulative Distribution Function Results say about the values of the Random Variable X ?

—13\% of the values are Equal to or Lower than 0 ( X \leq 0 )
—38\% of the values are Equal to or Lower than 2 ( X \leq 2 )
—50\% of the values are Equal to or Lower than 4 ( X \leq 4 )
—88\% of the values are Equal to or Lower than 5 ( X \leq 5 )
—100\% of the values are Equal to or Lower than 6 ( X \leq 6 )

X values placed on X=t | t step for the axis X=t | Indicator: how many X for each t | Σ ( Ι ) explanation | Σ ( Ι ) for t step 0 --> 6 | ( 1*Σ ( Ι ) ) / n for n=8: ECDF value for Y=Fn ( t )
0 | 0 | 1 | 1 | => Σ I ( Xi ≤ 0 ) = 1 | 0.13
--- | 1 | 0 | 1+0 | => Σ I ( Xi ≤ 1 ) = 1 | 0.13
2, 2 | 2 | 2 | 1+0+2 | => Σ I ( Xi ≤ 2 ) = 3 | 0.38
--- | 3 | 0 | 1+0+2+0 | => Σ I ( Xi ≤ 3 ) = 3 | 0.38
4 | 4 | 1 | 1+0+2+0+1 | => Σ I ( Xi ≤ 4 ) = 4 | 0.50
5, 5, 5 | 5 | 3 | 1+0+2+0+1+3 | => Σ I ( Xi ≤ 5 ) = 7 | 0.88
6 | 6 | 1 | 1+0+2+0+1+3+1 | => Σ I ( Xi ≤ 6 ) = 8 | 1.00

The following Graph Figure presents the Empirical Cumulative Distribution Function ( ECDF ) as the set of points where the following meet:
—The unique X values, placed on the X=t axis, which you can find in the first column of the Table
—The \widehat{F}_{n} \left ( t \right ) = ECDF values, placed on the Y axis, which you can find in the last column of the Table.
—There are only 5 “Steps”, because there are only 5 unique X values.

—The points that do not exist among the values of the X variable, like the values 1 and 3, are filled in by carrying the previous “Step” forward. This is a piecewise-constant extension rather than a true Linear Interpolation. That is:
—Each “Step” covers as many t steps as needed until the next “Step” appears, that is, until the next Unique X value.
—Therefore, value 1 is covered on the X axis by extending the step / point of 0 up to the next value that really exists in the X variable, that is, up to 2, BUT value 2 itself is not covered by that step / point.
—In this case, value 1 has the same Y axis value as the X axis value of 0.
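
The way each “Step” extends until the next unique X value can be checked with a small sketch. It assumes the sample of the example and uses Python's standard `bisect_right`; the function name `ecdf_at` is hypothetical.

```python
from bisect import bisect_right

values = sorted([0, 2, 2, 4, 5, 5, 5, 6])


def ecdf_at(t):
    # bisect_right returns how many of the sorted values are <= t.
    return bisect_right(values, t) / len(values)


# Every t from 0 up to (but not including) 2 sits on the step that starts at 0.
print(ecdf_at(0), ecdf_at(1), ecdf_at(1.9))

# At t=2 the function jumps to the next step.
print(ecdf_at(2))
```

The first line prints the same value three times (0.125), while the second prints 0.375, so the value at t=1 is taken from the step at 0 and the step ends just before 2.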

Empirical Cumulative Distribution function_ECDF_figure_graph