What is statistics? – Normal Distribution – Bell curve

Normal Distribution: Definition
The Normal Distribution, also called the “Bell Curve” or Gaussian Distribution, is a probability distribution that is symmetrical around its Arithmetic Mean and has a bell-shaped curve: wide in the middle and narrow in its tails.


Normal Distribution: A Little History
Abraham de Moivre (1667-1754) and Carl Friedrich Gauss (1777-1855) were the first scientists to study mathematical functions that produce this type of distribution. Abraham de Moivre was a mathematician and statistician who studied mortality rates across people’s ages in order to calculate the profits from their annual payments. This function produced normally distributed data, and insurance companies still make use of it today.

Carl Friedrich Gauss, a remarkably talented scientist, introduced the “Gaussian” function, which shows how the distribution of arbitrarily selected real measurements can produce a special distribution, the Normal Distribution. He also studied the random errors produced in various measurements and found that they were normally distributed; for example, the electronic noise in electrical circuits follows a Normal Distribution. For this reason, the Normal Distribution is sometimes also called the Error Distribution.


Normal Distribution: Density of Distribution: Definition
In a graph, the density of a distribution can be read as the height of the curve, which describes some randomly selected measurements of an event, above the horizontal X axis. The density of a distribution refers to the amount of data contained under a given area of the curve. Therefore, where the curve rises higher above the X axis, the density of the distribution increases accordingly, and more data fall within the corresponding intervals under the curve.


Normal Distribution: pdf and cdf
Here, we must have a clear understanding of the Probability Density Function (pdf) and the Cumulative Distribution Function (cdf). Note that the pdf of a continuous random variable is the derivative of its cdf.

In both graphs, the X axis represents the values of the variable, usually over some specified interval. For the pdf the Y axis represents probability density, while for the cdf it represents cumulative probability.

Probability Density Function
The pdf represents the relative frequency distribution of a continuous random variable; for a normal variable it has a bell shape.

Note that the “area under the probability curve” is equal to 1, or equivalently to 100%. Thus, a single point x has a probability of 0, because it “covers” an (almost) zero area: Pr(X=x)=0. Therefore, intervals are used in order to provide meaningful answers to questions based on the pdf, e.g. Pr(a \leq X \leq b).

Therefore, it can answer questions phrased like: “What is the probability that this variable takes a value lower than, higher than, or between some values for an event?”.
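As a sketch of this idea, the interval probability Pr(a \leq X \leq b) can be approximated as the area under the pdf. A minimal Python example (the pdf formula used here is the one given later in this article):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def interval_probability(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Pr(a <= X <= b): the area under the pdf, here a midpoint Riemann sum."""
    dx = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

# A single point covers zero area, so only intervals carry probability:
print(interval_probability(-1, 1))  # ~0.6827 for the standard normal
```

Note how the single-point case returns exactly 0, matching Pr(X=x)=0 above.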


Cumulative Distribution Function
The cdf can be defined for any random variable, and for a normal variable it has a sigmoid (S-shaped) curve.

Here, instead of areas, the actual points on the X axis are used together with their cumulative probabilities.

It shows the probability that the variable X takes a value x below or equal to some specified number.

Therefore, it can answer questions of this type: “What is the probability that a person’s height is below or equal to 1.90 m?”.
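For illustration, a question like this can be answered with the normal cdf, computed via the error function. The mean of 1.75 m and standard deviation of 0.10 m below are hypothetical values chosen for the example, not measured data:

```python
import math

def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for a Normal(mu, sigma^2) variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical adult-height parameters: mean 1.75 m, standard deviation 0.10 m.
p = normal_cdf(1.90, mu=1.75, sigma=0.10)
print(f"Pr(Height <= 1.90 m) = {p:.3f}")  # ~0.933
```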

Normal Distribution: Usefulness
The Normal Distribution or Gaussian Distribution is used in various scientific fields, for example:

i) in image processing (blurring),
ii) in the behavior of gases,
iii) in communications (signaling),
iv) and to describe the distribution of many natural phenomena, such as measurements of height and weight.


Normal Distribution: Example
If we use the height distribution of an adult population, which tends to be normally distributed, we expect that:

i) Very few people will be shorter than e.g. 1.40 m, and even fewer people shorter than 1.10 m.
ii) Very few people will be taller than e.g. 1.90 m, and even fewer people taller than e.g. 2.20 m.
iii) The majority of this population will have a height between e.g. 1.40 m and 1.90 m.

That is the reason for the “bell curve” shape. The extreme values lie in the tails of the distribution, which have a lower probability of occurring and thus a lower density, while the “popular” measurements are placed in the middle of the bell curve, which is associated with higher probability as well as higher density.


The graph below presents Galton’s dataset of the heights of 928 children. The red curve shows the distribution of the first 50 measurements, while the green curve shows the height distribution of the whole sample. Note that the height measurements form a Normal Distribution. The two Gaussian curves differ only in the sample size used: the red one has a low density (a small number of data points) and the green one a higher density (more data).

Children’s height distributions: the green curve includes all 928 height observations (original dataset), while the red curve includes the first 50 height observations (in cm) of the original dataset.

The graph below presents three Normal Distributions that differ in mean and standard deviation: the blue one has a mean of 7 and a standard deviation of 0.2, the yellow one a mean of 3 and a standard deviation of 0.4, and the red one a mean of 9 and a standard deviation of 0.3.


Normal Distribution: Central Limit Theorem
The distribution of your sample means will tend toward a bell curve when:

i) you select multiple samples, e.g. multiple samples of people’s heights in one country,
ii) the samples include identically distributed observations,
e.g. all observations concern the height of people in the same country,
e.g. and under the same conditions – everyone’s height is measured barefoot,
iii) the observations are selected in a random way, e.g. you do not select people from only one region,
iv) and each observation is independent of the others, e.g. each person’s height appears only once and is not repeated in the dataset.

Then, as you increase the number of random samples of people’s heights taken in one country, under the same conditions and in a random, independent way, the distribution of the sample means gets closer and closer to a bell-shaped curve, even if the underlying population is not itself normally distributed. The properties of the Normal Distribution can then be applied to those sample means. This is what the Central Limit Theorem states.

In the corresponding figure, the multiple lines below the bell-shaped curve represent a researcher’s increasing number of random samples taken from a population, in order to be able to apply the properties of the Normal Distribution: e.g. a 1st sample of 190 people in one place, a 2nd sample of 360 people from another place, a 3rd sample of 320 people from a third place, and so on.
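The tendency described above can be checked with a short simulation; this is only a sketch, drawing many samples from a deliberately non-normal (uniform) population and collecting the sample means:

```python
import random
import statistics

random.seed(42)  # for reproducibility

# Each sample comes from a uniform population on [0, 1] (clearly not bell-shaped);
# the means of many such samples nevertheless pile up into a bell shape.
sample_means = [
    statistics.mean(random.random() for _ in range(50))
    for _ in range(10_000)
]

print(statistics.mean(sample_means))   # close to the population mean 0.5
print(statistics.stdev(sample_means))  # close to sigma / sqrt(50) ~ 0.041
```

Plotting a histogram of `sample_means` would reproduce the bell curve in the figure.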


Normal Distribution: Assumptions
Randomness in sample selection means that every member of the population of statistical interest has the same probability of being selected. For example, UK women (over 40) should be selected randomly from demographic lists instead of simply taking the first e.g. 70.

Independence of observations means that the manipulation of one observation or participant does not influence another observation or participant. For example, the menstrual cycles of women who work or live in the same place on a daily basis can begin on the same day; such observations therefore do not constitute a random and independent selection. The same is true of TV channel choices for people who live in the same house but have only one TV set: all of them are “forced” to watch the same channel.

This problem is also known as autocorrelation. It was first mentioned by Galton in 1888 and is also known as Galton’s Problem.


Standard Normal Distribution or Z Distribution: Properties
When a random variable X consists of a very large number of independent observations (approaching infinity), its distribution is Normal, its arithmetic mean is equal to \mu=0, and its standard deviation is equal to \sigma=1, this distribution is called the Standard Normal Distribution, Z Distribution, or Typical Distribution, and it is denoted N(0,1). Note that the Normal Distribution in general is denoted N(\mu,\sigma^2).


As mentioned, when researchers studied such variables on a large scale, using almost all available population data (e.g. height data for all men from military files in a specific country), and after rescaling the values so that the mean equals 0 and the standard deviation equals 1, a Standard Normal Distribution was always produced. Note that this distribution is the “gold standard” for all Normal Distributions.

Therefore, studying the properties of the Standard Normal Distribution helps us understand the properties of every Normal Distribution.
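The rescaling mentioned above (shifting the mean to 0 and the standard deviation to 1) is the familiar z-score transformation; a minimal sketch, with made-up height values used purely for illustration:

```python
import statistics

def standardize(data):
    """Rescale data to mean 0 and standard deviation 1 (z-scores)."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return [(x - mu) / sigma for x in data]

heights = [1.55, 1.62, 1.70, 1.70, 1.78, 1.85]  # illustrative values, not real data
z_scores = standardize(heights)
print(statistics.mean(z_scores))    # ~0
print(statistics.pstdev(z_scores))  # ~1
```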


  • The tails of the Standard Normal Distribution
  • Its “tails”, that is, both ends of the Standard Normal Distribution, get closer and closer to the horizontal X axis without ever touching it.

  • Central Tendency
  • The Arithmetic Mean, Median, and Mode, which are measures of Central Tendency, have exactly the same value in the Standard Normal Distribution. Moreover, if we draw a vertical line from the highest point of the bell curve down to the value of the Mean on the horizontal X axis, then 50% of the data lie on each side of this line, always inside the probability bell curve.


  • The rule of thumb: 68-95-99.7
  • The 68-95-99.7 rule of thumb is based on the symmetry around the Arithmetic Mean in the Standard Normal Distribution. The percentage of data that lies to the right and to the left of the Arithmetic Mean, always inside the probability bell curve, is:

    i) In the distance of one (1) Standard Deviation, is 68.2%: \mu\pm1\sigma=68.2\%
    ii) In the distance of Two (2) Standard Deviations, is 95.4%: \mu\pm2\sigma=95.4\%
    iii) In the distance of Three (3) Standard Deviations, is 99.7%: \mu\pm3\sigma=99.7\%
    iv) In the distance of Four (4) Standard Deviations, is 99.99%: \mu\pm4\sigma=99.99\%

    In other words, the probability of an x value lying within a distance of 1, 2, 3, or 4 Standard Deviations from the Arithmetic Mean in the Standard Normal Distribution is:

    i) Pr(\mu -\sigma \leq x\leq \mu +\sigma )=0.6827
    ii) Pr(\mu -2\sigma \leq x\leq \mu +2\sigma )=0.9545
    iii) Pr(\mu -3\sigma \leq x\leq \mu +3\sigma )=0.9973
    iv) Pr(\mu -4\sigma \leq x\leq \mu +4\sigma )=0.9999
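These four percentages can be reproduced directly from the error function; a quick check, using the identity Pr(\mu - k\sigma \leq X \leq \mu + k\sigma) = \operatorname{erf}(k/\sqrt{2}), which holds for every Normal Distribution:

```python
import math

def within_k_sigma(k):
    """Pr(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {within_k_sigma(k):.4f}")
# prints 0.6827, 0.9545, 0.9973, 0.9999
```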

Standard Normal Distribution and symmetry (pdf): example percentages

Probability (pdf) and cumulative (cdf) functions for the general Normal and Standard Normal Distribution

As explained:
i) in the Standard Normal Distribution, the mean is equal to 0 and the standard deviation to 1;
ii) in any other Normal Distribution, the mean and standard deviation may differ from these numbers.

Probability Density Function (pdf)
—for every Normal Distribution: f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}

—for the Standard Normal Distribution: f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}

Cumulative Distribution Function (cdf)
—for the general Normal Distribution: F_X(x)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(t-\mu)^2}{2\sigma^2}}dt

—for the Standard Normal Distribution: F_X(x)=\Phi(x)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2}dt

Symbol Explanation
We have already explained the symbols for the Arithmetic Mean \mu and
the Standard Deviation \sigma (for a population).
The symbol \pi denotes the ratio of a circle’s circumference to its diameter; it is a constant approximately equal to 3.14159.
The constant e, also called Euler’s number, is approximately equal to 2.71828.

Both constants are essential ones! Note that they are irrational numbers with infinitely many decimal digits.

The following graph shows that the bell curve is described by the pdf, while the area under that curve up to a point x is given by the cdf. As noted, the pdf of a continuous random variable is the derivative of its cdf.
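This derivative relationship can be verified numerically; a small sketch comparing a central-difference slope of the standard-normal cdf against the pdf:

```python
import math

def std_normal_pdf(x):
    """Standard normal density: exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The slope of the cdf at x (central difference) matches the pdf at x:
h = 1e-6
for x in (-1.0, 0.0, 1.5):
    slope = (std_normal_cdf(x + h) - std_normal_cdf(x - h)) / (2.0 * h)
    print(x, slope, std_normal_pdf(x))
```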

Standard Normal Distribution: Cumulative Distribution Function (cdf)

Example: Calculating the Normal pdf (Probability Density Function) with actual numbers
To understand how the normal probability density function actually works, we present an example with real numbers. How can we find the actual value of the normal density function for a range of numbers, if we know:
a) their standard deviation,
b) their arithmetic mean,
c) and that they follow a Normal Distribution?

The steps below must be followed to find the corresponding value of the normal probability density function (pdf):

— Note that Euler’s number and pi are approximately e=2.718 and \pi=3.142.
— Dataset that follows a Normal Distribution:
-4, -3, -2, -1, 0, 1, 2, 3, 4


i) Calculate the mean and SD:
M=0, \sigma=2.582
ii) Subtract the mean M from each individual value X:
X-M
iii) Square the results:
\left ( X - M \right )^{2}
iv) Divide the previous results by 13.33, which is found by squaring the standard deviation and then multiplying by 2:
2\sigma ^{2}=2\times(2.582)^{2}=13.33
v) Then raise e to the results after multiplying them by -1:
e^{-\mathrm{result}}
vi) Finally, divide by 6.472, which is found by multiplying the squared standard deviation by \pi and by 2, and then taking the square root:
\sqrt{2\pi\sigma ^{2}}=\sqrt{2\times 3.142\times (2.582)^{2}}=6.472

We have presented the way to evaluate the normal probability density function with real numbers. The final results are presented in the last column of the table below.

(Step 1: M = 0, σ = 2.582; with e ≈ 2.718 and π ≈ 3.142; 2σ² = 13.33; √(2πσ²) = 6.472.)

Data X | Step 2: X - M | Step 3: (X - M)² | Step 4: divide by 13.33 | Step 5: e^(-result) | Step 6: divide by 6.472
-4     | -4-0=-4       | 16               | 16/13.33=1.2            | e^(-1.2)=0.301      | 0.301/6.472=0.047
-3     | -3-0=-3       | 9                | 9/13.33=0.675           | e^(-0.675)=0.509    | 0.509/6.472=0.079
-2     | -2-0=-2       | 4                | 4/13.33=0.3             | e^(-0.3)=0.741      | 0.741/6.472=0.115
-1     | -1-0=-1       | 1                | 1/13.33=0.075           | e^(-0.075)=0.928    | 0.928/6.472=0.144
0      | 0-0=0         | 0                | 0/13.33=0               | e^(0)=1             | 1/6.472=0.155
1      | 1-0=1         | 1                | 1/13.33=0.075           | e^(-0.075)=0.928    | 0.928/6.472=0.144
2      | 2-0=2         | 4                | 4/13.33=0.3             | e^(-0.3)=0.741      | 0.741/6.472=0.115
3      | 3-0=3         | 9                | 9/13.33=0.675           | e^(-0.675)=0.509    | 0.509/6.472=0.079
4      | 4-0=4         | 16               | 16/13.33=1.2            | e^(-1.2)=0.301      | 0.301/6.472=0.047
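The six steps above collapse into the pdf formula; a short script reproducing the table’s last column (using the same population standard deviation as the article):

```python
import math

data = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

# Step i: mean and population standard deviation
mu = sum(data) / len(data)                          # 0.0
var = sum((x - mu) ** 2 for x in data) / len(data)  # 6.667, so sigma = 2.582

# Steps ii-vi collapse into the normal pdf formula:
def normal_pdf(x):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

for x in data:
    print(x, round(normal_pdf(x), 3))
```

The symmetry of the table (the same result for x and -x) falls out of the squared term in the exponent.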
