PDF (Probability Density Function) and CDF (Cumulative Distribution Function)
Let’s play with the PDF and the CDF, two of the most widely used concepts in statistics.
By the end of this article you will know the what, why and how of the PDF and the CDF. The PMF (Probability Mass Function) is also discussed here.
Before heading towards these concepts, let’s have a look at random variables, because all of these concepts are built on them.
Random Variable: A random variable is a variable whose value depends on the outcome of a random experiment, so it can take any value from a set of possible outcomes. This set can be either finite or infinite. Example: the result of tossing a coin.

There are 2 types of random variables:
- Continuous random variable: If a random variable can take any value within a continuous range, so that its set of possible outcomes is infinite, we call it a continuous random variable. For example, measuring the height of a person. Say the possible range is 120 cm to 190 cm. The value taken by the variable lies somewhere in this range, but we cannot list every value it might take. It could be 162.5, 168.99, 172.05, 181.365, etc. After the decimal point alone there are infinitely many possible combinations of digits, so we call this an infinite set of outcomes. As the variable takes its value from such a set, we call it a continuous random variable.
- Discrete random variable: If a random variable can only take separate, distinct values, typically from a finite set of outcomes, we call it a discrete random variable.
Example: Rolling a die. Here the set of possible outcomes is {1, 2, 3, 4, 5, 6}. Whatever the scenario, the output is always one of these 6 values, so we call this a finite set of outcomes. As the variable takes its value from a finite set, we call it a discrete random variable.
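To make the two types concrete, here is a minimal sketch that simulates one of each with Python’s standard `random` module. The uniform distribution over 120 cm to 190 cm is only an assumption for illustration; real heights are not uniformly distributed.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same draws

# Discrete random variable: a die roll is always one of six values.
die_rolls = [random.randint(1, 6) for _ in range(10)]

# Continuous random variable: a height anywhere in the 120-190 cm range.
# Uniform sampling is an illustrative assumption, not a model of real heights.
heights = [random.uniform(120.0, 190.0) for _ in range(10)]

print("die rolls:", die_rolls)
print("heights:  ", [round(h, 3) for h in heights])
```

The die rolls repeat the same few integers, while the heights almost never repeat exactly, which is the practical difference between the two types.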
PDF (Probability Density Function)
The first question that comes to mind is why the concept of a PDF came into the picture at all. Let’s see through an example:
I am using the Haberman dataset here, which contains data from a study of patients who underwent surgery for breast cancer at the University of Chicago’s Billings Hospital between 1958 and 1970. Using this data (the features given are Age, Operation_Year and Axillary Nodes), we want to know whether a patient will die within 5 years of surgery or survive for more than 5 years. Now let’s plot the Age variable:

This is called a 1-D scatter plot, and it is very hard to read because the points overlap each other. To make it easier to visualize, we break the range into intervals (bins) and show the number of points in each bin on the Y-axis. What we see now is a bar-like structure called a histogram.
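The binning step can be sketched in a few lines of plain Python. The ages below are made up for illustration and are not taken from the Haberman dataset, and the bin width of 10 years is an arbitrary choice.

```python
from collections import Counter

# Hypothetical ages, NOT values from the Haberman dataset.
ages = [30, 34, 38, 41, 42, 45, 47, 50, 52, 52, 55, 57, 60, 63, 65, 70]

BIN_WIDTH = 10  # assumed bin width in years


def histogram_counts(values, bin_width):
    """Map each bin's lower edge to the number of values falling in that bin."""
    return Counter((v // bin_width) * bin_width for v in values)


counts = histogram_counts(ages, BIN_WIDTH)

# Print a text histogram: one '#' per patient in the bin.
for left in sorted(counts):
    print(f"{left}-{left + BIN_WIDTH - 1}: {'#' * counts[left]}")
```

Each row of `#` characters plays the role of one bar in the histogram.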

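Joining the peaks of these bars and smoothing the result gives a smooth line, which is called the PDF (Probability Density Function). The blue line represents patients who survived more than 5 years after surgery and the orange line represents patients who died within 5 years of surgery. So, the PDF is a smoothed version of the histogram.
PDF is a statistical term that describes the probability distribution of a continuous random variable. The height of the curve is a density, not a probability: the probability that the variable falls in an interval is the area under the curve over that interval.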
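One way to see why the word "density" is used is to rescale the histogram so that the total area of its bars is 1. The sketch below does this with plain Python; the ages are hypothetical and the bin width is an assumption. A real smoothed PDF, such as a kernel density estimate, goes one step further than this.

```python
from collections import Counter

# Hypothetical ages, NOT values from the Haberman dataset.
ages = [30, 34, 38, 41, 42, 45, 47, 50, 52, 52, 55, 57, 60, 63, 65, 70]

BIN_WIDTH = 10  # assumed bin width in years


def histogram_density(values, bin_width):
    """Return {bin lower edge: density} so that the bar areas sum to 1."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    return {left: c / (n * bin_width) for left, c in sorted(counts.items())}


density = histogram_density(ages, BIN_WIDTH)

# Area of each bar = height * width; together the bars cover probability 1.
total_area = sum(height * BIN_WIDTH for height in density.values())
print(density)
print("total area:", total_area)
```

The individual heights are small numbers per year of age, but multiplied by the bin width they give the fraction of patients in each bin.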
Mathematical Notation:
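In standard textbook notation, and assuming X is a continuous random variable with density f (symbols chosen here for illustration), the PDF satisfies:

```latex
P(a \le X \le b) = \int_{a}^{b} f(x)\,dx,
\qquad f(x) \ge 0 \ \text{for all } x,
\qquad \int_{-\infty}^{\infty} f(x)\,dx = 1
```

That is, probabilities are areas under the curve, the curve is never negative, and the total area is 1.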

Many real-world variables have a PDF that is approximately a Normal distribution (a bell-shaped curve), although a PDF can take many other shapes.
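As a small sketch, the bell curve can be computed directly from the standard Normal density formula. The mean of 52 and standard deviation of 10 used in the example are hypothetical values, not estimates from the Haberman dataset.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters chosen only to draw a curve over an age range.
MU, SIGMA = 52.0, 10.0

for age in range(30, 80, 10):
    bar = "#" * round(normal_pdf(age, MU, SIGMA) * 1000)
    print(f"{age}: {bar}")
```

The printed bars rise towards the mean and fall away symmetrically on both sides, which is the bell shape.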

PMF (Probability Mass Function)
The PMF describes the probability distribution of a discrete random variable: it gives the probability that the variable takes each exact value.
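For the die example, a PMF can be sketched by counting equally likely outcomes. The helper name `pmf` is made up for this illustration, and it assumes every listed outcome is equally likely.

```python
from collections import Counter
from fractions import Fraction


def pmf(outcomes):
    """Build a PMF from a list of equally likely outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


# One fair die: each face has the same probability.
die_pmf = pmf([1, 2, 3, 4, 5, 6])

# Sum of two fair dice: enumerate all 36 equally likely pairs.
two_dice_pmf = pmf([a + b for a in range(1, 7) for b in range(1, 7)])

for value, prob in two_dice_pmf.items():
    print(value, prob)
```

Unlike a PDF, each value in a PMF is itself a probability, and all the values add up to 1.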
CDF (Cumulative Distribution Function)
We have seen how to describe distributions for discrete and for continuous random variables separately. Now, what works for both?
The CDF is used to describe the distribution of a random variable, whether it is continuous or discrete. It tells us what fraction of values is less than or equal to a particular value.
For example: let’s take the Age variable from the Haberman dataset. If I write P(age ≤ 50) = 0.60, it means that 60% of the patients in the dataset are aged 50 or younger.

The orange line denotes the CDF of the Age variable.
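An empirical CDF can be sketched by sorting the data and counting how many values are less than or equal to a given point. The ages below are hypothetical and are not the Haberman values, so the fraction they give differs from the 0.60 mentioned above.

```python
from bisect import bisect_right

# Hypothetical ages, NOT values from the Haberman dataset.
ages = [30, 34, 38, 41, 42, 45, 47, 50, 52, 52, 55, 57, 60, 63, 65, 70]


def empirical_cdf(values):
    """Return a function x -> fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


age_cdf = empirical_cdf(ages)
print("P(age <= 50) =", age_cdf(50))
```

With these made-up ages, 8 of the 16 values are 50 or below, so the CDF at 50 is 0.5. The function never decreases as the age grows, starting at 0 and ending at 1.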
In short:
PDF (describes the distribution of a continuous random variable)
PMF (describes the distribution of a discrete random variable)
CDF (describes the distribution of both continuous and discrete random variables)
Thanks for reading. I hope you liked it.