Hypothesis Testing

SanDeep DuBey
6 min read · Aug 29, 2020

A tool of conditional probability

Before getting to the core idea, we should know what the word “hypothesis” means.

A hypothesis is an “explanation on the basis of limited evidence”.

In other words, we can form a statement based on what we see happening around us without having proved that statement by experiment. An explanation given without scientific evidence is called a hypothesis. For example, suppose you are a student of botany or biology and you observe that “wherever sunlight intensity is high, there is more greenery”. Until you have proven this statement, it is a hypothesis.

Statistical Hypothesis

One of the main purposes of statistics is to test hypotheses. A statistical hypothesis is an assumption about a population parameter, and the whole game is deciding whether to reject it or not. The best way to determine whether a statistical hypothesis is true or false would be to examine the entire population, which is usually impractical. Instead, researchers take random samples and check whether they are consistent with the statistical hypothesis. Based on that check, we either reject the hypothesis or fail to reject it (loosely, “accept” it).

There are two types of statistical hypotheses:

  1. Null Hypothesis
  2. Alternative Hypothesis

I will explain both in the discussion of hypothesis testing below.

Hypothesis Testing

Hypothesis testing refers to the formal procedures statisticians use to reject or accept a statistical hypothesis. If that definition sounds confusing, let's take an example to understand it.

Suppose we want to compare the heights of students in two classes, A and B, and find out whether there is any difference between them. I take a sample of 50 students from each class. I want to be confident about which class's students are taller, and I also want to quantify how confident I am. Before we know the concept of hypothesis testing, we can approach this by plotting the two samples on a graph and comparing their means visually.

By looking at the graph we can say which class is taller, but the problem is how to quantify this. To do that, we use hypothesis testing. Let's follow some steps to implement it:

  1. Choosing a test statistic : First we come up with a single number that lets us compare the two samples. We define a variable X as the difference between the mean heights of the two classes, u2 - u1. Suppose X comes out to 10cm.
  2. Null Hypothesis (denoted by H0) : This is an assumption we make and then check against the data. In this case:

H0 : There is no difference between u1 and u2.

We also define the alternative hypothesis, which is the opposite of the null hypothesis:

H1 : There is a difference between u1 and u2.

For those who did not follow these two terms, here they are again in terms of the example above:

We observed a difference of 10cm between the mean heights of the samples from class A and class B. Because we only have samples, we cannot directly conclude anything about the populations. To check whether our observation holds for the populations, we use hypothesis testing, which involves two statements called the null hypothesis and the alternative hypothesis. The null hypothesis (H0) says there is no difference in heights between class A and class B. The alternative hypothesis (H1) says there is a difference, as our observation suggests. Using H0 and H1, we decide which one the data supports.

Next, we assume H0 is true and ask how likely our observation would be under that assumption. If the observation would be very unlikely, we reject H0 in favor of H1. Otherwise, we fail to reject H0.

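  3. p-Value : This is the probability of observing a difference (u2 - u1) at least as large as the one we actually observed, assuming the null hypothesis is true.

For example, if p = 0.90, a difference at least as large as ours would occur 90% of the time even if H0 were true. The observation then gives no evidence against H0, so we do not reject it.

In short, the p-value is the probability of the observation (or something more extreme) when H0 is true.

If the p-value is < 0.05 (the level of significance), we reject H0.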

How do we compute the p-value?

We use a resampling technique called a permutation test. The steps are:

Step 1: Combine the two samples A and B into one set.
Step 2: Shuffle the combined set, split it into two random samples of the original sizes, and compute the difference of their means (a simulated difference). Repeat this step many times to simulate what differences look like when the null hypothesis is true.
Step 3: Sort all the simulated differences and locate the actual difference (10cm) in the list. The p-value is the fraction of simulated differences that are at least as large as the actual difference.
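The three steps above can be sketched in Python. This is only a minimal illustration, not a definitive implementation: the function name `permutation_p_value`, the number of permutations, and the height data (simulated so that class B is about 10cm taller) are all assumptions made for the example. Instead of sorting the simulated differences, the sketch counts how many are at least as large as the observed one, which gives the same fraction.

```python
import random


def permutation_p_value(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Estimate P(simulated difference >= observed difference | H0)."""
    rng = random.Random(seed)
    n_a = len(sample_a)

    def mean(values):
        return sum(values) / len(values)

    # Test statistic on the real samples: u2 - u1.
    observed = mean(sample_b) - mean(sample_a)

    # Step 1: combine the two samples into one pool.
    combined = list(sample_a) + list(sample_b)

    # Step 2: shuffle, split into two groups of the original sizes,
    # and check whether the simulated difference is at least as large.
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(combined)
        simulated = mean(combined[n_a:]) - mean(combined[:n_a])
        if simulated >= observed:
            count += 1

    # Step 3: the p-value is the fraction of simulated differences
    # that are at least as large as the observed one.
    return observed, count / n_permutations


# Hypothetical data: heights in cm, simulated for illustration only.
data_rng = random.Random(42)
class_a = [data_rng.gauss(160, 8) for _ in range(50)]
class_b = [data_rng.gauss(170, 8) for _ in range(50)]

observed_diff, p_value = permutation_p_value(class_a, class_b)
print(f"observed difference: {observed_diff:.2f} cm, p-value: {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

This version is one-sided: it only asks whether class B is taller. A two-sided variant would compare absolute differences instead.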

There are two parts to this argument. First, we make an observation from the given data. Second, we compute the probability of seeing an observation at least that extreme if H0 were true. If the p-value is small (below the level of significance), we reject H0. If the p-value is large, we fail to reject H0 (loosely, we “accept” it).

Intuition with coin-toss example

Given a coin, we want to determine whether it is biased towards heads.

Assume H0 : The coin is not biased towards heads

Let's do a small experiment to solve this:

Flip the coin 5 times and count the number of heads

X = number of heads (the test statistic)

Suppose all five flips come up heads. If the coin is fair, the probability of five heads in five flips is (1/2)^5 = 1/32 ≈ 3%.

P(X = 5|H0) = 0.03 (there is a 3% chance of getting 5 heads in five tosses if the coin is not biased towards heads). Since 0.03 < 0.05, we reject H0.

In general terms, if P(observation from the experiment|H0) < 0.05, the observation is unlikely under H0, so H0 may be incorrect and we reject it.

Note : The sample size must be chosen with care because it can change the overall result. For example, if the coin is flipped only 3 times and all three are heads, then P(X=3|H0) = 1/8 = 12.5%, which is greater than 5%, so we fail to reject H0.
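The two coin-toss probabilities above can be checked with a few lines of Python. This is a small sketch under the assumption that H0 means a fair coin (probability of heads 0.5). The helper name `prob_at_least` is made up for this example.

```python
from math import comb


def prob_at_least(k, n, p_heads=0.5):
    """P(X >= k) for X = number of heads in n independent flips."""
    return sum(
        comb(n, i) * p_heads**i * (1 - p_heads) ** (n - i)
        for i in range(k, n + 1)
    )


ALPHA = 0.05  # level of significance used in this article

p_five = prob_at_least(5, 5)    # five heads in five flips
p_three = prob_at_least(3, 3)   # three heads in three flips

for flips, p in [(5, p_five), (3, p_three)]:
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{flips} heads in {flips} flips: p = {p:.4f} -> {decision}")
```

Because all flips come up heads in both cases, P(X >= n) and P(X = n) are the same number here.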

Practical Example of Hypothesis Testing

Every country has its own drug department, and any drug sold in the market must be approved by this department. Suppose a drug company comes to the department and claims that its medicine is much more effective: it reduces fever faster than the drug currently available in the market. Let's design a hypothesis test to check this:

Task : To determine if the claim is true or false.

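H0 : D1 and D2 take the same time to reduce the fever

For the test statistic, let X be the difference in mean fever-reduction time between two samples of 50 patients each, one sample per drug. Here the observed difference is 2.

The next step is to compute P(X≥2|H0): the probability of seeing a difference in time of at least 2 between the two samples of 50, given that both medicines take the same time to reduce the fever.

In the area of medicine, we use a stricter significance level of 1%.

So if P(X≥2|H0) < 0.01, we reject H0. It means there is some difference between D1 and D2 in the time taken to reduce the fever.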
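As a rough sketch of this drug test, the snippet below estimates P(X≥2|H0) by shuffling the pooled samples and applies the 1% significance level. Everything specific here is an assumption for illustration: the fever-reduction times are simulated (in arbitrary time units), D1 is taken to be the drug on the market and D2 the new one, and the helper name `prob_diff_at_least` is made up.

```python
import random


def prob_diff_at_least(threshold, times_d1, times_d2,
                       n_permutations=10_000, seed=0):
    """Estimate P(X >= threshold | H0) by shuffling the pooled samples."""
    rng = random.Random(seed)
    n1 = len(times_d1)
    pooled = list(times_d1) + list(times_d2)
    n2 = len(pooled) - n1
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # X = mean time of the first group minus mean time of the second.
        x = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if x >= threshold:
            hits += 1
    return hits / n_permutations


# Simulated, hypothetical fever-reduction times (arbitrary time units).
data_rng = random.Random(7)
d1_times = [data_rng.gauss(10, 2.5) for _ in range(50)]  # drug on the market
d2_times = [data_rng.gauss(8, 2.5) for _ in range(50)]   # new drug

ALPHA = 0.01  # stricter significance level for medicine
p_value = prob_diff_at_least(2, d1_times, d2_times)
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"P(X >= 2 | H0) ~ {p_value:.4f} -> {decision}")
```

With real trial data, only the two lists of times would change. The threshold of 2 and the 1% level come from the example above.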

Main Steps in Designing Hypothesis Testing

  1. Test Statistics
  2. Null Hypothesis(H0) and Alternative Hypothesis(H1)
  3. Find p-Value
  4. Based on the p-value and the level of significance, reject or fail to reject (“accept”) the null hypothesis

Note : The method used here to find the p-value is resampling with a permutation test. It is one way of computing a p-value, not the only one.
