Median test

Non-Parametric Test

The median test is a non-parametric test and an alternative to the parametric two-sample t-test. It is used to check whether two independent samples have the same median. It can be applied to discrete or continuous data and is useful when the sample size is small.

Assumptions: 

I) The variable under study is measured on at least an ordinal scale.

II) The observations are random and independent.


The stepwise procedure for the median test for two independent samples:

Step I :- State the hypotheses.

Null Hypothesis: the two independent samples have the same median.

Against

Alternative Hypothesis: the two independent samples have different medians.

Step II :- Combine the two samples and calculate the median of the combined data.

Step III :- To test the hypothesis, construct a (2x2) contingency table. For each sample, count the number of observations below the combined median and the number at or above it. The table is

 

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | a | b | a+b |
| No. of observations greater than or equal to median | c | d | c+d |
| Total | a+c | b+d | n |

Here a is the number of observations in the first sample that are less than the median, and c is the number in the first sample that are greater than or equal to the median. Similarly, b is the number of observations in the second sample that are less than the median, and d is the number that are greater than or equal to the median. The marginal totals are (a+b), (c+d), (a+c) and (b+d).
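Steps II and III can be sketched in Python as below. This is only an illustration: the function name `median_table` and the two small samples are made up for the example, and `statistics.median` averages the two middle values when the combined size is even, which may differ from a positional rule.

```python
from statistics import median

def median_table(sample1, sample2):
    """Return the combined median and the cell counts (a, b, c, d).

    a, b: observations below the combined median in sample 1 and sample 2
    c, d: observations at or above the combined median in sample 1 and sample 2
    """
    m = median(list(sample1) + list(sample2))
    a = sum(x < m for x in sample1)
    b = sum(x < m for x in sample2)
    c = len(sample1) - a
    d = len(sample2) - b
    return m, (a, b, c, d)

# Hypothetical data, only to show the call.
m, cells = median_table([1, 3, 5, 7], [2, 4, 6, 8, 10])
print(m, cells)
```

The marginal totals then follow directly from the four counts, for example a+b for the first row.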

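I. If the total number of observations (n1+n2) is greater than 40, use the chi-square test.

II. If the total number of observations (n1+n2) is less than 20, use Fisher's exact test.

III. If (n1+n2) is between 20 and 40, check the expected frequencies, where the expected frequency of a cell is (row total x column total)/n. If any expected frequency is less than 5, use the corrected formula of the chi-square test.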

Step IV :-

The test statistic is chi-square, calculated as

χ² = n (ad - bc)² / [(a+b) × (c+d) × (a+c) × (b+d)]

Corrected formula of chi-square (Yates' correction): when an expected frequency is less than 5, the test statistic is calculated as

χ² = n (|ad - bc| - n/2)² / [(a+b) × (c+d) × (a+c) × (b+d)]

where n = a+b+c+d
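A minimal Python sketch of the two statistics follows, using the standard uncorrected form n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)] and the continuity-corrected form with |ad - bc| - n/2. The function name, the `yates` flag and the example table are assumptions made for illustration. The guard that stops the corrected difference going below zero is a common convention, not something stated in these notes.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Assumed guard: do not let the corrected difference become negative.
        diff = max(diff - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom

# Hypothetical table, only to show the call.
print(chi_square_2x2(20, 10, 10, 20))              # uncorrected
print(chi_square_2x2(20, 10, 10, 20, yates=True))  # corrected
```

The corrected value is never larger than the uncorrected one, so the correction makes the test more conservative.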

Decision Rule :- The critical value of the test statistic is obtained from the table of critical values of chi-square with 1 degree of freedom.

If the calculated chi-square is less than the tabulated chi-square, we do not reject (accept) the null hypothesis at the α % level of significance. Otherwise we reject the null hypothesis.

The critical value of chi-square for 1 degree of freedom is 3.841 at the 5% level of significance and 6.635 at the 1% level of significance.

Small sample test :-

Fisher's Exact Test:

Suppose we want to check whether two independent samples are drawn from populations with the same median. Normally we use the median test, but there are conditions for using it. If the sample size (n1+n2) is less than 20 and one or more expected cell frequencies of the (2x2) contingency table are less than 5, we use Fisher's exact test.

The test is based on the probabilities associated with all possible (2x2) contingency tables that have the same marginal totals.

Assumptions:

I) The samples should be random and independent.

II) The marginal totals of the contingency table are fixed.


For Fisher's exact test, follow this procedure :-

Step I :- We want to test whether the two independent samples are drawn from populations with the same median. The null hypothesis is that the two samples have the same median, against the alternative hypothesis that the two samples have different medians.

Step II :- To perform the test, first construct the (2x2) contingency table (same as in the median test). For each sample, count the number of observations below the combined median and the number at or above it. The table is

 

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | a | b | a+b |
| No. of observations greater than or equal to median | c | d | c+d |
| Total | a+c | b+d | n |

Here a is the number of observations in the first sample that are less than the median, and c is the number in the first sample that are greater than or equal to the median. Similarly, b is the number of observations in the second sample that are less than the median, and d is the number that are greater than or equal to the median. The marginal totals (a+b), (c+d), (a+c) and (b+d) are kept fixed for each possible table.

Step III :- This test is based on the probabilities of all possible (2x2) contingency tables with the observed marginal totals.

To obtain all possible tables, we define ns = smallest marginal total, obtained from the table. For example, if ns is the second column total, ns = n2 = (b+d), then the possible pairs for the two cells of that column are (i, j) = (0, n2), (1, n2-1), ..., (n2, 0). For each pair we prepare a table with the original marginal totals fixed, and compute its probability as

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]

Step IV :- The test statistic is the P-value. It is obtained by adding all the Pij that are less than or equal to the Pij of the original (observed) (2x2) contingency table.


Decision rule : If the P-value is less than or equal to the α % level of significance, we reject the null hypothesis.
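The procedure above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names and the small example table are made up, and the tiny relative tolerance in the comparison is an added assumption so that tables with equal probability are not lost to floating-point rounding.

```python
from math import factorial

def table_probability(a, b, c, d):
    """Pij for one 2x2 table, using the factorial formula from Step III."""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return num / den

def fisher_exact_p(a, b, c, d):
    """Sum of probabilities of all tables no more likely than the observed one."""
    r1, r2, c2 = a + b, c + d, b + d
    p_obs = table_probability(a, b, c, d)
    p_value = 0.0
    # Let the top-right cell vary over every value that keeps all cells >= 0.
    for b_i in range(max(0, c2 - r2), min(r1, c2) + 1):
        a_i = r1 - b_i
        d_i = c2 - b_i
        c_i = r2 - d_i
        p = table_probability(a_i, b_i, c_i, d_i)
        if p <= p_obs * (1 + 1e-9):  # assumed tolerance for ties
            p_value += p
    return p_value

# Hypothetical table, only to show the call.
print(fisher_exact_p(3, 1, 1, 3))
```

Varying a single cell is enough because, once the marginal totals are fixed, the other three cells are determined by it.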

Fisher's exact test is a statistical test used to determine the significance of the association between two categorical variables. It is particularly useful when the sample size is small and the assumptions required by other tests, such as the chi-square test, are not met. It provides a way to determine whether the observed association between two variables is statistically significant, and so gives an idea about the relationship between the variables under investigation.

For example, Fisher's exact test is used in clinical research to assess the association between two categorical variables, e.g. to examine the relation between a specific treatment and its outcome.

Example: 10 boys and 15 girls were observed during two play sessions. Each child was scored for the degree of aggression as:

Boys: 56 59 72 65 113 65 141 51 20 65

Girls: 55 40 22 56 25 7 58 9 20 46 26 36 50 31 45

Use the median test to test whether there is a difference between the scores of boys and girls.

Answer: To test whether there is a difference between the scores of boys and girls,

first we define the hypotheses.

Null Hypothesis : there is no significant difference between the scores of boys and girls.

Against

Alternative Hypothesis : there is a significant difference between the scores of boys and girls.

Next we combine the two samples and calculate the median of the combined data.

7, 9, 20, 20, 22, 25, 26, 31, 36, 40, 45, 46, 50, 51, 55, 56, 56, 58, 59, 65, 65, 65, 72, 113, 141.

Median = value of the {(n+1)/2}th observation

= value of the 13th observation

Median = 50
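As a quick check of this step, the following Python sketch sorts the combined scores and picks the {(n+1)/2}th observation. The variable names are illustrative, and the positional rule works as written only because n = 25 is odd.

```python
boys = [56, 59, 72, 65, 113, 65, 141, 51, 20, 65]
girls = [55, 40, 22, 56, 25, 7, 58, 9, 20, 46, 26, 36, 50, 31, 45]

combined = sorted(boys + girls)
n = len(combined)

# Position of the median in 1-based counting: (n + 1) / 2
position = (n + 1) // 2
combined_median = combined[position - 1]  # list indices start at 0

print(n, position, combined_median)  # 25 13 50
```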

After that, to test the hypothesis we construct the (2x2) contingency table, with the boys as Sample I and the girls as Sample II. For each sample we count the number of observations below the median and the number at or above it. The table is

 

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 1 | 11 | 12 |
| No. of observations greater than or equal to median | 9 | 4 | 13 |
| Total | 10 | 15 | 25 |


Here n1 + n2 = 25 is between 20 and 40, so we check the expected frequencies of the table.




| | Sample I | Sample II |
|---|---|---|
| No. of observations less than median | (10x12)/25 = 4.8 | (12x15)/25 = 7.2 |
| No. of observations greater than or equal to median | (13x10)/25 = 5.2 | (15x13)/25 = 7.8 |

One cell has an expected frequency less than 5, so we use the chi-square statistic with Yates' correction for the contingency table.
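The expected frequencies can be reproduced with a short Python sketch. The function name `expected_frequencies` is made up for this illustration; each cell is row total x column total / n, as in the table above.

```python
def expected_frequencies(a, b, c, d):
    """Expected cell counts (row total x column total / n) for a 2x2 table."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * col / n for col in col_totals] for r in row_totals]

expected = expected_frequencies(1, 11, 9, 4)
small_cells = [e for row in expected for e in row if e < 5]

print(expected)     # [[4.8, 7.2], [5.2, 7.8]]
print(small_cells)  # [4.8]
```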

χ² = n (|ad - bc| - n/2)² / [(a+b) × (c+d) × (a+c) × (b+d)]

χ² = 25 (|4 - 99| - 25/2)² / (12 × 13 × 10 × 15)

χ² = 7.27 is the calculated value,

and the table value is 3.841, i.e. χ² with 1 df at the 5% l.o.s.

Since the calculated chi-square is greater than the tabulated chi-square, we reject the null hypothesis at the 5% level of significance.

Hence there is a significant difference between the scores of boys and girls.
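The whole worked example can be checked with the Python sketch below. It recomputes the cell counts from the raw scores, applies the Yates-corrected formula and compares the result with the 5% critical value quoted in the notes. Variable names are illustrative only.

```python
boys = [56, 59, 72, 65, 113, 65, 141, 51, 20, 65]
girls = [55, 40, 22, 56, 25, 7, 58, 9, 20, 46, 26, 36, 50, 31, 45]

combined = sorted(boys + girls)
med = combined[(len(combined) + 1) // 2 - 1]  # 13th of 25 observations

# Cell counts of the 2x2 table
a = sum(x < med for x in boys)    # boys below the median
b = sum(x < med for x in girls)   # girls below the median
c = len(boys) - a                 # boys at or above the median
d = len(girls) - b                # girls at or above the median
n = a + b + c + d

# Chi-square with Yates' correction
chi_sq = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

critical_5pct = 3.841  # chi-square, 1 df, 5% level (from the notes)
reject_null = chi_sq > critical_5pct

print((a, b, c, d), round(chi_sq, 2), reject_null)  # (1, 11, 9, 4) 7.27 True
```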

Fisher's Exact Test Example:

Example 1 :- We want to test whether two independent samples are drawn from populations having identical medians, for the following data.

 

| | Sample I | Sample II |
|---|---|---|
| No. of observations less than median | 10 | 2 |
| No. of observations greater than or equal to median | 3 | 5 |

Use the appropriate test at the 5% level of significance.

Solution :- For the given example we want to test whether the two samples are drawn from populations having identical medians. We set the hypotheses as

Null Hypothesis :- the two independent samples are drawn from populations having identical medians.
Alternative Hypothesis :- the two independent samples are drawn from populations having different medians.

To perform the test, we first calculate the marginal totals.

 

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 10 | 2 | 12 |
| No. of observations greater than or equal to median | 3 | 5 | 8 |
| Total | 13 | 7 | 20 |

Now we check the expected frequencies.



| | Sample I | Sample II |
|---|---|---|
| No. of observations less than median | (12x13)/20 = 7.8 | (12x7)/20 = 4.2 |
| No. of observations greater than or equal to median | (13x8)/20 = 5.2 | (7x8)/20 = 2.8 |

We see that two expected frequencies are less than 5 and the number of observations is 20. In this situation Fisher's exact test is more appropriate.

Now we find ns = smallest marginal total = min(R1, R2, C1, C2),

where R1 and R2 are the totals of the first and second rows respectively, and
C1 and C2 are the totals of the first and second columns respectively.
Here 7 is the smallest marginal total, corresponding to C2.
Therefore we prepare all possible combinations for the two cells of the second column as (0,7), (1,6), (2,5), (3,4), (4,3), (5,2), (6,1), (7,0). For each combination we prepare a (2x2) contingency table with the marginal totals fixed.
Then we calculate Pij for each (2x2) contingency table.
Pij is calculated as

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]

1) For (0,7):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 12 | 0 | 12 |
| No. of observations greater than or equal to median | 1 | 7 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]

P1 = (12! × 8! × 13! × 7!) / (20! × 12! × 0! × 1! × 7!)
P1 = 0.0001

2) For (1,6):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 11 | 1 | 12 |
| No. of observations greater than or equal to median | 2 | 6 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P2 = (12! × 8! × 13! × 7!) / (20! × 11! × 1! × 2! × 6!)
P2 = 0.0043
3) For (2,5):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 10 | 2 | 12 |
| No. of observations greater than or equal to median | 3 | 5 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P3 = (12! × 8! × 13! × 7!) / (20! × 10! × 2! × 3! × 5!)
P3 = 0.0477, which is the Pij of the original (observed) table
4) For (3,4):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 9 | 3 | 12 |
| No. of observations greater than or equal to median | 4 | 4 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P4 = (12! × 8! × 13! × 7!) / (20! × 9! × 3! × 4! × 4!)
P4 = 0.1987
5) For (4,3):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 8 | 4 | 12 |
| No. of observations greater than or equal to median | 5 | 3 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P5 = (12! × 8! × 13! × 7!) / (20! × 8! × 4! × 5! × 3!)
P5 = 0.3576
6) For (5,2):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 7 | 5 | 12 |
| No. of observations greater than or equal to median | 6 | 2 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P6 = (12! × 8! × 13! × 7!) / (20! × 7! × 5! × 6! × 2!)
P6 = 0.2861
7) For (6,1):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 6 | 6 | 12 |
| No. of observations greater than or equal to median | 7 | 1 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P7 = (12! × 8! × 13! × 7!) / (20! × 6! × 6! × 7! × 1!)
P7 = 0.0954
8) For (7,0):

| | Sample I | Sample II | Total |
|---|---|---|---|
| No. of observations less than median | 5 | 7 | 12 |
| No. of observations greater than or equal to median | 8 | 0 | 8 |
| Total | 13 | 7 | 20 |

Pij = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [n! × a! × b! × c! × d!]
P8 = (12! × 8! × 13! × 7!) / (20! × 5! × 7! × 8! × 0!)
P8 = 0.0102

After calculating all possible values of Pij, we find the P-value.

The P-value is the sum of all Pij that are less than or equal to the Pij of the original (2x2) contingency table.
Therefore the P-value is
P-value = 0.0001 + 0.0043 + 0.0477 + 0.0102 = 0.0623
To take the decision we compare the P-value with the α % level of significance.
Here the P-value (0.0623) is greater than the level of significance 0.05, so we do not reject (accept) the null hypothesis and conclude that the two samples are drawn from populations having identical medians.
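The eight probabilities and the P-value of this example can be reproduced with the Python sketch below. It uses the same factorial formula as the notes. The variable names and the small relative tolerance in the comparison are assumptions added for the illustration.

```python
from math import factorial as f

R1, R2, C1, C2, n = 12, 8, 13, 7, 20  # marginal totals of the observed table

def p_table(a, b, c, d):
    """Probability of one 2x2 table with the marginal totals fixed."""
    return (f(a + b) * f(c + d) * f(a + c) * f(b + d)) / (
        f(n) * f(a) * f(b) * f(c) * f(d)
    )

probs = []
for b in range(C2 + 1):   # second-column pairs (b, d) = (0,7), (1,6), ..., (7,0)
    d = C2 - b
    a = R1 - b
    c = R2 - d
    probs.append(p_table(a, b, c, d))

p_observed = probs[2]     # observed table has (b, d) = (2, 5)
p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-9))

print([round(p, 4) for p in probs])
print(round(p_value, 4), p_value > 0.05)  # 0.0623 True
```

Since the P-value exceeds 0.05, the sketch agrees with the conclusion above that the null hypothesis is not rejected.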


 


