
Non-Parametric Tests: Kolmogorov-Smirnov One Sample Test and Two Sample Test


Kolmogorov-Smirnov One Sample Test (also called the Kolmogorov-Smirnov goodness of fit test)

The test was developed by A. N. Kolmogorov and N. V. Smirnov, hence the name Kolmogorov-Smirnov test. They developed two versions: a one sample test and a two sample test. In this article we discuss the Kolmogorov-Smirnov one sample test. This is a simple non-parametric test used to check whether the data follow a specific distribution, i.e. whether there is a significant difference between the observed and the theoretical distribution (here the theoretical distribution means the assumed or hypothesized distribution). The test measures the goodness of fit of the theoretical distribution. We know that the chi-square test is also used to test goodness of fit; the main difference is that the chi-square test is used when the data are categorical, while the Kolmogorov-Smirnov test is used when the data are continuous.

The assumptions of the Kolmogorov-Smirnov test are as follows:

1. The sample is selected from a population having an unknown distribution.

2. The observations are independent.

3. The variable under study is continuous.

The procedure of the Kolmogorov-Smirnov one sample test is:

Let X1, X2, ..., Xn be a random sample of size n from a population with unknown continuous distribution function F(X). We are interested in testing whether the data follow a specific distribution F0(X) or not. The hypotheses for the test are:

H0: Data follow a specific distribution.    

H1: The data do not follow the specific distribution.

OR 

H0: F(X) = F0(X)   vs   H1: F(X) ≠ F0(X)

(This is a two-tailed test.)

The test consists of the following steps:

Step I: The Kolmogorov-Smirnov test (K-S test) is based on the comparison of the empirical cumulative distribution function (the observed or sample distribution) with the theoretical cumulative distribution function (the specified or hypothesized distribution). We first find the empirical cumulative distribution function, which is based on the sample and is defined as the proportion of sample observations less than or equal to some value x; it is denoted by S(X).

S(X) = (number of observations less than or equal to x) / (total number of observations)
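The definition above translates directly into code. The following is a minimal Python sketch (the function name `ecdf` and the sample values are purely illustrative, not from the original notes):

```python
import numpy as np

def ecdf(sample, x):
    """S(x): proportion of sample observations less than or equal to x."""
    return np.mean(np.asarray(sample) <= x)

data = [0.2, 0.5, 0.5, 0.9, 1.3]
print(ecdf(data, 0.5))  # 3 of the 5 observations are <= 0.5, so S(0.5) = 0.6
```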

Step II: In this step we find the theoretical cumulative distribution function F0(X) for all possible values of x.

Step III: After finding both the empirical and the theoretical cumulative distribution functions for all possible values of x, we take the difference between them,

i.e. S(X) - F0(X), for all x.

Step IV: The test statistic is denoted Dn = Sup |S(X) - F0(X)|

where Dn is the supremum over all x of the absolute value of the difference between the empirical and the theoretical cumulative distribution functions.
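Because S(x) is a step function that jumps by 1/n at each observation, the supremum is attained at a sample point, provided we check the value of S(x) just after and just before each jump. The following is a minimal Python sketch of this computation (function name and sample values are illustrative assumptions):

```python
import numpy as np

def ks_one_sample_stat(sample, cdf):
    """Dn = sup over x of |S(x) - F0(x)| for a continuous F0."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f0 = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f0)   # S(x) above F0, just after a jump
    d_minus = np.max(f0 - np.arange(0, n) / n)      # F0 above S(x), just before a jump
    return max(d_plus, d_minus)

# Under H0: Uniform(0, 1), so F0(x) = x.
print(ks_one_sample_stat([0.1, 0.4, 0.7], lambda t: t))  # Dn ≈ 0.3, the gap at x = 0.7
```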

Step V: The calculated value of the test statistic Dn = Sup |S(X) - F0(X)| is compared with the critical value at the α% level of significance, and a decision is taken to accept or reject the null hypothesis.

Here two cases arise: the small sample test and the large sample test.

i) Small sample test: if the sample size n is less than or equal to 40 (i.e. n ≤ 40),

then the test statistic Dn = Sup |S(X) - F0(X)| is compared with the tabulated critical value at the α% level of significance.

If the calculated Dn is greater than or equal to the critical value Dn,α at the α% level of significance, i.e.

Dn ≥ Dn,α,

we reject the null hypothesis at the α% level of significance; otherwise we accept the null hypothesis.

ii) Large sample test: if the sample size n is greater than 40 (i.e. n > 40),

then the test statistic Dn = Sup |S(X) - F0(X)| is again compared with the critical value at the α% level of significance.

But when the sample size n is greater than 40, the critical value at the given α% level of significance is calculated approximately as

Dn,α = 1.36 / √n      (note that in the large sample test we calculate the critical value rather than looking it up in a table)

where n is the sample size.

Then we compare the calculated and critical values of Dn and take a decision about the test. If the calculated Dn is greater than or equal to the critical value Dn,α at the α% level of significance, i.e. Dn ≥ Dn,α, we reject the null hypothesis at the α% level of significance; otherwise we accept the null hypothesis.
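In practice the whole one sample procedure is available in SciPy as `scipy.stats.kstest`, which returns the statistic Dn and a p-value. The following is a hedged sketch of the large-sample decision rule described above, using simulated data (the random seed and sample are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=100)   # n > 40: large-sample case

# H0: the data follow the standard normal distribution.
result = stats.kstest(sample, "norm")
critical = 1.36 / np.sqrt(len(sample))              # approximate 5% critical value

print(f"Dn = {result.statistic:.4f}, critical value = {critical:.4f}")
if result.statistic >= critical:
    print("Reject H0 at the 5% level of significance")
else:
    print("Accept H0 at the 5% level of significance")
```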


Kolmogorov-Smirnov Two Sample Test

The Kolmogorov-Smirnov one sample test compares the empirical cumulative distribution function with a hypothesized cumulative distribution function; the two sample test instead compares the empirical cumulative distribution functions of two samples.

Assumptions: 

1. The samples are selected from populations having unknown distributions.

2. The observations are independent.

3. The variable under study is continuous.

The procedure of the Kolmogorov-Smirnov two sample test is:

Let X1, X2, ..., Xn1 and Y1, Y2, ..., Yn2 be random samples of sizes n1 and n2 from the first and second populations respectively. Let S1(X) and S2(X) be the sample (empirical) cumulative distribution functions of the first and second samples respectively. We want to test whether the samples come from populations having the same distribution or not. The hypotheses for the test are:

H0:    F1(X) =  F2(X)

H1:  F1(X) ≠ F2(X)

The test consists of the following steps:

Step I: The test is based on the comparison of the sample (empirical) cumulative distribution functions, so we first calculate the sample cumulative distribution functions of both samples, denoted S1(X) and S2(X), each calculated as the proportion of sample observations less than or equal to some value x out of the total number of observations in that sample:

S1(X) = (number of observations in the first sample less than or equal to x) / (total number of observations in the first sample)

and

S2(X) = (number of observations in the second sample less than or equal to x) / (total number of observations in the second sample)

S1(X) and S2(X) are calculated for all values of x.

Step II: After calculating the empirical distribution functions S1(X) and S2(X) for both samples, we take the difference between them, i.e. S1(X) - S2(X).

Step III: The test statistic is denoted Dn = Sup |S1(X) - S2(X)|

where Dn is the supremum (maximum) over all x of the absolute value of the difference between the empirical cumulative distribution functions of the two samples.
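Since S1 and S2 are step functions, the supremum of |S1(x) - S2(x)| is attained at one of the pooled sample points, so it suffices to evaluate both ECDFs there. The following is a minimal Python sketch (the function name and sample values are illustrative assumptions):

```python
import numpy as np

def ks_two_sample_stat(x, y):
    """Dn = sup over x of |S1(x) - S2(x)|, evaluated at the pooled sample points."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    pooled = np.concatenate([x, y])
    s1 = np.searchsorted(x, pooled, side="right") / len(x)   # S1 at each pooled point
    s2 = np.searchsorted(y, pooled, side="right") / len(y)   # S2 at each pooled point
    return np.max(np.abs(s1 - s2))

print(ks_two_sample_stat([1, 2, 3], [4, 5, 6]))  # completely separated samples: Dn = 1.0
```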

The calculated value of the test statistic Dn = Sup |S1(X) - S2(X)| is compared with the critical value at the α% level of significance, and a decision is taken to accept or reject the null hypothesis.

Here two cases arise: the small sample test and the large sample test.

i) Small sample test: if the sample size n is less than or equal to 40 (i.e. n ≤ 40),

then the test statistic Dn = Sup |S1(X) - S2(X)| is compared with the tabulated critical value at the α% level of significance.

If the calculated Dn is greater than or equal to the critical value Dn,α at the α% level of significance, i.e.

Dn ≥ Dn,α,

we reject the null hypothesis at the α% level of significance; otherwise we accept the null hypothesis.

ii) Large sample test: if the sample size n is greater than 40 (i.e. n > 40),

then the test statistic Dn = Sup |S1(X) - S2(X)| is compared with the critical value at the α% level of significance.

But when the sample size n is greater than 40, the critical value at the given α% level of significance is calculated approximately as

Dn,α = 1.36 / √n      (note that in the large sample test we calculate the critical value). Here the choice of the constant 1.36 depends on the level of significance: 1.36 corresponds to the 0.05 level, 1.22 to the 0.10 level, and 1.63 to the 0.01 level; refer to the table below.

OR

where n is the common sample size when n1 = n2 = n. For unequal sample sizes the critical value is calculated using the formula

Dn,α = c(α) × √((n1 + n2) / (n1 n2)),

where the value of c(α) is selected from the following table:

α     : 0.10   0.05   0.025   0.01
c(α)  : 1.22   1.36   1.48    1.63

Then we compare the calculated and critical values of Dn and take a decision about the test. If the calculated Dn is greater than or equal to the critical value Dn,α at the α% level of significance, i.e. Dn ≥ Dn,α, we reject the null hypothesis at the α% level of significance; otherwise we accept the null hypothesis.
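SciPy implements the two sample test as `scipy.stats.ks_2samp`. The following is a hedged sketch of the unequal-sample-size decision rule described above, with the c(α) table encoded as a dictionary (the random seed and sample sizes are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=60)   # first sample, n1 = 60
y = rng.normal(0.0, 1.0, size=80)   # second sample, n2 = 80

result = stats.ks_2samp(x, y)

# c(alpha) values from the table above.
c_alpha = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48, 0.01: 1.63}
n1, n2 = len(x), len(y)
critical = c_alpha[0.05] * np.sqrt((n1 + n2) / (n1 * n2))

print(f"Dn = {result.statistic:.4f}, critical value = {critical:.4f}")
print("Reject H0" if result.statistic >= critical else "Accept H0")
```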


These tests are helpful to T.Y. B.Sc. (Statistics) students; please share with them.
