Non-Parametric Test: Kolmogorov-Smirnov One-Sample Test and Two-Sample Test

Non-Parametric tests

Kolmogorov-Smirnov One-Sample Test (Kolmogorov-Smirnov Goodness-of-Fit Test)

The test is named after A. N. Kolmogorov and N. V. Smirnov, who developed two versions of it: a one-sample test and a two-sample test. This article first discusses the Kolmogorov-Smirnov one-sample test. It is a simple non-parametric test used to check whether the data follow a specified distribution, that is, whether there is a significant difference between the observed distribution and the theoretical distribution (the theoretical distribution is the assumed or hypothesized distribution). In other words, the test measures the goodness of fit of the theoretical distribution. The chi-square test is also used to test goodness of fit; the main difference is that the chi-square test is used when the data are categorical, whereas the Kolmogorov-Smirnov test is used when the data are continuous.

The assumptions of the Kolmogorov-Smirnov test are as follows:

1. The sample is selected from a population having an unknown distribution.

2. The observations are independent.

3. The variable under study is continuous.

The procedure of the Kolmogorov-Smirnov one-sample test is as follows:

Let X1, X2, ..., Xn be a random sample of size n from a population with an unknown continuous distribution function F(X). We are interested in testing whether the data follow a specified distribution F0(X). The hypotheses are:

H0: The data follow the specified distribution.

H1: The data do not follow the specified distribution.

that is, H0: F(X) = F0(X) for all X vs H1: F(X) ≠ F0(X) for at least one X.

(This is a two-tailed test.)

The test consists of the following steps:

Step I: The Kolmogorov-Smirnov (K-S) test is based on comparing the empirical cumulative distribution function (the observed or sample distribution) with the theoretical cumulative distribution function (the specified or hypothesized distribution). We therefore first find the empirical cumulative distribution function, which is based on the sample. It is defined as the proportion of sample observations that are less than or equal to a given value x, and it is denoted by S(X):

S(X) = (number of observations less than or equal to x) / (total number of observations n)
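To make the definition concrete, here is a minimal Python sketch of S(X). The function name `ecdf` and the five-value sample are illustrative assumptions, not part of the original procedure. Sorting the sample once and counting with `bisect_right` gives the number of observations less than or equal to x.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Empirical CDF S(x): proportion of observations less than or equal to x."""
    ordered = sorted(sample)
    # bisect_right gives the number of elements <= x in a sorted list
    return bisect_right(ordered, x) / len(ordered)


data = [0.3, 0.7, 0.1, 0.9, 0.5]  # illustrative sample
print(ecdf(data, 0.5))  # 0.6 (3 of the 5 observations are <= 0.5)
```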

Step II: Find the theoretical cumulative distribution function F0(X) for all values of x (in practice, at each sample observation).
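As an illustration of Step II, the sketch below writes F0(X) for two common hypothesized distributions. The normal and uniform choices and their parameter values are assumptions for the example. Note that the standard K-S critical values assume F0 is completely specified in advance, so its parameters should not be estimated from the same sample (that situation calls for a modified test, such as Lilliefors' test for normality).

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F0(x) for a normal distribution with fully specified mu and sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def uniform_cdf(x, a=0.0, b=1.0):
    """F0(x) for a uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


print(round(normal_cdf(1.96), 4))  # 0.975
print(uniform_cdf(0.25))           # 0.25
```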

Step III: After finding both the empirical and the theoretical cumulative distribution functions, take the difference between them for all x,

i.e. S(X) - F0(X) for all x.

Step IV: The test statistic is

Dn = sup_x | S(X) - F0(X) |

where Dn is the supremum (least upper bound) over all x of the absolute difference between the empirical and theoretical cumulative distribution functions.
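A minimal sketch of Steps III and IV, assuming a continuous, fully specified F0. Because S(X) is a step function that jumps from (i-1)/n to i/n at the i-th ordered observation, the supremum can occur just before a jump, so the sketch checks both i/n - F0 and F0 - (i-1)/n at each ordered observation. For the illustrative sample below, comparing S and F0 only at the observations would give 0.4, whereas the supremum (reached just below x = 0.6, where S(X) = 0 while F0 approaches 0.6) is 0.6. The function names are hypothetical.

```python
def ks_statistic(sample, cdf):
    """One-sample K-S statistic Dn = sup_x |S(x) - F0(x)| for a continuous F0."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = 0.0   # largest amount by which S(x) lies above F0(x)
    d_minus = 0.0  # largest amount by which F0(x) lies above S(x)
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # S jumps from (i-1)/n to i/n at the i-th order statistic,
        # so the gap is checked on both sides of the jump
        d_plus = max(d_plus, i / n - f)
        d_minus = max(d_minus, f - (i - 1) / n)
    return max(d_plus, d_minus)


def uniform01_cdf(x):
    """F0(x) for the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


sample = [0.6, 0.7, 0.8, 0.9, 1.0]  # illustrative data
print(ks_statistic(sample, uniform01_cdf))  # 0.6
```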

Step V: The calculated value of the test statistic Dn = sup_x | S(X) - F0(X) | is compared with the critical value at the α level of significance, and the decision to accept or reject the null hypothesis is taken.

Two cases arise: the small-sample test and the large-sample test.

i) Small-sample test: the sample size n is less than or equal to 40 (i.e. n ≤ 40).

The test statistic Dn = sup_x | S(X) - F0(X) | is compared with the critical value Dn_α, read from the table of Kolmogorov-Smirnov critical values at the α level of significance. If the calculated Dn is greater than or equal to the critical value, i.e.

Dn ≥ Dn_α

we reject the null hypothesis at the α level of significance; otherwise we accept (do not reject) the null hypothesis.

ii) Large-sample test: the sample size n is greater than 40 (i.e. n > 40).

The test statistic Dn = sup_x | S(X) - F0(X) | is again compared with the critical value at the α level of significance. For n greater than 40, however, the critical value is calculated approximately instead of being read from the table. At the 5% level of significance (α = 0.05),

Dn_α ≈ 1.36 / √n

where n is the sample size.

We then compare the calculated and critical values of Dn and decide whether to accept or reject the null hypothesis: if the calculated Dn is greater than or equal to the critical value, i.e. Dn ≥ Dn_α, we reject the null hypothesis at the α level of significance; otherwise we accept (do not reject) the null hypothesis.
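The decision rule for both cases can be sketched as follows. The small-sample critical value has to come from a K-S table, so the sketch takes it as an argument instead of inventing table values; for n > 40 it uses c(α)/√n with the commonly tabulated multipliers (1.36 at α = 0.05). The input Dn = 0.25 with n = 50 is illustrative, and the function name is hypothetical.

```python
import math

# Commonly tabulated large-sample multipliers c(alpha)
C_ALPHA = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48, 0.01: 1.63}


def ks_one_sample_decision(d_n, n, alpha=0.05, table_critical=None):
    """Compare Dn with its critical value and return (critical, decision).

    For n <= 40 the critical value must be read from a K-S table and passed
    in as table_critical; for n > 40 it is approximated by c(alpha)/sqrt(n).
    """
    if n <= 40:
        if table_critical is None:
            raise ValueError("n <= 40: pass the tabulated critical value")
        critical = table_critical
    else:
        critical = C_ALPHA[alpha] / math.sqrt(n)
    decision = "reject H0" if d_n >= critical else "do not reject H0"
    return critical, decision


crit, decision = ks_one_sample_decision(0.25, 50)
print(round(crit, 4), decision)  # 0.1923 reject H0
```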

Kolmogorov-Smirnov Two-Sample Test

The Kolmogorov-Smirnov one-sample test compares the empirical cumulative distribution function with a hypothesized cumulative distribution function, whereas the two-sample test compares the empirical cumulative distribution functions of two samples.

Assumptions:

1. The samples are selected from populations having unknown distributions.

2. The observations are independent, both within and between the two samples.

3. The variable under study is continuous.

The procedure of the Kolmogorov-Smirnov two-sample test is as follows:

Let X1, X2, ..., Xn1 and Y1, Y2, ..., Yn2 be random samples of sizes n1 and n2 from the first and second populations, whose unknown continuous distribution functions are F1(X) and F2(X) respectively. Let S1(X) and S2(X) be the empirical cumulative distribution functions of the first and second samples. We want to test whether the two samples come from populations with the same distribution. The hypotheses are:

H0: F1(X) = F2(X)

H1: F1(X) ≠ F2(X)

The test consists of the following steps:

Step I: The test is based on the comparison of the two sample (empirical) cumulative distribution functions. So we first calculate the empirical cumulative distribution function of each sample, denoted S1(X) and S2(X), as the proportion of sample observations less than or equal to a given value X:

S1(X) = (number of observations in the first sample less than or equal to X) / (total number of observations in the first sample, n1)

and

S2(X) = (number of observations in the second sample less than or equal to X) / (total number of observations in the second sample, n2)

S1(X) and S2(X) are calculated for all values of X (in practice, at every observation of the pooled sample).

Step II: After calculating S1(X) and S2(X), take the difference between them, i.e. S1(X) - S2(X), for all X.
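The following sketch carries out Steps I and II on two small samples; the data are made up for illustration. Both empirical distribution functions change only at observed values, so evaluating them at every value of the pooled sample is enough.

```python
from bisect import bisect_right

x = [1.2, 3.4, 2.2, 5.0]       # first sample (illustrative data)
y = [2.0, 4.1, 4.5, 6.3, 3.9]  # second sample (illustrative data)

xs, ys = sorted(x), sorted(y)
pooled = sorted(set(x + y))    # both ECDFs only change at these points

rows = []
for v in pooled:
    s1 = bisect_right(xs, v) / len(xs)  # S1(v)
    s2 = bisect_right(ys, v) / len(ys)  # S2(v)
    rows.append((v, s1, s2, s1 - s2))

# Columns: value, S1, S2, S1 - S2
for v, s1, s2, diff in rows:
    print(f"{v:4.1f}  {s1:.2f}  {s2:.2f}  {diff:+.2f}")
```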

Step III: The test statistic is

Dn = sup_x | S1(X) - S2(X) |

where Dn is the supremum (here the maximum) over all x of the absolute difference between the empirical cumulative distribution functions of the two samples.
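Step III can be sketched as a single function, assuming continuous data. Since S1(X) - S2(X) is constant between consecutive pooled values, the maximum over the pooled values equals the supremum. The function name and the sample data are illustrative.

```python
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample K-S statistic: max over the pooled values of |S1 - S2|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # S1 - S2 is constant between consecutive pooled values, so checking
    # each pooled value is enough to find the supremum
    for v in sorted(set(xs + ys)):
        s1 = bisect_right(xs, v) / len(xs)
        s2 = bisect_right(ys, v) / len(ys)
        d = max(d, abs(s1 - s2))
    return d


print(round(ks_two_sample([1.2, 3.4, 2.2, 5.0], [2.0, 4.1, 4.5, 6.3, 3.9]), 2))  # 0.55
```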

Step IV: The calculated value of the test statistic Dn = sup_x | S1(X) - S2(X) | is compared with the critical value at the α level of significance, and the decision to accept or reject the null hypothesis is taken.

Two cases arise: the small-sample test and the large-sample test.

i) Small-sample test: the sample sizes are less than or equal to 40 (i.e. n ≤ 40).

The test statistic Dn = sup_x | S1(X) - S2(X) | is compared with the critical value Dn_α, read from the table of two-sample Kolmogorov-Smirnov critical values at the α level of significance. If the calculated Dn is greater than or equal to the critical value, i.e.

Dn ≥ Dn_α

we reject the null hypothesis at the α level of significance; otherwise we accept (do not reject) the null hypothesis.

ii) Large-sample test: the sample sizes are greater than 40 (i.e. n > 40).

The test statistic Dn = sup_x | S1(X) - S2(X) | is again compared with the critical value at the α level of significance, but for large samples the critical value is calculated approximately as

Dn_α ≈ c(α) × √((n1 + n2) / (n1 n2))

where n1 and n2 are the two sample sizes and the constant c(α) depends on the level of significance: c(α) = 1.36 for α = 0.05, 1.22 for α = 0.10 and 1.63 for α = 0.01 (see the table below). When the sample sizes are equal (n1 = n2 = n), the formula reduces to

Dn_α ≈ c(α) × √(2/n), e.g. 1.36 × √(2/n) at the 5% level of significance.

α (level of significance)	0.10	0.05	0.025	0.01
c(α)	1.22	1.36	1.48	1.63

We then compare the calculated and critical values of Dn and decide whether to accept or reject the null hypothesis: if the calculated Dn is greater than or equal to the critical value, i.e. Dn ≥ Dn_α, we reject the null hypothesis at the α level of significance; otherwise we accept (do not reject) the null hypothesis.
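The large-sample critical value and the decision rule can be sketched as below, using the c(α) values from the table. The statistic D = 0.30 with n1 = 50 and n2 = 60 is an illustrative input, and the function names are hypothetical. The last line shows the equal-size case, where the formula reduces to 1.36 × √(2/n).

```python
import math

# c(alpha) values from the table of multipliers
C_ALPHA = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48, 0.01: 1.63}


def ks_two_sample_critical(n1, n2, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n1 + n2) / (n1 * n2))."""
    return C_ALPHA[alpha] * math.sqrt((n1 + n2) / (n1 * n2))


def ks_two_sample_decision(d, n1, n2, alpha=0.05):
    """Return (critical value, decision) for the large-sample two-sample test."""
    critical = ks_two_sample_critical(n1, n2, alpha)
    return critical, ("reject H0" if d >= critical else "do not reject H0")


crit, decision = ks_two_sample_decision(0.30, 50, 60)
print(round(crit, 4), decision)                # 0.2604 reject H0
print(round(ks_two_sample_critical(50, 50), 3))  # 0.272, i.e. 1.36 * sqrt(2/50)
```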

These tests are useful for TY B.Sc. (Statistics) students; please share this article with them.


Shree GaneshA Statistics
