
Non-Parametric Test: Run Test

Non-Parametric Test

A non-parametric test is a type of statistical test that does not assume any particular distribution for the variable being analysed. Parametric tests, by contrast, rely on assumptions such as normality or some other specific distribution of the variable. Non-parametric tests are based on ranks, order, signs or other non-numerical features of the data. Since both parametric and non-parametric tests are available, when should we use each? If the assumptions of a parametric test are violated (for example, the data are not normally distributed or the sample size is small), we use a non-parametric test. Non-parametric tests can also be used to analyse categorical or ordinal data, such as data obtained in fields like psychology, sociology and biology. Commonly used non-parametric tests include the Wilcoxon signed-rank test, the Mann-Whitney U test, the sign test, the run test and the Kruskal-Wallis test. However, when the assumptions of a parametric test are valid, non-parametric tests have lower statistical power than the corresponding parametric test.

First, let us look at the assumptions of parametric and non-parametric tests.

Assumptions of Parametric Tests:-

The following are the key assumptions of parametric tests:

1. Normality: The data must be normally distributed, that is, they should follow a normal distribution. The distribution should be bell-shaped, with most of the data points falling near the mean.

2. Independence: The observations should be independent, meaning the data points are unrelated to each other and the value of one observation is not affected by the value of another.

3. Homoscedasticity: This means homogeneity of variance: the variance of the data should be equal across the groups or samples being compared, i.e. the spread of the data should be similar in the different groups.

4. Continuous data: Parametric tests are designed for continuous data, which means the data should be measured on a scale with equal intervals between values.

When these assumptions are met, a parametric test is very powerful and accurate. However, if any one of the assumptions is violated, it can give inaccurate results and misleading conclusions. In that case we use a non-parametric test.

Assumptions of Non-Parametric Tests:

Non-parametric tests are statistical tests that do not require any specific assumption about the probability distribution of the variable being analysed. However, non-parametric tests do have some assumptions:

1. Independence: The observations should be independent, meaning the data points are unrelated to each other and the value of one observation is not affected by the value of another.

2. Random sampling: The data should be obtained through random sampling. That means each observation has an equal chance of being selected, and the sample should be representative of the population from which it is drawn.

3. Ordinal data: Non-parametric tests are suited to ordinal data: if the data are ranked or measured on an ordinal scale, these tests can be applied.

4. Homogeneity of variance: The groups being compared should have similar variances.

Non-parametric tests are more flexible and robust than parametric tests because they do not rely on strict assumptions about the distribution of the data. Even so, their assumptions should be checked carefully to obtain accurate results.

Types of Non-Parametric Tests:

Non-parametric tests are broadly divided into three categories:

1. One-sample tests

2. Two-sample tests

3. K-sample tests

We now look at each of these types.


 ONE-SAMPLE TESTS

RUN TEST

One of the fundamental assumptions of parametric tests is that the observed data are random; the test statistic and the subsequent analysis are based on this assumption. It is therefore good practice to check whether this assumption holds.

A very simple tool for checking this assumption is the run test, which this section explains. Before discussing the run test, we first explain what we mean by a "run".

A run is defined as a sequence of letters or symbols of one kind, immediately preceded and followed by letters of the other kind or by no letters at all. For example, consider the following sequence of two letters, H and T:

HHTHTTTHTHHHTTT                                                                                                            

In this sequence we start with the first letter, H, and go up to the other kind of letter, T. In this way we get the first run, of two H's. Then we start from this T and go up to the next H, which gives a run of one T, and so on, ending with a run of three T's. In all there are eight runs, denoted R = 8, as shown below (run numbers in brackets).

HH (1)  T (2)  H (3)  TTT (4)  H (5)  T (6)  HHH (7)  TTT (8)
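The run-counting rule above is mechanical, so it can be sketched in a few lines of Python (3.9 or later assumed). The helper names `split_into_runs` and `count_runs` are hypothetical, chosen for illustration; the standard-library `itertools.groupby` does the real work by starting a new group each time the symbol changes.

```python
from itertools import groupby


def split_into_runs(sequence: str) -> list:
    """Split a sequence of symbols into its maximal runs of identical symbols."""
    # groupby starts a new group every time the symbol changes,
    # so each group is exactly one run.
    return ["".join(group) for _, group in groupby(sequence)]


def count_runs(sequence: str) -> int:
    """Return R, the total number of runs in the sequence."""
    return len(split_into_runs(sequence))


seq = "HHTHTTTHTHHHTTT"
print(split_into_runs(seq))  # ['HH', 'T', 'H', 'TTT', 'H', 'T', 'HHH', 'TTT']
print(count_runs(seq))       # 8
```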

In the run test, we judge the randomness of the observations by the number of runs in the observed sequence. Too few runs indicate some clustering or trend, and too many runs indicate some kind of repetitive or cyclical pattern.

For example, suppose the following sequence of H's and T's is obtained when a coin is tossed 10 times:

HHHHHH (1)  TTTT (2)

In the above sequence there are only 2 runs: one of 6 heads and one of 4 tails. Similar items tend to cluster together, so such a sequence of observations is not considered random. Now consider another sequence of 10 tosses:

H (1)  T (2)  H (3)  T (4)  H (5)  T (6)  H (7)  T (8)  H (9)  T (10)

Here the numbers indicate the runs. This sequence has 10 runs: 5 runs of one head each and 5 runs of one tail each. It also cannot be considered random, because there are too many runs, which indicates a regular alternating pattern.

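Finally, consider the following sequence of 20 tosses, which has 13 runs:

T (1)  HHH (2)  TT (3)  H (4)  T (5)  HH (6)  T (7)  HHH (8)  TT (9)  H (10)  T (11)  H (12)  T (13)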

Here the number of runs is neither too small nor too large, so this sequence may be considered random.
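To put a number on "too few" and "too many", the sketch below (an illustration added here, not part of the procedure described in this post) uses the fact that, when the sequence is random, every arrangement with the same numbers of H's and T's is equally likely. It enumerates all such arrangements and reports how often the number of runs is at most, or at least, the observed R. Brute-force enumeration is only practical for short sequences, and the function name `exact_run_tail_probs` is hypothetical.

```python
from itertools import combinations, groupby


def count_runs(symbols) -> int:
    """Number of runs = number of maximal blocks of identical symbols."""
    return sum(1 for _ in groupby(symbols))


def exact_run_tail_probs(sequence: str):
    """Return (R, P(R <= R_obs), P(R >= R_obs)) under H0 for an H/T sequence.

    Under randomness every arrangement of the same number of H's and T's is
    equally likely, so we enumerate all of them (fine for short sequences).
    """
    n = len(sequence)
    n_heads = sequence.count("H")
    r_obs = count_runs(sequence)
    total = at_most = at_least = 0
    for head_positions in combinations(range(n), n_heads):
        arrangement = ["T"] * n
        for i in head_positions:
            arrangement[i] = "H"
        r = count_runs(arrangement)
        total += 1
        at_most += r <= r_obs   # True counts as 1
        at_least += r >= r_obs
    return r_obs, at_most / total, at_least / total


for seq in ["HHHHHHTTTT", "HTHTHTHTHT", "THHHTTHTHHTHHHTTHTHT"]:
    r, p_low, p_high = exact_run_tail_probs(seq)
    print(f"{seq}: R = {r}, P(R <= {r}) = {p_low:.4f}, P(R >= {r}) = {p_high:.4f}")
```

For the first two sequences one of the two tail probabilities falls below 0.025, whereas for the 20-toss sequence both tails are far above it, which matches the verbal conclusions above.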

Assumptions: 

The run test makes the following assumptions:

(i) The observed data can be categorised into two mutually exclusive types.

 (ii) The variable under study is continuous.                    

Procedure for the Run Test

Let X1, X2, ..., Xn be a set of n observations arranged in the order in which they occur. Generally, we are interested in testing whether a population, a sample or a sequence is random or not, so here we consider only the two-tailed case. Thus we take the null and alternative hypotheses as

H0 :  The observations are random

H1 :  The observations are not random [two-tailed test]

The test consists of the following steps:

Step 1: First, check the form of the given data: are they in symbolic form, such as a sequence of H and T or A and B, or in numeric form? If the data are symbolic, no conversion is needed; if they are numeric, we first convert them into symbolic form. For this, we calculate the median of the given observations using the formula

Median = size of the [(n+1)/2]th observation,

provided the observations are arranged in ascending or descending order of magnitude. (When n is even, this is the average of the two middle observations.)

Next, replace the observations above the median by a symbol 'A' (say) and those below the median by a symbol 'B' (say), without altering the observed order. Observations equal to the median are discarded from the analysis, and the reduced sample size is denoted by n.
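Step 1 can be sketched as follows, assuming the data are numeric and listed in the order observed. `statistics.median` from the Python standard library matches the (n+1)/2 rule for odd n and averages the two middle values for even n; the function name `to_symbols` and the data values are hypothetical.

```python
from statistics import median


def to_symbols(observations, above="A", below="B"):
    """Step 1 of the run test: code each value as above/below the median.

    The original order is kept; values equal to the median are discarded.
    Returns the symbol string and the median used.
    """
    m = median(observations)  # averages the two middle values when n is even
    symbols = "".join(above if x > m else below for x in observations if x != m)
    return symbols, m


# Hypothetical data, listed in the order they were observed
data = [12, 15, 9, 20, 14, 11, 18, 13, 10]
symbols, m = to_symbols(data)
print(m, symbols, len(symbols))  # 13 BABAABAB 8
```

Here the median is 13, the observation equal to 13 is dropped, and the remaining eight values become BABAABAB, so n = 8.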

 

Step 2: Count the number of times the first symbol (A) occurs and denote it by n1.

 

Step 3: Count the number of times the second symbol (B) occurs and denote it by n2, where n = n1 + n2.

 

Step 4: The test statistic for the null hypothesis is the total number of runs, so in this step we count the total number of runs in the sequence of symbols and denote it by R.
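Steps 2 to 4 are simple counts. A minimal sketch, assuming the sequence contains only the two chosen symbols (the function name `run_counts` is hypothetical):

```python
from itertools import groupby


def run_counts(symbols: str, first: str = "A", second: str = "B"):
    """Steps 2-4: return (n1, n2, R) for a two-symbol sequence."""
    n1 = symbols.count(first)   # Step 2
    n2 = symbols.count(second)  # Step 3
    if n1 + n2 != len(symbols):
        raise ValueError("sequence must contain only the two chosen symbols")
    r = sum(1 for _ in groupby(symbols))  # Step 4: one group per run
    return n1, n2, r


print(run_counts("BABAABAB"))                                # (4, 4, 7)
print(run_counts("HHTHTTTHTHHHTTT", first="H", second="T"))  # (7, 8, 8)
```

For the hypothetical sequence BABAABAB this gives n1 = 4, n2 = 4 and R = 7; for the coin sequence HHTHTTTHTHHHTTT it gives n1 = 7 heads, n2 = 8 tails and R = 8.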

 

Step 5: Obtain the critical values of the test statistic corresponding to n1 and n2 at the α% level of significance, under the condition that the null hypothesis is true. The table of critical values for the run test gives the lower (RL) and upper (RU) critical values of the number of runs for a given combination of n1 and n2 at the 5% level of significance.

Note 1: Critical values for the run test are generally tabulated at the 5% level of significance, so we test our hypotheses at the 5% level.
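Where no table is at hand, the critical values can also be computed from the exact null distribution of R. This sketch is an addition, not the method described in the post: it uses the standard counting result that, given n1 and n2, all C(n1 + n2, n1) arrangements are equally likely, and takes RL as the largest r with P(R ≤ r) ≤ α/2 and RU as the smallest r with P(R ≥ r) ≤ α/2. It should agree with the usual 5% run-test tables, but a particular printed table may follow a slightly different convention. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1: int, n2: int) -> dict:
    """Exact null distribution P(R = r) of the number of runs, given n1 and n2."""
    if n1 < 1 or n2 < 1:
        raise ValueError("need at least one symbol of each kind")
    total = comb(n1 + n2, n1)  # number of equally likely arrangements under H0
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # k runs of each kind; either kind may come first
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # k + 1 runs of one kind and k runs of the other
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def critical_values(n1: int, n2: int, alpha: Fraction = Fraction(1, 20)):
    """Two-tailed (RL, RU); None means no value is extreme enough on that side."""
    pmf = runs_pmf(n1, n2)
    half = alpha / 2
    r_lower, cum = None, Fraction(0)
    for r in sorted(pmf):                 # accumulate the lower tail
        cum += pmf[r]
        if cum > half:
            break
        r_lower = r
    r_upper, cum = None, Fraction(0)
    for r in sorted(pmf, reverse=True):   # accumulate the upper tail
        cum += pmf[r]
        if cum > half:
            break
        r_upper = r
    return r_lower, r_upper


print(critical_values(11, 9))  # (6, 16)
```

Exact fractions are used so that a tail probability equal to α/2 is not misjudged by floating-point rounding.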

Step 6: Decision Rule:

To take a decision about the null hypothesis, the test statistic is compared with the critical (tabulated) values.

 

If the observed number of runs R is either less than or equal to the lower critical value RL or greater than or equal to the upper critical value RU, that is, if R ≤ RL or R ≥ RU, then we reject the null hypothesis at the 5% level of significance.

 

If R lies strictly between RL and RU, that is, RL < R < RU, then we accept the null hypothesis at the 5% level of significance.
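The decision rule itself is a pair of comparisons. In the sketch below (function name hypothetical), RL and RU are passed in; the values used for the three coin sequences from the run examples were computed from the exact null distribution of R at the 5% level and are assumed to match the standard table.

```python
def run_test_decision(r, r_lower, r_upper):
    """Step 6: reject H0 (randomness) if R <= RL or R >= RU, else accept.

    Pass None for a bound when the table gives no critical value on that side.
    """
    too_few = r_lower is not None and r <= r_lower
    too_many = r_upper is not None and r >= r_upper
    return "reject H0" if (too_few or too_many) else "accept H0"


print(run_test_decision(2, 2, 9))    # HHHHHHTTTT, n1 = 6, n2 = 4: reject H0
print(run_test_decision(10, 2, 10))  # HTHTHTHTHT, n1 = 5, n2 = 5: reject H0
print(run_test_decision(13, 6, 16))  # 20 tosses, n1 = 11, n2 = 9: accept H0
```

This agrees with the conclusions reached informally for the three coin sequences: the first two are judged non-random and the third random.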


Non-Parametric Test

In the previous part we discussed the one-sample sign test. Now we discuss the paired sign test, also called the two-sample sign test. Reading about the sign test first will help you understand the paired sign test.

2. Paired Sign Test (Two-Sample Sign Test)

In the social sciences, two related groups are often paired, and we are interested in examining the difference between them. When data are recorded in a before-and-after form (for example, measurements taken before and after a diet), we get paired observations of the same variable or item. In such situations we use the paired t-test if its assumptions are fulfilled; if they are not, we use the paired sign test. In other words, the paired sign test is the non-parametric alternative to the parametric paired t-test, used when the paired t-test is not applicable. For example, suppose you want to know whether a new exercise program is effective in reducing body fat. You randomly select 10 participants and measure their body fat, then have them follow the exercise program for 4 weeks and measure their body fat again. This gives before-and-after data, and if the data do not fulfil the assumptions of the paired t-test, we use the paired sign test. The sign test is also applicable when the data are ordinal or given symbolically (i.e. + or -, a or b).

Assumptions:

We use the paired sign test when the data satisfy the following assumptions:

1. The pairs of observations are independent.

2. The variable is measured at least on an ordinal scale, and the variable under study is continuous.

Let (X1, Y1), (X2, Y2), ..., (Xn, Yn) be a random sample of n independent pairs of continuous observations, where each unit is measured before and after some intervention (e.g. training, a diet or a treatment). Here we want to test whether the diet, training, etc. has an effect, so we take the null and alternative hypotheses as

$H_0: \mu_1 = \mu_2$  vs  $H_1: \mu_1 \neq \mu_2$

This is a two-tailed test.

The paired sign test follows the same procedure as the sign test, in the following steps.

Step I: The sign test is based on signs, so first we convert the data into a sequence of plus and minus signs. For this we compare each pair of observations Xi and Yi: if Xi > Yi we record a plus (+) sign, and if Xi < Yi we record a minus (-) sign. If Xi = Yi, the pair is removed from the data, and the reduced sample size is denoted by n.

Step II: Count the number of plus (+) signs and the number of minus (-) signs, and denote them by S+ and S- respectively.
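Steps I and II can be sketched as below. The body-fat values are made up for illustration (they are not from the post), the before measurement is taken as X and the after measurement as Y, and the function name `paired_signs` is hypothetical.

```python
def paired_signs(before, after):
    """Steps I-II of the paired sign test.

    Records + when X_i > Y_i and - when X_i < Y_i; tied pairs are dropped.
    Returns (S_plus, S_minus, n) where n is the reduced sample size.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    s_plus = sum(1 for x, y in zip(before, after) if x > y)
    s_minus = sum(1 for x, y in zip(before, after) if x < y)
    return s_plus, s_minus, s_plus + s_minus


# Hypothetical body-fat percentages for 10 participants (made-up numbers)
before = [24.1, 30.5, 27.8, 22.0, 35.2, 28.4, 26.7, 31.0, 29.3, 25.5]
after = [23.0, 29.1, 27.8, 22.6, 33.9, 27.0, 25.2, 30.2, 29.9, 24.1]
print(paired_signs(before, after))  # (7, 2, 9): one tied pair dropped
```

With these hypothetical values, seven participants decreased, two increased and one pair was tied and dropped, so S+ = 7, S- = 2 and n = 9.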

Step III: If the null hypothesis is true, a plus sign and a minus sign are equally likely, so we expect the numbers of plus and minus signs to be approximately equal. Treating one kind of sign as a "success" and the other as a "failure", the number of successes follows a binomial distribution with parameters (n, p = 0.5). For simplicity we work with the less frequent sign: if there are fewer plus signs than minus signs, the plus sign is taken as success and the minus sign as failure; similarly, if there are fewer minus signs than plus signs, the minus sign is taken as success and the plus sign as failure.

Step IV

i. Small sample test (n ≤ 20)

If the number of observations is less than or equal to 20 (i.e. n ≤ 20), we use the small sample test.

To decide about the hypothesis we use the p-value, which for the two-tailed test is

$\text{P-value} = 2P(S \le s)$, where $s = \min(S^+, S^-)$ and $S \sim \text{Binomial}(n, 1/2)$.

(For a one-tailed test the p-value is $P(S \le s)$, i.e. without the factor 2.)


If the p-value is less than or equal to the α level of significance, we reject the null hypothesis at that level; otherwise we accept the null hypothesis.
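For the small-sample case the p-value can be computed exactly from the binomial distribution with `math.comb` (Python 3.8+). This is a sketch assuming a 5% level; capping the two-tailed p-value at 1 is an added convention (doubling can exceed 1 when S+ = S-), and the function name and the counts S+ = 7, S- = 2 are hypothetical.

```python
from math import comb


def sign_test_p_value(s_plus: int, s_minus: int, two_tailed: bool = True) -> float:
    """Exact (small-sample) p-value of the sign test.

    Uses s = min(S+, S-) and S ~ Binomial(n, 0.5) with n = S+ + S-.
    """
    n = s_plus + s_minus
    s = min(s_plus, s_minus)
    p_one = sum(comb(n, k) for k in range(s + 1)) / 2 ** n  # P(S <= s)
    p = 2 * p_one if two_tailed else p_one
    return min(p, 1.0)  # doubling can exceed 1 when S+ = S-, so cap it


p = sign_test_p_value(7, 2)  # hypothetical counts, n = 9
print(round(p, 4), "reject H0" if p <= 0.05 else "accept H0")  # 0.1797 accept H0
```

Here P(S ≤ 2) = (1 + 9 + 36)/512, so the two-tailed p-value is about 0.1797 > 0.05 and H0 is accepted for these counts.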

ii. Large sample test (n > 20)

If the number of observations is greater than 20 (i.e. n > 20), we use the large sample test.

For the large sample test we use the normal approximation to the binomial distribution.

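Under $H_0$, $S \sim \text{Binomial}(n, 1/2)$, so

$E(S) = n \cdot \frac{1}{2} = \frac{n}{2}$

and

$\text{Var}(S) = n \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{n}{4}$, so $\text{S.D.}(S) = \sqrt{\frac{n}{4}} = \frac{\sqrt{n}}{2}$.

The normal approximation gives the z-test statistic

$Z = \frac{S - E(S)}{\text{S.D.}(S)} = \frac{S - n/2}{\sqrt{n}/2}$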

Then we compare the calculated value of |Z| with the tabulated (critical) value at the α% level of significance (1.96 for a two-tailed test at the 5% level).

If the calculated |Z| is less than or equal to the tabulated (critical) value, we accept the null hypothesis; otherwise we reject the null hypothesis.
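The large-sample statistic, with standard deviation sqrt(n)/2, can be sketched as follows. The counts are hypothetical, the function name `sign_test_z` is illustrative only, and the critical value 1.96 assumes a two-tailed test at the 5% level.

```python
from math import sqrt


def sign_test_z(s_plus: int, s_minus: int) -> float:
    """Large-sample (n > 20) sign test statistic via the normal approximation."""
    n = s_plus + s_minus
    s = min(s_plus, s_minus)
    mean = n / 2       # E(S) = n/2
    sd = sqrt(n) / 2   # S.D.(S) = sqrt(n * 1/2 * 1/2)
    return (s - mean) / sd


z = sign_test_z(24, 12)  # hypothetical counts with n = 36
print(z, "reject H0" if abs(z) > 1.96 else "accept H0")  # -2.0 reject H0
```

Here n = 36, E(S) = 18 and S.D.(S) = 3, so Z = (12 - 18)/3 = -2; since |Z| = 2 > 1.96, H0 is rejected at the 5% level.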



In the next part we discuss examples of the paired sign test.

Next part coming soon.

                                               
