
Non-Parametric Test: Mann-Whitney U Test

Suppose we are interested in testing the difference between the means of two independent populations. In that case the two-sample t-test is used, provided the data satisfy the assumptions of the parametric t-test: the two independent samples are drawn from normal populations with equal variances, and the variable is measured on at least an interval scale. But if the data are collected on an ordinal scale, or the distribution of the population from which the samples are drawn is not known (situations that arise in fields such as marketing studies or biological studies), the parametric test cannot be used, and a non-parametric test is more appropriate. In such circumstances, the simplest non-parametric test to use is the Mann-Whitney U test.

This non-parametric test was developed by Mann, Whitney and Wilcoxon; hence it is named the Mann-Whitney U test, and it is sometimes also called Wilcoxon's rank sum test. The Mann-Whitney U test is an alternative to the parametric t-test for testing the difference between the means of two populations. If the assumptions of the t-test are fulfilled, the Mann-Whitney U test gives a weaker (less powerful) result than the t-test.

Assumptions: the test is based on the following assumptions.

1. The two samples are randomly and independently drawn from their populations.

2. The variable is measured on at least an ordinal scale.

3. The variable under study is continuous.

The testing procedure of the Mann-Whitney U test is as follows.

Let X1, X2, ..., Xn1 and Y1, Y2, ..., Yn2 be random and independent samples drawn from two populations having medians μ1 and μ2 respectively. Here we want to test whether the medians of the two populations are the same, so the null and alternative hypotheses are

H0: μ1 = μ2   V/S   H1: μ1 ≠ μ2

Alternatively, we can set up the hypotheses for testing whether the two samples are drawn from identical populations with respect to location (i.e. the median of the population). Then the null and alternative hypotheses are

H0:    F1(X) = F2(X) V/S     H1:  F1(X) ≠  F2(X)   

The test consists of the following steps:

Step I: First, combine all the observations from the two samples.

Step II: Assign ranks to all the combined observations from smallest to largest; that is, assign rank 1 to the smallest observation, rank 2 to the next smallest, and so on up to the largest observation. If ties occur in the data, assign the average rank to the tied observations.

Step III: Now we compute R1 and R2, where

R1 is the sum of the ranks of the first sample of size n1, and

R2 is the sum of the ranks of the second sample of size n2.
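Steps I to III can be sketched in Python as below. The sample values here are purely illustrative (not from the source); note how the tied value 15 receives the average of the two positions it occupies.

```python
# Sketch of Steps I-III: pool both samples, assign average ranks for ties,
# and sum the ranks belonging to each sample. Values are illustrative.
x = [12, 15, 11, 18]          # first sample, n1 = 4
y = [14, 15, 20]              # second sample, n2 = 3

combined = sorted(x + y)       # Step I: pool and order all observations

# Step II: average rank for each distinct value (handles ties)
rank_of = {}
for value in set(combined):
    positions = [i + 1 for i, v in enumerate(combined) if v == value]
    rank_of[value] = sum(positions) / len(positions)

# Step III: rank sums of the two samples
R1 = sum(rank_of[v] for v in x)
R2 = sum(rank_of[v] for v in y)
print(R1, R2)   # R1 + R2 must equal N(N + 1)/2 where N = n1 + n2
```

A quick sanity check on any hand calculation: the two rank sums must add up to N(N + 1)/2, the sum of all ranks.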

Step IV: Compute

U1 = n1n2 + {n1(n1 + 1)/2} − R1

U2 = n1n2 + {n2(n2 + 1)/2} − R2

The test statistic is

U = min(U1, U2)
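Step IV can be sketched as follows, reusing the illustrative values n1 = 4, n2 = 3, R1 = 13.5, R2 = 14.5 (not from the source). A handy arithmetic check is that U1 + U2 always equals n1·n2.

```python
# Sketch of Step IV with illustrative sample sizes and rank sums.
n1, n2 = 4, 3
R1, R2 = 13.5, 14.5

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2
U = min(U1, U2)            # the test statistic

print(U1, U2, U)           # check: U1 + U2 == n1 * n2
```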

To take a decision about the null hypothesis, we first obtain the critical value of the test statistic at the α level of significance; it can be obtained from the table of critical values of the Mann-Whitney U test.

Decision Rule: if the calculated value of the test statistic U is less than or equal to the critical value (i.e. Ucal ≤ Utab), then we reject the null hypothesis at the α level of significance; otherwise we accept the null hypothesis. (Use the above test statistic U when the samples contain at most 20 observations.)

If the sample size is greater than 20, we use a large-sample test.

For large samples:

If either n1 or n2 is greater than 20, the test statistic U is approximately normally distributed with mean E(U) and variance Var(U), where

E(U) = (n1 × n2)/2  and  Var(U) = {n1 n2 (n1 + n2 + 1)}/12

Now the normal-approximation test statistic is

Z = {U - E(U)}/√var(U) 

Decision Rule: if the absolute calculated value of the test statistic Z exceeds the critical value (i.e. |Zcal| > Ztab), then we reject the null hypothesis at the α level of significance; otherwise (i.e. |Zcal| ≤ Ztab) we accept the null hypothesis.
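A minimal sketch of the large-sample normal approximation, assuming illustrative sample sizes n1 = 25, n2 = 28 and an assumed observed U of 240 (none of these values come from the source):

```python
# Sketch of the large-sample normal approximation for the Mann-Whitney U test.
import math

n1, n2 = 25, 28      # illustrative sample sizes, both above 20
U = 240              # assumed observed value of the U statistic

EU = n1 * n2 / 2                        # E(U)
VarU = n1 * n2 * (n1 + n2 + 1) / 12     # Var(U)
Z = (U - EU) / math.sqrt(VarU)          # compare |Z| with Ztab, e.g. 1.96 at 5%
print(round(Z, 3))
```

In practice, `scipy.stats.mannwhitneyu` performs this test directly, including exact p-values for small samples.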

 

Also read the assumptions for parametric and non-parametric tests.

 

 

