Oyeka ICA¹ and Okeh UM²*

¹Department of Applied Statistics, Nnamdi Azikiwe University, Awka, Nigeria
²Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria

*Corresponding author: Okeh UM, Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria. E-mail: uzomaokey@ymail.com

Received January 16, 2013; Published February 28, 2013

Citation: Oyeka ICA, Okeh UM (2013) Two Sample Median Tests by Ranks. 2:636 doi:10.4172/scientificreports.636

Copyright: © 2013 Oyeka ICA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

This paper proposes a two sample median test based on the ranks of sample observations drawn from two independent populations, for testing the null hypothesis that the two population medians are equal. The populations need only be measured on at least the ordinal scale. It is shown that the proposed test statistic is at least as efficient and powerful as the Mann-Whitney U test for all sample sizes. When the two samples are of equal size, the proposed test statistic may also be used as an improved alternative to the sign test for two independent samples of equal size. These methods are illustrated with some data, and shown to compare favorably with the ordinary sign test, the median test and the Mann-Whitney U test for two independent samples.

Keywords

Mann-Whitney U-Test; Ranks; Two sample; Median; Population

Introduction

The median test is a statistical procedure for testing whether two independent populations differ in their measure of central tendency or location. That is, the median test enables us to determine whether it is likely that two independent or unrelated samples, not necessarily of the same size, have been drawn from two populations with equal medians. The median test may be used whenever the observations or scores obtained from the two populations are measured on at least the ordinal scale. In these situations, the assumptions of normality and homogeneity necessary for the valid use of the parametric t test may not be satisfied, so that parametric tests may not readily recommend themselves [1].

However, a problem with the median test is that it is based only on the sign or direction of the observations and not on their magnitudes, which leads to some loss of information. A procedure that uses both the direction and the magnitudes of the observations is likely to be more powerful, and hence preferable. We propose to develop such a procedure in this paper, based on the Kruskal-Wallis one-way analysis of variance test by ranks [2].

The Proposed Method

Let $x_{ij}$ be the ith observation in a random sample of size $n_j$ independently drawn from population j, for i = 1, 2, …, $n_j$; j = 1, 2. We assume that the two populations are measured on at least the ordinal scale. To apply the two sample median test by ranks, we first pool the two samples into one combined sample of size

$$n = n_1 + n_2.$$

The observations in the pooled sample are now ranked, either from the largest to the smallest or from the smallest to the largest. Under the null hypothesis of equal population medians, and in the absence of ties, any randomly selected observation in the combined sample is as likely to be greater than as to be less than any other observation in the sample, and hence is equally likely to receive any one of the ranks assigned to the observations. This justifies the use of the median test by ranks for two populations. Let $r_{ij}$ denote the rank assigned to $x_{ij}$ in the pooled sample, and let

$$R_{.j} = \sum_{i=1}^{n_j} r_{ij} \qquad (1)$$

be the sum of the ranks assigned to observations drawn from population j, for j = 1, 2, with mean rank

$$\bar{R}_{.j} = \frac{R_{.j}}{n_j}. \qquad (2)$$

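The overall mean rank is

$$\bar{R}_{..} = \frac{R_{.1} + R_{.2}}{n} = \frac{1}{n}\cdot\frac{n(n+1)}{2}.$$

That is,

$$\bar{R}_{..} = \frac{n+1}{2}. \qquad (3)$$
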
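The pooling and ranking steps can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `midranks` and `rank_sums` and the small data set are hypothetical, ranking is taken from the smallest value upward, and ties receive their mean rank as described in the text.

```python
def midranks(values):
    """Rank values from the smallest (rank 1) upward; ties get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the positions start+1 .. end+1 occupied by the tied block.
        mean_rank = (start + 1 + end + 1) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(sample1, sample2):
    """Return the rank sums (R.1, R.2) of the two samples in the pooled ranking."""
    ranks = midranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Hypothetical data, for illustration only (not taken from the paper's tables).
x1 = [3, 5, 5, 8]
x2 = [1, 5, 7, 9, 10]
r1, r2 = rank_sums(x1, x2)
n = len(x1) + len(x2)
mean_rank_1 = r1 / len(x1)
mean_rank_2 = r2 / len(x2)
overall_mean_rank = (r1 + r2) / n  # should equal (n + 1) / 2
print(r1, r2, mean_rank_1, mean_rank_2, overall_mean_rank)
```

Whatever the data, the two rank sums add up to n(n+1)/2, which gives a quick check on hand-computed rank columns.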
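The total variance of all the ranks is

$$\sigma^2 = \frac{1}{n-1}\sum_{r=1}^{n}\left(r - \frac{n+1}{2}\right)^2,$$

that is,

$$\sigma^2 = \frac{n(n+1)}{12}. \qquad (4)$$
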
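Now the sum of squared deviations of the observed sample (treatment group) mean ranks from the overall mean rank, weighted by the sample sizes, is

$$\sum_{j=1}^{2} n_j\left(\bar{R}_{.j} - \frac{n+1}{2}\right)^2. \qquad (5)$$
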
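Now the quadratic form obtained by dividing the sum of squares of equation 5 by the variance of equation 4,

$$\chi^2 = \frac{\sum_{j=1}^{2} n_j\left(\bar{R}_{.j} - \frac{n+1}{2}\right)^2}{n(n+1)/12},$$

that is,

$$\chi^2 = \frac{12}{n(n+1)}\sum_{j=1}^{2} n_j\left(\bar{R}_{.j} - \frac{n+1}{2}\right)^2, \qquad (6)$$
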
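As a rough sketch of how the test could be computed, the function below takes the two rank sums and sample sizes and returns a Kruskal-Wallis-type statistic with two groups, together with its chi-square P-value on 1 degree of freedom. The function name and the example rank sums are hypothetical, and the statistic is assumed to be the weighted sum of squared deviations of the mean ranks from (n+1)/2, divided by the rank variance n(n+1)/12.

```python
import math


def median_rank_chi2(r1, n1, r2, n2):
    """Two sample median test by ranks: return (chi-square, P-value on 1 df)."""
    n = n1 + n2
    overall_mean_rank = (n + 1) / 2.0
    # Weighted sum of squared deviations of the sample mean ranks.
    ss = (n1 * (r1 / n1 - overall_mean_rank) ** 2
          + n2 * (r2 / n2 - overall_mean_rank) ** 2)
    rank_variance = n * (n + 1) / 12.0
    chi2 = ss / rank_variance
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical rank sums for samples of sizes 4 and 5 (ranks 1..9 sum to 45).
chi2, p_value = median_rank_chi2(17.0, 4, 28.0, 5)
reject_at_5_percent = chi2 >= 3.841  # 95th percentile of chi-square with 1 df
print(round(chi2, 4), round(p_value, 4), reject_at_5_percent)
```

The closed form for the P-value holds only for 1 degree of freedom, which is the case for two samples; it avoids any dependency beyond the standard library.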
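has approximately a chi-square distribution with k-1 = 2-1 = 1 degree of freedom for sufficiently large n [3,4], and may be used to test the null hypothesis of equal population medians. The null hypothesis is rejected at the α level of significance if the computed statistic is at least $\chi^2_{1-\alpha;1}$, the 100(1-α)th percentile of the chi-square distribution with 1 degree of freedom; otherwise the null hypothesis is accepted. Note that equation 5 can be alternatively expressed as

$$n_1\left(\bar{R}_{.1} - \frac{n+1}{2}\right)^2 + n_2\left(\bar{R}_{.2} - \frac{n+1}{2}\right)^2,$$

which, when further simplified, yields

$$\frac{n_1 n_2}{n_1+n_2}\left(\bar{R}_{.1} - \bar{R}_{.2}\right)^2. \qquad (7)$$
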
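Hence, the test statistic of equation 6 can be equivalently written as

$$\chi^2 = \frac{12\, n_1 n_2\left(\bar{R}_{.1} - \bar{R}_{.2}\right)^2}{n^2(n+1)} = \frac{12\left(n_2 R_{.1} - n_1 R_{.2}\right)^2}{n_1 n_2\, n^2 (n+1)}. \qquad (8)$$
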
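The step from the sum of squares of equation 5 to the simplified statistic can be worked through as follows. This is a sketch that assumes the rank variance is n(n+1)/12 and writes $\bar{R}_{.j}$ for the mean rank of sample j and $n = n_1 + n_2$; the notation is chosen here for illustration.

```latex
\begin{align*}
n_1\bar{R}_{.1} + n_2\bar{R}_{.2} = \frac{n(n+1)}{2}
  \;&\Longrightarrow\;
  \bar{R}_{.1} - \frac{n+1}{2} = \frac{n_2}{n}\bigl(\bar{R}_{.1} - \bar{R}_{.2}\bigr),
  \qquad
  \bar{R}_{.2} - \frac{n+1}{2} = -\frac{n_1}{n}\bigl(\bar{R}_{.1} - \bar{R}_{.2}\bigr), \\[4pt]
\sum_{j=1}^{2} n_j\Bigl(\bar{R}_{.j} - \frac{n+1}{2}\Bigr)^2
  &= \frac{n_1 n_2^2 + n_2 n_1^2}{n^2}\bigl(\bar{R}_{.1} - \bar{R}_{.2}\bigr)^2
   = \frac{n_1 n_2}{n}\bigl(\bar{R}_{.1} - \bar{R}_{.2}\bigr)^2, \\[4pt]
\chi^2
  &= \frac{12}{n(n+1)}\cdot\frac{n_1 n_2}{n}\bigl(\bar{R}_{.1} - \bar{R}_{.2}\bigr)^2
   = \frac{12\,n_1 n_2\bigl(\bar{R}_{.1} - \bar{R}_{.2}\bigr)^2}{n^2(n+1)}.
\end{align*}
```

The first line uses only the fact that the pooled ranks sum to n(n+1)/2, so the deviation of each mean rank from the overall mean rank is a multiple of the difference between the two mean ranks.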
The test statistics of equations 6 and 8 are adequate and yield good results provided n1 and n2 are each at least 5 [2]. Now if the two samples are of equal size, so that n1 = n2 = m, say, then equation 8 further reduces to

$$\chi^2 = \frac{3\left(R_{.1} - R_{.2}\right)^2}{m^2(2m+1)}. \qquad (9)$$

The numerator of equation 9 is built on the square of the difference between the sums of the ranks assigned to observations in each of the two samples, which is the same as the square of the sum of the differences between the ranks assigned to the members of each of the m pairs of observations in the two samples. Hence, this statistic, based on the differences between ranks, may be used as an improved alternative to the sign test for two independent samples of equal size: because it accounts for both the direction and the magnitudes of the observations, it is likely to be more powerful.

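The equal-size shortcut can be checked numerically against the general two-sample form. The sketch below assumes that the general statistic is 12·n1·n2·(mean rank difference)² / (n²(n+1)) and that the equal-size form is 3(R.1 − R.2)² / (m²(2m+1)); the function names and the rank sums are hypothetical.

```python
def general_chi2(r1, n1, r2, n2):
    """General form in terms of the difference between the two mean ranks."""
    n = n1 + n2
    return 12.0 * n1 * n2 * (r1 / n1 - r2 / n2) ** 2 / (n ** 2 * (n + 1))


def equal_size_chi2(r1, r2, m):
    """Equal-size form in terms of the difference between the two rank sums."""
    return 3.0 * (r1 - r2) ** 2 / (m ** 2 * (2 * m + 1))


# Hypothetical rank sums for two samples of size m = 5 (ranks 1..10 sum to 55).
m = 5
r1, r2 = 21.0, 34.0
chi2_general = general_chi2(r1, m, r2, m)
chi2_equal = equal_size_chi2(r1, r2, m)
print(chi2_general, chi2_equal)
```

Both calls return the same value up to floating-point rounding, so the shortcut is only a convenience when n1 = n2 and not a different test.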
Illustrative Example

Random samples of students from two departments who took a course in statistics, together with the letter grades they earned, yield the data of Table 1.

The null hypothesis to be tested is that students from the two departments performed equally well in the statistics course; that is, the students earned equal median grades in the course. To test this null hypothesis using the median test by ranks, we first pool the two samples and then rank the combined observations from the highest grade, A+, which is assigned a rank of 1, through the lowest grade, F, which is assigned the rank of 25. Tied grades are as usual assigned their mean ranks.

Table 1: Letter grades earned by random samples of students in a course in statistics.

Results

The results are shown in the second and fourth columns of Table 1. Using the rank sums shown in Table 1 with n1 = 11 and n2 = 14 in equation 8, we have
|
|
|
which, with one degree of freedom, is not statistically significant at the 5% level. It may be instructive to compare the present result with what would have been obtained if the data of Table 1 had been analyzed using the ordinary median test for two independent samples. To do this, we as usual pool the two samples and determine the common median, which is here found to be an A- grade. Now, 5 students in department 1 and 8 in department 2 earned a grade of A- or above, while 6 students in each department earned below an A- grade. Using this information, we calculate the usual chi-square test statistic for a 2×2 table as
|
|
|
which, with 1 degree of freedom, is also not statistically significant at the 5% level. However, the relative sizes of the chi-square values obtained using the two methods suggest that, at least for the present example, the ordinary median test is, as expected, likely to lead more frequently to the acceptance of a false null hypothesis (Type II error), and hence is likely to be less powerful than the proposed two sample median test by ranks. It may also be instructive to compare the proposed method with the Mann-Whitney U test. To apply the Mann-Whitney U test, we use the unit normal z-score test statistic [5].
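
$$z = \frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1+n_2+1)/12}}, \qquad (10)$$

where U is the Mann-Whitney statistic for the two samples. For the data of Table 1, z = 0.991 (P-value = 0.1611), which is also not statistically significant at the 5% level.
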
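The ordinary median test quoted in the Results can be reproduced from the counts given in the text (5 and 6 students in department 1, 8 and 6 in department 2, at or above versus below the common median). The sketch below assumes the Pearson chi-square without a continuity correction; the text does not say whether a correction was applied, and the function name is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: department 1, department 2.
# Columns: grade of A- or above, grade below A-.
chi2_median = chi2_2x2(5, 6, 8, 6)
# Upper tail of a chi-square variable with 1 df.
p_median = math.erfc(math.sqrt(chi2_median / 2.0))
print(round(chi2_median, 3), round(p_median, 3))
```

Under this assumption the statistic is about 0.337, well below the 5% critical value of 3.841, which agrees with the conclusion stated in the text.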
Discussion

Although the proposed method and the Mann-Whitney U test lead to the same conclusion with the present data, the attained significance levels indicate that the Mann-Whitney U test is likely to lead more frequently to the acceptance of a false null hypothesis (Type II error), and hence is likely to be less powerful than the proposed two sample median test by ranks. Note also that the variance of the Mann-Whitney U test statistic of equation 10 is larger than the variance of the proposed two sample median test statistic by ranks of equation 6 for all values of n1, n2 > 2. In other words, the ratio of the variance of the Mann-Whitney U statistic to the variance of the proposed test statistic is

$$\frac{n_1 n_2 (n+1)/12}{n(n+1)/12} = \frac{n_1 n_2}{n_1+n_2} > 1.$$

To illustrate the use of the two sample median test by ranks to analyze two independent samples of equal size, for which the two-sample sign test is often used, we use the following data on family size preferences of newly married couples in a certain community. The data in Table 2 show the family size preferences of a random sample of newly married couples, where both husband and wife of each couple were asked to state the number of children they would like to have.

Table 2: Family size preferences by a random sample of newly married couples.

To use the two sample median test by ranks with the data of Table 2, we first pool the two samples into one combined sample of size 2m = 12+12 = 24, and then rank the combined observations from the smallest (0), which is assigned a rank of 1, through the largest value (9), which is assigned the rank of 24. All tied observations are as usual assigned their mean ranks. The results are shown in the third and fifth columns of Table 2. It is seen from this table that the sum of assigned ranks for husbands is R.1 = 166.5, and that for wives is R.2 = 166.5. Using these values with n1 = n2 = m = 12 in equation 9 yields a chi-square value which, with 1 degree of freedom, is not statistically significant at the 5% significance level.

To compare this result with what would have been obtained if the sign test for two independent samples of equal size had been used to analyze the data of Table 2, we see from the last column of this table that there are 4 tied observations, that is, 4 couples in which husband and wife have the same family size preference. Also, there are 2 plus signs and 6 minus signs, giving an effective sample size of n = 12-4 = 2+6 = 8 for the sign test. Since the less frequently occurring sign is the '+' sign, with a frequency of 2, we let X be the number of + signs out of a total of n = 8 possible + and - signs, each occurring with probability P = 0.5, thereby obtaining the attained significance level P(X ≤ 2).

Since this probability exceeds 0.05, we again do not reject the null hypothesis that newly married husbands and wives do not differ in their family size preferences. However, because the attained significance level using the sign test is, as expected, less than that obtained using the proposed method, the sign test is again likely to lead more frequently to the acceptance of a false null hypothesis (Type II error), and hence is likely to be less powerful than the two sample median test by ranks.

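The two calculations for the family size data can be sketched as follows. Two assumptions are made and should be read as such. First, the ranks of 24 pooled observations must total 24·25/2 = 300, so the two rank sums quoted in the text (which add to 333) cannot both be 166.5; the sketch takes 166.5 for one group and 300 − 166.5 = 133.5 for the other, without claiming which spouse group is which. Second, the equal-size statistic is taken to be 3(R.1 − R.2)² / (m²(2m+1)), and the sign test is taken as the one-sided binomial tail P(X ≤ 2). All function names are hypothetical.

```python
import math


def equal_size_chi2(r1, r2, m):
    """Equal-size rank statistic: 3 (R1 - R2)^2 / (m^2 (2m + 1))."""
    return 3.0 * (r1 - r2) ** 2 / (m ** 2 * (2 * m + 1))


def chi2_p_value_1df(chi2):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def sign_test_lower_tail(k, n, p=0.5):
    """Binomial probability P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


m = 12
total_rank_sum = 2 * m * (2 * m + 1) / 2   # ranks 1..24 sum to 300
r_first = 166.5                            # one rank sum as quoted in the text
r_second = total_rank_sum - r_first        # 133.5: ASSUMED, see the note above

chi2_rank = equal_size_chi2(r_first, r_second, m)
p_rank = chi2_p_value_1df(chi2_rank)

# Sign test: 2 plus signs among the 8 untied couples.
p_sign = sign_test_lower_tail(2, 8)
print(round(chi2_rank, 4), round(p_rank, 4), round(p_sign, 4))
```

Under these assumptions neither P-value falls below 0.05, which matches the conclusions drawn in the text; the statistic itself is unchanged if the two rank sums are swapped.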
Summary and Conclusion

We have discussed a method of analyzing two sample data using a median test by ranks. It is shown that the proposed test statistic is at least as efficient as the Mann-Whitney U test for equivalent sample sizes. When the two samples are of equal size, the proposed test statistic may also be used as an important alternative to the sign test for two independent samples of equal size. The proposed methods are illustrated with some data, and shown to be more powerful than the existing methods used for the same purpose.

References

1. Oyeka ICA, Utazi CE, Nwosu CR, Ebuh GU, Ikpegbu PA, et al. (2010) A statistical comparison of Test scores for non-parametric approach. Journal of Mathematics Sciences 21: 77-87.
2. Gibbons JD (1971) Nonparametric statistical inferences. McGraw Hill, New York, USA.
3. Freund JE (1992)
4. Hollander M, Wolfe DA (1999)
5. Oyeka CA (2009) An introduction to applied statistical methods. (4th Edn), Nobern Avocation Publishing Company, Enugu, Nigeria.