A new method for the Behrens-Fisher problem∗
Xu-Qing Liu^{a,†}, Hong-Yan Jiang^{a}, Jian-Ying Rong^{b}
a. Department of Computing Science, Huaiyin Institute of Technology, Huai’an 223003, PR China;
b. Department of Foundation Courses, Huai’an College of Information Technology, Huai’an 223003, PR China
Abstract: In this paper, the well-known Behrens-Fisher problem is revisited, and an exact and feasible method for dealing
with it is given by shrinking the larger sample size based on the notion of linear sufficiency. In addition, we apply the main
results to Lehmann’s data and show that the approach used in the paper is acceptable in practice and can serve
as an alternative to other existing methods for handling the Behrens-Fisher problem. Finally, two open questions
are posed.
MSC 2000: 62D05; 62F25; 62H10; 62G05; 62J12
Keywords: Behrens-Fisher problem; Difference between the means; Sampling distribution theorem; Confidence interval;
Hypothesis testing; Linear model; Best linear unbiased estimate; Linear sufficiency; Mutual independence.
1 Introduction
The well-known Behrens-Fisher problem has been studied for several decades, and there have been numerous
contributions to it in the literature; see, for example, [13, 16, 2] and [7, 4]. Later, generalized and multivariate
cases were considered by many authors; cf. [6, 3] and [8, 17, 9], respectively, and the references therein. The
problem concerns confidence intervals and hypothesis tests for the difference between the means of two normal
populations when the ratio of their variances is not assumed to be known. It remains an open problem: it has
not yet received a complete solution in the strict sense. The assertion of [18] that “Though there are many
approximate solutions (such as Welch’s t-test), the problem continues to attract attention as one of the
classic problems in statistics” motivates us to make a further effort on this problem.
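For comparison, Welch’s approximate approach, mentioned in the quotation above, studentizes the difference of the sample means with the two separate sample variances and refers the resulting statistic to a t distribution whose degrees of freedom are estimated by the Welch-Satterthwaite formula. The Python sketch below (standard library only; the function name welch_statistic is a hypothetical choice) computes that statistic and its degrees of freedom. It illustrates the existing approximate method only, not the exact method developed in this paper.

```python
import math
from statistics import mean, variance


def welch_statistic(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for H0: mu_x == mu_y when the two variances are unknown and unequal.

    Illustrative sketch of the classical approximate method (not the
    exact method of this paper)."""
    n1, n2 = len(x), len(y)
    vx = variance(x) / n1  # estimated variance of the sample mean of x
    vy = variance(y) / n2  # estimated variance of the sample mean of y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (n1 - 1) + vy ** 2 / (n2 - 1))
    return t, df
```

A typical call is `t, df = welch_statistic(x, y)`; a two-sided p-value could then be obtained from a t-distribution survival function, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. Because the degrees of freedom are themselves estimated, the resulting test is only approximate, which is exactly the gap the Behrens-Fisher literature tries to close.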
In this paper, we present an exact method for handling the Behrens-Fisher problem based on the theory of
linear models. In the literature, confidence intervals for the difference between the two means are typically
centred at the difference between the respective sample means; the reasons for this are first explained in
Section 2. Afterwards, we derive two new sampling distribution theorems in Section 3, one for the case in which
the sample sizes of the two normal populations are equal and the other for the case in which they are unequal.
The statistical inferences for the Behrens-Fisher problem are then given in Section 4, together with an example
that illustrates the results derived in the paper.
2 Preliminary: best linear unbiased estimate
Suppose $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ are two independent simple random samples drawn from the
normal populations $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$, respectively, where the variances $\sigma_x^2$ and $\sigma_y^2$ are unknown
and their ratio is not assumed to be known. In the literature, the difference between the two sample means,
$\bar{X} - \bar{Y}$, has often been used to make inferences about $\mu_x - \mu_y$ without the reasons being given.
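As a quick numerical check of this setting, the Python sketch below simulates $\bar{X} - \bar{Y}$ from two independent normal samples and lets one compare its empirical mean and variance with $\mu_x - \mu_y$ and $\sigma_x^2/n_1 + \sigma_y^2/n_2$, the exact mean and variance of the difference of two independent normal sample means. The function name simulate_mean_difference, its default number of replications and the parameter values in the usage note are illustrative assumptions, not part of the paper; the sketch shows only the sampling behaviour of $\bar{X} - \bar{Y}$, not its BLUE property.

```python
import random
from statistics import mean, variance


def simulate_mean_difference(mu_x, sigma_x, n1, mu_y, sigma_y, n2,
                             reps=20000, seed=0):
    """Return `reps` Monte Carlo draws of Xbar - Ybar, where
    X_1..X_n1 ~ N(mu_x, sigma_x^2) and Y_1..Y_n2 ~ N(mu_y, sigma_y^2)
    are independent samples (illustrative, hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    diffs = []
    for _ in range(reps):
        # sample mean of n1 draws from N(mu_x, sigma_x^2)
        xbar = sum(rng.gauss(mu_x, sigma_x) for _ in range(n1)) / n1
        # sample mean of n2 draws from N(mu_y, sigma_y^2)
        ybar = sum(rng.gauss(mu_y, sigma_y) for _ in range(n2)) / n2
        diffs.append(xbar - ybar)
    return diffs
```

For example, `diffs = simulate_mean_difference(1.0, 1.0, 10, 0.0, 2.0, 5)` should give `mean(diffs)` close to 1.0 and `variance(diffs)` close to 1/10 + 4/5 = 0.9. The variance depends on both unknown $\sigma_x^2$ and $\sigma_y^2$, which is why studentizing $\bar{X} - \bar{Y}$ is not straightforward when their ratio is unknown.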
Generally, the starting point is presumably that $\bar{X}$ and $\bar{Y}$ are the best linear unbiased estimates (BLUE) of $\mu_x$ and
$\mu_y$, respectively. For this reason, in this section we prove rigorously that $\bar{X} - \bar{Y}$ is just the
∗ This research was supported by Grants HGQ0637 and HGQN0725 and the “Green & Blue Project” Program for 2008 to Cultivate
Young Core Instructors from Huaiyin Institute of Technology.
† Corresponding author. Email address: liuxuqing688@gmail.com (X.Q. Liu).
http://www.paper.edu.cn