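We begin by formulating the following multihypothesis testing problem for a general non-i.i.d. stochastic model. Let $(\Omega, \mathcal{F}, \mathcal{F}_n, \mathsf{P})$, $n \in \mathbb{Z}_+ = \{0, 1, 2, \dots\}$, be a filtered probability space with standard assumptions about the monotonicity of the sub-$\sigma$-algebras $\mathcal{F}_n$. The sub-$\sigma$-algebra $\mathcal{F}_n = \sigma(\mathbf{X}^n)$ of $\mathcal{F}$ is assumed to be generated by the sequence $\mathbf{X}^n = \{X_t, 1 \le t \le n\}$ observed up to time $n$, which is defined on the space $(\Omega, \mathcal{F})$. The hypotheses are $H_i: \mathsf{P} = \mathsf{P}_i$, $i = 0, 1, \dots, N$, where $\mathsf{P}_0, \mathsf{P}_1, \dots, \mathsf{P}_N$ are given probability measures assumed to be locally mutually absolutely continuous, i.e., their restrictions $\mathsf{P}_i^n$ and $\mathsf{P}_j^n$ to $\mathcal{F}_n$ are equivalent for all $1 \le n < \infty$ and all $i, j = 0, 1, \dots, N$, $i \ne j$. Let $Q^n$ be a restriction to $\mathcal{F}_n$ of a $\sigma$-finite measure $Q$ on $(\Omega, \mathcal{F})$. Under $\mathsf{P}_i$ the sample $\mathbf{X}^n = (X_1, \dots, X_n)$ has a joint density $p_{i,n}(\mathbf{X}^n)$ with respect to the dominating measure $Q^n$ for all $n \in \mathbb{N}$, which can be written as
$$p_{i,n}(\mathbf{X}^n) = \prod_{t=1}^{n} f_{i,t}(X_t \mid \mathbf{X}^{t-1}), \tag{30}$$
where $f_{i,t}(X_t \mid \mathbf{X}^{t-1})$, $t \ge 1$, are corresponding conditional densities.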
3.1.1. Asymptotic Optimality of Wald’s SPRT
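Assume first that $N = 1$, i.e., that we are dealing with two hypotheses $H_0$ and $H_1$. In the mid-1940s, Wald [11,12] introduced the Sequential Probability Ratio Test (SPRT) for the sequence of i.i.d. observations $X_1, X_2, \dots$, in which case $f_{i,t}(X_t \mid \mathbf{X}^{t-1}) = f_i(X_t)$ in (30) and the likelihood ratio (LR) $\Lambda_n$ is
$$\Lambda_n = \prod_{t=1}^{n} \frac{f_1(X_t)}{f_0(X_t)}.$$
After $n$ observations have been made, Wald’s SPRT prescribes for each $n \ge 1$:
$$\text{stop and accept } H_1 \text{ if } \Lambda_n \ge A_1; \qquad \text{stop and accept } H_0 \text{ if } \Lambda_n \le A_0; \qquad \text{continue sampling if } A_0 < \Lambda_n < A_1,$$
where $0 < A_0 < 1 < A_1$ are two thresholds.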
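Let $Z_t = \log[f_1(X_t)/f_0(X_t)]$ be the log-likelihood ratio (LLR) for the observation $X_t$, so the LLR for the sample $\mathbf{X}^n$ is the sum
$$\lambda_n = \sum_{t=1}^{n} Z_t, \quad n = 1, 2, \dots$$
Let $a_0 = -\log A_0 > 0$ and $a_1 = \log A_1 > 0$. The SPRT $\delta_* = (T_*, d_*)$ can be represented in the form
$$T_* = \inf\{n \ge 1: \lambda_n \notin (-a_0, a_1)\}, \qquad d_* = \begin{cases} 1 & \text{if } \lambda_{T_*} \ge a_1, \\ 0 & \text{if } \lambda_{T_*} \le -a_0. \end{cases} \tag{32}$$
In the case of two hypotheses, the class of tests (31) is of the form
$$\mathbb{C}(\alpha_0, \alpha_1) = \{\delta = (T, d): \alpha_0(\delta) \le \alpha_0 \ \text{and} \ \alpha_1(\delta) \le \alpha_1\},$$
i.e., it upper-bounds the probabilities of errors of Type 1 (false positive) $\alpha_0(\delta) = \mathsf{P}_0(d = 1)$ and Type 2 (false negative) $\alpha_1(\delta) = \mathsf{P}_1(d = 0)$, respectively.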
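As a minimal illustration of the stopping rule just described, the following Python sketch runs the SPRT on a stream of observations. It assumes a Gaussian mean-shift model, $\mathcal{N}(\theta_0, \sigma^2)$ versus $\mathcal{N}(\theta_1, \sigma^2)$, which is chosen here only for concreteness and is not part of the text; the names `gaussian_llr_increment` and `sprt` are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def gaussian_llr_increment(x: float, theta0: float, theta1: float,
                           sigma: float = 1.0) -> float:
    """LLR increment log[f1(x)/f0(x)] for N(theta1, sigma^2) vs N(theta0, sigma^2).

    The Gaussian model is an assumption made for this example only.
    """
    return (theta1 - theta0) * (x - 0.5 * (theta0 + theta1)) / sigma**2


def sprt(observations: Iterable[float], a0: float, a1: float,
         theta0: float = 0.0, theta1: float = 1.0) -> Tuple[int, Optional[int]]:
    """Stop at the first n with LLR >= a1 (accept H1) or LLR <= -a0 (accept H0).

    Returns (n, decision); decision is None if the data run out
    before either threshold is crossed.
    """
    llr = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        llr += gaussian_llr_increment(x, theta0, theta1)
        if llr >= a1:
            return n, 1
        if llr <= -a0:
            return n, 0
    return n, None


# Usage: each observation equal to 1.0 adds 0.5 to the LLR under the defaults.
stopping_time, decision = sprt([1.0] * 10, a0=1.5, a1=2.0)
print(stopping_time, decision)
```

The sketch works with the LLR and the thresholds $-a_0$, $a_1$ rather than with the LR and $A_0$, $A_1$, which avoids numerical underflow in long runs.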
Wald’s SPRT has an extraordinary optimality property: it minimizes both expected sample sizes (ESS) $\mathsf{E}_0[T]$ and $\mathsf{E}_1[T]$ in the class of sequential (and non-sequential) tests $\mathbb{C}(\alpha_0, \alpha_1)$ with given error probabilities as long as the observations are i.i.d. under both hypotheses. More specifically, Wald and Wolfowitz [13] proved, using a Bayesian approach, that if $\alpha_0 + \alpha_1 < 1$ and thresholds $-a_0$ and $a_1$ can be selected in such a way that $\alpha_0(\delta_*) = \alpha_0$ and $\alpha_1(\delta_*) = \alpha_1$, then the SPRT $\delta_*$ is strictly optimal in class $\mathbb{C}(\alpha_0, \alpha_1)$. A rigorous proof of this fundamental result is tedious and involves several delicate technical details. Alternative proofs can be found in [14,15,16,17,18,19].
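Strict optimality of the SPRT holds if, and only if, the thresholds are selected so that the error probabilities of the SPRT are exactly equal to the prescribed values $\alpha_0$, $\alpha_1$, which is usually impossible. Regardless of this, suppose that thresholds $a_0$ and $a_1$ are so selected that
$$a_0 \sim \log(1/\alpha_1) \quad \text{and} \quad a_1 \sim \log(1/\alpha_0) \quad \text{as } \alpha_{\max} \to 0. \tag{33}$$
Then
$$\mathsf{E}_0[T_*] \sim \frac{|\log \alpha_1|}{I_0}, \qquad \mathsf{E}_1[T_*] \sim \frac{|\log \alpha_0|}{I_1} \quad \text{as } \alpha_{\max} \to 0, \tag{34}$$
where $I_1 = \mathsf{E}_1[Z_1]$ and $I_0 = \mathsf{E}_0[-Z_1]$ are Kullback-Leibler (K-L) information numbers, so that the following asymptotic lower bounds for the ESS are attained by the SPRT:
$$\inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_0[T] \ge \frac{|\log \alpha_1|}{I_0}\,(1 + o(1)), \qquad \inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_1[T] \ge \frac{|\log \alpha_0|}{I_1}\,(1 + o(1)) \quad \text{as } \alpha_{\max} \to 0$$
(cf. [6]). Hereafter $\alpha_{\max} = \max(\alpha_0, \alpha_1)$. The following inequalities for the error probabilities of the SPRT hold in the most general non-i.i.d. case:
$$\alpha_1(\delta_*) \le e^{-a_0}\,[1 - \alpha_0(\delta_*)], \qquad \alpha_0(\delta_*) \le e^{-a_1}\,[1 - \alpha_1(\delta_*)].$$
These bounds can be used to guarantee asymptotic relations (33).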
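The threshold choice and the resulting first-order ESS approximations can be sketched numerically. The Python snippet below assumes the simple design $a_0 = \log(1/\alpha_1)$, $a_1 = \log(1/\alpha_0)$, which by Wald's inequalities for the error probabilities keeps both error probabilities below the prescribed levels, and it assumes a Gaussian mean-shift example, $\mathcal{N}(0,1)$ versus $\mathcal{N}(\theta,1)$, for which both K-L numbers equal $\theta^2/2$. The function name `sprt_design` and the numerical values are hypothetical.

```python
import math


def sprt_design(alpha0: float, alpha1: float, info0: float, info1: float):
    """Thresholds and first-order ESS approximations for the SPRT.

    a0 = log(1/alpha1), a1 = log(1/alpha0) (a conservative choice, assumed here);
    ESS under H0 is approximated by |log alpha1| / I0 and
    ESS under H1 by |log alpha0| / I1.
    """
    a0 = math.log(1.0 / alpha1)
    a1 = math.log(1.0 / alpha0)
    ess0 = a0 / info0
    ess1 = a1 / info1
    return a0, a1, ess0, ess1


theta = 1.0
kl_number = theta**2 / 2  # K-L number for N(0,1) vs N(theta,1), assumed model
a0, a1, ess0, ess1 = sprt_design(alpha0=1e-4, alpha1=1e-3,
                                 info0=kl_number, info1=kl_number)
print(f"a0={a0:.3f}, a1={a1:.3f}, ESS0~{ess0:.1f}, ESS1~{ess1:.1f}")
```

These are first-order approximations only: they ignore the overshoot of the LLR over the thresholds, so actual expected sample sizes differ from them by lower-order terms.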
In the i.i.d. case, by the SLLN, the LLR $\lambda_n$ has the following stability property:
$$\frac{1}{n}\lambda_n \xrightarrow[n \to \infty]{\mathsf{P}_1\text{-a.s.}} I_1, \qquad \frac{1}{n}\lambda_n \xrightarrow[n \to \infty]{\mathsf{P}_0\text{-a.s.}} -I_0. \tag{36}$$
This allows one to conjecture that if in the general non-i.i.d. case the LLR is also stable in the sense that the almost sure convergence conditions (36) are satisfied with some positive and finite numbers $I_1$ and $I_0$, then the asymptotic formulas (34) still hold. In the general case, these numbers represent the local K-L information in the sense that often (while not always) $I_1 = \lim_{n \to \infty} n^{-1}\mathsf{E}_1[\lambda_n]$ and $I_0 = \lim_{n \to \infty} n^{-1}\mathsf{E}_0[-\lambda_n]$. Note, however, that in the general non-i.i.d. case the SLLN does not even guarantee the finiteness of the expected sample sizes $\mathsf{E}_i[T_*]$ of the SPRT, so some additional conditions are needed, such as a certain rate of convergence in the strong law, e.g., complete or quick convergence.
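In 1981, Lai [8] was the first to prove asymptotic optimality of Wald’s SPRT in a general non-i.i.d. case as $\alpha_{\max} \to 0$. While the motivation was near optimality of invariant SPRTs with respect to nuisance parameters, Lai proved a more general result using the $r$-quick convergence concept. Specifically, for $\varepsilon > 0$ and $i = 0, 1$, define the last entry times
$$L_1(\varepsilon) = \sup\{n \ge 1: |n^{-1}\lambda_n - I_1| > \varepsilon\}, \qquad L_0(\varepsilon) = \sup\{n \ge 1: |n^{-1}\lambda_n + I_0| > \varepsilon\}$$
($\sup\{\varnothing\} = 0$) and suppose that $\mathsf{E}_i[L_i(\varepsilon)^r] < \infty$ for some $r > 0$ and every $\varepsilon > 0$, i.e., that the normalized LLR $n^{-1}\lambda_n$ converges $r$-quickly to $I_1$ under $\mathsf{P}_1$ and to $-I_0$ under $\mathsf{P}_0$:
$$\frac{1}{n}\lambda_n \xrightarrow[n \to \infty]{\mathsf{P}_1\text{-}r\text{-quickly}} I_1, \qquad \frac{1}{n}\lambda_n \xrightarrow[n \to \infty]{\mathsf{P}_0\text{-}r\text{-quickly}} -I_0. \tag{37}$$
Strengthening the a.s. convergence (36) into the $r$-quick version (37), Lai [8] established first-order asymptotic optimality of Wald’s SPRT for moments of the stopping time distribution up to order $r$: if thresholds $a_0$, $a_1$ in the SPRT are so selected that $\delta_* \in \mathbb{C}(\alpha_0, \alpha_1)$ and asymptotics (33) hold, then as $\alpha_{\max} \to 0$,
$$\inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_0[T^r] \sim \mathsf{E}_0[T_*^r] \sim \left(\frac{|\log \alpha_1|}{I_0}\right)^{r}, \qquad \inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_1[T^r] \sim \mathsf{E}_1[T_*^r] \sim \left(\frac{|\log \alpha_0|}{I_1}\right)^{r}. \tag{38}$$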
Wald’s ideas have been generalized in many publications to construct sequential tests of composite hypotheses with nuisance parameters when these hypotheses can be reduced to simple ones by the principle of invariance. If $\mathbf{M}_n$ is the maximal invariant statistic and $p_i(\mathbf{M}_n)$ is the density of this statistic under hypothesis $H_i$, then the invariant SPRT is defined as in (32) with the LLR $\lambda_n = \log[p_1(\mathbf{M}_n)/p_0(\mathbf{M}_n)]$. But even if the observations $X_1, X_2, \dots$ are i.i.d., the invariant LLR statistic $\lambda_n$ is no longer a random walk, and Wald’s methods cannot be applied directly. Lai [8] applied the asymptotic optimality property (38) of Wald’s SPRT in the non-i.i.d. case to investigate optimality properties of several classical invariant SPRTs such as the sequential $t$-test, the sequential $T^2$-test, and Savage’s rank-order test.
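In the sequel, we will call the case where the a.s. convergence in the non-i.i.d. model (36) holds with the rate $1/n$ asymptotically stationary. Assume now that (36) is generalized to
$$\frac{\lambda_n}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_1\text{-a.s.}} I_1, \qquad \frac{\lambda_n}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_0\text{-a.s.}} -I_0, \tag{39}$$
where $\psi(t)$ is a positive increasing function. If $\psi(t)$ is not linear, then this case will be referred to as asymptotically non-stationary. A simple example where this generalization is needed is testing $H_0: \theta = 0$ versus $H_1: \theta = \theta_1$ regarding the mean of the normal distribution:
$$X_n = \theta S_n + \xi_n, \quad n \ge 1,$$
where $\{\xi_n\}_{n \ge 1}$ is a zero-mean i.i.d. standard Gaussian sequence, $\xi_n \sim \mathcal{N}(0, 1)$, and $S_n = \sum_{j=0}^{k} c_j n^j$ is a polynomial of order $k \ge 1$. Then
$$\mathsf{E}_1[\lambda_n] = -\mathsf{E}_0[\lambda_n] = \frac{\theta_1^2}{2}\sum_{t=1}^{n} S_t^2 \sim \frac{c_k^2 \theta_1^2}{2(2k+1)}\, n^{2k+1}$$
for large $n$, so $\psi(n) = n^{2k+1}$ and $I_0 = I_1 = c_k^2 \theta_1^2 / [2(2k+1)]$ in (39). This example is of interest for certain practical applications, in particular, for the recognition of ballistic objects and satellites [19].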
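The growth rate in such a polynomial-trend example can be checked numerically. The sketch below assumes the model $X_t = \theta S_t + \xi_t$ with standard Gaussian noise and $S_t = \sum_{j=0}^{k} c_j t^j$, for which the expected LLR under the alternative is $(\theta^2/2)\sum_{t \le n} S_t^2$; it compares this quantity, normalized by $n^{2k+1}$, with the candidate limit $c_k^2\theta^2/[2(2k+1)]$. The function names and the chosen coefficients are hypothetical.

```python
from typing import Sequence


def normalized_information(coeffs: Sequence[float], theta: float, n: int) -> float:
    """Return (theta^2 / 2) * sum_{t<=n} S_t^2 / n^(2k+1).

    S_t = sum_j coeffs[j] * t**j is a polynomial of order k = len(coeffs) - 1
    (assumed model: X_t = theta * S_t + standard Gaussian noise).
    """
    k = len(coeffs) - 1
    total = 0.0
    for t in range(1, n + 1):
        s_t = sum(c * t**j for j, c in enumerate(coeffs))
        total += s_t**2
    return 0.5 * theta**2 * total / n ** (2 * k + 1)


def limiting_information(coeffs: Sequence[float], theta: float) -> float:
    """Candidate limit c_k^2 * theta^2 / (2 * (2k + 1))."""
    k = len(coeffs) - 1
    return theta**2 * coeffs[-1] ** 2 / (2 * (2 * k + 1))


coeffs = [1.0, 2.0]  # S_t = 1 + 2t, a linear trend (k = 1)
for n in (100, 1000, 10000):
    print(n, normalized_information(coeffs, 1.0, n), limiting_information(coeffs, 1.0))
```

For a linear trend the normalizing function is $n^3$ rather than $n$, which is exactly the situation that a nonlinear normalization $\psi(n)$ is meant to cover.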
Tartakovsky et al. [6] generalized Lai’s results for the asymptotically non-stationary case. Write $\Psi(t)$ for the inverse function of $\psi(t)$.
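Theorem 4.
Assume that there exist finite positive numbers $I_0$ and $I_1$ and an increasing nonnegative function $\psi(t)$ such that the $r$-quick convergence conditions
$$\frac{\lambda_n}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_1\text{-}r\text{-quickly}} I_1, \qquad \frac{\lambda_n}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_0\text{-}r\text{-quickly}} -I_0$$
hold. If thresholds $a_0$ and $a_1$ are selected so that $\delta_* \in \mathbb{C}(\alpha_0, \alpha_1)$ and $a_0 \sim |\log \alpha_1|$ and $a_1 \sim |\log \alpha_0|$, then, as $\alpha_{\max} \to 0$,
$$\inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_0[T^r] \sim \mathsf{E}_0[T_*^r] \sim \left[\Psi\!\left(\frac{|\log \alpha_1|}{I_0}\right)\right]^{r}, \qquad \inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_1[T^r] \sim \mathsf{E}_1[T_*^r] \sim \left[\Psi\!\left(\frac{|\log \alpha_0|}{I_1}\right)\right]^{r}. \tag{40}$$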
This theorem implies that the SPRT asymptotically minimizes the moments of the stopping time distribution up to the order r.
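The proof of this theorem is performed in two steps which are related to our previous discussion of the rates of convergence in Section 2. The first step is to obtain the asymptotic lower bounds in class $\mathbb{C}(\alpha_0, \alpha_1)$:
$$\inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_0[T^r] \ge \left[\Psi\!\left(\frac{|\log \alpha_1|}{I_0}\right)\right]^{r}(1 + o(1)), \qquad \inf_{\delta \in \mathbb{C}(\alpha_0, \alpha_1)} \mathsf{E}_1[T^r] \ge \left[\Psi\!\left(\frac{|\log \alpha_0|}{I_1}\right)\right]^{r}(1 + o(1)) \quad \text{as } \alpha_{\max} \to 0.$$
These bounds hold whenever the following right-tail conditions for the LLR are satisfied: for all $\varepsilon > 0$,
$$\lim_{M \to \infty} \mathsf{P}_1\left\{\frac{1}{\psi(M)} \max_{1 \le n \le M} \lambda_n \ge (1 + \varepsilon) I_1\right\} = 0, \qquad \lim_{M \to \infty} \mathsf{P}_0\left\{\frac{1}{\psi(M)} \max_{1 \le n \le M} (-\lambda_n) \ge (1 + \varepsilon) I_0\right\} = 0.$$
Note that by Lemma 1 these conditions are satisfied when the SLLN (39) holds, so the almost sure convergence (39) is sufficient for the lower bounds. However, as we already mentioned, the SLLN for the LLR is not sufficient to guarantee even the finiteness of the SPRT stopping time.
The second step is to show that the lower bounds are attained by the SPRT. To do so, it suffices to impose the following additional left-tail conditions:
$$\sum_{n=1}^{\infty} n^{r-1}\, \mathsf{P}_1\{\lambda_n < (I_1 - \varepsilon)\psi(n)\} < \infty, \qquad \sum_{n=1}^{\infty} n^{r-1}\, \mathsf{P}_0\{-\lambda_n < (I_0 - \varepsilon)\psi(n)\} < \infty$$
for all $\varepsilon > 0$. Since both right-tail and left-tail conditions hold if the normalized LLR $\lambda_n/\psi(n)$ converges $r$-completely to $I_1$ under $\mathsf{P}_1$ and to $-I_0$ under $\mathsf{P}_0$, and since $r$-quick convergence implies $r$-complete convergence (see (13)), we conclude that the assertions (40) hold.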
Remark 4. In the i.i.d. case, Wald’s approach allows us to establish asymptotic equalities (40) with $I_0$ and $I_1$ being K-L information numbers under the sole condition of their finiteness. However, Wald’s approach breaks down in the non-i.i.d. case. Certain generalizations to the case of independent but non-identically distributed and substantially non-stationary observations, extending Wald’s ideas, have been considered in [19,20,21,22]. Theorem 4 covers all these non-stationary models.
Fellouris and Tartakovsky [23] extended previous results on asymptotic optimality of the SPRT to the multistream hypothesis testing problem, in which the observations are sequentially acquired in multiple data streams (or channels or sources). The problem is to test the null hypothesis $H_0$ that none of the $N$ streams is affected against the composite hypothesis $H_B$ that a subset $B \subset \{1, \dots, N\}$ is affected. Two sequential tests were studied in [23]: the Generalized Sequential Likelihood Ratio Test and the Mixture Sequential Likelihood Ratio Test. It has been shown that both tests are first-order asymptotically optimal, minimizing moments of the sample size $\mathsf{E}_0[T^r]$ and $\mathsf{E}_B[T^r]$ for all $B \in \mathcal{P}$ up to order $r$ as $\alpha_{\max} \to 0$ in the class of tests
$$\mathbb{C}_{\mathcal{P}}(\alpha_0, \alpha_1) = \left\{\delta: \mathsf{P}_0(d = 1) \le \alpha_0 \ \text{and} \ \max_{B \in \mathcal{P}} \mathsf{P}_B(d = 0) \le \alpha_1\right\},$$
where $\mathsf{P}_B$ is the distribution of observations under hypothesis $H_B$ and $\mathcal{P}$ is a class of subsets of $\{1, \dots, N\}$ that incorporates available prior information regarding the subset of affected streams, e.g., not more than $K < N$ streams can be affected. The proof is essentially based on the concept of $r$-complete convergence of the LLR with the rate $1/n$. See also Chapter 1 in [5].
3.1.2. Asymptotic Optimality of the Multihypothesis SPRT
We now return to the multihypothesis model with $N > 1$ that we started to discuss at the beginning of this section (see (30) and (31)). The problem of sequential testing of many hypotheses is substantially more difficult than that of testing two hypotheses. For multiple-decision testing problems, it is usually very difficult, if at all possible, to obtain optimal solutions. Finding an optimal non-Bayesian test in the class of tests (31) that minimizes the ESS $\mathsf{E}_i[T]$ for all hypotheses $H_i$, $i = 0, 1, \dots, N$, is not manageable even in the i.i.d. case. For this reason, a substantial part of the development of sequential multihypothesis testing in the 20th century has been directed towards the study of certain combinations of one-sided sequential probability ratio tests when observations are i.i.d. (see, e.g., [24,25,26,27,28,29]).
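We will focus on the following first-order asymptotic criterion: find a multihypothesis test $\delta_* = (T_*, d_*)$ such that for some $r \ge 1$
$$\lim_{\alpha_{\max} \to 0} \frac{\inf_{\delta \in \mathbb{C}(\boldsymbol{\alpha})} \mathsf{E}_i[T^r]}{\mathsf{E}_i[T_*^r]} = 1 \quad \text{for all } i = 0, 1, \dots, N, \tag{41}$$
where $\mathbb{C}(\boldsymbol{\alpha})$ is the class of tests (31), whose error probabilities $\alpha_{ij}(\delta) = \mathsf{P}_i(d = j)$ do not exceed the prescribed values $\alpha_{ij}$, $i \ne j$, and $\alpha_{\max} = \max_{i \ne j} \alpha_{ij}$.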
In 1998, Tartakovsky [4] was the first to consider sequential multiple hypothesis testing problems for general non-i.i.d. stochastic models, following Lai’s idea of exploiting the $r$-quick convergence in the SLLN for two hypotheses. The results were obtained for both discrete- and continuous-time scenarios and for the asymptotically non-stationary case where the LLR processes between hypotheses converge to finite numbers with the rate $1/\psi(t)$. Two multihypothesis tests were investigated: (1) the Rejecting test, which rejects the hypotheses one by one and accepts the last hypothesis that is not rejected, and (2) the Matrix Accepting test, which accepts a hypothesis for which all component SPRTs that involve this hypothesis vote for accepting it. We now proceed with introducing this accepting test, which we will refer to as the Matrix SPRT (MSPRT). In the present article, we do not consider the continuous-time scenarios. Readers interested in continuous time are referred to [4,6,20,22,30].
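Write $\mathcal{N}_0 = \{0, 1, \dots, N\}$ and let $\Lambda_{ij}(n) = p_{i,n}(\mathbf{X}^n)/p_{j,n}(\mathbf{X}^n)$ denote the LR between the hypotheses $H_i$ and $H_j$. For a threshold matrix $(A_{ij})_{i,j \in \mathcal{N}_0}$, with $A_{ij} > 0$ and the $A_{ii}$ immaterial (say 0), define the Matrix SPRT $\delta_*^N = (T_*^N, d_*^N)$, built on $(N+1)N/2$ one-sided SPRTs between the hypotheses $H_i$ and $H_j$, as follows:
$$\text{stop at the first } n \ge 1 \text{ such that, for some } i, \quad \Lambda_{ij}(n) \ge A_{ji} \ \text{ for all } j \ne i, \tag{42}$$
and accept the unique $H_i$ that satisfies these inequalities. Note that for $N = 1$ the MSPRT coincides with Wald’s SPRT.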
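In the following, we omit the superscript $N$ in $\delta_*^N = (T_*^N, d_*^N)$ for brevity. Obviously, with $a_{ji} = \log A_{ji}$ and the LLRs $\lambda_{ij}(n) = \log \Lambda_{ij}(n)$, the MSPRT in (42) can be written as
$$T_* = \inf\{n \ge 1: \lambda_{ij}(n) \ge a_{ji} \ \text{for all } j \ne i \text{ and some } i\}, \tag{43}$$
$$d_* = i \ \text{ for which (43) holds}. \tag{44}$$
Introducing the Markov accepting times for the hypotheses $H_i$ as
$$T_i = \inf\left\{n \ge 1: \min_{j \ne i}\,[\lambda_{ij}(n) - a_{ji}] \ge 0\right\}, \quad i = 0, 1, \dots, N,$$
the test in (43)–(44) can also be written in the following form:
$$T_* = \min_{0 \le j \le N} T_j, \qquad d_* = i \ \text{ if } \ T_* = T_i.$$
Thus, in the MSPRT, each component SPRT is extended until, for some $i$, all $N$ SPRTs involving $H_i$ accept $H_i$.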
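The accepting rule of the MSPRT can be sketched in a few lines of Python. The sketch assumes that the test stops at the first $n$ at which, for some $i$, the LLR $\lambda_{ij}(n)$ reaches the threshold $a_{ji}$ for every $j \ne i$, and that the data are supplied as per-observation log-density increments under each hypothesis; the names `msprt` and `gaussian_log_density` and the Gaussian example are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def gaussian_log_density(x: float, mean: float) -> float:
    """Log density of N(mean, 1) at x (model assumed for the example only)."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def msprt(log_density_rows: Sequence[Sequence[float]],
          thresholds: Sequence[Sequence[float]]) -> Tuple[int, Optional[int]]:
    """Matrix SPRT on precomputed log-density increments.

    log_density_rows[n-1][i]: log (conditional) density of observation n under H_i.
    thresholds[j][i] = a_ji > 0: level that lambda_ij(n) must reach, for every
    j != i, before H_i is accepted. Diagonal entries are ignored.
    Returns (n, i), or (n, None) if the data run out before stopping.
    """
    num_hyp = len(thresholds)
    cum = [0.0] * num_hyp  # cumulative log-likelihood under each hypothesis
    n = 0
    for n, row in enumerate(log_density_rows, start=1):
        for i in range(num_hyp):
            cum[i] += row[i]
        for i in range(num_hyp):
            # lambda_ij(n) = cum[i] - cum[j]
            if all(cum[i] - cum[j] >= thresholds[j][i]
                   for j in range(num_hyp) if j != i):
                return n, i
    return n, None


# Usage: three Gaussian hypotheses with means 0, 1, 2 and common threshold 1.9.
means = [0.0, 1.0, 2.0]
data = [1.0] * 10
rows = [[gaussian_log_density(x, m) for m in means] for x in data]
a = [[1.9] * 3 for _ in range(3)]
print(msprt(rows, a))
```

Because all off-diagonal thresholds are positive, at most one hypothesis can satisfy the accepting inequalities at a given $n$, so returning the first index found is unambiguous.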
Using Wald’s likelihood ratio identity, it is easily shown that $\alpha_{ij}(\delta_*) \le \exp(-a_{ij})$ for $i, j = 0, 1, \dots, N$, $i \ne j$, so selecting $a_{ji} = |\log \alpha_{ji}|$ implies $\delta_* \in \mathbb{C}(\boldsymbol{\alpha})$. These inequalities are similar to Wald’s in the binary hypothesis case and are very imprecise. In his ingenious paper, Lorden [28] showed that with a very sophisticated design that includes accurate estimation of thresholds accounting for overshoots, the MSPRT is nearly optimal in the third-order sense, i.e., it minimizes the ESS for all hypotheses up to an additive vanishing term:
$$\inf_{\delta \in \mathbb{C}(\boldsymbol{\alpha})} \mathsf{E}_i[T] = \mathsf{E}_i[T_*] + o(1), \quad i = 0, 1, \dots, N,$$
as $\alpha_{\max} \to 0$. This result holds only for i.i.d. models with the finite second moment $\mathsf{E}_i[\lambda_{ij}(1)^2] < \infty$. In non-i.i.d. cases (and even in the i.i.d. case for higher moments $r > 1$), there is no way to obtain such a result, so we focus on the first-order optimality (41).
The following theorem establishes asymptotic operating characteristics and optimality of the MSPRT under the $r$-quick convergence of $\lambda_{ij}(n)/\psi(n)$ to finite K-L-type numbers $I_{ij}$, where $\psi(t)$ is a positive increasing function, $\psi(\infty) = \infty$.
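Theorem 5 (MSPRT asymptotic optimality).
Assume that there exist finite positive numbers $I_{ij}$, $i, j = 0, 1, \dots, N$, $i \ne j$, and an increasing nonnegative function $\psi(t)$ such that for some $r > 0$
$$\frac{\lambda_{ij}(n)}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_i\text{-}r\text{-quickly}} I_{ij} \quad \text{for all } i, j = 0, 1, \dots, N, \ i \ne j.$$
Then the following assertions are true.
(i) For all $i = 0, 1, \dots, N$,
$$\mathsf{E}_i[T_*^r] \sim \left[\Psi\!\left(\max_{j \ne i} \frac{a_{ji}}{I_{ij}}\right)\right]^{r} \quad \text{as } \min_{j \ne i} a_{ji} \to \infty.$$
(ii) If the thresholds are so selected that $\alpha_{ij}(\delta_*) \le \alpha_{ij}$ and $a_{ji} \sim |\log \alpha_{ji}|$, in particular as $a_{ji} = |\log \alpha_{ji}|$, then for all $i = 0, 1, \dots, N$
$$\inf_{\delta \in \mathbb{C}(\boldsymbol{\alpha})} \mathsf{E}_i[T^r] \sim \mathsf{E}_i[T_*^r] \sim \left[\Psi\!\left(\max_{j \ne i} \frac{|\log \alpha_{ji}|}{I_{ij}}\right)\right]^{r} \quad \text{as } \alpha_{\max} \to 0.$$
Assertion (ii) implies that the MSPRT asymptotically minimizes the moments of the stopping time distribution up to order $r$ for all hypotheses $H_0, H_1, \dots, H_N$ in the class of tests $\mathbb{C}(\boldsymbol{\alpha})$.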
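To make the first-order approximation concrete, the following Python sketch evaluates the quantity $[\Psi(\max_{j \ne i} |\log \alpha_{ji}| / I_{ij})]^r$ for each hypothesis. This form of the approximation is taken here as an assumption, as are the Gaussian example with means 0, 1, 2 (for which $I_{ij} = (\mu_i - \mu_j)^2/2$), the error levels, and the function name `first_order_moment`.

```python
import math
from typing import Callable, Sequence


def first_order_moment(i: int,
                       alphas: Sequence[Sequence[float]],
                       infos: Sequence[Sequence[float]],
                       r: float = 1.0,
                       psi_inverse: Callable[[float], float] = lambda y: y) -> float:
    """Evaluate [Psi(max_{j != i} |log alpha_ji| / I_ij)]^r.

    alphas[j][i] is the prescribed bound on P_j(accept H_i); infos[i][j] = I_ij.
    psi_inverse is Psi, the inverse of the normalizing function psi;
    the identity corresponds to the asymptotically stationary case psi(n) = n.
    """
    num_hyp = len(alphas)
    level = max(abs(math.log(alphas[j][i])) / infos[i][j]
                for j in range(num_hyp) if j != i)
    return psi_inverse(level) ** r


means = [0.0, 1.0, 2.0]  # assumed Gaussian means under H_0, H_1, H_2
infos = [[(mi - mj) ** 2 / 2 for mj in means] for mi in means]
alphas = [[1e-3] * 3 for _ in range(3)]  # diagonal entries are never used

for i in range(3):
    print(i, first_order_moment(i, alphas, infos))
```

In this example the closest competing hypothesis (smallest $I_{ij}$) determines the approximation for every $i$, which reflects the role of the maximum over $j \ne i$.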
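Remark 5.
Both assertions of Theorem 5 are correct under the $r$-complete convergence
$$\frac{\lambda_{ij}(n)}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_i\text{-}r\text{-completely}} I_{ij} \quad \text{for all } i, j = 0, 1, \dots, N, \ i \ne j,$$
i.e., whenever
$$\sum_{n=1}^{\infty} n^{r-1}\, \mathsf{P}_i\left\{\left|\frac{\lambda_{ij}(n)}{\psi(n)} - I_{ij}\right| > \varepsilon\right\} < \infty \quad \text{for all } \varepsilon > 0.$$
While this statement has not been proved anywhere so far, it can be easily proved using the methods developed for multistream hypothesis testing and changepoint detection [5].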
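Remark 6.
As the example given in Subsection 3.4.3 of [6] shows, the $r$-quick convergence conditions in Theorem 5 (or corresponding $r$-complete convergence conditions for LLR processes) cannot be generally relaxed into the almost sure convergence
$$\frac{\lambda_{ij}(n)}{\psi(n)} \xrightarrow[n \to \infty]{\mathsf{P}_i\text{-a.s.}} I_{ij} \quad \text{for all } i, j = 0, 1, \dots, N, \ i \ne j. \tag{50}$$
However, the following weak asymptotic optimality result holds for the MSPRT under the a.s. convergence: if the a.s. convergence (50) holds with the power function $\psi(t) = t^k$, $k > 0$, then for every $0 < \varepsilon < 1$,
$$\inf_{\delta \in \mathbb{C}(\boldsymbol{\alpha})} \mathsf{P}_i\{T > \varepsilon T_*\} \to 1 \quad \text{as } \alpha_{\max} \to 0 \ \text{ for all } i = 0, 1, \dots, N,$$
whenever thresholds $a_{ji}$ are selected as in Theorem 5(ii).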
Note that several interesting statistical and practical applications of these results to invariant sequential testing and multisample slippage scenarios are discussed in Sections 4.5 and 4.6 of Tartakovsky et al. [6] (see Mosteller [31] and Ferguson [16] for terminology regarding multisample slippage problems).