Abstract
This paper proposes a closed adaptive sequential procedure for selecting a random-sized subset of size $t$ $(>0)$ among $k$ $(\ge t)$ experimental treatments so that the selected subset contains all treatments superior to the control treatment. All the experimental treatments and the control are assumed to produce two binary endpoints, and the procedure is based on those two endpoints. A treatment is considered superior if both of its endpoints are larger than those of the control. While responses across treatments are assumed to be independent, dependence between the endpoints within each treatment is allowed and modeled via an odds ratio. The proposed procedure comprises explicit sampling, stopping, and decision rules. We show that, for any sample size $n$ and any parameter configuration, the sequential procedure maintains the same probability of correct selection as the corresponding fixed-sample-size procedure. We use the bivariate binomial and multinomial distributions in the computations and derive design parameters under three scenarios: (i) independent endpoints, (ii) dependent endpoints with known association, and (iii) dependent endpoints with unknown association. We provide tables with the sample size savings achieved by the proposed procedure compared to its fixed-sample-size counterpart. Examples are given to illustrate the procedure.
1. Introduction
In Phase II clinical trials, new treatments are evaluated with respect to both efficacy and safety. We propose a closed adaptive sequential procedure for comparing experimental treatments against a control treatment. Each treatment, including the control, is assessed using two binary endpoints: one for efficacy and one for safety. An experimental treatment is considered superior to the control if it demonstrates higher success probabilities on both endpoints. Let the control treatment be denoted by $\pi_0$ and the $k$ experimental treatments by $\pi_1, \dots, \pi_k$. The outcomes for each treatment consist of two binary endpoints that are modeled marginally as Bernoulli random variables with unknown success probabilities. Specifically, for the control, the success probabilities are denoted by $p_{01}$ and $p_{02}$, and for each experimental treatment $\pi_i$, the corresponding success probabilities are $p_{i1}$ and $p_{i2}$. Comparisons are made between each experimental treatment and the control by evaluating whether $p_{i1} > p_{01}$ and $p_{i2} > p_{02}$. The proposed procedure incorporates curtailment, allowing for early termination of sampling when sufficient evidence has been gathered to make a decision, thus potentially stopping before reaching the maximum sample size $N$. The term closed refers to the presence of this upper limit $N$, while the procedure is adaptive in the sense that the decision to continue sampling depends dynamically on the outcomes observed thus far.
The problem of selecting the best among $k$ $(\ge 2)$ treatments with Bernoulli outcomes, or comparing $k$ Bernoulli experiments with each other and with a control or standard, has a long-standing history in selection theory, particularly with applications in medical trials and the pharmaceutical industry. Sobel and Huyett (1957) proposed a fixed-sample-size procedure for identifying the best Bernoulli population based on the Indifference Zone approach. Later, Gupta and Sobel (1958) introduced a Subset Selection method to select a group that includes the best Bernoulli population. Dunnett (1984) and Thall, Simon, and Ellenberg (1988, 1989) also employed the Indifference Zone framework to select the best among $k$ Bernoulli experiments and compare the selected treatment with a control or standard. Notably, all of these works considered only a single Bernoulli endpoint and were formulated under fixed-sample-size, non-curtailed sampling schemes.
In contrast, this paper investigates a curtailed selection procedure involving treatments with two Bernoulli endpoints. In a curtailed procedure, the experimenter (1) sets an upper limit $n$ on the number of observations per treatment, and (2) continues sampling sequentially from each treatment until either there is sufficient evidence that a treatment is no longer a contender, or the maximum sample size $n$ is reached. The notion of a “contending treatment” is formally defined in Procedure R (Section 4). This early stopping mechanism allows for a potential reduction in the total number of observations required.
Curtailment has previously been applied in clinical trials with Bernoulli outcomes, primarily in the context of hypothesis testing (e.g., Carsten and Chen, 2016) and selection procedures (e.g., Bechhofer and Kulkarni, 1982; Jennison, 1983; Buzaianu and Chen, 2008). However, these works are limited to the case of a single Bernoulli endpoint. The procedure proposed in this paper extends the concept of curtailment to treatment comparisons involving two Bernoulli endpoints. For related work, see Jennison and Turnbull (1993) for normally distributed outcomes, and Bryant and Day (1995), Conway and Petroni (1995), and Chen and Chi (2012) for designs with binary outcomes.
More recently, Buzaianu et al. (2022) discussed a curtailed procedure for subset selection involving two Bernoulli endpoints. However, their approach compares each experimental treatment to a well-established standard treatment. This design is most appropriate when a widely accepted reference treatment exists. In contrast, our procedure compares new treatments against a control treatment, which may or may not be a recognized standard.
There are many situations in which our approach is more applicable. For example, in the absence of a universally accepted standard treatment—such as when placebo is the only baseline option—it becomes necessary to evaluate new treatments in relation to a control. Similarly, even when a standard treatment exists, it may have been validated only in limited populations (e.g., specific age groups, races, or genders). In such cases, it is important to assess whether the standard treatment continues to perform well in broader or different patient populations. Our design, which explicitly includes the control treatment in the experiment, enables such comparisons and provides a flexible and inclusive framework for decision-making.
This paper addresses the two-endpoint problem using a subset selection approach. We introduce a curtailed, closed sequential procedure in which the total number of observations drawn from each of the contending populations (including the control) is a bounded random variable. We assume that the time between treatment administration and observation of the response is short relative to the overall duration of the experiment. The proposed curtailed procedure uses the fixed-sample-size method as a reference, which will be described in the following sections. We demonstrate that this closed sequential procedure maintains the same probability of correct selection as the fixed-sample-size procedure, while reducing the number of observations drawn from inferior treatments. Section 2 outlines the assumptions, objectives, and probability requirements for two-endpoint clinical trials. Section 3 presents the fixed-sample-size procedure that serves as the benchmark for evaluating the performance of the proposed curtailed procedure.
In Section 4, we propose a sequential selection procedure with curtailment to achieve our objective. We show that the probability of correct selection under the proposed procedure is equal to that of the corresponding fixed-sample-size procedure, uniformly over the parameter space. In Section 5, we evaluate the performance of the proposed curtailed procedure in comparison to its non-curtailed counterpart, with respect to expected sample size. Section 6 provides two numerical examples to illustrate the application of the proposed method. Finally, concluding remarks are presented in Section 7.
2. Assumptions, Goal, and Probability Requirements
Suppose that $n$ independent subjects are assigned to a treatment, and that two binary endpoints—typically representing therapeutic efficacy (“response”) and safety (“nontoxicity”)—are observed for each subject. Following the notation of Conway and Petroni (1995), let $x_{ij}$ denote the number of subjects classified as outcome $i$ on the first endpoint and outcome $j$ on the second endpoint, where $i = 1, 2$ and $j = 1, 2$, with 1 representing “success” and 2 representing “failure.” The resulting data can be summarized in a $2 \times 2$ contingency table (see Table 1).
We assume that the random vector $(x_{11}, x_{12}, x_{21}, x_{22})$ follows a multinomial distribution with cell probabilities $(q_{11}, q_{12}, q_{21}, q_{22})$, where:
$q_{11}$ is the probability of success on both endpoints,
$q_{12}$ is the probability of success on endpoint 1 and failure on endpoint 2,
$q_{21}$ is the probability of failure on endpoint 1 and success on endpoint 2,
$q_{22}$ is the probability of failure on both endpoints.
Let $X_1 = x_{11} + x_{12}$ and $X_2 = x_{11} + x_{21}$ represent the numbers of successes on endpoints 1 and 2, respectively. The marginal probabilities of success are given by $p_1 = q_{11} + q_{12}$ and $p_2 = q_{11} + q_{21}$, respectively. Consequently, $X_1 \sim \mathrm{Binomial}(n, p_1)$ and $X_2 \sim \mathrm{Binomial}(n, p_2)$. We denote the binomial probability mass function with parameters $n$ and $p$ by $b(x; n, p)$.
The joint distribution of $(X_1, X_2)$ depends not only on $p_1$ and $p_2$, but also on the association between the two endpoints. To quantify this association, we use the odds ratio $\psi = q_{11} q_{22} / (q_{12} q_{21})$, which is a natural and widely used measure of association in $2 \times 2$ tables. Notably, $\psi$ is independent of the marginal probabilities $p_1$ and $p_2$. When $\psi = 1$, the two endpoints are independent; $\psi > 1$ indicates a positive association, and $\psi < 1$ indicates a negative association.
Table 1. Classification Table.

First Endpoint | Second Endpoint: 1 | Second Endpoint: 2 | Total
1              | $x_{11}$           | $x_{12}$           | $X_1$
2              | $x_{21}$           | $x_{22}$           | $n - X_1$
Total          | $X_2$              | $n - X_2$          | $n$
In this paper, we compare the two binary endpoints of each experimental treatment to those of a control treatment, whose success probabilities on the efficacy and safety endpoints are denoted by $p_{01}$ and $p_{02}$, respectively. Let $\pi_i$, for $i = 1, \dots, k$, represent the $k$ experimental treatments under investigation, and let $\pi_0$ denote the control treatment. Each treatment is associated with two binary outcomes. To distinguish between the two endpoints within each treatment, we use a second subscript $j$ in the notation, where $j = 1$ corresponds to the efficacy endpoint and $j = 2$ to the safety endpoint. Thus, the success probabilities for treatment $\pi_i$ are denoted by $p_{i1}$ and $p_{i2}$ for the efficacy and safety endpoints, respectively, for $i = 1, \dots, k$. We assume that the treatments are mutually independent, meaning that responses across different treatments are independent. However, responses within a single treatment may exhibit association between the two endpoints. To classify treatments based on their performance, we partition the parameter space using four prespecified constants: $\delta_{01}$, $\delta_{11}$, $\delta_{02}$, and $\delta_{12}$. These constants satisfy the conditions $\delta_{01} < \delta_{11}$ with $\delta_{01} \ge 0$, and $\delta_{02} < \delta_{12}$ with $\delta_{02} \ge 0$. In this framework, a treatment $\pi_i$ is considered ineffective if $p_{i1} \le p_{01} + \delta_{01}$ or $p_{i2} \le p_{02} + \delta_{02}$, and considered effective if $p_{i1} \ge p_{01} + \delta_{11}$ and $p_{i2} \ge p_{02} + \delta_{12}$, where $p_{01}$ and $p_{02}$ are the success probabilities of the control treatment, and we assume that these two probabilities are known prior to conducting the selection procedure. Our objective is to classify the $k$ experimental treatments into two groups: those that are effective and those that are ineffective. We now describe the formal selection goal.
Our Goal: Select a subset consisting of those treatments $\pi_i$ for which $p_{i1} \ge p_{01} + \delta_{11}$ and $p_{i2} \ge p_{02} + \delta_{12}$; that is, include all experimental treatments that demonstrate superiority over the control treatment with respect to both efficacy and safety. If no such treatment exists, that is, if no $\pi_i$ satisfies both $p_{i1} \ge p_{01} + \delta_{11}$ and $p_{i2} \ge p_{02} + \delta_{12}$, then none of the $k$ experimental treatments should be selected.
Our probability requirements: Let $P_1^*$ and $P_2^*$ be pre-specified probability constants with $0 < P_1^* < 1$ and $0 < P_2^* < 1$. Let $CS_1$ denote the event that the selected subset correctly includes all effective treatments, provided such treatments exist; that is, $CS_1$ occurs when every treatment satisfying $p_{i1} \ge p_{01} + \delta_{11}$ and $p_{i2} \ge p_{02} + \delta_{12}$ is included in the selected subset. Similarly, let $CS_2$ denote the event that no treatment is selected when none are truly effective, that is, when $p_{i1} \le p_{01} + \delta_{01}$ or $p_{i2} \le p_{02} + \delta_{02}$ holds for all $i = 1, \dots, k$. The selection procedure is required to satisfy the following probability criteria:
$P(CS_1) \ge P_1^*$ whenever at least one experimental treatment is effective, (2.1)
and
$P(CS_2) \ge P_2^*$ whenever no experimental treatment is effective, (2.2)
where $P_1^*$ and $P_2^*$ are prespecified thresholds that represent the minimum acceptable probabilities for correctly identifying effective treatments and correctly excluding ineffective treatments, respectively.
Remark 2.1:
When effective experimental treatments exist, a correct selection is made if the selected subset includes all such effective treatments. The rationale for selecting a subset—rather than identifying a single best treatment—is that no natural ordering can be established among the pairs of success probabilities unless one endpoint is explicitly prioritized over the other. Since this paper does not assume any preference between the two endpoints, we adopt a subset selection approach.
3. Fixed Sample Size Procedure
In this section, we first present the fixed-sample-size selection procedure, which serves as a reference for the curtailed procedure introduced in Section 4. This fixed-sample-size procedure was derived by Buzaianu et al. (2025). We also include results related to the derivation of the design parameters that ensure the fixed-sample-size procedure satisfies the probability requirements stated in Conditions 2.1 and 2.2.
For prespecified design parameters $n$, $c_1$, and $c_2$, the selection procedure is defined as follows:
Procedure H:
Take $n$ observations from each of the $k$ Bernoulli experimental treatments and the control treatment. Let $X_{i1,n}$ and $X_{i2,n}$ be the numbers of successes on the first and second endpoints of treatment $\pi_i$, $i = 0, 1, \dots, k$. For positive integers $c_1$ and $c_2$, Procedure H is defined as follows:
(1) Include in the selected subset every treatment $\pi_i$ whose success counts $X_{i1,n}$ and $X_{i2,n}$ satisfy the selection condition determined by $c_1$ and $c_2$ (the explicit inequalities, which compare each experimental treatment with the control, are given in Buzaianu et al., 2025);
(2) If there is no treatment $\pi_i$ satisfying the condition in (1), do not select any experimental treatment.
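The decision step of Procedure H is straightforward to code once the selection condition is fixed. The R sketch below shows only the structure: meets_criteria is a placeholder for the selection condition of Procedure H, and meets_criteria_example, which compares each treatment's success counts with the control's through $c_1$ and $c_2$, is an assumed form used for illustration, not the condition derived in Buzaianu et al. (2025).

# Sketch of the decision step of Procedure H.
# X is a (k+1) x 2 matrix of success counts after n observations per treatment;
# row 1 is the control, rows 2..(k+1) are the experimental treatments.
# `meets_criteria` is a placeholder for the selection condition of Procedure H.
procedure_H <- function(X, c1, c2, meets_criteria) {
  k <- nrow(X) - 1
  selected <- which(vapply(seq_len(k), function(i) {
    meets_criteria(X[i + 1, ], X[1, ], c1, c2)   # treatment i versus the control
  }, logical(1)))
  if (length(selected) == 0) {
    message("No experimental treatment is selected (Rule (2)).")
  }
  selected
}

# One plausible form of the selection condition (an assumption, not the paper's
# formula): select treatment i if it exceeds the control's success counts by at
# least c1 on the first endpoint and by at least c2 on the second endpoint.
meets_criteria_example <- function(xi, x0, c1, c2) {
  (xi[1] - x0[1] >= c1) && (xi[2] - x0[2] >= c2)
}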
Typically, ranking and selection problems are solved by obtaining an analytical expression for the probability of a correct selection (PCS) and then finding the least favorable configuration (LFC), that is, the parameter configuration where the PCS is minimized. The design parameters are then obtained by setting the PCS at the LFC to be at least some pre-specified value $P^*$. In this subset selection problem, it was not possible to derive an exact expression for the PCS. Instead, a lower bound for the PCS was derived, along with the parameter configuration that minimizes this bound. Then, if the minimum value of this lower bound is at least $P^*$, the PCS will be at least $P^*$ for any parameter configuration.
We denote by $\Omega_1$ and $\Omega_0$ the parameter configurations under which the lower bounds $B_1$ and $B_2$ of the probabilities of correct selection $P(CS_1)$ and $P(CS_2)$, respectively, were computed. $B_1$ also depends on the odds ratios between the two endpoints of each of the $k$ treatments, while $B_2$ does not. We assume that there is the same association between the two endpoints of each of the $k$ treatments. Three cases were considered: independent endpoints, dependent endpoints with known association, and endpoints with unknown association. When the association is not known, it was shown that the minimum of $B_1$ is attained when the odds ratio is zero. However, numerical computations showed that the sample size varies very little with the odds ratio. Below we state the theorems on the lower bounds $B_1$ and $B_2$ of the probabilities of correct selection $P(CS_1)$ and $P(CS_2)$, respectively, whose proofs were given in Buzaianu et al. (2025).
Case 1:
$\psi_i = 1$, $i = 0, 1, \dots, k$. We first consider the case of two independent endpoints; that is, we assume $\psi_i = 1$ for every treatment. In this case, $X_{i1,n}$ and $X_{i2,n}$ are independent random variables following binomial distributions with parameters $(n, p_{i1})$ and $(n, p_{i2})$, respectively.
Theorem 1.
For fixed $k$, $n$, $p_{01}$, and $p_{02}$, the probability requirements (2.1) and (2.2) are satisfied by choosing values of $c_1$ and $c_2$ for which the lower bounds $B_1$ and $B_2$ derived in Buzaianu et al. (2025) for the independent-endpoints case satisfy $B_1 \ge P_1^*$ and $B_2 \ge P_2^*$ simultaneously.
Case 2:
$\psi_i = \psi$ specified, $i = 0, 1, \dots, k$. We now consider the case in which the two endpoints of each treatment are dependent with a known association.
Theorem 2.
For fixed values of $k$, $n$, $p_{01}$, $p_{02}$, and $\psi$, the probability requirements (2.1) and (2.2) are satisfied by choosing values of $c_1$ and $c_2$ for which the corresponding lower bounds $B_1$ and $B_2$ derived in Buzaianu et al. (2025) satisfy $B_1 \ge P_1^*$ and $B_2 \ge P_2^*$ simultaneously.
Case 3:
$\psi_i$ unspecified for $i = 1, \dots, k$; $\psi_0$ specified. We now consider the case in which the two endpoints of each tested treatment have an unknown association.
Theorem 3.
For fixed $k$, $n$, $p_{01}$, $p_{02}$, and $\psi_0$, the probability requirements (2.1) and (2.2) are satisfied by choosing values of $c_1$ and $c_2$ for which the corresponding lower bounds $B_1$ and $B_2$ derived in Buzaianu et al. (2025), evaluated with the odds ratios of the tested treatments set to zero (see Remark 2), satisfy $B_1 \ge P_1^*$ and $B_2 \ge P_2^*$ simultaneously.
Remark 1:
The lower bound $B_2$ on $P(CS_2)$ depends only on the odds ratio of the control treatment.
Remark 2:
Buzaianu et al. (2025) demonstrated that $B_1$ increases with the odds ratios of the experimental treatments $\psi_i$, for $i = 1, \dots, k$. Therefore, the minimum value of $B_1$ is achieved when the odds ratios of all experimental treatments are zero. To obtain a lower bound for $P(CS_1)$, we evaluate it under the assumption that all tested treatments have odds ratios equal to zero. As a result, the scenario with unspecified odds ratios is effectively handled by considering the scenario in which the odds ratios of the tested treatments are zero.
Remark 3:
Our results are derived under the assumption that the association between the two endpoints is of the same type for each of the k treatments. For example, either the two endpoints are independent for all k treatments, or they are dependent with a known form of association for all k treatments. However, based on the structure of our derivations, scenarios in which treatments exhibit different types of associations between the two endpoints—such as independence for some treatments and unknown dependence for others—can be readily accommodated.
4. Proposed Curtailment Procedure
We propose a curtailed procedure to achieve the objective outlined in Section 2. The proposed procedure, denoted by R, is a sequential method that employs curtailment to reduce the sample size for treatments that are either clearly inferior or sufficiently effective. Let n denote the maximum number of observations per treatment that the experimenter is permitted to collect.
Curtailment Procedure R:
A contending treatment is a treatment that has not been eliminated from the experiment. Procedure R begins with all $k+1$ populations as contending populations. We use a vector-at-a-time sampling rule. By "Step M", where $1 \le M \le n$, we mean that a total of $M$ vectors have been sampled thus far. Let $X_{i1,M}$ and $X_{i2,M}$ respectively denote the numbers of successes on the two endpoints of $\pi_i$ through Step $M$, $i = 0, 1, \dots, k$.
Sampling Rule. We use a vector-at-a-time sampling rule with the following restrictions:
(a) At most $n$ observations can be taken from each of the $k+1$ populations. Observations are taken from each contending treatment one at a time until either the total number of observations from that treatment reaches $n$, or sampling from that treatment is stopped according to condition (b) or (c) below.
(b) At any Step $M$, if the numbers of successes on the two endpoints of treatment $\pi_i$, $X_{i1,M}$ and $X_{i2,M}$, show that $\pi_i$ can no longer satisfy the selection condition of Procedure H regardless of the outcomes of the remaining observations,
then eliminate treatment $\pi_i$ and stop sampling from it.
(c) At any Step $M$, if the numbers of successes on the two endpoints of treatment $\pi_i$, $X_{i1,M}$ and $X_{i2,M}$, show that $\pi_i$ is certain to satisfy the selection condition of Procedure H regardless of the outcomes of the remaining observations,
then stop sampling from treatment $\pi_i$.
Stopping Rule:
Stop the experiment at the first step M when any of the following three conditions is satisfied:
(i) There exists a partition $\{A, B\}$ of the set $\{\pi_1, \dots, \pi_k\}$, with $A$ nonempty, such that sampling from every treatment in $A$ has stopped under Sampling Rule (c) and every treatment in $B$ has been eliminated under Sampling Rule (b);
(ii) For all $i = 1, \dots, k$, treatment $\pi_i$ has been eliminated under Sampling Rule (b);
(iii) $M = n$.
Decision Rule:
(a) If the sampling stops according to (i) of the above Stopping Rule, we include in the selected subset all treatments in $A$.
(b) If the sampling stops according to (ii) of the above Stopping Rule, we declare that no experimental treatment is significantly better than the control treatment $\pi_0$.
(c) If the sampling stops according to (iii) of the above Stopping Rule, we include in the selected subset all treatments $\pi_i$ whose success counts satisfy the selection condition of Procedure H. If the selected subset is an empty set, we declare that no experimental treatment is significantly better than the control treatment $\pi_0$.
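To make the interplay of the sampling, stopping, and decision rules concrete, the following R sketch simulates one run of Procedure R under stated assumptions. The Step-$M$ conditions of Sampling Rules (b) and (c) are not reproduced above, so they enter only as user-supplied placeholder predicates is_eliminated and is_accepted; the data generator rbiv_binary is sketched in Section 5, and meets_criteria_example is the illustrative (assumed) selection condition from the sketch in Section 3. This is an outline of the logic, not the authors' implementation.

# Schematic simulation of one run of Procedure R.  p is a (k+1) x 2 matrix of
# marginal success probabilities (row 1 = control), psi is the common odds
# ratio, and is_eliminated() / is_accepted() are placeholders for the Step-M
# conditions of Sampling Rules (b) and (c), which are not reproduced here.
procedure_R <- function(n, p, psi, c1, c2, is_eliminated, is_accepted) {
  k <- nrow(p) - 1
  X <- matrix(0, k + 1, 2)                    # running success counts
  status <- rep("contending", k)              # one status per experimental treatment
  for (M in seq_len(n)) {
    active <- c(TRUE, status == "contending") # control plus contending treatments
    for (i in which(active)) {                # Sampling Rule: one vector at Step M
      X[i, ] <- X[i, ] + rbiv_binary(1, p[i, 1], p[i, 2], psi)
    }
    for (i in seq_len(k)) {                   # Sampling Rules (b) and (c)
      if (status[i] != "contending") next
      if (is_eliminated(X[i + 1, ], X[1, ], M, n, c1, c2)) {
        status[i] <- "eliminated"
      } else if (is_accepted(X[i + 1, ], X[1, ], M, n, c1, c2)) {
        status[i] <- "accepted"
      }
    }
    if (all(status == "eliminated"))          # Stopping Rule (ii) -> Decision Rule (b)
      return(list(selected = integer(0), stopped_at = M))
    if (!any(status == "contending"))         # Stopping Rule (i)  -> Decision Rule (a)
      return(list(selected = which(status == "accepted"), stopped_at = M))
  }
  # Stopping Rule (iii): M = n.  Decision Rule (c): judge the treatments that are
  # still contending by the (assumed) selection condition; treatments that already
  # stopped under Rule (c) remain selected, eliminated ones remain excluded.
  contending <- which(status == "contending")
  meets <- vapply(contending, function(i)
    meets_criteria_example(X[i + 1, ], X[1, ], c1, c2), logical(1))
  list(selected = sort(c(which(status == "accepted"), contending[meets])),
       stopped_at = n)
}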
Theorem 4.
For given $k$ and $n$, both H and R select the same subset of the $k$ experimental treatments if both use the same $c_1$ and $c_2$. The result holds uniformly over the parameter space.
Proof.
Decision Rule (c) of Procedure R, which is applied if and only if sampling stops according to Stopping Rule (iii), is identical to the decision rule of Procedure H when $M = n$. Thus, the same subset will be selected by R and H for any sampling outcome in which a total of $n$ observations is taken from each of the $k$ treatments. Therefore, we only need to consider the case in which the decision under R is made according to Decision Rule (a) or (b).
Decision Rules (a) and (b) are invoked if and only if sampling stops according to Stopping Rule (i) or (ii), respectively. Note that, whenever the sampling stops due to Stopping Rule (i) or (ii), sampling from every experimental treatment has already ceased under Sampling Rule (b) or (c). When this occurs, the selection status of each experimental treatment under Procedure H is already determined, no matter what the remaining observations would have been.
If sampling stops according to Stopping Rule (i), then under Procedure R, Decision Rule (a) selects the subset $A$, consisting of the treatments for which sampling stopped under Sampling Rule (c).
Now suppose the experiment were to continue as it would under Procedure H. Let $X_{i1,n}$ and $X_{i2,n}$ denote the total numbers of successes for treatment $\pi_i$ at endpoints 1 and 2, respectively, after $n$ observations. Then, by Rule (1) of Procedure H, treatment $\pi_i$ would be selected if $X_{i1,n}$ and $X_{i2,n}$ satisfied the selection condition determined by $c_1$ and $c_2$.
Observe that, by Sampling Rule (c), every treatment in $A$ would satisfy this condition after $n$ observations, while, by Sampling Rule (b), no treatment in $B$ could satisfy it.
Hence, the same subset of treatments would be selected by Procedure H.
Similarly, if sampling stops according to Stopping Rule (ii), then Decision Rule (b) of Procedure R selects no experimental treatment. Since every experimental treatment has been eliminated under Sampling Rule (b), none of them could satisfy the selection condition after $n$ observations, so this is exactly the same decision that would be made by Rule (2) of Procedure H.
This completes the proof of the theorem. □
5. Tables
In this section, we evaluate the performance of the curtailment procedure in terms of sample size savings relative to the corresponding non-curtailment procedure H. We assume the same association structure between the two endpoints for each of the treatments. However, based on our results, parameter derivations for the curtailment procedure can also be extended to scenarios where different treatments exhibit different associations between the two endpoints.
We consider the same specifications that were used by Buzaianu et al. (2025) to create Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 for the fixed-sample-size procedure H. For each case, they first determined the minimum number of observations per treatment, $n$, and the associated critical values $c_1$ and $c_2$ that satisfy the required probability constraints. If multiple combinations met these constraints, the design yielding the highest probability of selecting a single effective treatment was chosen. This $n$ is then used as the maximum number of observations per treatment under the curtailment procedure R. According to Theorem 4, with this choice of $n$, $c_1$, and $c_2$, the curtailment procedure satisfies the same probability requirements as procedure H.
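The search for the design parameters can be organized as a simple grid search over $n$ and the integer pairs $(c_1, c_2)$. The R sketch below assumes the availability of functions B1() and B2() implementing the lower bounds of Theorems 1-3 (their explicit forms appear in Buzaianu et al., 2025); it outlines the search only and is not the code used to produce the tables.

# Schematic grid search for the smallest n (and associated c1, c2) satisfying
# B1(n, c1, c2) >= P1star and B2(n, c1, c2) >= P2star.  B1() and B2() are
# placeholders for the lower bounds of Theorems 1-3.
find_design <- function(P1star, P2star, B1, B2, n_max = 200) {
  for (n in seq_len(n_max)) {
    feasible <- NULL
    for (c1 in 1:n) {
      for (c2 in 1:n) {
        if (B1(n, c1, c2) >= P1star && B2(n, c1, c2) >= P2star) {
          feasible <- rbind(feasible, c(n = n, c1 = c1, c2 = c2))
        }
      }
    }
    if (!is.null(feasible)) {
      # Among the feasible pairs, Buzaianu et al. (2025) retain the design that
      # maximizes the probability of selecting a single effective treatment;
      # here we simply return all feasible pairs for the minimal n.
      return(feasible)
    }
  }
  stop("No design found for n <= n_max")
}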
We denote by $N$ the total number of observations required by the fixed-sample-size procedure to satisfy the probability requirements, where $N = (k+1)n$. This $N$ also serves as an upper bound on the total number of observations under the curtailment procedure. Let $E_1[N]$ and $E_0[N]$ denote the expected total sample sizes for the curtailment procedure R under the configurations $\Omega_1$ and $\Omega_0$ defined in Section 3. These configurations were used to compute the lower bounds $B_1$ and $B_2$ for selecting a correct subset under the alternative and null hypotheses, respectively, and were used to derive the design parameters for the fixed-sample-size procedure H.
We define the average expected sample size under the curtailment procedure as $\bar{E}[N] = (E_1[N] + E_0[N])/2$, following the approach of Thall, Simon, and Ellenberg, to account for performance under both configurations. The quantities $E_1[N]$ and $E_0[N]$ are estimated via simulation (10,000 repetitions), implemented in R. To generate bivariate binary data with marginal probabilities $p_1$, $p_2$ and odds ratio $\psi$, we compute the cell probability $q_{11}$ implied by $(p_1, p_2, \psi)$: for $\psi \ne 1$, $q_{11}$ is the root of the quadratic $(\psi - 1)q_{11}^2 - [1 + (p_1 + p_2)(\psi - 1)]q_{11} + \psi p_1 p_2 = 0$ that yields valid cell probabilities, and for $\psi = 1$, $q_{11} = p_1 p_2$. We then simulate binary outcomes from a $2 \times 2$ table with cell probabilities $(q_{11}, p_1 - q_{11}, p_2 - q_{11}, 1 - p_1 - p_2 + q_{11})$.
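The following R helper implements this generation step. For given marginals $(p_1, p_2)$ and odds ratio $\psi$, the cell probability $q_{11}$ is obtained from the quadratic above, and the four cells are then sampled with rmultinom. The function is a minimal sketch; the handling of $\psi = 1$ and of boundary cases reflects our own implementation choices.

# Generate m pairs of correlated binary outcomes with marginal success
# probabilities p1, p2 and odds ratio psi (psi = 1 gives independence).
rbiv_binary <- function(m, p1, p2, psi) {
  if (psi == 1) {
    q11 <- p1 * p2
  } else {
    a   <- 1 + (p1 + p2) * (psi - 1)
    q11 <- (a - sqrt(a^2 - 4 * psi * (psi - 1) * p1 * p2)) / (2 * (psi - 1))
  }
  cells  <- c(q11, p1 - q11, p2 - q11, 1 - p1 - p2 + q11)   # (1,1),(1,2),(2,1),(2,2)
  counts <- rmultinom(m, size = 1, prob = pmax(cells, 0))   # guard tiny negatives
  idx <- apply(counts, 2, which.max)                        # cell index of each draw
  cbind(endpoint1 = as.integer(idx %in% c(1, 2)),           # success on endpoint 1
        endpoint2 = as.integer(idx %in% c(1, 3)))           # success on endpoint 2
}

# Quick check: the empirical odds ratio should be close to psi.
set.seed(1)
y <- rbiv_binary(1e5, p1 = 0.5, p2 = 0.6, psi = 4)
tab <- table(factor(y[, 1], 0:1), factor(y[, 2], 0:1))
(tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])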
Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 report, for each specification, the total sample size $N$ required by the fixed-sample-size procedure, the expected sample sizes $E_1[N]$ and $E_0[N]$ for the curtailment procedure, and the percentage of observations saved using curtailment, $100(N - \bar{E}[N])/N$. For example, in the first row of Table 2, $N = 243$ and $\bar{E}[N] = 214.18$, a saving of $100(243 - 214.18)/243 \approx 11.86\%$. It is evident that the curtailment procedure R requires substantially fewer observations than procedure H to satisfy the same performance criteria.
Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 also demonstrate that the odds ratio has little impact on the expected sample size when the $\psi$ values are relatively close, under both $\Omega_1$ and $\Omega_0$, for the curtailment procedure. Greater variability is observed under $\Omega_0$. This pattern is consistent with the findings of Chen and Chi (2012), who considered only moderate odds ratios in the context of hypothesis testing, and observed minimal sample size variation under curtailment, with larger variability under the null hypothesis. In our procedure, however, when $\psi$ varies substantially—for example, from 1 to 100—we observe a marked decrease in expected sample size under both $\Omega_1$ and $\Omega_0$. Chen and Chi (2012) did not report average expected sample sizes, but instead presented results under both the null and alternative hypotheses and calculated percentage savings. Their findings indicated modest savings under the alternative and substantial savings under the null, which align with our observations.
Table 2. Design parameters when k = 2.

    | ψ    | n  | c1 | c2 | N   | E1[N]  | E0[N]  | Ē[N]   | % saved
0.4 | 0    | 81 | 14 | 13 | 243 | 195.96 | 232.41 | 214.18 | 11.86
0.4 | 0.01 | 81 | 14 | 13 | 243 | 196.61 | 232.10 | 214.36 | 11.79
0.4 | 0.1  | 78 | 14 | 12 | 234 | 190.27 | 222.84 | 206.55 | 11.73
0.4 | 1    | 77 | 14 | 12 | 231 | 189.83 | 219.01 | 204.42 | 11.51
0.4 | 2    | 75 | 13 | 12 | 225 | 186.30 | 212.83 | 199.56 | 11.30
0.4 | 4    | 75 | 13 | 12 | 225 | 187.11 | 212.41 | 199.76 | 11.22
0.4 | 8    | 71 | 13 | 11 | 213 | 177.07 | 200.73 | 188.90 | 11.32
0.4 | 100  | 69 | 12 | 11 | 207 | 174.27 | 194.16 | 184.22 | 11.01
0.5 | 0    | 80 | 14 | 13 | 240 | 193.65 | 229.38 | 211.52 | 11.87
0.5 | 0.01 | 80 | 14 | 13 | 240 | 193.97 | 229.26 | 211.61 | 11.83
0.5 | 0.1  | 77 | 14 | 12 | 231 | 187.35 | 220.16 | 203.76 | 11.79
0.5 | 1    | 76 | 14 | 12 | 228 | 186.77 | 216.31 | 201.54 | 11.61
0.5 | 2    | 74 | 13 | 12 | 222 | 183.31 | 210.23 | 196.77 | 11.37
0.5 | 4    | 74 | 13 | 12 | 222 | 184.22 | 209.72 | 196.97 | 11.27
0.5 | 8    | 70 | 13 | 11 | 210 | 174.25 | 198.00 | 186.12 | 11.37
0.5 | 100  | 67 | 12 | 11 | 201 | 169.16 | 188.56 | 178.86 | 11.01
0.6 | 0    | 75 | 14 | 12 | 225 | 180.99 | 215.14 | 198.06 | 11.97
0.6 | 0.01 | 75 | 14 | 12 | 225 | 181.04 | 215.12 | 198.08 | 11.97
0.6 | 0.1  | 73 | 13 | 12 | 219 | 177.48 | 208.88 | 193.18 | 11.79
0.6 | 1    | 73 | 13 | 12 | 219 | 179.61 | 207.89 | 193.75 | 11.53
0.6 | 2    | 71 | 14 | 11 | 213 | 173.74 | 202.21 | 187.97 | 11.75
0.6 | 4    | 69 | 13 | 11 | 207 | 170.59 | 195.87 | 183.23 | 11.48
0.6 | 8    | 67 | 12 | 11 | 201 | 167.16 | 189.63 | 178.40 | 11.25
0.6 | 100  | 62 | 12 | 10 | 186 | 155.61 | 174.58 | 165.10 | 11.24
Table 3. Design parameters when k = 2.

    | ψ    | n  | c1 | c2 | N   | E1[N]  | E0[N]  | Ē[N]   | % saved
0.4 | 0    | 87 | 15 | 13 | 261 | 212.04 | 248.41 | 230.23 | 11.79
0.4 | 0.01 | 87 | 15 | 13 | 261 | 212.71 | 248.08 | 230.40 | 11.73
0.4 | 0.1  | 85 | 14 | 13 | 255 | 209.71 | 241.68 | 225.70 | 11.49
0.4 | 1    | 85 | 14 | 13 | 255 | 212.24 | 240.44 | 226.34 | 11.24
0.4 | 2    | 81 | 14 | 12 | 243 | 202.46 | 228.73 | 215.60 | 11.28
0.4 | 4    | 81 | 14 | 12 | 243 | 203.30 | 228.21 | 215.75 | 11.21
0.4 | 8    | 81 | 14 | 12 | 243 | 204.02 | 227.76 | 215.89 | 11.16
0.4 | 100  | 78 | 13 | 12 | 234 | 198.24 | 218.67 | 208.46 | 10.92
0.5 | 0    | 86 | 15 | 13 | 258 | 209.73 | 245.41 | 227.57 | 11.79
0.5 | 0.01 | 86 | 15 | 13 | 258 | 209.97 | 245.28 | 227.63 | 11.77
0.5 | 0.1  | 84 | 14 | 13 | 252 | 206.76 | 238.98 | 222.87 | 11.56
0.5 | 1    | 84 | 14 | 13 | 252 | 209.18 | 237.80 | 223.49 | 11.31
0.5 | 2    | 80 | 14 | 12 | 240 | 199.46 | 226.11 | 212.79 | 11.34
0.5 | 4    | 80 | 14 | 12 | 240 | 200.36 | 225.57 | 212.97 | 11.26
0.5 | 8    | 78 | 13 | 12 | 234 | 196.99 | 219.37 | 208.18 | 11.03
0.5 | 100  | 73 | 13 | 11 | 219 | 185.24 | 204.25 | 194.75 | 11.07
0.6 | 0    | 81 | 15 | 12 | 243 | 196.90 | 231.23 | 214.06 | 11.91
0.6 | 0.01 | 81 | 15 | 12 | 243 | 197.00 | 231.18 | 214.09 | 11.90
0.6 | 0.1  | 81 | 15 | 12 | 243 | 197.70 | 230.81 | 214.25 | 11.83
0.6 | 1    | 78 | 14 | 12 | 234 | 192.70 | 221.34 | 207.02 | 11.53
0.6 | 2    | 78 | 14 | 12 | 234 | 193.54 | 220.89 | 207.22 | 11.45
0.6 | 4    | 77 | 13 | 12 | 231 | 193.19 | 217.16 | 205.18 | 11.18
0.6 | 8    | 74 | 14 | 11 | 222 | 184.30 | 208.86 | 196.58 | 11.45
0.6 | 100  | 70 | 12 | 11 | 210 | 178.72 | 195.91 | 187.32 | 10.80
Table 4. Design parameters when k = 2.

    | ψ    | n  | c1 | c2 | N   | E1[N]  | E0[N]  | Ē[N]   | % saved
0.4 | 0    | 96 | 15 | 14 | 288 | 237.24 | 272.11 | 254.67 | 11.57
0.4 | 0.01 | 96 | 15 | 14 | 288 | 237.93 | 271.76 | 254.85 | 11.51
0.4 | 0.1  | 96 | 15 | 14 | 288 | 239.33 | 271.17 | 255.25 | 11.37
0.4 | 1    | 92 | 15 | 13 | 276 | 231.39 | 258.66 | 245.02 | 11.22
0.4 | 2    | 92 | 15 | 13 | 276 | 232.31 | 258.15 | 245.23 | 11.15
0.4 | 4    | 90 | 14 | 13 | 270 | 228.93 | 251.92 | 240.42 | 10.95
0.4 | 8    | 90 | 14 | 13 | 270 | 229.75 | 251.42 | 240.58 | 10.90
0.4 | 100  | 85 | 14 | 12 | 255 | 217.44 | 236.74 | 227.09 | 10.94
0.5 | 0    | 95 | 15 | 14 | 285 | 234.97 | 269.09 | 252.03 | 11.57
0.5 | 0.01 | 95 | 15 | 14 | 285 | 235.24 | 268.96 | 252.10 | 11.54
0.5 | 0.1  | 93 | 16 | 13 | 279 | 230.01 | 263.14 | 246.57 | 11.62
0.5 | 1    | 91 | 15 | 13 | 273 | 228.35 | 256.03 | 242.19 | 11.29
0.5 | 2    | 91 | 15 | 13 | 273 | 229.28 | 255.49 | 242.39 | 11.21
0.5 | 4    | 89 | 14 | 13 | 267 | 225.99 | 249.27 | 237.63 | 11.00
0.5 | 8    | 89 | 14 | 13 | 267 | 226.93 | 248.71 | 237.82 | 10.93
0.5 | 100  | 82 | 13 | 12 | 246 | 211.21 | 227.98 | 219.59 | 10.73
0.6 | 0    | 89 | 15 | 13 | 267 | 219.34 | 252.27 | 235.80 | 11.68
0.6 | 0.01 | 89 | 15 | 13 | 267 | 219.46 | 252.20 | 235.83 | 11.67
0.6 | 0.1  | 89 | 15 | 13 | 267 | 220.23 | 251.86 | 236.04 | 11.59
0.6 | 1    | 88 | 14 | 13 | 264 | 221.17 | 247.47 | 234.32 | 11.24
0.6 | 2    | 87 | 16 | 12 | 261 | 216.51 | 245.11 | 230.81 | 11.57
0.6 | 4    | 85 | 15 | 12 | 255 | 213.46 | 238.66 | 226.06 | 11.35
0.6 | 8    | 83 | 14 | 12 | 249 | 210.29 | 232.34 | 221.31 | 11.12
0.6 | 100  | 78 | 14 | 11 | 234 | 198.57 | 217.13 | 207.85 | 11.17
Table 5. Design parameters when k = 3.

    | ψ    | n  | c1 | c2 | N   | E1[N]  | E0[N]  | Ē[N]   | % saved
0.4 | 0    | 95 | 16 | 15 | 380 | 308.89 | 362.28 | 335.59 | 11.69
0.4 | 0.01 | 94 | 17 | 14 | 376 | 305.01 | 357.95 | 331.48 | 11.84
0.4 | 0.1  | 93 | 17 | 14 | 372 | 303.28 | 354.08 | 328.68 | 11.65
0.4 | 1    | 91 | 16 | 14 | 364 | 300.94 | 344.40 | 322.67 | 11.35
0.4 | 2    | 89 | 15 | 14 | 356 | 296.73 | 335.82 | 316.27 | 11.16
0.4 | 4    | 88 | 15 | 14 | 352 | 293.36 | 331.98 | 312.67 | 11.17
0.4 | 8    | 86 | 14 | 14 | 344 | 288.88 | 323.93 | 306.41 | 10.93
0.4 | 100  | 82 | 14 | 13 | 328 | 277.11 | 307.69 | 292.40 | 10.85
0.5 | 0    | 92 | 17 | 14 | 368 | 297.22 | 351.12 | 324.17 | 11.91
0.5 | 0.01 | 92 | 17 | 14 | 368 | 297.68 | 351.02 | 324.35 | 11.86
0.5 | 0.1  | 92 | 17 | 14 | 368 | 299.18 | 350.45 | 324.82 | 11.73
0.5 | 1    | 90 | 16 | 14 | 360 | 296.63 | 340.58 | 318.61 | 11.50
0.5 | 2    | 88 | 15 | 14 | 352 | 292.18 | 332.41 | 312.30 | 11.28
0.5 | 4    | 87 | 15 | 14 | 348 | 289.80 | 328.60 | 309.20 | 11.15
0.5 | 8    | 85 | 16 | 13 | 340 | 282.10 | 320.37 | 301.24 | 11.40
0.5 | 100  | 80 | 14 | 13 | 320 | 270.15 | 300.19 | 285.17 | 10.88
0.6 | 0    | 88 | 16 | 14 | 352 | 284.93 | 335.77 | 310.35 | 11.83
0.6 | 0.01 | 88 | 16 | 14 | 352 | 285.05 | 335.56 | 310.30 | 11.85
0.6 | 0.1  | 86 | 15 | 14 | 344 | 280.29 | 327.42 | 303.86 | 11.67
0.6 | 1    | 86 | 17 | 13 | 344 | 279.82 | 326.74 | 303.28 | 11.84
0.6 | 2    | 85 | 16 | 13 | 340 | 279.78 | 321.82 | 300.80 | 11.53
0.6 | 4    | 82 | 15 | 13 | 328 | 271.86 | 309.96 | 290.91 | 11.31
0.6 | 8    | 80 | 14 | 13 | 320 | 267.28 | 301.55 | 284.42 | 11.12
0.6 | 100  | 74 | 13 | 12 | 296 | 250.71 | 277.28 | 263.99 | 10.81
Table 6. Design parameters when k = 3.

    | ψ    | n   | c1 | c2 | N   | E1[N]  | E0[N]  | Ē[N]   | % saved
0.4 | 0    | 101 | 17 | 15 | 404 | 330.13 | 383.44 | 356.78 | 11.69
0.4 | 0.01 | 101 | 17 | 15 | 404 | 330.85 | 382.99 | 356.92 | 11.65
0.4 | 0.1  | 101 | 17 | 15 | 404 | 333.10 | 382.08 | 357.59 | 11.49
0.4 | 1    | 98  | 16 | 15 | 392 | 326.96 | 369.43 | 348.19 | 11.18
0.4 | 2    | 97  | 17 | 14 | 388 | 323.57 | 364.62 | 344.09 | 11.32
0.4 | 4    | 95  | 16 | 14 | 380 | 319.33 | 356.19 | 337.76 | 11.12
0.4 | 8    | 94  | 16 | 14 | 376 | 316.11 | 352.35 | 334.23 | 11.11
0.4 | 100  | 91  | 15 | 14 | 364 | 309.03 | 340.14 | 324.59 | 10.83
0.5 | 0    | 99  | 17 | 15 | 396 | 323.11 | 376.07 | 349.59 | 11.72
0.5 | 0.01 | 99  | 17 | 15 | 396 | 323.68 | 375.74 | 349.71 | 11.69
0.5 | 0.1  | 99  | 17 | 15 | 396 | 325.23 | 375.14 | 350.18 | 11.57
0.5 | 1    | 97  | 16 | 15 | 388 | 322.80 | 365.79 | 344.29 | 11.26
0.5 | 2    | 96  | 17 | 14 | 384 | 319.40 | 361.15 | 340.28 | 11.39
0.5 | 4    | 93  | 16 | 14 | 372 | 311.32 | 349.43 | 330.38 | 11.19
0.5 | 8    | 93  | 16 | 14 | 372 | 312.71 | 348.76 | 330.74 | 11.09
0.5 | 100  | 86  | 15 | 13 | 344 | 291.61 | 320.66 | 306.13 | 11.01
0.6 | 0    | 94  | 17 | 14 | 376 | 306.26 | 357.10 | 331.68 | 11.79
0.6 | 0.01 | 94  | 17 | 14 | 376 | 306.41 | 356.78 | 331.60 | 11.81
0.6 | 0.1  | 94  | 17 | 14 | 376 | 307.53 | 356.55 | 332.04 | 11.69
0.6 | 1    | 92  | 16 | 14 | 368 | 305.01 | 347.17 | 326.09 | 11.39
0.6 | 2    | 91  | 16 | 14 | 364 | 302.20 | 343.25 | 322.72 | 11.34
0.6 | 4    | 90  | 15 | 14 | 360 | 301.71 | 338.37 | 320.04 | 11.10
0.6 | 8    | 87  | 16 | 13 | 348 | 290.26 | 326.94 | 308.60 | 11.32
0.6 | 100  | 84  | 15 | 13 | 336 | 284.41 | 313.77 | 299.09 | 10.98
Table 7. Design parameters when k = 3.

    | ψ    | n   | c1 | c2 | N   | E1[N]  | E0[N]  | Ē[N]   | % saved
0.4 | 0    | 111 | 18 | 16 | 444 | 365.66 | 419.29 | 392.48 | 11.60
0.4 | 0.01 | 111 | 18 | 16 | 444 | 366.71 | 418.74 | 392.73 | 11.55
0.4 | 0.1  | 110 | 17 | 16 | 440 | 366.62 | 413.66 | 390.14 | 11.33
0.4 | 1    | 108 | 18 | 15 | 432 | 361.99 | 404.50 | 383.24 | 11.29
0.4 | 2    | 106 | 17 | 15 | 424 | 358.00 | 395.96 | 376.98 | 11.09
0.4 | 4    | 106 | 17 | 15 | 424 | 359.44 | 395.26 | 377.35 | 11.00
0.4 | 8    | 104 | 16 | 15 | 416 | 354.66 | 387.29 | 370.97 | 10.82
0.4 | 100  | 101 | 17 | 14 | 404 | 343.97 | 375.04 | 359.50 | 11.01
0.5 | 0    | 109 | 19 | 15 | 436 | 357.96 | 411.50 | 384.73 | 11.76
0.5 | 0.01 | 109 | 19 | 15 | 436 | 358.33 | 411.49 | 384.91 | 11.72
0.5 | 0.1  | 109 | 19 | 15 | 436 | 360.19 | 410.60 | 385.40 | 11.61
0.5 | 1    | 107 | 18 | 15 | 428 | 358.00 | 401.18 | 379.59 | 11.31
0.5 | 2    | 105 | 17 | 15 | 420 | 354.04 | 392.32 | 373.18 | 11.15
0.5 | 4    | 104 | 17 | 15 | 416 | 351.23 | 388.19 | 369.71 | 11.13
0.5 | 8    | 103 | 16 | 15 | 412 | 350.99 | 383.49 | 367.24 | 10.86
0.5 | 100  | 97  | 16 | 14 | 388 | 332.10 | 359.83 | 345.97 | 10.83
0.6 | 0    | 104 | 18 | 15 | 416 | 341.64 | 392.59 | 367.11 | 11.75
0.6 | 0.01 | 103 | 17 | 15 | 412 | 340.19 | 388.10 | 364.14 | 11.62
0.6 | 0.1  | 103 | 17 | 15 | 412 | 341.38 | 387.88 | 364.63 | 11.50
0.6 | 1    | 102 | 19 | 14 | 408 | 337.08 | 383.75 | 360.41 | 11.66
0.6 | 2    | 100 | 18 | 14 | 400 | 332.66 | 375.32 | 353.99 | 11.50
0.6 | 4    | 98  | 17 | 14 | 392 | 329.37 | 366.73 | 348.05 | 11.21
0.6 | 8    | 97  | 16 | 14 | 388 | 328.98 | 361.58 | 345.28 | 11.01
0.6 | 100  | 91  | 16 | 13 | 364 | 309.59 | 337.64 | 323.62 | 11.09
6. Examples
6.1. Immunotherapy in Elderly Patients with Non-Small Cell Lung Cancer
This example considers an experimental trial involving two immunotherapy-based treatments for elderly patients ($\ge 75$ years old) diagnosed with advanced non-small cell lung cancer (NSCLC). The trial compares two immunotherapy strategies—PD1-A (anti-PD-1 monotherapy) and PD1-B (anti-PD-1 combined with low-dose chemotherapy)—against the standard chemotherapy regimen consisting of carboplatin and pemetrexed, which serves as the control treatment.
While carboplatin plus pemetrexed is considered the standard of care in general NSCLC populations, this regimen has not been adequately studied in patients aged 75 and above. As a result, its efficacy and safety profile in this elderly subgroup remain uncertain. Historical data from younger NSCLC populations provide the objective response rate (ORR) and the rate of grade 3 or higher treatment-related adverse events for this standard chemotherapy; these outcomes establish the benchmark efficacy and safety rates $p_{01}$ and $p_{02}$ for the control treatment. Prior analyses in younger patients suggest an odds ratio of approximately 2 between efficacy and safety, indicating that patients who do not experience toxicity are more likely to respond to treatment.
The goal of the trial is to evaluate whether either PD1-A or PD1-B is superior to the control treatment in terms of both efficacy and safety. Specifically, the experimenter seeks a prespecified minimum increase in the response rate and a prespecified minimum reduction in high-grade toxicity, corresponding to the margins $\delta_{11}$ and $\delta_{12}$. If both experimental treatments fail to demonstrate improvements over the control, the standard chemotherapy will be selected, with thresholds $\delta_{01}$ and $\delta_{02}$.
With these specifications and an odds ratio of 2, Table 3 shows that the fixed-sample-size procedure requires $n$ observations per treatment, with corresponding critical values $c_1$ and $c_2$. Therefore, the total number of observations required for the fixed-sample-size procedure to satisfy the probability constraints is $N = 3n$. In contrast, the curtailment procedure, while also using at most $n$ observations per treatment and the same critical values $c_1$ and $c_2$, is expected to achieve the same probability guarantees with fewer observations on average. According to Table 3, the expected relative sample size saving from using the curtailment procedure is approximately 11%.
6.2. Chemotherapy of Acute Leukemia
This example involves an experimental trial comparing two different combinations of Gemcitabine and Cyclophosphamide—denoted as GemCy1 and GemCy2—each with varying dosage proportions, against the standard Ara-C regimen for treating patients with good-prognosis acute myelogenous leukemia (AML) or myelodysplastic syndrome. Historical data on the standard treatment Ara-C provide the proportion of patients achieving complete remission (CR) and the proportion who either die or experience severe myelosuppression within the first five weeks; these historical outcomes establish the efficacy and safety rates $p_{01}$ and $p_{02}$ for the control treatment. Additionally, the odds ratio between efficacy and safety is estimated to be 4, indicating that patients who do not experience toxicity are more likely to achieve complete remission.
The goal of the trial is to determine whether either GemCy1 or GemCy2 surpasses the control treatment in both efficacy and safety by at least the prespecified margins $\delta_{11}$ and $\delta_{12}$, respectively. If both experimental treatments fail to outperform Ara-C on both endpoints, the control treatment will be selected, with equivalence thresholds set at $\delta_{01}$ and $\delta_{02}$.
With these specifications and an odds ratio of 4, Table 2 shows that the fixed-sample-size procedure requires $n$ observations per treatment, with corresponding critical values $c_1$ and $c_2$. Therefore, the total number of observations required for the fixed-sample-size procedure to satisfy the probability constraints is $N = 3n$. In contrast, the curtailment procedure, while also using at most $n$ observations per treatment and the same critical values $c_1$ and $c_2$, is expected to achieve the same probability guarantees with fewer observations on average. According to Table 2, the expected relative sample size saving from using the curtailment procedure is approximately 11%.
7. Conclusions
This paper considers a curtailment procedure for selecting a random-size subset that contains every experimental treatment that is significantly better than the control treatment. The comparison is made according to the two endpoints associated with the Bernoulli outcomes from each of the $k$ experimental treatments and the control treatment. The proposed procedure is based on the fixed-sample-size procedure defined by Buzaianu et al. (2025). The proposed procedure satisfies the same probability requirements in reaching the selection goal as does the original fixed-sample-size procedure, but requires fewer observations from the experimental treatments. Based on our simulations, using the curtailment procedure instead of the original fixed-sample-size procedure would produce a relative total sample size saving of between approximately 10.7% and 12%. The sampling rule with curtailment is a highly desirable feature, not only because it reduces the overall sample size, but also because it reduces the sample sizes from potentially undesirable populations. However, in order for such a curtailment procedure to be used, it is desirable that the time between application of the treatment and the observation of the response be small compared to the duration of the experiment.
Our simulations also showed that the odds ratios have minimal impact on the sample size; this insensitivity to the odds ratio makes the procedure robust to departures from independence. We only considered cases in which the association between the two endpoints of a treatment is of the same type for all treatments. However, this can be easily relaxed to accommodate situations in which some treatments do not display the same type of association between their two endpoints.
Author Contributions
Conceptualization, Pinyuen Chen and Lifang Hsu; Methodology, Elena Buzaianu and Chishu Yin; Software, Chishu Yin; Validation, Elena Buzaianu, Pinyuen Chen, and Lifang Hsu; Formal analysis, Chishu Yin; Investigation, Elena Buzaianu and Chishu Yin; Resources, Chishu Yin; Data curation, Elena Buzaianu and Chishu Yin; Writing—original draft preparation, Chishu Yin; Writing—review and editing, Elena Buzaianu, Pinyuen Chen, and Lifang Hsu; Visualization, Chishu Yin; Supervision, Pinyuen Chen; Project administration, Lifang Hsu; Funding acquisition, Pinyuen Chen. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Acknowledgments
The authors thank the anonymous reviewers for their valuable comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
Bechhofer, R.E.; Kulkarni, R.V. Closed Adaptive Sequential Procedures for Selecting the Best of k≥2 Bernoulli Populations. In Proceedings of the Third Purdue Symposium in Statistical Decision Theory and Related Topics; Gupta, S.S., Berger, J., Eds.; 1982; Volume I, pp. 61–108.
Bryant, J.; Day, R. Incorporating Toxicity Considerations into the Design of Two-Stage Phase II Clinical Trials. Biometrics 1995, 51, 1372–1383.
Buzaianu, E.M.; Chen, P. Curtailment Procedure for Selecting Among Bernoulli Populations. Commun. Stat. Theory Methods 2008, 37, 1085–1102.
Buzaianu, E.M.; Chen, P.; Hsu, L. A Curtailed Procedure for Selecting Among Treatments With Two Bernoulli Endpoints. Sankhya B 2022, 84, 320–339.
Buzaianu, E.M.; Chen, P.; Hsu, L. Selecting Among Treatments with Two Bernoulli Endpoints. Commun. Stat. Theory Methods 2024, 53, 1964–1984.
Yin, C.; Buzaianu, E.M.; Chen, P.; Hsu, L. A Design for Selecting Among Treatments with Two Binary Endpoints. Unpublished manuscript, 2025.
Carsten, C.; Chen, P. Curtailed Two-Stage Matched Pairs Design in Double-Arm Phase II Clinical Trials. J. Biopharm. Stat. 2016, 26, 816–822.
Conway, M.R.; Petroni, G.R. Bivariate Sequential Designs for Phase II Trials. Biometrics 1995, 51, 656–664.
Chen, C.M.; Chi, Y. Curtailed Two-Stage Designs with Two Dependent Binary Endpoints. Pharm. Stat. 2012, 11, 57–62.
Dunnett, C.W. Selection of the Best Treatment in Comparison to a Control with an Application to a Medical Trial. In Design of Experiments, Ranking, and Selection; Santner, T.J., Tamhane, A.C., Eds.; Marcel Dekker: New York, 1984; pp. 47–66.
Gupta, S.S.; Sobel, M. On Selecting a Subset Which Contains All Populations Better than a Standard. Ann. Math. Stat. 1958, 29, 235–244.
Jennison, C. Equal Probability of Correct Selection for Bernoulli Selection Procedures. Commun. Stat. Theory Methods 1983, 12, 2887–2896.
Jennison, C.; Turnbull, B.W. Group Sequential Tests for Bivariate Response: Interim Analysis of Clinical Trials with Both Efficacy and Safety Endpoints. Biometrics 1993, 49, 741–752.
Sobel, M.; Huyett, M.J. Selecting the Best One of Several Binomial Populations. Bell Syst. Tech. J. 1957, 36, 537–576.