Modified Simon’s minimax and optimal two-stage designs for single-arm phase II cancer clinical trials

Simon’s two-stage design and the admissible two-stage design have been commonly used in practice for single-arm phase II clinical trials when the primary endpoint is binary. The ethical benefit of the two-stage design over the single-stage design is attained by the early termination of the trial when the treatment seems to be inactive. While Simon’s optimal design is the two-stage design that minimizes the expected number of subjects under the null hypothesis, the probability of falsely declaring futility after the first stage frequently seems undesirably high. In Simon’s minimax design, however, it is often the case that a high proportion of the total planned subjects are evaluated in the first stage, and thus the ethical benefit may not be achieved. In this paper, we propose modified minimax and optimal two-stage designs which guarantee not only type I and II error rates but also reasonable sample size proportions in the first stage, while maintaining the probability of falsely declaring futility under a pre-selected level. The characteristics of the modified two-stage designs are compared with those of Simon’s and the admissible two-stage designs. The modified minimax design requires a modest increase of 1 to 3 subjects in 29% of cases, while the modified optimal design saves 1 to 13 subjects in 81% of cases for β = 0.2. The modified design approach provides investigators with an alternative when the sample sizes of Simon’s designs are severely unbalanced or the type II error is unacceptably high after the first stage.


INTRODUCTION
The primary objective of phase II cancer clinical trials is to seek an early indication of anti-tumor activity of a novel treatment and to make a "go/no-go" decision for a larger and more definitive phase III trial. Although the Clinical Trial Design Task Force (CTD-TF) of the National Cancer Institute (NCI) Investigational Drug Steering Committee (IDSC) in general recommended the use of progression-free survival as the primary endpoint and randomization, the CTD-TF acknowledged that the objective response rate as an endpoint and single-arm designs remain relevant in certain situations (Seymour et al. [1]) and such designs remain very common.
The two-stage design for a single-arm phase II clinical trial with a binary endpoint has a history dating back to Gehan [2]. The ethical benefit of the two-stage design over the single-stage design is attained by the early termination of the trial when the treatment seems to be inactive. Simon's two-stage design [3] has been commonly used in practice for single-arm phase II cancer clinical trials when the primary endpoint is binary. Within the framework of the two-stage design, $n_1$ subjects are evaluated in the first stage and the trial is terminated early if the number of responders is less than or equal to $r_1$. If the trial proceeds to the second stage, a total of $n$ subjects will be evaluated and the null hypothesis fails to be rejected if $r$ or fewer responders are observed.

Research Paper www.oncotarget.com
A substantial amount of work has been published concerning two-stage designs with a binary endpoint. Herndon [4] proposed a hybrid two-stage design which allows the continuation of patient accrual while the first-stage data are being analyzed. Ye and Shyr [5] provided a balanced two-stage design which seeks to equalize the sample sizes of the two stages, while maintaining a total sample size comparable with that of Simon's design. This design is, however, not optimal in terms of either the total sample size $n$ or the expected sample size under the null hypothesis. Chi and Chen [6] proposed a two-stage design which allows early termination for both efficacy and futility. Two-stage adaptive designs have been developed by Banerjee and Tsiatis [7] and Lin and Shih [8], and Bayesian two-stage designs by Heitjan [9], Sambucini [10], Tan and Machin [11], and Wang et al. [12]. The two-stage optimal design for phase II trials under the alternative hypothesis was presented by Mander and Thompson [13].
These design approaches, except that of Ye and Shyr [5], do not take into account the balance in sample size between the two stages, and thus the ethical benefit expected from the two-stage design approach may not be achieved if a high proportion of subjects are evaluated in the first stage. In addition, we have observed two-stage designs for which the probability of falsely declaring futility at the first stage is undesirably high, as the design does not place an upper limit on it. Moreover, no additional admissible design exists if the difference in $n$ between the two Simon's designs is less than or equal to 1. To address these concerns, we propose modified minimax and optimal two-stage designs which guarantee not only type I and II error rates but also a reasonable range for the sample size of the first stage, while maintaining the probability of falsely declaring futility after the first stage under a pre-selected level.

Simon's and the admissible two-stage designs
Suppose that $p_0$ and $p_1$ are the success rates under the null and alternative hypotheses, respectively. For given type I and II error rates of α and β, Simon's minimax two-stage design is the design, $(r_1, n_1, r, n)$, which minimizes the total sample size $n$. If multiple solutions, $(r_1, n_1, r, n)$, exist, the design with the minimal expected sample size under the null hypothesis,

$$EN_0 = PET_0 \times n_1 + (1 - PET_0) \times n,$$

is selected as the minimax two-stage design. Herein, $PET_0$ is the probability of early termination under $p_0$ after the first stage,

$$PET_0 = B(r_1 \mid p_0, n_1),$$

where $B(\cdot \mid p, m)$ is the cumulative distribution function of the binomial distribution with success probability $p$ and number of trials $m$. Likewise, Simon's optimal two-stage design is the design which minimizes $EN_0$ under the same constraints used for the minimax design. The optimal design is a two-stage design for which $PET_0$ should be as high as possible and $n_1$ as small as possible. Accordingly, the probability of early termination under $p_1$ ($PET_1$), which corresponds to the type II error spent at the first stage, could be undesirably high, especially for β = 0.2.
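As a minimal sketch of how these operating characteristics can be evaluated for a given design $(r_1, n_1, r, n)$, the Python snippet below computes $PET_0$, $PET_1$, $EN_0$ and the two error rates directly from binomial probabilities. The function names are illustrative assumptions and do not come from the original work; the design (0, 9, 2, 17) with $(p_0, p_1)$ = (0.05, 0.25) is the one quoted in the comparisons reported in this paper.

```python
from math import comb


def binom_pmf(k, m, p):
    """P(X = k) for X ~ Binomial(m, p)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def binom_cdf(k, m, p):
    """B(k | p, m) = P(X <= k) for X ~ Binomial(m, p)."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, m, p) for i in range(min(k, m) + 1))


def reject_prob(r1, n1, r, n, p):
    """Probability of rejecting H0 when the true response rate is p:
    the trial continues past stage 1 (X1 > r1) and more than r
    responders are observed among all n subjects."""
    n2 = n - n1
    return sum(
        binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )


def characteristics(r1, n1, r, n, p0, p1):
    """Operating characteristics of the two-stage design (r1, n1, r, n)."""
    pet0 = binom_cdf(r1, n1, p0)  # early termination under H0
    pet1 = binom_cdf(r1, n1, p1)  # type II error spent at stage 1
    return {
        "type_I": reject_prob(r1, n1, r, n, p0),
        "type_II": 1.0 - reject_prob(r1, n1, r, n, p1),
        "PET0": pet0,
        "PET1": pet1,
        "EN0": pet0 * n1 + (1.0 - pet0) * n,
    }


if __name__ == "__main__":
    for name, value in characteristics(0, 9, 2, 17, 0.05, 0.25).items():
        print(f"{name}: {value:.4f}")
```

With $r_1 = 0$ the early-termination probabilities reduce to $(1 - p)^{n_1}$, which gives a quick hand check of the output.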
The admissible two-stage design by Jung et al. [14] is the design which minimizes the Bayes loss or risk function, $q \times n + (1 - q) \times EN_0$ with a weight $q \in [0, 1]$, under the same constraints as used in Simon's design. Simon's minimax and optimal designs are equal to the admissible two-stage designs with $q = 1$ and $q = 0$, respectively. Thus, no additional admissible design exists if the difference in $n$ between the two Simon's designs is less than or equal to 1.
As Simon's designs and the admissible designs do not take into account the balance in sample size and type II error between the two stages, a severe imbalance in the sample size or in the type II error is often observed. For example, with design parameters $(p_0, p_1, \alpha, \beta)$ = (0.7, 0.9, 0.05, 0.2), 23 of 26 (88%) and 6 of 27 (22%) subjects will be evaluated in the first stage by Simon's minimax and optimal designs, respectively, and no additional admissible two-stage design is available. The type II errors spent in the first stage by Simon's minimax and optimal designs are 19.3% and 11.4%, respectively. For Simon's minimax design with design parameters $(p_0, p_1, \alpha, \beta)$ = (0.5, 0.65, 0.05, 0.2), 66 of 68 subjects (97%) will be evaluated in the first stage, while Simon's optimal design requires 15 additional subjects. The type II errors spent at the first stage by Simon's minimax and optimal designs are as high as 18.9% and 14.3%, respectively. Other examples are discussed in the Examples section.

Modified minimax and optimal two-stage designs
We propose the modified minimax two-stage design for single-arm phase II clinical trials, which is the solution, $(r_1, n_1, r, n)$, to the integer optimization problem

minimize $n$ subject to Type I error ≤ α, Type II error ≤ β,

$$PET_1 = B(r_1 \mid p_1, n_1) \le \varepsilon, \qquad (1)$$

$$\lambda_1 \le n_1 / n \le \lambda_2. \qquad (2)$$

The two aforementioned drawbacks of Simon's design are addressed by the two additional constraints (1) and (2). With appropriate values of $\lambda_1$, $\lambda_2$, and ε, a pre-selected proportion of subjects will be evaluated in the first stage and the probability of falsely declaring futility at the first stage will be less than or equal to ε ≤ β. As ε, the maximum type II error allowed at the first stage, gets close to β, the impact of constraint (1) diminishes. Likewise, the modified optimal two-stage design is the solution which minimizes $EN_0$ under the same constraints. Note that the modified two-stage design matches Simon's design whenever Simon's design satisfies constraints (1) and (2).

Investigators may choose different values of $\lambda_1$, $\lambda_2$, and ε, depending on their purpose. For instance, $\lambda_1$ = 1/4 and $\lambda_2$ = 1/2 could be selected if one wants to conduct the interim analysis with 25% to 50% of the planned information to decide whether to open the second stage. The optimal timing of interim analyses for confirmatory clinical trials has been examined by Lawrence Gould [15] and Togo and Iwasaki [16]. Lawrence Gould recommended that the interim analysis for futility in randomized two-arm 'proof of concept' trials be carried out after accumulating at least 40% of the planned observations. As Lawrence Gould pointed out, if the interim analysis for futility is carried out with too little data, it is not conclusive enough to support the decision. Conversely, little benefit will be gained if the interim analysis is conducted with too much data.
In this paper, $\lambda_1$ = 1/3, $\lambda_2$ = 2/3 and ε = 0.1 are selected as practical bounds, so that 33% to 67% of subjects will be evaluated in the first stage, the decision is made with a reasonable amount of data, and $PET_1$ is controlled at or below 0.1. For β ≤ 0.1, constraint (1) has no impact on the search for the solution. When β is chosen to be > 0.1, however, constraint (1) guarantees that the probability of falsely declaring futility after the first stage is at most 10%. For β = 0.2, a common choice, the modified design is well balanced in terms of type II error as well as sample size between the two stages. Simon's and the admissible designs were computed through Dr. Anastasia Ivanova's website [17].
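The search for a modified minimax design can be sketched as a brute-force enumeration. This is only an illustration under stated assumptions: the paper does not describe its search algorithm; constraint (1) is read as $PET_1 \le \varepsilon$ and constraint (2) as $\lambda_1 \le n_1/n \le \lambda_2$; ties at the smallest feasible $n$ are broken by the smallest $EN_0$, as for Simon's minimax design; and the function names and the `n_max` cap are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf(k, m, p):
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def cdf(k, m, p):
    if k < 0:
        return 0.0
    return sum(pmf(i, m, p) for i in range(min(k, m) + 1))


def reject_prob(r1, n1, r, n, p):
    """P(X1 > r1 and X1 + X2 > r) when the response rate is p."""
    return sum(
        pmf(x1, n1, p) * (1.0 - cdf(r - x1, n - n1, p))
        for x1 in range(r1 + 1, n1 + 1)
    )


def modified_minimax(p0, p1, alpha, beta,
                     lam1=Fraction(1, 3), lam2=Fraction(2, 3),
                     eps=0.1, n_max=100):
    """Smallest-n design meeting the error rates and constraints (1)-(2).

    Returns (r1, n1, r, n), or None if nothing is found up to n_max.
    Exact fractions avoid rounding trouble when n1/n equals a bound.
    """
    for n in range(2, n_max + 1):
        best = None  # (EN0, design) with the smallest EN0 for this n
        for n1 in range(1, n):
            if not (lam1 * n <= n1 <= lam2 * n):  # constraint (2)
                continue
            for r1 in range(n1):
                if cdf(r1, n1, p1) > eps:  # constraint (1): PET1 <= eps
                    break  # PET1 only grows with r1
                pet0 = cdf(r1, n1, p0)
                en0 = pet0 * n1 + (1.0 - pet0) * n
                for r in range(r1 + 1, n):
                    if reject_prob(r1, n1, r, n, p0) > alpha:
                        continue  # type I error still too large
                    # Smallest r meeting alpha; a larger r only raises
                    # the type II error and leaves EN0 unchanged.
                    type2 = 1.0 - reject_prob(r1, n1, r, n, p1)
                    if type2 <= beta and (best is None or en0 < best[0]):
                        best = (en0, (r1, n1, r, n))
                    break
        if best is not None:
            return best[1]
    return None


if __name__ == "__main__":
    print(modified_minimax(0.05, 0.25, 0.05, 0.2))
```

Replacing the outer "first feasible $n$" rule with a global minimum of $EN_0$ over all $n$ up to a cap would give the corresponding sketch for the modified optimal design.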

Comparisons with Simon's and the admissible design
First, the total sample size of the modified design with $\lambda_1$ = 1/3, $\lambda_2$ = 2/3, and ε = 0.1 is compared with that of Simon's design for Δ = $p_1 - p_0$ = 0.15 (16 cases) and 0.2 (15 cases) in Figure 1. The top panels of Figure 1A and 1B show the number of additional subjects required for the modified minimax design, while the bottom panels show those for the modified optimal design. For the modified minimax design, 66 of 93 cases (71%) have the same total sample size as Simon's design (10 cases (11%) have the same total but different first-stage numbers), with the remaining 27 cases (29%) needing at most 3 additional subjects. For the modified optimal design, the results differ dramatically by β. For β = 0.1, 56 of 62 cases (90%) have the same total sample size, while 3 cases require more subjects (1 to 3) and 3 cases require fewer (2 to 9). For β = 0.2, only 2 cases (6%) have the same total sample size, while 25 of 31 cases (81%) save 1 to 13 subjects and 4 of 31 cases (13%) require 1 to 3 additional subjects. Thus, for β = 0.2, dramatic improvements over Simon's design can be achieved.
Further comparisons are summarized in Supplementary Figures 1 and 2. The number in parentheses for (α, β) = (0.05, 0.2) denotes the difference in $n$, compared with the modified optimal design. In cases where there is no difference in $n$, we investigate whether the first-stage sample size $n_1$ and the early stopping rule for futility $r_1$ are identical; "=" indicates that the designs are identical, while "≠" indicates that they are not identical even though the total sample sizes are the same. For example, for $(p_0, p_1, \alpha, \beta)$ = (0.3, 0.5, 0.1, 0.1), the modified minimax design is not identical to Simon's minimax design even though the total sample sizes are the same; $(r_1, n_1, r, n)$ = (6, 26, 15, 39) for the modified minimax design against (7, 28, 15, 39) for Simon's minimax design. For $(p_0, p_1, \alpha, \beta)$ = (0.05, 0.25, 0.05, 0.2), three designs (the modified minimax, the modified optimal, and Simon's optimal design, $(r_1, n_1, r, n)$ = (0, 9, 2, 17)) are identical, and they require one more subject in $n$ than Simon's minimax design, $(r_1, n_1, r, n)$ = (0, 12, 2, 16). Figure 2 illustrates the characteristics of Simon's designs for (α, β) = (0.05, 0.2). The left and right panels show the ratio of $n_1$ to $n$ and the type II error rate spent after the first stage ($PET_1$), respectively. The top and bottom panels show Simon's minimax and optimal designs, respectively. The $PET_1$ of Simon's minimax design is greater than 0.1 in 10 of 31 cases (32%), and fewer than $n/3$ subjects are evaluated in either the first or the second stage in 11 of 31 cases (35%). The $PET_1$ of Simon's optimal design for (α, β) = (0.05, 0.2) is greater than 0.1 except for two cases, $(p_0, p_1)$ = (0.05, 0.25) and (0.8, 0.95).
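The designs quoted above can be checked against the two additional constraints with a few lines of Python. This is a minimal sketch, assuming constraint (1) is $PET_1 \le \varepsilon$ and constraint (2) is $\lambda_1 \le n_1/n \le \lambda_2$ with $\lambda_1$ = 1/3, $\lambda_2$ = 2/3 and ε = 0.1; the helper name `check_constraints` is hypothetical. Exact fractions are used for the ratio because 26/39 equals the upper bound 2/3 exactly.

```python
from fractions import Fraction
from math import comb


def binom_cdf(k, m, p):
    """P(X <= k) for X ~ Binomial(m, p)."""
    if k < 0:
        return 0.0
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i)
               for i in range(min(k, m) + 1))


def check_constraints(r1, n1, r, n, p1,
                      lam1=Fraction(1, 3), lam2=Fraction(2, 3), eps=0.1):
    """Report n1/n, PET1, and whether constraints (1) and (2) hold.

    r is accepted only so the full design tuple can be passed; neither
    constraint depends on it.
    """
    ratio = Fraction(n1, n)
    pet1 = binom_cdf(r1, n1, p1)
    return {
        "ratio": float(ratio),
        "PET1": pet1,
        "constraint_1": pet1 <= eps,            # PET1 <= eps
        "constraint_2": lam1 <= ratio <= lam2,  # lam1 <= n1/n <= lam2
    }


if __name__ == "__main__":
    designs = [
        ("modified minimax", (6, 26, 15, 39), 0.5),
        ("Simon's minimax", (7, 28, 15, 39), 0.5),
        ("modified / Simon's optimal", (0, 9, 2, 17), 0.25),
        ("Simon's minimax", (0, 12, 2, 16), 0.25),
    ]
    for label, design, p1 in designs:
        print(label, design, check_constraints(*design, p1))
```

In both pairs, Simon's minimax design fails only the sample size proportion constraint (28/39 and 12/16 exceed 2/3), which is why the modified minimax design differs from it.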
With (α, β) = (0.1, 0.1) and (0.05, 0.1), the $PET_1$ of every Simon's minimax and optimal design considered satisfies constraint (1) (plots are omitted), and thus the modified designs differ from Simon's designs when fewer than $n/3$ subjects are evaluated in either the first or the second stage; this occurs in 24 of 62 cases (39%) for Simon's minimax design and 9 of 62 cases (15%) for Simon's optimal design. The $EN_0$ of the modified minimax design is smaller than or equal to that of Simon's minimax design except for 4 cases (plots are omitted), while the $EN_0$ of the modified optimal design increases by 0.04 to 3.36. As $EN_0$ depends heavily on the first-stage sample size $n_1$, a large difference in $EN_0$ between the modified and Simon's designs is found when the ratio of $n_1$ to $n$ in Simon's design is too large or too small.

Examples
The characteristics of the modified design are compared in detail with those of the other two designs in Table 1 for four cases. With $(p_0, p_1, \alpha, \beta)$ = (0.35, 0.55, 0.1, 0.1), 86% of subjects will be evaluated in the first stage by Simon's minimax design, while 48% will be evaluated in the first stage by the modified minimax design. The modified minimax design is identical to an admissible design and requires two additional subjects in $n$. The $EN_0$ of the modified minimax design, however, decreases by 5.2. The modified optimal design is the same as Simon's optimal design.
With $(p_0, p_1, \alpha, \beta)$ = (0.7, 0.9, 0.05, 0.2), the sample sizes of the two stages are seriously imbalanced for both Simon's minimax and optimal designs (88% and 22% in the first stage, respectively), and their $PET_1$ values are as high as 19% and 11%. No additional admissible design is available.
The modified minimax and optimal designs coincide and provide investigators with a novel design, $(r_1, n_1, r, n)$ = (8, 11, 23, 28), which requires 1 or 2 additional subjects if the second stage is opened. The $PET_1$ of this design decreases to 9% (approximately 10 and 2 percentage points lower than Simon's minimax and optimal designs, respectively) and 39% of subjects will be evaluated in the first stage.

Figure 1: Comparisons of total sample sizes between modified designs with $\lambda_1$ = 1/3, $\lambda_2$ = 2/3, and ε = 0.1 and Simon's designs for $p_1 - p_0$ = 0.2 (A) and $p_1 - p_0$ = 0.15 (B). The top panels of A and B show the number of additional subjects required for the modified minimax designs while the bottom panels indicate those for modified optimal designs.

With $(p_0, p_1, \alpha, \beta)$ = (0.8, 0.95, 0.1, 0.1), Simon's minimax design is identical to Simon's optimal design, 7 of 31 subjects (23%) will be evaluated in the first stage, and no additional admissible design is available. Similarly, the modified minimax design is also optimal in terms of $EN_0$ among the designs satisfying constraints (1) and (2), and 16 of 31 subjects (52%) will be evaluated in the first stage. Compared with Simon's optimal design, the $EN_0$ of the modified design increases by 0.5, which seems negligible. In fact, the $PET_0$ of the modified design is much higher than that of Simon's design (0.648 vs. 0.423) and the sample size of the modified design is much better balanced.
With $(p_0, p_1, \alpha, \beta)$ = (0.5, 0.65, 0.05, 0.2), the sample sizes of the two stages are severely imbalanced for Simon's minimax design (97% in the first stage), and the $PET_1$ values are as high as 19% and 14% for Simon's minimax and optimal designs, respectively. The sample sizes of the two stages are well balanced for the modified designs, and their $PET_1$ values are controlled below 10%. The total sample size of the modified optimal design decreases by 8 compared with Simon's optimal design, but the $EN_0$ of the modified optimal design increases by 1.7. The modified minimax design is identical to one of 4 other admissible designs.

DISCUSSION
As neither Simon's two-stage designs nor the admissible two-stage designs take into account the balance in sample size between the two stages, a high proportion of subjects may be evaluated in the first stage, and so the ethical benefit expected from the two-stage design may not be achieved. In addition, the type II error spent at the first stage is frequently undesirably high, as it is not controlled within the framework of Simon's design.
We believe that such designs may not be acceptable to investigators. Moreover, no additional admissible design exists if the difference in total sample size between Simon's optimal and minimax designs is ≤ 1. These drawbacks of Simon's design can be addressed by the modified design approach presented here, which aims to find the minimax and optimal two-stage designs satisfying two additional constraints: 1) a reasonable sample size proportion in the first stage and 2) a type II error of ≤ ε ≤ β after the first stage. With $\lambda_1$ = 1/3, $\lambda_2$ = 2/3, and ε = 0.1, the modified minimax design requires a modest increase of 1 to 3 additional subjects in 29% of cases, while the modified optimal design saves 1 to 13 subjects in 81% of cases for β = 0.2. Thus, the modified design approach provides investigators with an alternative when the sample sizes of Simon's designs are severely unbalanced or the type II error is unacceptably high after the first stage. The characteristics of the modified minimax and optimal designs for testing 20% and 15% improvement are presented in Supplementary Tables 1-6.

Figure 2: Simon's minimax (top two panels) and Simon's optimal designs (bottom two panels) for $p_1 - p_0$ = 0.15 and 0.2: ratio of $n_1$ to $n$ (left panels) and the type II error spent in the first stage ($PET_1$, right panels) for (α, β) = (0.05, 0.2).

ACKNOWLEDGMENTS AND FUNDING
This work has been supported in part by the Biostatistics Core Facility at the H. Lee Moffitt Cancer Center & Research Institute, an NCI designated Comprehensive Cancer Center (P30-CA076292).