The application of PRECIS-2 ratings in randomized controlled trials of Chinese herbal medicine

This study tests the feasibility of applying the pragmatic-explanatory continuum indicator summary (version “PRECIS-2”) tool to randomized controlled trials (RCTs) of Chinese herbal medicine (CHM). A search was conducted to identify potentially eligible randomized controlled trials. Using the PRECIS-2 tool, trials were assessed independently by 2 evaluators on a scale of 1–5 for each criterion (1 = maximal efficacy, 5 = maximal effectiveness). A total of 7,166 reports were retrieved from databases and 159 were included after full-text review. Though PRECIS-2 describes quantitative scoring in detail, evaluators were uncertain about several specific operationalizations, and the first independent ratings showed high inter-evaluator variation. After discussion and consensus, inter-evaluator reliability improved. For PRECIS-2 ratings over time, there was no evidence that the design and performance of RCTs of CHM paid more attention to “efficacy” criteria after the implementation of PRECIS (all P > 0.05). More research is needed to establish the easiest and most useful tool to distinguish between effectiveness and efficacy results.


INTRODUCTION
From a comparative effectiveness research (CER) perspective, the "effectiveness" of an intervention in pragmatic trials refers to the extent to which it benefits the targeted population in routine circumstances, with the goal of supporting informed decision-making and improving healthcare. [1][2][3] By contrast, the "efficacy" of an intervention is the degree to which the intervention does what is intended under ideal conditions, as assessed by an explanatory randomized controlled trial (RCT). [4,5] Nowadays, many investigators do not value or distinguish between these two concepts when designing and performing clinical trials. They appear to choose these terms arbitrarily, often neglecting the study's true purpose. [6] The pragmatic-explanatory continuum indicator summary (PRECIS) tool, which was developed in 2009 [7] and improved in 2015 (version "PRECIS-2"), [8] has been designed to help researchers distinguish between effectiveness and efficacy issues at the design stage of a trial. It also helps ensure that their design options are consistent with their purpose. [8] PRECIS-2 has nine domains, each scored on a 5-point Likert continuum (from 1 = maximal efficacy to 5 = maximal effectiveness). This benefits designers by allowing them to determine whether design options meet their intended purpose in critical appraisal. It can also be used for systematic review, funding, ethics, and publication decisions on RCTs. [8] It is important to note that no trial is one of pure effectiveness or pure efficacy; different features of a trial design may sit at different points on the continuum.
Research Paper www.impactjournals.com/oncotarget
CER is particularly valuable for interventions that are widely used in daily life and show high variation in practice. [9,10] Chinese medicine (CM) is becoming increasingly prevalent in Europe and North America and has some variation in diagnoses and treatment (e.g. syndrome differentiation). [11][12][13] In RCTs of Chinese medicine, it is difficult for a researcher to discriminate between measures of "efficacy" and "effectiveness" and to execute them in accordance with the purpose of the trial. Thus, our study aims to 1) test the feasibility of applying the PRECIS tool to RCTs of Chinese herbal medicine (CHM); and 2) evaluate the extent to which RCTs of CHM are explanatory, taking efficacy, as rated by PRECIS-2, as the goal of the RCTs.
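The scoring scheme described above (nine domains, each rated from 1 to 5) can be represented as a small data structure. The following is a minimal sketch, not part of the study: the class name `Precis2Rating` and its fields are hypothetical, and the domain names are those of the published PRECIS-2 tool.

```python
from dataclasses import dataclass

# The nine PRECIS-2 domains, as named in the PRECIS-2 tool.
PRECIS2_DOMAINS = (
    "Eligibility",
    "Recruitment",
    "Setting",
    "Organisation",
    "Flexibility (delivery)",
    "Flexibility (adherence)",
    "Follow-up",
    "Primary outcome",
    "Primary analysis",
)


@dataclass(frozen=True)
class Precis2Rating:
    """One evaluator's scores for one trial (hypothetical container)."""

    trial_id: str
    scores: dict  # domain name -> integer score, 1 = maximal efficacy, 5 = maximal effectiveness

    def __post_init__(self):
        # Every domain must be rated, and no unknown domain may appear.
        missing = set(PRECIS2_DOMAINS) - set(self.scores)
        if missing:
            raise ValueError(f"missing domains: {sorted(missing)}")
        for domain, score in self.scores.items():
            if domain not in PRECIS2_DOMAINS:
                raise ValueError(f"unknown domain: {domain}")
            if not isinstance(score, int) or not 1 <= score <= 5:
                raise ValueError(f"{domain}: score must be an integer from 1 to 5")


# Illustrative rating: every domain scored 3 (hypothetical data).
example = Precis2Rating("trial-001", {domain: 3 for domain in PRECIS2_DOMAINS})
print(example.scores["Setting"])
```

Validating at construction time means an out-of-range or missing score is caught when it is entered rather than at analysis time.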

RESULTS
Search results
A total of 7,166 unique citations were retrieved: 913 from Medline, 2,753 from Embase, 1,669 from AMED and 1,831 from CENTRAL. Of these, 6,961 were excluded on the basis of titles or abstracts during the identification, screening and eligibility processes, leaving 205 for full-text review. Of the 205 remaining full citations, 46 were excluded and 159 were included. The selection process for all articles is presented in Figure 1.
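The selection counts can be cross-checked with simple arithmetic. The sketch below uses only the figures reported in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
retrieved_by_database = {"Medline": 913, "Embase": 2753, "AMED": 1669, "CENTRAL": 1831}
excluded_on_title_or_abstract = 6961
excluded_on_full_text = 46

total_retrieved = sum(retrieved_by_database.values())
full_text_reviewed = total_retrieved - excluded_on_title_or_abstract
included = full_text_reviewed - excluded_on_full_text

print(total_retrieved, full_text_reviewed, included)  # 7166 205 159
```

The result agrees with the 205 full citations and 159 included trials stated in the text.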

Characteristics
Characteristics of the 159 selected RCTs are presented in Table 1. The frequency of RCTs of CHM generally increased over time but declined in 2015 (Figure 2). Table 1 shows that the most common CHM formulations were capsules (53, 33.3%), granules (29, 18.2%) and tablets (17, 10.7%). The digestive system (29, 18.2%) and the nervous system (27, 17.0%) were the major research areas. The majority of the RCTs were conducted in multiple centers (84, 52.8%) and in mainland China (98, 61.6%), with 2 study groups (129, 81.1%) and funding (137, 86.2%). The average sample size was 120, and the majority of the RCTs had been published in journals with impact factors in the range of 1.93-3.0.
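The percentages above are each count divided by the 159 included trials. A short sketch that recomputes them from the reported counts (the dictionary keys are shorthand labels, not terms from Table 1):

```python
TOTAL_TRIALS = 159  # number of included RCTs, from the text


def percent(count, total=TOTAL_TRIALS):
    """Percentage of the included trials, rounded to one decimal place."""
    return round(100 * count / total, 1)


# label -> (count, percentage as reported in the text)
reported = {
    "capsules": (53, 33.3),
    "granules": (29, 18.2),
    "tablets": (17, 10.7),
    "nervous system": (27, 17.0),
    "multiple centers": (84, 52.8),
    "mainland China": (98, 61.6),
    "2 study groups": (129, 81.1),
    "funding": (137, 86.2),
}

for label, (count, stated) in reported.items():
    print(f"{label}: {count}/{TOTAL_TRIALS} = {percent(count)}% (reported {stated}%)")
```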

Inter-evaluator reliability of ratings
As shown in Table 2, agreement between the evaluators was judged to be good for the criteria "Eligibility" and "Organisation," and moderate for the other criteria.
Though PRECIS-2 describes quantitative scoring in detail, evaluators were uncertain about several specific operationalizations, and the first independent ratings showed high inter-evaluator variation. We noted the following difficulties: judgement was uncertain when key information was not reported; some items still lacked clear quantitative illustration; and it was difficult to determine the extent (the distinction between a score of 1 and a score of 2 is vague). After consensus agreement, inter-evaluator reliability improved, and most of the remaining differences in judgement between the two evaluators were 1 point.
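The size of the disagreement between two evaluators can be summarized per trial as the absolute difference in score for each criterion. The following is a minimal sketch with hypothetical scores; the function names are illustrative and the data are not taken from the study.

```python
def absolute_differences(rater_a, rater_b):
    """Per-criterion absolute score difference between two evaluators for one trial."""
    return {domain: abs(rater_a[domain] - rater_b[domain]) for domain in rater_a}


def summarize(diffs):
    """Return (maximum difference, share of criteria differing by at most 1 point)."""
    values = list(diffs.values())
    return max(values), sum(v <= 1 for v in values) / len(values)


# Hypothetical scores for one trial on three criteria.
first_evaluator = {"Eligibility": 2, "Setting": 4, "Follow-up": 3}
second_evaluator = {"Eligibility": 2, "Setting": 3, "Follow-up": 5}

diffs = absolute_differences(first_evaluator, second_evaluator)
max_diff, share_within_one = summarize(diffs)
print(diffs, max_diff, share_within_one)
```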

PRECIS-2 ratings over time
The differences in the percentage of ratings of the efficacy-effectiveness continuum in studies published before and after the implementation of PRECIS are presented in Table 3. There was no evidence that the design and performance of RCTs of CHM paid more attention to "efficacy" criteria after the implementation of PRECIS (all P>0.05).

DISCUSSION
This study aimed to analyze RCTs of CHM in order to characterize explanatory versus pragmatic design, and how design details changed before and after the implementation of PRECIS. It was found that after the implementation of PRECIS, the design and performance of RCTs of CHM did not improve, in terms of "efficacy" criteria.
We tested the feasibility of applying the PRECIS tool to appraising the efficacy-effectiveness continuum of RCTs of CHM. Due to insufficient information and the lack of clear quantitative illustration for several items, high variation and uncertainty existed in the first independent ratings. Our results were similar to those of previous studies. Witt CM [14] observed that much of the heterogeneity between the evaluators was due to information missing from publications or difficulty in operationalizing the criteria. Johnson KE [15] pointed out that evaluators struggled to use the PRECIS system for analysis, as large differences existed in inter-evaluator reliability. Furthermore, El DR et al [16] indicated that the clinical expertise of the investigator also affected scoring in each domain of PRECIS-2.
In our study, evaluators held different understandings and judgements when they referred to the illustrations of PRECIS-2 or when there was missing information. For example, it was unclear to evaluators which situation constituted usual care, especially when the trial did not report any recruitment information; sometimes, the evaluators were confused about how to identify "specialist, academic centres" or hospitals; it was difficult to determine the degree, since the distinction between a score of 1 and a score of 2 was vague; the evaluators were unsure how to determine "follow-up visits that are more frequent than occur under usual care;" publications did not provide enough information about "having central adjudication of the outcome or using an assessment that needs special training or tests not normally used in usual care;" and judgements were uncertain when older publications did not provide the information related to "intent-to-treat" or "per-protocol" analysis. The challenges pertaining to using the tool, especially for certain criteria, suggest that the PRECIS-2 criteria need to be further refined in order to achieve specificity sufficient to enable evaluators to perform quantitative judgment.
Features of included RCTs (table; columns: No. of studies, %). Abbreviations: RCTs, randomized controlled trials; SD, standard deviation.
Table notes: uncertain judgment exists when older publications do not provide the information related to "intent-to-treat" or "per-protocol" analysis. # After consensus: maximum difference in points (scale 1-5, 1 = max. efficacy to 5 = max. effectiveness) for each of the trials for this criterion; ∆ Before: before consensus; After: after consensus; * PRECIS: pragmatic-explanatory continuum indicator summary.
To assess the impact of the introduction of PRECIS on the design and implementation of RCTs in this field, we used PRECIS-2 to compare the distribution of each criterion before and after 2013. Our results illustrated that there was no improvement in "efficacy" criteria after the implementation of PRECIS. The reasons for this may be as follows: 1) the promotion of PRECIS and of the importance of considering "efficacy" and "effectiveness" criteria before trial design was insufficient; 2) due to language barriers and the lack of a Chinese version of the PRECIS instructions, many Chinese scholars do not notice the discrepancy between "efficacy" and "effectiveness" criteria before preparing RCTs of Chinese herbal medicine; 3) the challenges and variations in the understanding and usage of PRECIS, especially for certain criteria, hamper researchers' ability to use it. Moderate agreement existed between the two evaluators for the criteria "Setting" and "Flexibility (adherence)." However, evaluators believed that good agreement existed for the other criteria.
Though some limitations exist in applying PRECIS-2, it is useful for capturing complete trial information and for judging whether the design is consistent with the research objectives. This enables better comparisons across trials and allows for analysis of a broader trial portfolio. We propose several suggestions: 1) more research is urgently needed to establish the easiest and most useful tool to facilitate the applicability of results in clinical practice, distinguish between effectiveness and efficacy results and assist researchers in preparing and planning clinical trials; [16] 2) researchers should pay attention to PRECIS-2 before they design RCTs and promote self-review during their implementation; 3) due to the large number of Chinese researchers, the PRECIS-2 guidelines should be translated into Chinese, and related introductory articles should also be published in Chinese to promote a wider range of applications for PRECIS-2; 4) journals all over the world that publish clinical trials should require authors to include in their research articles a quantitative score related to effectiveness or efficacy; [16][17][18] 5) several issues specific to CHM should be clarified in the new version of PRECIS. For flexibility (delivery), how do we define "a highly specified, protocol driven intervention" and "permitted co-interventions" in CHM, given that doctors of Chinese medicine add or subtract herbs based on syndrome differentiation at different times? A special assessment criterion between effectiveness and efficacy in CHM is needed.

MATERIALS AND METHODS
Literature search
A search of Medline, Embase, AMED (the Allied and Complementary Medicine Database) and CENTRAL (Cochrane Library) databases from their inception until December 2016 was conducted to identify potentially eligible studies. We used the string ("Chinese herbal drugs" OR "oriental traditional medicine" OR "east Asian traditional medicine" OR "herbal medicine" OR "herbaceous agent" OR "Chinese adj5 (herb* or medic* or drug*)" OR "herb* adj5 (medic* or drug*)"). No language restrictions were imposed, and the reference lists of all relevant studies were checked for further reports. The search strategy can be found in the Supplementary Materials.

Types of studies
RCTs that evaluated the effects of Chinese herbal medicines for any disease were included. Quasi-randomized trials were excluded.

Types of interventions
Eligible interventions were: 1) a single herb; 2) a Chinese proprietary herbal medicine (usually taken as granules, decoction, oral liquid, extract, capsule, injection, herbal tea, pills, powder, ointment or tablets); 3) an herbal mixture prescribed by an herbalist (individualized treatment), usually tailored to an individual's pattern of symptoms. There were no limits on approval status, formulation or mode of administration for herbal medicines. Studies of integrative medicine were excluded.

Comparison group
Control groups receiving placebo, treatment as usual, an alternative presentation of the study group's intervention, no treatment or other active interventions were included.

Selection of reports to be studied
First, one researcher (LL) identified duplicate reports using the reference management software EndNote X6, and scanned the titles and abstracts of the citations retrieved by the search engine in EndNote X6 (first scanning). Then, the full texts of all potentially eligible reports were reviewed together by two researchers (LL and ZL). If a report either did not meet the inclusion criteria or met the exclusion criteria, it was moved into the appropriately labelled folder in EndNote X6. Several controversial reports were marked as either "suspicious" or "waiting for the next selection."

Data extraction
Two researchers (LL and ZL) used the EpiData 3.1 software (The EpiData Association, Odense, Denmark) to extract and enter the findings from the final included reports using a standardized structured form. Data extracted from each study included the title, publication year, regions where RCTs were conducted, impact factor, single center/multi-center, study groups, choice of interventions, human systems, sample size and funding sources.

Inter-evaluator reliability of ratings
Trial assessment was performed independently by 2 evaluators (LL and ZL) who had been trained in PRECIS-2. The assessments utilized a scale of 1-5 for each criterion (1 = maximal efficacy, 5 = maximal effectiveness). To test the feasibility of applying the PRECIS tool to CHM RCTs and to ensure that the criteria could be applied consistently by more than one person, we pilot-tested a draft data abstraction form on a random sample of 15 included studies before beginning data abstraction. The intraclass correlation coefficient (ICC) was calculated both before and after agreement. The ICC calculation formula was as follows:
After this, we proposed comments on the operationalization and improvement of the PRECIS-2 criteria ratings. Then, we started rating all reports. Disagreements between the two researchers were discussed by the whole team until a consensus was reached.
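The pilot step can be sketched as drawing 15 of the 159 included studies at random and computing an ICC from the two evaluators' scores. The text does not state which ICC form was used, so the function below assumes the two-way random-effects, absolute-agreement, single-rater form, commonly written ICC(2,1); the function name, the seed and the scores are hypothetical.

```python
import random


def icc_2_1(ratings):
    """ICC(2,1) from a table of scores: one row per trial, one column per evaluator.

    Assumed form: two-way random effects, absolute agreement, single rater.
    Undefined (division by zero) when all scores are identical.
    """
    n = len(ratings)      # number of trials
    k = len(ratings[0])   # number of evaluators
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-trial mean square
    ms_cols = ss_cols / (k - 1)                 # between-evaluator mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Random pilot sample of 15 of the 159 included studies (seed chosen arbitrarily).
rng = random.Random(0)
pilot_ids = sorted(rng.sample(range(1, 160), 15))

# Hypothetical scores for one criterion: [evaluator 1, evaluator 2] per trial.
pilot_scores = [[2, 3], [4, 4], [1, 2], [5, 4], [3, 3]]
print(pilot_ids)
print(round(icc_2_1(pilot_scores), 3))
```

Running the same function on the scores before and after the consensus discussion would give the two ICC values being compared.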

Statistical analysis
Ratings along the efficacy-effectiveness continuum were summarized by descriptive analysis for each time period. Previous studies have argued that a period of 3-4 years after the publication of standards is sufficient to ensure the promotion and adoption of new guidelines. [19,20] The PRECIS tool was developed in 2009. Thus, the publication year of 2013 was used as the cut-off point. We calculated the proportion of each PRECIS criterion's score before 2013 (including 2013) as well as after 2013. We then compared the distribution of each criterion between different date ranges using a rank-sum test. Descriptive statistical analysis and statistical inference were performed using SPSS V.18.0 (SPSS, Illinois, USA).
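The comparison described above can be sketched in three steps: split the trials at the 2013 cut-off, tabulate the proportion of each score, and compare the two groups with a rank-sum test. The study used SPSS; the pure-Python version below is only an illustration. It assumes the two-sided normal approximation with tie correction and no continuity correction, which may differ from the SPSS output, and all names and data are hypothetical.

```python
import math
from collections import Counter

CUTOFF_YEAR = 2013  # from the text: 2013 itself counts as "before"


def split_by_cutoff(trials, cutoff=CUTOFF_YEAR):
    """trials: iterable of (publication_year, score) pairs for one criterion."""
    before = [score for year, score in trials if year <= cutoff]
    after = [score for year, score in trials if year > cutoff]
    return before, after


def score_proportions(scores):
    """Proportion of trials at each score from 1 to 5 (scores must be non-empty)."""
    counts = Counter(scores)
    return {s: counts.get(s, 0) / len(scores) for s in range(1, 6)}


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test; returns (U for x, two-sided p)."""
    combined = sorted(list(x) + list(y))
    # Assign midranks so that tied scores share the average of their ranks.
    midrank = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        midrank[combined[i]] = (i + 1 + j) / 2
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every score identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical (year, score) pairs for a single criterion.
trials = [(2008, 2), (2010, 3), (2012, 2), (2013, 4), (2014, 3), (2015, 2), (2016, 4)]
before, after = split_by_cutoff(trials)
print(score_proportions(before))
print(score_proportions(after))
print(rank_sum_test(before, after))
```

Because PRECIS-2 scores take only five values, ties are frequent, which is why the variance includes a tie correction.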

CONCLUSION
To the best of our knowledge, this is the first study to investigate the impact of the introduction of PRECIS on the design and implementation of RCTs of CHM. It was found that after the implementation of PRECIS, the design and performance of RCTs of CHM did not improve in terms of the "efficacy" criterion. We expect an improved version of PRECIS, as well as its promotion, to contribute to progress in considering "efficacy" and "effectiveness" criteria before trial design in the future.