Assessment of the interlaboratory variability and robustness of JAK2V617F mutation assays: A study involving a consortium of 19 Italian laboratories

To date, a wide range of techniques for the detection of JAK2V617F is used across laboratories, with substantial differences in specificity and sensitivity. Therefore, to provide reliable and comparable results, the standardization of molecular techniques is mandatory. A network of 19 centers was established to 1) evaluate the inter- and intra-laboratory variability in JAK2V617F quantification, 2) identify the most robust assay for the standardization of the molecular test and 3) allow consistent interpretation of individual patient analysis results. The study was conducted in 3 rounds, in which all centers had to blindly test DNA samples with different JAK2V617F allele burdens (AB) using both quantitative and qualitative assays. Qualitative assays did not detect the positivity of samples with an AB < 1%. Conversely, laboratories using the quantitative approach were able to determine the expected JAK2V617F AB. Quantitative results were reliable across all mutation loads, with moderate variability at low AB (0.1 and 1%; CV = 0.46 and 0.77, respectively). Remarkably, all laboratories clearly distinguished between the 0.1 and 1% mutated samples. In conclusion, a qualitative approach is not sensitive enough to detect the JAK2V617F mutation, especially at low AB. In contrast, the ipsogen JAK2 MutaQuant CE-IVD kit provided efficient and sensitive quantification of all mutation loads. This study sets the basis for the standardization of molecular techniques for JAK2V617F determination, which will require the use of approved operating procedures and of certified standards, such as the recent WHO 1st International Reference Panel for Genomic JAK2V617F.

The assessment of the JAK2 V617F allele burden (AB) is common practice either at diagnosis, for prognostic information, or during treatment, as a means to assess minimal residual disease [5]. Indeed, JAK2 V617F AB seems to be correlated with an increased risk of thrombosis and of evolution to secondary myelofibrosis in PV (PPV-MF) and, possibly, in ET (PET-MF) [6,7]. Additionally, low AB is associated with reduced survival in PMF [8][9][10][11]. With regard to drug therapy, several studies showed that interferon-alpha and the more recent telomerase inhibitor imetelstat significantly reduce JAK2 V617F mutation burden, whereas JAK inhibitors and hydroxyurea (HU) did not have any significant effect [12][13][14][15][16][17][18][19][20][21]. Moreover, JAK2 V617F quantification has been incorporated as a potentially useful tool to predict relapse in patients who underwent allogeneic stem-cell transplantation (alloHSCT). In this setting, early monitoring of the AB (1, 3 and 6 months post alloHSCT) is crucial to predict overall survival and risk of relapse and might guide therapeutic decisions [22][23][24].
Therefore, to provide reliable and comparable molecular results, the standardization of molecular techniques is urgently needed. In a recent study by the European LeukemiaNet/MPN&MPNr-EuroNet group, nine different JAK2 V617F quantitative assays were evaluated by 12 participating laboratories, with the aim of identifying the most robust one for routine diagnostic purposes and for post-alloHSCT monitoring [39]. Accordingly, a network of 19 Italian laboratories was established with the aim 1) to evaluate the inter- and intra-laboratory variability in JAK2 V617F quantification in these 19 centers, 2) to identify the most robust assay for the standardization of the molecular test and 3) to allow consistent interpretation of individual patient analysis results.

RESULTS
Between 2014 and 2015, a network of 19 Italian laboratories, routinely involved in the molecular diagnosis of MPNs, was established. The study was coordinated by the Institute of Hematology "L. e A. Seràgnoli", Bologna, and conducted in 3 rounds in which seven, ten and nineteen laboratories were included over time, respectively (Figure 1). Overall, one quantitative (ipsogen JAK2 MutaQuant kit, QIAGEN) and four qualitative assays were evaluated. Of the latter, two were commercial (ipsogen JAK2 MutaSearch kit, QIAGEN, and GeneQuality JAK-2, AB Analitica) and two were "in-house" methods: allele-specific polymerase chain reaction (AS-PCR) and amplification-refractory mutation system (ARMS) analysis [25,30].
www.impactjournals.com/oncotarget

I Round: proficiency test
To obtain information about the variability in JAK2 V617F quantification between centers, seven laboratories were asked to evaluate several DNA samples with their own established JAK2 V617F qualitative and/or quantitative method. Specifically, four DNA samples derived from granulocytes of patients with a diagnosis of MPN were analyzed. All laboratories using the quantitative assay (ipsogen JAK2 MutaQuant kit) were able to determine the expected JAK2 V617F AB, as summarized in Table 1. In only one case (DNA sample 1) did a laboratory, Center 3, obtain a false positive result: the sample was found to be positive with an AB of 0.13%. In contrast, two Centers (6 and 7), using a qualitative approach, were not able to detect the positivity of DNA sample 2 (expected AB of 0.15%).

II Round: comparison between molecular assays
To further investigate the inter-laboratory variability in quantifying the JAK2 V617F mutation, especially at low mutation burden, a second standardization round was developed and three additional laboratories were included. Eight DNA samples, derived from dilutions of cell lines negative and positive for the JAK2 V617F mutation, were tested by each laboratory with both the ipsogen JAK2 MutaQuant kit and their own routine qualitative or quantitative method.
We first examined the sensitivity of the methods: the ability of the ipsogen JAK2 MutaQuant kit to detect low-positive samples (i.e. 0.1 and 1%) was compared with that of the commercial and validated "in-house" qualitative JAK2 methods. Overall, the ARMS-PCR "in-house" method and the ipsogen JAK2 MutaSearch kit were able to detect the positivity of the sample with an AB of 1%, whereas none of the laboratories using a qualitative method was able to detect the low-positive sample (i.e. AB < 0.1%). Remarkably, laboratories using the quantitative approach clearly defined the positivity of both the 1% and the 0.1% mutated samples. Specifically, 10 out of 16 JAK2 V617F determinations were clearly defined as positive, with an AB ≥ 0.091%, which is the limit of detection (LOD) of the ipsogen JAK2 MutaQuant kit. In the remaining 6 cases, the JAK2 V617F mutation percentage fell between the limit of blank (LOB = 0.014%) and the LOD.
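The way a measured AB is read against the two thresholds quoted above can be sketched in a few lines of Python. This is only an illustration: the function name and the three labels are hypothetical, and the handling of values at or below the LOB is an assumption, since the text only describes results at or above the LOD and results between LOB and LOD.

```python
# Thresholds quoted in the text for the ipsogen JAK2 MutaQuant kit (percent AB).
LOB_PERCENT = 0.014  # limit of blank
LOD_PERCENT = 0.091  # limit of detection


def classify_allele_burden(ab_percent, lob=LOB_PERCENT, lod=LOD_PERCENT):
    """Hypothetical helper: label a measured JAK2 V617F AB against LOB and LOD."""
    if ab_percent >= lod:
        return "positive"
    if ab_percent > lob:
        return "between LOB and LOD"
    # Assumption: values at or below the LOB are treated as not detected.
    return "not detected"


# Made-up example values, not data from the study.
for value in (0.13, 0.05, 0.005):
    print(value, classify_allele_burden(value))
```

Keeping the thresholds as named constants makes it easy to swap in the values of a different assay.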
Additionally, the inter-laboratory variability evaluation was restricted to the ipsogen JAK2 MutaQuant kit, as six out of 10 participating laboratories already used this assay in their routine practice. The data from two laboratories (Centers 3 and 8) were excluded from statistical analysis because the negative controls for the JAK2 V617F mutation (NC-VF) were positive (> 0.1%) in each run, so the runs were considered invalid. This was mainly due to operator error or to instrument suitability rather than to an intrinsic bias of the kit. Overall, quantitative results between laboratories were reliable, as summarized in Table 2 and shown in Figure 2. A small variability was observed, especially at low AB (0.1 and 1%; CV = 0.42 and 0.24, respectively).

III Round
The study was extended to 9 additional centers to confirm the robustness of the ipsogen JAK2 MutaQuant kit in a larger cohort. Each participating laboratory had to test four DNA samples, derived from dilutions of cell lines as described above, with the ipsogen JAK2 MutaQuant kit only.
The quantification data from Center 11 were excluded from statistical analysis, as the NC-VF was positive (> 0.1%) in each run and the positive control for the JAK2 V617F mutation (PC-VF) did not reach the recommended value (> 99.9%). Moreover, Center 9 failed to perform both runs correctly because of instrument failure. Of note, neither Center 9 nor Center 11 assessed JAK2 V617F AB in routine practice. Among the remaining laboratories, Centers 2, 6 and 17 did not reach the minimum required JAK2 total copy number (10,000 copies) in five determinations (two for the 0.1% AB sample, two for the 1% sample and one for the 10% sample, respectively); these points were therefore not included in the analysis. Quantitative results were reliable across all mutation loads, as reported in Table 3 and shown in Figure 3. All 17 laboratories were able to quantify the 0.1% AB sample, with a variability similar to that observed in the II round (CV = 0.46 and 0.42, respectively). Specifically, 23 out of 32 JAK2 V617F determinations were clearly defined as positive, with an AB > 0.091% (LOD). In the remaining 9 cases, the JAK2 V617F mutation percentage fell between LOB and LOD. Surprisingly, a higher variability between laboratories was observed at 1% AB (CV = 0.77 vs 0.24 in the II round). More importantly, all laboratories clearly distinguished between the 0.1 and 1% mutated samples.
We also evaluated the robustness of the quantitative approach in terms of amplification efficiency. Amplification efficiency in PCR measures the amount of template converted into amplified product during each cycle of the exponential phase of the reaction. At 100% efficiency, the quantity of product exactly doubles at each cycle; an efficiency close to 100% is therefore the best indicator of a robust, reproducible assay. An amplification efficiency of 90-105% is recommended for each assay. In our study, the mean amplification efficiency over all runs performed was 92%, with a CV of 0.065, confirming the sensitivity and robustness of the ipsogen JAK2 MutaQuant kit.
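The exclusion rules mentioned in this paragraph can be written as a small filter. The sketch below encodes only the three thresholds quoted in the text (NC-VF above 0.1%, PC-VF not above 99.9%, fewer than 10,000 total JAK2 copies). The class and function names are hypothetical, and the complete validity criteria are those of the kit handbook, which are not reproduced here.

```python
from dataclasses import dataclass

# Thresholds as quoted in the text; the kit handbook defines the full criteria.
NC_VF_MAX_PERCENT = 0.1    # negative control must not exceed this value
PC_VF_MIN_PERCENT = 99.9   # positive control must exceed this value
MIN_TOTAL_COPIES = 10_000  # minimum JAK2 total copy number per determination


@dataclass
class RunControls:
    """Hypothetical container for the control results of one run."""
    nc_vf_percent: float
    pc_vf_percent: float


def run_is_valid(controls: RunControls) -> bool:
    """A run is kept only if both controls behave as expected."""
    nc_ok = controls.nc_vf_percent <= NC_VF_MAX_PERCENT
    pc_ok = controls.pc_vf_percent > PC_VF_MIN_PERCENT
    return nc_ok and pc_ok


def determination_is_usable(total_jak2_copies: float) -> bool:
    """A single determination is kept only with enough total JAK2 copies."""
    return total_jak2_copies >= MIN_TOTAL_COPIES


# Made-up example values, not data from the study.
print(run_is_valid(RunControls(nc_vf_percent=0.3, pc_vf_percent=99.95)))  # invalid NC-VF
print(determination_is_usable(9_500))  # too few copies
```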

DISCUSSION AND CONCLUSIONS
In this study we demonstrated that a qualitative approach is not sensitive enough to detect the JAK2 V617F mutation at low mutation burden (i.e. < 1%). Conversely, the quantitative approach proved to be highly efficient and sensitive, although a modest variability was observed between participating centers in both the II and the III round (CV = 0.42 and 0.46, respectively). Interestingly, only the qualitative ARMS-PCR methods (both "in-house" and the ipsogen JAK2 MutaSearch kit) and the quantitative approach were able to detect the positivity of samples with an AB of 1%. An acceptable variability was observed at this AB in the II round (CV = 0.24), whereas a higher inter-laboratory variability was recorded in the III round (CV = 0.77). For samples with AB > 1%, a lower inter-laboratory variability was observed, with CV ranging from a minimum of 0.0005 to a maximum of 0.38.
Overall, the quantitative ipsogen JAK2 MutaQuant kit assay performed consistently across different platforms, establishing itself as a robust method to obtain comparable results. This is confirmed by the optimal amplification efficiency obtained by each laboratory involved in the study: the mean efficiency was 92%, with a CV of 0.065. The observed variability can be explained both by differences in laboratory experience with quantitative JAK2 V617F determination and by intrinsic instrumental bias, as happened in our study, in which some laboratories experienced technical issues. This, together with the variability observed in low-AB samples, highlights the need for the standardization of practices, including both pre-analytical and analytical phases. To this aim, the WHO 1st International Reference Panel for Genomic JAK2 V617F (WHO document WHO/BS/2016.2293) was established in 2016 by the Expert Committee on Biological Standardization of the World Health Organization [40]. The availability of JAK2 V617F primary standards should improve the quality of MPN genomic diagnostics by enabling the calibration of assays and kits and the derivation of secondary standards for routine diagnostic use in determining testing accuracy and sensitivity, thus supporting inter-laboratory comparison and the harmonization of JAK2 V617F testing.
Moreover, we are considering evaluating digital PCR (dPCR). This emerging technology may improve the ability to detect rare mutations and/or low-positive samples thanks to its higher sensitivity and precision, especially during follow-up to assess minimal residual disease or to monitor patients post alloHSCT [41][42]. However, further studies on this technology are needed before it reaches the level of standardization of real-time qPCR.
In conclusion, this study sets the basis for the standardization of molecular techniques for JAK2 V617F determination, which will require the use of approved operating procedures and of certified standards to calibrate JAK2 V617F quantitative assays.

MATERIALS AND METHODS
The study was conducted in 3 rounds, in which 19 Italian laboratories participated. Centers 9 and 16 did not perform JAK2 V617F molecular testing in their routine practice. Of the remaining centers, 8 used only a quantitative approach (2 "in-house" and 1 commercial assays), 7 performed only a qualitative evaluation of the JAK2 V617F mutation (1 "in-house" and 3 commercial assays), whereas 2 laboratories used both qualitative and quantitative approaches (2 "in-house" and 1 commercial assays). Regarding the real-time PCR instruments used during the study, most laboratories (13 out of 19) used Applied Biosystems platforms (ABI7300/7500/7900; Applied Biosystems, Foster City, CA, USA), two laboratories used a LightCycler LC480 platform (Roche Applied Science, Penzberg, Germany), and the remaining four laboratories used a Rotor-Gene Q 2plex/MDx 5plex HRM instrument (QIAGEN GmbH, Hilden, Germany) (Table 4).

I Round
In the I round, we aimed to investigate the inter-laboratory variability at different mutation loads. In this first step, seven laboratories were involved (Centers 1-7). Four of them routinely performed quantitative analysis of JAK2 V617F with the ipsogen JAK2 MutaQuant Kit (QIAGEN), two used a qualitative approach for JAK2 V617F evaluation (1 "in-house" method, 1 ipsogen JAK2 MutaScreen Kit, QIAGEN), whereas one laboratory (Center 5) used both qualitative and quantitative assays. Each center had to test four DNA samples (DNA 1-4) with the method routinely employed in its own laboratory. DNA samples were isolated from granulocytes of patients with a diagnosis of MPN. The expected mutation burden of the 4 DNA samples was 0.005% (DNA 1), 0.15% (DNA 2), 5% (DNA 3) and 20% (DNA 4), as previously quantified by the Bologna laboratory with the ipsogen JAK2 MutaQuant Kit.
All patients provided written informed consent, in accordance with the Declaration of Helsinki, for the use of remnant DNA for investigational purposes. The study was approved by the local Ethics Committee.

II Round
To further investigate the inter-laboratory variability on low-positive samples, a II round was developed. The two main objectives of this round were to assess inter-laboratory variability across the 10 participating clinical centers and to compare the ability of the ipsogen JAK2 MutaQuant kit to detect low-positive samples with that of the validated "in-house" JAK2 methods. Compared to the first round, three additional centers (Centers 8-10) were included in this step; of these, only Centers 8 and 10 routinely performed JAK2 V617F evaluation (with a qualitative and a quantitative method, respectively). Eight DNA test samples (DNA Samples A-H) were manufactured by QIAGEN and centrally distributed by Werfen. The DNA samples were derived from dilutions of cell lines: K562 (JAK2 V617F negative) and MUTZ-8 (JAK2 V617F positive). The ipsogen JAK2 MutaQuant kits and associated master-mix were provided by QIAGEN and Werfen. Each center had to test the samples with both the ipsogen JAK2 MutaQuant kit and its own routine qualitative or quantitative method. The expected JAK2 V617F mutation burden of the four DNA samples was 0.1, 1, 10 and 100%.

III Round
The III round was intended to confirm the robustness of the ipsogen JAK2 MutaQuant kit. Qualitative methods were therefore excluded and the study was extended to 9 additional laboratories (Centers 11-19). Centers 12, 13 and 17 routinely performed quantitative evaluation of the JAK2 V617F mutation (2 "in-house" and 1 commercial methods), 4 laboratories (Centers 11, 15, 18 and 19) used qualitative assays (1 "in-house" and 1 commercial methods), Center 14 performed both qualitative ("in-house") and quantitative (ipsogen JAK2 MutaQuant kit) analysis, whereas Center 16 did not perform JAK2 V617F molecular testing in its routine practice. Four DNA test samples (DNA Samples S01-S04), taken from the same batches as the II round DNA samples, were centrally distributed by Werfen. The ipsogen JAK2 MutaQuant kits and associated master-mix were provided by QIAGEN and Werfen. Each laboratory had to blindly test the DNA samples in 2 runs with the ipsogen JAK2 MutaQuant kit. The expected mutation burden of the four DNA samples was 0.1, 1, 10 and 100%.
Moreover, amplification efficiency (E) was calculated from the slope of the standard curve using the following formula: $E = 10^{-1/\mathrm{slope}}$. Amplification efficiency was expressed as a percentage, that is, the percent of template that was amplified in each cycle. To convert E into a percentage we used the following formula: $\%\,\mathrm{Efficiency} = (E - 1) \times 100\%$.
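As a minimal sketch of this calculation, the Python code below derives the slope of a standard curve by ordinary least squares and converts it into a percent efficiency with the two formulas above. The function names and the example dilution series are hypothetical; the kit software or the instrument may compute the slope differently.

```python
import math


def standard_curve_slope(log10_copies, ct_values):
    """Least-squares slope of Ct versus log10(copy number)."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    return sxy / sxx


def amplification_efficiency_percent(slope):
    """E = 10^(-1/slope), then % Efficiency = (E - 1) * 100."""
    e = 10.0 ** (-1.0 / slope)
    return (e - 1.0) * 100.0


# Made-up standard curve in which the product exactly doubles at each cycle:
# Ct drops by log2(10) cycles for every 10-fold increase in template.
log10_copies = [1.0, 2.0, 3.0, 4.0]
ct_values = [40.0 - math.log2(10.0) * x for x in log10_copies]

slope = standard_curve_slope(log10_copies, ct_values)
print(round(slope, 3), round(amplification_efficiency_percent(slope), 1))
```

With these formulas, a slope of about -3.32 corresponds to an efficiency of about 100%, and a slope of about -3.53 to roughly 92%.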

Data collection and run validity check
Raw data were collected and run validity was checked according to manufacturer's instructions in the kit's handbook.

Statistical method
Statistical analysis was carried out by QIAGEN and by Bologna University. Wild type and mutation copy numbers together with mutation percentage were summarized by mean, median, first and third quartiles, standard deviation and coefficient of variation and plotted by sample for the ipsogen JAK2 MutaQuant kit.
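The summary statistics listed above can be reproduced with the Python standard library, as in the sketch below. The function name and the example values are hypothetical. The use of the sample standard deviation and of the "inclusive" quartile method are assumptions, since the text does not state which definitions were used. The CV is returned as a fraction (standard deviation divided by mean), matching the way CV values are reported in this study.

```python
import statistics


def summarize(values):
    """Mean, median, quartiles, SD and CV for one sample across laboratories."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation as a fraction
    }


# Made-up mutation percentages for one sample, not data from the study.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
print(summarize(example))
```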

Inter-laboratory variability: II round
A Fisher test was performed for each sample to compare variances and thereby assess the acceptability of the inter-laboratory variability.

Inter-laboratory variability: III round
The Shapiro-Wilk test was performed to check for data normality, and normal quantile-quantile plots were used for visual inspection of the data. The Kruskal-Wallis test was applied, followed by a multiple-comparison post-hoc test (Wilcoxon test) to identify significant differences. To avoid an increase in type I error (false positives) due to multiple testing, a false discovery rate correction was applied following the Benjamini and Hochberg procedure.
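The Benjamini and Hochberg correction mentioned above can be illustrated with a short pure-Python sketch. The function name and the p-values are hypothetical. In practice the raw p-values would come from the pairwise Wilcoxon tests, computed with a statistics package that is not shown here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values from hypothetical pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

Each adjusted value is the raw p-value multiplied by the number of tests and divided by its rank, capped so that adjusted values never decrease as the raw p-values increase.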