Comparative study of single-nucleotide polymorphism array and next generation sequencing based strategies on triploid identification in preimplantation genetic diagnosis and screen

Triploidy occurs in about 2-3% of human pregnancies and accounts for approximately 15% of early miscarriages with a chromosomal cause. Preimplantation genetic diagnosis and screening (PGD and PGS) therefore need to detect triploidy sensitively. Here, we compared the sensitivity of MALBAC-NGS and MDA-SNP array for triploidy detection. Self-correction and reference-correction algorithms were used to analyze the NGS data. Using MDA-SNP array, we identified 5 triploid embryos among 1198 embryos from 218 PGD and PGS cycles, a triploidy rate of 4.17‰ in PGS and PGD patients. Our results indicated that the MDA-SNP array was sensitive to both digynic and diandric triploidy, whereas MALBAC-NGS analyzed with self-correction and reference-genome correction strategies was not sensitive to triploidy. In summary, triploidy occurred at a rate of 4.17‰ in PGD and PGS; MDA-SNP array successfully identified triploidy in PGD and PGS embryos and in genomic DNA, while MALBAC-NGS combined with self-correction and reference-genome correction was not sensitive to triploidy.


INTRODUCTION
Triploidy is an abnormal karyotype that occurs in about 2-3% of human pregnancies [1] and accounts for approximately 15% of early miscarriages with a chromosomal cause. Triploidy arises by either digyny (an extra haploid set from the mother) or diandry (an extra haploid set from the father); digynic triploidy predominates in fetuses and accounts for about 50-60% of early spontaneous pregnancy losses with triploidy [2][3][4][5]. Although disturbed genomic imprinting or genome-wide gene expression usually causes triploid pregnancies to abort spontaneously at an early stage, triploid conceptions occasionally develop to the fetal or newborn period, resulting in the birth of an abnormal fetus or infant [6]. An assessment of embryonic phenotype against parental origin showed no correlation between the phenotype of the embryo and the parental origin of the extra haploid set in triploid pregnancies [7].
Three-pronuclear (3PN) embryos form commonly during in vitro fertilization (IVF), and polyspermic fertilization or oocyte-derived meiotic failure is believed to lead to triploidy [8,9]. Advanced maternal age and severe sperm abnormalities significantly increase the incidence of triploid fertilization during IVF [10]. Intracytoplasmic sperm injection (ICSI) essentially excludes dispermic triploidy but cannot prevent oocyte-induced triploidy, such as that caused by failure to extrude the second polar body [11]. Triploid embryos have been reported to be negatively associated with IVF pregnancy outcome [12]. ICSI is recommended for patients undergoing preimplantation genetic diagnosis and screening (PGD/S) [13]. Previous studies suggest that 3PN embryo formation after ICSI is mainly due to non-extrusion of the second polar body [14]; severe sperm abnormalities, oocyte aging and a high response to gonadotropins may also contribute to this process [9,10,15]. Because 3PN embryos sometimes cannot be observed owing to fusion of the pronuclei during IVF, it is essential to consider the formation and detectability of triploid embryos during PGD/S.
Research Paper www.impactjournals.com/oncotarget
PGD/S is used for carriers of chromosomal structural abnormalities, carriers of single-gene mutations, couples of advanced maternal age, and patients with recurrent implantation failure or recurrent miscarriage, to screen the genetic condition of embryos prior to transfer [16]. With the development of single-cell amplification technology, genome-wide technologies such as array-CGH, single-nucleotide polymorphism (SNP) array and next generation sequencing (NGS) have been applied to PGD/S [17][18][19]. All three methods can detect chromosome imbalances in embryos and provide the additional benefit of simultaneous aneuploidy screening of all 24 chromosomes [20,21].
NGS has become increasingly popular in PGD/S because of its lower cost, its higher resolution, and the opportunity to analyze single-gene disorders and genome-wide chromosome imbalances simultaneously [22][23][24]. MDA and MALBAC, the two main single-cell amplification methods widely used in PGD/S, achieve high coverage and accuracy across the whole genome [25,26]. SNP-based noninvasive prenatal testing (NIPT) has distinguished triploid pregnancies, and NGS-based NIPT with the NATUS algorithm has identified triploid pregnancies from cfDNA in maternal blood [27,28]. Although PGD/S has increased the pregnancy rate, no study has compared the ability of SNP array and NGS technologies to detect triploid embryos in PGD/S.
In this study, we systematically compared SNP array and NGS for the detection of chromosome deletion, duplication, uniparental disomy, mosaicism and triploidy. We first identified triploid embryos using SNP array and then compared SNP array with NGS during PGD/S. Our results indicated that current NGS-based PGD/S procedures were unable to detect triploid embryos, whereas SNP array successfully distinguished triploidy.

RESULTS
Comparison of MDA-SNP array and MALBAC-NGS in copy number variation screening
We first compared SNP array and NGS for the detection of copy number variation. DNA from missed-abortion chorionic tissues with karyotypes previously determined by SNP array was diluted and subjected to whole genome amplification using MALBAC. The amplification products were sequenced and analyzed. For 46 samples with previously known SNP array karyotypes, including duplications, deletions, mosaics and uniparental disomy (UPD), the results of SNP array and NGS were highly concordant (Table 2). NGS was sensitive to duplications, deletions and mosaics but could not identify UPD. Furthermore, NGS identified more copy number variations than SNP array and was powerful in detecting low-percentage mosaics.

Detection of digynic and diandric triploidy using MDA-SNP array
Our previous data detected triploidy in five PGD cycles, however, these only identified arr

Detection of digynic and diandric triploidy using MALBAC-NGS
We selected 9 samples previously karyotyped as triploid by SNP array for validation using NGS. The SNP array karyotypes of the 9 triploid samples are shown in Supplementary Figure 1; one sample failed amplification in our study. Basic information on the 9 samples is shown in Supplementary Table 1. After sequencing, we first analyzed the data using the reference karyotype method and found that NGS could not detect triploidy, because it reported the triploid samples as normal diploid (Supplementary Table 2). We then analyzed the data using the self-correction method, which also failed to detect the triploid karyotype (see Figure 3). The number of unique reads in the NGS data ranged from 1.23 M to 4.84 M, and neither low nor high coverage identified the triploidy in our study. Our results indicated that neither self-correction nor reference correction may be able to detect triploidy, most likely because both methods count the reads of the 24 chromosomes falling into continuous sliding windows on the human reference genome.
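The following minimal sketch illustrates why window-based read counting is blind to a whole-genome change in ploidy. It assumes idealized bins whose expected read counts are proportional to copy number; the bin count, the depth value and the function name are illustrative assumptions and are not taken from the pipeline used in this study.

```python
import numpy as np

def relative_read_number(bin_counts):
    """Normalize per-bin read counts by the sample's own mean count."""
    counts = np.asarray(bin_counts, dtype=float)
    return counts / counts.mean()

# Hypothetical idealized genome: 100 equal-size bins, no noise.
n_bins = 100
depth_per_copy = 50.0  # illustrative reads per bin per chromosome copy

diploid = np.full(n_bins, 2 * depth_per_copy)   # two copies everywhere
triploid = np.full(n_bins, 3 * depth_per_copy)  # three copies everywhere

# A diploid genome with one trisomic region covering the first 10 bins.
trisomy = diploid.copy()
trisomy[:10] = 3 * depth_per_copy

rrn_diploid = relative_read_number(diploid)
rrn_triploid = relative_read_number(triploid)
rrn_trisomy = relative_read_number(trisomy)

# After normalization the triploid profile is identical to the diploid one,
# whereas the trisomic region stands out against the rest of the genome.
print(np.allclose(rrn_diploid, rrn_triploid))
print(rrn_trisomy[0] / rrn_trisomy[-1])
```

Because every bin gains the same extra copy in triploidy, the normalized profile is flat, just as in a diploid sample. A segmental or whole-chromosome gain changes only some bins and therefore remains visible.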

DISCUSSION
Triploidy is an abnormal karyotype that occurs at a very low rate in human pregnancies. This is the first comprehensive analysis of triploidy detection using SNP array and NGS in PGD/S and in missed-abortion chorionic tissues. Our results indicated that the rate of triploidy was 4.17‰ in PGD/S patients. We comprehensively compared MDA-based SNP array and MALBAC-based NGS for triploidy detection at the single-cell level, and concluded that SNP array could detect triploidy, whereas current NGS strategies are not sensitive to triploidy.
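As a check on the reported figure, the rate follows directly from the embryo counts given in the abstract (5 triploid embryos among 1198 embryos analyzed by MDA-SNP array), so it is a per-embryo rate:

```latex
\text{triploidy rate} = \frac{5}{1198} \approx 0.00417 = 4.17\,\text{\textperthousand}
```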
PGD and PGS are used to detect embryos with aneuploidy or disease-causing genes and have been widely used in clinical practice [19]; they have been shown to increase the pregnancy rate of carriers of chromosomal structural abnormalities, patients of advanced maternal age, and patients with Mendelian diseases or recurrent miscarriage [17,29,30]. Previous data showed that PGD and PGS are a good strategy for single embryo transfer, which reduces the multiple pregnancy rate without affecting clinical outcomes or the take-home baby rate [31]. The accuracy and coverage of PGD and PGS technology are critical for clinical application, for example whether all aneuploid karyotypes are successfully detected. SNP-array PGD was reported to potentially improve the clinical pregnancy outcome of translocation carriers [18]. However, only matched cohort studies are available for patients of advanced maternal age, recurrent miscarriage and implantation failure, which limits the ability to draw meaningful conclusions [32]. Triploidy, an abnormal embryo karyotype, can occur at a low rate in human spontaneous conception and IVF, and is negatively correlated with pregnancy rate in IVF [12]. In our study, both SNP array and NGS were sensitive to chromosome duplication, deletion and mosaicism, whereas NGS could not identify uniparental disomy and was not sensitive to triploidy. Both self-correction and reference-genome correction were used to analyze the NGS data of triploid DNA from chorionic tissues, and neither method could identify triploidy. Current bioinformatics algorithms mainly count the reads falling into "continuous windows", usually ranging from 1 kb to 1 Mb, on each chromosome. This approach can identify the changes in read number caused by chromosome deletion, duplication and mosaicism, as shown in our study, but it ignores SNPs and cannot detect triploidy.
In triploidy, each SNP locus carries three alleles, so heterozygous loci show allele ratios of 1:2 or 2:1 rather than the 1:1 ratio of a diploid genome. SNP array distinguishes these genotypes well and detects triploidy even after single-cell whole genome amplification, whereas NGS would need deeper sequencing and more powerful bioinformatics algorithms to analyze the data. Although NGS is more powerful in detecting microduplications and microdeletions, it is weaker than SNP array in detecting triploidy. The SNP-based approach detects the relative distribution of alleles at polymorphic loci and does not require a reference chromosome for comparison [28]. Although MALBAC has increased whole-genome coverage, triploidy may still raise the allele dropout rate, because each of the three copies of every chromosome has the same chance of being amplified. Genome recombination in triploidy further increases the difficulty of identifying it. The NATUS algorithm has been used to distinguish triploid fetuses using cfDNA from maternal blood; after single-cell amplification, however, it could not identify SNPs precisely because of allele dropout [28]. Further studies and strategies need to be developed to identify triploidy using NGS. Time-lapse combined with NGS would be a better solution to reduce the insensitivity to triploidy.
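The allele-ratio argument can be sketched as follows. With n copies of a locus, the B-allele fraction can only take the values 0/n, 1/n, ..., n/n, so a diploid genome has a single heterozygous band at 1/2 while a triploid genome has two bands at 1/3 and 2/3. The function names and the nearest-band rule below are hypothetical illustrations under this idealized model; they are not the method implemented in GenomeStudio or KaryoStudio.

```python
from fractions import Fraction

def expected_baf_bands(ploidy):
    """Possible B-allele fractions for a locus present in `ploidy` copies."""
    return [Fraction(b, ploidy) for b in range(ploidy + 1)]

def guess_ploidy_from_het_baf(het_bafs):
    """Choose ploidy 2 or 3 by which heterozygous bands lie closer on average.

    `het_bafs` is a non-empty list of observed B-allele frequencies at loci
    already judged to be heterozygous (illustrative input).
    """
    def mean_distance(bands):
        return sum(
            min(abs(x - float(b)) for b in bands) for x in het_bafs
        ) / len(het_bafs)

    het_diploid = expected_baf_bands(2)[1:-1]   # [1/2]
    het_triploid = expected_baf_bands(3)[1:-1]  # [1/3, 2/3]
    if mean_distance(het_diploid) <= mean_distance(het_triploid):
        return 2
    return 3

diploid_bands = expected_baf_bands(2)
triploid_bands = expected_baf_bands(3)
print(diploid_bands, triploid_bands)
print(guess_ploidy_from_het_baf([0.48, 0.52, 0.50]))
print(guess_ploidy_from_het_baf([0.31, 0.35, 0.66, 0.69]))
```

Real single-cell data are noisier than this, since allele dropout and amplification bias blur the bands, so the sketch only shows where the signal comes from.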

MATERIALS AND METHODS
Controlled ovarian stimulation and embryo biopsy in PGD and PGS patients
Controlled ovarian stimulation (COS) of all patients undergoing PGD and PGS was carried out with a long protocol: GnRH analogues were used for pituitary desensitisation, together with human menopausal gonadotrophins (hMG) or recombinant FSH. The starting dose of gonadotrophins was determined according to the patient's age, BMI and/or previous response to ovarian stimulation (range 75 to 300 IU QD). Human chorionic gonadotrophin (hCG) was administered when transvaginal ultrasound showed at least 60% of follicles above 16 mm mean diameter and the largest under 22 mm mean diameter. Transvaginal ultrasound-guided oocyte collection by vacuum aspiration was scheduled 36 h after hCG administration. Regardless of sperm quality, intracytoplasmic sperm injection (ICSI) was performed rather than IVF to prevent contamination with DNA from sperm and cumulus cells during PGD and PGS. Fertilization was assessed 17-20 h after ICSI and embryo cleavage was recorded every 24 h. Embryo biopsy was performed on day 5 or 6 at the blastocyst stage. All patients gave informed consent in our study [33].

Single cell multiple displacement amplification (MDA) and SNP array
Multiple displacement amplification (MDA) was used for single-cell amplification prior to SNP array. Briefly, the REPLI-g Single Cell Kit (QIAGEN, 150345) was used: a single cell was placed in 4.5 μl phosphate-buffered saline and lysed with 3 μl of DTT and DLB in an incubator for 10 min at 65°C. After incubation, 3 μl Stop Solution was added to the mix. The amplification mix was prepared, and 40 μl, including 2 μl REPLI-g sc DNA polymerase and 29 μl REPLI-g sc Reaction Buffer, was added to each tube. The mixture was then incubated for 8 h at 30°C, followed by 3 min at 65°C to inactivate the REPLI-g sc DNA polymerase. The amplified product was stored at −20°C until SNP array analysis. DNA from samples previously identified as triploid by SNP array was diluted and amplified as described above. All procedures followed the protocol of the Illumina human cyto12 microarray with minor modifications in our center.

Single cell multiple annealing and looping based amplification cycles (MALBAC) and next generation sequencing (NGS)
Multiple annealing and looping based amplification cycles (MALBAC) was used for whole genome amplification followed by high-throughput sequencing. DNA from samples with previous SNP array results was quantified again with Qubit and then diluted to 10 pg for single-cell amplification using YK015CHR (Yikon Genomics). The DNA was first diluted into 4.5 μl of lysis buffer and lysis enzyme; the amplification products were then fragmented and ligated to adapters, followed by PCR and magnetic purification according to the user manual. The barcoded DNA was sequenced on a HiSeq 2500 in Rapid 1×50 mode in our center.

Single cell and multi cell SNP array data analysis
SNP array data were analyzed using GenomeStudio Software (2011, Illumina). B allele frequency and Log R ratio were used to analyze the genotype. Copy number variations were called using KaryoStudio Software v1.4. Uniparental disomy was reported by both KaryoStudio and GenomeStudio.

NGS data analysis using self correction and reference karyotype
The raw data (in .bcl format) were demultiplexed and converted to FASTQ format using the Perl script configureBclToFastq.pl in the CASAVA 1.8.4 package, based on the sample-sheet information. Illumina adaptors, low-quality bases (quality score below 20) and MALBAC primers were removed from the FASTQ files using Trimmomatic [34].
High-quality reads were mapped to the hg19 reference genome using BWA with default parameters [35]. The mapped reads were sorted and converted to binary format using SamtoBam.jar in the Picard package. Uniquely mapped reads were extracted from the aligned reads (.bam file). The whole reference genome was then divided into non-overlapping observation windows (bins) of 1 Mb. The read number and GC content were calculated for each bin. GC bias correction was performed for every 1% of GC content. The GC-corrected relative read number (RRN) of each bin was then corrected against the reference training set [36]. A set of 500 samples with normal chromosomes was prepared as the reference training set, and self-correction was also used to analyze the data. We used the R programming language to plot the final RRN of each bin to visualize copy number variations.
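The binning and correction steps can be sketched as below. This is a minimal illustration, not the code used in this study. The text does not state the exact form of the GC correction or of the reference correction, so the sketch assumes median scaling within each 1% GC stratum and division by the per-bin mean RRN of the reference set. All function and variable names are hypothetical.

```python
import numpy as np

BIN_SIZE = 1_000_000  # 1 Mb non-overlapping windows, as described in the text

def count_reads_per_bin(positions, chrom_length, bin_size=BIN_SIZE):
    """Count uniquely mapped read start positions in each bin of one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    bin_index = np.asarray(positions) // bin_size
    return np.bincount(bin_index, minlength=n_bins)

def gc_correct(counts, gc_fraction):
    """Scale counts so every 1% GC stratum has the same median (assumed method)."""
    counts = np.asarray(counts, dtype=float)
    gc_percent = np.round(np.asarray(gc_fraction) * 100).astype(int)
    global_median = np.median(counts)
    corrected = counts.copy()
    for pct in np.unique(gc_percent):
        in_stratum = gc_percent == pct
        stratum_median = np.median(counts[in_stratum])
        if stratum_median > 0:
            corrected[in_stratum] = counts[in_stratum] * global_median / stratum_median
    return corrected

def relative_read_number(corrected, reference_mean=None):
    """Self-normalize by the sample mean; optionally divide by a reference profile.

    `reference_mean` stands for the per-bin mean RRN of a normal training set
    (an assumed form of the reference correction).
    """
    rrn = np.asarray(corrected, dtype=float) / np.mean(corrected)
    if reference_mean is not None:
        rrn = rrn / np.asarray(reference_mean, dtype=float)
    return rrn

# Toy usage with made-up read positions and GC values.
counts = count_reads_per_bin([10, 999_999, 1_000_000, 2_500_000], 3_000_000)
corrected = gc_correct([100, 100, 200, 200], [0.40, 0.40, 0.50, 0.50])
print(counts, corrected, relative_read_number(corrected))
```

Both the self-normalization and the reference normalization divide out the overall read depth, which is consistent with the observation above that neither strategy separated triploid from diploid samples.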