Whole-exome sequencing identifies SGCD and ACVRL1 mutations associated with total anomalous pulmonary venous return (TAPVR) in Chinese population

As a rare type of Congenital Heart Defects (CHD), the genetic mechanism of Total Anomalous Pulmonary Venous Return (TAPVR) remains unknown, although previous studies have revealed potential disease-driving regions/genes. Blood samples collected from the 6 sporadic TAPVR cases and 81 non-TAPVR controls were subjected to whole exome sequencing. All detected variations were confirmed by direct Sanger sequencing. Here, we identified 2 non-synonymous missense mutations: c.C652T, p.R218W in activin A receptor type II-like 1 (ACVRL1), c.C717G, p.D239E in sarcoglycan delta (SGCD). Our results offered the landscape of mutations for TAPVR in Chinese population firstly and are valuable in the mutation-based pre- and post-natal screening and genetic diagnosis for TAPVR.


INTRODUCTION
Total Anomalous Pulmonary Venous Return (TAPVR) (OMIM:%106700) is one type of cyanotic Congenital Heart Defects (CHD), in which none of the pulmonary veins connect to the left atrium and are malpositioned to the systemic venous circulation [1] ( Figure 1A, 1B). TAPVR characterized by cardiac function deterioration including cyanosis, pulmonary hypertension, dyspnea, pulmonary edema, congestive heart failure affects 1 in 15,000 live births and 48.8% of them will die without surgery before the age of 1 [2]. Although TAPVR generally presents significant fatality rate, the pathogenesis is still vague and become hot research spot.
In 1960, Neill et al. described the first case that TAPVR in 'scimitar syndrome' [3]. Then, TAPVR has been associated with GATA binding protein 4 (GATA4), Zic family member 3 (ZIC3), and Gap junction protein and alpha 1 (GJA1) [4]. After a de novo 10;21 balanced translocation was reported, Cinquetti et al. defined ANKRD1 as a possible candidate gene for TAPVR in 2008 [5]. Bleyl et al established a locus for TAPVR at 4p13-q12 [6] and found the PDGFRA as a driver gene in the further detailed research [7]. A non-synonymous variant in retinol binding protein 5 (RBP5) by Whole genome sequence (WGS) was analysised in 2 TAPVR patients recently [8]. Nevertheless, the contribution of known genes above was still limited to investigate the genetic cause . Therefore, we applied WES to the investigation of the genetic cause in 6 sporadic TAPVR cases and 81 non-TAPVR controls. We performed independent replication on additional 12 TAPVR patients by Sanger sequencing to identify the possible genetic variants.

Gene classification
After filtered through Public and in-house database, genes were classified referring to ACMG standards and guidelines [9]

Genetic findings
We sequenced six sporadic TAPVR cases with mean coverage of 77-fold (~ 8.33 GB Raw data yield per individual with paired-end, 100bp reads) (Supplementary Table 5). About 19,000 (range from 18824 to 19301) primary variants with high quality were identified per individual with standard GATK-haplotype calling process. Mutation screening was carried out according to American College of Medical Genetics (ACMG) criteria guidelines: (i) UTR, synonymous, intronic variants removed; (ii) variants with minor allele frequency (MAF) < 1% from dbSNP 135, 1000 Genomes, ESP and ExAC; (iii) variants completely absent from 81 controls. After filtering, we used a combined strategy to identify potentially causal variants as follows : (i) Loss of function (LOF) variants or not; (ii) Allele frequency in the population; (iii) Homozygous or heterozygous mutations; (iv) Recurrence of the variants in different gene classification [9]. Finally, all the variants were divided into three Parts according to the possibility of pathogenic [10]. In Part I variants, we found two heterozygous missense SNVs: ACVRL1(NM_000020.2): c.C652T,p.R218W and SMAD9(NM_001127217): c.C743A, p.T248K in Patient 5; In Part II variants, SGCD(NM_172244): c.C717G, p.D239E was recurrent heterozygous missense variant in Patient 5, 6; In Part III variants, remaining variants were mainly referring to 62 LOF variants with MAF < 0.1% (Supplementary Table 6). (Figure 2)

Functional prediction
Firstly, mutational impacts in Part I or Part II variants were predicted from tools including SIFT, Polyphen2 and MutationTaster (Table 1). Prediction Scores of ACVRL1: c.C652T with high scores indicated probably or possibly damaging. However, the mutations in SGCD and SMAD9 were predicted as ''benign''. Additionally, Part III variants were predicted by ToppGene (Table 2). Finally, to quantify the functional relationship,

Validation of variants
In the validation stage, the identified exome sequencing candidate variants were confirmed by PCRbased Sanger sequence in 6 discovery TAPVR cases and another 12 validation TAPVR cases ( Figure 4). Primers of Sanger sequencing were listed (Supplementary Table 7).

DISCUSSION
In our study, 18 TAPVR cases and 81 non-TAPVR controls were included to reveal the genetic etiology of TAPVR in Chinese population. We screened out candidate pathogenic variants in 6 TAPVR patients, and replicated these variants in another 12 TAPVR cases. Further results were annotated and classified into several ranks based on the functional impacts. We provided evidence for ACVRL1 as a known causative gene and for SGCD as candidate genes for TAPVR. Although potential diseasedriving regions or genes (e.g., 4p13-q12, ANKRD1, etc.) in previous studies were detected, we only found few intron or UTR variants in ANKRD1 or PDGFRA in our data. It may be ethnic heterogeneity because the previous studies mainly focused on Utah family rather than Chinese family TAPVR.
A rare missense mutation in ACVRL1 (c.C652T, p.R218W) was identified as a causal mutation for TAPVR. Activin receptor like kinase 1 (ACVRL1) is a type I cellsurface receptor for the TGF-beta superfamily of ligands, located on chromosome 12q13. Johnson et al. [11] noted that the high expression of ACVRL1 in highly vascularized     Oncotarget 27817 www.impactjournals.com/oncotarget tissues such as lung and placenta. More than 225 mutations in ACVRL1 have been reported in HGMD (The Human Gene Mutation Database). Notably, C652T was reported in vein of Galen aneurysmal malformation (VGAM) [12]. T649G (p.T217G), G650A (p.W217X) [13][14][15] and G656A (p.G219H) [16] in nearby were also critical for HHT1 or pulmonary hypertension. It is interesting that we aimed at finding new TAPVR gene, but WES ultimately led to the identification a known causal variant in a different but related disease. Pulmonary hypertension is a rare lung disorder in which the arteries that carry blood from the heart to the lungs become narrowed, making it difficult for blood to flow through the vessels. All of the TAPVR patients sequenced suffered from pulmonary hypertension and mostly recovered after surgery. Therefore, we could infer that mutations in ACVRL1 may account for TAPVR.
A recurrent mutation in SGCD (c.C717G, p.D239E) was detected, which coded for the dystrophin-glycoprotein complex (DGC). Mutations of SGCD were accompanied with dilated cardiomyopathy and muscular dystrophy in Caucasians, although not previously reported on TAPVR [17,18]. Particularly, the recent study confirmed that the c.S151A mutation in SGCD causes a mild, subclinical cardiomyopathy phenotype [19]. Notably, they were found in 2/6 discovery patients, 1/12 validation patients and not in 81 control samples. Therefore, C717G were proven more responsible for TAPVR.
In contrast to previous studies [5,7], whole-exome sequencing has already arrived in the clinic and was a powerful tool to investigate candidate mutations for rare diseases. The FORGE (Finding of Rare Disease Genes) Canada Consortium studied 264 disorders from the 371 submitted and identified disease-causing variants for 146 disorders over a 2-year period [20]. Several limitations in our study should be concerned including sample size. However, considering the low incidence of TAPVAR in population, our study is still a good attempt. In addition, analysis only focused on variations in coding regions, information for other regions were missing (such as introns, UTR or intergenic regions) [21]. Taken together, our findings need to be further validated in functional studies or large well-designed population-based studies in the future.
In conclusion, our study was the first attempt to dissect the etiology of TAPVR using whole-exome sequencing strategy in Chinese population. The results will provide important value to translate mutations detected by whole-exome sequencing to clinical diagnosis.

Subjects
All of the TAPVR children were recruited from Nanjing Children's Hospital, Nanjing Medical University. The discovery cohort consisted of 6 cases (mean age: 5 months (range 1-11 months); gender: 1 female, 5 male) (Supplementary Table 1). The replication cohort consisted of 12 cases (mean age: 27 months (range 1-152 months); gender: 8 female, 4 male). All patients are Non-syndromic TAPVR with atrial septal defects (ASD) or pulmonary hypertension (PH). We performed Identity-by-descent (IBD) on 6 cases using PLINK 1.07 and confirmed no blood relationship on them (Supplementary Table 2). All of patients were proved by echocardiography, 12-lead electrocardiography and surgery. An in-house control database, including 81 non-TAPVR individuals, were also performed WES. All subjects included in this study have written informed consent, and the study was approved by the institutional ethical committee of Nanjing Medical University. All methods and experimental protocols were approved by Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, and carried out in accordance with the approved guidelines.

Extraction of DNA
A DNA extraction procedure was carried out using a QIAamp™ DNA and Blood Mini kit Qiagen™) according to the manufacturer's protocols. Total DNA concentration and quantity were assessed by measuring absorbance at 260 nm with NanoDrop 2000c Spectrophotometer (Thermo Scientific™).

Library preparation and sequencing
Genomic DNA from whole blood (0.2 µg DNA) and was captured using the Agilent SureSelect Human All Exon v5 Kit. Input amounts were fragmented to a size range of 200 bp followed by end repair, adaptor ligation, and 11 PCR cycles. Appropriate amounts of enrichment DNA libraries were barcoded, pooled and loaded to lanes of a HiSeq Flow Cell, followed by 101 bp paired-end sequencing using Illumina HiSeq 1500 platform according to manufacturer's protocol.

Bioinformatics analysis
For whole-exome sequencing (WES), image analysis and base calling were performed with CASAVA v1.8.2 using default parameters. The sequence reads were aligned to the hg19 reference sequence using BWA [22]. Picard tools were used to mark duplicates, and then multiple GATK tools (GATK LeftAlignIndels, IndelRealligner and Base Quality score recalibration) were applied to improve alignment accuracy. Variant discovery and genotype calling of single nucleotide variants (SNVs), insertions and deletions were performed on all individuals globally using the HaplotypeCaller modules of Genome Analysis Toolkit (GATK v2.8) [23]. Variant annotation process was performed using SnpEff, SnpSift (http://