Genetic variation and forensic efficiency of autosomal insertion/deletion polymorphisms in Chinese Bai ethnic group: phylogenetic analysis to other populations

Thirty insertion/deletion loci were used to study the genetic diversity of 125 bloodstain samples collected from the Bai group in the Dali region of Yunnan, China. The observed and expected heterozygosity of the 30 loci ranged from 0.1520 to 0.5680 and from 0.1927 to 0.4997, respectively. After Bonferroni correction, no deviations from Hardy-Weinberg equilibrium were found at any of the 30 loci in the Bai group. The cumulative probability of exclusion and combined discrimination power were 0.9859 and 0.9999999999887, respectively, which indicated that the 30 loci could be used as complementary genetic markers for paternity testing and were qualified for personal identification in forensic cases. Population structure, principal component analysis, population differentiation, and phylogenetic reconstruction showed that the studied Bai group had close relationships with the Tibetan, Yi and Han groups from China. Even so, for a better understanding of the Bai ethnicity's genetic milieu, DNA genotyping at various genetic markers is necessary in future studies.


INTRODUCTION
As a new system of diallelic genetic markers, insertion/deletion (Indel) polymorphisms, also called DIPs, have recently been used in forensic science and population genetic studies. Owing to their simple structure, Indels can be amplified as small amplicons, so even degraded DNA samples can be analyzed accurately. Indels also have low mutation rates [1,2]. Additionally, the genotyping method for Indels is similar to that for STRs, comprising polymerase chain reaction (PCR) and capillary electrophoresis (CE), and is easily implemented in forensic biology laboratories. Indels are diallelic markers that show 2 different alleles (+ or -), so the number of alleles at an Indel locus is much smaller than at an STR locus (for instance, the D5S818 locus has alleles 7, 8, 9, 10, 11, 12, 13 and 14). As a result, the forensic parameters of Indels, such as discrimination power (DP), are lower than those of STRs, as follows from the calculation formula $DP = 1 - \sum_{i=1}^{n} P_i^2$ [3], where $P_i$ is the frequency of the $i$-th genotype. We could infer, through the integral theorem, that DP values increase as the number of alleles increases. Therefore, Indels could be used as complementary markers to improve the efficiency of existing genetic markers in personal identification and paternity testing. After Weber et al. reported human diallelic Indel polymorphisms for the first time [4], Indels also drew much attention in the biogeographic field [5][6][7]. For multiplex amplification of 30 Indel loci plus the sex-determining locus Amelogenin, the recently commercialized Qiagen Investigator DIPplex kit (Qiagen, Hilden, Germany) was employed in our study [8].
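The dependence of DP on the number of alleles can be illustrated with a minimal sketch. It assumes that genotype frequencies follow Hardy-Weinberg proportions and that all alleles are equally frequent; both are simplifications made here for illustration, and the function name `discrimination_power` is hypothetical rather than taken from any software used in the study.

```python
from itertools import combinations_with_replacement


def discrimination_power(allele_freqs):
    """Return DP = 1 - sum of squared genotype frequencies.

    Assumption of this sketch: genotype frequencies are derived from the
    allele frequencies under Hardy-Weinberg equilibrium.
    """
    indexed = list(enumerate(allele_freqs))
    sum_of_squares = 0.0
    # Enumerate every unordered genotype (i <= j) exactly once.
    for (i, p), (j, q) in combinations_with_replacement(indexed, 2):
        genotype_freq = p * p if i == j else 2.0 * p * q
        sum_of_squares += genotype_freq ** 2
    return 1.0 - sum_of_squares


# A diallelic Indel with two equally frequent alleles (+ and -).
indel_dp = discrimination_power([0.5, 0.5])

# A hypothetical STR-like locus with eight equally frequent alleles.
str_like_dp = discrimination_power([1.0 / 8] * 8)

print(f"diallelic DP = {indel_dp:.4f}, eight-allele DP = {str_like_dp:.4f}")
```

Under these assumptions the diallelic locus gives a DP of 0.625, whereas the eight-allele locus gives about 0.971, which is the pattern the formula predicts when more alleles are available.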
According to the 6th nationwide population census of the People's Republic of China, the Bai, with 1,933,510 people, rank 15th among the 56 ethnic groups (http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexce.htm). Approximately 80% of Bai people live in concentrated communities in the Dali Bai Autonomous Prefecture in Yunnan Province, southwest China. Archaeological findings indicate that the history of the Bai can be traced back to the Neolithic Age. The Bai language originated from the Sino-Tibetan language family and retains many Han words owing to the Bai people's long-term cultural contact with the Han, the largest Chinese ethnic group. The Bai ethnic group has been a part of China since the Western Han Dynasty (109 BC), while preserving its own science and culture [9].
In this research, we obtained population data and forensic parameters of 30 Indels in the Bai ethnic group, and then analyzed the genetic differentiation between the Bai ethnic group and other populations.

Linkage disequilibrium analysis
The results of the linkage disequilibrium (LD) analysis are shown in the form of an inverted triangle (Figure 1). As shown by the red arrow, the pairwise LD values between locus HLD6 and the other 29 loci are arranged at the bottom left (29 square grids). The remaining loci are arranged in the same way. No red area enclosed by a thick black curve was observed in the linkage disequilibrium graph.

Forensic parameters
The allele frequencies and forensic statistical parameters of the 30 Indel loci in the Chinese Bai ethnic group are presented in Table 1, and the detailed Indel genotypes are shown in Supplementary Table 1. The Hardy-Weinberg equilibrium (HWE) tests showed a p value below 0.05 at HLD56 (Table 1). We therefore carried out a validation analysis with the program HWE (version 1.10) [10], which gave p values below 0.05 at three loci: HLD56, HLD81, and HLD118. The HWE tests of the 30 Indels in the Bai ethnic group were thus assessed by these two independent analyses. After Bonferroni correction (significance level, 0.0017), no deviations from HWE were observed at the 30 Indel loci of the Bai group. The observed heterozygosity (Ho) values ranged from 0.1520 at the HLD118 locus to 0.5680 at the HLD131 locus, while the expected heterozygosity (He) ranged from 0.1927 at the HLD118 locus to 0.4997 at the HLD136 locus, and the polymorphic information content (PIC) ranged from 0.1741 at the HLD118 locus to 0.3749 at the HLD136 locus. We found the highest power of exclusion (PE) (0.2542) at the HLD131 locus and the lowest PE (0.0181) at the HLD118 locus. The DP values of the selected loci ranged from 0.3100 (HLD118) to 0.6533 (HLD56). The highest value of the typical paternity index (TPI) was 1.1574 (HLD131) and the lowest was 0.5896 (HLD118).
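The values reported above can be checked with a short sketch. It rests on two assumptions that the text does not state: (1) PE and TPI follow the widely used expressions PE = h²(1 − 2hH²) and TPI = 1/(2H), with h the observed heterozygote fraction and H the homozygote fraction, and (2) the heterozygote counts are 71 (HLD131) and 19 (HLD118), inferred here from the reported Ho values and n = 125. The function name `locus_parameters` is hypothetical.

```python
def locus_parameters(n_heterozygotes, n_individuals):
    """Return (Ho, PE, TPI) for one locus from observed genotype counts.

    Assumed formulas (standard expressions, not quoted from the source):
      PE  = h^2 * (1 - 2 * h * H^2)
      TPI = 1 / (2 * H)
    where h is the heterozygote fraction and H = 1 - h.
    """
    h = n_heterozygotes / n_individuals
    big_h = 1.0 - h
    pe = h * h * (1.0 - 2.0 * h * big_h * big_h)
    tpi = 1.0 / (2.0 * big_h)
    return h, pe, tpi


# Bonferroni-corrected significance level for 30 simultaneous HWE tests.
bonferroni_alpha = 0.05 / 30

# Heterozygote counts inferred from Ho x 125 (an assumption of this sketch).
ho_131, pe_131, tpi_131 = locus_parameters(71, 125)
ho_118, pe_118, tpi_118 = locus_parameters(19, 125)

print(f"alpha = {bonferroni_alpha:.4f}")
print(f"HLD131: Ho = {ho_131:.4f}, PE = {pe_131:.4f}, TPI = {tpi_131:.4f}")
print(f"HLD118: Ho = {ho_118:.4f}, PE = {pe_118:.4f}, TPI = {tpi_118:.4f}")
```

With these inputs the sketch reproduces the reported significance level of 0.0017 and the reported extremes of Ho, PE and TPI to four decimal places, which suggests, but does not prove, that the same expressions were applied.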

Principal component analysis
The allele frequencies of the 30 Indel loci in the studied Bai ethnic group and in the 25 reference populations were used to construct the PCA plot in Figure 3. The first component accounted for 55.40% of the variance and the second for 22.19%. Together, the two principal components accounted for 77.59% of the total variance.

Population differentiations
We employed the analysis of molecular variance method to compare the studied Bai group with the 25 reported populations mentioned above. The p values of the Fst distances are shown in Table 2. The minimum and maximum numbers of loci with significant differences among the 30 loci were 2 (Bai and Yi, Bai and Tibetan1) and 22 (Bai and Hungarian), respectively. In Supplementary Table 2, we calculated the $D_A$ values among the 26 populations, which ranged from 0.0006 (Guangdong Han and Shanghai Han, Henan Han and Beijing Han1) to 0.0358 (Dong and Hungarian). Between the Bai and the other 25 reported populations, the minimum was 0.0012 (Bai and Tibetan1) whereas the maximum was 0.0238 (Bai and Basque).
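For readers unfamiliar with the $D_A$ distance, the following minimal sketch shows how it can be computed for diallelic loci, assuming the standard definition $D_A = 1 - \frac{1}{r}\sum_{j=1}^{r}\sum_{i}\sqrt{x_{ij}y_{ij}}$ over $r$ loci. The study itself used the DISPAN program; the function name `nei_da` and the allele frequencies below are hypothetical and are not data from Supplementary Table 2.

```python
import math


def nei_da(freqs_x, freqs_y):
    """Return the D_A distance between two populations.

    Each argument lists, locus by locus, the frequency of one allele
    (for example the insertion allele) of a diallelic locus; the other
    allele's frequency is taken as 1 minus that value.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    total = 0.0
    for x, y in zip(freqs_x, freqs_y):
        # Sum of sqrt(x_ij * y_ij) over the two alleles of this locus.
        total += math.sqrt(x * y) + math.sqrt((1.0 - x) * (1.0 - y))
    return 1.0 - total / len(freqs_x)


# Hypothetical two-locus example (illustrative values only).
pop_a = [0.5, 0.2]
pop_b = [0.5, 0.8]

distance_ab = nei_da(pop_a, pop_b)
distance_aa = nei_da(pop_a, pop_a)
print(f"D_A(a, b) = {distance_ab:.4f}, D_A(a, a) = {distance_aa:.4f}")
```

Identical allele frequencies give a distance of zero, and the distance grows as the frequencies diverge, which is why geographically distant populations tend to show larger values.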

Phylogenetic reconstruction
To analyze the genetic background of the Bai group, we constructed a Neighbor Joining (NJ) tree [25] based on the $D_A$ genetic distances (Supplementary Table 2). The NJ tree showed two main clusters (Figure 4). One comprised Uyghur1 and 17 East Asian populations (She, Guangdong Han, Shanghai Han, Beijing Han, South Korean, Xibe, Tibetan, Tibetan2, Tibetan1, Bai, Yi, Henan Han, Beijing Han1, Chengdu Han, Zhuang, Miao and Dong). The other was composed of the Kazak, Uyghur2, Uyghur, Uruguayan and European groups (Dane, Hungarian, Basque, and Central Spanish).

Linkage disequilibrium analysis
Two loci that show no association, whether they lie on the same chromosome or on two different chromosomes, are suitable for forensic applications as independent loci [26]. The LD analysis indicated no significant LD ($r^2$ < 0.8) among the 30 loci. Taken together with the LD analysis of the previously reported Chinese Yi group [27], the results for the studied Bai group verified that there was no linkage among the 30 Indel loci; therefore, these loci could be used as independent markers for forensic and population genetic analysis.
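The $r^2$ statistic referred to above can be sketched for a pair of diallelic loci as follows. This is a simplified illustration that assumes the haplotype frequency is already known; the study estimated LD from genotype data with the SNP Analyzer program, whose internal procedure is not described here. The function name `ld_r_squared` and the input values are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Return r^2 between two diallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    return d * d / denominator


# Independent loci: the haplotype frequency equals the product p_a * p_b.
r2_independent = ld_r_squared(0.25, 0.5, 0.5)

# Complete association: allele A always occurs together with allele B.
r2_complete = ld_r_squared(0.5, 0.5, 0.5)

print(f"independent: {r2_independent:.2f}, complete: {r2_complete:.2f}")
```

Values near 0 indicate that the two loci behave independently, whereas values approaching 1 indicate strong association; the 0.8 threshold used in the text falls near the upper end of this range.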

Forensic parameters
In this research, the cumulative probability of exclusion (CPE) and combined discrimination power (CDP) were 0.9859 and 0.9999999999887, respectively. The CPE values were 0.9861, 0.9968, 0.9957, 0.9975, 0.9884, and 0.9900 in the previously published Dong, Uyghur1 [22], Tibetan, Uyghur [17], Taiwanese, and Polish [28] populations, respectively. The CPE of the 30 Indel loci therefore varied within a narrow range across populations. Collectively, the CPE values obtained with the kit were relatively low and could not reach the high level of exclusion achieved by STR genetic markers in forensic paternity cases [29,30]. Nonetheless, the CDP values provided a considerable level of discrimination when the kit was applied to forensic identification cases. According to the formula for calculating PIC [31], we established the PIC calculation model for diallelic markers: $PIC = 2p - 4p^2 + 4p^3 - 2p^4$, where $p$ represents the frequency of either Indel allele (+ or -). Under this model, PIC ranges from 0 to 0.375 (reached at $p = 0.5$) and therefore always stays below 0.5. Accordingly, the forensic parameter PIC is less than 0.5 for diallelic genetic markers, for example Indels and SNPs [32], which is much lower than for STRs [33].
In conclusion, the kit could be used as a supplementary tool in forensic cases tested with autosomal STRs.
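The diallelic PIC model discussed above can be checked numerically against the general PIC expression, taken here to be $PIC = 1 - \sum_i p_i^2 - \sum_{i<j} 2p_i^2p_j^2$ (an assumption about the formula cited as [31]). The sketch also shows the usual way per-locus values are combined into a cumulative value, $1 - \prod_i (1 - v_i)$, which is likewise assumed rather than quoted from the source. All function names are hypothetical.

```python
def pic_general(freqs):
    """General PIC for any number of alleles (assumed form of the cited formula)."""
    homozygosity = sum(p * p for p in freqs)
    cross_terms = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_terms


def pic_diallelic(p):
    """Diallelic PIC model from the text: 2p - 4p^2 + 4p^3 - 2p^4."""
    return 2.0 * p - 4.0 * p ** 2 + 4.0 * p ** 3 - 2.0 * p ** 4


def cumulative_power(per_locus_values):
    """Combine independent per-locus values v_i as 1 - prod(1 - v_i)."""
    remaining = 1.0
    for value in per_locus_values:
        remaining *= 1.0 - value
    return 1.0 - remaining


# Compare the two PIC expressions over a grid of allele frequencies.
grid = [k / 100.0 for k in range(101)]
max_gap = max(abs(pic_general([p, 1.0 - p]) - pic_diallelic(p)) for p in grid)
max_pic = max(pic_diallelic(p) for p in grid)

print(f"largest gap = {max_gap:.2e}, maximum PIC = {max_pic:.3f}")
```

On this grid the two expressions agree to within floating-point error, and the maximum of 0.375 occurs at p = 0.5, consistent with the highest PIC reported for the Bai group (0.3749 at HLD136). The `cumulative_power` helper illustrates why 30 loci with individually modest PE or DP values can still yield high cumulative values.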

Population structure
Individuals from populations that are distant from each other (for example, Asian and European) generally had different membership coefficients in the inferred clusters. In the present study, at K=3, the 10 East Asian groups (Chengdu Han, Zhuang, Dong, Miao, Henan Han, Beijing Han1, Tibetan1, Yi, Tibetan, and Bai) were all composed of the different components in uniform proportions, which indicated that the Bai group's membership was similar to that of the East Asian populations. Compared with the 17 East Asian groups, the structures of the 4 Eurasian groups (Uyghur, Uyghur1, Uyghur2, and Kazak) were more similar to those of the Uruguayan and European groups [34]. According to the instructions of the Structure program [35], populations sharing a similar structure are close in membership. Therefore, the structure analysis of the raw data of the 30 Indel loci showed that the Bai ethnic group was homogeneous with the other Asian groups.
lower left quadrant, which indicated their close genetic relationships.

Population differentiations
The values of the $D_A$ genetic distance were consistent with the geographic locations of these populations. With a significance threshold of 0.05, we observed significant differences between the Bai group and the 25 reference groups. Compared with the East Asian and Eurasian groups, the Central Spanish, Uruguayan, Dane, Basque and Hungarian groups were found to differ more from the Bai group. Significant differences between the Bai group and all East Asian and Eurasian groups were found at fewer than 12 loci, whereas those between the Bai group and the Uruguayan and European groups were found at more than 17 loci. These results corresponded with the geographic distribution of the 26 groups.
With regard to the 30 loci, the highest ethnic diversity was found at the HLD114 locus, which showed significant differences between the studied Bai group and 19 other groups. By contrast, no significant differences were observed at the HLD92 locus between the Bai group and the other 25 groups. As with STRs [36,37], the ability of Indels to distinguish ethnic groups varied from locus to locus. As more typing data from different populations at novel Indel loci become available, the evolution of human history could be investigated comprehensively and in detail.

Phylogenetic reconstruction
The results of the phylogenetic reconstruction were roughly in line with those of the population differentiation analysis. Previously, we had reported that the Bai ethnic group was close to the Southern Chinese Han, Changsha Han, and Guangdong Han populations on the basis of mitochondrial DNA [38]. This demonstrated that the Bai ethnic group in Yunnan Province, China, had multiple origins, for example, Chinese Han groups, the Yi group and the Tibetan group. Historically, descendants of different ethnic groups integrated into the Yunnan Bai group, including the Diqiang system from Northern China, the Pu system from Southern China and the Chinese Han [39] (http://www.china.org.cn/e-groups/shaoshu/shao-2-bai.htm). The Diqiang and Pu systems were both ancient nations in China dating back to 1000 BC. The ancient Diqiang was the origin of the Tibetan, and the ancient Pu was the origin of the Dai, Blang, De'ang and Va ethnic minorities [40]. Hence, our study illustrated that the Bai ethnic group in Yunnan Province was not only close to Chinese Han in relationship, but ... In summary, the inferred association between the studied Bai group and the Chinese Han population nationwide, the Yi group, the Tibetan group and other ethnic groups was supported by the studies of population structure, principal component analysis, population differentiation, and phylogenetic reconstruction, and was consistent with historical population migration and cultural exchange. Studies on the Bai ethnic group using STRs as genetic markers [41,42] also indicated a close relationship between the Bai group and Han populations nationwide.

Samples and DNA extraction
We collected bloodstain samples, with informed consent, from 125 unrelated healthy Bai individuals living in the Dali Bai Autonomous Prefecture in Yunnan Province. Donors were required to have ancestors who had lived in Dali for over three generations and to have no significant migration in their family history. The collection process followed the human and ethical research principles of Xi'an Jiaotong University Health Science Center, China. The Chelex-100 method was used to extract genomic DNA from the bloodstain samples as described by Walsh [43].

PCR amplification and genotyping for InDels
In this study, we employed the Investigator DIPplex kit for Indel genotyping, which includes 30 Indel loci plus the sex-determining locus Amelogenin and has been validated before [8,44]. PCR amplification was conducted in a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) in accordance with the manufacturer's instructions. After that, the ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) was used to perform electrophoresis under the conditions recommended by the manufacturer. Fragment sizes were determined with BTO 550 (Qiagen, Hilden, Germany) as the internal lane standard. We used the GeneMapper ID software v3.2 (Applied Biosystems, Foster City, CA, USA) to identify the alleles.

Quality control
The study was conducted following the ISFG recommendations on the analysis of DNA polymorphisms as described by Schneider [45].

Statistical analyses
The modified PowerStats (version 1.2) spreadsheet (Promega, Madison, WI, USA) was used to calculate the allele frequencies, the forensic parameters and the exact chi-square test for the HWE of the 30 Indels. The forensic parameters included Ho, PE, PIC, DP, and TPI. He values were calculated by formula. We utilized the SNP Analyzer version 2.0 (Istech, South Korea) [47] to perform the LD analysis. We estimated Fst and p values between pairwise populations with Arlequin (version 3.0) [48] to capture the variance in allele frequencies among different populations. To analyze the population structure, we used the Structure program (version 2.2) [49] to estimate the membership coefficients.
The Ln Pr(X|K) [50] and Delta K [51] were calculated to determine the most appropriate K value. On the basis of allele frequencies, principal component analysis (PCA) was carried out in MATLAB 2007a (MathWorks Inc., USA). For phylogenetic reconstruction, we used the $D_A$ distance and the phylogenetic analysis (DISPAN) program [52].

CONCLUSION
In conclusion, the combination of 30 Indel loci provided a relatively low CPE (0.9859) and a high CDP (0.9999999999887). The population genetic analyses of the Bai group based on the 30 Indel loci, namely population structure, principal component analysis, population differentiation, and phylogenetic reconstruction, further supported that the Bai group had close relationships with certain ethnic groups from China, for example the Tibetan and Yi groups, consistent with the analysis by STR markers and with historical migration. With more genotyping profiles of various ethnic groups, we could gain a comprehensive understanding of population migration and ancestral origins in China.

Figure 1: The LD analysis schema between the 30 Indel loci utilizing the SNPAnalyzer 2.0 program.

Figure 2: Clustering structure for the full-loci dataset assuming K = 2-7 of the Bai group with other groups. Population names are labeled beneath and language names on top.

Figure 3: PCA analysis based on 30 Indel loci of the studied Bai group and the 25 reference groups.

Figure 4: Phylogenetic tree constructed by the agglomerative clustering (Neighbor Joining) method based on the $D_A$ distances between the 26 groups.

Table 1 : Allele frequency distribution and forensic statistical parameters of the 30 Indel loci in Chinese Bai ethnic group (n=125)
index; PIC, polymorphic information content; PE, power of exclusion; DP, discrimination power.