Phylogenic analysis and forensic genetic characterization of Chinese Uyghur group via autosomal multi STR markers

We investigated the allelic frequencies and forensic descriptive parameters of 23 autosomal short tandem repeat loci in a randomly selected sample of 1218 unrelated healthy Uyghur individuals residing in the Xinjiang Uyghur Autonomous Region, northwest China. A total of 281 alleles were identified at these loci, with allelic frequencies ranging from 0.0004 to 0.5390. The combined match probability and combined probability of exclusion of all loci were 5.192 × 10⁻²⁹ and 0.9999999996594, respectively. The population genetic analyses showed that the Uyghur group was closely related to neighboring populations such as the Xibe and Hui groups. In summary, these autosomal short tandem repeat loci were highly informative in the Uyghur group, and the multiplex PCR system could serve as a valuable tool for forensic casework and population genetic analysis.


INTRODUCTION
The Uyghur ethnic group, the fourth largest ethnic minority of China, lives primarily in northwest China's Xinjiang Uyghur Autonomous Region. Previous population genetic studies [1][2][3] indicated that Uyghurs possess mixed anthropometric and genetic traits of both Europeans and Central Asians. Analysing additional genetic markers in the Uyghur group should therefore help to clarify its genetic relationships with other populations.
Short tandem repeats (STRs) are widely recognized as vital genetic markers in forensic science. To achieve better performance in forensic applications, especially in parentage testing cases that involve mutation events, more highly polymorphic genetic markers are needed. Meanwhile, as much population genetic data as possible should be obtained for these markers before they are applied to actual forensic cases. In this study, we investigated the allelic frequencies and forensic descriptive parameters of autosomal STR loci in the Uyghur group using the HuaXia Platinum PCR Amplification system, which includes 23 autosomal STR loci and two sex-associated markers (a Y-chromosome insertion/deletion and Amelogenin). Furthermore, we evaluated the genetic relationships between the studied Uyghur group and reference populations [4][5][6][7][8][9][10][11][12][13][14][15][16][17] from the same or different regions based on 14 shared STR loci: D8S1179, D21S11, D7S820, CSF1PO, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818 and FGA.

Allelic frequencies and forensic descriptive parameters
The allelic frequencies and forensic descriptive parameters of the 23 STR loci are presented in Table 1 and Table 2, respectively. A total of 281 alleles were observed in the Uyghur group, with frequencies ranging from 0.0004 to 0.5390. In the Hardy-Weinberg equilibrium (HWE) test, 22 of the 23 STR loci conformed to HWE after Bonferroni correction (p = 0.05/23 ≈ 0.0022); only D3S1358 deviated from HWE. The match probability (MP) ranged from 0.0123 at the Penta E locus to 0.2003 at the TPOX locus, with an average of 0.0692. The probability of exclusion (PE) ranged from 0.2862 at the TPOX locus to 0.8102 at the Penta E locus, with an average of 0.5969. The mean values of discrimination power (DP), polymorphic information content (PIC), observed heterozygosity (Ho) and expected heterozygosity (He) were 0.9380, 0.7776, 0.7957 and 0.8048, respectively; the highest values of DP, PIC, Ho and He were also found at the Penta E locus and the lowest at the TPOX locus. The combined match probability (CMP) and combined probability of exclusion (CPE) of the 23 STR loci were 5.192 × 10⁻²⁹ and 0.9999999996594, respectively.
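Combined parameters of this kind are conventionally obtained from the per-locus values by multiplication, which assumes that the loci are independent. The following Python sketch illustrates that convention; the function names are hypothetical, and the two-locus input uses only the extreme MP and PE values quoted above (Penta E and TPOX), so it does not reproduce the 23-locus CMP and CPE.

```python
import math

def combined_match_probability(mp_values):
    """CMP: product of per-locus match probabilities (assumes independent loci)."""
    return math.prod(mp_values)

def combined_probability_of_exclusion(pe_values):
    """CPE: 1 minus the product of (1 - PE) over loci (assumes independent loci)."""
    return 1.0 - math.prod(1.0 - pe for pe in pe_values)

# Illustrative two-locus input only (Penta E and TPOX values from the text);
# the full per-locus table is not reproduced here.
mp_two_loci = [0.0123, 0.2003]
pe_two_loci = [0.8102, 0.2862]

cmp_two = combined_match_probability(mp_two_loci)
cpe_two = combined_probability_of_exclusion(pe_two_loci)
print(f"CMP (two loci) = {cmp_two:.8f}")
print(f"CPE (two loci) = {cpe_two:.8f}")
```

Supplying all 23 per-locus MP and PE values in place of the two-element lists would give the panel-wide figures under the same independence assumption.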

Comparisons of inter-population genetic differentiations and genetic distances
As shown in Table 3, population differentiation (p-values) was assessed by analysis of molecular variance (AMOVA). After Bonferroni correction for multiple tests (p = 0.05/238 ≈ 0.00021), the smallest differentiations at the same 14 loci were found between the studied Uyghur group and the Uyghur1, Hui and Xibe groups, with significant differences at 0, 1 and 3 loci, respectively. By contrast, the Uyghur group differed significantly from the African American and Caucasian populations at 13 loci. Among the compared loci, TH01 was the most differentiated, showing significant differences between the Uyghur group and 16 reference populations, whereas TPOX was the least differentiated, with significant differences for only 2 reference populations.
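The corrected threshold can be checked with a few lines of Python. The divisor 238 is consistent with 14 loci compared against 17 reference populations; that reading is an assumption made for this sketch, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

n_loci = 14        # shared STR loci compared
n_reference = 17   # reference populations compared with the Uyghur group
n_tests = n_loci * n_reference  # assumed origin of the divisor 238

threshold = bonferroni_threshold(0.05, n_tests)
print(n_tests, f"{threshold:.5f}")
```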
The pairwise fixation index (Fst) genetic distances between the Uyghur group and the 17 reference populations were calculated with Genepop v4.0.10 [18] based on the 14 overlapping STR loci and are shown in Supplementary Table 1. The smallest genetic distance was found between the two Uyghur groups (0.0009), followed by the distances to the Hui (0.0038) and Xibe (0.0043) groups; the largest distance was observed between the Uyghur and Mexican groups (0.0225).

Multidimensional scaling and principal component analysis
A principal component analysis (PCA) plot was drawn with MVSP v3.1 software [19] from the allelic frequencies of the 14 shared loci in the Uyghur group and 17 previously published populations. As shown in Figure 1, four clusters were clearly observed. The first, located in the right part of the graph, consisted of the Asian population and nine Chinese populations (Beijing Han, Henan Han, Shaanxi Han, Guangdong Han, Hui, Tibetan, She, Yi and Xibe). The second, located in the top left corner, included the Spanish, Portuguese, Caucasian and African American populations. The third, positioned in the lower left part, comprised the Hispanics and Mexican groups. The last, located in the middle, contained the two Uyghur groups, which clustered together. A multidimensional scaling (MDS) plot of the eighteen populations (Figure 2) was produced from the genetic distances (Fst values) using the 'R' package (http://www.r-tutor.com/category/r-packages). Similar population distributions were observed in the MDS plot: the Chinese populations and the Asian population were located in the right part; the Mexican and African American populations in the left part; and the Spanish, Caucasian, Portuguese and Hispanics populations in the middle of the plot.
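To make the MDS step concrete, the sketch below implements classical (Torgerson) MDS in plain Python: the squared distances are double-centred and the leading eigenvectors are extracted by power iteration. It is an illustration only, not the R routine used for Figure 2; the function name is hypothetical, and the input is a toy 3-4-5 triangle rather than the Fst matrix, which is not reproduced in the text. Because Fst matrices are not guaranteed to be Euclidean, the sketch keeps only axes with positive eigenvalues.

```python
import math

def classical_mds(dist, dims=2, iters=1000):
    """Classical (Torgerson) MDS of a symmetric distance matrix (list of lists)."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenpair (deterministic start vector).
        v = [math.sin(i + 1.0 + k) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        lam = sum(v[i] * bv[i] for i in range(n)) / vv  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no further positive axis to extract
        scale = math.sqrt(lam / vv)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate B so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j] / vv
    return coords

# Toy input: pairwise distances of a 3-4-5 right triangle (not Fst data).
toy_dist = [[0.0, 3.0, 4.0],
            [3.0, 0.0, 5.0],
            [4.0, 5.0, 0.0]]
toy_coords = classical_mds(toy_dist)
for point in toy_coords:
    print([round(c, 3) for c in point])
```

For a Euclidean input such as this triangle, the distances between the returned coordinates reproduce the input distances; the orientation and sign of the axes are arbitrary.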

Phylogenic reconstructions
Two phylogenic trees were constructed by the neighbor-joining method with PHYLIP v3.6 (Figure 3A) and MEGA v5 (Figure 3B), respectively. The two trees gave similar results and depicted three clusters. The first cluster consisted of the Hui, Tibetan, Xibe, Shaanxi Han, Beijing Han, Henan Han, Yi, She, Asian and Guangdong Han populations; the second clade was shared by the two Chinese Uyghur groups; the last comprised the Mexican, Hispanics, African American, Spanish, Portuguese and Caucasian populations.
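For readers unfamiliar with neighbor-joining, the following compact Python sketch shows the core of the algorithm: at each step the pair of nodes with the smallest Q value is joined and the distance matrix is reduced. It is not the PHYLIP or MEGA implementation; the function name, the internal node labels and the five-taxon toy distance matrix are all hypothetical and unrelated to the population data.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Neighbor-joining on a distance dict keyed by frozenset({a, b}).

    Returns (joins, last_edge): joins is a list of
    (node_i, node_j, new_node, branch_i, branch_j); last_edge connects
    the final two nodes as (node_a, node_b, length).
    """
    d = dict(dist)
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        branch_i = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = dij - branch_i
        u = f"node{len(joins) + 1}"  # hypothetical label for the new internal node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, u, branch_i, branch_j))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])

# Toy five-taxon distance matrix (illustrative values, not Fst data).
taxa = ["a", "b", "c", "d", "e"]
toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
toy_dist = {frozenset(pair): float(v) for pair, v in toy.items()}

joins, last_edge = neighbor_joining(taxa, toy_dist)
for step in joins:
    print(step)
print(last_edge)
```

In this toy matrix the first pair joined is a and b, with branch lengths 2 and 3 to the new internal node.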

Forensic parameter analysis
The Hardy-Weinberg principle states that a large, random-mating population that is free of selection, mutation and migration is in HWE. For populations in HWE, allelic ...
Among the 23 STR loci, the Penta E locus had the highest DP and PE values, whereas the TPOX locus had the lowest. Previous studies [22][23] reported similar results, indicating that Penta E was the most informative locus and that TPOX was less polymorphic than other STR loci in forensic cases. More highly polymorphic loci should therefore be screened to improve system effectiveness in subsequent forensic DNA and population genetic studies. The CMP and CPE can be regarded as indicators of the efficiency of genetic markers in forensic applications. In this study, the CMP and CPE obtained for the studied Uyghur group were 5.192 × 10⁻²⁹ and 0.9999999996594, respectively, which indicates that the panel could be a robust and valid tool for individual identification and parentage testing in forensic casework.

Analysis of inter-population differentiations and genetic distances
When many loci show significant differences in allelic frequencies between two populations, the populations are genetically distant; when few loci do, they are closely related. Comparisons of genetic differentiation between the Uyghur group and the reference populations at the 14 overlapping STR loci indicated that the Uyghur group was genetically closer to some Chinese populations than to populations from other continents. In the AMOVA-based analysis of inter-population differentiation, the two Uyghur groups showed no significant differences at these loci. The studied Uyghur group differed significantly from the Hui group only at the TH01 locus, and from the Xibe group at the CSF1PO, TH01 and D13S317 loci, revealing the short genetic distances between the Uyghur group and the Hui and Xibe groups. Deng et al. [24] reported a similar result: at the same 14 STR loci, a significant difference between the Hui and Uyghur groups was observed only at the TH01 locus. Furthermore, TH01 showed the greatest population differentiation in the current study, as also reported by Meng et al. [9], so this locus may be useful for studying genetic differentiation among populations.
Fst is a measure of genetic differentiation between populations that is frequently used in population genetics. Pairs of populations with a small Fst usually have similar allelic frequency distributions, whereas pairs with a large Fst have markedly different distributions. In this study, relatively small Fst values were observed between the Uyghur group and neighboring populations such as the Hui and Xibe groups, indicating close genetic relationships.
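The link between Fst and allelic frequency distributions can be illustrated with a simple heterozygosity-based form, Fst = (HT − HS)/HT, for one locus and two equally weighted populations. This is a teaching sketch under those assumptions; it is not claimed to be the estimator applied by Genepop, and the function names and allele frequencies below are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst_two_populations(freqs_a, freqs_b):
    """Heterozygosity-based Fst = (HT - HS) / HT, two equally weighted populations."""
    alleles = set(freqs_a) | set(freqs_b)
    pooled = {a: 0.5 * (freqs_a.get(a, 0.0) + freqs_b.get(a, 0.0)) for a in alleles}
    h_s = 0.5 * (expected_heterozygosity(freqs_a) + expected_heterozygosity(freqs_b))
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical allele frequencies at a single locus (not data from this study).
pop_similar_1 = {"6": 0.30, "7": 0.50, "9": 0.20}
pop_similar_2 = {"6": 0.28, "7": 0.52, "9": 0.20}
pop_distinct = {"6": 0.05, "7": 0.15, "9": 0.80}

fst_close = fst_two_populations(pop_similar_1, pop_similar_2)
fst_far = fst_two_populations(pop_similar_1, pop_distinct)
print(f"similar distributions:   Fst = {fst_close:.4f}")
print(f"different distributions: Fst = {fst_far:.4f}")
```

Populations with nearly identical frequencies give a value close to 0, and the value grows as the distributions diverge, reaching 1 when the two populations are fixed for different alleles.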

Phylogenic analysis
The genetic relationships between the Uyghur group and the 17 reference populations were revealed by the PCA and MDS plots, which were based on the allelic frequencies of the same 14 STR loci and on the pairwise population Fst values, respectively. The PCA and MDS results indicated that geographically closer populations were more closely related. The results also showed once again that the Uyghur group is a Eurasian population. In order to further evaluate the genetic relationships among these populations, we constructed two phylogenic trees. The results obtained from the phylogenic trees were basically in agreement with the genetic relationship analyses above. In addition, previous studies reported similar results, as follows. Shen
Notes: The numbers in bold are statistically significant after Bonferroni correction (p = 0.05/238 ≈ 0.00021).
www.impactjournals.com/oncotarget

Population samples and DNA extraction
We collected blood samples from 1218 unrelated healthy individuals of the Uyghur ethnic minority in northwest China's Xinjiang Uyghur Autonomous Region after obtaining their written informed consent. Each individual's family had lived in the region for at least three generations without intermarriage with other ethnic groups. All experimental procedures were conducted in accordance with the requirements of the ethical committee of Xi'an Jiaotong University Health Science Center, China. Genomic DNA was extracted from each sample with the Chelex-100 method described by Walsh et al. [29].

PCR and DNA typing
All loci were co-amplified with the HuaXia Platinum PCR Amplification kit on a GeneAmp 9700 PCR system (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's specifications. The thermal cycling conditions were as follows: pre-denaturation at 95°C for 1 min; 27 cycles of 94°C for 3 s, 59°C for 16 s and 65°C for 29 s; and a final extension at 60°C for 5 min. The whole PCR could be completed within 1 hour. The amplified products were separated by capillary electrophoresis on the ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) following the instructions, in 11 µl reactions consisting of 1 µl of PCR product or allelic ladder and a mixture of 9.6 µl Hi-Di Formamide and 0.4 µl GeneScan®-600 LIZ® Size Standard v2.0. The loading mixture was denatured at 95°C for 3 min and then immediately cooled at 4°C for 3 min. STR genotypes were determined with GeneMapper ID 3.2 software (Applied Biosystems, Foster City, CA, USA). Deionized water and control DNA from human cell line 9948 were typed as the negative and positive controls, respectively.
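As a quick arithmetic check of the protocol figures, the sketch below sums the programmed hold times and the loading mixture volumes stated above. The variable names are hypothetical, and temperature ramping between steps is not included, so the actual run time is longer than the sum of hold times.

```python
# Hold times in seconds, as listed in the cycling protocol above.
pre_denaturation_s = 60
cycle_steps_s = [3, 16, 29]  # 94 °C, 59 °C and 65 °C steps
n_cycles = 27
final_extension_s = 5 * 60

total_hold_s = pre_denaturation_s + n_cycles * sum(cycle_steps_s) + final_extension_s
print(f"Programmed hold time: {total_hold_s} s ({total_hold_s / 60:.1f} min)")
# prints "Programmed hold time: 1656 s (27.6 min)"

# Loading mixture per capillary electrophoresis reaction, in microlitres.
loading_mix_ul = {
    "PCR product or allelic ladder": 1.0,
    "Hi-Di Formamide": 9.6,
    "size standard": 0.4,
}
total_volume_ul = sum(loading_mix_ul.values())
print(f"Loading volume: {total_volume_ul:.1f} µl")
```

The programmed hold times add up to about 28 minutes, which leaves room for ramping within the stated 1 hour, and the three components account for the 11 µl reaction volume.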

Statistical analysis
We calculated allelic frequencies, MP, PE, DP and PIC using the corrected PowerStats v1.2 [30]. Ho and He were evaluated with GenAlEx version 6.503 [31], a software package for analysing a range of population genetic data. The HWE test for each locus was performed with Genepop v4.0.10 using the Markov chain algorithm. Inter-population differentiation was assessed by AMOVA in ARLEQUIN v3.5.
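For orientation, the sketch below computes several of these per-locus parameters from a list of genotypes using their textbook definitions: Ho as the proportion of heterozygotes, He = 1 − Σp², PIC = 1 − Σp² − Σ(i<j) 2p_i²p_j², and MP as the sum of squared genotype frequencies. It is a minimal illustration, not the PowerStats or GenAlEx code, and any small-sample corrections those programs may apply are not included; the function name and the four toy genotypes are hypothetical.

```python
from collections import Counter

def locus_summary(genotypes):
    """Summarise one STR locus from a list of (allele1, allele2) genotypes."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    sum_p2 = sum(p ** 2 for p in freqs.values())
    sum_p4 = sum(p ** 4 for p in freqs.values())
    he = 1.0 - sum_p2
    # sum_{i<j} 2 p_i^2 p_j^2 equals (sum p_i^2)^2 - sum p_i^4.
    pic = he - (sum_p2 ** 2 - sum_p4)
    # Match probability: sum of squared observed genotype frequencies.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    mp = sum((c / n) ** 2 for c in genotype_counts.values())
    return {"freqs": freqs, "Ho": ho, "He": he, "PIC": pic, "MP": mp}

# Toy genotypes at one locus (hypothetical, not study data).
toy_genotypes = [(10, 10), (10, 11), (11, 11), (10, 11)]
summary = locus_summary(toy_genotypes)
print(summary)
```

With two equally frequent alleles, as in the toy data, He is 0.5 and PIC is 0.375; loci with more alleles at balanced frequencies score higher on both, which is why Penta E outperforms TPOX in the results above.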