Gene master regulators of papillary and anaplastic thyroid cancers

We hypothesize that distinct cell phenotypes are governed by different sets of gene master regulators (GMRs) whose strongly protected (by the homeostatic mechanisms) abundance modulates most cell processes by coordinating the expression of numerous genes from the corresponding functional pathways. Gene Commanding Height (GCH), a composite measure of gene expression control and coordination, is introduced to establish the gene hierarchy in each phenotype. If the hypothesis is true, than one can selectively destroy cancer nodules from a heterogeneous tissue by altering the expression of genes whose GCHs are high in cancer but low in normal cell phenotype. Here, we test the hypothesis and show its utility for the thyroid cancer (TC) gene therapy. First, we prove that malignant and cancer free surrounding areas of a surgically removed papillary TC (PTC) tumor are governed by different GMRs. Second, we show that stable transfection of a gene induces larger transcriptomic alterations in the cells where it has higher GCH than in other cells. For this, we profiled the transcriptomes of the papillary BCPAP and anaplastic 8505C TC cell lines before and after stable transfection with NEMP1, DDX19B, PANK2 or UBALD1. The four genes were selected to have similar expression levels but significantly different GCH scores in the two cell lines before transfection. Indeed, each of the four genes triggered larger alterations in the cells where they had larger GCH. Our results prove the feasibility of a personalized gene therapy approach that selectively targets the cancer cells from a tissue.


INTRODUCTION
Thyroid cancer (TC) is the most common endocrine malignancy in United States with rapidly rising incidence. The American Cancer Society estimates 56,850 new cases of thyroid cancer in 2017. Morphologically, the thyroid cancers are classified as papillary, follicular, medullary and anaplastic. Differentiated papillary (PTC) and follicular forms comprise 90-95% of thyroid cancers and are treatable, whereas anaplastic thyroid cancer (ATC) is the rarest but the most fatal and incurable form of the disease, with a median survival of 5 months [1]. ATC is an undifferentiated cancer arising presumably from pluripotent thyroid progenitor cells and accumulating genetic lesions. ATC is aneuploid with several chromosomal abnormalities and loss of www.impactjournals.com/oncotarget/ Oncotarget, 2018, Vol. 9, (No. 2), pp: 2410-2424

Research Paper
heterozygosity. ATC can range from poorly differentiated TC characterized by high mitotic index to squamaous TC where its thyroid origin is difficult to ascertain.
For most ATCs, mutation of BRAF (v-raf murine sarcoma viral oncogene homolog B) or in rare cases, detection of RET/PTC and PAX8/PPAR fusions suggests thyroid origin of cells [2][3][4]. Presence of the tumor protein TP53 and β-catenin commonly found in ATCs but occasionally seen in poorly differentiated carcinoma also suggests tumor dedifferentiation [5][6][7][8][9][10]. ATC etiology is not completely understood; however several premises indicate that it originates from thyroid stem cells to manifest an undifferentiated, but aggressive phenotype [11,12]. It is believed that both the papillary and follicular thyroid cancers result from a similar differentiation program coupled with the development of mutations most notably RET/PTC, RAS and BRAF for PTC and/or PAX8/PPARΎ [13][14][15][16][17]. However, genetic lesions are not uniformly distributed and no single or group of mutations define ATC [12,13,[18][19][20]. The heterogeneity of tumor cell differentiation and the repertoire of genetic lesions (although TC has the least numbers of genetic lesions as compared to other TCs) makes difficult the task of developing a universal therapy with demonstrated clinical efficacy [21,22].
There are several interactive open-access dbases of cancer transcriptomic signatures and even an Atlas of the Human Cancer Transcriptome [23] presenting favorable and unfavorable prognostic genes with associated Kaplan-Mayer surviving diagrams. In previous papers [24][25][26][27][28], we have analyzed the significance and utility of several potential TC biomarkers and therapeutic targets and their dependence on sex hormones [29][30][31][32]. However, being selected from the most frequently altered (as sequence or/and expression) genes in large population cohorts, the biomarkers appeared as less protected by the homeostatic mechanisms like low players in cell life. Therefore, restoration of the structure/expression of the altered biomarkers has most likely little therapeutic value. This explains why so far, no TC gene biomarker [33,34] proved therapeutically efficient. Moreover, not only the biomarker(s) but thousands other genes are altered in TC, in (although partially overlapping) never repeatable combinations and nobody knows whether the neglected contributions of the other gene alterations are really negligible.
If cancer and normal cells are governed by distinct gene master regulators (GMRs) than "smart" manipulation of GMRs would selectively destroy the cancer nodules without much damage to the surrounding healthy tissue. The idea of master regulators has been around for almost four decades, most authors looking for transcription factors occupying the top of the regulatory hierarchy that determines cell fate and differentiation. Sophisticated algorithms using reverse engineering of transcriptional networks have been proposed and validated for use in therapeutic decision making [35]. KEGG (http://www.genome.jp), GenMapp (http:// genmapp.org), IPA (http://ingenuity.com), DAVID (http:// david.abcc.ncifcrf.gov) and other popular software have been developed to ensemble the biomarkers and other genes in functional pathways. Regardless of the method (Pearson correlation, Boolean, Bayesian, differential equations or just knowledge-based) to network the genes with respect to their co-regulation in different conditions [36,37], such algorithms implicitly assumes ironclad functional pathways. This is a major weakness given the evident morphological and physiological changes during cancerization and in response to chemo-, radio and cell therapy. Although manually curated by genomic experts, the functional pathways constructed by these (actually) text miners are also "too" universal, lacking specificity with respect to race/strain, sex, age, and risk factors. Moreover, they are deterministic (unique network) in spite of the stochastic nature of the chemical reactions leading to an environmentally depending spectrum of possible "wirings" of the same subset of genes.
We consider as GMR a gene whose highly protected expression by the homeostatic mechanisms governs the phenotype by regulating the transcription of genes involved in major functional pathways through expression coordination. The high protection (indicating the critical importance for cell life) confines expression oscillations of GMRs in narrow intervals. Therefore, GMRs are rarely found spontaneously regulated and by consequence not selected as biomarkers. We estimate the protection of GMR from the reduced expression variability and its power to modulate a pathway from the Pearson correlation with expression oscillations of the pathway genes in biological replicas (obtained by splitting in four the malign and normal regions of the removed tumor).
Our approach, consistent with the Genomic Fabric Paradigm (GFP, [38,39], does not provide novel biomarkers for TC diagnostic in ALL patients but a revolutionary way to cure the TC of the ACTUAL patient. The genomic fabric is defined as the transcriptome associated with the most interconnected and stably expressed network of genes responsible for a particular functional pathway. The fabric exhibits specificity with respect to race/strain, sex and sex hormones, age, tissue/ cell type, and life style and environmental factors. It remodels during development, progression of a disease and in response to external stimuli and treatments.
Recently [40], we tested the hypothesis that cancer and normal cells are governed by different GMRs in the normal cortex and two primary tumor regions of a surgically removed clear cell renal cell carcinoma from a 74y old male. Here, we test again this hypothesis in a papillary cancer and cancer free surrounding tissue from the left thyroid lobectomy of a 33y old female. The efficacy of GMR targeting was checked by determining the transcriptomic effects of stable transfection of genes with different GCHs in the papillary BCPAP [41] and anaplastic 8505C [42] human TC cells that we used in previous studies [25,[27][28][29].

Cancer alters expression and coordination of numerous genes
Compared to the normal tissue, expression of 5.1%% of the quantified unigenes was up-regulated and of 2.7% was down-regulated in the malignant region of the excised thyroid tumor. When comparison was extended to the papillary BCPAP cells, 17.10% of the genes were upregulated and 15.53 down-regulated, while in the anaplastic 8505C cells, 17.09% of the genes were up-and 18.79% were down-regulated. Table 1 lists the fold-change (negative for down-regulation) and the p-value of the regulation, and Figure 1A the interconnection of the genes included in the KEGG 1 -determined pathway of thyroid cancer with respect to the surrounding unaffected tissue of thyroid.
We found a significant overexpression of members of the Ras oncogene family HRAS and KRAS (known for their prominent roles in various types of cancer (e.g. [44,45]. MAP2K1 (mitogen-activated kinase 1), part of the RAS/ MAPK pathways, which informs the cell nucleus about the extracellular chemical environment was also up-regulated. Expression of TP53 was not affected, indicating lack of ATC development [46,47]. Figure 1B presents the percentages of up-and down regulated genes in several well-defined groups of genes. Interestingly, out of the investigated groups of genes, the biomarkers selected from [48] formed the second most regulated group (after the thyroid cancer genes), confirming their value for TC diagnosis. In contrast, the transcription factors (TRF) formed the least regulated group (0.0% up and 2.3% down). With 16.5% up and 9.6% down-regulated genes (listed in Table 2) apoptosis is also a major pathway with substantial alteration in thyroid cancer.
We found also substantial remodeling of the transcriptomic networks by which the long intergenic non-protein coding RNAs (LI ncRNA) modulate major functional pathways through expression coordination (principle in Figure 1C) with pathway genes. Thus, Figure 1D presents the remodeling of part of the transcriptomic networks that LI ncRNAs form with apoptotic genes (coordination values in Table 3). Notably, some synergistically expressed gene pairs in normal tissue became independently expressed in cancer (ANKRD36BP2-BCL2, PMS2L2-TNFRSF10D, PMS2L2-TNFRSF1A) or an independently expressed pair (HCG11-PPP3CB) in normal became antagonistically expressed in cancer. H19 strong antagonism with expression of apoptotic genes in normal tissue is practically cancelled in cancer, confirming its important role in cancer proliferation revealed by several authors [49]. Our coordination analysis revealed that LI ncRNAs can modulate expression of genes located not only on the same but also on other chromosomes. For instance, in the normal tissue, H19 from Chr 11 antagonizes apoptosis genes from Chr 1 ( Table 1 lists all regulated genes in the cancer nodule with respect to the surrounding normal tissue.

Cancer and normal cells have different gene hierarchies in the thyroid
Like our previous finding in a case of clear cell renal cell carcinoma [40] the microarray experiment on the left thyroid lobectomy proved existence of genes with large GCH differences between the cancer and the normal part of the tumor. Figure 2 presents the gene commanding heights (GCH) in normal and malign regions of the analyzed thyroid tumor for several TC biomarkers, oncogenes, apoptosis genes and ncRNAs. Note that no biomarker has high GCH, explaining why none of them proved therapeutically efficient for TC. However, with GCH = 26.14 in the malign part and 1.41 (18.5x smaller) in the normal tissue, the 2.71x significantly up-regulated member of the RAS oncogene family RAB15 may be therapeutically actionable for this person as reported for other cancer cases (e.g. [50]).

Anaplastic and papillary TC phenotypes have major differences in cell-cycle pathway and gene networking
The transcriptomic profiles of the anaplastic (8505C) and papillary (BCPAP) TC cell lines were largely different, with the largest differences (61% = 45% up-and 16% down) in the expression of cell-cycle pathway genes ( Figure  3A). This finding explains the accelerated progression to the undifferentiated form of the TC in 8505C cells [51]. 1 modified from map05216, Kanehisa Laboratories, www.kegg.jp Significant expression differences were noticed in both DNA replication (S phase) and mitosis (M phase) phases of the mitotic cell cycle progression as well as in the temporal gaps known as G1 and G2 phases. Such differences between the two TC cell lines justify some of the therapeutic approaches of the ATC. Thus, overexpression of the key regulatory cyclin-dependent kinase CDK1 explains the choice of dinaciclib, a cyclin-dependent kinase inhibitor   [55]). The transcriptomes of the two phenotypes also differ in the way the genes are networked to accomplish various biological processes. For instance, Figure 3B and 3C present the apoptosis genes that are significantly (p < 0.05) coordinately expressed with CIC (= capicua transcriptional repressor) a gene related to the various forms of cancer (e.g.: [56,57]. Of note is that CIC coordinates the expression of many more apoptosis genes in the anaplastic phenotype and that only synergism with IL1A, PRKACA and PRKAR1A is common to both phenotypes.

Predictive value of GCH score: expression manipulation of a gene has larger effects on cells it commands
We determined and compared the GCH scores of individual genes in the BCPAP and 8505C cell lines before any transfection (partially illustrated in Figure 4A). DDX10, NEMP1, PANK2 and UBALD1 were selected because of their availability (through Albert Einstein College of Medicine Genomics Facility), significantly different GCH-scores but close expression levels (AVE) and low coefficient of variation (CV) in the two cell lines ( Figure 4B). The characteristics of the clones chosen to be stably transfected (one at a time) into the two types of cells are presented in Figure 4C. The microarray experiment validated (p-value = 0.000152) our hypothesis that manipulation of a gene's expression induced larger transcriptomic alterations in the cells where it has larger GCH ( Figure 4D-4G). There is a perfect positive Spearman rank correlation between the (% regulated in BCPAP, % regulated in 8505C) and (GCH in BCPAP, GCH in 8505C) for all four transfected genes. Moreover, the strong positive Pearson product-momentum correlation with the percentage of regulated genes ( Figure 4H) for each cell line validated the predictive value of GCH score.

DISCUSSION
The main methodological contribution of our report is the introduction and validation of the Gene Commanding Height (GCH) as a new measure of how influential the expression of a gene is for the phenotypic expression of a cell. With this measure one can establish the gene hierarchy and identify the Gene Master Regulators (GMRs) of cancer nodules and unaffected surrounding tissue. Thus, GCH analysis opens a novel cancer gene-therapy avenue by selecting the targets with high differences in favor of the malign region from the affected tissue.
Owing to the cancer dependence on race, sex, age, genetic heritage, medical history, environmental and life-style associated risk factors, each patient most likely has a distinct and dynamic GMR-hierarchy. Therefore, an efficient gene-therapy should identify the targets separately for each individual. Manipulation of a GMR could complement current treatment options by making cancer cells more vulnerable and normal ones more resistant to chemo-/radiation therapy. It could also be used post-surgical removal of the tumor to reduce the probability of cancer recurrence.
GMRs are not biomarkers because biomarkers are the most alterable while GMRs are the most protected genes. GMRs are not selected as the most co-regulated with other genes in cancer vs. normal phenotypes because gene networks remodel in cancer and co-regulation does not necessarily mean gene interaction. Instead, GMRs are    Red/blue/yellow background of the coordination value indicates that the paired genes were significantly (p-val < 0.05) synergistically/antagonistically/ independently expressed in the indicated region. Note the differences between the two regions. the most coordinately (synergistically or antagonistically) expressed with other genes in one phenotype at a time, dictating the transcriptomic stoichiometry [58] of gene networks.
In summary, we verified that cancer nodules and surrounding normal tissue are governed by different GMRs, and that manipulating the expression of a gene has larger effects on cells in which it has larger GCH.

Figure 3: Major differences between anaplastic 8505C and papillary BCPAP TC cells include expression of cell-cycle pathway genes (A)
and transcriptomic networks by which CIC (= capicua transcriptional repressor) coordinates expression of apoptotic genes (B-C). In B and C a red/blue line indicates that CIC and the other gene are (p-val < 0.1) synergistically/antagonistically expressed. In order to visualize the strength of the expression coordination, the length of the line is proportional to ρ 3 (ρ = Pearson pair-wise correlation coefficient). Thus, longer distances to CIC (as for PRKAR1B, IRAK2 and CYCS in the 8505C network) indicate stronger expression coordination and by consequence stronger downstream influence. Note the differences between the two networks. www.impactjournals.com/oncotarget GCH of the selected genes in the two cell lines before any transfection. X = expression ratio, CUT = absolute fold-change cut-off for significantly differentially expressed genes, P-VAL = p-value of the differential expression. (C) Characteristics of the transfected clones. In all transfections, we used the same lentiviral vector pLX304. (D-G) GCH vs % regulated genes for each transfection experiment. Note that always, higher GCH is associated with larger percentage of regulated genes. (H) GCH increases with GCH in each cell type. Strong positive Pearson product-momentum correlations (r = 0.914 for BCPAP and r = 0.873 for 8505C cells) were found between the GCH scores and the percentage of significantly regulated genes in both cell lines. www.impactjournals.com/oncotarget

Patient sample
Four (2 -6 mm 3 ) samples were dissected from a frozen unilateral, single, 32.0 mm papillary carcinoma, pathological stage pT3NOMx (http://emedicine.medscape. com/article/2006643-overview) collected in 2010 from a 33y old woman. Four small pieces from the negative for malignancy resection margins of the same tumor were used as control. The study was approved by New York Medical College and Westchester Medical Center (WMC) Committees for Protection of Human Subjects, commonly known as Institutional Review Boards (IRBs) by L-11,606 ("Comprehensive molecular analysis of thyroid cancer: diagnosis, predictors of progression and targets for directed therapy", PI RK Tiwari) and L-11,376 ("Quantifying cancer-associated remodeling of key genomic fabrics", PI. DA Iacobas). The approval granted access to frozen cancer specimens from the WMC Pathology Archives and depersonalized pathology reports, waiving patient's informed consent. Although the four samples dissected from each (malignant and malignant-free) region were chosen to be as homogeneous as possible cells of different phenotypes were not completely eliminated.

Cell lines
The predictive value of the GCH score was tested in papillary thyroid cancer cell line BCPAP and anaplastic thyroid cancer line 8505C, purchased from DSMZ in Braunschweig, Germany. Cell lines were cultured in Rosswell Park Memorial Institute (RPMI)-1640 supplemented with 10% fetal bovine serum (FBS), penicillin 10,000 IU/mL, streptomycin 10 mg/mL, and 2 mM L-glutamine. Validation of the cell lines was performed by the Genomics Core of the Albert Einstein College of Medicine (AECOM) of Yeshiva University (http://www.einstein.yu.edu/research/shared-facilities/ cores/46/genomics/).

Gene transfection
The stable transfection was performed using plasmids in ORF lentiviral plX304 vector produced by the AECOM shRNA Core Facility with the characteristics indicated in Figure 4C.

Microarrays
We used our standard protocol [59] for extraction the total RNA, reverse transcription and fluorescent labeling, and hybridization with Agilent (here human) 4x44k two-color gene expression microarrays in the "multiple-yellow" design. Four biological replicas (cell culture dishes or tissue samples) were profiled from each phenotype/cell type subjected to each condition.

Data analysis
A gene was considered as differentially expressed between two types of samples if the absolute expression fold-change exceeded the combined effect of microarray noise and biological variability and the p-value of the means' equality was below 0.05. All genes were assigned to functional pathways using Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/ pathway.html).
In previous papers (e.g.: [38,60,61]) we have used the Relative Expression Variability (REV = median of the Bonferroni-corrected chi-square interval estimate of the coefficient of variation) as a statistical estimate of the expression variability of one gene among biological replicas. Expression of individual genes depends on local conditions that, although similar, are not identical among biological replicas. We assume that expression of key genes is kept by the cellular homeostatic mechanisms within narrow intervals while that of nonkey genes is less restrained to readily adapt to environmental changes. REV may shift in pathological conditions, suggesting that control mechanisms are also affected. The expression variability allows computing the Pearson productmomentum correlation coefficient between the expression levels of any two genes in the same condition. Using the coordination analysis, we determined the transcriptomic networks (that may cross cell boundaries as shown in [62])

Gene commanding height
Gene Commanding Height (GCH) in sample α (= benign, malign, BCPAP, 8505C) was computed for each protein-coding and non-coding transcript in each of the four conditions as a combined measure of the gene expression stability and coordination with each other gene: