Characterization of epithelial-mesenchymal transition intermediate/hybrid phenotypes associated to resistance to EGFR inhibitors in non-small cell lung cancer cell lines

Increasing evidence points to a key role played by epithelial-mesenchymal transition (EMT) in cancer progression and drug resistance. In this study, we used wet and in silico approaches to investigate whether EMT phenotypes are associated to resistance to target therapy in a non-small cell lung cancer model system harboring activating mutations of the epidermal growth factor receptor. The combination of different analysis techniques allowed us to describe intermediate/hybrid and complete EMT phenotypes respectively in HCC827- and HCC4006-derived drug-resistant human cancer cell lines. Interestingly, intermediate/hybrid EMT phenotypes, a collective cell migration and increased stem-like ability associate to resistance to the epidermal growth factor receptor inhibitor, erlotinib, in HCC827 derived cell lines. Moreover, the use of three complementary approaches for gene expression analysis supported the identification of a small EMT-related gene list, which may have otherwise been overlooked by standard stand-alone methods for gene expression analysis.


INTRODUCTION
Epithelial-mesenchymal transition (EMT) -wherein epithelial cells depolarize, lose their cell-cell contacts and gain an elongated fibroblast-like or amoeboid shape -plays important roles in embryonic development, tissue repair and cancer biology [1][2][3]. Several studies, mostly in vitro, have supported the hypothesis that during cancer progression tumor cells undergo dynamic and reversible transitions from epithelial to mesenchymal, and from mesenchymal to epithelial (MET) phenotypes and that EMT and MET are required for cancer invasion and metastasis [1,4]. Moreover, through EMT cancer cells may also acquire stem cell properties and self-renewal capability [5,6]. The EMT/MET hypothesis as a driving force for cancer invasion and metastasis has been recently challenged by very elegant in vivo murine studies [7,8]. In these studies, EMT was not rate limiting for invasion and metastasis, but rather associated to chemotherapy resistance [7,8].
Multiple signaling pathways and complex genetic and epigenetic mechanisms regulate the EMT www.impactjournals.com/oncotarget/ Oncotarget, 2017, Vol. 8, (No. 61), pp: 103340-103363 Research Paper program in normal and neoplastic epithelial tissues [1,[9][10][11][12]. Importantly, the EMT is not a binary process and cancer cells with intermediate or hybrid epithelial/ mesenchymal (E/M) phenotypes characterized by a mixture of epithelial and mesenchymal traits have been described [13][14][15][16]. Intermediate E/M phenotypes may contribute to cancer collective cell migration and cell clusters formation by preservation of cellcell interactions including epithelial as well as E/M cells. Circulating tumor cell (CTC) clusters have been increasingly observed in the bloodstream of many patients with aggressive malignancies including lung cancer and these clusters have been associated with worse clinical outcomes as compared to the presence of single CTCs [17][18][19].
Lung cancer is the most frequent cause of cancer-related mortality worldwide leading to over a million deaths each year [20]. Based on histological characteristics, the two principal types of human lung cancer are small cell lung cancer (SCLC) and nonsmall cell lung cancer (NSCLC). The latter contributes to nearly 85% of lung cancer cases. Identification of all driver oncogene alterations in lung adenocarcinoma and consequently adoption of coherent molecular target therapies are challenging because of a large burden of passenger events per tumor genome [21][22][23]. However NSCLC patients, whose tumors harbor sensitizing and driving mutations in the epidermal growth factor receptor (EGFR), get a meaningful clinical benefit from EGFR tyrosine kinase inhibitor (TKI) treatments. Unfortunately acquired resistance invariably develops [24,25]. Importantly, acquired NSCLC resistance has also been associated to EMT [26][27][28][29].
In order to investigate the mechanisms of resistance to TKI, we have recently reported the establishment and characterization of NSCLC cell lines resistant to the EGFR inhibitor erlotinib [30].
The effect of TKI target therapy on the selection of intermediate E/M phenotypes in cancer cells is still poorly investigated. Therefore, in this study, we used wet and in silico approaches to investigate whether E/M phenotypes are associated to erlotinib-resistance in our cellular model system. The combination of different analysis techniques allowed us to describe intermediate and complete EMT phenotypes in HCC827-and HCC4006-derived erlotinib-resistant cell lines respectively. Interestingly, EMT intermediate phenotypes, collective cell migration and increased stem-like ability associate to resistance to target therapy in the erlotinib-resistant HCC827-derived cell lines. Moreover, the use of three complementary approaches for gene expression analysis supported the identification of a small EMT-related gene list, which may have otherwise been overlooked by standard stand-alone methods for gene expression analysis.

EMT features analysis of erlotinib-resistant NSCLC cells
Recently, in order to investigate mechanisms leading to resistance to EGFR-targeted therapy, two NSCLC cell lines (HCC827 and HCC4006) have been used to derive in vitro models of acquired resistance to the EGFR TKI erlotinib [30]. Both parental cell lines harbor EGFR activating mutations in the tyrosine kinase domain, precisely in exon 19. In particular, the HCC827 cell line carries a deletion in exon 19 (ΔE746-A750) and the HCC4006 carries a deletion (ΔL747-E749) and a point mutation (A750P) in exon 19. Both HCC827 and HCC4006 cell lines are highly sensitive to TKIs targeting the EGFR, while their derived cell lines (i.e: RA1, RA2, RB1, RB1.1, RB2 derived from HCC827 and the RC2.2 derived from HCC4006) are stably resistant to erlotinib (IC 50 > 10 μM) [30]. Characterization of these erlotinibresistant cell lines, all negative for the common T790M EGFR mutation, has been previously described [30] and is schematically summarized in Supplementary Table 1.
Interestingly, morphological analysis of the erlotinib-resistant NSCLC cells showed the presence of cells with a fibroblast-like cell shape reminiscent of EMT, especially in the RA1, RB1, RB2 and RC2.2 cell lines (Supplementary Figure 1). Indeed, EMT features in the erlotinib-resistant cell lines were detected by assaying the epithelial marker Cadherin-1 (also known as E-cadherin) and the mesenchymal marker Vimentin by different methodologies, such as immunofluorescence and confocal microscopy (Figure 1a, 1b and 1d), western blot ( Figure 1c) and mRNA expression analysis (Figure 1e). In particular, RC2.2 cells are all negative for Cadherin-1 and positive for Vimentin, similarly to the HCC4006ER cell line previously described [31] (Figure 1). An increase in the number of Vimentin positive cells compared to parental cell lines is observed not only in the RC2.2 cells but also in RA1, RA2, RB1 and RB2 erlotinib-resistant cell lines at both protein (Figure 1b   cell lines stained with specific reactive antibodies for Cadherin-1 (green), Vimentin (red) and Hoechst 33258 for nuclei (blue). (b) Histograms represent the percentage of Vimentin positive cells (mean ± SD), obtained by the analysis of at least 10 images with approximately 350-500 nuclei/image for each cell line in at least three independent experiments; (c) Western blot analysis of Cadherin-1 and Vimentin in the indicated parental (P) and erlotinib-resistant cell lines; (d) Confocal microscopy analysis of cell lines stained with specific reactive antibodies for Cadherin-1 (green) and Vimentin (red) and Hoechst 33258 for nuclei (blue). Images represent the projection of 4-5 selected focal planes spanning 1.6-2 μm of the specimens. Scale bars are indicated; (e) qPCR analysis of CDH1 and VIM mRNAsnormalised to rp-L31 mRNA and expressed relative to the levels in parental (P) cell lines set as 1 (mean ± SD). qPCR data are representative of those obtained from three independent analysis.
Asterisks in the figure indicate significant t-test p-values relative to the comparison of given derived cell lines with the corresponding parental one. * p < 0.05; ** p < 0.01; *** p < 0.001. Only significant p-values are shown. www.impactjournals.com/oncotarget Vimentin and Cadherin-1 despite generally used to characterize the EMT phenotype, could be not sufficient to identify and describe complex EMT features. Therefore, the expression levels of other EMT-related markers, such as Cadherin-2 (also known as N-cadherin), Claudin-1 (CLDN1), SNAI1 (also known as Snail), SNAI2 (also known as Slug) and ZEB1, were also investigated by western blotting (Figure 3a) and qPCR analysis ( Figure  3b). ZEB1 and SNAI1 resulted upregulated in all erlotinibresistant cell lines, as expected for an EMT phenotype [3,32]. Because EMT is regulated, among others, by miR-200 family targeting ZEB [33][34][35], we also investigated the expression status of some miR-200 family members by qRT-PCR. Remarkably, miR-200a, miR-200b, and miR- 200c were highly downregulated only in the HCC4006 erlotinib-resistant cell line, RC2.2, while their expression levels in all HCC827 erlotinib-resistant cell lines were comparable to parental cell line expression levels ( Figure  3c).
A more complex picture was found for some EMT markers. In particular Cadherin-2 and Claudin-1 are observed respectively upregulated and downregulated in the HCC4006-derived RC2.2 cell line while in the HCC827-derived cell lines, Cadherin-2 appears to be upregulated only in RA1, RA2 and RB1 cell lines ( Figure  3a) and Claudin-1 is downregulated in all cell lines with the exception of RA2 (Figure 3a) at the protein level. At the mRNA level a significant increase of Claudin-1 (1 fold) was observed in both RA1 and RA2 cell lines ( Figure  3b). SNAI2 is downregulated in the RC2.2 cell line and barely detectable, at the protein level, in all HCC827- Cadherin-2 (N-cadherin antibody), SNAI1 (Snail antibody), SNAI2 (Slug antibody) and ZEB1 in the indicated parental (P) and erlotinibresistant cell lines; (b) qPCR analysis of the indicated mRNAsnormalized to the rp-L31 mRNA. Expression levels (y-axis) are expressed as relative abundance of derived compared to parental (P) cell lines (mean ± SD of triplicate determinations). (c) qPCR analysis of miR-200 family members (miR-200a, miR-200b and miR-200c) normalized to RNU24. Data are expressed as fold changes in expression levels of derived compared to parental (P) cell lines (mean ± SD of triplicate determinations). Asterisks in the figure indicate significant t-test p-values relative to the comparison of given derived cell lines with the corresponding parental one. * p < 0.05; ** p < 0.01; *** p < 0.001. Only significant p-values are shown. www.impactjournals.com/oncotarget derived erlotinib-resistant cell lines, with the exception of RA2 (Figure 3a and 3b).
Overall the data suggest that the total population of the RC2.2 cell line, that does not express Cadherin-1 and expresses high levels of Vimentin, is in a mesenchymal state while the majority of HCC827-derived erlotinibresistant cell lines might likely contain subpopulations with intermediate states of EMT.

Cell migration features of erlotinib-resistant NSCLC cell lines
We used wound-healing assays in the erlotinibresistant cell lines in order to investigate the effect of the observed EMT phenotypes on cell migration. The RC2.2 (HCC4006-derived) and the RA2 (HCC827-derived) erlotinib-resistant cell lines show an increased rate of wound closure, while for all other erlotinib-resistant cell lines quantitative differences in terms of wound closure capacity were not observed by comparison with their parental cell line ( Figure 4a). Importantly, the RA2 and RC2.2 cell lines increased cell migration was also confirmed in transwell cell migration assays (Figure 4b).
Many cancer cells are known to exhibit collective invasion potential wherein cell-cell adhesions remain intact. Accordingly, in EMT phenotypes, marked

Cell invasion and sphere-forming abilities of erlotinib-resistant NSCLC cell lines
Erlotinib-resistance and EMT phenotypes can associate with more aggressive behaviours, such as increased cell invasion and stem-like features. Therefore to further study the functional phenotype of the erlotinibresistant NSCLC cell lines, matrigel-coated transwell invasion and sphere-forming assays were performed. The matrigel assays showed an increased cell invasion of the RA2 and RC2.2 NSCLC cell lines when compared to their parental erlotinib-sensitive cell lines (Figure 5a and Supplementary Figure 3). Instead similar or decreased invasion ability was recorded for the RA1, RB1, RB1.1 and RB2 cell lines (Figure 5a and Supplementary Figure  3).
Regarding the sphere-forming assays, we observed that the number of spheroids of the RA1, RB1, RB1.1 and RB2 cell lines with a diameter ≥ 50 μm was 2-3 fold higher than of the HCC827 parental cell line (Figure 5b) while the spheroid size was similar (Figure 5c, 5d). On the contrary the number of spheroids of the RA2 and RC2.2 cell lines was respectively 4 and 10 fold lower than of their parental cell lines. Moreover, the spheroids of the RA2 and RC2.2 were remarkably smaller than the spheroids of the other NSCLC cell lines (Figure 5c and 5d).
Overall the results show that the increased in vitro cell migration and cell invasion features of the RA2 and RC2.2 erlotinib-resistant cell lines do not associate with an increased stem-like ability. On the contrary the majority of HCC827-derived erlotinib resistant NSCLC cell lines with intermediated E/M phenotypes that do not show an increase in cell migration or invasion are more prone to form spheroids in vitro when compared to their parental erlotinib-sensitive cell line.

Gene expression analysis in erlotinib-resistant NSCLC cell lines
To identify genes and pathways involved in erlotinib-resistance, microarray gene expression analysis was performed. Thousands of genes differentially expressed in the erlotinib-resistant cell lines were identified (Table 1). Interestingly, a subset of these genes showed recurrent deregulation associated with erlotinib resistance. In particular, over 500 genes were found to be differentially expressed in at least 5/6 erlotinib-resistant cell lines showing recurrent upregulation (114 genes) or downregulation (393 genes) ( Figure 6). Differentially expressed genes were ranked according to average logfold changes across all the 5/6 comparison of expression levels between erlotinib-resistant cell lines and erlotinibsensitive parental cell lines (Supplementary Table 2). While several interesting genes appear among the differentially expressed we decided to focus our analysis on the impact that differentially expressed genes have on pathways and biological processes. Remarkably, functional analysis of modulated genes identified enriched annotation to several pathways and biological processes that may be related to the EMT phenotype, such as Cadherin and TGF β signaling pathways, regulation of actin cytoskeleton and response to wounding ( Table 2-3; Supplementary Table 2). www.impactjournals.com/oncotarget

Figure 4: Cell migration of erlotinib-sensitive and erlotinib-resistant NSCLC cell lines. (a) Analysis of cell migration
by wound healing assays. The wound closure was quantified for each of the indicated cell lines at 16-24h post-wound by measuring the cell-free area using the ImageJ software [64]. Bar plots represent the percentage of relative wound closure (mean ± SEM), calculated as described in material and methods. Data were obtained by the analysis of 5-10 immunofluorescence images for each cell line at the time points indicated and they are representative of 3 independent experiments. (b) Analysis of transwell migration assays. Bar plots represent the optical reading (OD) (mean ± SEM), of BSA (black) and FBS (white) guided transwell cell migration assays as described in material and methods. The data are representative of 3 independent experiments. Asterisks in the figure indicate significant t-test p-values relative to the comparison of given derived cell lines with the corresponding parental one. ** p < 0.01; *** p < 0.001. Only significant p-values are shown. (c) Confocal microscopy images of HCC827 and derived erlotinib-resistant cell lines stained with specific reactive antibodies for Cadherin-1 (green), Vimentin (red) and Hoechst 33258 for nuclei (blue) at the wound edge. Images represent the projection of 12-18 selected focal planes spanning 4.8-7.2 μm of the specimens. Scale bar is indicated.  cell invasion. The cell invasion ability of the indicated NSCLC cell lines was assayed using matrigel-coated transwells as described in material and methods. In three independent experiments the samples were assayed in triplicate. The area of the membrane filters occupied by successfully invasive cells was quantified 24 h after plating by ImageJ software. Data are represented as relative fold change ± SEM of parental cell lines. ** p < 0.01; *** p < 0.001. (b-d) Sphere-forming assays. The number and surface area of spheroids with a diameter ≥50 μm were recorded and quantified by ImageJ software 6-7 days after the start of the culture period. Bar plots represent in (b) the spheroids number and in (c) the spheroids size. The data are representative of 2-3 independent experiments. Asterisks in the figure indicate significant p-values of comparison of derived cell lines vs the corresponding parental one. ** p < 0.01; *** p < 0.001. Only significant p-values are shown; (d) representative images of spheroids are shown, scale bar is indicated. Statistical analysis of invasion and spheroids data was performed using the non-parametric Mann-Whitney U test. Differences between the means were considered statistically significant when p≤ 0.05. www.impactjournals.com/oncotarget Counts of differentially expressed genes in erlotinib-resistant cell lines.

Analysis of EMT-related gene expression in erlotinib-resistant NSCLC cell lines
In order to analyze the EMT phenotypes associated to erlotinib-resistance, we collected five lists of genes whose expression has been previously associated to EMT (  [41]). These gene lists were pooled together (335 genes) and 270/335 genes (COMBO_270 gene list) also interrogated in our microarray experiments were further investigated (Supplementary Table 3). The first evidence gained from the analysis was that the above EMT-related lists showed little gene overlap among each other, with the maximum overlap (N=15) found between the Byers et al. and the Huang et al. gene lists (Supplementary Figure 4). However, hierarchical clustering based on expression levels relative to each of these EMT-related gene list alone (data not shown) yielded a grouping of the different NSCLC cell lines similar to that observed by considering the pooled list of 270 genes (Supplementary Figure 5a). In particular, the erlotinibsensitive and -resistant NSCLC cell lines separate in the cluster primarily based on their cellular origin. This evidence was also confirmed by Principal Component Analysis (PCA) of the expression profiles of the pooled list, which shows the maximum distance in the plane identified by the first and second principal components (PCs), as well as in any other PCs plane, as the one separating the HCC827 from the HCC4006 parental cell lines (both epithelial-like), (Supplementary Figure 5b). Altogether, these results indicate that the EMT-related gene sets were not able to capture the common epithelial phenotype observed in parental cell lines. Therefore we decided to extend the analysis by applying more sensitive supervised methods to further interrogate the EMT gene lists.

Supervised selection of EMT-related genes discriminating epithelial vs mesenchymal traits in NSCLC cell lines
Three different supervised methods were applied to select genes maximally associated to the intermediate E/M phenotypic differences observed in the tested NSCLC cell lines: (i.) cut-off on gene expression (CO); (ii.) support vector machine (SVM); (iii.) expression profile analysis (PROF). Specifically, we used these supervised methods to identify subsets of genes having similar expression levels within the epithelial cell lines (HCC827 and HCC4006) while exhibiting different expression in the mesenchymal (RC2.2) and the cell lines with E/M features (RA1, RA2, RB1, RB1.1, RB2) (see the methods section for details). The gene subsets resulting by application of the above three methods, hereafter also referred to as CO_59, SVM_85, and PROF_58 according to the related selection method (CO, SVM or PROF) and the number of selected genes (59, 85 and 58 respectively) (Supplementary Table  3  Enriched pathways of erlotinib-resistant differentially expressed gene set.  while the RA2 separates from the other HCC827-derived cell lines (Figure 7a-7c). Similar relative positioning in the PCA plane identified by the first and second principal components (ie: the most explicative as being the ones explaining most of the viewed differences) suggest that the three differently selected gene lists, while qualitatively diverse, are all agreeing on capturing similar cell line characteristics. In fact, these three EMTrelated selected gene lists only share a limited number (=25) of genes (INT_25 list in Supplementary Table  3) as shown in the Venn diagram (Figure 8a) and the heatmap (Figure 8b). Remarkably, PCA plot relative to this restricted list of common genes recapitulates the ones obtained by analyzing the three cognate subsets of EMT-related genes (Figure 7d). Five out of the twentyfive shared EMT-related genes (namely CTSL2, KLC3, SLPI, ST14 and STEAP1) were selected for the qPCR validation in that they are expressed at different levels in the erlotinib-sensitive and -resistant cells lines in the microarray data and were previously associated to cancer development (Figure 8c). CTSL2 or cathepsin L2 gene, a pro-apoptotic target of E2F1 and a modulator of apoptosis [42] previously reported to be upregulated in several human tumors including breast and colon carcinoma [43], is downregulated in the HCC827-derived cell lines RA1, RB1, RB1.1 and RB2 as well in the HCC4006derived RC2.2 cell line (Figure 8b, 8c). KLC3 or kinesin light chain 3, a member of the kinesin light chain gene family involved in binding cargo and regulating kinesin activity, has been reported to be upregulated in some cases of breast, ovarian, head and neck, lung, skin, urothelial and pancreatic cancers (URL: http://www.proteinatlas. org/ENSG00000104892-KLC3/cancer). KLC3 while donwregulated in the majority of our erlotinib-resistant cell lines is upregulated in the RA2 cell line (Figure  8b, 8c). Interestingly SLPI a serine protease inhibitor overexpressed in a wide variety of human cancers, including breast, lung, ovarian and colorectal carcinomas, and glioblastoma [44][45][46][47][48], is highly upregulated in RA1, RB1, RB1.1 and RB2 erlotinib-resistant cell lines (Figure 8b, 8c). STEAP1, a member of a family of metalloreductases described as a cell-surface antigen in prostate tissue [49,50] and reported to be highly expressed in prostate, breast, bladder, colon ovarian carcinomas, in Ewing's sarcoma [51] and in endothelial cells in the vessels of human lung tumours [52], is highly upregulated in RA1, RB1, RB1.1 and RB2 erlotinib-resistant cell lines (Figure 8b, 8c). Finally ST14, or suppression of tumorigenicity 14, an integral membrane serine protease and a ZEB1 responsive gene in NSCLC [31], is strongly downregulated in the RC2.2 cell line and only slightly downregulated in the majority of the HCC827-derived NSCLC cells lines (Figure 8b, 8c). The expression level of STEAP1 and KLC3 proteins was also evaluated by  (Figure 8d). STEAP1 barely detectable in HCC827 and HCC4006 erlotinibsensitive cell lysates is strongly upregulated in RA1, RB1, RB1.1 and RB2 cell lines. KLC3 well expressed at the protein level in the parental erlotinib-sensitive cell lines is strongly downregulated in all erlotinib-resistant cell lines with the exception of the HCC827-derived RA2 cell lines (Figure 8d).

Functional enrichment analysis of selected EMTrelated gene lists
Analysis of enriched functional annotations for the three subsets of EMT-related genes highlighted pathways associated to the EMT process, including cell adhesion molecules, tight junction, Cadherin and TGF β signaling pathway. Interestingly the TGF β pathway was found enriched in all three gene lists (Tables 5-7).

Effect of TGF β and TGFR-1 inhibitor on EMT features and cell growth
Prompted by the functional enrichment analysis, we decided to further investigate the TGF β signaling pathway. Indeed, it has been previously reported that NSCLC cells undergo EMT phenotypic changes upon short exposure to TGF β [53] or long term treatment [54] and that TGF β ligand-induced activation of the TGF β pathway led to the development of resistance to erlotinib  [54,55]. Thus, we used immunofluorescence studies to evaluate the effect of TGF β1 and a TGFR-1 inhibitor (LY-364947), alone or in combination, on Vimentin expression, as a marker of the EMT phenotype. TGF β1 treatment strongly increases the number of Vimentin positive cells in the erlotinib-sensitive parental HCC827 cell line (3 fold increase) and to a less marked extent in its derived erlotinib-resistant cell lines (≤2 fold increase) (Figure 9a). The observed increase is prevented by treatments with LY-364947 in all cell lines, except for the RA1 (Figure 9a). Remarkably, the LY-364947 treatment alone induces a 30-40% reduction in the number of Vimentin-positive cells in the RB1, RB1.1 and RB2 cell lines while not changing the number of Vimentin-positive cells in the RA1 and RA2 cell lines (Figure 9b).
Finally, growth inhibition assays indicated that LY-364947, alone or in combination with erlotinib, does not impair the viability of HCC827-derived erlotinib-resistant cell lines (Supplementary Figure 6).
Overall these results suggest that the observed EMT stable phenotypes associated to erlotinib-resistance likely appear independent or only partially mediated by TGF β-1 activation.

DISCUSSION
Increasing evidence points to a key role played by EMT in cancer progression and drug resistance.
Here we concentrated our study on the intermediate E/M phenotypes and their association to target therapy resistance in NSCLC cell lines. As observed in both physiology and pathology, a cell population can be highly heterogeneous with respect to its EMT phenotype. Indeed, we consistently observed distinct subpopulations of cells  and positive for Vimentin. Interestingly, only the RA2 and RC2.2 cell lines showed an increase in cell migration and cell invasion rate, suggesting that a qualitative rather than a quantitative variation of cell migration and eventually invasion may deserve further investigation in the other erlotinib-resistant cell lines. Remarkably HER3 a receptor suggested to be a strong epithelial marker and regulator of EGFR TKI sensitivity [56], was found not expressed or constitutively inactivated only in the RA2 and RC2.2 erlotinib-resistant cell lines [30]. Interestingly the increased in vitro cell migration and cell invasion of the RA2 and RC2.2 erlotinib-resistant cell lines do not associate with increased stem-like features. On the contrary the majority of HCC827-derived erlotinib-resistant NSCLC cell lines, with intermediated E/M phenotypes, have similar or decreased cell migration and cell invasion rate when compared with their parental cell line but show an increased sphere-forming ability. Importantly, the mRNA expression profile of some largely used stemness markers could not predict such stem-like behavior (Supplementary Table 4).
Intriguingly, while all erlotinib-resistant cells lines express higher levels of ZEB1 compared to their parental cell lines, only the RC2.  Table 2). Importantly these genes have been previously associated to the regulation of intermediate EMT phenotypes, multi-step transition between epithelial and mesenchymal states, and positive miR-200 family regulation [15,[57][58][59][60].
Furthermore, wound healing cell migration assays demonstrated a collective cell migration (reviewed in [61]) property of the HCC827-derived cell lines. The remarkable localization of Vimentin and Cadherin-1 double positive cells as well as epithelial Vimentin negative cells at the leading edge together with their collective migration highlighted the potentially dangerous contribution of E/M cell subsets in metastasis. A propensity for these E/M cell subpopulations to cluster with epithelial cells was also observed. Gene expression analysis of single epithelial, or hybrid E/M CTC clusters, could provide valuable information about their possible different functional properties.
Next, we used our NSCLC cell lines microarray data to evaluate the expression of publicly available EMTrelated gene signatures and applied different supervised analytical methods for selecting novel gene sets able to partition our cancer cell lines according to their observed EMT phenotypes. Notably, despite the heterogeneous composition, all the selected gene sets showed significant enrichment of the TGF β signaling pathway, confirming the relevance of this pathway with respect to the EMT phenotypes [53][54][55].
Finally we investigated the effect of the modulation of the TGF β1 signaling pathway on cell survival and EMT phenotypes. We observed that the inhibition of TGFR-1, alone or in combination with EGFR inhibition, did not affect NSCLC cell lines survival, but it was able to partially decrease the number of Vimentin positive cells in some HCC827-derived erlotinib-resistant cell lines. Further investigation of the TGF β1 and of other enriched signaling pathways (i.e. the Wnt pathways) will be required to understand their contribution to the erlotinib resistance and the intermediate E/M phenotypes in our cellular model system.
Overall these results confirmed cellular heterogeneity during EMT in cancer cell lines upon
Power SYBR Green PCR master mix was from Applied Biosystems. Trizol reagent was from Life Technologies, the GoScript reverse transcription system kit was from Promega.

Cell growth inhibition MTT assay
MTT assay was performed as previously described [30]. Briefly cells (1-2 x10 4 cells/well) plated in 96-well plates at day 0 were treated at day 1 with increasing concentrations of erlotinib or LY-364947 (from 64 pM to 10 μM) or their combination in the ratio (1:3) in complete tissue culture medium and cultured for 72 hours at 37°C in 5% CO 2 . Next, cells were gently washed with 1x PBS, incubated for 4 hours with MTT and processed for color detection with DMSO. The resulting purple solution was spectrophotometrically measured at 570 nm and 630 nm [62,63].

Wound healing assay
The HCC827 and HCC4006 parental cell lines and the erlotinib-resistant derived cell lines were plated onto 35 mm plastic petri dishes and grown until confluence. A cell-free area was introduced in a monolayer of cells using a sterile yellow tip (time t 0 ) and then cells were allowed to migrate from the edge of the scratch at 37°C/5% CO 2 . The monolayer was imaged at different time points to record the size of the wound closure at 5-24 (time t) hours with an Olympus BX41 microscope using a 4-10x lens and 8-10 images were recorded with an Olympus SP-350 camera. The area of wound coverage was calculated using the ImageJ software [64]. The degree of cell migration was expressed as the percentage of relative wound closure calculated as: For immunofluorescence and confocal microscopy analysis the cells were also subjected to wound healing assay with the μ-Dish 35mm with Culture Insert (Ibidi) and an equal number of cells (0.5-1 × 10 5 ) in 70 μl of culture medium was seeded into the two reservoirs of the same insert and incubated at 37°C/5% CO 2 . After appropriate cell attachment (24 h) the culture insert was gently removed using sterile tweezers, and a 500 μm wide cell-free gap (wound) was generated. The dish was then filled with 2 mL of FBS containing medium and incubated for 8-16 hours. Then cells were processed for immunofluorescence and confocal microscopy analysis.

Transwell migration assay
Assays were performed using a transwell (Corning Costar, Rochester, NY, USA) containing a polycarbonate membrane filter (8μm pore size) for 24-well plates according to the manufacturer's instructions. HCC827 and RA2 (1x10 5 ), HCC4006 (2.5x10 4 ) or RC2.2 (2.5x10 4 ) were seeded onto the upper surface of a filter with a volume of 100 μl RPMI 1640 medium containing 0.5% BSA and placed into the lower wells containing 600 μl of complete medium with 10% FBS or 0.1% BSA (negative control) to induce cell migration. The migration transwell chambers were incubated for 16h at 37°C in a humidified atmosphere containing 5% CO2. Following incubation, cells were fixed in 3.7% PFA for 10 min at room temperature, stained with 0.1% crystal violet (20% methanol solution) for 10 min, then washed 3 times with PBS. Next, the cells on the upper side of the membrane were removed by cotton swabs, and the crystal violet staining on the bottom surface of the membrane was eluted in 10% acetic acid and read at an optical density of 590 nm with a Varioskan Lux instrument.

Transwell invasion assay
Matrigel invasion assays using transwell chambers (Corning Costar, Rochester, NY, USA) containing a polycarbonate membrane filter (8μm pore size) for 24-well Area t Area t Area t 0 0 100 − * plates coated with matrigel was used to determine NSCLC cell invasion properties. In brief, 1 mg/ml matrigel (50 μl) was added to each insert and solidified in a 37°C incubator for 5 h to form a thin gel layer. Next, 24 h serum-starved cells (3×10 5 /w) in RPMI 1640 containing 0.5% BSA were seeded in the upper chamber, and 500 μl RPMI 1640 containing 10% FBS was added to the lower chamber. After 24 h incubation at 37°C in a 5% CO 2 atmosphere, cells that remained in the upper chamber were removed by cotton swabs and penetrating cells were stained with 1% crystal violet in 20% methanol. Then, the membranes were rinsed several times with water and images were recorded using a stereomicroscope Leica MZ12 served by an Olympus SP-350 camera. Next images were processed with the Image software in order to calculate the area of the membrane filter occupied by invasive cells.

Sphere forming assay
The assay was performed essentially as previously described [65]. Briefly single-cell suspension cultures were prepared at densities of 5x10 3 cells per well in serum-free medium DMEM/F12 supplemented with B27 supplement, EGF (20 ng/mL), FGF (20 ng/mL), and heparin (10 U/mL), and seeded into 6-well ultra-low attachment plates (Corning, Corning, NY, USA). Culture medium was replaced every 3 days; the number and size of the spheroids were recorded at 6-7 days after the start of the culture period using a stereomicroscope Leica MZ12 served by an Olympus SP-350 camera. The number of spheroids exceeding a diameter of 50 μm and their surface area were evaluated by the ImageJ software [64].

Determination of IC 50 value and regression trendline
Data were analysed by GraphPad Prism software v6.0c. The optical density values of cells treated with drugs, obtained by MTT assay reading, were expressed as percentage of cell survival and normalized with the value of cells treated with vehicle (DMSO). The regression trendline were fitted using a non-linear regression method and IC 50 values were determined using a sigmoidal dose response inhibition variable slope method. The synergy of erlotinib and LY-364947 was evaluated by the Chou-Talalay method [66] using CompuSyn software [67] as previously described [30].

Immunofluorescence and confocal microscopy analysis
HCC827 parental and erlotinib-resistant cells, grown on poly-L-lysine coated coverslips or on 35 mm tissue culture plates or on the Ibidi culture dish and HCC4006 and RC2.2 cells, on 35 mm tissue culture plates, were fixed with 3.7% PFA for 10 minutes at room temperature. Next cells were permeabilized (1x PBS, 0.1% Triton X100) for 6 minutes at room temperature, treated with blocking solution (1x PBS, 1% BSA) and stained with primary antibodies in blocking solution, for 1 hour, washed with 1x PBS for 10 times and stained with fluorochrome-labeled secondary antibodies for 45 min at room temperature and Hoechst 33258 for 5 min. HCC827 samples were analysed with laser scanning confocal microscope, TCS SP5 (Leica Microsystems, Mannheim) using a 63x (NA=1.4) or 40x (NA=1.25) oil-immersion lens with optical pinhole at 1AU. For fluorescence images Argon ion laser at 488 nm, HeNe ion laser at 543 nm and blue diode laser at 405 nm were used as excitation sources. Confocal Z-stacks were collected at 0.4 μm intervals to a total optical depth of 8-12 μm. Confocal images (at least 10 images for each sample in three independent experiments) were processed with Volocity (Improvision, Perkin Elmer) and Adobe Photoshop CS4 software for image rendering and representation of x/y view. Images for direct comparison were all collected under same parameters and representative images were chosen. HCC4006 and RC2.2 cells, stained in tissue culture plates, were analyzed with an Olympus AX70 fluorescence microscope using a 20x or 40x lens and images were recorded with an Olympus XM10 camera and processed using Olympus CellSens Standard 1.8.1 software. The following setting were used for fluorescence microscopy images acquisition: exposure time at 500 ms, gain at 4 dB, and thresholds of the fluorescence intensity on dynamic range in order to include the overall histogram area. Next, the immunoreactivity for Vimentin, was independently evaluated by two investigators and Vimentin positive cells were independently counted. When discrepancies in counting occurred, a consensus interpretation was reached after reexamination. All cell counts were carried out using the ImageJ software [64]. All image panels presented were assembled using Adobe Illustrator CS4 (14.0.0)

Flow cytometry analysis
Adherent cells were detached from tissue cell culture plates by treatment with a cell dissociation buffer enzymefree PBS (Thermo Fisher) at 37°C. Cells were collected in 15 ml tubes, washed with ice-cold RPMI 1640 medium supplemented with 10% FBS and centrifuged at 400g for 5 min at 4°C. Next cell pellets were first washed with icecold PBS supplemented with 1% FBS to block unspecific antibody binding and fixed with PFA (3.7%) for 15 min on ice. After a wash with ice-cold PBS supplemented with 1% FBS, cells were permeabilized with 0.1% Triton X-100 in PBS for 15 min at room temperature and washed with PBS+1%FBS. Then 0.5-1x10 6 cells were stained in 5ml round bottom tubes (BD Falcon) with Alexa Fluorconjugated antibodies for 45 min at room temperature. Cells were then washed twice with PBS+1% FBS. Finally samples were analyzed using a BD FACS Calibur flow cytometer (Becton Dickinson). Unspecific background of individual channels was determined with isotype matched controls. Plots were generated using FlowJo v10.2 software. www.impactjournals.com/oncotarget

Western blot analysis
Cells were lysed in ice-cold RIPA buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% NP40, 0.25% sodium deoxycholate, 1 mM EDTA) containing a combination of protease and phosphatase inhibitor cocktails or in SDS Laemmli sample buffer. Cell lysates (RIPA buffer) were quantified for proteins content with the Bio-Rad DC Protein Assay kit. Proteins (25-50 μg) were then separated by 8-12% SDS-PAGE and transferred to nitrocellulose membranes 0.45 μm (GE Healthcare Life Sciences). The membranes were blocked with 5% non-fat milk (EuroClone) or 2% BSA (Sigma-Aldrich) in 1x TBS pH 7.6-8.0 containing 0.1 or 0.2% Tween 20 (TBST) for 2 hour at room temperature (RT) and subsequently probed with primary antibodies in 5% non-fat milk or 2-5% BSA in TBST overnight at 4°C. Then membranes were washed 10 minutes for 3 times with TBST and probed with horseradish peroxidase-conjugated secondary antibodies in 5% non-fat milk in TBST for 1 hour at RT. Chemidoc XRS Bio-Rad was used for images acquisition with a chemi-luminescent camera, band signals were quantified using ImageLab 4.0 Bio-Rad software.

RNA analysis
For mRNA analysis, total RNA was extracted using the TRIzol (Life Techologies) reagent and retrotranscribed with the GoScript Reverse Transcription System (Promega) using oligo (dT) and random primers. qPCR analysis was performed with specific primers (Table  8) and analyzed as previously described [30]. qPCR analysis was performed using a 7500 Fast Real-Time PCR System (Applied Biosystems). Ribosomal protein L31 (rp-L31) was used as a reference gene to normalize quantitation of target genes for differences in the amount of total RNA in each sample. microRNA levels were analyzed from total RNA using the following TaqMan MicroRNA Assays: miR-200a ID:000502, miR-200b ID:002251, miR-200c ID:002300 and RNU24 ID:001001(Applied Biosystems, Thermo Scientific). Relative expression was calculated using the comparative Ct method. The small nucleolar RNA U24 (RNU24) was used as a reference gene.
Both for mRNA and microRNA analysis, total RNA of HCC827 and HCC4006 parental cell lines were used as calibrator samples, relative to which differences in the RNA amount of resistant cell lines have been calculated. The data were analyzed using SDS (Ver. 1.4) software (Applied Biosystems).

Microarray data description and analysis
For microarray gene expression analysis, total RNA was isolated using a total RNA purification kit (Norgen Biotek, Thorold, ON, Canada). RNA quantity was determined by absorbance at 260 nm using a NanoDrop ND-1000 UV-VIS spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). RNA quality was assessed with Agilent Bioanalyzer RNA 6000 Nano kit; 200 ng of RNA were labeled with LowInput QuickAmp Labeling Kit One-Color (Agilent Technologies), purified and hybridized overnight onto an Agilent SurePrint G3 Human Gene Expression Array (8x60K v2 G4851B -039494) before detection, according to the manufacturer's instructions. Agilent DNA Microarray scanner was used for slide acquisition and spot analysis was performed with Feature Extraction software (Agilent Technologies).
Microarray raw data, provided by the European Brain Research Institute (EBRI, Rome) were background corrected, log2 transformed and quantile normalised by using limma package [68]. A total of 42,898 probes (27,841 unique official gene symbol identifiers) were selected after filtering out control and low intensity probes. Probes that map on the same gene were summarized into one averaged expression value.
The whole Agilent microarray dataset used in this study is publicly available from the NCBI's Gene Expression Omnibus (GEO) database [69] under accession number GSE80344 at the following URL: http://www. ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80344.
PCA of cell lines was performed using gene expression values as variables that were scaled to have unit variance before the analysis took place. Unsupervised hierarchical clustering was performed using the complete linkage method and Euclidean distances applied to the normalized expression data matrix (log scale).

Differential expression analysis between erlotinib-resistant and -sensitive cell lines
The expression values of biological sample duplicates were averaged and mean expression differences were calculated as positive and negative fold-changes of gene abundance between erlotinib-resistant and -sensitive cell lines. We used -1 and 1 log fold-change cutoffs in order to select genes with relevant down-and upregulation, respectively. Recurrence in gene expression modulation of erlotinib-resistant cell lines were evaluated by selecting genes with down-or up-regulation in at least five of the six derived vs parental cell lines comparisons. The recurrent differentially expressed genes were ranked according their log fold-change mean in the 5/6 erlotinibresistant cell lines.
Cut-off on gene expression (CO). This method selects genes differentially expressed between epithelial and mesenchymal cell lines (fold-change ≥ 2) and with no modulation between the subset of epithelial cell lines (fold-change within epithelial cell lines lower than 1.5). This method selected a list of 59 genes (hereafter, the CO_59 list). Support Vector Machines classification (SVM). A supervised classification of cell lines was performed by using the SVM implemented by the Weka software [70]. Based on this classifier, each gene from the COMBO_270 list was assigned a separation hyperplane coefficient for the following three pairwise problems: epithelial vs E/M, E/M vs mesenchymal, and epithelial vs mesenchymal. Genes with all three coefficients greater than 0.01 were selected (hereafter, the SVM_85 list).
Expression profile analysis (PROF). A hierarchical clustering was firstly performed to group genes of the COMBO_270 list according to their expression profiles. The resulting cluster dendrogram was cut at N=38 clusters and expression trends of each cluster separately evaluated. Next, genes populating clusters showing modulation between epithelial and mesenchymal cell lines and/or between E/M phenotype cell lines and both epithelial and mesenchymal cell lines were selected (hereafter the PROF_58 list).
Finally, intersection of the above subsets of selected genes yielded a shared list of 25 genes related to EMT (hereafter INT_25 list).

Functional enrichment analysis of gene sets
Differentially expressed genes between erlotinibresistant and -sensitive cell lines and EMT-related gene sets were submitted to the DAVID tool suite [71] to analyze functional enrichment in selected pathways by testing annotation terms provided by different sources, such as KEGG (Kyoto Encyclopedia of Genes and Genomes), PANTHER, BioCarta, Reactome and BBID (Biological Biochemical Image Database). Annotation terms with p-value < 0.05 were considered relevant. As regarding the selected EMT-related gene lists (i.e. CO_59, SVM_85 and PROF_58) gene sets, the annotation relevant terms had p-value lower than 0.05 and fold-enrichment higher than the ones of COMBO_270 gene list.

Statistical analysis
Statistical analysis was performed using Microsoft Excel and GraphPad Prism software 6.0c. Two-tailed t-test was used for all indicated comparisons if not otherwise stated and p-values lower than 0.05 were considered significant. In particular, the effects of treatment with LY-364947, TGF β1 and their combination were evaluated by analyzing the fold-changes of Vimentin positive cells in vehicle (control) vs LY-364947, vehicle vs TGF β1, and TGF β1 vs LY-364947+ TGF β1 comparisons. In the wound healing assay, differences between leading edge and inner side were evaluated by comparing the counts of total Vimentin positive, Vimentin/Cadherin-1 double positive, and Vimentin positive/Cadherin-1 negative cells. In qPCR experiments resistant cell lines were compared with their corresponding parental cell lines. Statistical analysis of the invasion and sphere-forming assays data was performed using the non-parametric Mann-Whitney U test.