Highly expressed COL4A1 contributes to the proliferation and migration of invasive ductal carcinoma

Background Invasive ductal carcinoma (IDC) is a typical type of breast cancer. The goal of our research was to elucidate the molecular mechanism of IDC and to identify potential therapeutic targets. Results A total of 478 genes differentially expressed in IDC compared with normal breast epithelial cells were identified. Functional enrichment analysis showed that most of the differentially expressed genes were associated with ECM-receptor interaction. Comparing the gene lists from the protein-protein interaction (PPI) network and the miRNA regulatory network yielded nine overlapping genes, from which the two most crucial genes were identified; these may help to improve IDC treatment. Additionally, experimental results showed that COL4A1, one of the identified genes, plays important roles in both the proliferation and colony formation of IDC cells. Conclusions IDC may be connected with alterations in ECM-receptor interaction. These nine vital genes could play an important part in the progression of IDC and may serve as therapeutic targets and prognostic indicators. Experimental results showed that one of the most crucial genes, COL4A1, is a key gene influencing the proliferation and colony formation of IDC cells.


INTRODUCTION
Invasive ductal carcinoma (IDC) is a typical breast cancer [1]. The terminology for ductal carcinomas remains controversial on purely anatomical grounds because a widely accepted classification is lacking. IDC and other types of breast cancer, such as invasive lobular carcinoma (ILC), arise from the terminal duct lobular unit (TDLU); their morphological distinction therefore does not indicate the anatomical origin of the lesions but rather differences in the mechanisms of carcinogenesis. Both tumor types are described by the same clinicopathological parameters, such as size and stage [2], but new clinical data and analyses of metastatic patterns have shown that their patterns of progression differ [3,4]. Stage-matched treatment is similar for both tumor types [5], although ILCs can be resistant to treatment [6]. Compared with IDC patients, ILC patients present at an older age and have lower-grade tumors and less lymphatic invasion, but their survival is similar [7,8].
Microarray technology makes it possible to study the expression of thousands of genes simultaneously. It supports several applications, including tumor classification, molecular pathway modeling, functional genomics, and comparison of gene expression profiles between groups [9]. Early microarray studies classified ductal breast cancer into aggressive and non-aggressive phenotypes, and other studies identified distinct expression patterns according to BRCA1/2 status [10-13]. Although many researchers have used microarrays to study gene expression in breast cancer, the carcinogenesis and pathophysiological mechanism of IDC are not yet completely understood.
COL4A1 encodes the α1 chain of type IV collagen, the collagen type found specifically in basement membranes [14]. COL4A1 is closely associated with cell proliferation: COL4A1 knockdown has been reported to reduce cell viability and cause cell cycle arrest [15]. Clinical manifestations associated with COL4A1 mutations include perinatal cerebral hemorrhage and porencephaly [16]; hereditary angiopathy, nephropathy, aneurysms, and muscle cramps (HANAC) [17]; and ocular dysgenesis, myopathy and Walker-Warburg syndrome [18]. A recent report linked truncation of the C-terminal NC1 domain of the type IV collagen α1 chain by a frameshift mutation to renal disease and demonstrated that the highly conserved C-terminal part of this NC1 domain is important for the integrity of the glomerular basement membrane in humans [19].
In this study, we collected 15 microarray samples from a public dataset [20], comprising normal ductal cells from 10 patients and five surgical specimens of IDC, and performed several bioinformatics analyses to investigate the molecular mechanism of IDC. We identified genes differentially expressed between normal and cancer cells, as well as several genes that may play crucial roles in the development of IDC. Based on the bioinformatics analysis, a series of experiments was performed to verify the data-mining results, which indicated that overexpressed COL4A1 promotes the proliferation of the IDC cell line SKBR3.

Function and pathway enrichment analysis
The 478 DEGs were uploaded to the DAVID website for GO analysis with a threshold of p ≤ 0.05. Figure 2 shows the top 10 enriched GO terms. Most of them (6/10) were connected with cell movement, including cell adhesion (20.5% of DEGs; p = 3.64E-10), biological adhesion (15.66%; p = 1.18E-06), skeletal system development (12.05%; p = 4.79E-06), extracellular structure organization (12.05%; p = 2.3E-05) and cell motion. In addition to cell movement, GO terms connected with cellular development were also enriched, such as epidermis development (15.66%; p = 1.73E-08) and cell proliferation (10.84%; p = 1.62E-05). KEGG pathway analysis showed similar results (
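DAVID scores term enrichment with a modified Fisher's exact test (the EASE score). As a minimal sketch of the unmodified idea only, the one-sided hypergeometric (Fisher's exact) p-value for over-representation of a GO term can be computed from four counts. The function name and the toy counts are hypothetical and are not taken from this analysis.

```python
from math import comb


def enrichment_p_value(hits: int, population: int, annotated: int, list_size: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    hits:       genes in the submitted list that carry the term
    population: genes in the background
    annotated:  background genes that carry the term
    list_size:  genes in the submitted list
    Returns P(X >= hits) when list_size genes are drawn at random.
    """
    total = comb(population, list_size)
    max_hits = min(annotated, list_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(hits, max_hits + 1)
    )
    return tail / total


# Toy example: 10 background genes, 3 carry the term, and a 3-gene list contains all 3.
print(enrichment_p_value(3, population=10, annotated=3, list_size=3))  # 1/120, about 0.00833
```

DAVID's EASE score is more conservative than this plain test because it penalizes the hit count before testing, so the two p-values are not interchangeable.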

Analysis of the miRNA-gene network
Based on starBase, a total of 37 potential regulatory miRNAs and 125 miRNA-gene regulation pairs were obtained (Figure 4). The top 20 genes ranked by node degree in the co-expression network included PTPRC, MX1, FCGR3A, UGT1A6, LY86, GRN, CP, TSPO, CD68, SKAP2 and CD53 (Table 3). By comparing the DEG lists from the PPI network and the miRNA-gene regulation network, we found nine genes that were regulated by at least three miRNAs in the miRNA-gene regulation network and interacted with at least three genes in the PPI network: ITSN1, TCF4, EPHA4, VCAN, SIK3, IFFO2, SAMD4A, KMM6B and COL4A1. These overlapping DEGs may play a more crucial role in IDC.
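The overlap criterion described above (at least three regulating miRNAs and at least three PPI partners) can be sketched as a simple filter. This is a minimal illustration assuming the networks are available as lists of (miRNA, gene) pairs and undirected (gene, gene) edges; the function name and the toy gene and miRNA names are hypothetical.

```python
from collections import defaultdict


def overlap_genes(mirna_gene_pairs, ppi_edges, min_mirnas=3, min_partners=3):
    """Genes targeted by >= min_mirnas miRNAs AND having >= min_partners PPI partners."""
    regulators = defaultdict(set)  # gene -> set of miRNAs targeting it
    for mirna, gene in mirna_gene_pairs:
        regulators[gene].add(mirna)

    partners = defaultdict(set)  # gene -> set of interacting genes (undirected)
    for a, b in ppi_edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)

    return sorted(
        gene for gene, mirnas in regulators.items()
        if len(mirnas) >= min_mirnas and len(partners.get(gene, ())) >= min_partners
    )


# Hypothetical toy input: GENE_A passes both filters, GENE_B has only two miRNAs.
pairs = [("miR-x", "GENE_A"), ("miR-y", "GENE_A"), ("miR-z", "GENE_A"),
         ("miR-x", "GENE_B"), ("miR-y", "GENE_B")]
edges = [("GENE_A", "P1"), ("GENE_A", "P2"), ("GENE_A", "P3"),
         ("GENE_B", "P1"), ("GENE_B", "P2"), ("GENE_B", "P3")]
print(overlap_genes(pairs, edges))  # ['GENE_A']
```

Using sets means a miRNA or partner listed twice is counted once, so duplicated database rows do not inflate a gene's counts.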

Key gene filtering and survival analysis
To visualize the expression levels of the nine overlapping genes, we used the pheatmap package in R to generate a heatmap (Figure 5) showing the expression differences between normal ductal cells and IDC. From these, we selected two genes, VCAN and COL4A1, because they showed the largest differences in expression. To determine whether the expression of VCAN and COL4A1 has prognostic value, we performed Kaplan-Meier survival analysis. As shown in Figure 6A and Figure 6B, lower expression of VCAN or COL4A1 predicted longer overall survival (P = 0.007 and P < 0.001, respectively).

COL4A1 mRNA is up-regulated in SKBR3 cells
To assess the expression of the candidate genes in SKBR3 cells, RT-PCR and Western blot were employed (Figure 7A and 7B). Compared with the control (the normal lung epithelial cell line BEAS-2B), COL4A1 mRNA was up-regulated, consistent with the findings of the bioinformatics analysis. For VCAN, however, no significant change in mRNA level was observed compared with the control. These results show that COL4A1 is up-regulated in SKBR3 cells.

Knockdown of COL4A1 in SKBR3 cells
Lentiviral delivery of shRNA targeting COL4A1 into SKBR3 cells reduced COL4A1 mRNA and protein levels, as measured by RT-PCR and Western blot analysis, respectively. As shown in Figure 8, COL4A1 expression at both the mRNA and protein levels was clearly down-regulated by shRNA treatment. Cells treated with COL4A1 shRNA showed no obvious difference in morphology, suggesting no significant change in cytoskeletal architecture.

Knocking down the COL4A1 gene significantly reduced the proliferation and colony formation
To assess the role of COL4A1 in regulating IDC cell proliferation, SKBR3 cells were infected with COL4A1 shRNA or control shRNA lentivirus, and viable cells were measured daily for 4 days by a spectrophotometric assay. COL4A1 knockdown significantly inhibited SKBR3 cell proliferation (Figure 9, upper panel). To explore the effect of COL4A1 on IDC colony formation, SKBR3 cells treated with COL4A1 shRNA or control shRNA lentivirus were allowed to grow for 14 days to form colonies. As shown in Figure 9 (lower panel), COL4A1 knockdown significantly decreased the number of SKBR3 colonies compared with the control shRNA group. Thus, our data indicate that COL4A1 regulates the colony-forming ability of IDC cells.

DISCUSSION
IDC, also called infiltrating ductal carcinoma, is the most frequent type of breast cancer, accounting for approximately 70-80% of all breast cancer diagnoses [21]. Studying DEGs between IDC and normal ductal cells may help to identify genes related to invasive ductal breast carcinoma. In this study, we collected the microarray dataset GSE5764 from GEO and identified 478 DEGs specifically expressed in IDC, which we speculated may be connected with invasive ductal breast carcinoma. GO and KEGG enrichment analyses, PPI network analysis and miRNA-gene interaction network analysis of these 478 DEGs yielded several notable findings. GO enrichment analysis suggested a clear connection between the DEGs and GO terms related to cell movement, which KEGG pathway analysis also supported. Furthermore, we identified nine genes that overlapped between both gene lists and are associated with ECM-receptor interaction. Two of them, VCAN and COL4A1, showed the most different expression profiles between IDC and normal ductal cells. VCAN, also called proteoglycan-M (PG-M), is a large hyaluronan (HA)-binding chondroitin sulfate extracellular matrix proteoglycan of the lectican family [22]. However, our experiments showed that COL4A1 was significantly up-regulated compared with the control, whereas VCAN was not obviously up-regulated in the SKBR3 cell line. COL4A1 encodes a chain of type IV collagen, which is found in nearly all basement membranes and is highly conserved across species. The COL4A1 and COL4A2 genes comprise 52 and 48 exons, respectively, and are separated by 127 nucleotides containing a shared bidirectional promoter that requires additional factors to regulate the structural specificity and level of protein expression [23]. Herein, we found that high COL4A1 expression is a negative prognostic marker in invasive ductal breast cancer.
This result is consistent with previous research reporting that collagen IV encoded by COL4A1 can be regulated by P4HA2 during tumor growth and metastasis [24]. COL4A1 has also been identified as a potential therapeutic target in head and neck squamous cell carcinoma [25], colorectal carcinoma [26] and papillary thyroid carcinoma [27]. In addition, COL4A1 was among the most significantly up-regulated genes during formation of the avian blood-gas barrier; COL4A1 mutations derived from the vascular component were sufficient to cause defects in vascular development and the blood-gas barrier, and mutation of COL4A1 disrupted myofibroblast proliferation, differentiation and migration [28]. These findings are consistent with our experimental results showing that COL4A1 silencing suppressed the proliferation and colony formation of IDC cells.

Microarray data
The microarray dataset GSE5764, based on the Affymetrix Human Genome U133 Plus 2.0 Array platform, was obtained from the Gene Expression Omnibus (GEO). This dataset was deposited by Turashvili et al. [20] and includes 30 microarray samples: normal ductal and lobular cells from 10 patients, and 10 surgical samples obtained by mastectomy from postmenopausal patients with IDC or ILC (5 IDCs and 5 ILCs).

Data preprocessing
The original CEL files were imported into R, and the affy package was used for background correction and normalization. Expression values of genes corresponding to multiple probes were summarized. Genes that were not expressed, as determined by the mas5calls function of the affy package, were removed. Finally, expression levels for 8642 genes were obtained.
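The text does not state how multiple probes were summarized or which presence criterion was applied to the mas5calls output. A minimal sketch, assuming probes are kept if they receive a "P" (present) call in at least one sample and the remaining probes of a gene are averaged, could look like this; the function name, input dictionaries and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def preprocess(probe_values, probe_calls, probe_to_gene, min_present=1):
    """Drop probes with too few 'P' calls, then average the remaining probes per gene.

    probe_values:  probe ID -> list of normalized expression values (one per sample)
    probe_calls:   probe ID -> list of MAS5 detection calls ('P', 'M' or 'A')
    probe_to_gene: probe ID -> gene symbol
    """
    per_gene = defaultdict(list)
    for probe, values in probe_values.items():
        n_present = sum(call == "P" for call in probe_calls.get(probe, []))
        gene = probe_to_gene.get(probe)
        if n_present >= min_present and gene is not None:
            per_gene[gene].append(values)
    # Column-wise mean across all probes of a gene gives one value per sample.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in per_gene.items()}


values = {"p1": [1.0, 3.0], "p2": [3.0, 5.0], "p3": [9.0, 9.0]}
calls = {"p1": ["P", "A"], "p2": ["P", "P"], "p3": ["A", "A"]}
mapping = {"p1": "GENE_1", "p2": "GENE_1", "p3": "GENE_2"}
print(preprocess(values, calls, mapping))  # {'GENE_1': [2.0, 4.0]}
```

Averaging is only one summarization choice; taking the probe with the highest mean expression is another common option and can give different gene-level values.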

Differentially expressed genes selection
The dataset contains four kinds of samples: 10 normal ductal cell samples, 10 normal lobular cell samples, 5 IDC and 5 ILC. DEGs between the 10 normal ductal cell samples and the 5 IDC samples were identified using the limma package [29], with p ≤ 0.05 and |log2 fold change| > 1 as the cutoff criteria.
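Once per-gene statistics have been computed, the cutoff step itself is a simple filter. This minimal sketch assumes the statistics were exported as (gene, log2 fold change, p-value) rows, for example from the logFC and P.Value columns of a limma topTable() result, and that positive fold changes mean higher expression in IDC than in normal ductal cells; the gene names are hypothetical.

```python
def select_degs(rows, p_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the cutoffs p <= p_cutoff and |log2 fold change| > lfc_cutoff.

    rows: iterable of (gene, log2_fold_change, p_value).
    Returns gene -> 'up' or 'down' (assumed direction: IDC relative to normal).
    """
    return {
        gene: "up" if lfc > 0 else "down"
        for gene, lfc, p in rows
        if p <= p_cutoff and abs(lfc) > lfc_cutoff
    }


rows = [("A", 1.5, 0.01), ("B", -2.0, 0.04), ("C", 0.8, 0.001),
        ("D", 3.0, 0.20), ("E", 1.0, 0.01)]
print(select_degs(rows))  # {'A': 'up', 'B': 'down'}
```

Gene E is excluded because the fold-change criterion is strict (|log2 fold change| must exceed 1), while the p-value criterion includes the boundary value 0.05.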

Functional annotation and pathway analysis of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID) [30] is an online program that combines functional genomic annotations with intuitive graphical summaries [31]. Gene lists or protein identifiers are rapidly annotated and summarized using comprehensive categorical data for Gene Ontology (GO), protein domains, and biochemical pathway membership. To comprehensively evaluate the pathways and biological processes associated with IDC, GO and pathway enrichment analyses of the DEGs were performed with DAVID, using a threshold of p ≤ 0.05.

Protein interaction network analysis
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) [32] database (http://string-db.org/) provides access to all available interaction data, including less reliable interactions and computational predictions, and covers protein interactions in more than 1100 organisms. In this study, a protein-protein interaction (PPI) network of the DEGs was constructed with the STRING online database using a score threshold of > 0.4. Hub proteins were selected by node degree, and the network was visualized with Cytoscape [33].
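Selecting hub proteins by node degree after the score cutoff can be sketched as follows. This assumes the STRING interactions were exported as (protein A, protein B, score) rows with scores on a 0-1 scale; the function name and toy protein names are hypothetical, and Cytoscape was used for the actual analysis.

```python
from collections import Counter


def rank_hubs(edges, min_score=0.4, top_n=20):
    """Rank proteins by degree in the PPI network after a score cutoff.

    edges: iterable of (protein_a, protein_b, score), score on a 0-1 scale.
    Edges with score <= min_score, self-loops and duplicate pairs are ignored.
    """
    seen = set()
    degree = Counter()
    for a, b, score in edges:
        key = frozenset((a, b))  # undirected: (A, B) and (B, A) are the same edge
        if score <= min_score or a == b or key in seen:
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top_n)


edges = [("A", "B", 0.90), ("A", "C", 0.50), ("A", "D", 0.45),
         ("B", "C", 0.30), ("B", "A", 0.95)]
print(rank_hubs(edges, top_n=1))  # [('A', 3)]
```

Deduplicating undirected pairs matters because STRING exports may list an interaction in both directions, which would otherwise double every degree.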

MiRNA-gene regulation network analysis
starBase (http://starbase.sysu.edu.cn/) is a database that decodes RNA-RNA and protein-RNA interactions from 108 CLIP-Seq datasets generated by 37 independent studies [34]. It provides miRNA-mRNA regulation pairs under different parameter settings. In this study, the miRNA-mRNA interaction network of the DEGs was constructed using the starBase online database.

Kaplan-Meier survival analysis
KM Plotter (http://kmplot.com/analysis/) [35], a meta-analysis-based biomarker assessment tool covering thousands of gene expression samples from breast cancer patients, was used to perform Kaplan-Meier survival analysis to further assess the relationship between the DEGs and prognosis. Quartiles were used as cutoff values in the survival analysis, and p < 0.05 was considered significant.
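KM Plotter performs the survival computation itself. To illustrate what a Kaplan-Meier curve represents, the following is a minimal product-limit sketch together with a split of patients at the upper quartile of expression. Treating "high" as above the upper quartile is an assumption, since the text only states that quartiles were used as cutoffs; the function names and toy data are hypothetical, and the log-rank test that yields the p-value is not shown.

```python
from statistics import quantiles


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    events: 1 = death observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both deaths and censored cases leave the risk set
    return curve


def split_at_upper_quartile(expression):
    """Indices of 'high' (above the upper quartile) and 'low' samples (assumed grouping)."""
    q3 = quantiles(expression, n=4)[2]
    high = [i for i, x in enumerate(expression) if x > q3]
    low = [i for i, x in enumerate(expression) if x <= q3]
    return high, low


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))  # approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

In practice, kaplan_meier would be called separately on the survival times of the "high" and "low" groups returned by split_at_upper_quartile.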

Cells and reagents
The human breast cancer cell line SKBR3 and the normal lung epithelial cell line BEAS-2B were obtained from ATCC and cultured in low-glucose Dulbecco's modified Eagle's medium (DMEM).

Real-time PCR
Total RNA was extracted from the cultured MCF7 cells, and its concentration was estimated by optical density measurement (A260/A280 ratio) with a NanoVue Plus spectrophotometer (GE, USA). Real-time PCR amplification was carried out using SYBR Green-based PCR Master Mix (Applied Biosystems/Life Technologies, USA). All reactions were run on an ABI PRISM 7500 system (ABI, USA) in a total volume of 25 µL.
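The text does not state how relative mRNA levels were calculated. A widely used approach for SYBR Green data is the 2^-ΔΔCt method; the sketch below assumes that method, a single reference gene, and roughly 100% amplification efficiency. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Relative expression by the 2^-ddCt method (assumes ~100% amplification efficiency).

    ct_target / ct_reference:           mean Ct of the gene of interest and a reference
                                        gene in the test cells (e.g. SKBR3)
    ct_target_ctrl / ct_reference_ctrl: the same two genes in the control cells
    """
    delta_test = ct_target - ct_reference          # normalize to the reference gene
    delta_ctrl = ct_target_ctrl - ct_reference_ctrl
    return 2.0 ** -(delta_test - delta_ctrl)       # each cycle corresponds to a 2-fold change


# Hypothetical Ct values: the target reaches threshold 3 cycles earlier (relative
# to the reference gene) in the test cells than in the control cells.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

If primer efficiencies differ substantially from 100%, an efficiency-corrected model is more appropriate than this simple formula.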

Western blotting
Cells were harvested, and total protein was isolated using RIPA buffer (50 mM Tris-HCl at pH 7.4, 1 mM EDTA, 150 mM NaCl, 1% NP-40 and 0.5% SDS) supplemented with protease inhibitor cocktail (Roche, Switzerland) and heated for 5 min at 100°C. Protein concentration was determined with a BCA Protein Assay Kit (Takara, Japan). Equal amounts of denatured protein were loaded, separated on 10% SDS-polyacrylamide gels, and transferred onto polyvinylidene difluoride membranes (PALL). After blocking with 5% non-fat milk powder in Tris-buffered saline/0.05% Tween 20 (TBST), the membrane was incubated with a specific primary antibody, followed by an HRP-conjugated secondary antibody. Proteins were visualized using ECL reagents (Tanon, China).

Lentivirus production and cell transduction
To knock down COL4A1 expression, SKBR3 cells were transduced with lentivirus carrying shRNA (sequence, CCGGCCTGGGATTGATGGAGTTAAACTCGAGTTTAACTCCATCAATCCCAGGTTTTTG). To produce lentivirus, 15 µg of the transfer plasmid, 9 µg of the packaging plasmid (psPAX2) and 6 µg of the envelope plasmid (pMD2.G) were cotransfected into 293T cells by the calcium phosphate method. At 24 h after transfection, the cells were cultured in 15 mL of fresh serum-free medium for another 48 h. The virion-containing culture medium was then collected, filtered through a 0.45-µm filter, and centrifuged at 90,000 × g for 90 min. For transduction, SKBR3 cells were infected for 16 h with 10 µL of virus suspension containing 8 µg/mL Polybrene (Sigma).
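The shRNA oligo above appears to follow the common TRC/pLKO.1 hairpin layout: CCGG, a 21-nt sense stem, a CTCGAG loop, the 21-nt antisense stem and a TTTTTG terminator. The vector is not named in the text, so this layout is an assumption. A minimal sketch that splits the oligo and checks that the two stems are reverse complements (the function names are hypothetical):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_trc_hairpin(oligo: str, stem_len: int = 21):
    """Split an oligo assumed to be CCGG + sense + CTCGAG loop + antisense + TTTTTG."""
    oligo = oligo.replace(" ", "").upper()  # tolerate spaces from copy/paste
    prefix, loop, suffix = "CCGG", "CTCGAG", "TTTTTG"
    if not (oligo.startswith(prefix) and oligo.endswith(suffix)):
        raise ValueError("unexpected flanking sequences")
    loop_start = len(prefix) + stem_len
    sense = oligo[len(prefix):loop_start]
    if oligo[loop_start:loop_start + len(loop)] != loop:
        raise ValueError("loop not found at the expected position")
    antisense = oligo[loop_start + len(loop):-len(suffix)]
    # The hairpin can only fold if the antisense stem pairs with the sense stem.
    return sense, antisense, antisense == reverse_complement(sense)


oligo = "CCGGCCTGGGATTGATGGAGTTAAACTCGAGTTTAACTCCATCAATCCCAGGTTTTTG"
print(parse_trc_hairpin(oligo))
# ('CCTGGGATTGATGGAGTTAAA', 'TTTAACTCCATCAATCCCAGG', True)
```

A check like this is useful when transcribing shRNA sequences from manuscripts, where extraction often inserts spaces or breaks the oligo across lines.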

Cell proliferation
Cell proliferation was assessed by water-soluble tetrazolium salt (WST) assay with the Cell Counting Kit-8 (Dojindo, Kumamoto, Japan) according to the manufacturer's instructions. At 24 h after transduction with vehicle or COL4A1 shRNA, SKBR3 cells were seeded onto 96-well plates (2 × 10³ cells/well), and cell proliferation was recorded every 24 h for 4 days. The number of viable cells was assessed by measuring the absorbance at 450 nm.
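The text does not describe how the daily absorbance readings were normalized. One simple way to turn CCK-8 readings into a growth curve, assumed here for illustration, is to subtract the mean of blank (medium-only) wells and express each day relative to the first reading. The function name and OD values are hypothetical.

```python
from statistics import mean


def growth_curve(od_by_day, blank_od):
    """Blank-subtract the mean OD450 of each day's wells and scale to the first day."""
    blank = mean(blank_od)
    corrected = [mean(day) - blank for day in od_by_day]
    return [value / corrected[0] for value in corrected]


# Hypothetical triplicate-free toy readings for one group over three days.
od = [[0.30, 0.30], [0.50, 0.50], [0.90, 0.90]]
print([round(x, 3) for x in growth_curve(od, blank_od=[0.10, 0.10])])  # [1.0, 2.0, 4.0]
```

Normalizing each group to its own first reading separates differences in growth rate from small differences in the number of cells initially seeded.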

Colony-formation assay
In the plate colony-formation assay, SKBR3 cells were resuspended in RPMI 1640 containing 10% FBS and seeded onto 6-well plates (5 × 10² cells/well). The cells were incubated for 5 days and stained with Giemsa. Colonies containing 50 or more cells were counted.

Statistical analysis
Data are presented as mean ± standard deviation. The statistical significance of differences between groups was assessed by one-way analysis of variance (ANOVA) followed by Student's t-test. p < 0.05 was considered statistically significant. Statistical analyses were performed using GraphPad Prism 4.0 software.
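For a pairwise comparison such as COL4A1 shRNA versus control shRNA, the Student's t statistic uses a pooled variance. The sketch below computes only the statistic and its degrees of freedom with the standard library; the ANOVA step is not shown, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```

The two-sided p-value then follows from the t distribution with df degrees of freedom; scipy.stats.ttest_ind (with its default equal_var=True) returns the same statistic together with that p-value.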

CONCLUSIONS
In this study, a total of 478 DEGs were identified as associated with IDC. Among them, nine vital genes were further identified by comparing the PPI network and the miRNA-gene network, and two of these were selected to evaluate the relationship between their expression and prognosis. Moreover, experimental results indicated that down-regulation of COL4A1 significantly inhibited the colony formation and proliferation of IDC cells. These genes could play a crucial part in the progression of IDC and may serve as therapeutic targets and prognostic indicators.