A meta-analysis of transcriptomic characterization revealed extracellular matrix pathway involved in the H5N1 and H7N9 infections

Avian-origin H5N1 and H7N9 influenza A viruses are capable of causing lethal infection in humans, with serious lung pathology and leading to acute respiratory distress syndrome. The contribution of host response associated with the poor prognosis of H5N1 and H7N9 infections remains unclear. The aim of this study was to identify the host factors involved in the high pathogenicity of H5N1 and H7N9 by a systematical meta-analysis. The RNA-seq datasets related to H5N1, H7N9, and H1N1 infections with time series were retrieved from GEO. After merging the data from different series, ComBat was used to adjust the known variances from different batches. The transcription factors binding the genes in each cluster were predicted by PASTAA. We figured out the genes that were differentially expressed at any time point in samples infected with H5N1, H7N9, or H1N1. The analysis of biological function showed that genes related with cytokine were up-regulated in all three viruses. However, genes associated with carbon metabolism were found exclusively down-regulated in H7N9 and the extracellular matrix pathway were only enriched in H5N1 and H7N9. To summary, our study suggested that the extracellular matrix might be associated with the high fatality of H5N1 and H7N9 viruses in humans.


INTRODUCTION
Avian influenza, the infections of birds with avian influenza type A viruses (IAVs), occurred naturally among wild aquatic birds worldwide. Avian influenza viruses do not commonly infect humans, but human infections are reported sporadically. Recently, outbreaks of human infections after contact with infected birds or their secretion or through limited person-to-person transmission caused the attention of the public. Following the first appearance of the H5N1 in 1997, the H7N9, H10N8, and H5N6 subtypes of influenza A virus were detected following under the ongoing surveillance efforts, which have all caused severe infections [1].
Among all avian IAVs, H7N9 and H5N1 have been responsible for most human infections worldwide to date, including the most serious illnesses and deaths. Since the first recognized human case of H5N1 infection in 1997, the World Health Organization (WHO) has reported 854 confirmed human infection cases as of July 2016, with a fatality rate of about 66% [1,2]. Besides, a total of 1307 laboratory-confirmed human infections with the H7N9 has been reported through the notification of the International Health Regulations (IHR) since early 2013 with a fatality rate of 40.4% [1,3]. More than 400 additional laboratoryconfirmed cases of human infection have been reported to WHO from China in the recent months since 2017 [4]. Fortunately, both H5N1 and H7N9 were reported to have limited potential for human-to-human transmission, but it might gain the ability to spread between people through the antigenic shift or antigenic drift. Such risk urged us to prevent and cure the infection of H5N1 and H7N9.
Unlike the common seasonal flu, H5N1 and H7N9 viruses were highly pathogenic. H5N1 was highly pathogenic in both human beings and birds [5], while H7N9 seemed to exhibit low pathogenicity in birds with the severe disease that occurs in human [6]. There was more H7N9 human infections than H5N1, partly because of asymptomatic infection of H7N9 caused by the transmission through direct contact with seemingly healthy but infected birds [7]. FDA approved the first monovalent adjuvanted vaccine for prevention of H5N1 avian influenza in 2013 [8]. This vaccine could be used in the event that the H5N1 avian influenza virus develops the capability of efficient humanhuman transmission. However, there currently is no publicly available vaccine to protect against H7N9 virus infection [9].
For H5N1 and H7N9, the exact contribution of individual viral effect to pathogenicity at the molecular level is still largely unknown, which is certainly helpful to decrease the fatality. Microarray is a useful tool to understand their infection at the level of transcriptional regulation in the host cells which is important for the interpretation of these complex genetic changes [10]. However, a lot of studies have reported that findings of microarray data were not reproducible or were sensitive to the data perturbations [11,12]. Moreover, microarray used over 10 thousand probes on tens or hundreds of samples, which exacerbated the accuracy of the potential predictors. As a result, a meta-analysis was used to increase the reliability and generalizability of results.
Through this meta-analysis, we aimed to obtain a more precise set of differentially expressed genes and analyze their biological functions. In this study, we utilized the 6 and 2 available public microarray datasets from Gene Expression Omnibus (GEO) repository [13] for H5N1 and H7N9, respectively, to figure out the genes which were differentially expressed in cell lines infected with influenza virus and control. Another two datasets of H1N1 were analyzed as a comparison, so that we were able to figure out the possible cause of the high pathogenicity of H5N1 and H7N9 and point out the direction to clinical treatment.

RESULTS
The data pre-processing for microarray metaanalysis As the basis for the meta-analysis, there were totally 6 and 2 GEO datasets with time series available for H5N1 and H7N9 respectively and we also selected another 2 datasets from H1N1 to compare the avian influenza with the 2009 pandemic flu in order to figure out why H5N1 and H7N9 were so pathogenic (Table 1).
Because it was generally agreed that microarray data from distinct experimental platforms, often using distinct reference samples, are not directly comparable, we used some strategies to overcome it. After background correction and quantile normalization, the expression level of each gene was estimated by the median of the expression levels of all the probes mapped to it. Next, ComBat was utilized to adjust the known dataset differences with an empirical Bayesian framework. The hierarchical clustering of the expression profiles before and after adjustment indicated that our strategies worked for meta-analysis. Before adjustment, the samples from the same datasets were clustered together ( Figure 1A, 1C). But after adjustment, the samples from the same time points were the closest to each other ( Figure 1B, 1D) with an exception for dataset GEO66597 and the 7h time point. Considering its aberrant clustering performance and the fact that the cell line used in GEO66597 were different from other studies, which meant that they used different tissues as a target, we decided to remove it from the following studies. The 12h and 24h time points of H5N1 were clustered together well ( Figure 1B), but only 24h time point of H7N9 were clustered together. This suggested two possibilities: first, the limited datasets for H7N9 constrained the performance; and second, the cells might need more time to react when infected with H7N9 than H5N1.
The principal component analysis also showed the time points were the largest possible variance for H5N1 where H7N9 were not shown but had similar results as H5N1 ( Figure 2A).

Identification of differentially expressed genes using microarray meta-analysis
First, we calculated the Pearson correlation of each time point for H5N1 and H7N9. As expected, the internal variance increased as the time passed ( Figure 2B, H7N9 not shown). The differentially expressed genes were identified for each time point of H5N1, H7N9, and H1N1. The count of genes significantly up-regulated or downregulated in every group were plotted as Figure 2C.
The KEGG pathways were enriched for the upregulated or down-regulated genes in H5N1, H7N9, and H1N1 virus strain. Figure 3 showed the enriched pathways for H5N1 and the pathways enriched for H7N9 and H1N1 were available in Table 2. The Cytokine-cytokine receptor interaction, Toll-like receptor signaling pathway, Cytosolic DNA-sensing pathway and Influenza A pathway were all up-regulated in cells infected with three virus strains. These results suggested that the cells were using chemokine and cytokines to defense the virus. But the down-regulated genes of the three viruses had different patterns. There were no down-regulated genes available for H1N1, and no enriched pathway for H5N1, while the down-regulated genes in H7N9 were associated with metabolism. The dataset annotated was not in the analysis considering it was aberrant after normalization and batch adjustment.

Identification of clusters of expression profiles in virus strains
The fold changes of genes that were differentially expressed in any strain were utilized to decide the clusters of the expression profiles ( Figure 4). The expression levels of most genes were only changed dramatically in 24h time point. Moreover, the correlation between H5N1 and H7N9 at 24h time point (Pearson correlation: 0.82) were higher than the correlation between H5N1 and H1N1 (Pearson correlation: 0.61) or H7N9 and H1N1(Pearson correlation: 0.59). It suggested that H5N1 and H7N9, both of which were avian influenza viruses, were more similar to each other.
Finally, we obtained 10 clusters based on their expression profiles in H5N1, H7N9, and H1N1. Based on the genes involved in 10 clusters, the involved pathways and the enriched transcript factors were summarized in Table 3. Some clusters did not have enriched pathway or transcript factors, which might by led from a few genes involved. The only pathways enriched for up-regulated clusters were Influenza A pathways and most of the transcript factors belonged to interferon regulatory transcription factor (IRF) family. It suggested that the upregulated genes were highly associated with the interferon regulation and were common in both avian influenza and pandemic flu. Besides, Sta5a and Sta5b mediate cellular responses to the cytokine KITLG/SCF and other growth factors, which might help to fight against infection.
However, the down-regulated clusters had different performance. First, cells infected H1N1 viruses did not have any down-regulated genes in this study surprised but it might be caused by the strict criteria for differentially expressed genes and the large amount of up-regulated genes. Second, the genes only down-regulated in H7N9 were highly associated with metabolic pathways, while the genes down-regulated in both H7N9 and H5N1 were related to the extracellular matrix pathway. Though there was no pathway enriched for genes only low-expressed in H5N1, the transcript factor Creb and Crebβ might have an impact.

DISCUSSION
Influenza viruses can be categorized as either low pathogenicity or high pathogenicity based on their ability to induce disease in a specific host. A better understanding of mechanisms that may lead to severe illness or death caused by high pathogenicity H5N1 or H7N9 would benefit the treatment of those infections when happens in humans. Studies of differences in the host response to different virus strains would provide insight into why viruses exhibit severe damage to then certain host. In this study, we preformed a meta-analysis to evaluate the host response to two high pathogenicity H5N1 and H7N9 avain influenza viruses with the comparison with the seasonal flu H1N1, which have been reported to cause high mortality in human infection cases.
As describe in the results, cells infected with H5N1, H7N9, and H1N1 shared the similar functions of up-regulated genes. They were the cytokine-cytokine receptor interaction, Toll-like receptor signaling pathway, cytosolic DNA-sensing pathway and influenza A pathway. The transcription factors were also similar which were associated with the interferon regulatory transcription factor (IRF) family. As expected, most of the up-regulated genes were associated with the immune system against infection. However, the down-regulated genes showed various patterns for H5N1 and H7N9, while there were no down-regulated genes in H1N1.
We failed in obtaining the enriched pathway for the genes that were only down-regulated in H5N1. In order to understand their function in another way, we also utilized STRING v10 as an alternative to look at their network status and GO enrichment [14]. It showed that these genes were significantly related with positive regulation of cell migration and negative regulation of cell differentiation. And also these genes seemed to have significantly more interactions than expected, which indicated that the genes were at least partially biologically connected as a group. As a result, we could conclude that the low-expression of these genes would increase cell differentiation and decrease cell migration, which in further might be related with virus-elicited inflammatory and immune reactions.
The cluster with genes only down-regulated in H7N9 was associated with metabolic pathways, especially the carbon metabolism. Qin, Zhang [15] also reported that H7N9 infection could be linked to saccharide or polysaccharide metabolism. It also reported that central metabolism could be strongly affected by virus infections [16]. Besides, biosynthesis of amino acid was also enriched. Thus, the carbon synthesis and amino acid synthesis might be essential to the virus replication of H7N9.
Finally, the down-regulated genes shared by H5N1 and H7N9 were related with the extracellular matrix. The KEGG pathway, the extracellular matrix (ECM), was defined as a complex mixture of structural and functional macromolecules and serving an important   role in tissue and organ morphogenesis and in the maintenance of cell and tissue structure and function [17]. The functional macromolecules in the extracellular matrix contained proteoglycans, non-proteoglycan polysaccharide, collagens, fibronectin and laminin and so forth. Several studies have reported the relationship between ECM and viral infections. For example, ECM and interacting proteins involved in the entry of various viruses like gamma-retrovirus, hepatitis B virus and rhabdovirus [18,19]. Leung, Li [20] reported that treating cells with anti-fibronectin antibodies or fibronectin-specific small interfering RNA inhibited the H1N1 replication, but did not inhibit the H5N1 viruses. Moreover, H3N2 virus was able to use intercellular connections to spread to neighboring cells [21]. These reports indicated that H5N1 and H1N1 replicated with the help of the extracellular matrix in a different manner and the extracellular matrix helped the virus entry, replication, and spread. And according to the functional analysis, H7N9 should be similar to H5N1 in this aspect. Additionally, Chen, Zhou [22] also reported that EMC pathway was involved in H7N9 infection. On the other hand, Chen, Cui [23] found that higher plasma levels of hydrolysis of fibronectin and collagens IV related proteins in survivors of severe H7N9 infection and they hinted the ongoing tissue remodeling after severe H7N9 infection. Thus, the perturbed extracellular matrix pathway in cells might also hint the damaged tissue remodeling of cells, which devastated the fatality. To summary, we performed a systematic metaanalysis to evaluate the expression profiles in samples infected with H5N1 and H7N9 and we suggested that the extracellular matrix worked positively in the high fatality of H5N1 and H7N9 through affecting viral replication and spread and reducing tissue modeling. www.impactjournals.com/oncotarget

Data selection and characteristics
"H5N1", "H7N9", and "H1N1" were used, respectively, as keyword when search GEO series (https:// www.ncbi.nlm.nih.gov/geo/). The Homo Sapiens and expression profiling by array were used as filtering for the organism and series type. A total of 9 series were available for H5N1, among which 6 series (GSE76599, GSE66597 [24], GSE49840 [25], GSE43203, GSE43204, and GSE28166 [26]) had time series data and were included in this study. Only 2 series (GSE49840 [25] and GSE69026) of H7N9 datasets met our criteria. To compare the host response of highly pathogenic avian influenza with the pandemic H1N1 flu, 2 series of H1N1datasets (GSE80697 and GSE40844) were randomly selected using the same criteria. The virus strains selected in this study were: A/ Vietnam/1203/2004 (H5N1); A/Anhui/01/2013 (H7N9); A/California/04/2009 (H1N1). The detail characteristics of the GEO data series were listed in Table 1. All the datasets were generated with the Agilent platform. To prevent the differences introduced by different methods used in data preprocessing of these series, we downloaded the raw data and processed them using the same procedures. The R package GEOquery [27] was utilized to download the raw Cel files and also the information of each sample. The platform information of every series was retrieved directly from the GEO website.

Microarray data pre-processing
The data pre-procession were carried out with the help of the limma package in R software [28]. All the included microarray data used single-channel Agilent platform, so only the intensities of the green channel were extracted. The medians of the foreground and background pixels were calculated as the estimated foreground and background signals. Next, we adjusted the foreground adaptively for the background intensities based on convolution model using the method called 'normexp', which was preferable to background subtraction when assessing differential expressions [29]. Finally, quantile-normalization was applied to the data. Only the mock-infected samples and the samples infected with the targeted influenza were kept into the next step.

Data merge and batch adjustment
The gene names were mapped to the probes based on the platform information. The expression level of each gene was calculated using the median value of the probes that were mapped to it. The data from different series were merged together according to the gene symbol and any genes missing in any series were excluded. The batch adjustment between different GEO series was completed using package SVA in R software [30]. ComBat function was effective to adjust for known batches using an empirical Bayesian framework [31].
In order to confirm the effects of batch adjustment and also the prerequisite of this meta-analysis study, the expression levels before and after adjustment were visualized using hierarchy clustering and principal component analysis. One series (GSE66597) seemed to be an outlier and was removed in the following study.

Differentially expressed genes identification
The differentially expressed genes on each time point of all three H5N1, H7N9, and H1N1 were identified through limma package by comparing the samples with virus infections and with mock infection [28,32]. Linear models were fitted and empirical Bayes method was used for statistical analysis and assessing differential expression. The obtained p-values were adjusted by false discovery rate (FDR). The genes with FDR less than 0.01 and absolute log2-fold-change larger than 2 were considered as significantly differentially expressed genes.
The genes that were differentially expressed in any virus strain were extracted and they were clustered based on their logarithm of fold change on three time points in different strains and 10 different clusters were found.

Biological function analysis
The KEGG pathways were enriched using DAVID Bioinformatics Resources v6.8 [33,34] for genes upregulated or down-regulated in samples infected with any influenza and for genes belonging to different clusters. HumanGenome in Agilent Backgrounds was selected as population background, considering that all the datasets used Agilent platform. For genes upregulated or down-regulated in H5N1, H7N9 or H1N1, the p-values were adjusted using Bonferroni methods. However, due to the limited genes in each cluster, the p-values of enriched pathways for different clusters were adjusted using Benjamini method, which is looser than Bonferroni.
The transcription factors binding the genes in each cluster were predicted by PASTAA [35]. Only the transcription factors with an association score larger than 6 were considered as an enriched transcription factor for each cluster.