Towards understanding the lifespan extension by reduced insulin signaling: bioinformatics analysis of DAF-16/FOXO direct targets in Caenorhabditis elegans

DAF-16, the C. elegans FOXO transcription factor, is an important determinant of aging and longevity. In this work, we manually curated FOXODB (http://lyh.pkmu.cn/foxodb/), a database of FOXO direct targets that currently covers 208 genes. Bioinformatics analysis of 109 DAF-16 direct targets in C. elegans showed that (i) DAF-16 and the transcription factor PQM-1 co-regulate some targets; (ii) seventeen targets directly regulate lifespan; (iii) four targets are involved in lifespan extension induced by dietary restriction; and (iv) DAF-16 direct targets might play global roles in lifespan regulation.


INTRODUCTION
DAF-16, the C. elegans FOXO transcription factor, acts as a molecular switch in lifespan regulation [1]. When activated by reduced insulin signaling, it extends the lifespan of C. elegans by activating or inhibiting its downstream genes [2,3]. Presumably, these downstream genes largely determine how lifespan is extended. Yet little is known about their positions in the regulatory network: which are directly regulated by DAF-16 and which are indirect targets.
To identify DAF-16 targets, various high-throughput techniques have been used, such as microarrays [2,3], proteomics [4], and DamID (DNA adenine methyltransferase identification) [5]. Microarrays and proteomics can identify genes downstream of DAF-16 but cannot readily distinguish direct from indirect targets. DamID can identify DAF-16 direct targets in theory, but in practice it may produce false positives and false negatives [6].
Insulin signaling is remarkably conserved in C. elegans, Drosophila melanogaster and mammals, and reduced signaling of this pathway has been shown to extend lifespan in all of these animals [7]. For FOXO and its orthologs, genes, transcripts and proteins carry different identifiers in different species; we refer to them collectively as "FOXOs" hereafter. Currently, many experimentally validated FOXOs direct targets are scattered across the literature [6,8,9]. Collecting these known targets and mapping them to C. elegans through orthology analysis would be helpful for longevity research in C. elegans.
In this work, by manually reading the literature, we collected 208 experimentally validated FOXOs direct targets. Through orthologous mapping, we obtained 109 DAF-16 direct targets in C. elegans. To make the data easily accessible, we set up FOXODB (http://lyh.pkmu.cn/foxodb/). Bioinformatics analysis of the 109 targets is presented below.

FOXODB: a database of FOXO direct targets
As shown in Figure 1, we searched PubMed with the keyword "FOXOs" and found more than 2700 papers, which we read manually. When a gene was determined to be a FOXOs direct target, key information was extracted and recorded in FOXODB (http://lyh.pkmu.cn/foxodb).
The rules for adding a gene to FOXODB were strict: (i) the gene should be differentially expressed in FOXOs (+) versus FOXOs (-); (ii) FOXOs must be able to bind to the promoter of the gene; and (iii) only traditional experimental evidence was accepted. Details can be found in Materials and Methods.
Currently, FOXODB covers 302 entries and 208 genes, including 35, 26, and 147 direct targets in C. elegans, Drosophila melanogaster and mammals, respectively. FOXODB is designed to be user-friendly (Figure 2). As in our previous work [10,11], FOXODB was written in PHP (Hypertext Preprocessor). We believe FOXODB will be a valuable resource to the field.

DAF-16 direct targets significantly overlap with previous results
Inparanoid is a database specially designed for orthology analysis [12]. We used it to map FOXODB genes to their C. elegans orthologs and obtained 109 DAF-16 direct targets (Supplementary Tables 1 and 2).

Seventeen DAF-16 direct targets directly regulated lifespan
GenAge [16], a useful resource for longevity research, covers 681 longevity genes in C. elegans. Comparing these with the 109 targets gave 17 overlapping genes, significantly more than the 3.71 expected by chance (p = 0; Table 2). Of the 17 genes, 10 were obtained by orthologous mapping, which means that they are identified here for the first time as DAF-16 direct targets that regulate lifespan.
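The expected overlap quoted here follows from simple arithmetic: if two gene sets of sizes m and n are drawn independently from a genome of N genes, the expected number of shared genes is m*n/N. The following is a minimal sketch, assuming the genome-size approximation N = 20,000 given in the Hypergeometric model section; the function name is ours and is used for illustration only.

```python
def expected_overlap(m: int, n: int, genome_size: int = 20_000) -> float:
    """Expected number of shared genes between two random gene sets."""
    return m * n / genome_size


# GenAge longevity genes (681) versus DAF-16 direct targets (109)
print(round(expected_overlap(681, 109), 2))  # 3.71
```

The same calculation with the 48 GenDR genes gives 0.26, the chance expectation used in the dietary restriction comparison.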

Four DAF-16 direct targets were involved in lifespan extension induced by dietary restriction
Many dietary restriction methods can extend the lifespan of C. elegans [18]. Some of them, such as eat mutations or some forms of bacterial dilution, do not require DAF-16, whereas other forms of bacterial dilution and peptone dilution do require DAF-16 [18]. It was therefore of interest to know whether DAF-16 direct targets are involved in lifespan extension induced by dietary restriction. GenDR, a database of lifespan-regulating genes related to dietary restriction, covers 48 genes in C. elegans [19]. We compared them with the 109 targets and found 4 overlapping genes: age-1, hsp-12.6, daf-16 and daf-2. This is significantly more than the 0.26 expected by chance (p = 1.35E-4). This result supports the view that some dietary restriction methods require DAF-16 for lifespan extension.

DAF-16 direct targets might play global roles in lifespan regulation
Proteins do not function in isolation but through interactions with each other. From a network perspective, the more interaction partners (the higher the degree) a protein has, the more important it is likely to be. Here, we examined the degrees of the 109 targets and found an average degree of 17.77, significantly higher than 11.85, the average degree of the other proteins in the network (p = 0.0014, Kolmogorov-Smirnov test, KS test for short). As analyzed above, 17 targets directly regulate lifespan; their average degree was the highest, 36.31 (see Figure 3A). This result is consistent with our previous finding that the degrees of longevity genes tend to be higher than those of non-longevity genes [20].
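The degree comparison can be sketched as follows. This is an illustrative example under stated assumptions, not the code used in this work: the network is taken to be an undirected edge list, the protein identifiers are made up, and only the two-sample KS statistic (the largest gap between the two empirical distribution functions) is computed. In practice a library routine such as scipy.stats.ks_2samp would also return the p-value.

```python
from bisect import bisect_right
from collections import defaultdict


def degrees(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {p: len(nb) for p, nb in neighbors.items()}


def ks_statistic(xs, ys):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


# Toy network with hypothetical identifiers (not data from this work)
edges = [("T1", "A"), ("T1", "B"), ("T1", "T2"), ("T2", "A"), ("B", "C")]
targets = {"T1", "T2"}

deg = degrees(edges)
target_deg = [d for p, d in deg.items() if p in targets]
other_deg = [d for p, d in deg.items() if p not in targets]

print("mean degree, targets:", sum(target_deg) / len(target_deg))
print("mean degree, others: ", sum(other_deg) / len(other_deg))
print("KS statistic:", ks_statistic(target_deg, other_deg))
```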
K-core, another network index, takes into account not only the number of direct neighbors but also the position of a protein in the network. It assumes that centrally located proteins are more important than peripheral ones [21]. As shown in Figure 3B, the 109 targets had an average K-core of 7.59, significantly higher than 7.37, the average for the other proteins in the network (p = 8.4E-4, KS test). The 17 lifespan-regulating targets had the highest average K-core, 10.
To determine whether DAF-16 direct targets function through cooperation with each other, we computed the 'target neighbor ratio' for each protein: the number of its interaction partners that are DAF-16 direct targets divided by its degree [22]. As shown in Figure 3C, DAF-16 direct targets tend to interact directly with each other (p = 2.7E-4, KS test).
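A minimal sketch of the target neighbor ratio as defined above, assuming an undirected edge list and a set of target identifiers. The function name and the toy identifiers are hypothetical and serve only to illustrate the definition.

```python
from collections import defaultdict


def target_neighbor_ratio(edges, targets):
    """For each protein: (# interaction partners that are targets) / degree."""
    targets = set(targets)
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {p: len(nb & targets) / len(nb) for p, nb in neighbors.items()}


# Toy network with hypothetical identifiers (not data from this work)
edges = [("T1", "T2"), ("T1", "T3"), ("T1", "A"), ("A", "B"), ("B", "T2")]
ratios = target_neighbor_ratio(edges, targets={"T1", "T2", "T3"})

for protein in sorted(ratios):
    print(protein, round(ratios[protein], 2))
```

The ratios of target and non-target proteins can then be compared with a KS test, as was done for degree.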
Taken together, these results show that DAF-16 direct targets tend to have more interaction neighbors, to be located at the network center and to interact with each other. This suggests that DAF-16 direct targets might play global roles in lifespan regulation.

DISCUSSION
In this work, we manually curated FOXODB by reading the literature. It now covers 208 FOXOs direct targets; to our knowledge, this is the largest such collection. By orthologous mapping, we found 109 DAF-16 direct targets in C. elegans, 17 of which directly regulate lifespan. These data should also be useful to the field.
We searched for the DAF-16 binding element (DBE) in the 1 kb promoter region of the 109 DAF-16 direct targets and found that 30 of them contained the DBE (GTAAACA or TGTTTAC), whereas the others did not. It is not obvious why so many DAF-16 direct targets lack a DBE. We offer two explanations. First, different studies used different DBE motifs [2,3,4], and it is hard to know which one is correct. We chose a strict DBE motif, which resulted in few sequence matches; a looser motif would find more genes with a DBE. For example, with the DBE RTAAAYA (R = A/G, Y = C/T) used in previous work [3], 91 of the 109 targets would contain the DBE in the 1 kb promoter region. Second, we searched only the 1 kb promoter region; some DBEs may lie outside this region and thus go undetected.
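The effect of motif stringency can be illustrated with a short sketch. It assumes that the promoter is available as a plain DNA string and that both strands are scanned, which is how we read "GTAAACA or TGTTTAC". The helper names and the example sequences are hypothetical and are not taken from the analysis pipeline.

```python
import re

# IUPAC codes needed for the two DBE variants discussed in the text
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGTRY", "TGCAYR")


def reverse_complement(motif: str) -> str:
    """Reverse complement of a motif, including the codes R and Y."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif: str):
    """Compile a regex that matches the motif on either strand."""
    variants = sorted({motif, reverse_complement(motif)})
    return re.compile("|".join("".join(IUPAC[c] for c in v) for v in variants))


def has_motif(promoter: str, motif: str) -> bool:
    """True if the promoter contains the motif on either strand."""
    return motif_regex(motif).search(promoter.upper()) is not None


promoter = "ccgATAAATAgg"  # made-up sequence, not a real promoter
print(has_motif(promoter, "GTAAACA"))  # strict DBE: False
print(has_motif(promoter, "RTAAAYA"))  # loose DBE:  True
```

A sequence such as ATAAATA satisfies the loose motif but not the strict one, which is the kind of site that accounts for the difference between 30 and 91 matching targets.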
We performed the first network analysis of DAF-16 direct targets. The results showed that they tend to have higher degrees, to be located at the network center and to interact directly with each other. The protein interactions used for the network analysis include several kinds of interaction, such as physical, genetic and predicted interactions. However, it is worth noting that some of the interactions may have been collected from the literature, so the more intensively a gene has been studied, the more likely it is to have a high degree. Although the literature-derived interactions may be only a small part of the whole data set, we cannot exclude the possibility that this bias affects the results.

Data source
The gene sequences were downloaded from WormBase, version 220. The protein interaction network was obtained from our previous work [20]. It was constructed by integrating different kinds of interactions, including physical, genetic and predicted interactions, and covers a total of 7,219 proteins and 41,132 edges [20].

Workflow
As shown in Figure 1, we searched PubMed with 'FOXOs' and found more than 2700 papers. We manually read the papers and found 208 FOXOs direct targets. Inparanoid is a database specially designed for orthology analysis [12]. We used it to map the 208 targets to their orthologs in C. elegans and obtained 109 genes. Bioinformatics analysis of this list included comparison with previous results, transcription factor binding site enrichment, lifespan regulation and network topological feature analysis. We built a database to make all data easily accessible.

Database creation
To collect FOXOs direct targets, we searched

Hypergeometric model
The hypergeometric model was used to calculate the significance of the overlap between two gene sets. The P-value, the probability of observing at least k overlapping genes by chance, is calculated as follows:
$$P = \sum_{i=k}^{\min(m,n)} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$
where N is the number of genes in the C. elegans genome (approximated as 20,000 in this work), m is the number of genes in gene set 1, n is the number of genes in gene set 2, and k is the number of overlapping genes between the m genes and the n genes.
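The test can be sketched with exact integer arithmetic, using the symbols N, m, n and k defined above. The sketch takes the P-value to be the upper-tail probability of at least k shared genes; this reading reproduces the value reported for the GenDR comparison, but the exact implementation used in this work is not given in the text, and the function name is ours.

```python
from math import comb


def overlap_p_value(N: int, m: int, n: int, k: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which m are 'marked'."""
    total = comb(N, n)
    tail = sum(
        comb(m, i) * comb(N - m, n - i)
        for i in range(k, min(m, n) + 1)
    )
    return tail / total  # exact big-integer ratio, converted to float


# 109 DAF-16 direct targets versus 48 GenDR genes, 4 genes shared
p = overlap_p_value(N=20_000, m=109, n=48, k=4)
print(f"{p:.3g}")
```

For these inputs the result is about 1.35E-4, in line with the reported P-value. The formula is symmetric in m and n, so the order in which the two gene sets are passed does not matter.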

K-core
A K-core of a graph can be obtained by recursively removing all nodes with a degree less than K, until all nodes in the remaining graph have a degree at least K.
Functions are the definitions of the topological features. Descriptions give explanations for symbols in the definitions.
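The recursive pruning that defines a K-core can be sketched as follows. This is a simple, unoptimized illustration on an undirected edge list with made-up node names. The second function assigns each node the largest K for which it still belongs to the K-core; we assume that this is the per-protein K-core value averaged in the Results, which the text does not state explicitly.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the set of nodes in the K-core of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    alive = set(neighbors)
    while True:
        # nodes whose degree among the surviving nodes is below k
        weak = {v for v in alive if len(neighbors[v] & alive) < k}
        if not weak:
            return alive
        alive -= weak


def core_number(edges):
    """Map each node to the largest K for which it is in the K-core."""
    core, k = {}, 1
    remaining = k_core(edges, k)
    while remaining:
        for v in remaining:
            core[v] = k
        k += 1
        remaining = k_core(edges, k)
    return core


# Toy graph: a triangle A-B-C with a pendant node D attached to A
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
print(sorted(k_core(edges, 2)))  # ['A', 'B', 'C']
print(core_number(edges))
```

In the toy graph, D has only one neighbor and is removed when K = 2, while the triangle survives; no node survives at K = 3.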