PRMDA: personalized recommendation-based MiRNA-disease association prediction

Recently, researchers have been increasingly focusing on microRNAs (miRNAs), with accumulating evidence indicating that miRNAs play a vital role in various biological processes and that dysfunctions of miRNAs are closely related to complex human diseases. Predicting potential associations between miRNAs and diseases is of considerable significance in biology, medicine and bioinformatics. In this study, we developed a computational model of Personalized Recommendation-based MiRNA-Disease Association prediction (PRMDA) to predict potentially related miRNAs for all diseases by implementing a personalized recommendation-based algorithm on integrated similarity for diseases and miRNAs. PRMDA is a global method capable of prioritizing candidate miRNAs for all diseases simultaneously. Moreover, the model can be applied to diseases without any known associated miRNAs. PRMDA obtained an AUC of 0.8315 in leave-one-out cross validation, which demonstrated that PRMDA could be regarded as a reliable tool for miRNA-disease association prediction. Besides, we implemented PRMDA on the HMDD V1.0 and HMDD V2.0 databases for three kinds of case studies of five important human cancers in order to test the performance of the model from different perspectives. As a result, 92%, 94%, 88%, 96% and 88% of the top 50 candidate miRNAs predicted by PRMDA for Colon Neoplasms, Esophageal Neoplasms, Lymphoma, Lung Neoplasms and Breast Neoplasms, respectively, were confirmed by experimental reports.


INTRODUCTION
Discovered first in Caenorhabditis elegans, microRNAs (miRNAs) are an abundant class of short (21-24 nucleotides), endogenous, single-stranded non-coding RNAs (ncRNAs) [1,2]. Owing to their diverse sequences and expression patterns, miRNAs play important roles in regulating genes in both animals and plants by targeting messenger RNAs (mRNAs) for cleavage or translational repression [3,4]. The first two miRNAs to be detected, lin-4 and let-7, which were found to control developmental timing in Caenorhabditis elegans, were considered unique when first described. However, subsequent findings suggested that miRNA genes are probably one of the most phylogenetically numerous and diverse classes of ncRNA genes [5][6][7]. Since the discovery of the first two miRNAs, thousands of miRNAs have been identified in eukaryotes ranging from fungi to mammals on the basis of extensive experimental work and computational models. With increasing genetic and bioinformatics analysis, researchers have found that miRNAs negatively regulate the expression of their target genes through two different mechanisms: miRNAs that bind perfectly or near-perfectly to their target sequences within the 3′ untranslated regions (UTRs) of target mRNAs induce cleavage of the mRNA, whereas miRNAs with imperfect target matching repress gene expression at the translational level [8]. Furthermore, accumulating experimental evidence indicates that miRNAs have a significant impact on various crucial cellular processes, including proliferation, differentiation, development, apoptosis, signal transduction, metabolism and viral infection [4,[9][10][11][12][13][14][15]. Besides, many studies have shown that miRNA mutations or mis-expression are closely related to various human cancers, indicating that miRNAs can act as tumor suppressors and oncogenes [16]. For instance, Zhang et al. confirmed that downregulation of miR-181d, probably through inverse regulation of a miR-181d target gene (Na+/K+ transporting ATPase interacting 2), could suppress the development of pancreatic cancer cell lines [17]. MiRNA dysregulation is also common in hepatocellular carcinoma; for example, Au et al. found that enhancer of zeste homolog 2 (EZH2) exerts its prometastatic function through epigenetic silencing of multiple tumor-suppressor miRNAs, including miR-101, miR-139-5p, let-7c, miR-125b and miR-200b [18]. For lung cancer, one of the leading causes of cancer death in the United States, it has been verified that enforced expression of miR-29 in lung cancer cell lines restored normal patterns of DNA methylation, induced re-expression of methylation-silenced tumor suppressor genes and inhibited tumorigenicity [19]. Accumulating experimental evidence has shown that miRNA expression is related to tumor formation, progression and development as well as to response to treatment, suggesting that miRNAs have potential practical applications as biomarkers for diagnosis, prognosis and prediction [20]. Lu et al. conducted a comprehensive analysis of the human miRNA-disease association database and uncovered statistically significant patterns of miRNA-disease associations [21]. Given the considerable number of miRNA-related biological databases, developing innovative and efficient computational models to identify possible miRNA-disease associations is urgently required. In recent years, more and more new miRNAs and diseases have been discovered with the development of technology, while a substantial number of associations between miRNAs and diseases remain to be identified. Prioritizing related diseases for newly discovered miRNAs and related miRNAs for newly discovered diseases could effectively promote disease biomarker detection for the prevention, diagnosis and treatment of human diseases [22], and it is considered a critical capability for any method of identifying miRNA-disease associations.
Plenty of computational models for potential miRNA-disease association prediction have been proposed, based on the hypothesis that functionally similar miRNAs are likely to be related to phenotypically similar diseases [23][24][25]. For instance, Jiang et al. [26] developed a network-based computational approach to predict miRNA-disease associations by integrating a miRNA functional network, a human phenome-miRNAome network and the known miRNA-disease association network. However, the method only used local neighbor information, which greatly limited its performance. Moreover, based on the assumption that target genes are abnormally regulated when miRNAs are involved in a specific tumor phenotype, Xu et al. [27] devised a model based on the miRNA target-dysregulated network (MTDN) to infer new disease-related miRNAs. MTDN differs from other network-based models in that it identifies dysregulated network edges (regulations) rather than dysregulated nodes (miRNAs) to assemble disease-related signatures. However, MTDN only focused on prostate cancer, and differences in topological features may lead to improper results when MTDN is applied to other diseases. Besides, Chen et al. [28] developed a method, HGIMDA, which integrates various known heterogeneous data, including disease semantic similarity, miRNA functional similarity, Gaussian interaction profile kernel similarity and experimentally validated miRNA-disease associations, into a heterogeneous network to identify potential miRNA-disease associations. HGIMDA uses an iterative procedure to find the optimal solutions on the integrated global network, inferring the possible relationship between a disease and a miRNA by calculating all paths that satisfy specific conditions. However, how to select the decay factor in the model remains unresolved. Furthermore, Chen et al. [29] proposed a model of Within and Between Score for MiRNA-Disease Association prediction (WBSMDA). WBSMDA calculates a Within-Score, based on the highest similarity score between a candidate miRNA and the miRNAs known to be related to the investigated disease, and a Between-Score, based on the highest similarity score between a candidate miRNA and the miRNAs without known relationships to the investigated disease, and uses both to predict potential miRNA-disease associations. Nevertheless, the results of WBSMDA show that its performance is still not satisfactory. By considering miRNA cluster and family information, Xuan et al. [30] devised an approach named Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors (HDMP). In HDMP, miRNAs in the same cluster or family were assigned higher weights when constructing the miRNA functional similarity matrix, which is then used to calculate relevance scores with the investigated disease, because such miRNAs are more likely to be related to similar diseases. In addition, random walk and its variants have been broadly applied in bioinformatics, for example in disease gene prediction [31], disease-related long non-coding RNA prediction [32][33][34], drug-target interaction prediction [35,36], disease-related miRNA-environmental factor interaction prediction [37] and disease-related microbiota prediction. Consequently, several miRNA-disease association prediction models have used random walk to prioritize related miRNAs for diseases. Chen et al. [38] presented Random Walk with Restart for MiRNA-Disease Association (RWRMDA), which applies random walk with restart on the miRNA functional similarity network. Compared with previous models, RWRMDA made full use of the global similarity network rather than local information, which has been shown to improve performance. However, the main defect of the method is that it cannot make predictions for diseases without any known associated miRNAs. Furthermore, Shi et al. [39] proposed another random walk-based framework focusing on the functional connections between disease genes and miRNA targets in protein-protein interaction (PPI) networks at the systems level. Random walk analysis was used as a distance measure to identify these functional links, which were then used to construct a bipartite miRNA-disease network. Recently, Xuan et al. [40] also constructed a random walk-based miRNA-disease association prediction model. This model exploited the prior information of nodes and the local topological structures of different classes of nodes by constructing a miRNA network from the diseases associated with each pair of miRNAs and assigning different weights to different nodes. Moreover, by incorporating protein information, Mork et al. [41] proposed scoring schemes that ranked miRNA-disease associations by combining protein-disease association scores and miRNA-protein association scores.
www.impactjournals.com/oncotarget
Apart from the aforementioned models, computational approaches using machine learning methods are becoming increasingly prevalent in bioinformatics [42][43][44]. Chen et al. [45] proposed a model of Regularized Least Squares for MiRNA-Disease Association (RLSMDA) to identify potential associations between miRNAs and diseases; it is a global, semi-supervised method that does not need negative samples. RLSMDA is capable of predicting potentially associated miRNAs for diseases without known related miRNAs. However, its performance is not satisfactory enough. Furthermore, a computational model of Restricted Boltzmann Machine for Multiple types of MiRNA-Disease Association prediction (RBMMMDA) was devised to discern different types of miRNA-disease associations [46]. The RBM in this model is a two-layer undirected graphical model consisting of a layer of visible units (diseases) and a layer of hidden units (unknown features describing miRNA-disease associations), and it predicts both miRNA-disease associations and their corresponding types. Nevertheless, parameter selection remains unresolved in RBMMMDA. In conclusion, previous models have the following limitations. First, some methods need negative samples, which are difficult to obtain for the miRNA-disease association network. Second, the information provided by the known miRNA, disease and miRNA-disease networks has not been fully exploited. Third, some models rely heavily on parameter selection, which remains unsolved. Therefore, a reliable and effective approach for predicting potential miRNA-disease associations is urgently needed. To clearly illustrate the input, output and limitations of each computational model mentioned above, we provide a comparison table, Supplementary Table 1.
In this study, we proposed a novel computational method of Personalized Recommendation-based MiRNA-Disease Association prediction (PRMDA). Recommendation algorithms, as general-purpose computational algorithms, have been applied in many fields, including bioinformatics [47]. We chose a personalized recommendation algorithm because, among recommendation algorithms, it handles data sparsity and scalability particularly well. In e-commerce systems, personalized recommendation algorithms can effectively address data sparsity and cold-start problems without much user participation, which corresponds to the sparsely distributed miRNA-disease data in our study and to the task of prioritizing potentially associated miRNAs for new diseases without known related miRNAs. In our study, potential miRNA-disease associations are recommended with high priority by taking into account the related miRNAs and diseases of each miRNA-disease pair individually, as the name "personalized" suggests, so that the similarity network is exploited extensively. By integrating the known miRNA-disease association network, the miRNA-miRNA functional similarity network and the disease-disease semantic similarity network, PRMDA is a global method that can prioritize miRNAs for all diseases simultaneously. Besides, PRMDA can prioritize candidate miRNAs for diseases without any known related miRNAs. More importantly, we implemented the personalized recommendation-based algorithm on integrated similarity networks for miRNAs and diseases, built from miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity, to substantially reduce data sparsity. As a result, PRMDA achieved a higher AUC in leave-one-out cross validation than the previous prediction models it was compared with. Besides, in the case studies of several important human cancers, more than 80% of the top 50 predicted miRNAs for Colon Neoplasms (CN), Esophageal Neoplasms (EN), Lymphoma, Lung Neoplasms (LN) and Breast Neoplasms (BN) have been experimentally validated. We conclude that PRMDA is an efficient and reliable miRNA-disease association prediction model.

Performance evaluation
In order to evaluate the prediction accuracy of PRMDA, we implemented leave-one-out cross validation (LOOCV) on the verified miRNA-disease associations recorded in the HMDD V2.0 database [48] and compared its performance with five advanced computational approaches for miRNA-disease association prediction: HGIMDA [28], RLSMDA [45], HDMP [30], WBSMDA [29] and RWRMDA [38]. In LOOCV, each known disease-related miRNA was in turn treated as the test sample, and the training set was composed of all the other 5429 known miRNA-disease associations. In each turn, the test sample was ranked against all candidate miRNAs having no verified association with the investigated disease, and it was considered a successful prediction if its rank was higher than the given threshold. For a disease with only one known related miRNA, which has no related miRNAs left in LOOCV, PRMDA uses the constructed global network to infer potentially associated miRNAs from the verified related miRNAs of other similar diseases. To illustrate the performance of PRMDA, the receiver operating characteristic (ROC) curve was drawn by plotting the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1-specificity) at different thresholds. Sensitivity is the proportion of positives that are correctly identified, which here denotes the proportion of test miRNA-disease associations ranked above the given threshold. In contrast, specificity is the proportion of negatives that are correctly identified, which here denotes the proportion of negative miRNA-disease pairs ranked below the given threshold. We calculated the area under the ROC curve (AUC) to evaluate the predictive ability of PRMDA. AUC = 1 indicates perfect prediction, and AUC = 0.5 indicates random prediction.
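The LOOCV and ROC procedure described above can be sketched in Python as follows. This is a minimal illustration, not the PRMDA implementation: it assumes a disease × miRNA 0/1 adjacency matrix `A` and a hypothetical `score_fn` that retrains a model on the masked matrix and returns a score matrix of the same shape. For simplicity, the sketch pools the held-out score and the candidate scores from every turn and thresholds on scores, whereas the text thresholds on ranks, so the resulting curves may differ slightly.

```python
import numpy as np


def roc_auc(pos_scores, neg_scores):
    """ROC points and AUC from scores of held-out positives and candidate negatives.

    Each distinct score is used as a threshold; a pair is predicted positive
    when its score is >= the threshold.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]  # highest first
    tpr, fpr = [0.0], [0.0]
    for t in thresholds:
        tpr.append(float(np.mean(pos >= t)))  # sensitivity
        fpr.append(float(np.mean(neg >= t)))  # 1 - specificity
    tpr, fpr = np.array(tpr), np.array(fpr)
    # Trapezoidal rule over the piecewise-linear ROC curve.
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


def loocv_scores(A, score_fn):
    """Leave out each known association in turn and collect scores.

    A: diseases x miRNAs 0/1 matrix of known associations.
    score_fn: hypothetical model mapping a training matrix to a score matrix.
    """
    pos, neg = [], []
    for i, j in zip(*np.nonzero(A)):
        A_train = A.copy()
        A_train[i, j] = 0                         # hide one known association
        S = score_fn(A_train)
        pos.append(S[i, j])                       # score of the held-out pair
        candidates = np.flatnonzero(A[i] == 0)    # miRNAs with no verified link to d(i)
        neg.extend(S[i, candidates])
    return pos, neg


# Toy data with a simple neighbourhood score standing in for a real model.
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])
pos, neg = loocv_scores(A, lambda M: (M @ M.T) @ M)
print(roc_auc(pos, neg)[2])
```

The trapezoidal area counts tied positive and negative scores as half-correct, which matches the usual rank-sum definition of AUC.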
The performance comparison in terms of LOOCV results is shown in Figure 1. In LOOCV, PRMDA, HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA achieved AUCs of 0.8315, 0.8077, 0.6953, 0.7702, 0.8031 and 0.7891, respectively. PRMDA therefore outperformed all of the previous prediction models compared here, with an AUC 0.0238 higher than that of the second-best method, HGIMDA. These results indicate that PRMDA provides accurate and credible predictions and has practical value for uncovering unknown miRNA-disease associations.

Case studies
To further demonstrate the reliability and precision of PRMDA, we carried out case studies of several important human cancers. Prediction results were verified against experimentally supported miRNA-disease associations recorded in two other databases, miR2Disease [49] and dbDEMC [50]. We implemented three kinds of case studies in all. First, for the case studies of CN, EN and Lymphoma, we ran PRMDA on all miRNA-disease associations recorded in the HMDD V2.0 database. In the second type of case study, for LN, we removed all known LN-related miRNAs and then implemented PRMDA to infer potentially related miRNAs for LN, which tests whether PRMDA also works for diseases without any known related miRNAs. For the case study of BN, we applied PRMDA to identify potential miRNA-disease associations based on the HMDD V1.0 database and matched the results against the data in miR2Disease, dbDEMC and HMDD V2.0.
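The verification counts reported in these case studies amount to ranking the candidate miRNAs for one disease and counting how many of the top k are supported by the validation sources; for example, 46 confirmed miRNAs among the top 50 would correspond to 92%. The Python sketch below illustrates only this bookkeeping under stated assumptions: `scores` is a hypothetical mapping from miRNA to predicted score for one disease, `known` holds the associations used for training (left empty to mimic the LN setting, in which all known associations are removed), and `evidence` stands for the miRNAs supported by the validation databases. The miRNA names are placeholders, not real predictions.

```python
def confirmed_in_top_k(scores, known, evidence, k):
    """Rank unknown candidates for one disease and count confirmed hits in the top k.

    scores:   dict mapping miRNA name -> predicted association score
    known:    miRNAs already associated with the disease in the training data
    evidence: miRNAs reported as associated by the validation sources
    """
    candidates = [m for m in scores if m not in known]  # only unverified pairs are ranked
    ranked = sorted(candidates, key=lambda m: scores[m], reverse=True)
    top = ranked[:k]
    hits = sum(1 for m in top if m in evidence)
    return top, hits


# Placeholder names and scores, for illustration only.
scores = {"mir-a": 0.91, "mir-b": 0.84, "mir-c": 0.77, "mir-d": 0.12}
top, hits = confirmed_in_top_k(scores, known={"mir-a"}, evidence={"mir-b", "mir-d"}, k=2)
print(top, hits, f"{100 * hits / len(top):.0f}%")
```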
CN, the most common type of gastrointestinal cancer, poses a great threat to human life [51,52]. Because metastatic disease is difficult to prevent with appropriate therapies, statistics indicate that half of the patients with CN die of metastatic disease within 5 years of diagnosis [53]. In the past few years, researchers have identified several CN-related miRNAs. For instance, Guo et al. found a ubiquitous loss of miR-126 in CN cell lines compared with normal human colon epithelia, and the experimental evidence showed that down-regulation of miR-126 impairs its function as a growth suppressor in CN cells, which indirectly promotes CN development [54]. The miRNA hsa-mir-145 down-regulates insulin receptor substrate-1 (IRS-1), a docking protein for receptors, and inhibits the growth of CN cells [55]. In our case study of CN, PRMDA was implemented to select the highest-ranked candidate miRNAs for CN (See Table 1). All of the top ten candidate miRNAs have been confirmed to be related to CN.
Besides, 92% of the top 50 prioritized miRNAs were confirmed to be associated with CN. Taking hsa-mir-21 (ranked 1st in the prediction list) as an example, numerous experiments have validated significantly higher expression of hsa-mir-21 in CN tumor tissue than in adjacent normal tissue [56,57]. In addition, studies confirmed that high expression of hsa-mir-155, ranked 2nd in the prediction list, was closely correlated with lymph node metastases and promoted CN tumor growth [58]. The miRNA hsa-let-7a, ranked 3rd in the list, was found to be down-regulated in CN patients in a clinical study [59].
Regarded as one of the most common cancers worldwide, EN is typically diagnosed at a relatively advanced stage or at a stage involving lymph nodes. Moreover, the overall five-year survival rate of EN remains low, so novel miRNA-disease association predictions are needed to support early detection. According to previous studies, miRNA deregulation is frequently detected in EN, suggesting that miRNAs are of great significance to its tumorigenesis [60]. For instance, studies found that the Notch-1-specific miRNAs miR-21 and miR-34a are down-regulated during treatment with curcumin, a powerful inhibitor of EN growth [61]. In addition, up-regulated expression of the tumor suppressor let-7a is an important determinant of the response to chemotherapy through regulation of the IL-6/STAT3 pathway in esophageal squamous cell carcinoma [62]. In our case study of EN (See Table 2), 8 of the top 10 miRNAs predicted by PRMDA have been validated as EN-related by the dbDEMC and miR2Disease databases, and 94% of the top 50 candidate miRNAs were verified.
Lymphomas are generally classified into two types: Hodgkin lymphomas (HL) and non-Hodgkin lymphomas (NHL). HL derives from pre-apoptotic germinal center B cells and is characterized by a nearly universal loss of the B-cell phenotype [63]. NHL, which is far more common than HL, is treated mostly with local radiotherapy and chemotherapy [64]. Recent studies found that PRDM1/Blimp-1, a major regulator of terminal B-cell differentiation, is also a target for down-regulation mediated by miR-9 and let-7a in HL cell lines; these miRNAs target specific sites in the 3′ untranslated region of PRDM1/Blimp-1 mRNA and suppressed luciferase reporter activity by repressing translation [65]. Besides, researchers found that a distinct set of five miRNAs (miR-150, miR-550, miR-518b, miR-124a and miR-539) was differentially expressed between gastritis and MALT lymphoma [66]. We implemented PRMDA on the HMDD V2.0 database to prioritize related miRNAs for Lymphoma (See Table 3). As a result, 9 of the top 10 and 43 of the top 50 candidate miRNAs for Lymphoma have been verified by studies recorded in the dbDEMC and miR2Disease databases. We further compared PRMDA with three other recent miRNA-disease association prediction models, MCMDA [67], HGIMDA [28] and WBSMDA [29], in the case studies of CN, EN and Lymphoma; the comparison is presented in Table 4. Besides, we conducted related-miRNA prediction and verification for more diseases, and the numbers of verified miRNA-disease associations are presented in Table 5.
LN is responsible for considerable mortality worldwide. According to statistics, former smokers are far more likely to develop LN than never smokers [68]. Currently, most LN cases are not diagnosed at an early stage, and the five-year survival rate after diagnosis is less than 15%. Recent studies suggested that miRNA-based methods for LN detection could be more effective than the main current detection method, screening by computed tomography [69]. Many LN-related miRNAs have been identified in the past few years. For instance, Takamizawa et al. issued the first report of reduced expression of let-7 in human LN and suggested the potential clinical and biological impact of miRNA dysfunction [70]. Besides, miR-511 and miR-1297 act as tumor suppressors in LN and suppress adenocarcinoma cell proliferation in vitro and in vivo by indirectly increasing CCAAT/enhancer-binding protein alpha expression [71]. Here, we implemented the second type of case study on LN by removing all known miRNA-LN associations (See Table 6). All of the top 10 and 48 of the top 50 prioritized miRNAs have been experimentally verified to be related to LN. This demonstrates the prediction capability of PRMDA for diseases without known related miRNAs.
BN is the leading cause of cancer death in women aged 40 and younger in developed countries [72]. Worse still, the survival rates of young women with BN remain lower than those of older women [73]. BN is also an increasingly urgent problem in low- and middle-income countries around the world [74]. It has been acknowledged that gene-expression profiling has greatly influenced our understanding of BN biology. In the past two decades, f innate molecular subclasses of [76]. Here, to evaluate the performance from another perspective, we implemented PRMDA on the HMDD V1.0 database to predict candidate miRNAs for BN (See Table 7). All of the top 10 candidate miRNAs were confirmed to be BN-associated, and 44 of the top 50 were verified to be related to BN by the HMDD V2.0, dbDEMC and miR2Disease datasets. The results of the case studies and cross validation strongly indicate that PRMDA is a reliable and effective computational model for predicting potential miRNA-disease associations from known associations. Finally, we applied PRMDA to predict related miRNAs for every disease based on the known miRNA-disease associations recorded in HMDD V2.0 and published the prediction lists of miRNA-disease pairs (See Supplementary Table 2). Highly ranked predicted pairs could be given priority in future research. We also provide the scores of all miRNA-disease pairs, ranked for each disease, in Supplementary File 1.

DISCUSSION
The strong performance of PRMDA can be attributed to the following factors. First, PRMDA applied a personalized recommendation-based algorithm to integrated similarity for both miRNAs and diseases. As one of the most widely used types of recommendation algorithm, personalized recommendation can substantially alleviate data sparsity, especially for diseases and miRNAs with few known related miRNAs and diseases, because the associated miRNAs and diseases are taken into account for each miRNA-disease pair individually, as the name "personalized" implies, so that the similarity network is exploited extensively. Second, PRMDA is a global method that can prioritize miRNAs for all diseases simultaneously. Compared with previous models, the significantly larger number of verified miRNA-disease associations used in our model further supports the credibility of the prediction results. Last, the success of PRMDA benefits from the integration of the miRNA-disease association network, disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity, which improves precision and reduces the bias caused by incomplete databases. Furthermore, Sun et al. [77] proposed a model of MiRNA-Disease Association prediction based on Network Topological Similarity (NTSMDA). NTSMDA constructed two new adjacency matrices from the miRNA and disease network topological similarity matrices. Nevertheless, NTSMDA cannot prioritize miRNAs for diseases without known related miRNAs because of its strict dependence on network topology. NTSMDA is also a recommendation-based method, but it differs from PRMDA in several aspects. First, the way of building the new rating matrices for miRNAs and diseases is entirely different. Second, NTSMDA requires parameter selection when incorporating the two new integrated adjacency matrices. Last, compared with NTSMDA, PRMDA integrates more similarity information in the final ranking step. As more and more new diseases without related miRNAs are discovered, PRMDA can be applied to such diseases, and it can also prioritize diseases for newly identified miRNAs without related diseases. However, some limitations remain that could be addressed in the future. First, the miRNA-disease association network is still incomplete and needs to be enriched with more experimental validations. Second, the performance of PRMDA may be improved by integrating more datasets that provide other information about miRNAs, diseases and their associations. Finally, the personalized recommendation-based algorithm used by PRMDA may introduce some bias in favor of diseases with more known related miRNAs, since it relies on the hypothesis that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases.
Table 4 note: The second, third, fourth and fifth columns record the number of verified miRNA-disease associations among the top 50 associations prioritized by PRMDA, MCMDA, HGIMDA and WBSMDA, respectively.
Table 5 note: The second, third and fourth columns record the number of verified miRNA-disease associations, predicted from the known associations in the HMDD V2.0 database, among the top 10, top 20 and top 50, respectively.

MiRNA functional similarity
MiRNA functional similarity was calculated by MISIM [25], which consists of four steps: identifying miRNA-related diseases, calculating the semantic values of diseases, calculating the semantic similarity of disease pairs and determining miRNA functional similarity from the semantic similarity of the related diseases. We downloaded the miRNA functional similarity scores from http://www.cuilab.cn/files/images/cuilab/misim.zip in January 2010. We then built the miRNA functional similarity matrix MS, in which the entry MS(m(i), m(j)) indicates the functional similarity between miRNAs m(i) and m(j).

Disease semantic similarity model 1
We downloaded MeSH descriptors from the National Library of Medicine (http://www.nlm.nih.gov/) to construct the disease semantic similarity model. Disease-disease relationships were represented as a directed acyclic graph (DAG). DAG(D) = (D, T(D), E(D)) represents disease D, where T(D) is the node set containing D and all of its ancestor nodes, and E(D) is the corresponding set of direct edges from parent nodes to child nodes. The contribution of a disease d in DAG(D) to the semantic value of D, and the semantic value of D itself, are calculated as follows:
$$D1_{D}(d) = \begin{cases} 1 & \text{if } d = D \\ \max\{\Delta \cdot D1_{D}(d') \mid d' \in \text{children of } d\} & \text{if } d \neq D \end{cases}$$
$$DV1(D) = \sum_{d \in T(D)} D1_{D}(d)$$
where ∆ is the semantic contribution factor. For a specific disease D, the contribution of D to its own semantic value is defined as 1. The contribution of any other disease decreases as its distance from D increases, so disease nodes at the same level contribute equally to the semantic value of D.
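Based on the assumption that two diseases sharing a larger part of their DAGs have greater semantic similarity, we used DS1 to represent the disease semantic similarity calculated by model 1, where the semantic similarity between diseases d(i) and d(j) is defined as:
$$DS1(d(i),d(j)) = \frac{\sum_{t \in T(d(i)) \cap T(d(j))} \left( D1_{d(i)}(t) + D1_{d(j)}(t) \right)}{DV1(d(i)) + DV1(d(j))}$$
where $D1_{d(i)}(t)$ is the contribution of disease t to the semantic value of d(i) and $DV1(d(i))$ is the semantic value of d(i). The disease semantic similarity matrix calculated by model 1 is provided in Supplementary Table 3.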
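A minimal Python sketch of model 1 follows, assuming each MeSH node is stored with the list of its parents. The value ∆ = 0.5 is an assumption (it is not stated in the text), and the node names are hypothetical. Contributions are propagated upward from D, and each ancestor keeps the maximum of ∆ times the contribution of its children within DAG(D), as defined above.

```python
def contributions_model1(dag_parents, disease, delta=0.5):
    """Semantic contributions D1_D(t) of every node t in T(D) to disease D.

    dag_parents maps each MeSH node to the list of its parent nodes.
    delta is the semantic contribution factor (0.5 is an assumed value).
    """
    contrib = {disease: 1.0}          # D contributes 1 to itself
    frontier = [disease]
    while frontier:
        next_frontier = []
        for child in frontier:
            for parent in dag_parents.get(child, ()):
                value = delta * contrib[child]
                # keep the maximum over all children of the parent within DAG(D)
                if value > contrib.get(parent, 0.0):
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib


def semantic_similarity(c_i, c_j):
    """DS1: shared contributions divided by the sum of both semantic values DV1."""
    shared = c_i.keys() & c_j.keys()
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))


# Toy DAG with hypothetical node names: A is the root, B is a child of A,
# and C and D are both children of B.
parents = {"B": ["A"], "C": ["B"], "D": ["B"]}
c_C = contributions_model1(parents, "C")
c_D = contributions_model1(parents, "D")
print(c_C, semantic_similarity(c_C, c_D))
```

Because ∆ < 1, contributions only increase when a shorter path to an ancestor is found, so the propagation terminates.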

Disease semantic similarity model 2
Here, we also calculated disease semantic similarity in another way. Diseases at the same level of DAG(D) may appear in different numbers of other disease DAGs, so disease semantic similarity model 1, which assigns the same contribution to all diseases at the same level, may introduce bias. A disease that appears in fewer disease DAGs is more specific and should therefore be given a higher contribution.
Here, we defined the contribution of disease d to the semantic value of disease D in DAG(D) as follows.
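where $n_{dt}$ is the number of DAGs containing disease d. Similarly, the semantic similarity between diseases d(i) and d(j) was calculated as follows:
$$DS2(d(i),d(j)) = \frac{\sum_{t \in T(d(i)) \cap T(d(j))} \left( D2_{d(i)}(t) + D2_{d(j)}(t) \right)}{DV2(d(i)) + DV2(d(j))}$$
where $D2_{d(i)}(t)$ is the model-2 contribution of disease t to d(i) and $DV2(d(i)) = \sum_{t \in T(d(i))} D2_{d(i)}(t)$ is the semantic value of d(i). The disease semantic similarity matrix calculated by model 2 is provided in Supplementary Table 4.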
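The contribution formula for model 2 does not appear in this copy. The Python sketch below therefore assumes the form used in related work, $D2_{D}(t) = -\log(n_{dt} / N)$, where $N$ (an assumed symbol) is the number of diseases whose DAGs are counted. Under this form, a node that appears in fewer DAGs receives a larger contribution, as the rationale above requires. The DAG and disease list are hypothetical.

```python
import math


def ancestors_and_self(dag_parents, disease):
    """Node set T(D): the disease itself plus all of its ancestors."""
    nodes, stack = {disease}, [disease]
    while stack:
        node = stack.pop()
        for parent in dag_parents.get(node, ()):
            if parent not in nodes:
                nodes.add(parent)
                stack.append(parent)
    return nodes


def contributions_model2(dag_parents, diseases):
    """Assumed model-2 contributions D2_D(t) = -log(n_dt / N) for every disease D.

    n_dt counts the DAGs (one per disease in `diseases`) that contain node t,
    and N is the number of diseases; rarer nodes contribute more.
    """
    T = {D: ancestors_and_self(dag_parents, D) for D in diseases}
    n_dt = {}
    for nodes in T.values():
        for t in nodes:
            n_dt[t] = n_dt.get(t, 0) + 1
    N = len(diseases)
    return {D: {t: -math.log(n_dt[t] / N) for t in T[D]} for D in diseases}


def semantic_similarity2(c_i, c_j):
    """DS2 with the same shared-contribution ratio as model 1 (0 if undefined)."""
    denominator = sum(c_i.values()) + sum(c_j.values())
    if denominator == 0:
        return 0.0
    shared = c_i.keys() & c_j.keys()
    return sum(c_i[t] + c_j[t] for t in shared) / denominator


# Toy DAG with hypothetical node names.
parents = {"B": ["A"], "C": ["B"], "D": ["B"], "E": ["A"]}
c2 = contributions_model2(parents, ["C", "D", "E"])
print(semantic_similarity2(c2["C"], c2["D"]), semantic_similarity2(c2["C"], c2["E"]))
```

In this toy example, the root A appears in every DAG and so contributes nothing, which is why C and E (which share only A) receive a similarity of 0 under the assumed formula.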

Gaussian interaction profile kernel similarity for diseases
Based on the assumption that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases, the Gaussian interaction profile kernel similarity for diseases was calculated from the topological information of the known miRNA-disease association network. First, we represented the interaction profile of disease d(i) as a binary vector IP(d(i)) indicating whether d(i) is associated with each miRNA, that is, the ith row of the association adjacency matrix A. Then, based on the interaction profiles, the Gaussian kernel similarity between diseases d(i) and d(j) was calculated as follows:
$$KD(d(i),d(j)) = \exp\left(-\gamma_d \left\lVert IP(d(i)) - IP(d(j)) \right\rVert^2\right)$$
where the parameter $\gamma_d$ controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter $\gamma'_d$ by the average number of known related miRNAs over all diseases. Thus, $\gamma_d$ was defined as follows:
$$\gamma_d = \gamma'_d \Big/ \left(\frac{1}{nd}\sum_{i=1}^{nd} \left\lVert IP(d(i)) \right\rVert^2\right)$$
where nd is the number of diseases. Finally, KD is the Gaussian interaction profile kernel similarity matrix for diseases, whose entry KD(d(i), d(j)) is the Gaussian interaction profile kernel similarity between diseases d(i) and d(j).
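The kernel above can be computed for all disease pairs at once. The following Python sketch is a minimal illustration that assumes `A` is a diseases × miRNAs 0/1 NumPy array and that $\gamma'$ defaults to 1 (an assumed value, not stated in the text). For binary profiles, $\lVert IP \rVert^2$ equals the number of known partners, so the normalization is exactly the average count described above. The same function applied to the columns of `A` gives the miRNA kernel.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    profiles:    one interaction profile per row (rows of A for diseases,
                 rows of A.T for miRNAs)
    gamma_prime: the new bandwidth parameter gamma'; 1.0 is an assumed default
    """
    P = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(P ** 2, axis=1)          # ||IP||^2 = number of known partners
    gamma = gamma_prime / np.mean(sq_norms)    # normalise by the average count
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (P @ P.T)
    sq_dist = np.maximum(sq_dist, 0.0)         # clip tiny negative round-off
    return np.exp(-gamma * sq_dist)


A = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])                        # toy diseases x miRNAs matrix
KD = gip_kernel(A)      # disease kernel
KM = gip_kernel(A.T)    # miRNA kernel: profiles are the columns of A
print(KD)
```

The function divides by the mean squared norm, so it assumes `A` contains at least one known association.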

Gaussian interaction profile kernel similarity for miRNAs
The Gaussian interaction profile kernel similarity matrix for miRNAs, KM, is calculated in the same way as for diseases:

$$KM(m(i), m(j)) = \exp\left(-\gamma_m \left\lVert IP(m(i)) - IP(m(j)) \right\rVert^2\right)$$

$$\gamma_m = \gamma'_m \Big/ \left(\frac{1}{n_m}\sum_{i=1}^{n_m} \left\lVert IP(m(i)) \right\rVert^2\right)$$

In the equations above, the interaction profile IP(m(i)) of miRNA m(i) is a binary vector that records whether m(i) is associated with each disease, i.e., the ith column of A. $\gamma_m$ is obtained by normalizing a new bandwidth parameter $\gamma'_m$ by the average number of associated diseases over all miRNAs, and $n_m$ is the number of miRNAs.
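Both Gaussian interaction profile kernels can be sketched with one pure-Python function. It is applied to the rows of A for disease profiles and to the columns of A for miRNA profiles. The bandwidth value $\gamma' = 1.0$ is an assumption for illustration, since it is not stated here. The toy matrix is hypothetical, and A must contain at least one association so that the average squared norm is non-zero.

```python
import math


def gaussian_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over binary interaction profiles.

    gamma_prime=1.0 is an assumed default; the bandwidth is gamma_prime
    divided by the average squared norm, i.e. the average number of known
    associations per profile. Assumes at least one known association.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Hypothetical adjacency matrix: rows are diseases, columns are miRNAs.
A = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
KD = gaussian_kernel(A)                               # IP(d(i)) = row i of A
KM = gaussian_kernel([list(col) for col in zip(*A)])  # IP(m(j)) = column j of A
```

Each profile here has on average 4/3 associations, so $\gamma = 0.75$. Diseases d(0) and d(1) differ in one position, which gives KD[0][1] = exp(-0.75).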

Integrated similarity for miRNAs and diseases
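The integrated miRNA similarity matrix $S_m$ and the integrated disease similarity matrix $S_d$ were constructed by combining miRNA functional similarity and disease semantic similarity, respectively, with the corresponding Gaussian interaction profile kernel similarity. With FS denoting miRNA functional similarity, the integrated similarities were defined as below:

$$S_m(m(i), m(j)) = \begin{cases} FS(m(i), m(j)) & \text{if } m(i) \text{ and } m(j) \text{ have functional similarity} \\ KM(m(i), m(j)) & \text{otherwise} \end{cases}$$

$$S_d(d(i), d(j)) = \begin{cases} SS(d(i), d(j)) & \text{if } d(i) \text{ and } d(j) \text{ have semantic similarity} \\ KD(d(i), d(j)) & \text{otherwise} \end{cases}$$

where the semantic similarity between disease d(i) and disease d(j) is the average of the similarities SS1 and SS2 given by disease semantic similarity models 1 and 2:

$$SS(d(i), d(j)) = \frac{SS1(d(i), d(j)) + SS2(d(i), d(j))}{2}$$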
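The case-wise integration rules can be sketched as follows. The sketch assumes that a pair "has" functional or semantic similarity when the corresponding value is non-zero. This is an interpretation, because the text does not define the condition explicitly. The function names and toy matrices are hypothetical.

```python
def average_semantic(ss1, ss2):
    """SS = (SS1 + SS2) / 2, element-wise."""
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(ss1, ss2)]


def integrate(primary, kernel):
    """Keep the primary similarity where it is non-zero, else use the GIP kernel.

    Treating "has similarity" as "value is non-zero" is an assumption.
    """
    return [[p if p != 0 else k for p, k in zip(prow, krow)]
            for prow, krow in zip(primary, kernel)]


# Hypothetical 3-disease example.
SS1 = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]
SS2 = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]]
KD = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.1], [0.3, 0.1, 1.0]]
S_d = integrate(average_semantic(SS1, SS2), KD)
# S_m would be built the same way from FS and KM: integrate(FS, KM).
```

Disease d(2) has no semantic similarity to the others in this example, so its row and column fall back to the kernel values.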

PRMDA
In this study, we proposed a novel computational model, Personalized Recommendation-based MiRNA-Disease Association prediction (PRMDA), to predict potential miRNA-disease associations. The core idea of PRMDA is to construct new rating matrices by applying a personalized recommendation-based algorithm to the integrated miRNA similarity matrix and the integrated disease similarity matrix. The flowchart of PRMDA is shown in Figure 2. The source code of PRMDA can be downloaded from http://www.escience.cn/system/file?fileId=89408.
First, we built a new rating matrix for the known miRNA-disease association network. For each miRNA-disease pair, this matrix measures the importance of the miRNA to the disease and of the disease to the miRNA. The new rating matrix is derived from two submatrices, one for diseases and one for miRNAs. Here we illustrate how the submatrix for diseases, denoted by $Z_d$, is built, using the entry $Z_d(i, j)$ as an example; it represents the new association score between disease d(i) and miRNA m(j). Before building the new rating matrices, we calculated two matrices: the integrated miRNA similarity matrix, denoted by $S_m$, and the integrated disease similarity matrix, denoted by $S_d$. Then, we obtained the related-miRNA set $RM_i$ for d(i). It contains all d(i)-related miRNAs m(k) satisfying A(i, k) = 1, where A is the miRNA-disease association adjacency matrix. Next, we counted the number of diseases d(t) related to d(i), i.e., those satisfying $S_d(d(t), d(i)) > 0$, and denoted this number by $n_a$. We defined the dimension variable $N_a$ as the total number of miRNAs. We then calculated $n_{tal}$, the number of miRNAs m(k) in $RM_i$ that are also related to m(j), i.e., those satisfying $S_m(m(k), m(j)) > 0$. The smaller $n_{tal}$ is, the more specific m(j) is. The personalized weight matrix for diseases, $W_d$, was then established. Its entry $W_d(i, j)$, the personalized weight of miRNA m(j) for disease d(i), was calculated as follows: ( , ) log( ) Then, the entry $Z_d(i, j)$ of the new rating submatrix for diseases was defined according to the personalized weight as below: Similarly, the new rating submatrix for miRNAs, denoted by $Z_m$, was constructed in the same way, with $N_a$ representing the number of diseases. In summary, for every miRNA-disease association pair, the new rating submatrices for miRNAs and diseases quantify the importance of the disease for the miRNA and of the miRNA for the disease, respectively. They do so by taking miRNA-related miRNAs and disease-related diseases into consideration. After constructing the submatrices for both miRNAs and diseases, we normalized $Z_m$ and $Z_d$ to obtain the final integrated rating matrix Z, defined as follows: ( , ) ( , ) ( , ) ( ) ( ) where Z is the new rating matrix for the miRNA-disease association network. It takes both the personalized weights for diseases and the personalized weights for miRNAs into consideration.
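The counting steps that feed the personalized weight can be sketched as below. The sketch stops at the quantities $RM_i$, $n_a$, $N_a$ and $n_{tal}$, because the exact formulas for $W_d$ and $Z_d$ cannot be recovered from this text. A is assumed to be a disease-by-miRNA 0/1 matrix. The self-pairs t = i and k = j are counted whenever they meet the stated conditions, since the text does not exclude them. The function name and toy data are hypothetical.

```python
def recommendation_counts(A, S_m, S_d, i, j):
    """Counts used for the personalized weight of miRNA m(j) for disease d(i).

    A is a disease-by-miRNA 0/1 matrix; S_m and S_d are integrated similarities.
    Self-pairs (t == i, k == j) are counted whenever they meet the conditions.
    """
    # RM_i: indices k of miRNAs with A(i, k) = 1.
    RM_i = [k for k, a in enumerate(A[i]) if a == 1]
    # n_a: number of diseases d(t) with S_d(d(t), d(i)) > 0.
    n_a = sum(1 for t in range(len(S_d)) if S_d[t][i] > 0)
    # N_a: total number of miRNAs (the dimension variable).
    N_a = len(A[i])
    # n_tal: miRNAs m(k) in RM_i related to m(j), i.e. S_m(m(k), m(j)) > 0.
    n_tal = sum(1 for k in RM_i if S_m[k][j] > 0)
    return RM_i, n_a, N_a, n_tal


A = [[1, 0, 1], [0, 1, 0]]                       # hypothetical: 2 diseases, 3 miRNAs
S_m = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
S_d = [[1.0, 0.2], [0.2, 1.0]]
print(recommendation_counts(A, S_m, S_d, 0, 1))  # ([0, 2], 2, 3, 1)
```

For $Z_m$, the same counts are taken with the roles of diseases and miRNAs swapped, so the dimension variable becomes the number of diseases.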
Furthermore, we obtained $K_m$ and $K_d$ by multiplying the integrated miRNA similarity and the integrated disease similarity, respectively, with the miRNA-disease association adjacency matrix. This step rests on the assumption that miRNAs with similar functions tend to be associated with diseases with similar phenotypes. We then calculated the matrix K as the sum of the normalized $K_m$ and $K_d$, as follows: ( , ) ( , ) ( , ) ( ) ( ) Finally, the prediction matrix R was defined as below: ( , ) ( , ) ( , ) ( ) ( ) where R(i, j) is the rating that PRMDA calculates for the association between disease d(i) and miRNA m(j).
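The construction of K can be sketched as follows. A is taken as a disease-by-miRNA 0/1 matrix, so the only dimensionally valid products are $K_m = A \cdot S_m$ and $K_d = S_d \cdot A$. The normalization used here, which divides each matrix by its largest entry, is a hypothetical placeholder. The normalization in the K formula cannot be recovered from this text. The function names and toy data are illustrative only.

```python
def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def max_normalize(M):
    """HYPOTHETICAL normalization: scale by the largest entry (if positive)."""
    peak = max(max(row) for row in M)
    return [[v / peak for v in row] for row in M] if peak > 0 else M


def combined_score(A, S_m, S_d):
    K_m = matmul(A, S_m)  # (n_d x n_m) @ (n_m x n_m)
    K_d = matmul(S_d, A)  # (n_d x n_d) @ (n_d x n_m)
    Km_n, Kd_n = max_normalize(K_m), max_normalize(K_d)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(Km_n, Kd_n)]


A = [[1, 0, 1], [0, 1, 0]]  # hypothetical: 2 diseases, 3 miRNAs
S_m = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
S_d = [[1.0, 0.2], [0.2, 1.0]]
K = combined_score(A, S_m, S_d)
```

The result has one row per disease and one column per miRNA, which is the same shape as A and the rating matrix Z.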

Figure 1: Performance comparison between PRMDA and five advanced disease-miRNA association prediction models (HGIMDA, RLSMDA, HDMP, WBSMDA, and RWRMDA) in terms of ROC curves and AUCs under the LOOCV framework. PRMDA achieved an AUC of 0.8315 in LOOCV, significantly outperforming all of the previous computational models in prediction accuracy.


Figure 2: Flowchart of the PRMDA model for prioritizing potentially related miRNAs for diseases based on the HMDD V2.0 database.

Table 2: Prediction list of the top 50 prioritized miRNAs associated with Esophageal Neoplasms based on known associations in the HMDD V2.0 database
www.impactjournals.com/oncotarget

Table 3: Prediction list of the top 50 prioritized miRNAs associated with Lymphoma based on known associations in the HMDD V2.0 database
BN (Luminal A, Luminal B, HER2-enriched, Basal-like and Claudin-low) have been discovered and intensively studied [75]. A typical example is that miRNA let-7a represses BN cell migration and invasion through downregulation of C-C chemokine receptor type 7, which is critical in metastatic and chemotactic responses in numerous cancers including BN