A gene without any GO annotations has a semantic similarity of zero with every other gene.

Early solutions simply treated gene function prediction as a binary (or multi-class) classification problem (Hua and Sun, 2001; Lanckriet et al., 2003; Leslie et al., 2004). Existing methods of computational gene function prediction generally focus on three tasks (illustrated in Figure 4): (i) predicting missing (new) annotations, which updates some entries of Y from 0 to 1 to identify new functional annotations of genes; (ii) identifying noisy annotations, which updates some entries of Y from 1 to 0 to remove false positive annotations; and (iii) predicting negative examples, which updates some entries of Y from 0 to -1 to state that the gene clearly does not carry out the function.
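To make the three tasks concrete, the following minimal sketch applies each update to a toy gene-term matrix Y. The matrix values, the helper names, and the encoding (1 for a known annotation, 0 for unknown, -1 for a negative example) are assumptions made for illustration and are not taken from any specific method.

```python
import numpy as np

# Toy gene-term association matrix (assumed encoding):
#   1 = known annotation, 0 = unknown, -1 = negative example.
# Rows are genes, columns are GO terms; all values are illustrative.
Y = np.array([[1, 0, 0],
              [1, 1, 0]])


def update_entry(Y, gene, term, expected, new_value):
    """Return a copy of Y with Y[gene, term] changed from `expected` to `new_value`."""
    if Y[gene, term] != expected:
        raise ValueError("entry does not hold the expected value")
    out = Y.copy()
    out[gene, term] = new_value
    return out


def add_missing_annotation(Y, gene, term):
    """Task (i): a predicted new annotation turns a 0 into a 1."""
    return update_entry(Y, gene, term, expected=0, new_value=1)


def remove_noisy_annotation(Y, gene, term):
    """Task (ii): an annotation judged to be a false positive turns a 1 into a 0."""
    return update_entry(Y, gene, term, expected=1, new_value=0)


def mark_negative_example(Y, gene, term):
    """Task (iii): a confident negative turns a 0 into a -1."""
    return update_entry(Y, gene, term, expected=0, new_value=-1)


Y1 = add_missing_annotation(Y, gene=0, term=1)
Y2 = remove_noisy_annotation(Y1, gene=1, term=1)
Y3 = mark_negative_example(Y2, gene=0, term=2)
print(Y3)
```

Each helper returns a copy, so the archived matrix stays available for comparison with the updated one.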

Yu et al. (2015d) introduced a downward Random Walks model (dRW), which performed random walks on the GO hierarchy while taking the terms annotated to a gene as the initial nodes. However, NoisyGOA does not evaluate the reliability of different annotations, and includes noisy annotations when quantifying the semantic similarity between genes. To solve these problems, Zhao et al. Furthermore, semantic measures are computed with respect to a massive number of GO terms and, thus, are less reliable with sparse annotations. Pairwise measures generally employ an average combination (Lord et al., 2003), maximum combination (Sevilla et al., 2005), or best match average combination (BMA) to integrate the proximity between pairwise terms. Among them, BMA provides a good balance between the maximum and average measures, since the latter two measures are inherently influenced by the number of terms being combined (Pesquita et al., 2009). Given the incomplete functional knowledge of genes, we have to admit that existing gene function prediction solutions are still no substitute for wet-lab experiments.
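The three pairwise combination strategies mentioned above (average, maximum, and best match average) can be sketched as follows. The function name and the toy similarity values are hypothetical, and BMA is taken here as the mean of the row-wise and column-wise best-match averages, which is one common variant and an assumption of this sketch.

```python
import numpy as np


def combine_pairwise(sim, how="bma"):
    """Combine a term-by-term similarity matrix into one gene-level score.

    sim[i, j] is the similarity between the i-th term of gene A and the
    j-th term of gene B (values here are illustrative).
    """
    sim = np.asarray(sim, dtype=float)
    if how == "avg":
        # Average over all term pairs.
        return float(sim.mean())
    if how == "max":
        # Single best term pair.
        return float(sim.max())
    if how == "bma":
        # Best match average: each term is matched to its most similar
        # counterpart, and the two directional means are averaged.
        return float(0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean()))
    raise ValueError(f"unknown combination: {how}")


# Gene A has two terms, gene B has three terms.
sim = np.array([[1.0, 0.2, 0.1],
                [0.4, 0.6, 0.3]])

for how in ("avg", "max", "bma"):
    print(how, round(combine_pairwise(sim, how), 4))
```

On this toy input the BMA score falls between the average and the maximum, which illustrates the balance described in the text.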

(2013) demonstrated that comparing the sequences of just two genes participating in the same biological processes is somewhat inaccurate. However, most solutions based on semantic similarity are still impacted by incomplete GO annotations. Several excellent surveys comprehensively summarize the progress in gene function prediction. (2010) introduced a method called NtN, which applies singular value decomposition (SVD) (Golub and Reinsch, 1971) to the gene-term association matrix, whose entries are weighted by the term frequency-inverse document frequency and the GO hierarchy; thus, the semantic relationships between genes and between terms were explored and the missing associations between genes and terms were completed.
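The idea of weighting a gene-term matrix and completing it through a truncated SVD can be illustrated with a short sketch. This is not the NtN implementation: the TF-IDF formula, the function names, and the toy matrix are assumptions, and the GO-hierarchy weighting described above is omitted for brevity.

```python
import numpy as np


def tfidf_weights(Y):
    """Weight a binary gene-term matrix by a simple TF-IDF scheme (assumed form)."""
    Y = np.asarray(Y, dtype=float)
    n_genes = Y.shape[0]
    # Term frequency: each annotation shares the gene's total weight equally.
    tf = Y / np.maximum(Y.sum(axis=1, keepdims=True), 1.0)
    # Inverse document frequency: rarer terms receive larger weights.
    idf = np.log(n_genes / np.maximum(Y.sum(axis=0), 1.0))
    return tf * idf


def low_rank_scores(W, rank):
    """Rebuild W from its top `rank` singular triplets."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


# Toy matrix: 4 genes x 3 GO terms (illustrative values).
Y = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])

W = tfidf_weights(Y)
scores = low_rank_scores(W, rank=2)
print(np.round(scores, 3))
```

Entries that are zero in Y but receive a comparatively high reconstructed score are the candidate missing associations in this kind of approach.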

(2017c) introduced a more advanced and adaptive approach (NoGOA), which used evidence codes of annotations to differentially weight annotations and sparse representation to quantify the similarity between genes to identify noisy annotations. Eisen (1998) found that utilizing evolutionary information improved gene function prediction. The first task has been extensively studied, while the latter two tasks are attracting research interest. (2007) quantified the semantic similarity between genes by combining the hierarchical relationships between terms and known GO annotations of genes, and then used a k nearest neighbor (kNN) classifier with the semantic similarity to predict unknown annotations of genes. Overall, these solutions each model GO by using the pattern of GO annotations and/or the GO hierarchy. First, the GO annotations of genes are still incomplete, shallow, imbalanced across species and even noisy (Thomas et al., 2012; Dessimoz and Škunca, 2017). To alleviate this difficulty, researchers have tried to compress massive terms, and predict gene functions in a compressed label space. They separately explored the prediction performance for two species with high or low homology, finding that annotations of highly homologous species were complementary, while those of less homologous species did not complement each other. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).

To take advantage of information about features of genes and the available-but-scanty negative examples, Fu et al.

Kahanda and Ben-Hur (2017) proposed a structured output solution that adopted a structural kernel function. Therefore, we focus on function prediction methods using GO. For example, different species have different distributions of GO annotations; zebrafish is heavily studied in terms of developmental biology and embryogenesis, while rat is the standard model for toxicology (Dessimoz and Škunca, 2017). Here, IC(t) is the information content of the term t, which estimates a term's specificity by its frequency of annotation to genes (Lin, 1998). Others attempted to use the inter-relationships among GO terms, and introduced a variety of solutions based on multi-label learning. (2016a) proposed a semantic data fusion method (SimNet), which optimized the weights of multiple functional association networks to align with a semantic-similarity kernel matrix induced from the GO annotations of genes. Each ontological term is represented by an alphanumeric identifier, and its biological function is described by a controlled vocabulary.
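The information content IC(t) mentioned above is commonly computed as the negative logarithm of the fraction of genes annotated with t. The sketch below assumes that definition, assumes annotations have already been propagated to ancestor terms, and uses hypothetical gene and term names.

```python
import math


def information_content(annotations):
    """Return IC(t) = -log(p(t)) for every term seen in `annotations`.

    annotations: dict mapping a gene to the set of terms annotated to it
    (assumed to already include ancestor terms).
    """
    n_genes = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    # p(t) is the fraction of genes annotated with term t.
    return {term: -math.log(count / n_genes) for term, count in counts.items()}


# Hypothetical toy annotations: "root" is shared by all genes.
annotations = {
    "g1": {"root", "A"},
    "g2": {"root", "A"},
    "g3": {"root", "B"},
    "g4": {"root"},
}

ic = information_content(annotations)
for term in sorted(ic):
    print(term, round(ic[term], 3))
```

A term annotated to every gene carries no information (IC of 0), while a term annotated to a single gene out of four has the largest value in this toy example.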
For recent gene function prediction, AUC, Fmax, and Smin are recommended by CAFA (Critical Assessment of protein Function Annotation algorithms) (Radivojac et al., 2013; Jiang et al., 2016; Zhou et al., 2019). Therefore, it lacks negative annotations, which limits the discriminative ability of function prediction models (Youngs et al., 2014; Fu et al., 2016a). We want to remark that Y(i, t) = 0 simply indicates that so far there is no evidence that this gene does or does not carry out the function related to term t. This specification is based on the incompleteness and open-world assumption of GO annotations (Schnoes et al., 2013; Dessimoz and Škunca, 2017). The rest of this review is organized as follows. Noisy annotations are also still largely overlooked by the community, which may mislead wet-lab experimental verification, GO enrichment analysis, and more.
Each GO term can be modeled as a semantic label and, thus, the gene function prediction task can be treated as a classification problem to determine whether the label is positive for the gene or not. (2017b) applied a random walk on the GO hierarchy and biological network to enrich the links between nodes, then factorized the updated relational matrices of the hierarchy and the network into two low-rank numeric matrices (one for the feature data matrix and the other for the GO label matrix), and finally imposed a semi-supervised classification on the two low-rank matrices to infer positive or negative annotations of genes. Valentini (2011) and Cesa-Bianchi et al. Three issues in gene function prediction (left), and categorization of existing computational solutions based on GO (right). Fourth, due to the research priorities of biologists and animal/plant ethics, the collected GO annotations of genes are imbalanced across different species (Schnoes et al., 2013). (2018a) proposed a method called NewGOA, which used a bi-random walk strategy on a hybrid graph to predict new annotations of genes. The evaluation protocol for gene function prediction is generally performed in one of two ways.
However, this solution ignored GO terms in the GO hierarchy that were not yet annotated to studied genes. More effort can be devoted to identifying noisy annotations and irrelevant (or negative) annotations of genes. (2016b) studied cross-species gene function prediction based on semantic similarity. To achieve low storage and fast retrieval, hashing has been widely used in big data applications (Wang et al., 2016; Liu et al., 2019). But this inter-species method only considered a small number of GO terms. The three main issues in gene function prediction are summarized on the left side of Figure 3. To predict new GO annotations of genes, Elisseeff and Weston (2002) pioneered a rank-based support vector machine that ranked relevant annotations of genes ahead of irrelevant ones.
For example, GO:0048087 is a direct child and also a grandchild of GO:0048066, and its furthest distance to the root term is 5, while GO:0006856 is another direct child of GO:0048066 and its furthest distance to the root is 4, so GO:0006856 is plotted at a higher level than GO:0048087. Due to differences in the preferences of biologists and in research ethics for experiments involving humans, animals, and plants, the curated annotations of genes for different species are biased, incomplete, and imbalanced (Schnoes et al., 2013; Dessimoz and Škunca, 2017; Zhao et al., 2019b). Therefore, they generally obtain a better performance than counterparts without such modeling. Therefore, they have to project heterogeneous data onto the common latent feature space, which obscures the intrinsic structures of the respective data sources. (2019b) quantified the individual walk-lengths for each node of a hybrid network composed of genes, GO terms and their hierarchical relations; then, a random walk with individual walk-lengths on the network was performed to achieve cross-species gene function prediction. Therefore, both the GO hierarchy and annotations are regularly updated with new knowledge and archived for reference.

Many models use the hierarchical inter-relations between GO terms and show that the appropriate use of inter-relations can improve gene function prediction (Tao et al., 2007; Done et al., 2010; Yu et al., 2015b). Researchers also recently employed hashing learning techniques to convert the typical one-hot coding of massive GO terms into short binary hashing codes. Each GO term is defined by a unique alphanumeric identifier and can be viewed as a vertex of the graph, and the function is described using a controlled vocabulary. However, this method did not obey the GO hierarchy very well. Another issue is that alternative splicing causes a gene to be translated into different isoforms or protein variants, but GO collectively stores the associations between GO terms and genes irrespective of these variants. To solve this problem, Zhao et al. Clark and Radivojac (2011) investigated the quality of NAS and IEA annotations, and found that IEA annotations were much more reliable than NAS ones in the MFO branch. This work was financially supported by Natural Science Foundation of China (61872300), Fundamental Research Funds for the Central Universities (XDJK2019B024 and XDJK2020B028), Natural Science Foundation of CQ CSTC (cstc2018-jcyjAX0228), and King Abdullah University of Science and Technology, under award number FCC/1/1976-19-01.

Another limitation of semantic similarity-based solutions is that they cannot predict new annotations for a gene without any annotations. Fmax is the overall maximum harmonic mean of precision and recall across all possible thresholds on the predicted gene-term association matrix (Jiang et al., 2016). (2019b) constructed a heterogeneous network including the GO hierarchy, intra- and inter-species subnetworks. (2016) developed a web tool called InteGO2 to select the most appropriate measure from a set of measures using a voting method, or to integrate measures via a meta-heuristic search method.
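The Fmax definition above can be sketched in a few lines. The averaging convention used here (precision over genes with at least one prediction at the threshold, recall over all genes) follows the usual CAFA-style protein-centric evaluation, but the threshold grid, the function name, and the toy data are assumptions of this sketch.

```python
import numpy as np


def fmax(y_true, scores, thresholds=None):
    """Maximum F-measure over thresholds on a predicted gene-term score matrix."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.01, 1.0, 100)  # assumed grid
    best = 0.0
    for tau in thresholds:
        pred = scores >= tau
        tp = (pred & y_true).sum(axis=1)
        n_pred = pred.sum(axis=1)
        n_true = y_true.sum(axis=1)
        has_pred = n_pred > 0
        if not has_pred.any():
            continue  # no gene has a prediction at this threshold
        # Precision is averaged over genes with at least one prediction.
        precision = (tp[has_pred] / n_pred[has_pred]).mean()
        # Recall is averaged over all genes.
        recall = (tp / np.maximum(n_true, 1)).mean()
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return float(best)


# Toy example: 2 genes x 2 terms with well-separated scores.
y_true = [[1, 0], [0, 1]]
scores = [[0.9, 0.1], [0.2, 0.8]]
print(fmax(y_true, scores))
```

Because the maximum is taken over thresholds, Fmax rewards a method for having at least one operating point with good precision and recall, rather than for calibrated scores.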

NTEL assumed a gene is a document and all terms affiliated with that gene are words of that document; then it used a Latent Dirichlet Allocation topic model (Blei et al., 2003) to select negative examples. One way is called history to recent, which takes advantage of previously archived GO annotations to train a model and evaluates the model's predictions by referring to more recent GO annotations. This hypothesis may often be violated, since a gene may be annotated with one or more of those sibling terms as more experimental evidence becomes available. (2003) directly applied the annotation patterns of genes to induce a decision tree or Bayesian classifier to predict gene functions. In fact, the huge number of GO terms also causes a heavy computational burden for GO-based semantic similarity studies (Mistry and Pavlidis, 2008; Yu et al., 2015d). It is positively correlated with the feature similarity between them, which is computed from other biological data (Pesquita et al., 2009; Yu et al., 2015d). Similarly, each negative annotation indicates the gene product does not perform the function described by this term.

clusDCA showed a significantly improved performance on sparse terms. After that, SimNet applied these weights to fuse the networks into a composite network, and then performed random walks on the composite network to make a prediction. There are three main differences between the two ways. The semantic similarity between genes is quantified using GO annotations and/or the GO hierarchy. Received: 02 January 2020; Accepted: 30 March 2020; Published: 24 April 2020. Second, from the prediction results, the history to recent way evaluates the fixed, recent annotations and, thus, it does not have a variance. AUC defines different thresholds to plot the receiver-operating characteristics curve of each GO term, and then calculates the average-area value of these terms. The GO hierarchical structure has also been used to identify noisy annotations, which is a less-studied but practical topic of gene function prediction. (2017) developed a deep learning-based method (DeepGO) to predict gene function from sequences. Yu et al. (2017e) adopted a hashing technique that preserved the graph structure from Liu et al. From this rule, we have. (2018) preset weights for different evidence codes and upward-propagated weights to ancestor annotations via the GO hierarchy.
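The term-wise AUC described above can be read as a macro-average of per-term areas under the ROC curve. The sketch below computes each area through the equivalent pairwise-ranking formulation rather than by sweeping thresholds; skipping terms that lack positive or negative genes, and the function names and toy data, are assumptions of this sketch.

```python
import numpy as np


def term_auc(labels, scores):
    """ROC AUC of one GO term, or None if the term lacks positives or negatives."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels != 1]
    if pos.size == 0 or neg.size == 0:
        return None
    # AUC equals the fraction of (positive, negative) pairs ranked correctly,
    # with ties counted as half.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def macro_auc(Y, S):
    """Average the per-term AUCs over all terms where AUC is defined."""
    Y = np.asarray(Y)
    S = np.asarray(S, dtype=float)
    aucs = []
    for j in range(Y.shape[1]):
        auc = term_auc(Y[:, j], S[:, j])
        if auc is not None:
            aucs.append(auc)
    if not aucs:
        raise ValueError("no term has both positive and negative genes")
    return float(np.mean(aucs))


# Toy example: 4 genes x 2 terms (illustrative values).
Y = [[1, 0], [0, 1], [1, 0], [0, 1]]
S = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.4], [0.1, 0.9]]
print(macro_auc(Y, S))
```

Averaging per term gives rare and frequent terms the same weight, which is why a method can score well on AUC while still struggling on gene-centric measures such as Fmax.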

To replenish the missing annotations of partially annotated genes, Yu et al. (2012b) proposed a gene function prediction model based on weak label learning (ProWL), in which the labels of the annotated training data were incomplete. ITSS (Tao et al., 2007), dRW (Yu et al., 2015d), HashGO (Yu et al., 2017e), HPHash (Zhao et al., 2019a), and NMFGO (Yu et al., 2020b) are some representative methods introduced in sections 3.1.2 and 3.2.2. To consider GO, Mitrofanova et al. (2018) recently presented the GOLabeler, which separately trained five different classifiers from five different feature descriptors on sequence data, and then combined these classifiers to make a prediction. Hence, more efficient and effective models are still welcomed. These generally obtained an improved accuracy (Mostafavi et al., 2008; Mostafavi and Morris, 2010; Yu et al., 2012a, 2015a). Empirical studies show that hash table-based solutions can speed up diverse semantic similarity metrics, e.g., the group-based one (Teng et al., 2013) and Best Match Average (Pesquita et al., 2008).
In DeepGO, the deep learning model predicted the GO annotations of genes based on gene sequences and dependencies between GO terms. Two species with high homology have a large number of homologous genes, which should share similar (or even identical) GO annotations (Schnoes et al., 2013). If a gene is annotated with GO term t, then this gene is also annotated with t's ancestor terms. To address that, Lu et al. Categories of solutions that use different inter-relations between GO terms.
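The propagation rule stated above (a gene annotated with term t is also annotated with every ancestor of t) amounts to an upward closure over the ontology graph. A minimal sketch follows; the parent map and the term names are hypothetical placeholders rather than real GO identifiers.

```python
def propagate_to_ancestors(terms, parents):
    """Return `terms` plus every ancestor reachable through `parents`.

    parents: dict mapping a term to its direct parent terms in the
    hierarchy (a directed acyclic graph, so a term may have several parents).
    """
    closed = set(terms)
    stack = list(terms)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            if parent not in closed:
                closed.add(parent)
                stack.append(parent)
    return closed


# Hypothetical hierarchy: C has two parents, A and B, which share one root.
parents = {
    "C": ["A", "B"],
    "A": ["root"],
    "B": ["root"],
}

print(sorted(propagate_to_ancestors({"C"}, parents)))
print(sorted(propagate_to_ancestors({"A"}, parents)))
```

Applying this closure before computing annotation frequencies or evaluation metrics keeps predictions consistent with the hierarchy.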
