1887

Abstract

Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphisms between graphs and ensuring the biological significance of the query results. In this work, we propose a novel network querying algorithm that can enhance the biological significance of the query results through the use of a context-sensitive random walk (CSRW) model. In order to identify conserved subnetwork regions in the target network that are similar to a given query network, the algorithm estimates the node correspondence between the query and the target networks based on the CSRW model. Inspired by the pair hidden Markov model (pair-HMM) that has been widely used in comparative sequence analysis, the CSRW model can effectively capture the similarities between graphs by explicitly accounting for potentially inserted and deleted network nodes. Based on the estimated CSRW node correspondence scores, our algorithm first identifies high-scoring regions (referred to as the seed networks) in the target network that bear considerable similarity with (part of) the query. The seed networks are further extended by maximally minimizing the network conductance, which can effectively identify nearby nodes (e.g., proteins) that may share similar functions with other nodes in the current seed. Finally, the query result is pruned by removing irrelevant nodes based on an extension reward score, thereby improving the consistency of the final query result. Through extensive numerical simulations based on 938 real biological complexes, we show that our proposed algorithm yields accurate query results with enhanced biological significance.

References

Sharan, R., Ideker, T.: Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 427–433 (2006).

Sharan, R., Suthram, S., Kelley, R.M., Kuhn, T., McCuine, S., Uetz, P., Sittler, T., Karp, R.M., Ideker, T.: Conserved patterns of protein interaction in multiple species. Proc. Natl. Acad. Sci. USA 102, 1974–1979 (2005).

Yoon, B.-J., Qian, X. and Sahraeian, S.M.E.: Comparative analysis of biological networks: Hidden Markov model and Markov chain-based approach. IEEE Signal Processing Magazine 29, 22–34 (2012).

Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002).

Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006).

Spirin, V., Mirny. L.A.: Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA 100, 12123–12128 (2003).

Sharan, R., Ulitsky, I., Shamir, R.: Network-based prediction of protein function. Mol. Syst. Biol. 3, 88 (2007).

Cook, S.A.: The complexity of theorem-proving procedures. Proceedings of the third annual ACM sym- posium on theory of computing, pp 151–158 (1971).

Jeong, H., Yoon, B.-J.: Effective Estimation of node-to-node correspondence between different graphs. IEEE Signal Processing Letters 22, 661–665 (2015).

Jeong, H., Yoon,B.-J.: Accurate multiple network alignment through context-sensitive random walk. BMC Systems Biology 9, S7 (2015).

Kelley, B.P., Yuan, B., Lewitter, F., Sharan, R., Stockwell, B.R., Ideker, T.: Path BLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res. 32, W83–W88 (2004).

Shlomi, T., Segal, D., Ruppin, E., Sharan, R.: QPath: a method for querying pathways in a protein- protein interaction network. BMC Bioinformatics 7, 199 (2006).

Dost, B., Shlomi, T., Gupta, N., Ruppin, E., Bafna, V., Sharan, R.: QNet: a tool for querying protein interaction networks. J. Comput. Biol. 15, 913–925 (2008).

Bruckner, S., Hffner, F., Karp, R.M., Shamir, R., Sharan, R.: TORQUE: topology-free querying of protein interaction networks. Nucleic Acids Res. 37, W106–W108 (2009).

Bruckner, S., Hffner, F., Karp, R.M., Shamir, R., Sharan, R.: Topology-free querying of protein inter- action networks. J. Comput. Biol. 17, 237–252 (2010).

Sahraeian, S.M.E., Yoon, B.-J.: RESQUE: Network reduction using semi-Markov random walk scores for efficient querying of biological networks. Bioinformatics 28, 2129–2136 (2012).

Huang, Q., Wu, L.Y., Zhang, X.S.: An efficient network querying method based on conditional random fields. Bioinformatics 27, 3173–3178 (2011).

Huang, Q., Wu, L.Y., Zhang, X.S.: Corbi: a new R package for biological network alignment and querying. BMC Systems Biology 7, S6 (2013).

El-Kebir, M., Brandt, B.W., Heringa, J., Klau, G.W.: Natalie Q: A web server for protein-protein inter- action network querying. BMC Systems Biology 8, 40 (2014).

Klau, G.W.: A new graph-based method for pairwise global network alignment. BMC Bioinformatics 20, S59 (2009).

Hashemifar, S., Jinbo, X.: HubAlign: an accurate and efficient method for global alignment of protein- protein interaction networks. Bioinformatics 30, i438–i444 (2014).

Hu, J., Knut, R.: LocalAli: an evolutionary-based local alignment approach to identify functionally conserved modules in multiple networks. Bioinformatics 31, 363–372 (2015).

Hakes, L., Pinney, J.W., Robertson, D.L., Lovell, S.C.: Protein-protein interaction networks and biology– what's the connection? Nat. Biotechnol. 26, 69–72 (2008).

Kuchaiev, O., Raajski, M., Higham, D.J., Prulj, N.: Geometric de-noising of protein-protein interaction networks. PLoS Comput. Biol. 5, e1000454 (2009).

Sahraeian, S.M.E., Yoon, B.-J.: SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks. PLoS One 8, e67995 (2013).

Micale, G., Pulvirenti, A., Giugno, R., Ferro, A.: GASOLINE: a greedy and stochastic algorithm for optimal local multiple alignment of interaction networks. PLoS One 9, e98750 (2014).

Szklarczyk, D., Franceschini, A., Kuhn, M., Simonovic, M., Roth, A., Minguez, P., Doerks, T., Stark, M., Muller, J., Bork, P., Jensen, L.J., von Mering, C.: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 39, D561–D568 (2011).

Ruepp, A., Brauner, B., Dunger-Kaltenbach, I., Frishman, G., Montrone, C., Stransky, M., Waegele, B., Schmidt, T., Douieu, N.O., Stumpflen, V., Mewes, H.W.: CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36, D646–D650 (2008).

Cherry, J.M., Hong, E.L., Amundsen, C., Balakrishnan, R., Binkley, G., Chan, E.T., Christie, K.R., Costanzo, M.C., Dwight, S.S., Engel, S.R. et al.: Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2011).

Boyle, E.I., Weng, S., Gollub, J., Jin, H., Botstein, D., Cherry, J.M., Sherlock, G.: GO::TermFinder– open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20, 3710–3715 (2004).

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T. et al.: Gene Ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000).

Loading

Article metrics loading...

/content/papers/10.5339/qfarc.2016.HBPP1733
2016-03-21
2020-10-25
Loading full text...

Full text loading...

http://instance.metastore.ingenta.com/content/papers/10.5339/qfarc.2016.HBPP1733
Loading
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error