Functional annotation of genes and their protein products is an essential step in genome analysis. Experimental techniques for functional analysis, such as microarrays or yeast two-hybrid systems, cannot keep pace with the quantity of sequences produced by next-generation sequencing technologies, so the function of gene products is primarily predicted with computational tools. A variety of computational methods is now available, including homology-, sequence-, structure- and network-based approaches. Nonetheless, so far there is no method that predicts the function of a group of genes and their products, for instance genes that are expressed during the course of a disease or under cellular stress. We developed a computational pipeline that fuses different network data sources, namely protein-protein interaction, gene ontology, phylogenetic, gene expression and pathway information, in order to predict the group function(s) of genes. The main steps of the pipeline are, first, the network integration of the different input sources; second, the clustering of the involved genes according to their similarity; and, third, the (re-)assignment of genes/proteins with unknown function. These steps are repeated until the algorithm converges to one or more final clusters/groups, which are additionally mapped onto KEGG pathways in order to identify and biologically interpret higher-level systemic/organismic functions. We successfully applied the pipeline to different groups of genes over-expressed in diseases of major interest in Qatar, such as type-2 diabetes, breast cancer and pancreatic cancer.
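
The three steps described above (integration, clustering, re-assignment, repeated until convergence) can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation: the abstract does not specify the fusion, clustering or assignment methods, so the sketch assumes a weighted average of edge scores for integration, connected components of a thresholded network for clustering, and a majority vote within each cluster for re-assignment. All function names, the `threshold` parameter and the labels are hypothetical.

```python
from collections import Counter, defaultdict


def fuse_networks(networks, weights=None):
    """Step 1 (assumed method): weighted average of edge scores across sources.

    Each network maps a gene pair (a, b) to a similarity score in [0, 1].
    A pair missing from a source contributes a score of 0 for that source.
    """
    weights = weights or [1.0] * len(networks)
    total = sum(weights)
    fused = defaultdict(float)
    for net, w in zip(networks, weights):
        for (a, b), score in net.items():
            fused[frozenset((a, b))] += w * score / total
    return dict(fused)


def cluster_genes(genes, fused, threshold):
    """Step 2 (assumed method): connected components of the thresholded network."""
    neighbours = {g: set() for g in genes}
    for pair, score in fused.items():
        if score >= threshold and len(pair) == 2:
            a, b = tuple(pair)
            if a in neighbours and b in neighbours:
                neighbours[a].add(b)
                neighbours[b].add(a)
    seen, clusters = set(), []
    for g in genes:
        if g in seen:
            continue
        stack, component = [g], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def reassign(clusters, labels):
    """Step 3 (assumed method): unlabeled genes adopt their cluster's majority label."""
    updated = dict(labels)
    for component in clusters:
        known = Counter(labels[g] for g in component if labels.get(g) is not None)
        if known:
            majority = known.most_common(1)[0][0]
            for g in component:
                if updated.get(g) is None:
                    updated[g] = majority
    return updated


def predict_group_functions(genes, networks, labels, threshold=0.5, max_iter=10):
    """Repeat clustering and re-assignment until the labels stop changing."""
    fused = fuse_networks(networks)
    clusters = []
    for _ in range(max_iter):
        clusters = cluster_genes(genes, fused, threshold)
        updated = reassign(clusters, labels)
        if updated == labels:  # converged: no gene changed its label
            break
        labels = updated
    return clusters, labels
```

In this toy version the clustering does not depend on the labels, so the loop settles after one re-assignment pass; a method in which newly assigned functions feed back into the similarity measure would need more iterations. Mapping the final clusters onto KEGG pathways is left out of the sketch.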

