
QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs

  • Genomics Proteomics Bioinformatics. 2021 Dec;19(6):998-1011. doi: 10.1016/j.gpb.2021.02.001.
Fatima Zohra Smaili 1, Shuye Tian 2, Ambrish Roy 3, Meshari Alazmi 4, Stefan T Arold 5, Srayanta Mukherjee 3, P Scott Hefty 6, Wei Chen 7, Xin Gao 8
Affiliations

  • 1 Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.
  • 2 Department of Biology, Southern University of Science and Technology of China (SUSTC), Shenzhen 518055, China.
  • 3 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
  • 4 Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia; College of Computer Science and Engineering, University of Ha'il, Ha'il 55476, Saudi Arabia.
  • 5 Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.
  • 6 Department of Molecular Bioscience, University of Kansas, Lawrence, KS 66047, USA.
  • 7 Department of Biology, Southern University of Science and Technology of China (SUSTC), Shenzhen 518055, China. Electronic address: chenw@sustech.edu.cn.
  • 8 Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia. Electronic address: xin.gao@kaust.edu.sa.
Abstract

The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein-protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
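The consensus-averaging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three sources are weighted or how a term missing from one source is treated, so the code below assumes an unweighted mean in which a missing term counts as a score of 0. The function name `consensus_average` and all scores are hypothetical and are not taken from QAUST itself.

```python
from collections import defaultdict


def consensus_average(*score_maps):
    """Average per-term confidence scores across several predictors.

    Assumption (not from the paper): every predictor has equal weight,
    and a term that a predictor does not report contributes a score of 0.
    """
    if not score_maps:
        return {}
    totals = defaultdict(float)
    for scores in score_maps:
        for term, score in scores.items():
            totals[term] += score
    n = len(score_maps)
    return {term: total / n for term, total in totals.items()}


# Hypothetical confidence scores for one target protein, one map per source.
structure = {"GO:0003824": 0.9, "GO:0005515": 0.6}    # structure similarity
interaction = {"GO:0005515": 0.9}                     # protein-protein interaction
motif = {"GO:0003824": 0.6, "GO:0005515": 0.3}        # discriminative motifs

combined = consensus_average(structure, interaction, motif)

# Rank terms from most to least confident.
ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, a term supported by all three sources can outrank a term that only two sources score highly, which is the intuition behind combining independent lines of evidence.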

Keywords

EC number; Functionally discriminative motif; GO term; Protein function prediction; Protein structure similarity.
