Identification and analysis of single- and multiple-region mitotic protein complexes by grouping gene ontology terms

Wen Lin Huang, Chyn Liaw, Chia Ta Tsai, Shinn-Ying Ho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation


Many mitotic proteins are assembled into protein super complexes in three regions - midbody, centrosome and kinetochore (MCK) - with distinctive roles in modulating the mitosis process. However, more than 16% of the mitotic proteins are in multiple regions. Identifying mitotic proteins in advance will help to understand the molecular regulatory mechanisms of this organelle. A few ensemble-classifier methods can address this problem, but they often fuse various complementary features. Among these features, gene ontology (GO) terms play an important role, but the GO-term search space is massive and sparse. This motivates this work to present an easily implemented method, namely mMck-GO, which identifies a small number of GO terms with support vector machine (SVM) and k-nearest neighbor (KNN) classifiers for predicting single- and multiple-region MCK proteins. Using a simple grouping scheme based on an SVM classifier, mMck-GO assembles the GO terms into several groups according to the number of proteins each term annotates in the training dataset, and then measures which top-grouped GO terms perform best. A new MCK protein dataset containing 701 proteins (611 single- and 90 multiple-region) is established in this work. None of the MCK proteins reaches 25% pairwise sequence identity with any other protein in the same region. On this dataset, we find that the GO term with the maximum annotation number annotates 49.2% of the training protein sequences; in contrast, 56.5% of the GO terms annotate only a single protein sequence. This illustrates the sparse character of GO terms and the effectiveness of top-grouped GO terms in distinguishing MCK proteins. Accordingly, a small group of the top 134 GO terms is identified, and mMck-GO fuses these GO terms with amino acid composition (AAC) as input features to yield accuracies of 71.66% and 69.18%, the latter on independent testing.
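The grouping idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy annotations, and the use of fixed-size cumulative groups are all assumptions made for illustration, and the step that scores each candidate group with an SVM is left out. The sketch only shows how GO terms might be ranked by annotation count, how candidate top-grouped sets could be formed, and how the two sparsity figures (coverage of the most frequent term, share of terms annotating a single protein) could be computed.

```python
from collections import Counter


def rank_go_terms(annotations):
    """Rank GO terms by the number of training proteins they annotate."""
    counts = Counter(
        term for terms in annotations.values() for term in set(terms)
    )
    # Descending count; ties broken by GO identifier so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def top_groups(ranked, group_size):
    """Yield cumulative candidate sets: top g terms, top 2g terms, and so on.

    Fixed-size cumulative groups are an assumption of this sketch; the
    source only says terms are grouped by their annotation numbers.
    """
    terms = [term for term, _ in ranked]
    for end in range(group_size, len(terms) + group_size, group_size):
        yield terms[:end]


def sparsity_summary(ranked, n_proteins):
    """Return (coverage of the most frequent term, share of singleton terms)."""
    top_coverage = ranked[0][1] / n_proteins
    singleton_share = sum(1 for _, c in ranked if c == 1) / len(ranked)
    return top_coverage, singleton_share


# Hypothetical toy annotations: protein id -> GO terms.
annotations = {
    "P1": ["GO:0005737", "GO:0005515"],
    "P2": ["GO:0005737", "GO:0005524"],
    "P3": ["GO:0005737", "GO:0005515", "GO:0000166"],
    "P4": ["GO:0005515"],
}

ranked = rank_go_terms(annotations)
groups = list(top_groups(ranked, group_size=2))
top_coverage, singleton_share = sparsity_summary(ranked, len(annotations))
```

In a full pipeline, each candidate set in `groups` would be turned into features and scored with a classifier, and the best-scoring set kept.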
The top 30 GO terms comprise eight, eight, and 14 GO terms belonging to the molecular function, biological process and cellular component branches, respectively. In addition to centrosome and kinetochore, the 14 GO terms in the cellular-component ontology are relevant to subcellular compartments, microtubule, membrane, and spindle, with GO:0005737 (cytoplasm) ranked first. The eight molecular-function GO terms include GO:0005515 (protein binding), GO:0000166 (nucleotide binding), and GO:0005524 (ATP binding). Most of the eight GO terms in the biological-process ontology are relevant to cell cycle, cell division and mitosis, but two GO terms, GO:0045449 and GO:0045449, are relevant to regulation of transcription and transport processes, which helps to clarify the molecular regulatory mechanisms of this organelle. The top-grouped GO terms can serve as an indispensable feature set when combined with other feature types to solve multiple-class problems in the investigation of biological functions.
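As a rough illustration of fusing selected GO terms with amino acid composition, the sketch below builds one feature vector per protein. The binary presence/absence encoding of GO terms, the function names, and the example sequence are assumptions of this sketch; the source does not state how GO terms are encoded. AAC is computed in its usual sense, as the fraction of each of the 20 standard residues in the sequence.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue."""
    seq = sequence.upper()
    # Non-standard symbols (e.g. X) are ignored in the denominator.
    total = sum(1 for residue in seq if residue in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / total for a in AMINO_ACIDS]


def go_vector(protein_terms, selected_terms):
    """Binary indicator per selected GO term (assumed encoding)."""
    present = set(protein_terms)
    return [1.0 if term in present else 0.0 for term in selected_terms]


def fused_features(sequence, protein_terms, selected_terms):
    """Concatenate the GO indicators with the 20 AAC values."""
    return go_vector(protein_terms, selected_terms) + aac(sequence)


# Hypothetical example: three selected GO terms and a short toy sequence.
selected = ["GO:0005737", "GO:0005515", "GO:0005524"]
features = fused_features("MKKA", ["GO:0005737", "GO:0005524"], selected)
```

With 134 selected GO terms, the same construction would give a 154-dimensional vector per protein, which could then be passed to an SVM or KNN classifier.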

Original language: English
Title of host publication: Information Technology for Manufacturing Systems IV
Number of pages: 9
State: Published - 29 Oct 2013
Event: 4th International Conference on Information Technology for Manufacturing Systems, ITMS 2013 - Auckland, New Zealand
Duration: 28 Aug 2013 - 29 Aug 2013

Publication series

Name: Applied Mechanics and Materials
ISSN (Print): 1660-9336
ISSN (Electronic): 1662-7482


Conference: 4th International Conference on Information Technology for Manufacturing Systems, ITMS 2013
Country: New Zealand


  • Amino acid composition
  • Biological function
  • Cell division
  • Cellular component
  • K-nearest neighbor
  • Mitosis
  • Molecular functions
  • Support vector machine
