Prediction of Protein Subcellular Localizations

Chin Sheng Yu, Jenn-Kang Hwang

Research output: Contribution to conference › Paper

3 Scopus citations

Abstract

The support vector machine (SVM) method based on n-peptide composition (Yu et al., Proteins: Struct. Funct. Genet. 2003; 50:531-536) is used to predict the subcellular localizations of proteins. For an unbiased assessment of the results, we apply our approach to two independent data sets. The first set consists of two parts (Reinhardt and Hubbard, Nucleic Acids Res. 1998; 26:2230-2236): a prokaryotic set of 997 protein sequences in three localization categories and a eukaryotic set of 2427 sequences in four localization categories. The second set comprises 2191 proteins in 12 subcellular localizations (Chou and Cai, J. Biol. Chem. 2002; 277:45765-45769). Our approach provides excellent results for both data sets. For the first data set, it gives an overall prediction accuracy of 93.2% for prokaryotic sequences and 88.1% for eukaryotic sequences, and it yields a significantly better Matthews correlation coefficient for each subcellular localization than existing approaches. For the second data set, our approach achieves an overall prediction accuracy of 83.2%, which is around 10% higher than the best existing result. Our approach should be valuable in the high-throughput analysis of genomics and proteomics.
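
As a rough illustration of the two quantities named in the abstract, the sketch below computes an n-peptide composition vector for a sequence and the Matthews correlation coefficient for one localization class. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 20-letter alphabet ordering, and the handling of non-standard residues are all hypothetical, and the SVM training step (kernel, parameters, multi-class scheme) is not specified here and is therefore omitted.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the frequency of every length-n peptide, in a fixed order.

    The result has 20**n entries. Windows containing non-standard
    residues (e.g. 'X') still count toward the total but have no entry
    of their own; this is an illustrative choice.
    """
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(windows)
    total = len(windows)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    return [counts[k] / total if total else 0.0 for k in keys]


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from one class's confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical usage: a dipeptide (n=2) feature vector for a toy sequence.
features = n_peptide_composition("ACAC", n=2)
```

Vectors of this kind, one per protein, could then be passed to any SVM implementation, with the coefficient computed per localization category from the resulting predictions.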
Original language: English
Pages: 165-+
State: Published - 2008

Keywords

  • AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINES; FUNCTIONAL DOMAIN COMPOSITION; SECONDARY STRUCTURE; NEURAL-NETWORKS; LOCATION; ACCURACY; SEQUENCE
