A computational efficient algorithm for protein sequence classification

Yi-Ming Li*, Hsiao Mei Lu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In this paper we present statistical algorithms to classify the stability of proteins from their sequences. A protein sequence consists of successive amino acid codes and can be treated as multivariate categorical data. Based on a statistical analysis of variance of the data set in each group (stable or unstable proteins), weights are calculated that indicate how combinations of amino acid codes affect protein stability. Once the weights for every combination of amino acid codes have been determined, each protein can be assigned a score representing its stability. The distribution of scores for stable proteins differs from that for unstable proteins, which makes the algorithm well suited to sequence-based protein stability analysis. We propose several weighting algorithms and compare their results on protein stability classification. The approach provides an alternative method for protein stability classification and a predicted result that can serve as a reference before a protein is mutated.
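The abstract does not give the weighting formulas, so the following is only a minimal sketch of the general idea (weights per amino acid combination, then a per-protein score), not the authors' method. It assumes that a "combination" is an adjacent pair of amino acids and that a weight is the difference in group means divided by a pooled standard deviation. The names `pair_frequencies`, `fit_weights`, `stability_score`, the smoothing term `eps`, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_frequencies(seq):
    """Relative frequency of each adjacent amino acid pair in one sequence."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for short input
    return {p: counts.get(p, 0) / total for p in PAIRS}


def fit_weights(stable, unstable, eps=0.01):
    """Assumed weighting: group mean difference over pooled standard deviation.

    eps is a hypothetical smoothing term that keeps weights finite when a
    pair has no within-group variance.
    """
    fs = [pair_frequencies(s) for s in stable]
    fu = [pair_frequencies(s) for s in unstable]
    weights = {}
    for p in PAIRS:
        xs = [f[p] for f in fs]
        xu = [f[p] for f in fu]
        pooled_var = (pvariance(xs) + pvariance(xu)) / 2
        weights[p] = (mean(xs) - mean(xu)) / (sqrt(pooled_var) + eps)
    return weights


def stability_score(seq, weights):
    """Weighted sum of pair frequencies; higher suggests 'more like stable'."""
    freqs = pair_frequencies(seq)
    return sum(weights[p] * freqs[p] for p in PAIRS)


# Toy, made-up sequences purely for illustration (not real proteins).
toy_stable = ["ACACACAC", "ACACACAA", "CACACACA"]
toy_unstable = ["WYWYWYWY", "YWYWYWYW", "WYWYWYWW"]
toy_weights = fit_weights(toy_stable, toy_unstable)
```

Under these assumptions, a sequence dominated by pairs that are more frequent in the stable group receives a positive score and one dominated by pairs from the unstable group receives a negative score, so a threshold on the score can act as a simple classifier.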

Original language: English
Title of host publication: 2003 Nanotechnology Conference and Trade Show - Nanotech 2003
Editors: M. Laudon, B. Romanowicz
Pages: 24-27
Number of pages: 4
State: Published - 23 Feb 2003
Event: 2003 Nanotechnology Conference and Trade Show - Nanotech 2003 - San Francisco, CA, United States
Duration: 23 Feb 2003 to 27 Feb 2003

Publication series

Name: 2003 Nanotechnology Conference and Trade Show - Nanotech 2003
Volume: 1

Conference

Conference: 2003 Nanotechnology Conference and Trade Show - Nanotech 2003
Country: United States
City: San Francisco, CA
Period: 23/02/03 to 27/02/03

Keywords

  • Classification of protein sequence
  • Computational statistics
  • Prediction model
  • Protein stability
  • Statistical analysis
