We define a schema representation for visualizing the relationship between protein primary and secondary structures. On a training set with low sequence similarity, the steady-state genetic algorithm outperforms association rule mining in finding schemata with high discrimination and confidence. The discovered schemata can be offered to biologists as regularities of protein secondary structure, and they can also be applied to predict protein secondary structures. Because Q3 accuracy was poor in the previous study, we add a clustering method to the steady-state genetic algorithm. The clustering method plays two roles: it generates part of the initial chromosome population for the genetic algorithm, and it assists the schemata in predicting protein secondary structures. In our tests, the new approach improves Q3 accuracy by 12% compared with previous efforts. We also discuss several new examples of schemata with interesting biological meaning.
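Q3 accuracy, the metric behind the reported improvement, is conventionally the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The sketch below computes it pooled over every residue in a test set. The H/E/C alphabet and the input format, a list of (observed, predicted) string pairs with one pair per protein chain, are assumptions made for illustration and are not taken from the paper.

```python
from typing import Iterable, Tuple

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
STATES = frozenset("HEC")


def q3_accuracy(chains: Iterable[Tuple[str, str]]) -> float:
    """Per-residue Q3 accuracy (in percent), pooled over all chains.

    Each item is a hypothetical (observed, predicted) pair of equal-length
    strings over the H/E/C alphabet, one pair per protein chain.
    """
    correct = 0
    total = 0
    for observed, predicted in chains:
        if len(observed) != len(predicted):
            raise ValueError("observed and predicted labels must have equal length")
        if not (set(observed) | set(predicted)) <= STATES:
            raise ValueError("labels must be drawn from H, E, C")
        # Count residues whose predicted state matches the observed state.
        correct += sum(o == p for o, p in zip(observed, predicted))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


if __name__ == "__main__":
    # One chain: 8 of 10 residues are predicted correctly.
    print(q3_accuracy([("CHHHHCEEEC", "CHHHCCEECC")]))  # 80.0
    # Two chains pooled: 18 of 20 residues are correct.
    print(q3_accuracy([("CHHHHCEEEC", "CHHHCCEECC"),
                       ("CCEEEEEEEC", "CCEEEEEEEC")]))  # 90.0
```

Pooling counts across chains weights every residue equally. Averaging per-chain scores instead would weight every protein equally and can give a different number for the same predictions.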
| Item | Details |
|---|---|
| Number of pages | 8 |
| Journal | WSEAS Transactions on Systems |
| State | Published - 1 Feb 2006 |
- Data mining
- Genetic algorithms
- Knowledge discovery
- Protein secondary structure