## Abstract

The reduced support vector machine (RSVM) was proposed for large data sets, with the practical objective of overcoming computational difficulties and reducing model complexity. In this paper, we study the RSVM from the viewpoint of sampling design, its robustness, and the spectral analysis of the reduced kernel. We consider the nonlinear separating surface as a mixture of kernels. Instead of a full model, the RSVM uses a reduced mixture with kernels sampled from a certain candidate set. Our main results center on two major themes. One is the robustness of the random subset mixture model. The other is the spectral analysis of the reduced kernel. The robustness is judged by the following criteria: 1) a model variation measure; 2) the model bias (deviation) between the reduced model and the full model; and 3) the test power in distinguishing the reduced model from the full one. For the spectral analysis, we compare the eigenstructures of the full kernel matrix and the approximation kernel matrix, where the approximation kernels are generated by uniform random subsets. The small discrepancies between them indicate that the approximation kernels can retain most of the information in the full kernel that is relevant for learning tasks. We focus on some statistical theory of the reduced set method, mainly in the context of the RSVM. The use of a uniform random subset is not limited to the RSVM, however. This approach can act as a supplemental algorithm on top of a basic optimization algorithm, wherein the actual optimization takes place on the subset-approximated data. The statistical properties discussed in this paper remain valid in that setting.
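The spectral comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the Gaussian kernel, the value of `gamma`, the synthetic data, the subset size `m = 40`, and all function names are assumptions chosen for illustration. The approximation kernel is built in the Nyström form, which appears in the keyword list; the RSVM itself works with the rectangular reduced kernel `K_nm`.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.5):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def reduced_kernel(X, m, rng, gamma=0.5):
    # Draw a uniform random subset of m rows (without replacement) and
    # return the rectangular n-by-m reduced kernel plus the chosen indices.
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return gaussian_kernel(X, X[idx], gamma), idx

def nystrom_approximation(K_nm, idx, rcond=1e-10):
    # Nystrom form K_nm K_mm^+ K_nm^T, where K_mm is the m-by-m block of
    # the reduced kernel on the sampled rows. The rcond cutoff is an
    # assumption made here for numerical stability.
    K_mm = K_nm[idx, :]
    K = K_nm @ np.linalg.pinv(K_mm, rcond=rcond) @ K_nm.T
    return (K + K.T) / 2.0  # symmetrize against rounding error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # synthetic data, illustration only

K_full = gaussian_kernel(X, X)
K_nm, idx = reduced_kernel(X, m=40, rng=rng)
K_approx = nystrom_approximation(K_nm, idx)

# Compare eigenstructures: sorted eigenvalues and relative Frobenius error.
eig_full = np.sort(np.linalg.eigvalsh(K_full))[::-1]
eig_approx = np.sort(np.linalg.eigvalsh(K_approx))[::-1]
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)

print("leading eigenvalues (full):  ", eig_full[:5])
print("leading eigenvalues (approx):", eig_approx[:5])
print("relative Frobenius error:", rel_err)
```

Varying `m` or the random seed in this sketch gives a rough feel for how the discrepancy between the two spectra behaves. It does not reproduce the robustness criteria or the canonical-angle analysis of the paper.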

Original language | English |
---|---|
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | IEEE Transactions on Neural Networks |
Volume | 18 |
Issue number | 1 |
State | Published - 1 Jan 2007 |

## Keywords

- Canonical angles
- Kernel methods
- Maximinity
- Minimaxity
- Model complexity
- Monte Carlo sampling
- Nyström approximation
- Reduced set
- Spectral analysis
- Support vector machines (SVMs)
- Uniform design
- Uniform random subset