Clustering data with partial background information

Chien-Liang Liu, Wen Hoar Hsaio*, Tao Hsing Chang, Hsuan Hsun Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Clustering with partial background information, also known as semi-supervised clustering, learns from a combination of labeled and unlabeled data and has received considerable attention over the last decade. The supervisory information is usually used as constraints that bias clustering towards a good region of the search space. This paper proposes a semi-supervised algorithm, called constrained non-negative matrix factorization (Constrained-NMF), which uses a few labeled examples as constraints to improve performance. As a matrix factorization algorithm, it requires the matrices to be initialized at the outset. Although the benefits of good initialization are well known, randomized seeding of the basis matrix and coefficient matrix is still the standard approach for many non-negative matrix factorization (NMF) algorithms. This work devises an algorithm called entropy-based weighted semi-supervised fuzzy c-means (EWSS-FCM) to initialize the seeds. The experimental results indicate that the proposed Constrained-NMF benefits from the initialization obtained from EWSS-FCM, which emphasizes the role of labeled examples and automatically weights them during the course of clustering. Both algorithms incorporate labeled examples into their objective functions, so that the labeled information is propagated to unlabeled examples iteratively. We further analyze the proposed Constrained-NMF and give convergence justifications. Experiments are conducted on five real data sets, and the results indicate that the proposed algorithm generally outperforms the alternatives.
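To make the contrast between randomized seeding and label-informed seeding concrete, the following is a minimal sketch only. It is not the paper's Constrained-NMF or EWSS-FCM, whose objective functions are not given in this abstract. It assumes plain Frobenius-norm NMF with the standard multiplicative updates, and it swaps in a simple class-mean initialization in place of EWSS-FCM. The names `label_seeded_nmf` and `make_toy_data`, the toy data, and all parameter values are hypothetical.

```python
import numpy as np


def label_seeded_nmf(X, labels, n_clusters, n_iter=200, eps=1e-9, seed=0):
    """Plain Frobenius-norm NMF (X ~ W @ H) with label-informed seeding.

    X      : (n_features, n_samples) non-negative array, one sample per column.
    labels : (n_samples,) int array; cluster id for labeled samples, -1 otherwise.
    Assumption: every cluster has at least one labeled sample.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape

    # Seed the basis matrix W with the mean of each cluster's labeled samples
    # (a stand-in for a learned initialization; not EWSS-FCM).
    W = np.empty((n_features, n_clusters))
    for c in range(n_clusters):
        W[:, c] = X[:, labels == c].mean(axis=1) + eps

    # Seed the coefficient matrix H: random for unlabeled samples,
    # strongly peaked on the known cluster for labeled samples.
    H = rng.uniform(0.1, 1.0, size=(n_clusters, n_samples))
    labeled = np.flatnonzero(labels >= 0)
    H[:, labeled] = 0.01
    H[labels[labeled], labeled] = 1.0

    errors = []
    for _ in range(n_iter):
        # Standard multiplicative updates for the Frobenius objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


def make_toy_data(n_per_cluster=30, seed=0):
    """Two well-separated non-negative clusters with two labeled samples each."""
    rng = np.random.default_rng(seed)
    means = np.array([[5.0, 5.0, 0.5, 0.5], [0.5, 0.5, 5.0, 5.0]])
    y = np.repeat([0, 1], n_per_cluster)
    X = means[y].T + rng.uniform(0.0, 0.5, size=(4, 2 * n_per_cluster))
    labels = np.full(y.shape, -1)
    known = [0, 1, n_per_cluster, n_per_cluster + 1]
    labels[known] = y[known]
    return X, y, labels
```

In this sketch, calling `label_seeded_nmf(X, labels, n_clusters=2)` on the output of `make_toy_data()` returns the factors and the reconstruction error per iteration, and a cluster assignment for each sample can be read off as `H.argmax(axis=0)`. The labeled examples influence the result only through the starting point here, whereas the abstract states that the proposed algorithms also place them in the objective functions.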

Original language: English
Pages (from-to): 1123-1138
Number of pages: 16
Journal: International Journal of Machine Learning and Cybernetics
Volume: 10
Issue number: 5
DOIs
State: Published - 1 May 2019

Keywords

  • Clustering
  • Fuzzy clustering
  • Non-negative matrix factorization (NMF)
  • Semi-supervised learning
