Big active learning

Er Chen Huang, Hsing Kuo Pao, Yuh-Jye Lee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

5 Scopus citations


Active learning is a common strategy for handling large-scale data with limited labeling effort. In each iteration of active learning, a query is posed to an oracle, for example asking for the label of a given unlabeled data point. With this method, we can request labels only for the data that are essential, which reduces the oracle's labeling effort. We focus on pool-based active learning, where a set of unlabeled data is selected for querying in each round of active learning. To apply pool-based active learning to massive high-dimensional data, especially when the unlabeled set is much larger than the labeled set, we propose the APRAL and MLP strategies, which dramatically reduce the computation required for active learning while keeping the model's predictive power roughly the same. In APRAL, we avoid unnecessary data re-ranking under a given unlabeled data selection criterion. To further improve efficiency, MLP organizes the unlabeled data into a multi-layer pool based on a dimensionality reduction technique, so that the data whose labels are most valuable to know are more likely to be stored in the top layers. With the APRAL and MLP strategies, the active learning computation time is reduced by about 83% compared with traditional active learning methods, while model effectiveness is maintained.
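For orientation, the traditional pool-based loop that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration only, not the APRAL or MLP strategy, whose details the abstract does not give. The nearest-centroid classifier, the margin-based selection criterion, and every function name here (`fit`, `uncertainty`, `active_learn`, `oracle`) are assumptions chosen to keep the example self-contained, and the seed set is assumed to contain at least two classes.

```python
import random

def centroid(points):
    """Mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(labeled):
    """Toy nearest-centroid model (an assumption): one centroid per class."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}

def uncertainty(model, x):
    """Hypothetical criterion: a smaller gap between the two nearest
    centroids gives a higher (less negative) uncertainty score."""
    d = sorted(sq_dist(x, c) for c in model.values())
    return -(d[1] - d[0])

def active_learn(labeled, pool, oracle, rounds, batch_size):
    """Baseline loop: retrain, re-rank the entire pool, query the top batch."""
    labeled, pool = list(labeled), list(pool)  # work on copies
    queries = 0
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labeled)
        # The full pool is re-scored and re-sorted in every round; this is
        # the per-round cost that grows with the size of the unlabeled set.
        pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)
        queries += len(batch)
    return fit(labeled), queries

def oracle(x):
    """Hypothetical labeler standing in for a human annotator."""
    return int(x[0] + x[1] > 0)

rng = random.Random(0)
pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
seed = [([-0.8, -0.8], 0), ([0.8, 0.8], 1)]
model, n_queries = active_learn(seed, pool, oracle, rounds=5, batch_size=10)
print(n_queries, "labels requested out of", len(pool), "pool items")
```

Only 50 of the 500 pool items are sent to the oracle in this run, but all remaining items are re-scored in each of the five rounds. That repeated ranking over a pool much larger than the labeled set is the computation the abstract says the proposed strategies aim to reduce.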

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Zoran Obradovic, Ricardo Baeza-Yates, Jeremy Kepner, Raghunath Nambiar, Chonggang Wang, Masashi Toyoda, Toyotaro Suzumura, Xiaohua Hu, Alfredo Cuzzocrea, Ricardo Baeza-Yates, Jian Tang, Hui Zang, Jian-Yun Nie, Rumi Ghosh
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 9781538627143
State: Published - 12 Jan 2018
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017


Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country: United States


  • active learning
  • high dimensionality
  • large-scale data
  • pool-based sampling
