TY - JOUR
T1 - A 41.3/26.7 pJ per Neuron Weight RBM Processor Supporting On-Chip Learning/Inference for IoT Applications
AU - Tsai, Chang Hung
AU - Yu, Wan Ju
AU - Wong, Wing Hung
AU - Lee, Chen-Yi
PY - 2017/10/1
Y1 - 2017/10/1
N2 - An energy-efficient restricted Boltzmann machine (RBM) processor (RBM-P) supporting on-chip learning and inference is proposed for machine learning and Internet of Things (IoT) applications in this paper. To train a neural network (NN) model, the RBM structure is applied to supervised and unsupervised learning, and a multi-layer NN can be constructed and initialized by stacking multiple RBMs. Featuring NN model reduction for external memory bandwidth saving, low power neuron binarizer (LPNB) with dynamic clock gating and area-efficient NN-like activation function calculators for power reduction, user-defined connection map (UDCM) for both computation time and bandwidth saving, and early stopping (ES) mechanism for learning process, the proposed system integrates 32 RBM cores with maximal 4k neurons per layer and 128 candidates per sample for machine learning applications. Implemented in 65nm CMOS technology, the proposed RBM-P chip costs 2.2 M gates and 128 kB SRAM with 8.8 mm2 area. Operated at 1.2 V and 210 MHz, this chip achieves 7.53G neuron weights (NWs) and 11.63G NWs per second with 41.3 and 26.7 pJ per NW for learning and inference, respectively.
AB - An energy-efficient restricted Boltzmann machine (RBM) processor (RBM-P) supporting on-chip learning and inference is proposed for machine learning and Internet of Things (IoT) applications in this paper. To train a neural network (NN) model, the RBM structure is applied to supervised and unsupervised learning, and a multi-layer NN can be constructed and initialized by stacking multiple RBMs. Featuring NN model reduction for external memory bandwidth saving, low power neuron binarizer (LPNB) with dynamic clock gating and area-efficient NN-like activation function calculators for power reduction, user-defined connection map (UDCM) for both computation time and bandwidth saving, and early stopping (ES) mechanism for learning process, the proposed system integrates 32 RBM cores with maximal 4k neurons per layer and 128 candidates per sample for machine learning applications. Implemented in 65nm CMOS technology, the proposed RBM-P chip costs 2.2 M gates and 128 kB SRAM with 8.8 mm2 area. Operated at 1.2 V and 210 MHz, this chip achieves 7.53G neuron weights (NWs) and 11.63G NWs per second with 41.3 and 26.7 pJ per NW for learning and inference, respectively.
KW - Low-power design
KW - machine learning
KW - memory bandwidth reduction
KW - non-linear functions
KW - restricted Boltzmann machine (RBM)
UR - http://www.scopus.com/inward/record.url?scp=85023161329&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2017.2715171
DO - 10.1109/JSSC.2017.2715171
M3 - Article
AN - SCOPUS:85023161329
VL - 52
SP - 2601
EP - 2612
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
SN - 0018-9200
IS - 10
M1 - 7967660
ER -