3D hand pose estimation from a single RGB image is important but challenging, largely because no sufficiently large hand pose dataset with accurate 3D hand keypoint annotations exists for training. In this work, we present an effective method for generating realistic hand poses and show that existing hand pose estimation algorithms can be greatly improved by augmenting the training data with the generated poses, which come naturally with ground-truth annotations. Specifically, we adopt an augmented reality simulator to synthesize hand poses with accurate 3D hand keypoint annotations. These synthesized poses, however, look unnatural and are not adequate for training. To produce more realistic hand poses, we blend each synthetic hand pose with a real background and develop tonality-alignment generative adversarial networks (TAGAN), which align the tonality and color distributions of synthetic hand poses with those of real backgrounds and thereby generate high-quality hand poses. TAGAN is evaluated on the RHP, STB, and CMU-PS hand pose datasets. With the aid of the synthesized poses, our method performs favorably against the state of the art in both 2D and 3D hand pose estimation.
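TAGAN itself is a learned adversarial model; as a rough, non-learned illustration of the underlying idea (aligning a synthetic hand's tonality and color distributions with a real background before compositing), the sketch below uses classical per-channel histogram matching followed by masked blending. All function names here are hypothetical and this is not the authors' implementation, only a minimal stand-in under those assumptions.

```python
import numpy as np

def match_channel(src, ref):
    """Map src values so their empirical CDF matches ref's
    (classical histogram matching for one color channel)."""
    src_vals, src_idx, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(ref.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / src.size
    ref_cdf = np.cumsum(ref_counts) / ref.size
    # For each source quantile, pick the reference value at that quantile.
    matched = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched[src_idx].reshape(src.shape)

def blend_on_background(hand, background, mask):
    """Color-align a synthetic hand (H, W, 3) to a real background,
    then composite it where mask (H, W, values in [0, 1]) is set."""
    aligned = np.stack(
        [match_channel(hand[..., c], background[..., c]) for c in range(3)],
        axis=-1)
    m = mask[..., None]
    return m * aligned + (1.0 - m) * background
```

In this simplified view, histogram matching plays the role that TAGAN's adversarial tonality-alignment loss plays in the paper: it pulls the synthetic foreground's color statistics toward the real image so the composite looks less unnatural.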
|State||Published - 2020|
|Event||30th British Machine Vision Conference, BMVC 2019 - Cardiff, United Kingdom|
|Duration||9 Sep 2019 → 12 Sep 2019|