Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information

Wei Ta Chu, Zong Wei Pan

Research output: Contribution to journal › Article › peer-review


Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are in urgent demand. Some semi-supervised learning methods consider information from consecutive video frames, or from frames captured simultaneously from multiple viewpoints, but only independently of each other. In this article, we propose to jointly consider temporal information and multiview information in a unified adversarial learning framework. Given a 2D skeleton, a pose generator network estimates the corresponding 3D skeleton, and a camera network estimates the camera parameters. A critic network evaluates whether the estimated 3D skeleton is a plausible 3D human pose. Based on the estimated camera parameters, the estimated 3D skeleton can be re-projected into a 2D skeleton, which should be similar to the input 2D skeleton. Together, re-projection and adversarial learning enable self-supervision. We design the architectures of these networks to take 2D skeletons from multiple viewpoints in temporally consecutive frames. We verify that jointly considering the two types of information substantially improves performance.
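The re-projection consistency described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the camera model or the loss, so everything here is an assumption for illustration: a 2x3 weak-perspective camera matrix, a mean squared distance as the consistency measure, and the hypothetical function names `reproject` and `reprojection_loss`. In the actual method the 3D skeleton and camera parameters would come from the pose generator and camera networks; here they are hand-written toy values.

```python
from typing import List, Sequence, Tuple

Joint3D = Tuple[float, float, float]
Joint2D = Tuple[float, float]


def reproject(skeleton_3d: Sequence[Joint3D],
              camera: Sequence[Sequence[float]]) -> List[Joint2D]:
    """Project each 3D joint to 2D with a 2x3 camera matrix.

    The 2x3 (weak-perspective) camera is an assumption of this sketch,
    not a detail taken from the article.
    """
    projected = []
    for x, y, z in skeleton_3d:
        u = camera[0][0] * x + camera[0][1] * y + camera[0][2] * z
        v = camera[1][0] * x + camera[1][1] * y + camera[1][2] * z
        projected.append((u, v))
    return projected


def reprojection_loss(input_2d: Sequence[Joint2D],
                      skeleton_3d: Sequence[Joint3D],
                      camera: Sequence[Sequence[float]]) -> float:
    """Mean squared distance between input 2D joints and re-projected joints."""
    projected = reproject(skeleton_3d, camera)
    total = 0.0
    for (u, v), (u_hat, v_hat) in zip(input_2d, projected):
        total += (u - u_hat) ** 2 + (v - v_hat) ** 2
    return total / len(input_2d)


# Toy values standing in for network outputs (hypothetical, two joints only).
skeleton_3d = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)]
camera = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]  # orthographic case: depth is simply dropped
input_2d = [(0.0, 0.0), (1.0, 2.0)]

loss = reprojection_loss(input_2d, skeleton_3d, camera)
```

A loss of this kind needs no 3D labels, since it compares only the input 2D skeleton with the re-projected one. On its own it cannot rule out implausible depths, which is presumably why the abstract pairs it with a critic network.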

Original language: English
Article number: 9298758
Pages (from-to): 226974-226981
Number of pages: 8
Journal: IEEE Access
Publication status: Published - 2020

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
