A Robust Infant 2D Pose Estimation and Posture Detection System

Data Efficient Machine Learning: Infant Pose and Posture Estimation

Institute Reference: INV-21102

Background

Recent advances in computer vision have led to powerful human activity recognition models. However, models trained on large-scale adult activity datasets have limited success in estimating infant actions and behaviors because infants differ significantly from adults in body proportions, pose complexity, and types of activity. More specifically, publicly available large-scale human pose datasets are composed predominantly of scenes from sports, TV, and other daily activities performed by adults, and none of these datasets provides exemplars of the activities of young children or infants. Additionally, privacy and security considerations limit the availability of the infant images and videos needed to train a robust model from scratch.

Technology Overview

To mitigate these data limitations and move toward a robust infant behavior estimation and tracking system, this Northeastern University group has developed a two-stage, data-efficient infant pose and posture estimation framework built on both transfer learning and synthetic data augmentation.

This posture estimation approach makes the following contributions:

(1) Presents a fine-tuned domain-adapted infant pose (FiDIP) estimation model composed of two parts: a pose estimation sub-network that leverages transfer learning from a pre-trained adult pose estimation network, and a domain confusion sub-network that adapts the model to both real and synthetic infant datasets.

(2) Achieves a highly accurate and robust end-to-end posture-based-on-pose estimation pipeline, called FiDIP-Posture, that can be trained with limited posture labels because pose can be seen as a low-dimensional representation for posture learning.

(3) Builds a synthetic and real infant pose (SyRIP) dataset, which includes 700 fully labeled real infant images in diverse poses as well as 1000 synthetic infant images produced with two different human image generation methods.
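
Contribution (2) treats pose as a low-dimensional input for posture learning. The following is a minimal Python sketch of that idea, not the FiDIP-Posture implementation. The helper names `pose_to_feature` and `nearest_posture`, the bounding-box normalization, and the nearest-neighbor classifier are all assumptions introduced here for illustration; the listing does not describe the actual posture network.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]  # (x, y) in image pixels


def pose_to_feature(keypoints: Sequence[Keypoint]) -> List[float]:
    """Flatten 2D keypoints into a translation- and scale-normalized vector.

    Assumption: normalizing by the keypoint bounding box is one simple way
    to make the feature independent of where the infant is in the frame
    and how large the infant appears.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x_min, y_min = min(xs), min(ys)
    # Use the longer bounding-box side as the scale; fall back to 1.0
    # if all keypoints coincide, to avoid dividing by zero.
    scale = max(max(xs) - x_min, max(ys) - y_min) or 1.0
    feature: List[float] = []
    for x, y in keypoints:
        feature.append((x - x_min) / scale)
        feature.append((y - y_min) / scale)
    return feature


def nearest_posture(
    feature: Sequence[float],
    labeled_examples: Sequence[Tuple[str, Sequence[float]]],
) -> str:
    """Return the label of the closest labeled example (squared L2 distance).

    Hypothetical stand-in for a trained posture classifier: it only shows
    that a handful of labeled poses can already drive a posture decision.
    """
    if not labeled_examples:
        raise ValueError("at least one labeled example is required")
    best_label, best_dist = labeled_examples[0][0], float("inf")
    for label, reference in labeled_examples:
        dist = sum((a - b) ** 2 for a, b in zip(feature, reference))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because the feature vector has only two values per keypoint, a classifier on top of it has far fewer inputs than one operating on raw pixels, which is the intuition behind training with limited posture labels.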

Benefits

A reliable 2D pose estimation model adapted specifically to infants. To date, the computer vision community has made very few attempts to automate pose estimation and tracking in videos of infants.
Data augmentation through the generation of many synthetic infant images, which addresses the widespread problem of insufficient training data for infants.
Fully non-contact, unobtrusive data collection from a simple webcam or RGB camera, which makes the system inexpensive to deploy anywhere.
Quantitative and qualitative experiments showing that the FiDIP model consistently and significantly outperforms state-of-the-art 2D pose estimation methods.

When applied to a fully novel dataset of infants in their natural interactive environments, FiDIP-Posture achieves a mean average precision (mAP) as high as 86.3 in pose estimation and a classification accuracy of 77.9% in posture recognition.
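
For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed. It assumes the COCO-style convention in which pose mAP is built on object keypoint similarity (OKS); the listing does not state the exact evaluation protocol, so the function names, the per-keypoint `sigmas`, and the use of OKS are assumptions rather than a description of the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple

Keypoint = Tuple[float, float]


def object_keypoint_similarity(
    predicted: Sequence[Keypoint],
    ground_truth: Sequence[Keypoint],
    visible: Sequence[bool],
    area: float,
    sigmas: Sequence[float],
) -> float:
    """COCO-style OKS: mean of exp(-d^2 / (2 * area * k^2)) over visible
    keypoints, where d is the pixel error and k = 2 * sigma is a
    per-keypoint tolerance. Returns 0.0 if no keypoint is visible.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), is_visible, sigma in zip(
        predicted, ground_truth, visible, sigmas
    ):
        if not is_visible:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


def classification_accuracy(
    predicted: Sequence[str], actual: Sequence[str]
) -> float:
    """Fraction of posture predictions that match the reference labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Under this convention, a prediction counts as correct at a given OKS threshold, average precision is computed per threshold, and mAP averages those values over a range of thresholds; an accuracy of 77.9% corresponds to `classification_accuracy` returning 0.779.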