State of the Art Pose Estimation: The Ultimate Overview

Computer vision is a field of artificial intelligence that aims to model human vision. In simple terms, it is about enabling machines to understand what can be seen in images and video. Applications of computer vision include image classification, image segmentation, face detection, object detection, and pose estimation.

In this article, we will be looking into pose estimation, an interesting and relatively new aspect of computer vision.

Pose Estimation Basics

Pose estimation predicts and tracks the location of a person or object from its pose and orientation. In other words, it allows programs to estimate the spatial positions (“poses”) of a body in an image or video. In general, most pose estimators are two-step frameworks that first detect human bounding boxes and then estimate the pose within each box.
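
The two-step idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular library: `fake_detector` and `fake_pose_model` are hypothetical stand-ins for trained networks, and the `(x, y, width, height)` box layout is an assumption made for the example. The sketch shows the one piece of bookkeeping every two-step pipeline needs, which is shifting keypoints predicted inside a box back into full-image coordinates.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]             # assumed layout: (x, y, width, height) in pixels
Keypoints = Dict[str, Tuple[float, float]]  # joint name -> (x, y)


def to_image_coords(box: Box, local: Keypoints) -> Keypoints:
    """Shift keypoints predicted relative to a box back into image coordinates."""
    x0, y0, _, _ = box
    return {name: (x + x0, y + y0) for name, (x, y) in local.items()}


def estimate_poses(image, detect_people: Callable, estimate_in_box: Callable) -> List[Keypoints]:
    """Step 1: detect person boxes. Step 2: estimate one pose inside each box."""
    poses = []
    for box in detect_people(image):
        local = estimate_in_box(image, box)  # keypoints relative to the box origin
        poses.append(to_image_coords(box, local))
    return poses


# Hypothetical stand-ins so the sketch runs; a real system would call trained models.
def fake_detector(image) -> List[Box]:
    return [(10, 20, 50, 100), (200, 30, 40, 90)]


def fake_pose_model(image, box: Box) -> Keypoints:
    _, _, w, h = box
    return {"head": (w / 2, 0.0), "left_ankle": (w / 4, float(h))}


poses = estimate_poses(None, fake_detector, fake_pose_model)
print(poses[0]["head"])  # (35.0, 20.0)
```

Passing the detector and the pose model in as functions keeps the two steps independent, so either one could be swapped out without touching the other.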

Pose estimation works by finding keypoints of a person or object. For a person, the keypoints are joints such as the elbows, knees, and wrists. There are two types of pose estimation: single-pose and multi-pose. Single-pose estimation estimates the pose of a single object in a given scene, while multi-pose estimation detects the poses of multiple objects.
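
To make the idea of keypoints concrete, here is a small Python sketch of how a pose might be represented. The article does not prescribe a format, so this assumes the 17 body keypoints of the widely used COCO keypoint format; the `Keypoint` class, the `visible_joints` helper, and the 0.5 confidence threshold are illustrative choices, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, List

# The 17 body keypoints of the COCO keypoint format (assumed here for illustration).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


@dataclass
class Keypoint:
    x: float
    y: float
    score: float  # model confidence, assumed to lie in [0, 1]


Pose = Dict[str, Keypoint]  # one person: joint name -> keypoint


def visible_joints(pose: Pose, min_score: float = 0.5) -> List[str]:
    """Names of the joints the model is reasonably confident about."""
    return [n for n in COCO_KEYPOINTS if n in pose and pose[n].score >= min_score]


# Single-pose output: one Pose. Multi-pose output: a list of Poses.
single_pose: Pose = {
    "left_elbow": Keypoint(120.0, 210.0, 0.92),
    "left_wrist": Keypoint(140.0, 260.0, 0.31),
}
multi_pose: List[Pose] = [single_pose, {"nose": Keypoint(300.0, 80.0, 0.88)}]

print(visible_joints(single_pose))  # ['left_elbow']
```

In this representation the only difference between single-pose and multi-pose output is whether the result is one dictionary or a list of them.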

Existing Pose Estimation Architectures

Because pose estimation is a widely applicable computer vision technique, you can build a custom pose estimator on top of existing architectures. Architectures to get you started include:

  1. High-Resolution Net (HRNet) is a neural network for human pose estimation. It is used in image processing problems to find keypoints (joints) of a specific object or person in an image. Most existing methods recover high-resolution representations from the low-resolution representations produced by a high-to-low resolution network. HRNet instead maintains high-resolution representations throughout the network when estimating poses, which is its main advantage over other architectures. For instance, this architecture is helpful for detecting human posture in televised sports.
  2. OpenPose is one of the most popular bottom-up approaches for multi-person human pose estimation. It is an open-source, real-time multi-person detection system with high accuracy in detecting body, foot, hand, and facial keypoints. An advantage of OpenPose is that its API gives users the flexibility of selecting input from camera feeds, webcams, and other sources, which matters most for embedded system applications (for instance, integration with CCTV cameras and systems). It supports different hardware architectures, such as CUDA GPUs, OpenCL GPUs, or CPU-only devices.
  3. DeepCut is another popular bottom-up approach for multi-person human pose estimation. The model works by detecting the number of people in an image and then predicting the joint locations for each person. DeepCut can be applied to videos or images containing multiple people or objects, for example footage of football or basketball.
  4. Regional Multi-Person Pose Estimation (AlphaPose) is a popular top-down method of pose estimation. It is useful for detecting poses when the detected human bounding boxes are inaccurate. That is, it is built to estimate human poses well even when the boxes from the person detector are imperfect. This architecture can detect both single and multiple people in images or video feeds.
  5. DeepPose: A human pose estimator that uses deep neural networks (DNNs) to predict joint locations. The DNN reasons about all joints together and is built from convolution, pooling, and fully connected layers.
  6. PoseNet: A pose estimation architecture built on TensorFlow.js that runs on lightweight devices, such as in the browser or on a mobile device. It can estimate either a single pose or multiple poses.
  7. DensePose: A pose estimation technique that aims to map all human pixels of an RGB image to the 3D surface of the human body. It can also be used for both single and multiple pose estimation problems.
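
Several of these architectures, HRNet and OpenPose among them, do not output joint coordinates directly. They produce one heatmap per keypoint, which is then decoded into a location. The Python sketch below shows the simplest possible decoding, taking the highest-scoring cell, under the assumption that a heatmap is a plain list of rows. Real implementations typically add sub-pixel refinement, and bottom-up methods also have to group keypoints into people, neither of which is shown here. The function names are illustrative.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # heatmap[row][col] = score that the joint lies in that cell


def decode_heatmap(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-scoring cell in one joint's heatmap."""
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best


def scale_to_image(x: int, y: int,
                   heatmap_size: Tuple[int, int],
                   image_size: Tuple[int, int]) -> Tuple[float, float]:
    """Map a heatmap cell to pixel coordinates; sizes are given as (width, height)."""
    hm_w, hm_h = heatmap_size
    img_w, img_h = image_size
    return (x * img_w / hm_w, y * img_h / hm_h)


# A toy 4x3 heatmap (4 columns, 3 rows) for a single joint.
elbow_heatmap = [
    [0.0, 0.1, 0.2, 0.0],
    [0.1, 0.3, 0.9, 0.2],
    [0.0, 0.1, 0.3, 0.1],
]

x, y, score = decode_heatmap(elbow_heatmap)
print(x, y, score)                                   # 2 1 0.9
print(scale_to_image(x, y, (4, 3), (400, 300)))      # (200.0, 100.0)
```

Heatmaps are usually smaller than the input image, which is why the decoded cell has to be scaled back up before it can be drawn on the original picture.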

Potential Use Cases of Pose Estimation

Pose estimation has applications in many fields, some of which are listed below:

  1. Human Activity Estimation
  2. Motion Transfer and Augmented Reality
  3. Training Robots
  4. Motion Tracking for Consoles
  5. Human Fall Detection

Human Activity Estimation: A rather obvious application of pose estimation is tracking and measuring human activity and movement. Architectures like DensePose, PoseNet, or OpenPose are often used for activity, gesture, or gait recognition. Examples of human activity tracking with pose estimation include:

  • Applications to detect sitting gestures
  • Full-body and sign language communication (for example, traffic police signals)
  • Applications to detect whether a person has fallen down or is sick
  • Applications to support the analysis of football, basketball, and other sports
  • Applications to analyze dance techniques (for example, in ballet)
  • Applications for posture learning in bodywork and fitness
  • Applications in security and surveillance enhancement

Figure: Example of a Pose Estimation Use Case
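
Many of the activity examples above, such as posture, sports, and dance analysis, come down to measuring angles between body parts. As a rough illustration, the Python sketch below computes the angle at a joint from three 2D keypoints. The function name and the sample coordinates are made up for the example, and a real application would also need to handle low-confidence or missing keypoints.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide with the joint")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Illustrative coordinates: shoulder, elbow, wrist.
bent_arm = joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))      # right angle at the elbow
straight_arm = joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))  # fully extended arm

print(round(bent_arm), round(straight_arm))  # 90 180
```

Tracking such an angle over time, for example the knee angle during a squat, is one simple way to turn raw keypoints into feedback about a movement.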

Augmented Reality and Virtual Reality: Pose estimation combined with augmented and virtual reality applications gives users a better online experience. For instance, users can learn to play games like tennis from virtual tutors whose movements are represented as poses. Pose estimators can also be interfaced with augmented reality-based applications. For instance, the United States Army is experimenting with augmented reality programs for use in combat. These programs aim to help soldiers distinguish between enemies and friendly troops, as well as to improve night vision.

Training Robots: A typical use case of pose estimators is teaching robots certain crafts. Instead of manually programming robots to follow trajectories, robots can learn actions and movements by following the posture or appearance of a tutor.

Motion Tracking for Consoles: Pose estimation is also used in gaming, where human players generate poses that are injected into the game environment for an interactive experience. For instance, Microsoft’s Kinect used 3D pose estimation (based on IR sensor data) to track the motion of human players and render their actions as character movements in the game environment.
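
When poses are tracked frame by frame, the raw keypoints tend to jitter, so tracking pipelines often smooth them over time before driving a character. The Python sketch below uses an exponential moving average for this. It is a generic illustration and not a description of how Kinect works; the function name and the `alpha` parameter are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]  # joint name -> (x, y) for one frame


def smooth_poses(frames: Iterable[Pose], alpha: float = 0.5) -> List[Pose]:
    """Exponentially smooth each joint over time.

    alpha close to 1 follows the raw data closely; alpha close to 0 smooths heavily.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Pose] = []
    previous: Pose = {}
    for pose in frames:
        current: Pose = {}
        for name, (x, y) in pose.items():
            if name in previous:
                px, py = previous[name]
                current[name] = (alpha * x + (1 - alpha) * px,
                                 alpha * y + (1 - alpha) * py)
            else:
                current[name] = (x, y)  # first sighting: nothing to blend with
        smoothed.append(current)
        previous = {**previous, **current}  # remember joints missing from this frame
    return smoothed


frames = [{"wrist": (0.0, 0.0)}, {"wrist": (10.0, 0.0)}, {"wrist": (10.0, 4.0)}]
print(smooth_poses(frames)[-1])  # {'wrist': (7.5, 2.0)}
```

The trade-off is latency: stronger smoothing gives steadier motion but makes the on-screen character lag further behind the player.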

What’s Next?

Pose estimation is a fascinating aspect of computer vision that can be applied in multiple fields, including technology, healthcare, and business. Aside from its prominence in modeling human characters with deep neural networks that learn various keypoints, it is also used in security and surveillance systems.
