
The Complete Guide to OpenPose in 2024



This article is a guide to the OpenPose library for real-time multi-person keypoint detection. We review its architecture and features and compare it with other human pose estimation methods.

In particular, we will cover:

  • Pose Estimation in Computer Vision
  • What is OpenPose? How does it work?
  • How to Use OpenPose? (research, commercial)
  • OpenPose Alternatives
  • What’s Next

About us: Viso.ai provides the leading Computer Vision Platform, Viso Suite. Global organizations use it to develop, deploy, and scale all computer vision applications in one place. Get a personal demo.


Viso Suite – End-to-End Computer Vision and No-Code for Computer Vision Teams


The video shows the output of a pose estimation application built using Viso Suite:


In the era of AI, more and more computer vision and machine learning (ML) applications require 2D human pose estimation as an input, including downstream tasks in image recognition and AI-based video analytics. Single- and multi-person pose estimation are computer vision tasks that matter across domains such as action recognition, security, sports, and more.

Pose estimation is still a relatively new computer vision technology, but in recent years its accuracy has improved dramatically with the emergence of Convolutional Neural Networks (CNNs).


Pose Estimation with OpenPose

A human pose skeleton denotes the orientation of an individual in a particular format. Fundamentally, it is a set of connected data points describing a person's pose. Each data point in the skeleton is also called a part, coordinate, or keypoint. A relevant connection between two coordinates is known as a limb or pair. Note, however, that not all combinations of data points give rise to relevant pairs.
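
A skeleton of this kind can be sketched in code as keypoints plus the pairs connecting them. The part names and pairs below are illustrative (a small COCO-style subset), not the full OpenPose keypoint format:

```python
# Illustrative pairs of connected body parts (a subset, not the full format)
PAIRS = [
    ("neck", "right_shoulder"), ("neck", "left_shoulder"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
]

def limbs(keypoints, pairs=PAIRS):
    """Return limb segments whose two endpoint keypoints were both detected.

    keypoints: dict mapping part name -> (x, y) pixel coordinate, or None.
    """
    return [
        (a, b, keypoints[a], keypoints[b])
        for a, b in pairs
        if keypoints.get(a) is not None and keypoints.get(b) is not None
    ]

pose = {"neck": (50, 20), "right_shoulder": (40, 25), "right_elbow": None}
print(len(limbs(pose)))  # 1: only neck/right_shoulder has both endpoints
```

Dropping pairs whose endpoints are missing mirrors the point above: not every combination of keypoints forms a valid limb.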

Human Pose Keypoints

Knowing a person's orientation paves the road for many real-life applications, many of them in sports and fitness. Early techniques estimated the pose of a single individual in an image containing only one person. OpenPose provides a more efficient and robust approach that applies pose estimation to images of crowded scenes.


Keypoint estimation of human pose with OpenPose – Source


What is OpenPose?

OpenPose is a real-time multi-person human pose detection library that was the first to jointly detect human body, foot, hand, and facial keypoints in single images. In total, OpenPose is capable of detecting 135 keypoints.

The method won the COCO 2016 Keypoints Challenge and is popular for its quality and robustness in multi-person settings.


Keypoints detected by OpenPose on the COCO dataset.
Who created OpenPose?

Ginés Hidalgo, Yaser Sheikh, Zhe Cao, Yaadhav Raaj, Tomas Simon, Hanbyul Joo, and Shih-En Wei created the OpenPose technique. It is maintained by Yaadhav Raaj and Ginés Hidalgo.

What are the features of OpenPose?

The OpenPose human pose detection library has many features; some of the most notable are:

  • Real-time 3D single-person keypoint detections
    • 3D triangulation with multiple camera views
    • Flir camera compatibility
  • Real-time 2D multi-person keypoint detections
    • 15-, 18-, or 25-keypoint body/foot keypoint estimation
    • 21 hand keypoint estimation
    • 70 face keypoint estimation
  • Single-person tracking for speeding up the detection and visual smoothing
  • Calibration toolbox for the estimation of extrinsic, intrinsic, and distortion camera parameters
Costs of OpenPose for commercial purposes

OpenPose is released under a license that allows free use and redistribution for non-commercial purposes. Commercial use (a non-exclusive commercial license) requires a non-refundable annual fee of USD 25,000.


How to Use OpenPose

Lightweight OpenPose

Pose Estimation algorithms usually require significant computational resources and are based on heavy models with large model sizes. This makes them unsuitable for real-time applications (video analytics) and deployment on resource-constrained hardware (edge devices in edge computing). Hence, there is a need for lightweight real-time human pose estimators deployable to devices to perform on-device edge machine learning.

Lightweight OpenPose is a heavily optimized implementation of OpenPose that performs real-time inference on the CPU with minimal accuracy loss. It detects a skeleton containing keypoints and the connections between them to determine the pose of every single person in the image. The pose may include multiple keypoints, including ankles, ears, knees, eyes, hips, nose, wrists, neck, elbows, and shoulders.

Hardware and Camera

OpenPose accepts input from images, videos, webcams, Flir/Point Grey cameras, IP cameras (CCTV), and custom input sources such as depth or stereo cameras.

Hardware-wise, OpenPose supports Nvidia GPUs (CUDA), AMD GPUs (OpenCL), and CPU-only computation. It runs on Ubuntu, Windows, macOS, and the Nvidia Jetson TX2.

How to use OpenPose?

The fastest and easiest way to use OpenPose is using a platform like Viso Suite. This end-to-end solution provides everything needed to build, deploy, and scale OpenPose applications. Using Viso Suite, you can easily apply OpenPose using common cameras (CCTV, IP, Webcams, etc.), implement multi-camera systems, and compute workloads on different AI hardware at the Edge or in the Cloud (Get the Whitepaper here).

  • Find the official installation guide of OpenPose here.
  • Find tutorials on the Lightweight implementation version here.


How Does OpenPose Work?

OpenPose first extracts features from the image using the first few layers of a CNN (the first layers of a VGG-19 network in the original paper). The extracted features are then fed into two parallel branches of convolutional layers. The first branch predicts a set of 18 confidence maps, each representing a specific part of the human pose skeleton. The second branch predicts a set of 38 Part Affinity Fields (PAFs), which encode the degree of association between parts.
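
To make the confidence-map branch concrete, each map can be turned into a list of body-part candidates via a local-maximum search. This is an assumed simplification for illustration, not the library's actual code:

```python
import numpy as np

def find_peaks(conf_map, threshold=0.1):
    """Return (row, col) positions that are 4-neighborhood local maxima."""
    padded = np.pad(conf_map, 1, mode="constant")
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center > threshold)
        & (center >= padded[:-2, 1:-1]) & (center >= padded[2:, 1:-1])
        & (center >= padded[1:-1, :-2]) & (center >= padded[1:-1, 2:])
    )
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(is_peak))]

# A synthetic confidence map with two responses -> two part candidates
conf = np.zeros((10, 10))
conf[2, 3] = 0.9
conf[7, 6] = 0.8
print(find_peaks(conf))  # [(2, 3), (7, 6)]
```

Running this per confidence map yields the candidate locations for each body part, which the later matching stages then associate into people.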

Later stages refine the predictions made by the two branches. Using the confidence maps, bipartite graphs are formed between pairs of parts; using the PAF values, weaker links in those graphs are pruned. After these steps, the model estimates human pose skeletons and assigns one to every person in the picture.
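
The PAF-based pruning can likewise be sketched: a candidate limb between two part candidates is scored by sampling the PAF along the connecting segment and averaging its dot product with the limb's direction, and low-scoring links are the ones pruned from the bipartite graph. This is an assumed simplification of the paper's line-integral formulation:

```python
import numpy as np

def paf_score(paf, a, b, n_samples=10):
    """paf: (H, W, 2) vector field; a, b: (x, y) candidate part locations."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    v = b - a
    norm = np.linalg.norm(v)
    if norm < 1e-8:
        return 0.0
    v /= norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (a + t * (b - a)).round().astype(int)
        score += paf[y, x] @ v  # dot product with the unit limb direction
    return score / n_samples

# Synthetic PAF pointing in +x everywhere: a horizontal limb scores ~1,
# a vertical limb scores ~0 and would be pruned.
paf = np.zeros((20, 20, 2))
paf[..., 0] = 1.0
print(round(paf_score(paf, (2, 5), (12, 5)), 2))  # 1.0
print(round(paf_score(paf, (5, 2), (5, 12)), 2))  # 0.0
```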


How OpenPose Works
How OpenPose Works – Source

Overview of the Pipeline

The OpenPose Pipeline consists of multiple sequential tasks:

  • a) Acquisition of the entire image as input (image or video frame)
  • b) Two-branch CNNs jointly predict confidence maps for body part detection
  • c) Estimate the Part Affinity Fields (PAF) for parts association
  • d) Set of bipartite matchings to associate body parts candidates
  • e) Assemble them into full-body poses for all people in the image
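
Step e) can be sketched as a simple grouping: each matched limb connects two detected keypoints, and keypoints linked by limbs, directly or transitively, are assembled into one person's pose (an illustrative simplification):

```python
def assemble_poses(matches):
    """matches: list of (part_id_a, part_id_b) keypoint-index pairs.

    Returns one set of keypoint indices per person.
    """
    poses = []
    for a, b in matches:
        # Find any partial poses this limb touches and merge them
        touching = [p for p in poses if a in p or b in p]
        merged = {a, b}
        for p in touching:
            merged |= p
            poses.remove(p)
        poses.append(merged)
    return poses

# Two people: limbs (0-1) and (1-2) share keypoint 1, so they form one
# person; limb (3-4) belongs to another.
print(assemble_poses([(0, 1), (1, 2), (3, 4)]))  # [{0, 1, 2}, {3, 4}]
```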


Hand tracking and gesture recognition with computer vision
Pose estimation detects the human body, hands, or other limbs.


OpenPose vs. Alpha-Pose vs. Mask R-CNN

OpenPose is one of the best-known bottom-up approaches for real-time multi-person body pose estimation, thanks in part to its well-documented GitHub implementation. Like other bottom-up approaches, OpenPose first detects parts (keypoints) belonging to every person in the image and then assigns those keypoints to specific individuals.

OpenPose vs. Alpha-Pose

RMPE, or Alpha-Pose, is a well-known top-down pose estimation technique. Its creators point out that top-down methods depend on the accuracy of the person detector, since pose estimation is performed on the region where the person was detected. Consequently, localization errors and duplicate bounding box predictions can cause the pose extraction algorithm to perform sub-optimally.

To solve this issue, the creators introduced a Symmetric Spatial Transformer Network (SSTN) to extract a high-quality person region from an inaccurate bounding box. A Single Person Pose Estimator (SPPE) is applied to this extracted region to estimate the person's pose skeleton. A Spatial De-Transformer Network (SDTN) then remaps the estimated pose back to the original image coordinate system. The authors also introduced a parametric pose Non-Maximum Suppression (NMS) method to eliminate redundant pose detections.

In addition, a Pose Guided Proposals Generator was proposed to augment the training samples and thereby better train the SPPE and SSTN networks. The most important feature of Alpha-Pose is that it can be extended to any combination of a person detection algorithm and an SPPE.

OpenPose vs. Mask R-CNN

Last but not least, Mask R-CNN is a popular architecture for performing semantic and instance segmentation. It predicts both the bounding box locations of the different objects in the image and a mask that segments the objects semantically (image segmentation). The Mask R-CNN architecture can easily be extended for human pose estimation.

It first extracts feature maps from the image using a Convolutional Neural Network (CNN). A Region Proposal Network (RPN) uses these feature maps to generate bounding box candidates for the presence of objects. Each bounding box candidate selects a region of the feature map. Since the candidates can have different sizes, the RoIAlign layer resamples the extracted features so that they all have a fixed, uniform size.
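
The role of RoIAlign, producing fixed-size features from variable-size boxes, can be illustrated with a simplified stand-in (the `roi_pool` function below is a coarse max-pooling sketch; real RoIAlign uses bilinear interpolation):

```python
import numpy as np

def roi_pool(feat, box, out_size=2):
    """feat: (H, W) map; box: (x0, y0, x1, y1); returns (out_size, out_size)."""
    x0, y0, x1, y1 = box
    region = feat[y0:y1, x0:x1]
    h, w = region.shape
    out = np.zeros((out_size, out_size))
    # Split the region into an out_size x out_size grid and max-pool each bin
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

feat = np.arange(36, dtype=float).reshape(6, 6)
print(roi_pool(feat, (0, 0, 4, 4)).shape)  # (2, 2)
print(roi_pool(feat, (1, 1, 6, 5)).shape)  # a different box size -> (2, 2)
```

Boxes of any size are mapped to the same output shape, which is what lets the downstream prediction branches use fixed-size inputs.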

The extracted features then pass into parallel CNN branches for the final prediction of bounding boxes and segmentation masks. The object detection branch can be trained to localize people; by combining a person's location with their set of keypoints, we obtain the human pose skeleton for every individual in the image.

This technique closely resembles the top-down approach, except that the person detection step is performed alongside the part detection step: the keypoint detection and person detection phases are effectively independent of each other.


Mask R-CNN Architecture


The Bottom Line

Real-time multi-person pose estimation is an important element in enabling machines to understand humans and their interactions. OpenPose is one of the most popular detection libraries for pose estimation and is capable of real-time multi-person pose analysis.

The lightweight variant makes it possible to apply OpenPose in Edge AI applications and to deploy it for on-device Edge ML Inference.

To develop, deploy, maintain, and scale pose estimation applications effectively, you need a wide range of tools. The Viso Suite platform provides all those capabilities in one end-to-end solution. Get in touch and request a demo for your organization.


What’s next for OpenPose?

OpenPose represents a significant advancement in artificial intelligence and computer vision, and it paves the way for future research and applications that could transform how we engage with technology.
