
Make a People Counting System in Less Than 10 Minutes


Computer vision is best suited to artificial intelligence tasks that would otherwise rely on human eyesight. People counting, also known as crowd counting, is therefore a common application of computer vision.

This use case is often applied where understanding crowd characteristics and behavior is necessary. For example, count data can show how many people enter a store at peak business hours. This data can inform marketing campaigns that give customers an extra push to enter the store at certain hours, or help optimize operations, such as having enough staff on hand when the store is busy.

In this article, I will:

  • Build a people-counting solution using the no-code platform Viso Suite
  • Share a step-by-step how-to
  • Review the Viso Builder interface in an easy-to-follow tutorial

People counting technology can track crowd movement, estimate foot traffic, or calculate the population density of a crowd. The statistics of people counted provide useful information for event detection or strategy planning for a moderated area.



People Counting with Computer Vision and Deep Learning

Person Detection and Tracking

The people counter I will build in this tutorial is based on object detection, which uses neural networks to detect and count people. To create an object counter, we combine object detection methods with a region of interest, which focuses on a specific image region, and a counting logic that aggregates the detected classes (“Person”) output by the algorithm.

We will deploy a pre-trained computer vision algorithm to a device. The algorithm processes images fetched from a connected camera or video source, such as a CCTV camera, IP camera, USB camera, webcam, or even a video file played in a loop to simulate a camera stream.

Region-of-Interest and Counting Logic

A common way to implement counting logic is to use a region of interest (the coordinates of a specific section within the image) together with a crossing line. The deep learning algorithms are applied only within the region of interest, which enables significant performance gains (smaller images, less complex background, etc.).
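As a rough illustration of why a region of interest reduces the work, the following minimal sketch crops a frame to a rectangle and maps a detection box from the crop back to full-frame coordinates. The frame representation (nested lists), the ROI coordinates, and the function names are assumptions made for this example; they are not part of Viso Suite.

```python
from typing import List, Tuple

Frame = List[List[int]]           # rows of pixel values (grayscale for brevity)
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive


def crop_to_roi(frame: Frame, roi: Box) -> Frame:
    """Return only the pixels that lie inside the ROI rectangle."""
    x0, y0, x1, y1 = roi
    return [row[x0:x1] for row in frame[y0:y1]]


def to_frame_coords(box: Box, roi: Box) -> Box:
    """Translate a box found inside the crop back to full-frame coordinates."""
    x0, y0, _, _ = roi
    bx0, by0, bx1, by1 = box
    return (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)


frame = [[0] * 1920 for _ in range(1080)]   # dummy 1920 x 1080 frame
roi = (600, 300, 1400, 900)                  # hypothetical ROI rectangle
crop = crop_to_roi(frame, roi)
crop_size = (len(crop[0]), len(crop))        # (width, height) of the crop

# A box reported relative to the crop, shifted back into the full frame.
box_in_frame = to_frame_coords((10, 20, 110, 220), roi)
```

In this example the detector would only see an 800 x 600 crop instead of the full 1920 x 1080 frame, which is where the performance gain comes from.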

Object detection first detects the objects (here, the class “person”), and object tracking then follows the path of each detected object. The counting system counts tracked objects that cross the predefined crossing line, which could represent, for example, the entrance of a retail store.
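The crossing-line logic can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used by Viso Suite: it assumes a horizontal line, a tracker that already supplies a stable ID and a centroid per person, and hypothetical names such as `LineCrossingCounter`.

```python
from collections import defaultdict
from typing import Dict, Tuple

Point = Tuple[float, float]


class LineCrossingCounter:
    """Counts tracked objects that cross a horizontal line at y = line_y."""

    def __init__(self, line_y: float) -> None:
        self.line_y = line_y
        self.last_side: Dict[int, int] = {}   # track id -> -1 (above) or +1 (below)
        self.counts = defaultdict(int)        # "down" / "up" -> number of crossings

    def update(self, track_id: int, centroid: Point) -> None:
        """Record the latest centroid of a track and count a crossing if any."""
        _, y = centroid
        side = 1 if y > self.line_y else -1
        previous = self.last_side.get(track_id)
        if previous is not None and previous != side:
            self.counts["down" if side == 1 else "up"] += 1
        self.last_side[track_id] = side


counter = LineCrossingCounter(line_y=100)
tracks = {
    1: [(50, 80), (52, 95), (55, 110)],      # walks down across the line
    2: [(200, 130), (198, 105), (197, 90)],  # walks up across the line
    3: [(300, 40), (305, 60), (310, 70)],    # never reaches the line
}
for track_id, path in tracks.items():
    for centroid in path:
        counter.update(track_id, centroid)

counts = dict(counter.counts)
```

Comparing the side of the line between consecutive positions of the same track is what makes tracking necessary: detections alone cannot tell whether the same person has moved from one side to the other.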


Real-time people counting in a shopping mall with object detection, showing the box with the Region of Interest (ROI) and the crossing line.


Viso Suite provides all popular and state-of-the-art deep learning models as pre-built modules for use in the visual editor. You can use pre-trained neural networks that were trained to detect people and other classes on massive public image datasets, such as the MS COCO dataset.

Most pre-trained CNN models (YOLO, SSD MobileNet, etc.) are trained on frontal or side views, and only a few are trained on top-down views. However, research shows that these models are very robust and provide good results even when applied in top-down people counting applications.
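Because models trained on MS COCO detect many classes, a people counter only keeps the detections labeled as a person. The sketch below shows one way this filtering might look; the detection format (a dictionary with `label`, `score`, and `box`), the 0.5 confidence threshold, and the function name are assumptions for illustration only.

```python
from typing import Dict, List


def keep_people(detections: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep detections of class "person" at or above a confidence threshold."""
    return [
        d for d in detections
        if d["label"] == "person" and d["score"] >= min_score
    ]


# Hypothetical detector output for one frame.
detections = [
    {"label": "person", "score": 0.91, "box": (120, 40, 180, 200)},
    {"label": "person", "score": 0.32, "box": (300, 60, 340, 190)},   # too uncertain
    {"label": "handbag", "score": 0.77, "box": (150, 120, 170, 150)}, # other class
]
people = keep_people(detections)
```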


Connect Pre-Built Modules To Build the People Counting Application

For this tutorial, you need a Viso Suite account.

Logged into Viso Suite, I want to create my people counting system using pre-trained models and off-the-shelf tools. This will be done in the Viso Builder, a visual programming interface for building computer vision applications.

The Viso Builder makes it easy to add nodes to an application. I simply drag and drop the nodes listed below into the workspace grid. They are ready to be configured without any additional programming.

For the system to work correctly, the nodes need to be connected in the right way. The video source should send the input frames to the Region of Interest (ROI) node to be further processed. At the same time, the frames are sent to the Output Preview node, where the results are displayed for debugging. Hovering over the connection dots shows the output of each node, which makes it simple to choose the right connections.

  1. Video-Input: To get started, we need to configure the video source, that is, where the frames will come from. These settings tell my application whether to read the frames from an IP camera, USB camera, or video file. Capturing the frames from the right source is the first step before passing them to the next node.
  2. Region of Interest: In this step, I tell my application where inside the image the algorithm should be applied and where the counting should take place. This speeds up processing and defines the counting area for the system. Regions of interest can be configured as rectangles, polygons, or sections.
  3. Object Detection: From the pre-processed frames, I want to detect the objects of interest, in our case “people”. The Object Detection node lets me select from several pre-trained AI models for different hardware architectures, using available AI accelerators such as a VPU (e.g., Intel Neural Compute Stick 2) or TPU (Google Coral) out of the box.
  4. Object Count: In this step, we tell the system what to do with the detected persons, in our case “counting”. The Object Count node lets me define the analysis and aggregation interval for the detected objects and set the upload interval for sending the results to the cloud, where they can be picked up and displayed in a dashboard.
  5. Output Preview: The Video View node creates an endpoint for showing the processed video stream, including detection and counting results, in real time. While it will not be needed for my system in production, it is a good way to debug and tweak parameters while testing.
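To make the wiring of the five nodes concrete, the connections can be written down as a small dependency graph and sorted into a valid processing order. This is only a conceptual sketch: the node names are hypothetical labels rather than Viso Suite identifiers, the chain from ROI to detection to counting is inferred from the order of the steps above, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it receives data from.
inputs = {
    "video_input": set(),
    "region_of_interest": {"video_input"},
    "object_detection": {"region_of_interest"},
    "object_count": {"object_detection"},
    "output_preview": {"video_input"},   # frames are also sent to the preview
}

# A topological order guarantees every node runs after the nodes feeding it.
order = list(TopologicalSorter(inputs).static_order())
```

The exact position of `output_preview` in `order` may vary, since it only depends on the video input, but the video source always comes first and counting always follows detection.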


Computer Vision Smart City Use Cases for People Counting


Configure the People Counting Application

After the nodes are connected on the Viso Builder canvas, I want to configure each node to suit my needs. While the Region of Interest (ROI) part of the application needs to be configured in a separate configuration interface, all other nodes are configured directly in the Viso Builder. Saving my application creates the first version for deployment to an edge device of my choice.

  • Video-Input: My camera source will be a video file I previously uploaded to my Viso Suite workspace for testing purposes. The video shows a real-world setting and is freely available on the Internet. It simulates a real camera input and can easily be swapped for an IP or USB camera later.
    For frame width, height, and FPS, I will keep the original video settings: 1920 x 1080 px at 30 frames per second. If the width or height is changed, the video input node automatically resizes the frames; if the configured FPS differs from the input FPS, it skips or duplicates frames accordingly.
  • Region of Interest: First, I need to select the right type of ROI for my project. While section and polygon ROIs are better suited to projects where we need to detect something inside or outside a certain area, the rectangle ROI fits best for people counting. Second, I want to ensure the frame has the correct orientation, so that the “entrance” or “walking direction” runs top-down.
    This makes it easier for the algorithm to detect people with higher accuracy. It can be achieved by setting the correct angle while checking the changes in real time. Third, I draw my rectangle and adjust the crossing line, which triggers a count whenever a person crosses it (in either direction). These settings can be changed later to test different angles and configurations.
  • Object Detection: The Object Detection node lets me define the algorithm and hardware architecture for my system, and set the objects of interest. I will test with a rather lightweight model, SSD MobileNet V2, and use a Vision Processing Unit (VPU) as an accelerator for model inference: the Intel Movidius Myriad X, which powers the Intel Neural Compute Stick 2.
    I select “persons” as my target objects to be detected. For object tracking, I will use dlib’s implementation of the correlation tracking algorithm, which is available with a single click from the Object Detection node interface.
  • Object Count: Now, I want my collected data to be aggregated every 15 minutes and sent to the cloud once per hour. I set the analysis and upload intervals accordingly; the rest, including the secure connection to the managed cloud backend, is configured automatically when I deploy the application.
  • Output Preview (optional): The last step, useful for debugging, configures a local endpoint for checking the video output in real time. I set the desired URL, such as /video, and view the output preview using the device’s IP address and the URL I set in the Output Preview interface. I also check the “keep ratio” field to keep the original frame size in my Output Preview.
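The Object Count settings above (15-minute aggregation, hourly upload) can be illustrated with a short sketch. It assumes that each counted crossing is available as a timestamp; the function names and the payload structure are hypothetical and do not reflect how Viso Suite stores or transmits data.

```python
from collections import Counter
from datetime import datetime, timedelta

ANALYSIS_INTERVAL = timedelta(minutes=15)   # aggregation interval from the text
UPLOAD_INTERVAL = timedelta(hours=1)        # upload interval from the text


def floor_time(ts: datetime, interval: timedelta) -> datetime:
    """Round a timestamp down to the start of its interval within the day."""
    seconds = int(interval.total_seconds())
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % seconds)


def aggregate(events, interval=ANALYSIS_INTERVAL):
    """Count crossing events per analysis interval."""
    return Counter(floor_time(ts, interval) for ts in events)


def batch_for_upload(buckets, interval=UPLOAD_INTERVAL):
    """Group the per-interval counts into one batch per upload interval."""
    batches = {}
    for start, count in sorted(buckets.items()):
        batches.setdefault(floor_time(start, interval), []).append((start, count))
    return batches


# Hypothetical crossing events on one morning.
events = [
    datetime(2024, 1, 1, 9, 2), datetime(2024, 1, 1, 9, 14),
    datetime(2024, 1, 1, 9, 31), datetime(2024, 1, 1, 10, 5),
]
buckets = aggregate(events)
batches = batch_for_upload(buckets)
```

Here the two events at 9:02 and 9:14 fall into the same 15-minute bucket, and all buckets starting between 9:00 and 9:59 would be sent together in the 9:00 upload batch.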


People counting and crowd control with Viso Suite


Check the People Counting Result Preview

The people counting system is now ready to run. The program’s output can be reviewed with the Output Preview module, which was added to the workflow. The short extract of the video shows what to expect from the application we’ve just built.

Once the application is created successfully, it can be deployed to edge devices at the click of a button. Additionally, the data can be sent to a custom cloud dashboard, directly within Viso Suite.


