Build a Person Detection System in 5 Minutes on Viso Suite


In this article, you will learn how to build a person detection system with deep learning in about 5 minutes using the no-code computer vision platform Viso Suite. Read how you can use the latest and best performing deep learning algorithms to create a person detector vision system that you can easily connect to other systems, visualize data in dashboards, or send alerts by email.

About us: Viso Suite is the end-to-end computer vision infrastructure for enterprises. By simplifying the machine learning lifecycle, businesses can start realizing value with Viso Suite in only three days. Learn more by booking a demo with our team.

Viso Suite is the Computer Vision Enterprise Platform


How to Detect People With Computer Vision

Computer vision uses AI technology to make computers see and imitate human vision. To do so, video streams from cameras are analyzed with machine learning (ML) algorithms. The state-of-the-art machine learning method is deep learning, which has recently brought great advances in the field of image recognition. Deep learning uses deep neural networks (deep means multiple connected layers) to perform image processing, similar to how the human brain employs connected neurons to understand visual information.

People or person detection is a typical computer vision task; technically, it is a subtype of object detection. Use cases of people detection involve the automation of manual tasks that would otherwise be solved with human eyesight.

The output of the application provides the number and location of the detected people. This information can be used in a diverse set of use cases, for example, to detect persons in restricted or dangerous areas or to perform crowd analytics. People detection is a typical application of computer vision that is used across industries to increase the security or safety of employees, analyze and enhance operational efficiency, and automate products or services.
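
To make the idea of "number and location" concrete, here is a minimal Python sketch of how such output could be used to count people and flag anyone standing in a restricted area. It is not Viso Suite code: the `Detection` record, the pixel-coordinate box format, the rectangular zone, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    # Hypothetical detection record: bounding box in pixel coordinates
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float


def foot_point(det: Detection) -> Tuple[float, float]:
    """Bottom-center of the box, used here as a proxy for where the person stands."""
    return ((det.x_min + det.x_max) / 2.0, det.y_max)


def in_zone(point: Tuple[float, float], zone: Tuple[float, float, float, float]) -> bool:
    """Check a point against an axis-aligned zone (x_min, y_min, x_max, y_max)."""
    x, y = point
    zx0, zy0, zx1, zy1 = zone
    return zx0 <= x <= zx1 and zy0 <= y <= zy1


def summarize(detections: List[Detection],
              restricted_zone: Tuple[float, float, float, float]) -> Dict[str, int]:
    """Return the total people count and the count inside the restricted zone."""
    intruders = [d for d in detections if in_zone(foot_point(d), restricted_zone)]
    return {"people": len(detections), "in_restricted_zone": len(intruders)}


# Made-up detections on a 1920 x 1080 frame
detections = [
    Detection(100, 200, 180, 420, 0.91),
    Detection(900, 300, 980, 560, 0.77),
    Detection(1500, 100, 1560, 280, 0.64),
]
restricted_zone = (800, 400, 1200, 1080)

summary = summarize(detections, restricted_zone)
print(summary)
```

Using the bottom-center of the box rather than its center is a design choice: it approximates the position of the feet, which is usually what matters for a floor area.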


Human detection with computer vision


Person Detection With Any Camera and Deep Learning

Person Detection

The person detection system I will build in this tutorial uses object detection with neural networks to detect people. I will deploy a pre-trained computer vision algorithm to a device (on-device machine learning, Edge AI). The algorithm processes images fetched from a connected camera or video source. The camera could be any CCTV, IP camera, USB camera, webcam, or even a video file played in a loop to simulate a camera stream. The pre-trained algorithm (and the ready-to-use application) can be downloaded from the Viso Marketplace.

Pre-Trained Models

The object detection module provided by Viso Suite comes with pre-trained algorithms to detect various objects, including persons. These algorithms were trained on high-quality, massive datasets. There are multiple models available for the use case in this tutorial; you can test different settings quickly and benchmark various algorithms without a single line of code. The AI models are also provided for different hardware architectures such as CPU, VPU, GPU, or TPU. You can later exchange the AI model you use with one click.


Build Computer Vision application in the Visual Editor


Visual Programming and No-code

To build the system presented in this tutorial, I will use the no-code Viso Builder, which provides a visual programming interface. This allows me to intuitively model the application workflow by visually combining modules that can be configured from drop-down menus. While developers can still add custom code (low-code development), there is no need to write everything from scratch, leaving more time to test and tweak the parameters. Hence, updating and maintaining the application, even as it becomes more complex, becomes a lot easier and faster.


Build the Person Detection Application

For this tutorial, you need a Viso Suite account and workspace. The workspace provides all the tools you need to develop a person detection system. You can use any digital video stream and process it with basically any computer that you enroll in the workspace. Everything can be done without coding.

Install the Required Modules

Logged into Viso Suite, I want to create my person detection system using a pre-trained model available in the Viso Marketplace. The application-building process is done in the Viso Builder, a visual programming interface for building computer vision applications.

The person detection system will contain several connected nodes, each performing a specific task towards accomplishing the final application.

    1. Video-Input: To get started, we need to configure the video source, that is, where the frames will come from. These settings tell my application to read the frames from an IP camera, USB camera, or video file. Capturing the frames from the right source is the first step before passing the frames to the next node.
    2. Object Detection: From the incoming frames, I want to detect the objects of interest, in our case, “persons.” The Object Detection node allows me to select from several pre-trained AI models for different hardware architectures, using available AI accelerators such as VPU (e.g., Intel Neural Compute Stick 2) or TPU (Google Coral) out of the box.
    3. Output Preview: The Video View node creates an endpoint for showing the processed video stream, including the detection results in real time. While this will not be needed for my system in production, it is a good way to debug and tweak specific parameters while testing.

The Viso Builder makes it easy to add nodes to an application. I drag and drop the nodes mentioned above into the workspace grid, and they are ready to be configured without any additional programming.
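
For readers who think in code, the data flow through these three nodes can be sketched in plain Python. This is only a conceptual illustration and not how Viso Suite implements its nodes: the function names, the dictionary used as a frame, and the `dummy_detector` placeholder are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Frame = Dict[str, Any]            # stand-in for an image plus metadata
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max)


def video_input(num_frames: int, width: int = 1920, height: int = 1080) -> Iterator[Frame]:
    """Node 1: emit frames. A real node would read from a camera or video file."""
    for index in range(num_frames):
        yield {"index": index, "width": width, "height": height}


def object_detection(frames: Iterable[Frame],
                     detector: Callable[[Frame], List[Box]]) -> Iterator[Frame]:
    """Node 2: attach the detector's boxes to every frame passing through."""
    for frame in frames:
        yield {**frame, "boxes": detector(frame)}


def output_preview(frames: Iterable[Frame]) -> List[str]:
    """Node 3: produce one text line per frame instead of drawing boxes."""
    return [f"frame {f['index']}: {len(f['boxes'])} person(s)" for f in frames]


def dummy_detector(frame: Frame) -> List[Box]:
    # Placeholder model: reports one person on even frames and none on odd frames
    return [(100, 200, 180, 420)] if frame["index"] % 2 == 0 else []


lines = output_preview(object_detection(video_input(3), dummy_detector))
for line in lines:
    print(line)
```

Each stage only depends on the output of the previous one, which is why a model can be exchanged without touching the video source or the preview.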


Install the required modules to your Workspace Library


Wire Together Pre-built Modules

For the system to work correctly, the nodes need to be connected in the right way.

  • The video source should send the input frames to the Object Detection node to be further processed.
  • At the same time, the frames should be sent to the Output Preview node, where the results will be displayed for debugging.
  • Hovering over the connection dots shows the output of each node, which makes it simple to choose the right connections.
  • The resulting stream of the Object Detection node will be sent to the Preview node so that we can see the detection boxes in real time.
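
The wiring described above forms a small directed graph. As a rough sketch, assuming hypothetical node names that are not Viso Builder identifiers, the connections can be written down as pairs and checked with Python's standard `graphlib` module, which also yields the order in which the nodes have to run.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Each wire is (source node, target node); names are illustrative only
wires: List[Tuple[str, str]] = [
    ("video_input", "object_detection"),
    ("video_input", "output_preview"),
    ("object_detection", "output_preview"),
]


def predecessors(wires: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map every node to the set of nodes whose output it consumes."""
    graph: Dict[str, Set[str]] = {}
    for source, target in wires:
        graph.setdefault(source, set())
        graph.setdefault(target, set()).add(source)
    return graph


graph = predecessors(wires)

# static_order() raises graphlib.CycleError if the wiring contains a loop
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The preview node comes last because it consumes both the raw frames and the detection results.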


Configure the Person Detection Application

After the nodes are connected using the Viso Builder canvas, I want to configure each node to suit my needs. All selected nodes are configured directly in the Viso Builder. You can set the parameters with the visual interface; no coding is required.

  • Video-Input: My camera source will be a video file. The video is used to demo a real-world setting and is imported if you download the person detection application from the Viso Marketplace (you can also upload your own video files for testing). It simulates a real camera input and can later easily be changed to an IP or USB camera. For frame width, height, and FPS, I want to keep the original video settings, which are 1920 x 1080 px at 15 frames per second. If these parameters are changed, the video input node automatically resizes the frames, and it skips or duplicates frames when the input FPS differs from the FPS value configured on the node.
  • Object Detection: The Object Detection node lets me define the algorithms and hardware architectures for my system. Additionally, it allows me to set the objects of interest. In my case, I would like to test with a pre-trained OpenVINO model. I select the OpenVINO framework and Myriad as my target device. This makes my model run on the Movidius Myriad X vision processing unit inside my device. You can select another model or target device anytime. The model I would like to test is called “Person Detection Retail 0013” and can be selected from the model drop-down. I choose a threshold of 0.3, which means that only detection results with a confidence above 0.3 are returned. I keep the default overlap value of 0.7 and set object width and height to 0.99 to include all object sizes. These settings can be changed later if the detection does not perform as expected. Finally, I enable the option to show the output results so that the detection boxes appear in my video preview.
  • Output Preview: The last step, which is optional but helpful for debugging, configures a local endpoint to check the video output in real time. I set the desired URL to /video and can then check the output preview using the device’s IP address and the URL I entered in the Output Preview interface ([ip_address]:1880/[URL]). I additionally check the “keep ratio” option to keep the original frame size in my Output Preview.
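
To give an intuition for the threshold, overlap, and object size settings, the following Python sketch shows one common way such parameters are applied to raw detections. It rests on assumptions: I treat the overlap value as an intersection-over-union (IoU) limit for suppressing duplicate boxes, and the width and height values as maximum box sizes relative to the frame. The Viso Suite node may interpret them differently, and `filter_detections` is a hypothetical helper, not a platform function.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # normalized (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]             # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: List[Detection], threshold: float = 0.3,
                      overlap: float = 0.7, max_width: float = 0.99,
                      max_height: float = 0.99) -> List[Detection]:
    """Keep confident, plausibly sized boxes and drop near-duplicates."""
    candidates = sorted(
        (d for d in dets
         if d[1] > threshold                      # confidence above the threshold
         and d[0][2] - d[0][0] <= max_width       # assumed: relative width limit
         and d[0][3] - d[0][1] <= max_height),    # assumed: relative height limit
        key=lambda d: d[1],
        reverse=True,
    )
    kept: List[Detection] = []
    for box, conf in candidates:
        # Assumed: a box overlapping a stronger kept box by more than `overlap` is a duplicate
        if all(iou(box, kept_box) <= overlap for kept_box, _ in kept):
            kept.append((box, conf))
    return kept


raw = [
    ((0.10, 0.20, 0.30, 0.80), 0.90),
    ((0.11, 0.21, 0.31, 0.81), 0.60),   # near-duplicate of the first box
    ((0.60, 0.10, 0.75, 0.70), 0.45),
    ((0.40, 0.40, 0.50, 0.60), 0.20),   # below the confidence threshold
]
kept = filter_detections(raw)
print([conf for _, conf in kept])
```

Under these assumptions, lowering the threshold returns more (and less certain) detections, while lowering the overlap value removes more closely spaced boxes, which can matter in crowded scenes.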


People detection technology image annotation example


And that’s it! I can save my application, and it will create the first version ready to be deployed to an edge device of my choice.

Check the Person Detection Result Preview

The person detection system is now ready to run. The program’s output can be reviewed with the Output Preview module, which was added to the workflow. Once the application is created successfully, it can be deployed to edge devices at the click of a button. Additionally, the data can be sent to a custom cloud dashboard directly within Viso Suite.

Build Logic Around the Person Detector

You can further add if-this-then rules to trigger alerts and send emails, Slack messages, SMS, and more. You can also send the insights directly to third-party systems. The visual editor makes it possible to build custom people counting systems, with rules and logic depending on your use case. Thus, you can simply modify the application you built and experiment with new application versions.
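
As an illustration of what such a rule amounts to, here is a small Python sketch of a "too many people" rule with a cooldown so that it does not fire on every frame. The `CountRule` class, its parameters, and the sample observations are hypothetical; in Viso Suite the equivalent logic is built visually, and the `notify` callable stands in for whatever email, Slack, or SMS integration is configured.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CountRule:
    max_people: int                    # alert when the count exceeds this limit
    cooldown_s: float                  # minimum seconds between two alerts
    notify: Callable[[str], None]      # stand-in for an email/Slack/SMS sender
    _last_fired: float = field(default=float("-inf"), init=False)

    def evaluate(self, people_count: int, timestamp: float) -> bool:
        """Fire the alert if the limit is exceeded and the cooldown has passed."""
        if people_count <= self.max_people:
            return False
        if timestamp - self._last_fired < self.cooldown_s:
            return False
        self._last_fired = timestamp
        self.notify(f"{people_count} people detected (limit {self.max_people})")
        return True


sent: List[str] = []                   # collects messages instead of sending them
rule = CountRule(max_people=2, cooldown_s=60.0, notify=sent.append)

# (timestamp in seconds, number of detected people), made-up values
observations = [(0.0, 1), (10.0, 3), (20.0, 4), (75.0, 5)]
fired = [rule.evaluate(count, ts) for ts, count in observations]
print(fired)
print(sent)
```

The observation at 20 seconds exceeds the limit but falls inside the cooldown window, so only two alerts are produced.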


Computer vision to detect humans

