In this article, you will learn how to build a person detection system with deep learning in about 5 minutes using Viso Suite infrastructure. Read how you can use the latest and best performing deep learning algorithms to create a person detector vision system that you can easily connect to other systems, visualize data in dashboards, or send alerts by email.
About us: Viso Suite is the end-to-end computer vision infrastructure for enterprises. By simplifying the machine learning lifecycle, businesses can start realizing value with Viso Suite in only three days. Learn more by booking a demo with our team.
How to Detect People With Computer Vision
Computer vision uses AI technology to make computers see and interpret images in a way that imitates human vision. To do this, video streams from cameras are analyzed with machine learning (ML) algorithms. Deep learning, the state-of-the-art machine learning method, has recently brought great advances in the field of image recognition. Deep learning uses deep neural networks (deep means multiple connected layers) to perform image processing, similar to how the human brain employs connected neurons to understand visual information.
People or person detection is a typical computer vision task; technically, it is a subtype of object detection. Use cases of people detection involve the automation of manual tasks that would otherwise be solved with human eyesight.
The output of the application provides the number and location of the detected people. This information can be used in a diverse set of use cases, for example, to detect persons in restricted or dangerous areas or to perform crowd analytics. Person detection is a typical application of computer vision that is used across industries to increase the security or safety of employees, analyze and enhance operational efficiency, and automate products or services.
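To make the "number and location" output concrete, here is a minimal Python sketch of a restricted-area check. It is not Viso Suite code: the `Detection` class, the `foot_point` heuristic, and the zone coordinates are assumptions made up for illustration. It counts the detected people and tests whether the bottom-center of each bounding box falls inside a polygon.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Detection:
    """Hypothetical detection record: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float
    confidence: float

    def foot_point(self) -> Point:
        # Bottom-center of the box, used here as a proxy for where the person stands.
        return (self.x + self.width / 2, self.y + self.height)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray to the right crosses an edge."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def people_in_zone(detections: List[Detection], zone: List[Point]) -> List[Detection]:
    """Return the detections whose foot point lies inside the zone polygon."""
    return [d for d in detections if point_in_polygon(d.foot_point(), zone)]


# Made-up example data for a 1920 x 1080 frame.
detections = [
    Detection(100, 100, 50, 120, 0.9),
    Detection(600, 300, 60, 150, 0.8),
    Detection(900, 50, 40, 100, 0.6),
]
restricted_zone = [(500, 400), (800, 400), (800, 600), (500, 600)]

intruders = people_in_zone(detections, restricted_zone)
print(f"{len(detections)} people detected, {len(intruders)} in restricted zone")
```

Using the foot point rather than the box center is a design choice: with a camera mounted above the scene, it tends to match the floor area a person actually occupies.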
Person Detection With Any Camera and Deep Learning
Person Detection
The person detection system I will build in this tutorial is based on object detection to detect people using neural networks. I will deploy a pre-trained computer vision algorithm to a device (on-device machine learning, Edge AI). The algorithms process images fetched from a connected camera or video source. The camera could be any CCTV, IP camera, USB camera, webcam, or even a video file played in a loop to simulate a camera stream. The pre-trained algorithm (and the ready-to-use application) can be downloaded from the Viso Marketplace.
Pre-Trained Models
The object detection module provided by Viso Suite comes with pre-trained algorithms to detect various objects, including persons. These algorithms were trained on high-quality, massive datasets. There are multiple models available for the use case in this tutorial; you can test different settings quickly and benchmark various algorithms without a single line of code. The AI models are also provided for different hardware architectures such as CPU, VPU, GPU, or TPU. You can later exchange the AI model you use with one click.
Visual Programming
To build the system presented in this tutorial, I will use the Viso Builder, which provides a visual programming interface. This allows me to intuitively model the application workflow by visually combining modules that can be configured from drop-down menus. While developers can still add custom code (low-code development), there is no need to write everything from scratch, leaving more time to test and tweak the parameters. Hence, updating and maintaining the application, even as it becomes more complex, becomes a lot easier and faster.
Build the Person Detection Application
For this tutorial, you need a Viso Suite account and workspace. The workspace provides all the tools you need to develop a person detection system. You can use any digital video stream and process it with basically any computer that you enroll in the workspace. Everything can be done without coding.
Install the Required Modules
Logged into Viso Suite, I want to create my person detection system using a pre-trained model available in the Viso Marketplace. The application-building process is done in the Viso Builder, a visual programming interface for building computer vision applications.
The person detection system will contain several connected nodes, each performing a specific task towards accomplishing the final application.
- Video-Input: To get started, we need to configure the video source, that is, where the frames will come from. These settings tell my application to read the frames from an IP camera, USB camera, or video file. Capturing the frames from the right source is the first step before passing them to the next node.
- Object Detection: From the incoming frames, I want to detect the objects of interest, in our case, “persons.” The Object Detection node allows me to select from several pre-trained AI models for different hardware architectures, using available AI accelerators such as VPU (e.g., Intel Neural Compute Stick 2) or TPU (Google Coral) out of the box.
- Output Preview: The Video View node creates an endpoint for showing the processed video stream, including the detection results in real time. While this will not be needed for my system in production, it is a good way to debug and tweak specific parameters while testing.
The Viso Builder makes it easy to add nodes to an application. I drag and drop the nodes mentioned above into the workspace grid, and they are ready to be configured without any additional programming.
Wire Together Pre-built Modules
For the system to work correctly, the nodes need to be connected in the right way.
- The video source should send the input frames to the Object Detection node to be further processed.
- At the same time, the frames should be sent to the Output Preview node, where the results will be displayed for debugging.
- Hovering over the connection dots shows the output of each node, which makes it simple to choose the right connections.
- The resulting stream of the Object Detection node will be sent to the Preview node so that we can see the detection boxes in real time.
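The wiring described above can be pictured as a small dataflow graph. The following Python sketch is only a mental model, not how the Viso Builder is implemented: the `Flow` class, the node names, and the stub detector that always returns the same box are all hypothetical. Each node is a function, and each wire forwards a node's output to the next node.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Flow:
    """Tiny dataflow graph: nodes are functions, wires forward their outputs."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[Any], Any]] = {}
        self.wires: Dict[str, List[str]] = defaultdict(list)

    def add_node(self, name: str, func: Callable[[Any], Any]) -> None:
        self.nodes[name] = func

    def connect(self, source: str, target: str) -> None:
        # Refuse wires to or from nodes that were never added.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in wire {source} -> {target}")
        self.wires[source].append(target)

    def send(self, name: str, message: Any) -> None:
        # Run the node, then pass its output along every outgoing wire.
        output = self.nodes[name](message)
        for target in self.wires[name]:
            self.send(target, output)


# Collected by the preview node, keyed by frame id.
preview_log: Dict[int, dict] = {}


def video_input(frame_id: int) -> dict:
    # Stand-in for reading a frame from a camera or video file.
    return {"frame_id": frame_id, "frame": f"<frame {frame_id}>"}


def object_detection(msg: dict) -> dict:
    # Stub detector: always reports one fixed box (x, y, width, height).
    return {"frame_id": msg["frame_id"], "boxes": [(10, 20, 50, 120)]}


def output_preview(msg: dict) -> dict:
    # Merge whatever arrives (raw frame or detection result) per frame.
    preview_log.setdefault(msg["frame_id"], {}).update(msg)
    return msg


flow = Flow()
flow.add_node("video_input", video_input)
flow.add_node("object_detection", object_detection)
flow.add_node("output_preview", output_preview)

flow.connect("video_input", "object_detection")     # frames to the detector
flow.connect("video_input", "output_preview")       # frames to the preview
flow.connect("object_detection", "output_preview")  # boxes to the preview

flow.send("video_input", 0)
print(preview_log[0])
```

After one frame has passed through, the preview entry for that frame holds both the raw frame and the detection boxes, which mirrors the three wires listed above.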
Configure the Person Detection Application
After the nodes are connected on the Viso Builder canvas, I want to configure each node to suit my needs. All selected nodes are configured directly in the Viso Builder. You can set the parameters with the visual interface; no coding is required.
- Video-Input: My camera source will be a video file. The video is used to demo a real-world setting and is imported if you download the person detection application from the Viso Marketplace (you can also upload your own video files for testing). It simulates a real camera input and can later easily be changed to an IP or USB camera. For frame width, height, and FPS, I want to keep the original video settings, which are 1920 x 1080 px at 15 frames per second. If the width or height is changed, the video input node automatically resizes the frames. If the configured FPS differs from the input FPS, the node skips or duplicates frames accordingly.
- Object Detection: The Object Detection node lets me define the algorithms and hardware architectures for my system. Additionally, it allows me to set the objects of interest. In my case, I would like to test with a pre-trained OpenVINO model. I select the OpenVINO framework and Myriad as my target device. This will make my model run on the Movidius Myriad X vision processing unit inside my device. You can select another model or target device anytime. The model I would like to test is called “Person Detection Retail 0013” and can be selected from the model drop-down. I choose a threshold of 0.3, which means that detection results with a confidence above 0.3 will be returned. I will keep the default overlap value of 0.7 and set object width and height to 0.99 to include all object sizes. These settings can be changed later on if you see that the detection does not perform as expected. I select the option to show the output results so that I can see the detection boxes on my video preview.
- Output Preview: The last step, which is optional but helpful for debugging, configures a local endpoint to check the video output in real time. I set the desired URL as /video and will be able to check the output preview using the device’s IP address and the URL I put in the Output Preview interface ([ip_address]:1880/[URL]). I additionally check the input field “keep ratio” to keep the original frame size in my Output Preview.
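To give an intuition for what the threshold, overlap, and object size settings do, here is a minimal Python sketch of typical detection post-processing. It rests on assumptions that the Viso Suite interface does not confirm: that "overlap" is an intersection-over-union (IoU) limit for non-maximum suppression, that width and height are maximum box sizes as a fraction of the frame, and that boxes use normalized 0 to 1 coordinates. The `Box` type and the `postprocess` function are hypothetical names.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical detection box in normalized coordinates (0 to 1)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    boxes: List[Box],
    threshold: float = 0.3,   # assumed meaning: minimum confidence
    overlap: float = 0.7,     # assumed meaning: maximum IoU between kept boxes
    max_width: float = 0.99,  # assumed meaning: maximum box width (frame fraction)
    max_height: float = 0.99,
) -> List[Box]:
    # 1. Drop low-confidence and oversized boxes.
    candidates = [
        b for b in boxes
        if b.score > threshold
        and (b.x2 - b.x1) <= max_width
        and (b.y2 - b.y1) <= max_height
    ]
    # 2. Greedy non-maximum suppression: keep the best box, drop near-duplicates.
    candidates.sort(key=lambda b: b.score, reverse=True)
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) <= overlap for k in kept):
            kept.append(box)
    return kept


# Made-up raw detections for one frame.
raw = [
    Box(0.10, 0.10, 0.30, 0.60, 0.90),
    Box(0.11, 0.11, 0.31, 0.61, 0.75),  # near-duplicate of the first box
    Box(0.60, 0.20, 0.75, 0.70, 0.45),
    Box(0.40, 0.40, 0.50, 0.60, 0.20),  # below the confidence threshold
]

kept = postprocess(raw)
print([b.score for b in kept])
```

Under these assumptions, lowering the threshold returns more (and less certain) detections, while lowering the overlap value suppresses more boxes in crowded scenes where people partly occlude each other.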
And that’s it! I can save my application, and it will create the first version ready to be deployed to an edge device of my choice.
Check the Person Detection Result Preview
The person detection system is now ready to run. The program’s output can be reviewed with the Output Preview module, which was added to the workflow. Once the application is created successfully, it can be deployed to edge devices at the click of a button. Additionally, the data can be sent to a custom cloud dashboard directly within Viso Suite.
Build Logic Around the Person Detector
You can further add if-this-then rules that trigger alerts or send emails, Slack messages, SMS, and more. You can also send the insights directly to third-party systems. The visual editor makes it possible to build custom people counting systems, with rules and logic that depend on your use case. To try out new logic, you can simply modify the application you built and experiment with new application versions.
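As an illustration of such a rule, here is a minimal Python sketch of an if-this-then trigger on the people count. It is a stand-alone example, not Viso Suite code: the `Rule` class, the limit of five people, and the use of a plain list in place of an email or Slack sender are assumptions. The rule fires once when the condition becomes true and only fires again after it has been false in between, which avoids one alert per frame.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical if-this-then rule evaluated on each people count."""
    name: str
    condition: Callable[[int], bool]
    action: Callable[[str], None]
    active: bool = False  # True while the condition currently holds

    def evaluate(self, count: int) -> None:
        triggered = self.condition(count)
        # Fire only on the transition from "not triggered" to "triggered".
        if triggered and not self.active:
            self.action(f"{self.name}: {count} people detected")
        self.active = triggered


# Stand-in for an email, Slack, or SMS sender: collect the messages in a list.
sent: List[str] = []

rule = Rule(
    name="Overcrowding",
    condition=lambda count: count > 5,
    action=sent.append,
)

# Made-up people counts from five consecutive frames.
for count in [3, 6, 7, 4, 8]:
    rule.evaluate(count)

print(sent)
```

With these counts the rule fires twice, at 6 and again at 8, because the count dropped back to 4 in between.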
What’s Next?
If you enjoyed reading this article, I suggest having a look at: