In the last decade, computer vision has evolved into a key technology for numerous applications, replacing human supervision and monitoring. This article provides a research overview of state-of-the-art computer vision in video surveillance and AI security monitoring.
We will discuss the following topics:
- State of the art in AI video security technology in 2022
- Anomaly Detection with computer vision
- AI vision applications in surveillance and security
- Challenges of applied computer vision in security
- Getting Started – Software Solution
At viso.ai, we provide the enterprise No Code Computer Vision Platform Viso Suite. The solution is used by leading organizations to build and scale their computer vision applications – with numerous use cases and case studies. This is why we have first-hand knowledge about the most popular applications, cutting-edge technology, and the challenges of implementing AI vision.
List of AI Vision Applications in Surveillance and Security
- Application #1: Human Detection
- Application #2: People Movement Analysis
- Application #3: Person Recognition
- Application #4: Weapon Detection
- Application #5: Human Behavior Understanding
- Application #6: Virtual Fencing
- Application #7: Traffic Incident Detection
- Application #8: Vehicle Surveillance
- Application #9: Vehicle Identification
- Application #10: Traffic Safety Applications
- Application #11: Illegal Activity Detection
- Application #12: Anomaly Detection
- Application #13: Safety Assessment
- Application #14: Infrastructure Security
- Application #15: Emergency Management
- Application #16: Video Summarization
Later in this article, we will provide more information about these applications and more. Let’s jump right into the topic!
State of the Art in AI Video Surveillance
Computer vision uses a combination of technologies to analyze and understand video data with computers. In surveillance and security industry applications, the primary goal of computer vision is to automate human supervision. The ability to capture and digitize real-life scenes provides new opportunities to detect threats better and earlier, quantify risk, and provide real-time security assessments.
The list of computer vision applications keeps growing rapidly, driven by new machine learning, advanced computing, Artificial Intelligence, and connected IoT technologies that make AI vision much more powerful, flexible, and scalable.
Edge AI for Computer Vision in Surveillance
Applying computer vision has only recently become possible through advances in deep learning and edge computing. Deep learning is a subfield of machine learning that enables machines to learn patterns from training data and apply what they have learned to new data.
Edge Computing is the concept of moving computing tasks from the cloud to the network edge in close proximity to the data source (camera). As a result, edge computing eliminates the challenges of connected cameras and devices, such as network congestion, constant connectivity, latency, robustness, privacy, and data management.
Modern computer vision systems use edge computing to process video without sending video data to the cloud or another storage unit. The combination of on-device machine learning and edge computing is also called Edge AI or Edge Intelligence. In surveillance and security applications of computer vision, those emerging technologies play an important role in enabling real-life AI applications. Also, Edge AI vision infrastructure allows significant cost reductions in large-scale and real-time computer vision systems.
Intelligent Surveillance Cameras
With the widespread use of security cameras in public places, AI video analysis and scene understanding with computer vision have become essential features of surveillance systems. Visual data from camera streams contain rich information compared to other information sources such as mobile location, GPS, radar signals, etc. Large-scale video analytics systems are able to collect statistical information about the status of road traffic, public places, buildings, or private areas.
Modern AI vision software allows using the video feed of virtually any network camera. Depending on its hardware configuration, a single edge device can process the video feed of multiple cameras. Powerful edge servers can analyze dozens to hundreds of cameras.
Some IP camera manufacturers or turnkey point solutions provide on-camera intelligence where the computing processor is integrated into the camera. However, enterprise systems usually separate AI computation from the camera itself – for a number of reasons.
First of all, businesses need to stay vendor-independent and maintain the ability to negotiate. Then, companies need to avoid technology lock-in and ensure extensibility and integration with their existing systems. Also, cameras with integrated AI processing don’t allow scaling up hardware resources if a company needs to extend features or increase AI performance.
In addition, most businesses operate video systems with a plethora of cameras from various brands, generations, and types (Sony, Panasonic, Axis, Hikvision, Dahua, Samsung, and so on). Replacing all cameras is too expensive, and standardizing would lead to lock-in costs. Also, most camera products are periodically replaced with a new model every two years.
AI Video Surveillance Systems
In traditional video surveillance, systems fully depend on human operators and their individual judgment and attentiveness. Intelligent AI analysis supports human operators with very fast, objective, and consistent information. Depending on the use case, the AI vision software performs tasks for detecting and predicting situations such as traffic congestion, security threats, accidents, and other anomalies.
A typical computer vision system integrates multiple software capabilities, from data input acquisition to image preprocessing, deep learning inference, output aggregation, communication, and visualization. Such a computer vision system is able to run one or multiple applications that each address specific problems (anomaly detection, object detection, etc.).
Computer Vision Applications vs. AI Models
An AI model is not a computer vision application; the two terms are often incorrectly used interchangeably. A computer vision application contains a computer vision pipeline (or flow) that contains one or multiple AI models.
The AI model requires upstream functions to fetch and preprocess image data before feeding it into the model. Only then does the AI model perform an algorithmic task to transform video frames into specific metadata (e.g., the class “person” with a confidence score). The raw model output requires interpretation and aggregation with logic to be useful for solving a business or security problem.
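The application-versus-model distinction above can be illustrated with a minimal pipeline sketch. The “model” here is a hypothetical stub standing in for real deep learning inference; the function names and confidence threshold are assumptions for illustration:

```python
# Minimal sketch of a computer vision pipeline wrapped around an AI model.
# The "model" is a hypothetical stub, not a real network.

def preprocess(frame):
    """Upstream step: normalize raw pixel values to [0, 1]."""
    return [p / 255.0 for p in frame]

def model_infer(frame):
    """Stand-in for deep learning inference: returns raw detections
    as (class_label, confidence) pairs."""
    return [("person", 0.92), ("person", 0.41), ("car", 0.88)]

def postprocess(detections, min_confidence=0.5):
    """Downstream logic: turn raw model output into useful metadata."""
    return [d for d in detections if d[1] >= min_confidence]

def run_pipeline(frame):
    return postprocess(model_infer(preprocess(frame)))

print(run_pipeline([0, 128, 255]))  # confident detections only
```

The pipeline, not the model, is what turns raw inference output into something a security application can act on.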
Anomaly Detection in AI Vision
What is Anomaly Detection?
In computer vision, anomaly detection is a sub-field of behavior understanding from surveillance scenes. Anomalies are typically deviations of scene entities (people, vehicles, environment) and their interactions from normal behavior. Anomaly detection methods usually learn normal behavior via training, using supervised, semi-supervised, or unsupervised learning.
Use Cases of Anomaly Detection
Anything deviating significantly from the normal behavior can be considered “anomalous.” Examples include vehicle presence on walkways, the sudden dispersal of people within a gathering, a person falling while walking, jaywalking, signal bypassing in traffic, or U-turns of vehicles at red signals.
Anomaly Detection Solution
Anomaly detection systems leverage data acquisition, feature extraction, scene learning and/or activity learning, and behavioral understanding. Such systems are architected towards specific use cases and optimized for particular deployment environments and camera positioning.
Systems for anomaly event detection are nontrivial and require a set of techniques that span multiple research fields. In general, such systems process video data to perform scene analysis using a combination of video processing techniques, vehicle and/or person detection and tracking, multi-camera-based techniques, intelligent event detection, and more.
In AI vision, there are different types of anomalies. The three types are point anomalies, contextual anomalies, and collective anomalies. Point anomalies include, for example, a non-moving car on a busy road or in a tunnel. Contextual anomalies are anomalous only in a specific context: in slow-moving traffic, a vehicle that moves much faster than the others is anomalous, while the same behavior would be normal in less dense traffic. Collective anomalies occur when a group of instances together causes an anomaly even though each instance may be normal individually, for example, a group of people dispersing within a short span of time.
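A point anomaly such as a near-stationary vehicle on a busy road can be sketched with a simple statistical check. The speeds, units, and z-score threshold below are illustrative assumptions, not part of any production method:

```python
# Sketch: flagging a point anomaly (a near-stationary vehicle among
# free-flowing traffic) by z-scoring observed vehicle speeds.
# Threshold and data are illustrative.

def point_anomalies(speeds, z_threshold=2.0):
    mean = sum(speeds) / len(speeds)
    std = (sum((s - mean) ** 2 for s in speeds) / len(speeds)) ** 0.5
    return [i for i, s in enumerate(speeds)
            if std > 0 and abs(s - mean) / std > z_threshold]

speeds = [28, 30, 31, 29, 30, 0.5, 30]  # km/h; index 5 is almost stopped
print(point_anomalies(speeds))  # [5]
```

Contextual and collective anomalies need richer features (scene context, group statistics), but the principle of scoring deviation from learned normality is the same.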
Scope and Class of Anomalies
In the context of visual surveillance, it is common to see anomalies classified as global and local anomalies. Global anomalies can be present in a frame or segment of the video without specifying exactly where an event occurred (no localization). Local anomalies usually happen within a specific area of the scene but may be missed by global anomaly detection algorithms. Some methods can detect both global and local anomalies.
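The difference between global and local detection can be sketched on a frame divided into grid cells. The per-cell “abnormality” scores and both thresholds are hypothetical stand-ins for the output of an upstream model:

```python
# Sketch: global vs. local anomaly scoring on a frame split into grid
# cells. Scores are hypothetical per-cell abnormality values.

def global_anomaly(cell_scores, threshold=0.5):
    """Frame-level decision: one score for the whole frame, no localization."""
    return sum(cell_scores) / len(cell_scores) > threshold

def local_anomalies(cell_scores, threshold=0.8):
    """Cell-level decision: returns indices of anomalous regions."""
    return [i for i, s in enumerate(cell_scores) if s > threshold]

scores = [0.1, 0.1, 0.95, 0.2]   # one highly abnormal cell
print(global_anomaly(scores))    # False -> the frame mean dilutes it
print(local_anomalies(scores))   # [2]   -> the local method finds it
```

This also illustrates why a purely global method can miss a local anomaly: averaging over the frame dilutes a strong signal in one small region.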
AI Vision Applications in Surveillance and Security
Intelligent video surveillance includes a wide range of applications and use cases for anomaly detection, object detection and tracking, movement analysis technologies, monitoring systems, prevention, identification, and warning systems. Cooperative video surveillance enables large-scale AI vision systems that integrate numerous cameras at remote locations.
Human Detection
People detection uses object detection algorithms to localize people in video feeds. Automated single-person and multiple-person detection are vital features of intelligent video surveillance systems. Human detection also includes crowd analysis to estimate scene density and evaluate moving object interaction in crowded and uncrowded scenes.
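People detectors typically output many overlapping candidate boxes per person; a standard post-processing step is non-maximum suppression (NMS). Below is a minimal pure-Python sketch; the box coordinates and overlap threshold are illustrative:

```python
# Sketch: non-maximum suppression (NMS), a standard post-processing step
# that merges overlapping detection boxes for the same person.
# Boxes are (x1, y1, x2, y2, score); the IoU threshold is illustrative.

def iou(a, b):
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, iou_threshold=0.5):
    """Keep the highest-scoring box among heavily overlapping ones."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for box in boxes:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept

detections = [
    (10, 10, 50, 80, 0.9),    # person A
    (12, 12, 52, 82, 0.8),    # duplicate detection of person A
    (100, 20, 140, 90, 0.7),  # person B
]
print(len(nms(detections)))  # 2 distinct people
```

The same step also feeds crowd analysis: counting the surviving boxes gives a rough per-frame scene density estimate.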
People Movement Analysis
Path learning combines human detection with path modeling techniques and clustering to perform people movement analysis. For example, in Smart City applications, movement analysis is used to perform motion prediction and analyze vehicle behavior, pedestrian behavior, acceleration, movement speed, and trajectories.
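Trajectory clustering can be sketched with a simple pointwise-distance measure over equal-length tracks. Real path learning uses far richer models; the tracks, greedy grouping strategy, and distance threshold below are illustrative assumptions:

```python
# Sketch: grouping pedestrian trajectories by mean pointwise distance.
# Tracks are equal-length lists of (x, y) positions; threshold is
# illustrative.

def mean_distance(a, b):
    """Average Euclidean distance between corresponding track points."""
    return sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def group_tracks(tracks, threshold=5.0):
    """Greedy grouping: each track joins the first close-enough group."""
    groups = []
    for t in tracks:
        for g in groups:
            if mean_distance(t, g[0]) < threshold:
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

tracks = [
    [(0, 0), (1, 1), (2, 2)],     # walker 1
    [(0, 1), (1, 2), (2, 3)],     # walker 2, same path offset by one
    [(50, 0), (51, 0), (52, 0)],  # walker 3, a different path
]
print(len(group_tracks(tracks)))  # 2 movement patterns
```

Clusters of recurring paths form a model of “normal” movement, against which unusual trajectories can later be flagged.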
Person Recognition
Certain security surveillance systems apply face recognition software to perform person identification/recognition. At a high level, AI-based people recognition software combines multiple steps to detect faces, crop the face area, and apply image classification to match it against a database.
An alternative method of human recognition is behavioral biometrics. In most use cases, such image recognition services require privacy-preserving computer vision techniques.
Weapon or Dangerous Object Detection
Real-time object detection uses deep learning to detect and localize specific objects in video scenes. Common object recognition applications in security include weapon detection (firearms or knives) or protective equipment detection. As for many computer vision applications, object detection in real-life settings is very challenging to implement; we will discuss the reasons later in this article.
Human Behavior Understanding
Person detection, classification, and person tracking are used for human behavior understanding in video-based surveillance applications. Specific behavior patterns can be learned with classification models to recognize specific human actions. This can be used for aggression detection, brawl detection, robbery or theft detection, and more. Applications to analyze human behavior also include trajectory clustering using different machine learning and imaging technologies such as multi-camera detection.
Virtual Fencing
Virtual fencing of sensitive locations is a popular feature of AI vision surveillance systems. Specific regions of interest serve as virtual fences to detect intrusion events.
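A virtual fence reduces to a point-in-polygon test on tracked object positions. Below is a minimal ray-casting sketch; the fence polygon and the tested points are illustrative, and in practice the point would come from a detector (e.g., a person’s foot position):

```python
# Sketch: virtual fencing via a ray-casting point-in-polygon test.
# The polygon and query points are illustrative.

def inside_fence(x, y, polygon):
    """Return True if (x, y) lies inside the fence polygon."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Count crossings of a horizontal ray from (x, y) to the right.
        if (yi > y) != (yj > y) and \
           x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

fence = [(0, 0), (10, 0), (10, 10), (0, 10)]  # restricted square region
print(inside_fence(5, 5, fence))    # True  -> intrusion alert
print(inside_fence(15, 5, fence))   # False -> no alert
```

A production system would run this check per tracked object per frame and debounce alerts over several frames to suppress flicker.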
Traffic Incident Detection
In traffic management and surveillance, vehicle detection and tracking algorithms are used to identify specific incidents and events. Such systems are popular in smart city applications, also for traffic parameters collection, vehicle counting, video-based tolling, traffic flow analysis, and behavior understanding. Other use cases include accident detection, highway vehicle detection, and vehicle classification (profiling). Common methods apply foreground segmentation or background subtraction in combination with convolutional neural networks (CNN) for deep learning tasks.
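One of the traffic parameters mentioned above, vehicle counting, can be sketched as a virtual-line-crossing check on tracker output. The tracks and line position are illustrative; a real system would consume centroids from a detector-plus-tracker pipeline:

```python
# Sketch: counting vehicles that cross a virtual line, a common traffic
# parameter. Each track is the per-frame y-coordinate of one vehicle
# centroid from a hypothetical upstream tracker.

def crossed_line(track_ys, line_y):
    """True if the track moves from above the line to below it."""
    return any(a < line_y <= b for a, b in zip(track_ys, track_ys[1:]))

def count_crossings(tracks, line_y):
    return sum(crossed_line(t, line_y) for t in tracks)

tracks = [
    [10, 40, 80, 120],  # drives through the line at y=100
    [10, 20, 30, 35],   # stays above the line
    [90, 105, 130],     # crosses
]
print(count_crossings(tracks, line_y=100))  # 2
```

Flow rate then follows by dividing crossing counts by the observation window, and per-lane lines give per-lane statistics.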
Vehicle Surveillance
Vehicle or moving object detection and tracking are used in combination with license plate recognition. Image classification algorithms are used to determine vehicle model and type, color, or logo recognition. Camera-based vehicle surveillance is popular in parking lot analytics to detect and track the occupancy of multiple parking spaces with computer vision.
Vehicle Identification
Vision-based vehicle identification uses automatic number plate recognition (ANPR) and vehicle feature detection (color, type) to identify and count individual vehicles using cameras. ANPR is also called LPR (License Plate Recognition). Vehicle identification software first detects the vehicle with object detection, locates the license plate, and finally reads the number plate using optical character recognition (OCR).
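The three ANPR stages can be sketched as a pipeline. The detection, plate localization, and OCR functions below are hypothetical stubs (real systems use trained models), and the plate format regex is an assumed example pattern, not any country’s official standard:

```python
# Sketch of the ANPR stages: detect vehicle -> locate plate -> OCR.
# detect_vehicle / locate_plate / run_ocr are hypothetical stubs.
# Only the final format check is implemented, using an assumed
# "two letters, two digits, three letters" example format.
import re

PLATE_FORMAT = re.compile(r"^[A-Z]{2}\d{2}[A-Z]{3}$")  # example format only

def validate_plate(text):
    """Reject OCR output that does not match the expected plate format."""
    return bool(PLATE_FORMAT.match(text.replace(" ", "").upper()))

def read_plate(frame):
    vehicle = detect_vehicle(frame)    # 1. object detection (stub)
    plate_img = locate_plate(vehicle)  # 2. plate localization (stub)
    text = run_ocr(plate_img)          # 3. OCR (stub)
    return text if validate_plate(text) else None

print(validate_plate("AB12 CDE"))  # True
print(validate_plate("??1234"))    # False
```

A format check like this is a cheap way to filter OCR misreads before a plate ever reaches a watchlist lookup.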
Traffic Safety Applications
Stationary or vehicle-mounted camera systems are used to perform different types of anomaly detection. Applications include lane departure warning, pedestrian detection, and adaptive warning systems. In-vehicle safety systems include driver monitoring, for example, seatbelt detection or gaze recognition (to analyze tiredness and fatigue).
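Driver fatigue analysis is often based on a PERCLOS-style measure (the fraction of time the eyes are nearly closed). The sketch below assumes per-frame eye-openness values from a hypothetical upstream landmark or gaze model; both thresholds are illustrative:

```python
# Sketch: a PERCLOS-style drowsiness check for driver monitoring.
# eye_openness values (0 = closed, 1 = fully open) are assumed to come
# from an upstream model; thresholds are illustrative.

def perclos(eye_openness, closed_below=0.2):
    """Fraction of frames in which the eyes are (nearly) closed."""
    closed = sum(1 for v in eye_openness if v < closed_below)
    return closed / len(eye_openness)

def is_drowsy(eye_openness, perclos_threshold=0.3):
    return perclos(eye_openness) > perclos_threshold

alert = [0.9, 0.8, 0.85, 0.9, 0.1, 0.9]   # occasional blink
tired = [0.1, 0.15, 0.8, 0.1, 0.05, 0.1]  # eyes mostly closed
print(is_drowsy(alert))  # False
print(is_drowsy(tired))  # True
```

In a deployed system the window would span tens of seconds of frames, so a single blink never triggers a warning.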
Illegal Activity Detection
Human action recognition and motion pattern detection are used to detect suspicious events or behavior. Popular techniques include pose estimation, 3D sensing, learning, and classification to detect violations of guidelines or the law. Illegal activities can include littering, loitering, begging, and more.
Anomaly Detection in Security Monitoring
AI video analysis is used for anomaly detection in traffic, subways, campuses, trains, boats, buildings, and public places. Examples of anomaly detection in visual AI include stopped vehicle detection, panic detection, or abnormal pedestrian activity recognition.
Safety Assessment
AI camera systems are used to implement vision-based gap analysis, threat assessment, risk, conflict, and accident detection. Deep learning models perform recognition to digitize real-world situations and gather data to model and predict threat situations.
The ability to digitize the visual world and translate it into metadata is used to provide high-level security assessments, which are also important for insurance applications. For example, dynamic reports can be created that provide information about vehicle or people movement, interactions, or terrain information.
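Turning detection metadata into a report is mostly aggregation. The sketch below assumes hypothetical (hour, class) event records produced by an upstream detection pipeline:

```python
# Sketch: aggregating detection metadata into a simple security report.
# Events are hypothetical (timestamp_hour, class) records from an
# upstream detection pipeline.
from collections import Counter

def hourly_report(events):
    """Count detections per (hour, class) pair for reporting."""
    return Counter((hour, cls) for hour, cls in events)

events = [
    (8, "person"), (8, "person"), (8, "vehicle"),
    (9, "person"), (9, "vehicle"), (9, "vehicle"),
]
report = hourly_report(events)
print(report[(9, "vehicle")])  # 2 vehicles in the 9 o'clock hour
```

The same aggregates can back insurance-grade assessments: movement volumes, interaction counts, and per-zone activity over time.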
Infrastructure Security
Visual surveillance uses computer vision for roadside warning systems and decision support in security monitoring of public places. Security applications include abandoned object detection to find possibly dangerous items early.
Emergency Management With Computer Vision
AI vision systems perform emergency classification of natural events such as storms, flooding detection, smoke, and fire detection. Abrupt event detection is used to identify anomalies across different locations and cameras.
Surveillance and security systems are also used to detect human-made emergencies such as road accidents, dangerous crowds, weapon threats, drowning persons, and injured or falling persons.
Video Summarization
AI vision algorithms are used to perform video summarization, synopsis generation, and content-based video retrieval. In video surveillance, the historical data output of the deep learning models can be used to identify specific events and find related video material.
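A basic summarization strategy is keyframe selection driven by inter-frame change. In the sketch below, each frame is reduced to a single feature value (e.g., mean intensity) for illustration; real systems use much richer features and the threshold is an assumption:

```python
# Sketch: keyframe selection for video summarization by inter-frame
# change. Each frame is reduced to one feature value (e.g., mean
# intensity); the change threshold is illustrative.

def keyframes(features, change_threshold=10):
    """Keep the first frame and every frame that differs enough from
    the last kept frame."""
    kept = [0]
    for i in range(1, len(features)):
        if abs(features[i] - features[kept[-1]]) > change_threshold:
            kept.append(i)
    return kept

mean_intensity = [50, 51, 52, 90, 91, 40, 41]  # two scene changes
print(keyframes(mean_intensity))  # [0, 3, 5]
```

The selected keyframes form a compact synopsis of hours of footage, and the surrounding detection metadata makes the summary searchable by event.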
Challenges of AI Vision
Anomaly detection in video surveillance is very challenging. There are a number of factors that make real-life applications of computer vision very difficult to implement and scale:
- Lack of real-life data: There is a huge need for real-life data collection to train effective algorithms and build computer vision applications that perform well in real-world settings.
- Illumination: Managing variations in illumination is difficult because learned features are hard to extract from video under changing lighting conditions.
- Pose and perspective: The camera angles that determine the surveillance area have a substantial impact on the performance of deep learning algorithms. This is because the appearance of objects or people may change depending on their distance from the camera.
- Heterogeneous objects: Learning the movement of heterogeneous objects and entities in a scene can be difficult at times. The variability of appearance considerably lowers application performance.
- Sparse vs. Dense: The methods used for detecting anomalies in sparse and dense conditions are different. Some methods are suitable for event recognition in sparse environments but can generate many false negatives in dense scene-based conditions, for example, with large crowds.
- Occlusion: Detection and tracking under occlusion, with partially or fully hidden instances (people or objects), is very challenging, even though this task is comparatively easy for humans.
Computer vision is a nontrivial technology, and applications are almost always unique and require a high level of integration and configuration. There are many platforms for computer vision, image recognition, video analysis, machine learning, data collection, model training, image annotation, edge device management, and so on. Many organizations end up with a sprawl of point solutions, expensive lock-ins, and insufficient flexibility and standardization.
This is why we’ve built Viso Suite. It’s an integrated, end-to-end computer vision application platform that covers the entire lifecycle in one suite. Viso Suite provides powerful business and enterprise features along with privacy-preserving Edge AI and helps to manage security and governance.
Viso Suite lets you bring teams together and use no-code and low-code capabilities with professional computer vision tools to accelerate every step in the application lifecycle.