It’s only a matter of time before Computer Vision and Deep Learning surpass human vision. In some areas, such as the medical field, this has already happened: AI is able to detect breast cancer with higher accuracy than humans. In this article, we will look into the reasons why Computer Vision projects fail – and how to overcome them.
The value of Computer Vision
Computers can analyze video streams in real-time, turn them into variables, and apply logic workflows to solve a complex visual problem.
Based on AI, a computer is able to complete visual tasks such as counting objects or recognizing a visual shape (Object Detection) with higher precision and speed than humans. Also, a computer is able to repeat such a task quickly and autonomously as many times as needed. This is the basis of a wide range of real-world computer vision applications across multiple industries; a recent example is the use of computer vision for coronavirus control.
There are many situations where a computer is able to complete a visual task better than humans – with higher consistency and precision. The advantages are pretty much the same as with the automation of any manual task.
Hence, it’s not surprising that many businesses in offline industries could potentially make use of computers that see, to increase the quality of their product or service and/or save costs in their operations. However, there are some big pitfalls that come with the real-world use of Computer Vision and visual AI in general.
The objects are not visible
As simple as it sounds: AI vision is not magic and cannot overcome physics. A problem that deals with things that are not visible cannot be solved using Computer Vision. The vision of the computer can only be as good as the camera’s video stream or video file.
Some time ago I worked on a remote monitoring project where dogs were to be counted using Deep Learning algorithms. Because some of the dogs’ fur was the same color as the floor, they were “invisible” to the camera and therefore also to the AI.
The solution to such situations is fairly simple: either the camera angle or the scene needs to change to ensure the objects are clearly visible. In cases where this is not possible, more complex logic can be set up to count objects over time. This makes sense if either the scene or the objects change and become visible over a span of time.
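One simple way to count over time is to smooth noisy per-frame counts with a sliding-window maximum, so an object that briefly blends into the background is still counted. This is only a minimal sketch under that assumption – the function name and window size are illustrative, not from any specific library:

```python
from collections import deque

def rolling_count(frame_counts, window=30):
    """Aggregate noisy per-frame object counts by taking the maximum
    over a sliding window, so objects that are briefly invisible
    (e.g. blending into the floor) are still counted."""
    window_buf = deque(maxlen=window)
    smoothed = []
    for count in frame_counts:
        window_buf.append(count)
        smoothed.append(max(window_buf))
    return smoothed

# Example: one dog "disappears" for a few frames, but the
# smoothed count stays at 3 throughout.
per_frame = [3, 3, 2, 2, 3, 3, 1, 3, 3]
print(rolling_count(per_frame, window=4))  # → [3, 3, 3, 3, 3, 3, 3, 3, 3]
```

The window length is a trade-off: it must be longer than the typical “invisible” period, but short enough that an object which actually leaves the scene is not over-counted.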
The model is too heavy
For many AI vision applications, the traditional cloud computing model is not suitable, since AI tasks are computationally intensive and require massive amounts of data. Therefore, machine learning algorithms are deployed to the edge device (Edge AI) where the data is generated, in a resource-constrained environment (power usage, computing hardware).
With AI moving from the cloud to the edge, the main challenge is no longer to find an algorithm to do something but to achieve an efficient setup – especially since numerous Computer Vision libraries and Deep Learning frameworks have recently been open-sourced.
The accuracy and latency of Computer Vision tasks depend on the availability of computational resources. More accurate models (for example Mask R-CNN) tend to be heavier and consume significantly more resources.
Especially in high-scale AI vision solutions this matters greatly: the ability to achieve similar results with lower-grade hardware means cost savings that quickly run into the millions. Many visual AI solutions are not viable in production because they rely on a very heavy model that requires expensive GPUs. The economic benefit achieved would not make up for the costs that come with such a setup.
But I have good news for you! As computing costs decline drastically year after year, computers not only get more powerful but also cheaper. Hence, models that were once considered too heavy can be used more broadly, and switching the hardware to modern AI accelerators can result in great performance gains.
If waiting or exchanging the AI hardware is not an option, there is still much you can do. Many visual problems can be solved by drastically reducing the frames per second (FPS) that are processed. If, for example, you count static objects, sufficient precision can be achieved by processing only one frame per second or less. As surprising as it may seem, the perceived quality of the application can be much higher.
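The idea can be sketched as a simple frame-sampling step in front of the expensive model. This is an illustrative snippet, not a full pipeline – the function name and parameters are assumptions:

```python
def frames_to_process(source_fps, target_fps, total_frames):
    """Return the indices of frames that should be run through the
    (expensive) detection model, downsampling the stream from
    source_fps to target_fps. All other frames are skipped."""
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))

# A 30 FPS camera processed at 1 FPS: only every 30th frame
# reaches the model, cutting compute by roughly 30x.
indices = frames_to_process(source_fps=30, target_fps=1, total_frames=150)
print(indices)  # → [0, 30, 60, 90, 120]
```

In a real pipeline (e.g. with OpenCV), the skipped frames can be discarded cheaply – `VideoCapture.grab()` advances the stream without the full decode-and-retrieve step – so the savings apply to decoding as well as inference.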
If you want to learn more about the basics of Computer Vision, we recommend reading the following articles: