Vision Processing Unit (VPU) for next-gen AI Inference on the Edge


Computer vision is beginning to move from the laboratory into the real world. Running real-world computer vision applications therefore requires a new approach to managing power, weight, cost, and space constraints.

Megatrend Deep Learning

Recent advances in deep learning methods and convolutional neural networks (CNNs) have drastically changed the role of machine learning in a wide range of computer vision tasks.

With deep learning, the accuracy of object classification and object detection has improved greatly. The inference error rate of machine learning algorithms has also become remarkably low, already surpassing human performance in certain scenarios (e.g. face recognition).

Optimized Deep Learning Hardware

With the deep learning trend comes the need for new hardware architectures that enable higher performance for machine learning tasks, both during training and inference.

The use of general-purpose processors for machine learning applications is limited, mainly because of irregular memory access, which causes long memory stalls and large bandwidth requirements. As a side effect, this leads to significant increases in power consumption and thermal dissipation requirements.

Innovations at the software level introduced novel data formats that use tensors (a tensor is a generalization of vectors and matrices, easily understood as a multidimensional array). These breakthroughs provide multiple advantages in terms of performance and power consumption.
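
To make the "multidimensional array" description concrete, here is a minimal sketch that models tensors as nested Python lists and reports their rank and shape. The helper name shape and the tiny 2x2 RGB image are hypothetical and chosen only for illustration; real frameworks store tensors in packed, typed arrays rather than nested lists.

```python
def shape(tensor):
    """Return the shape of a regular nested list as a tuple of dimension sizes."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        if not tensor:
            break  # an empty dimension has nothing further to descend into
        tensor = tensor[0]
    return tuple(dims)


scalar = 7.0                          # rank 0: a single number
vector = [1.0, 2.0, 3.0]              # rank 1: a vector
matrix = [[1.0, 2.0], [3.0, 4.0]]     # rank 2: a matrix
# Hypothetical 2x2-pixel RGB image laid out as height x width x channels (rank 3).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("image", image)]:
    dims = shape(value)
    print(name, "rank:", len(dims), "shape:", dims)
```

A batch of such images would simply add one more leading dimension, which is why CNN inputs are commonly described as rank-4 tensors.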

The industry is shifting towards designing processors where cost, power, and thermal dissipation are key concerns. Specialized co-processors have therefore emerged to reduce energy consumption while improving overall computing performance for deep learning tasks.

Therefore, the adoption of power-efficient AI accelerators for computer vision and machine learning on the “edge” is an important field in robotics and the Internet-of-Things (IoT).

Vision Processing Units (VPU)

Vision Processing Units (VPUs) are an emerging category of chips that aim to provide ultra-low-power operation without compromising performance. VPU chips are optimized to perform inference tasks using pre-trained convolutional neural network (CNN) models. The term “vision” relates to the chips’ original purpose, which is to accelerate computer vision applications on the “edge”. These multicore, always-on chips are optimized to power computer vision for edge, mobile, and embedded applications.

Myriad 2 VPU

A popular example is the Movidius Myriad 2 VPU, the chip behind the Intel Neural Compute Stick (NCS) platform, which can be used to run inference on pre-trained convolutional neural networks. The Myriad 2 VPU is designed as a 28-nm co-processor that provides high-performance tensor acceleration. It offers high-level APIs that allow application programmers to easily take advantage of its features, and a software-controlled memory subsystem that enables fine-grained control over different workloads.

The architecture of this chip is inspired by the observation that, beyond a certain frequency limit for any particular design and target process technology, power cost grows quadratically for linear increases in operating frequency. The Myriad 2 VPU was designed following this principle, with 12 highly parallel vector processors named Streaming Hybrid Architecture Vector Engines (SHAVE). Its parallelism and instruction set architecture provide high sustained performance efficiency across a range of computer vision applications, including those with low latency requirements on the order of milliseconds.
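
The design principle can be illustrated with a toy model. The sketch below assumes, purely for illustration, that throughput scales with cores times frequency and that power per core scales with the square of frequency (the quadratic regime described above). The function names and the unit-less numbers are hypothetical and are not Myriad 2 measurements.

```python
def relative_throughput(cores, freq):
    """Toy model: work done per unit time grows with core count and clock."""
    return cores * freq


def relative_power(cores, freq):
    """Toy model: each core's power grows with the square of its clock."""
    return cores * freq ** 2


def efficiency(cores, freq):
    """Throughput per unit of power under the toy model."""
    return relative_throughput(cores, freq) / relative_power(cores, freq)


# Three hypothetical designs, in arbitrary units relative to a baseline core.
designs = {
    "baseline (1 core, 1x clock)": (1, 1.0),
    "faster clock (1 core, 2x clock)": (1, 2.0),
    "more cores (2 cores, 1x clock)": (2, 1.0),
}

for name, (cores, freq) in designs.items():
    print(name,
          "throughput:", relative_throughput(cores, freq),
          "power:", relative_power(cores, freq),
          "efficiency:", efficiency(cores, freq))
```

Under these assumptions, doubling the clock and doubling the core count both double throughput, but the faster clock quadruples power while the extra core only doubles it. This is the intuition behind using many modestly clocked vector processors, provided the workload parallelizes well.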

The Myriad 2 VPU aims to provide an order of magnitude higher performance efficiency, allowing high-performance computer vision systems with very low latency to be built while dissipating less than 1 Watt.

Inference performance on the ILSVRC dataset using CPU, GPU, and multi-VPU.

Performance benchmarks have shown that a combination of multiple VPU chips can potentially provide performance equivalent to a reference CPU and GPU-based system while reducing the thermal design power (TDP) by up to 8 times. Meanwhile, the number of inferences per Watt of VPUs is over 3 times higher than that of the reference CPU or GPU systems. In tests, the estimated top-1 error rate was 32% on average, with a confidence error difference of 0.5%.
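
For readers unfamiliar with the two metrics quoted above, the following sketch shows how a top-1 error rate and an inferences-per-Watt figure are typically computed. The function names and all input values are hypothetical toy data, not the benchmark's measurements.

```python
def top1_error_rate(scores, labels):
    """Fraction of samples whose highest-scoring class differs from the label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    wrong = 0
    for row, label in zip(scores, labels):
        # Index of the class with the highest score for this sample.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / len(labels)


def inferences_per_watt(inferences_per_second, power_watts):
    """Throughput normalized by power draw."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return inferences_per_second / power_watts


# Toy class scores for 4 samples over 3 classes (illustrative only).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.3, 0.4, 0.3],
]
toy_labels = [1, 0, 1, 1]

print("top-1 error rate:", top1_error_rate(toy_scores, toy_labels))
# Hypothetical device: 50 inferences per second at 2 W.
print("inferences per Watt:", inferences_per_watt(50.0, 2.0))
```

Comparing devices by inferences per Watt rather than raw throughput is what allows a low-power accelerator to come out ahead of a much faster but far more power-hungry CPU or GPU.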


Throughput performance comparison per Watt using the CPU, GPU, and multi-VPU.

Myriad X VPU

Intel’s Myriad X VPU is the third generation and the most advanced VPU from Movidius. It is the first in its class to feature the Neural Compute Engine, a dedicated hardware accelerator for deep neural network inference.

The Neural Compute Engine, in conjunction with the 16 SHAVE cores and an ultra-high throughput (Movidius states that it can achieve over one trillion operations per second of peak DNN inferencing throughput), makes the Myriad X a popular option for on-device deep neural networks and computer vision applications.
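
A peak operations-per-second figure can be turned into a rough upper bound on inference rate. The sketch below takes the stated peak of one trillion operations per second and divides it by the operation count of a network. The 2 billion operations per inference is a hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
# Peak DNN throughput as stated by Movidius ("over one trillion operations per second").
PEAK_OPS_PER_SECOND = 1e12


def max_inferences_per_second(ops_per_inference,
                              peak_ops_per_second=PEAK_OPS_PER_SECOND):
    """Upper bound on inference rate if the chip ran at peak throughput."""
    if ops_per_inference <= 0:
        raise ValueError("ops_per_inference must be positive")
    return peak_ops_per_second / ops_per_inference


# Hypothetical network needing 2 billion operations per inference.
hypothetical_ops = 2e9
bound = max_inferences_per_second(hypothetical_ops)
print("upper bound on inferences per second:", bound)
```

This is a ceiling, not a prediction: sustained throughput on real models is generally lower than the peak figure because of memory bandwidth, data transfer over the host interface, and layers that do not map well onto the accelerator.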

The Myriad X VPU has a native 4K image processor pipeline with support for up to 8 HD sensors connecting directly to the VPU. As with Myriad 2, the Myriad X VPU is programmable via the Myriad Development Kit (MDK), which includes development tools, frameworks, and APIs to implement custom vision, imaging, and deep neural network workloads on the chip.

How to use VPUs to power AI vision systems

The Intel Neural Compute Stick 2 (NCS2) is a tiny, fanless USB deep-learning device built on the Intel Movidius Myriad X VPU. Such Vision Processing Units are used to power scalable, always-on computer vision applications at the edge.

Dedicated software platforms can be used to efficiently distribute and scale AI algorithms across distributed edge devices.

A wide range of vision-based deep learning applications can be powered using vision processing units, for example, people counting systems or human fall detectors.


The Intel Neural Compute Stick 2 is built on the Myriad X VPU and offers an easy-to-use USB interface.

