Deep neural networks (DNNs) have recently demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition.
In this article, we cover the basics of deep neural networks and their three most popular classes: Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN).
What Is a Deep Neural Network?
Machine learning techniques have been widely applied in a variety of areas such as pattern recognition, natural language processing, and computational learning. Over the past decades, machine learning has had an enormous influence on our daily lives, with examples including efficient web search, self-driving systems, computer vision, and optical character recognition.
In particular, deep neural network models have become a powerful tool for machine learning and artificial intelligence. A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers.
The success of deep neural networks has led to breakthroughs such as reducing word error rates in speech recognition by 30% compared with traditional approaches (the biggest gain in 20 years) and drastically cutting the error rate in an image recognition competition, from 26% in 2011 to 3.5%, below the roughly 5% achieved by humans.
Concept of Deep Neural Networks
Deep neural network models were originally inspired by neurobiology. At a high level, a biological neuron receives multiple signals through the synapses contacting its dendrites and sends a single stream of action potentials out through its axon. The neuron reduces the complexity of its many inputs by categorizing their patterns. Inspired by this intuition, artificial neural network models are composed of units that combine multiple inputs and produce a single output.
Neural networks aim for brain-like functionality and are built from a simple artificial neuron: a nonlinear function (such as max(0, value)) of a weighted sum of its inputs. These artificial neurons are grouped into layers, and the outputs of one layer become the inputs of the next layer in the sequence.
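To make the description of an artificial neuron concrete, here is a minimal sketch in plain Python. It assumes the max(0, value) nonlinearity mentioned above (commonly called ReLU) and adds a bias term, a standard ingredient that the text does not mention. The weights are hypothetical values picked by hand, not learned.

```python
from typing import List


def relu(value: float) -> float:
    """The nonlinearity from the text: max(0, value)."""
    return max(0.0, value)


def neuron(inputs: List[float], weights: List[float], bias: float) -> float:
    """One artificial neuron: a nonlinear function of a weighted sum of its inputs.

    The bias term is an assumption (common practice), not part of the text.
    """
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)


def layer(inputs: List[float], weight_rows: List[List[float]], biases: List[float]) -> List[float]:
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical hand-picked weights; the outputs of one layer feed the next.
x = [1.0, 2.0]
hidden = layer(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])
output = layer(hidden, [[1.0, 0.5]], [0.0])
print(hidden, output)
```

The first hidden neuron's weighted sum is negative (-1.5), so ReLU clips it to zero. This is the nonlinearity that lets stacked layers represent more than a single weighted sum could.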
What makes a Neural Network “Deep”?
Deep neural networks use deep architectures: “deep” refers to functions of higher complexity, built from many layers and many units within a single layer. The large datasets available in the cloud made it possible to build more accurate models that use more and larger layers to capture higher-level patterns.
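As a rough illustration of how adding layers and units increases a network's complexity, the sketch below counts the weights and biases in a stack of fully connected layers. The layer widths are hypothetical (784 inputs, as for a flattened 28×28 image, and 10 outputs). Parameter count is only a crude proxy for the “higher complexity” described above.

```python
from typing import List


def count_parameters(layer_sizes: List[int]) -> int:
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of each layer, starting with the input.
    Each pair of consecutive layers contributes n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical architectures: one hidden layer, three hidden layers, one wide hidden layer.
shallow = count_parameters([784, 128, 10])
deeper = count_parameters([784, 128, 128, 128, 10])
wider = count_parameters([784, 512, 10])
print(shallow, deeper, wider)
```

Both adding layers (`deeper`) and widening a layer (`wider`) increase the number of parameters the training phase has to determine.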
A neural network has two phases, training (or learning) and inference (or prediction), which correspond to development and production, respectively. The developer chooses the number of layers and the type of neural network, and training determines the weights.
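The split between training and inference can be sketched with the smallest possible model: a single linear neuron fitted by gradient descent on mean squared error. This is an assumption chosen for brevity, since the text does not name a training algorithm. The “developer choices” here are the model form, the learning rate, and the number of epochs. Training then determines the weight `w` and bias `b`. The data points are hypothetical samples of y = 2x + 1.

```python
from typing import List, Tuple


def train(data: List[Tuple[float, float]], epochs: int = 1000, lr: float = 0.05) -> Tuple[float, float]:
    """Training phase: adjust w and b to minimize mean squared error."""
    w, b = 0.0, 0.0  # arbitrary starting weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b,
        # both computed before either parameter is updated.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: float, b: float, x: float) -> float:
    """Inference phase: the learned weights are fixed; we only evaluate the model."""
    return w * x + b


# Hypothetical training data drawn from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(round(w, 3), round(b, 3), round(predict(w, b, 10.0), 3))
```

Training should recover values very close to w = 2 and b = 1. After that, `predict` can be called on new inputs any number of times without changing the weights, which mirrors the development-versus-production split.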
3 Types of Deep Neural Networks
Three kinds of deep neural networks are popular today:
- Multi-Layer Perceptrons (MLP)
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
Multilayer Perceptrons (MLPs)
A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). MLP models are the most basic deep neural networks and are composed of a series of fully connected layers.
Each new layer is a set of nonlinear functions of weighted sums of all outputs from the prior layer (hence “fully connected”).
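A minimal sketch of an MLP forward pass, assuming ReLU in the hidden layer and no activation on the output (both common choices, not stated in the text). The weights are hand-picked, not trained, so that this two-layer network computes XOR, a function that no single weighted sum can represent.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector, Callable[[float], float]]


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense(inputs: Vector, weights: Matrix, biases: Vector, activation: Callable[[float], float]) -> Vector:
    """Fully connected layer: every output unit reads ALL outputs of the prior layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x: Vector, layers: List[Layer]) -> Vector:
    """Apply each (weights, biases, activation) layer in sequence."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical hand-picked weights: h1 = relu(a + b), h2 = relu(a + b - 1), out = h1 - 2*h2.
layers: List[Layer] = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, -2.0]], [0.0], identity),
]
for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(a, b, mlp_forward([a, b], layers))
```

In practice the weights would be determined by training rather than chosen by hand; the forward computation stays the same.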
Convolutional Neural Network (CNN)
A convolutional neural network (CNN, or ConvNet) is another class of deep neural networks. CNNs are most commonly employed in computer vision. Given a series of images or videos from the real world, an AI system using a CNN learns to automatically extract features from these inputs to complete a specific task, e.g., image classification, face authentication, or semantic image segmentation.
Unlike the fully connected layers in MLPs, CNN models use one or more convolution layers that extract simple features from the input by performing convolution operations. Each layer is a set of nonlinear functions of weighted sums computed at different coordinates over spatially nearby subsets of the outputs from the prior layer, which allows the same weights to be reused across positions.
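The weight reuse described above can be seen in a minimal single-channel convolution layer. This sketch assumes stride 1, no padding, and a ReLU nonlinearity. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped). The 2×2 “vertical edge” kernel and the 4×4 image are hypothetical examples.

```python
from typing import List

Grid = List[List[float]]


def conv2d_relu(image: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """'Valid' 2D convolution (stride 1, no padding) followed by ReLU.

    The SAME kernel weights are applied at every output position (i, j),
    which is the weight reuse that distinguishes a convolution layer
    from a fully connected one.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Grid = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over the spatially nearby kh x kw patch at (i, j).
            s = bias
            for di in range(kh):
                for dj in range(kw):
                    s += kernel[di][dj] * image[i + di][j + dj]
            row.append(max(0.0, s))
        out.append(row)
    return out


# Hypothetical input: dark left half, bright right half.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]  # responds to dark-to-bright vertical edges
print(conv2d_relu(image, kernel))
```

The kernel has only 4 weights, reused at all 9 output positions. A fully connected layer mapping the same 16 inputs to 9 outputs would need 144 weights.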
By applying various convolutional filters, CNN models can capture high-level representations of the input data, which makes them the most popular choice for computer vision tasks such as image classification (e.g., AlexNet, VGG network, ResNet, MobileNet) and object detection (e.g., Fast R-CNN, Mask R-CNN, YOLO, SSD).
- AlexNet. The first CNN to win the ImageNet Challenge for image classification, in 2012, AlexNet consists of five convolution layers and three fully connected layers. It requires 61 million weights and 724 million MACs (multiply-accumulate operations) to classify an image of size 227×227.
- VGG-16. To achieve higher accuracy, VGG-16 uses a deeper structure of 16 layers, consisting of 13 convolution layers and three fully connected layers, and requires 138 million weights and 15.5G MACs to classify an image of size 224×224.
- GoogLeNet. To improve accuracy while reducing the computation of DNN inference, GoogLeNet introduces an inception module composed of filters of different sizes. GoogLeNet achieves better accuracy than VGG-16 while requiring only seven million weights and 1.43G MACs to process an image of the same size.
- ResNet. ResNet, state of the art when it was introduced, uses a “shortcut” structure to reach human-level accuracy, with a top-5 error rate below 5%. The shortcut module addresses the vanishing gradient problem during training, making it possible to train DNN models with deeper structures.
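Two ideas from the list above can be sketched in a few lines: how MAC counts arise, and what a ResNet “shortcut” computes. The MAC formula below is the standard count for a convolution layer, where each output value needs kernel height × kernel width × input channels multiply-accumulates. It is applied here to AlexNet's first layer (96 filters of 11×11×3 at stride 4). The residual block is a simplified assumption. It uses ReLU(F(x) + x) with a hypothetical stand-in for F, whereas real ResNet blocks compute F with convolution layers.

```python
from typing import Callable, List


def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv_macs(out_h: int, out_w: int, out_channels: int,
              kernel_h: int, kernel_w: int, in_channels: int) -> int:
    """Each output value needs kernel_h * kernel_w * in_channels multiply-accumulates."""
    return out_h * out_w * out_channels * kernel_h * kernel_w * in_channels


# AlexNet's first convolution layer on a 227x227x3 image.
side = conv_output_size(227, kernel=11, stride=4)
macs = conv_macs(side, side, 96, 11, 11, 3)
print(side, macs)


def residual_block(x: List[float], f: Callable[[List[float]], List[float]]) -> List[float]:
    """Simplified ResNet-style shortcut: output = ReLU(F(x) + x).

    f is a hypothetical stand-in for the block's learned layers.
    """
    fx = f(x)
    return [max(0.0, a + b) for a, b in zip(fx, x)]


# If F outputs zeros, the block passes (non-negative) input straight through.
print(residual_block([1.0, 2.0], lambda v: [0.0] * len(v)))
print(residual_block([1.0, 2.0], lambda v: [0.5 * a for a in v]))
```

The first layer alone accounts for roughly 105 million of AlexNet's 724 million MACs. Because the shortcut adds `x` back unchanged, a block can fall back to (near) identity, and gradients have a direct path backward through the addition. This is the usual explanation for why very deep ResNets remain trainable.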
The performance of popular CNNs on AI vision tasks has improved steadily over the years, eventually surpassing human vision (the 5% error rate in the chart below).
Recurrent Neural Network (RNN)
A recurrent neural network (RNN) is another class of artificial neural networks, one that processes sequential data. RNNs were developed to address time-series problems, in which the input data arrives as a sequence.
The input of an RNN consists of the current input and the previous samples, so the connections between nodes form a directed graph along a temporal sequence. Each neuron in an RNN has an internal memory that retains information from its computation on previous samples.
RNN models are widely used in natural language processing because they can process inputs whose length is not fixed. The goal of the AI here is to build a system that can comprehend the natural language used by humans, with tasks such as language modeling, word embedding, and machine translation.
In RNNs, each subsequent layer is a collection of nonlinear functions of weighted sums of the outputs and the previous state. The basic unit of an RNN is called a “cell”; each cell consists of layers, and a series of cells enables the sequential processing of RNN models.
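A minimal sketch of the recurrence described above: a single-layer cell that computes h_t = tanh(W_x·x_t + W_h·h_(t-1) + b). The tanh nonlinearity and the single-layer cell are assumptions, since tanh is a common choice and the text only says “nonlinear functions”. The weights are hypothetical. The same cell, with the same weights, is applied at every step, so sequences of any length can be processed.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rnn_cell(x_t: Vector, h_prev: Vector, W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """One time step: a nonlinear function of weighted sums of the input and the previous state."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(W_x[i], x_t))
            + sum(w * h for w, h in zip(W_h[i], h_prev))
            + b[i]
        )
        for i in range(len(b))
    ]


def rnn_forward(sequence: List[Vector], W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """Run the same cell over a sequence of any length; h is the internal memory."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_cell(x_t, h, W_x, W_h, b)
    return h


# Hypothetical 1-dimensional input and hidden state.
W_x, W_h, b = [[1.0]], [[0.5]], [0.0]
print(rnn_forward([[1.0]], W_x, W_h, b))
print(rnn_forward([[1.0], [0.0]], W_x, W_h, b))
print(rnn_forward([[0.0], [1.0]], W_x, W_h, b))
```

In the second sequence the last input is 0.0, yet the final state is non-zero because the hidden state carries information from the earlier sample. The second and third sequences contain the same values in a different order and produce different final states, showing that the RNN is sensitive to order.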