Face detection: a guide to deep learning applications

An easy-to-read summary of face detection technology with deep learning, covering use cases, techniques, and a list of datasets.

Face detection is one of the most widely used computer vision applications and a fundamental problem in computer vision and pattern recognition. Many face and facial feature detection methods have been introduced over the last decade. However, only recently have Artificial Intelligence (AI) and convolutional neural networks (CNN) made highly accurate face detection solutions possible.

Face detection is a computer technology that identifies human faces in digital images. Given an image, the goal of a face detection system is to determine whether any faces are present and, if so, to return the bounding box of each detected face (see object detection).

Face detection model in computer vision

Other objects like trees, buildings, and bodies are ignored in the digital image. Hence, face detection can be regarded as a specific case of object-class detection, where the task is finding the location and sizes of all objects in an image that belong to a given class.
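To make the idea of object-class detection concrete, here is a minimal sketch of what a detector's output might look like and how the face class is selected from it. The `Detection` record, its field names, the `faces_only` helper, and the 0.5 score threshold are all hypothetical choices for illustration, not the output format of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: class label, bounding box in pixels, confidence."""
    label: str
    x: int        # left edge of the bounding box
    y: int        # top edge of the bounding box
    width: int
    height: int
    score: float  # detector confidence in [0, 1]


def faces_only(detections, min_score=0.5):
    """Keep only detections of the 'face' class above a confidence threshold."""
    return [d for d in detections if d.label == "face" and d.score >= min_score]


# Hypothetical detector output for one image.
detections = [
    Detection("face", 40, 30, 64, 64, 0.92),
    Detection("tree", 200, 10, 120, 300, 0.88),
    Detection("face", 310, 55, 20, 20, 0.30),
]

for face in faces_only(detections):
    print(face.x, face.y, face.width, face.height, face.score)
```

Each remaining record carries the location and size of one face; everything else in the scene is discarded.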

Face detection is the necessary first step for all facial analysis algorithms, including face alignment, face recognition, face verification, and face parsing. It is also used in multiple areas such as content-based image retrieval, video coding, video conferencing, crowd video surveillance, and intelligent human-computer interfaces.

Crowd Face Detection With Deep Learning using Viso Suite

The detection of human faces is a difficult computer vision problem, mainly because the human face is a dynamic object with a high degree of variability in its appearance. Face detection techniques have recently made significant progress. However, high-performance face detection remains challenging, especially when an image contains many tiny faces.

Face detection approaches

There are two types of approaches to detecting faces: (1) feature-based and (2) image-based approaches.

Feature-based face detection approach

Technique: Feature-based methods try to find invariant features of faces for detection. The underlying idea is based on the observation that human vision can effortlessly detect faces in different poses and lighting conditions, so there must be consistent properties or features despite those variabilities. A wide range of methods has been proposed to detect facial features and infer a face’s presence from them.
Examples: Edge detectors are commonly used to extract facial features such as the eyes, nose, mouth, eyebrows, and hairline, and skin color serves as a further cue. Based on the extracted features, statistical models are built to describe their relationships and verify a face’s presence in an image.
Advantages: Easy to implement; the traditional approach.
Disadvantages: A major problem of feature-based algorithms is that the image features can be severely corrupted by illumination, noise, and occlusion. Also, the feature boundaries of a face can be weakened while shadows cause strong edges, which together render perceptual grouping algorithms useless.

Image-based face detection approach

Technique: Image-based (appearance-based) methods try to learn templates from example images. They rely on machine learning and statistical analysis techniques to find the relevant characteristics of “face” and “no-face” images. The learned characteristics take the form of distribution models or discriminant functions that are then applied to face detection tasks.
Examples: Image-based approaches include neural networks (CNN), support vector machines (SVM), and AdaBoost.
Advantages: Good detection performance and higher efficiency.
Disadvantages: Difficult to implement. Dimensionality reduction is usually required for the sake of computational efficiency and detection efficacy. This means reducing the dimensionality of the feature space to a set of principal features that retains the meaningful properties of the original face data.
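The dimensionality reduction step mentioned above can be illustrated with principal component analysis (PCA). The article does not name a specific method, so PCA is an assumption here, and the sketch below finds only the first principal component with power iteration in plain Python. The function names and the toy data are hypothetical; a real system would work on flattened face images and typically use a numerical library.

```python
import random


def mean_center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return centered, mean


def first_principal_component(rows, iterations=200, seed=0):
    """Estimate the direction of largest variance via power iteration."""
    centered, mean = mean_center(rows)
    d = len(centered[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Apply X^T X to v without building the covariance matrix.
        proj = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v, mean


def project(row, component, mean):
    """Reduce one sample to a single coordinate along the component."""
    return sum((x - m) * c for x, m, c in zip(row, mean, component))


# Toy "images" with 4 features each; all variation lies in the first two.
samples = [[float(t), float(t), 5.0, 5.0] for t in range(10)]
component, mean = first_principal_component(samples)
reduced = [project(s, component, mean) for s in samples]
print(component)
print(reduced)
```

Here four features per sample are compressed to one coordinate while the variation between samples is preserved, which is the trade-off the disadvantage above describes.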

The face detection dataset WIDER FACE has a high degree of variability in scale, pose, occlusion, expression, appearance, and illumination.

Face detection methods

Multiple face detection techniques have been introduced.

The beginnings

Despite many studies before 2000, the practical performance of face detection was far from satisfactory until the milestone work of Viola and Jones (Viola and Jones 2004). They pioneered the use of Haar features and AdaBoost to train a face detector with promising accuracy and efficiency, which inspired several different approaches afterward and started a period of great progress in face detection. However, the detector’s feature size was relatively large, and it could not effectively handle non-frontal faces or faces in the wild.
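As a rough illustration of the Haar features mentioned above, the sketch below computes a single two-rectangle feature using an integral image, the data structure that makes rectangle sums cheap to evaluate. It is a simplified sketch, not the full detector: the function names and the toy patch are hypothetical, and the AdaBoost training and classifier cascade are left out.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle at (x, y) of size w x h, in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar_feature(ii, x, y, w, h):
    """Bottom half minus top half of a window; large when the top is darker."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


# Toy 4x4 patch: a dark band above a bright band (for example, eyes above cheeks).
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(patch)
print(two_rect_haar_feature(ii, 0, 0, 4, 4))  # 8 * 200 - 8 * 10 = 1520
```

A detector of this kind evaluates many such features at different positions and scales, and a boosting algorithm such as AdaBoost selects the ones that best separate faces from non-faces.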

Early stages – machine learning

Early approaches mainly focused on extracting different types of hand-crafted features, designed by domain experts in computer vision, and on training effective classifiers for detection with traditional machine learning algorithms. Such methods are limited in two ways: they require computer vision experts to craft effective features, and each component is optimized separately, which often leaves the whole detection pipeline sub-optimal. To address the first problem, much effort has been devoted to more complex features such as HOG (histograms of oriented gradients), SIFT (scale-invariant feature transform), SURF (speeded-up robust features), and ACF (aggregate channel features). Additionally, to make detection more robust, multiple detectors trained separately for different views or poses have been combined. Nevertheless, training and testing such models was usually more time-consuming, and the boost in detection performance was relatively limited.
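To show what a hand-crafted feature such as HOG looks like in practice, here is a simplified sketch that builds the orientation histogram for a single cell of a grayscale image. It is an approximation under stated assumptions: 9 unsigned orientation bins over 0 to 180 degrees, central-difference gradients, no bin interpolation, and no block normalization. The function name `cell_hog` and the toy cells are hypothetical.

```python
import math


def cell_hog(cell, bins=9):
    """Histogram of gradient orientations for one cell, weighted by magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbors.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Intensity rises from left to right, so the gradient points horizontally.
horizontal_ramp = [[x * 10 for x in range(8)] for _ in range(8)]
# Intensity rises from top to bottom, so the gradient points vertically.
vertical_ramp = [[y * 10 for _ in range(8)] for y in range(8)]

print(cell_hog(horizontal_ramp))
print(cell_hog(vertical_ramp))
```

The two toy cells put all of their gradient energy into different bins, which is the property a classifier trained on such histograms relies on to distinguish edge structure.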

State of the art – deep learning

Recent years have shown significant advances in face detection using deep learning methods, especially deep convolutional neural networks (CNN), which have achieved remarkable successes in various computer vision tasks. In contrast to traditional computer vision approaches, deep learning methods avoid the hand-crafted design pipeline and have dominated many well-known benchmark evaluations, such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Recently, researchers applied Faster R-CNN, one of the state-of-the-art generic object detectors, to face detection and achieved promising results. In addition, joint training of a CNN cascade, a region proposal network (RPN), and Faster R-CNN has made end-to-end optimization possible. Combining a Faster R-CNN face detection algorithm with hard negative mining and ResNet has achieved significant boosts in detection performance on benchmarks like FDDB.

Face detection with OpenCV

Why is face detection difficult?

Several challenges reduce the accuracy and detection rate of face detection. They include complex backgrounds, too many faces in an image, odd expressions, illumination, low resolution, face occlusion, skin color, distance, and orientation.

Unusual expression. Human faces in an image may show unexpected or odd facial expressions.
Illuminations. Some image parts may have very high or low illumination or shadows.
Skin types. Detecting faces across different skin tones is challenging and requires a wider diversity of training images.
Distance. If the distance to the camera is too great, the object size (face size) may be too small.
Orientation. The face orientation and angle toward the camera impact the rate of face detection.
Complex background. A high number of objects in a scene reduces the accuracy and rate of detection.
Many faces in one image. An image with a high number of human faces makes accurate detection very challenging.
Face occlusion. Faces may be partially hidden by objects such as glasses, scarves, hands, hair, hats, and other objects, which impacts the detection rate.
Low resolution. Low-resolution images or image noise impact the detection rate negatively.

Face detection using deep learning to detect facial keypoints

Use cases of face detection applications

Crowd surveillance. Face detection is used to detect and analyze crowds in frequented public or private areas. Use cases include crowd estimation and real-time alerting for law enforcement.
Human-computer interaction (HCI). Multiple human-computer interaction-based systems use face detection to detect the presence of people in specific areas.
Photography. Some recent digital cameras use face detection for autofocus. Mobile apps use face detection to find regions of interest in slideshows.
Facial feature extraction. Specific facial features such as the nose, eyes, mouth, skin color, and more can be extracted from images and live video feeds on social media.
Gender classification. Applications are built to recognize gender information with face detection methods. Such technologies are used for visitor and customer analysis.
Face recognition. A face recognition system is designed to verify and identify people from a digital image or video frame, often as part of access control or identity verification solutions. Face detection is its first step.
Marketing. Face detection is becoming more and more important for marketing, analyzing customer behavior, or segment-targeted advertising.
Attendance. Facial recognition is used to detect the attendance of individuals. It is often combined with biometric detection for access management.

Face detection with a deep convolutional network, achieving high recall of faces even with severe occlusions and head pose variations

Datasets used for face detection

Annotated Faces in the Wild Dataset (AFW). The AFW dataset is built using Flickr images. It includes 205 images with 473 labeled faces. For each face, image annotations include a rectangular bounding box, 6 landmarks, and the pose angles.
PASCAL Face Dataset (PASCAL FACE). This dataset is used for face detection and face recognition; it is a subset of PASCAL VOC and contains 1,335 labeled faces in 851 images with large face appearance and pose variations.
MIT Face Dataset (CBCL Face Database). The MIT CBCL Face Database contains a training set (2,429 faces, 4,548 non-faces) and a test set (472 faces, 23,573 non-faces).
Face Detection Data Set and Benchmark (FDDB). The dataset contains 5,171 faces annotated in 2,845 images with a wide range of difficulties, such as occlusions, difficult poses, and low image resolutions. The images show the large appearance changes, heavy occlusions, and severe blur degradations that are prevalent when detecting faces in unconstrained real-life scenarios.
CMU PIE Database (PIE). The CMU Pose, Illumination, and Expression (PIE) database contains 41,368 images of 68 people, each person under 13 different poses, 43 different illumination conditions, and 4 different expressions.
Surveillance Cameras Face Database (SCface Dataset). SCface is a database of static images of human faces. The images were taken in an uncontrolled indoor environment using five video surveillance cameras of various qualities. The dataset contains 4,160 static images (visible and infrared spectrum) of 130 subjects.
WIDER FACE dataset (WIDER). The face detection benchmark dataset includes 32,203 images and 393,703 labeled faces with a high degree of variability in scale, pose, and occlusion, making face detection extremely challenging. The WIDER FACE dataset is organized based on 61 event classes.
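One simple way to compare the datasets above is the average number of labeled faces per image, computed from the figures quoted in the list. The sketch below covers only the four datasets for which both an image count and a face count are given; the dictionary layout and the helper name `faces_per_image` are hypothetical.

```python
# (images, labeled faces) as quoted in the dataset list above.
datasets = {
    "AFW": (205, 473),
    "PASCAL FACE": (851, 1335),
    "FDDB": (2845, 5171),
    "WIDER FACE": (32203, 393703),
}


def faces_per_image(images, faces):
    """Average number of annotated faces per image."""
    return faces / images


density = {name: faces_per_image(*counts) for name, counts in datasets.items()}

# Print the datasets from most to least crowded.
for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.1f} faces per image")
```

By this measure WIDER FACE is far more crowded than the other three, at roughly 12 faces per image against about 2, which is consistent with the difficulty of detecting many small faces in one image.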

What’s next for facial recognition technology?

Learn more about other popular fields of computer vision and deep learning technologies, for example, the difference between supervised learning and unsupervised learning. Explore use cases of face detection in smart retail, education, surveillance and security, manufacturing, or Smart Cities.
