Face Detection in 2021: Real-time applications with deep learning

Face Detection with Deep Learning Methods

Face detection is one of the most widely used computer vision applications and a fundamental problem in computer vision and pattern recognition. In the last decade, multiple face feature detection methods have been introduced. More recently, approaches based on deep learning and convolutional neural networks (CNN) have shown great success.

What is Face Detection?

Face detection is a computer technology that determines the location and size of human faces in digital images. Given an image, the goal of face detection is to determine whether there are any faces and, if so, to return the bounding box of each detected face (see object detection). Other objects in the image, such as trees, buildings, and bodies, are ignored. Face detection can be regarded as a specific case of object-class detection, where the task is to find the locations and sizes of all objects in an image that belong to a given class.
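
To make the expected output concrete, here is a minimal sketch of how a detector's result for one image could be represented. The names `FaceBox` and `summarize` and the pixel values are hypothetical and chosen for illustration only; real detectors typically also return a confidence score, which is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FaceBox:
    """Axis-aligned bounding box of one detected face, in pixels."""
    x: int        # left edge (location)
    y: int        # top edge (location)
    width: int    # horizontal size
    height: int   # vertical size

    @property
    def area(self) -> int:
        return self.width * self.height


def summarize(detections: List[FaceBox]) -> str:
    """Answer the two questions of face detection: are there faces, and where?"""
    if not detections:
        return "no faces found"
    parts = [f"({d.x}, {d.y}) {d.width}x{d.height}" for d in detections]
    return f"{len(detections)} face(s): " + "; ".join(parts)


# Hypothetical detector output for one image (values are made up).
example = [FaceBox(40, 30, 64, 64), FaceBox(200, 52, 48, 50)]
print(summarize(example))
print(summarize([]))
```

An empty list is a valid result: it simply means the image contains no faces.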

Face detection is the necessary first step for all facial analysis algorithms, including face alignment, face recognition, face verification, and face parsing. Face detection is also used in multiple areas such as content-based image retrieval, video coding, video conferencing, crowd surveillance, and intelligent human-computer interfaces.

Face detection with a deep convolutional network, achieving high recall of faces even with severe occlusions and head pose variations

Detecting human faces is a difficult computer vision problem, mainly because the human face is a dynamic object with a high degree of variability in its appearance. In recent years, face detection techniques have made significant progress. However, high-performance face detection remains very challenging, especially when an image contains many tiny faces.

There are two types of approaches to detecting facial parts: (1) feature-based and (2) image-based approaches.

1. Feature-based approach
  • Technique: Feature-based methods try to find invariant features of faces for detection. The underlying idea is based on the observation that human vision can effortlessly detect faces in different poses and lighting conditions, so there must be properties or features that are consistent despite this variability. A wide range of methods has been proposed to detect facial features and then infer the presence of a face.
  • Examples: Edge detectors commonly extract facial features such as eyes, nose, mouth, eyebrows, skin-color, and hairline. Based on the extracted features, statistical models were built to describe their relationships and to verify the presence of a face in an image.
  • Advantages: Easy to implement; this is the traditional approach.
  • Disadvantages: A major problem of feature-based algorithms is that the image features can be severely corrupted due to illumination, noise, and occlusion. Also, feature boundaries can be weakened for faces, and shadows can cause strong edges which together render perceptual grouping algorithms useless.
2. Image-based approach
  • Technique: Image-based (appearance-based) methods try to learn templates from example images. They rely on machine learning and statistical analysis techniques to find the relevant characteristics of “face” and “no-face” images. The learned characteristics take the form of distribution models or discriminant functions that are applied to face detection tasks.
  • Examples: Image-based approaches include neural networks (such as CNNs), support vector machines (SVM), and AdaBoost.
  • Advantages: Good performance, higher efficiency
  • Disadvantages: Difficult to implement. Dimensionality reduction is usually required for the sake of computational efficiency and detection efficacy. This means reducing the dimensionality of the feature space by obtaining a set of principal features that retain the meaningful properties of the original data.
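
The dimensionality reduction mentioned above can be sketched with principal component analysis (PCA). This is an assumption: the text does not name a specific technique, and PCA is only one common choice. The sketch uses the standard library only and finds just the first principal direction by power iteration, a simplification of a full eigendecomposition. All function names and the toy feature vectors are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def center(samples: Matrix) -> Matrix:
    """Subtract the per-dimension mean so the data is centered at the origin."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(row[i] for row in samples) / n for i in range(dim)]
    return [[row[i] - mean[i] for i in range(dim)] for row in samples]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered data."""
    n, dim = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def principal_direction(cov: Matrix, iterations: int = 100, seed: int = 0) -> List[float]:
    """Power iteration: repeatedly apply cov and renormalize to approach
    the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]  # positive random start vector
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered: Matrix, direction: List[float]) -> List[float]:
    """Reduce each sample to one number: its coordinate along `direction`."""
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]


# Made-up 3-D feature vectors that vary mostly along one direction.
features = [
    [1.0, 2.1, 0.0],
    [2.0, 3.9, 0.1],
    [3.0, 6.0, -0.1],
    [4.0, 8.1, 0.0],
    [5.0, 9.9, 0.1],
]
centered = center(features)
direction = principal_direction(covariance(centered))
reduced = project(centered, direction)  # one value per sample instead of three
print(direction)
print(reduced)
```

Each three-dimensional sample is replaced by a single coordinate along the direction of greatest variance, which is the sense in which a "principal feature" retains the meaningful properties of the original data.
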
The face detection dataset WIDER FACE has a high degree of variability in scale, pose, occlusion, expression, appearance and illumination.

Face Detection Methods

Multiple face detection techniques have been introduced.

  • The beginnings. Face detection has been a challenging research field since its emergence in the 1990s. Before 2000, despite many studies, the practical performance of face detection was far from satisfactory. This changed with the milestone work of Viola and Jones (Viola and Jones 2004), and face detection has made great progress since. Viola and Jones pioneered the use of Haar features and AdaBoost to train a face detector with promising accuracy and efficiency, which inspired several different approaches afterward. However, the method has several critical drawbacks. First of all, its feature size is relatively large. Also, it cannot effectively handle non-frontal faces and faces in the wild.
  • Early stages – machine learning: Early approaches mainly focused on having computer vision domain experts extract different types of hand-crafted features and on training effective classifiers for detection with traditional machine learning algorithms. Such methods are limited in that they often require computer vision experts to craft effective features, and each individual component is optimized separately, which often makes the whole detection pipeline sub-optimal. To address the first problem, much effort has been devoted to coming up with more complicated features like HOG (histograms of oriented gradients), SIFT (scale-invariant feature transform), SURF (speeded up robust features), and ACF (aggregate channel features). To enhance the robustness of detection, combinations of multiple detectors trained separately for different views or poses have been developed. Nevertheless, training and testing such models was usually more time-consuming, and the boost in detection performance was relatively limited.
  • State of the art – deep learning: Recent years have seen significant advances in face detection using deep learning methods. Deep convolutional neural networks (CNN) in particular have achieved remarkable success in various computer vision tasks. In contrast to traditional computer vision approaches, deep learning methods avoid the hand-crafted design pipeline and have dominated many well-known benchmark evaluations, such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Recently, researchers applied Faster R-CNN, one of the state-of-the-art generic object detectors, to face detection and achieved promising results. In addition, joint training conducted on a CNN cascade, a region proposal network (RPN), and Faster R-CNN has realized end-to-end optimization. A Faster R-CNN face detection algorithm combined with hard negative mining and ResNet achieved significant boosts in detection performance on face detection benchmarks like FDDB.
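
The Haar features used by Viola and Jones are differences between pixel sums of adjacent rectangles, and an integral image lets each rectangle sum be read with four lookups. The following is a minimal sketch of that idea only: the function names and the 4x4 patch are hypothetical, a single two-rectangle feature is shown, and the AdaBoost step that selects and weights features is not included.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """ii[y][x] holds the sum of img over all rows < y and columns < x.
    The extra zero row and column avoid special cases at the border."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle with top-left corner (x, y), via four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle feature: lower half minus upper half (h must be even)."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


# Made-up grayscale patch: a dark band (such as an eye region) above a bright band.
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
flat = [[50] * 4 for _ in range(4)]

print(haar_two_rect_vertical(integral_image(patch), 0, 0, 4, 4))  # large response
print(haar_two_rect_vertical(integral_image(flat), 0, 0, 4, 4))   # no contrast
```

The feature responds strongly where a dark region sits above a bright one and returns zero on a uniform patch, which is what makes such simple contrasts usable as weak face cues.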

The Main Challenges

Challenges in face detection are the factors that reduce the accuracy and detection rate of face detection. These challenges include complex backgrounds, too many faces in an image, odd expressions, illumination, low resolution, face occlusion, skin color, distance, and orientation.

  • Unusual expression. Human faces in an image may show unexpected or odd facial expressions.
  • Illuminations. Some image parts may have very high or low illumination or shadows.
  • Skin types. Detecting faces with different skin tones is a challenge and requires a wider diversity of training images.
  • Distance. If the distance to the camera is too large, the object size (face size) may be too small.
  • Orientation. The face orientation and angle to the camera impact the rate of face detection.
  • Complex background. A high number of objects in a scene reduces the accuracy and rate of detection.
  • Many faces in one image. An image with a high number of human faces makes accurate detection very challenging.
  • Face occlusion. Faces may be partially hidden by objects such as glasses, scarves, hands, hair, and hats, which impacts the detection rate.
  • Low resolution. Low-resolution images or image noise impact the detection rate negatively.
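
The detection rate referred to above can be measured by matching detected boxes against annotated faces. The sketch below assumes a common convention that the article does not specify: a detection counts as a match when its intersection over union (IoU) with an annotated box is at least 0.5. The function names and all box coordinates are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def detection_rate(ground_truth: List[Box], detections: List[Box],
                   threshold: float = 0.5) -> float:
    """Fraction of annotated faces matched by a detection.
    Greedy matching; each detection may be used for one face only."""
    if not ground_truth:
        return 1.0  # nothing to miss
    unused = list(detections)
    matched = 0
    for gt in ground_truth:
        best = max(unused, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= threshold:
            matched += 1
            unused.remove(best)
    return matched / len(ground_truth)


# Made-up annotations: a large face, a tiny distant face, a partly occluded face.
annotated = [(10, 10, 100, 100), (300, 40, 12, 12), (150, 60, 80, 80)]
# Made-up detector output: the tiny face is missed and the occluded face
# is only partly covered, so its IoU stays below the threshold.
detected = [(12, 8, 100, 100), (150, 60, 30, 80)]

print(detection_rate(annotated, detected))
```

In this toy case only the large, unoccluded face is matched, which illustrates how distance and occlusion lower the detection rate.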

Face Detection Applications

  • Crowd surveillance. Face detection is used to detect crowds in frequented public or private areas.
  • Human-computer interaction. Multiple human-computer interaction-based systems use face detection to detect the presence of humans.
  • Photography. Some recent digital cameras use face detection for autofocus. Mobile apps use face detection to find regions of interest in slideshows.
  • Facial feature extraction. Facial features like nose, eyes, mouth, skin-color and more can be extracted from images.
  • Gender classification. Applications are built to detect gender information with face detection methods.
  • Face recognition. A face recognition system is designed to identify and verify a person from a digital image or video frame.
  • Marketing. Face detection is becoming more and more important for marketing, to analyze customer behavior or for targeted advertising.
  • Attendance. Face detection is used to register the attendance of people. It is often combined with biometric detection for access management.

Datasets used for Face Recognition

  • Annotated Faces in the Wild Dataset (AFW). The AFW dataset is built using Flickr images. It includes 205 images with 473 labeled faces. For each face, annotations include a rectangular bounding box, 6 landmarks and the pose angles.
  • PASCAL Face Dataset (PASCAL FACE). This dataset is used for face detection and face recognition. It is a subset of PASCAL VOC and contains 1,335 labeled faces in 851 images with large variations in face appearance and pose.
  • MIT Face Dataset (CBCL Face Database). The MIT CBCL face database contains a training set (2,429 faces, 4,548 non-faces) and a test set (472 faces, 23,573 non-faces).
  • Face Detection Data Set and Benchmark (FDDB). The dataset contains 5,171 faces annotated in 2,845 images with a wide range of difficulties, such as occlusions, difficult poses, and low image resolutions. The images exhibit the large appearance changes, heavy occlusions, and severe blur degradations that are prevalent when detecting faces in unconstrained real-life scenarios.
  • CMU PIE Database (PIE). The CMU Pose, Illumination, and Expression (PIE) database contains 41,368 images of 68 people, each person under 13 different poses, 43 different illumination conditions, and with 4 different expressions.
  • Surveillance Cameras Face Database (SCface Dataset). SCface is a database of static images of human faces. The images were taken in an uncontrolled indoor environment using five video surveillance cameras of various qualities. The dataset contains 4,160 static images (visible and infrared spectrum) of 130 subjects.
  • WIDER FACE dataset (WIDER). The face detection benchmark dataset includes 32,203 images and 393,703 labeled faces with a high degree of variability in scale, pose and occlusion, which makes face detection extremely challenging. Also, the WIDER FACE dataset is organized based on 61 event classes.
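
A quick calculation on the image and face counts quoted in the list above shows how crowded the images in each detection dataset are on average. The dictionary and function names are hypothetical; the counts are taken from the list.

```python
# (number of images, number of labeled faces), as quoted in the dataset list
datasets = {
    "AFW": (205, 473),
    "PASCAL FACE": (851, 1335),
    "FDDB": (2845, 5171),
    "WIDER FACE": (32203, 393703),
}


def faces_per_image(images: int, faces: int) -> float:
    """Average number of labeled faces per image."""
    return faces / images


density = {name: faces_per_image(*counts) for name, counts in datasets.items()}
for name, value in density.items():
    print(f"{name:12s} {value:5.1f} faces per image")
```

WIDER FACE averages roughly 12 labeled faces per image, compared with about 2 for the other three datasets, which is consistent with the "many faces in one image" challenge.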
