Deep Learning for Person Re-Identification

Person re-identification (Re-ID) retrieves a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the growing demand for intelligent video surveillance, the problem has attracted significantly more interest in the computer vision community.

This article will cover the following aspects:

  1. What is Person Re-Identification?
  2. What are the main challenges?
  3. How does Re-Identification with Deep Learning work?
  4. The next step: Unsupervised Re-Identification
  5. Outlook and what to expect in the future

About us: Viso.ai provides the world’s only end-to-end computer vision platform. Leading organizations across industries use it to build and implement their custom deep learning applications. Get a personalized demo here.

 

Computer vision in Smart Cities – Built with Viso Suite

 

What Is Person Re-Identification?

Person Re-Identification Problem

Person re-identification is a specific person retrieval problem across non-overlapping, disjoint cameras. Re-ID aims to determine whether a person of interest has appeared in another place at a different time, captured either by a different camera or by the same camera at a different moment. The query for a person can be an image, a video sequence, or even a text description.

Re-identification is a widely studied research field. With the urgent demand for public safety and the increasing number of surveillance cameras, it is also a goal of great practical importance.

 

Challenges of Person Re-Identification

Re-identification is challenging due to varying viewpoints, low image resolution, illumination changes, unconstrained poses, occlusions, heterogeneous modalities, complex camera environments, background clutter, unreliable bounding box generation, and more. All of these factors lead to greatly varying settings and uncertainty.

Practical model deployment adds further difficulties: dynamically updated camera networks, efficient retrieval from a large-scale gallery, group uncertainty, unseen testing scenarios, incremental model updating, and people changing clothes.

These challenges are the main reason why re-identification is still considered an unsolved problem for real-world applications.

 

Person and object detection in low-illumination with YOLOv7 – built on Viso Suite

 

Re-ID with Deep Learning Methods

Early approaches mainly focused on hand-crafted feature construction based on body structures, or on distance metric learning. With the advancement of deep learning, person re-identification has achieved promising performance on the popular benchmarks.

Nevertheless, there is still a large gap between research-oriented scenarios and practical re-identification applications.

 

How Re-Identification With Deep Learning Works

The following steps outline a practical person re-identification system that solves pedestrian retrieval across multiple surveillance cameras. Generally, building such a system requires five main steps:

  1. Video Data Collection: The primary requirement is the availability of raw video data from surveillance cameras. Such cameras are usually placed in different places under varying environments. Often, the raw visual data contains a large amount of complex and noisy background clutter.
  2. Bounding Box Generation: People in the video data are detected using person detection and tracking algorithms. Bounding boxes that contain the person images are extracted from the video data.
  3. Training Data Annotation: The cross-camera labels are annotated. Because of the large cross-camera variations, annotated training data is usually essential for learning a discriminative Re-ID model. When the domain shift is large, the training data usually needs to be annotated again in every new scenario.
  4. Model Training: In the training phase, a discriminative and robust Re-ID model is trained with the previously annotated person images or videos. This is the core of developing a re-identification system and is widely researched. Many models have been developed to handle the various challenges, concentrating on feature representation learning, distance metric learning, or a combination of the two.
  5. Pedestrian Retrieval: The testing phase performs the pedestrian retrieval. Given a query for a person of interest and a gallery set, the Re-ID model learned in the previous stage extracts feature representations for both. A ranking list is obtained by sorting the gallery by the calculated query-to-gallery similarity (probability of ID-match).
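
The retrieval step above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that feature vectors have already been extracted by a trained Re-ID model and that cosine similarity is used as the query-to-gallery score. The function names, image IDs, and toy feature values are hypothetical and not taken from any specific system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query_feature, gallery):
    """Sort gallery entries by descending similarity to the query.

    gallery is a list of (image_id, feature_vector) pairs.
    """
    scored = [
        (image_id, cosine_similarity(query_feature, feature))
        for image_id, feature in gallery
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy features standing in for the output of a trained Re-ID model.
query = [1.0, 0.0, 1.0]
gallery = [
    ("cam2_img07", [0.9, 0.1, 1.1]),
    ("cam3_img12", [0.0, 1.0, 0.0]),
    ("cam2_img31", [1.0, 1.0, 0.0]),
]

ranking = rank_gallery(query, gallery)
for image_id, score in ranking:
    print(image_id, round(score, 3))
```

In a real deployment the gallery can be very large, so an exhaustive sort like this one would typically be replaced by an approximate nearest-neighbor index.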

 

Steps of a Deep Learning based Method for Person Re-Identification – Source

 

State-of-the-Art Re-Identification: Closed-World

The widely studied “closed-world” setting is usually applied under research assumptions, and deep learning techniques have brought notable advances in this setting on several datasets. Typically, a standard closed-world Re-ID system contains three main components:

  • Feature Representation Learning, which focuses on developing feature construction strategies.
  • Deep Metric Learning for designing the training objectives with different loss functions or sampling strategies.
  • Ranking Optimization to optimize the retrieved ranking list.
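
As one illustration of a deep metric learning objective, the sketch below computes a triplet loss on plain Python lists. The article does not name a specific loss function, so the choice of triplet loss, the Euclidean distance, and the margin of 0.3 are assumptions made for this example. In practice the loss would be computed on network outputs inside a deep learning framework.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least the margin (an assumed hyperparameter).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 0.1]       # same identity as the anchor
negative_far = [1.0, 0.0]   # different identity, already well separated
negative_near = [0.0, 0.2]  # different identity, too close to the anchor

easy = triplet_loss(anchor, positive, negative_far)
hard = triplet_loss(anchor, positive, negative_near)
print(easy, round(hard, 3))
```

The "sampling strategies" mentioned above matter because triplets like the first one contribute no gradient, so training usually favors harder triplets like the second.
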
Concept of a vision-based Person Re-Identification System – Source

The Next Era of Re-Identification: Open-World

With the performance saturation in a closed-world setting, the research focus for person Re-ID has recently moved to the open-world setting, facing more challenging issues:

  • Heterogeneous Re-ID by matching person images across heterogeneous modalities. This includes re-identification between depth and RGB images, text-to-image re-identification, visible-to-infrared re-identification, and cross-resolution re-identification.
  • End-to-end Re-ID from the raw images or videos. This alleviates the reliance on the additional step for bounding box generation.
  • Noise-robust Re-ID. This includes partial Re-ID with heavy occlusion, Re-ID with sample noise caused by detection or tracking errors, and Re-ID with label noise caused by annotation error.
  • Open-set person Re-ID, where the correct match may not occur in the gallery at all. It is usually formulated as a person verification problem, such as deciding whether two person images belong to the same identity.
  • Semi- or unsupervised Re-ID with limited or unavailable annotated labels.
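
The open-set case in the list above can be illustrated with a simple distance threshold: if even the closest gallery entry is too far away, the query is treated as a person who is not in the gallery. This is a minimal sketch under assumed conditions. The Euclidean distance, the threshold value, and all names are hypothetical, and real systems typically learn or calibrate the decision rule.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def open_set_match(query_feature, gallery, max_distance):
    """Return (image_id, distance) of the closest gallery entry,
    or None when no entry lies within max_distance of the query.

    gallery is a list of (image_id, feature_vector) pairs.
    """
    best = None
    for image_id, feature in gallery:
        distance = euclidean(query_feature, feature)
        if best is None or distance < best[1]:
            best = (image_id, distance)
    if best is None or best[1] > max_distance:
        return None  # reject: the person is probably not in the gallery
    return best


gallery = [
    ("cam1_person_a", [0.0, 0.0]),
    ("cam1_person_b", [3.0, 4.0]),
]

known = open_set_match([0.0, 0.1], gallery, max_distance=0.5)
unknown = open_set_match([10.0, 10.0], gallery, max_distance=0.5)
print(known, unknown)
```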

 

Unsupervised Re-Identification with Deep Learning

In recent years, video-based re-identification has made great advances. Video sequences provide both visual and temporal information, and in practical video surveillance applications they can be obtained using object tracking algorithms.

However, the annotation difficulty limits the scalability of supervised methods in large-scale camera networks enabled by distributed Edge AI, which drives the need for unsupervised video re-identification.

The difference between unsupervised and supervised learning is the availability of labels (image annotation). An intuitive idea for unsupervised learning is to estimate the re-identification labels as accurately as possible, which is called “cross-camera label estimation”.

The estimated labels are subsequently used in feature learning to train robust Re-ID models.
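
To make the idea of cross-camera label estimation concrete, the sketch below pairs up features from two cameras whenever they are each other's nearest neighbor, and treats each pair as one estimated identity. This is a deliberately simplified stand-in chosen for illustration, not the graph matching or clustering methods used in the literature. All names and feature values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(feature, candidates):
    """Index of the candidate closest to the given feature."""
    return min(range(len(candidates)), key=lambda j: euclidean(feature, candidates[j]))


def estimate_cross_camera_labels(cam_a, cam_b):
    """Pair features from two cameras by mutual nearest neighbor.

    Returns a list of (index_in_a, index_in_b) pairs. Each pair is
    treated as one pseudo identity; unpaired features stay unlabeled.
    """
    if not cam_a or not cam_b:
        return []
    pairs = []
    for i, feature_a in enumerate(cam_a):
        j = nearest_index(feature_a, cam_b)
        # Keep the pair only if the match also holds in the other direction.
        if nearest_index(cam_b[j], cam_a) == i:
            pairs.append((i, j))
    return pairs


# Toy per-person features observed by two different cameras.
cam_a = [[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]]
cam_b = [[5.1, 4.9], [0.2, 0.1]]

pseudo_labels = estimate_cross_camera_labels(cam_a, cam_b)
print(pseudo_labels)
```

The third person in `cam_a` has no counterpart in `cam_b`, so the mutual check leaves it unlabeled instead of forcing a wrong pseudo label.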

With the success of deep learning, unsupervised Re-ID has attracted increasing attention in recent years. Within three years, unsupervised Re-ID performance on the Market-1501 dataset increased significantly: Rank-1 accuracy rose from 54.5% to 90.3%, and mAP rose from 26.3% to 76.7%. Despite these promising results, unsupervised re-identification is still underdeveloped and needs further improvement.
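
For readers unfamiliar with the two metrics quoted above, the following sketch shows how Rank-1 accuracy and mean average precision (mAP) can be computed from ranked retrieval results. It is a simplified illustration: the input format and function name are assumptions, and benchmark protocols typically add further rules (for example, ignoring gallery images taken by the query's own camera) that are omitted here.

```python
def rank1_and_map(rankings):
    """Compute Rank-1 accuracy and mAP.

    rankings holds one list of booleans per query. Each list is the
    gallery sorted best-first, with True marking a correct identity.
    """
    if not rankings:
        raise ValueError("at least one query is required")
    rank1_hits = 0
    average_precisions = []
    for matches in rankings:
        if matches and matches[0]:
            rank1_hits += 1
        hits = 0
        precision_sum = 0.0
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / position  # precision at this hit
        average_precisions.append(precision_sum / hits if hits else 0.0)
    count = len(rankings)
    return rank1_hits / count, sum(average_precisions) / count


# Two toy queries: the first has its best match at rank 1,
# the second only finds its match at rank 2.
toy_rankings = [
    [True, False, True],
    [False, True, False],
]

rank1, mean_ap = rank1_and_map(toy_rankings)
print(round(rank1, 3), round(mean_ap, 3))
```

Rank-1 only looks at the top result, while mAP rewards placing all correct matches early in the list, which is why the two numbers reported for the same method can differ substantially.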

There is still a large gap between unsupervised and supervised Re-ID. For example, the supervised ConsAtt reaches a Rank-1 accuracy of 96.1% on the Market-1501 dataset, while the highest accuracy of the unsupervised SpCL is about 90.3%. Recently, researchers demonstrated that unsupervised learning with large-scale unlabeled training data can outperform supervised learning on various tasks.

 

What’s next

Person re-identification (Re-ID) solves a visual retrieval problem: searching for a queried person in a gallery captured by disjoint cameras. Deep learning techniques have paved the way for important breakthroughs in recent years.

In the future, we expect to see several breakthroughs in supervised Re-identification methods for open-world settings, using unsupervised Re-identification techniques to overcome the bottlenecks of data annotation.

References:

  • Deep Learning For Person Re-Identification – Source
  • Dynamic Label Graph Matching for Unsupervised Video Re-Identification – Source
  • Momentum Contrast for Unsupervised Visual Representation Learning – Source
  • Survey on Reliable Deep Learning-Based Person Re-Identification Models – Source
