viso.ai

Active Learning in Computer Vision – Complete 2024 Guide

Active learning is a subfield of machine learning in which a model is first trained on a limited amount of labeled data and then actively selects the additional data points that should be labeled next to improve its performance. This article explores the concept of active learning in computer vision, related terms, real-world examples, and benefits.

This article will cover the following:

  • Concept and definition of active learning
  • Step-by-step process of the active learning feedback loop
  • Different query strategies used in active learning
  • Advantages of active learning methods
  • Real-world applications of active learning

About us: Viso.ai powers the leading end-to-end Computer Vision Platform Viso Suite. Our solution enables teams to seamlessly build and deliver computer vision applications.

What is Active Learning in Machine Learning?

Active learning is a machine learning technique in which an algorithm iteratively selects the most informative samples for labeling. Because the model is trained on the samples it can learn the most from, it learns more efficiently and accurately, which reduces the amount of labeled data required for training.

Figure: Concept of the active learning cycle

The active learning process starts with a small set of labeled data and then iteratively selects further data points for labeling, which minimizes the cost of manual annotation. This is particularly important for laborious labeling tasks such as image annotation on very large datasets.

Figure: Learning curves for two selection strategies: uncertainty sampling (active learning) and random sampling (passive learning)

Active Learning Approaches

A query strategy determines how the active learning algorithm selects the most informative samples for labeling. The list below covers the common sampling scenarios (pool-based sampling, stream-based sampling, and membership query synthesis) together with popular query strategies such as uncertainty sampling, diversity sampling, and entropy-based sampling:

  • Pool-based Sampling: Given a pool of unlabeled samples, pool-based active learning tries to select the most useful ones to label so that a model built from them can achieve the best possible performance.
  • Uncertainty Sampling: This query strategy selects the data points the model is least certain about, for example those whose highest predicted class probability is low or whose top two class probabilities are close together.
  • Diversity Sampling: For this query strategy, the algorithm selects data points that together represent a diverse range of features and cover the data distribution.
  • Entropy-Based Sampling: A form of uncertainty sampling that selects the data points whose predicted class distribution has the highest entropy, i.e., where the probability mass is spread most evenly across the classes.
  • Membership Query Synthesis: The learner generates new unlabeled instances to query by itself instead of selecting samples from the real-world distribution.
  • Stream-based Sampling: Unlabeled samples arrive one at a time, as in a pipeline. For each sample, the active learning strategy decides whether to pass it to the annotator for labeling or to discard it.
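As a rough illustration of the uncertainty-based strategies in the list above, the following sketch scores predicted class probabilities with three common measures (least confidence, margin, and entropy). The function names, the example probabilities, and the 0.9-bit threshold are assumptions made for this example and do not come from any specific library.

```python
import math


def least_confidence(probs):
    """1 minus the top class probability; higher means less confident."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy (in bits) of the predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Pool-based selection: indices of the k highest-entropy samples."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


def should_query(probs, threshold=0.9):
    """Stream-based decision for one sample: query it if entropy > threshold.

    The threshold (in bits) is an arbitrary value chosen for this sketch.
    """
    return entropy(probs) > threshold


# Hypothetical softmax outputs for four unlabeled images and three classes.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # probability mass spread over all classes
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],  # near tie between two classes
]

print(select_most_uncertain(pool_probs, k=2))  # [1, 2]
```

Note that the measures can disagree: the fourth sample has the smallest margin but only the third-highest entropy, so the choice of score changes which samples are sent to the annotator.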

Figure: Pool-based active learning workflow
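Diversity sampling from a pool can be sketched in a similar way. The example below uses greedy farthest-point selection, which is one possible way to pick a diverse subset and not necessarily the method behind any particular tool. The 2-D feature vectors are hypothetical stand-ins for image embeddings, and the points are assumed to be distinct.

```python
import math


def greedy_diverse_subset(features, k, start=0):
    """Farthest-point selection over a pool of feature vectors.

    Starting from one point, repeatedly add the point that is farthest
    from everything chosen so far. Returns the chosen indices in order.
    """
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(f, features[start]) for f in features]
    while len(chosen) < min(k, len(features)):
        nxt = max(range(len(features)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(f, features[nxt]))
                   for d, f in zip(nearest, features)]
    return chosen


# Hypothetical embeddings: two tight clusters and one isolated point.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # cluster A
    (5.0, 5.0), (5.1, 5.0),              # cluster B
    (0.0, 9.0),                          # isolated sample
]

print(greedy_diverse_subset(features, k=3))  # [0, 5, 4]
```

With a budget of three labels, the selection takes one sample from each region of the feature space instead of three near-duplicates from cluster A.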

The Active Learning Process

The following step-by-step process shows how active learning works in practice:

  • Step #1: Start with a small set of labeled data: The process starts with a small labeled dataset that serves as the initial training set.
  • Step #2: Train a machine learning model: The labeled data is used to train an initial ML model. This model is then used to make predictions on new, unlabeled data.
  • Step #3: Select the most informative samples: The algorithm selects the most informative samples for labeling based on a query strategy, such as uncertainty sampling or diversity sampling.
  • Step #4: Label the selected samples: The selected samples are manually labeled by human annotators.
  • Step #5: Retrain the machine learning model: The newly labeled data is added to the training data, and the machine learning model is retrained on the expanded dataset.
  • Step #6: Repeat steps 3-5: The active learning model continues to select the most informative samples for labeling, they are added to the training data, and the model is retrained.

This iterative process continues until the model performance reaches a desired level or the cost of additional data collection and data labeling outweighs the benefits of improved model performance.
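The steps above can be sketched as a small loop. To keep the example self-contained, it uses a deliberately simple toy setup: each sample is a single integer feature, the model is a threshold, and the `oracle` function stands in for the human annotator. All names, the hidden boundary at 68, and the fully labeled validation set are assumptions made for illustration only.

```python
def oracle(x):
    """Stand-in for the human annotator (hypothetical rule: positive if x >= 68)."""
    return int(x >= 68)


def train(labeled):
    """Fit a threshold halfway between the largest negative and smallest positive."""
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return (largest_neg + smallest_pos) / 2


def accuracy(threshold, validation):
    """Share of validation samples classified correctly by the threshold model."""
    return sum(int(x > threshold) == y for x, y in validation) / len(validation)


def active_learning(pool, labeled, validation, batch_size=1,
                    target=1.0, max_rounds=20):
    pool, labeled, queried = list(pool), list(labeled), []
    threshold = train(labeled)                      # Steps 1-2: initial model
    for _ in range(max_rounds):                     # Step 6: repeat
        if not pool or accuracy(threshold, validation) >= target:
            break                                   # stopping criterion reached
        # Step 3: samples closest to the decision boundary are the most
        # uncertain; ties are broken by the smaller feature value.
        pool.sort(key=lambda x: (abs(x - threshold), x))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(x, oracle(x)) for x in batch]  # Step 4: annotate
        queried += batch
        threshold = train(labeled)                  # Step 5: retrain
    return threshold, queried


pool = range(1, 100)                                # 99 unlabeled samples
seed = [(0, 0), (100, 1)]                           # small initial labeled set
validation = [(x, oracle(x)) for x in range(0, 101)]

threshold, queried = active_learning(pool, seed, validation)
print(threshold, queried)  # 67.0 [50, 75, 62, 68, 65, 66]
```

In this toy setting the loop reaches full validation accuracy after requesting six labels out of a pool of 99. Real image models will not behave this cleanly, but the structure of the loop (train, score the pool, query, retrain, check a stopping criterion) stays the same.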

Real-World Examples of Active Learning

Medical Image Analysis

Active learning in medical image analysis has been extensively researched, with several studies showing improved accuracy with less labeled data. In one study, researchers used active learning frameworks for medical image segmentation.

On two datasets composed of MRI scans and CT scans of tumors, they achieved full accuracy while using only 22.69% and 48.85% of the available data, respectively.

Figure: Lung cancer classification model to analyze CT medical imaging

Object Detection and Counting

Active learning is increasingly applied for image recognition. In object detection, it can be used to improve the detection of rare objects in a dataset. For instance, a custom object detection model can be trained on very small datasets and then iteratively select the most informative samples for labeling, which can help the model learn to detect rare objects with greater accuracy.

Rare object detection is important in manufacturing, where quality control applications identify defective products automatically. Other applications include security and surveillance, for example the detection of suspicious behavior and unauthorized access.

In environmental monitoring, rare object detection can be used to identify unusual species or environmental changes in water quality or air pollution.

Figure: Product quality inspection with AI vision trained using YOLOv7 – Built on Viso Suite

Autonomous Vehicles

Active learning is widely used for training computer vision models in autonomous driving. For example, a model can be trained on a small set of labeled data and then iteratively select the most informative samples for labeling, such as scenes containing objects on the road, pedestrians, and traffic signs, which can help improve the vehicle’s perception.

Figure: YOLOS for real-time traffic object detection

Benefits of Active Learning

When creating a new labeled dataset, human data scientists and annotators must review and annotate large numbers of images. This image annotation process is time-consuming and a barrier to the deployment of new computer vision solutions, particularly for rarely occurring objects.

  1. Reduced Labeling Costs: Active learning can significantly reduce the cost of annotating data, as it enables the model to learn from a limited amount of labeled data.
  2. Data Reduction: Active learning requires significantly fewer labeled data points than passive learning on a randomly acquired dataset, while it can still achieve full accuracy.
  3. Improved Model Performance: Active learning can improve model performance by selecting the most informative samples for labeling, which can help the model learn more efficiently and accurately.
  4. Faster Time to Market: Active learning can reduce the time to market for machine learning applications by enabling models to be trained more quickly and efficiently.

Figure: Comparison of model performance and amount of annotated data for active versus passive learning

Concepts Related to Active Learning

Active learning is related to several other concepts in machine learning:

  1. Semi-supervised learning is a type of machine learning in which the AI algorithm is trained on both labeled and unlabeled data. Active learning can be used as a strategy for selecting which examples to label in semi-supervised learning.
  2. Reinforcement learning is a machine learning technique in which an agent learns to make decisions based on rewards and punishments. Active learning can be used to help select which actions to take in reinforcement learning.
  3. Transfer learning is a machine learning approach in which knowledge learned in one task is applied to another task. Active learning can be used as a strategy for deciding which examples to transfer knowledge from in transfer learning.

Conclusion

Active learning is a powerful technique for improving the efficiency of machine learning algorithms. By selecting the most informative examples to learn from, active learning reduces the amount of labeled data required to train a model. Meanwhile, the model accuracy can be maintained or even improved.

Various active learning techniques have been used in real-world applications. As the amount of available data continues to grow, active learning is likely to become an increasingly important tool in the machine learning toolbox.