Active learning is a subfield of machine learning in which a model is first trained on a small amount of labeled data and then actively selects which additional data points should be labeled to improve its performance. In this article, we explore active learning in computer vision, related concepts, real-world examples, and its benefits.
This article will cover the following:
- Concept and definition of active learning
- Step-by-step process of the active learning feedback loop
- Different query strategies used in active learning
- Advantages of active learning methods
- Real-world applications of active learning
About us: Viso.ai powers the leading end-to-end Computer Vision Platform Viso Suite. Our solution enables teams to seamlessly build and deliver computer vision applications.
What is Active Learning in Machine Learning?
Active learning is a machine learning technique in which an algorithm iteratively selects the most informative samples for labeling. The idea is that by training on the samples it can learn the most from, the model learns more efficiently and reaches a given accuracy with less labeled data.
The process starts with a small labeled dataset and then selects further data points for labeling over several iterations, which reduces the cost of manual annotation. This matters most for laborious labeling tasks, such as annotating massive image datasets.
Active Learning Strategies
A query strategy determines how the active learning algorithm selects the most informative samples for labeling. The terms below cover two related choices: the sampling scenario (pool-based, stream-based, or membership query synthesis), which defines where candidate samples come from, and the query strategy (uncertainty, diversity, or entropy-based sampling), which scores how informative each candidate is:
- Pool-based Sampling: Given a pool of unlabeled samples, the algorithm ranks the entire pool and selects the most useful samples to label, so that a model trained on them achieves the best possible performance.
- Uncertainty Sampling: This query strategy selects the data points the model is least certain about, for example, samples whose top predicted class has a low probability, or whose two most likely classes have nearly equal probabilities.
- Diversity Sampling: The algorithm selects data points that together cover a diverse range of features or the data distribution, rather than many near-duplicate samples.
- Entropy-Based Sampling: A form of uncertainty sampling that selects the data points whose predicted class distribution has the highest entropy, i.e., the most evenly spread predictions.
- Membership Query Synthesis: The learner generates new instances by itself and queries their labels, instead of selecting samples drawn from the real-world distribution.
- Stream-based Sampling: Unlabeled samples arrive one at a time, as in a pipeline. For each sample, the active learning strategy decides immediately whether to pass it to the annotator for labeling or to discard it.
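To make the strategies listed above concrete, here is a minimal sketch in plain Python. It assumes the model outputs one class-probability vector per unlabeled sample (and, for diversity sampling, one feature vector per sample). All function names are hypothetical and not taken from any particular library, and greedy k-center selection stands in for diversity sampling as one common choice, not the only one.

```python
import math
from typing import List, Sequence

# --- Query strategies: a higher score means a more informative sample ---

def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty sampling: 1 minus the top class probability."""
    return 1.0 - max(probs)

def margin(probs: Sequence[float]) -> float:
    """Uncertainty sampling: negated gap between the two most likely classes."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)

def entropy(probs: Sequence[float]) -> float:
    """Entropy-based sampling: Shannon entropy (in nats) of the prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# --- Sampling scenarios ---

def select_from_pool(pool_probs, k, score=entropy) -> List[int]:
    """Pool-based: score every unlabeled sample, return the indices of the top k."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]

def should_query(probs, threshold, score=entropy) -> bool:
    """Stream-based: decide on one incoming sample without seeing the rest."""
    return score(probs) >= threshold

def k_center_greedy(features, labeled, k) -> List[int]:
    """Diversity sampling (greedy k-center): repeatedly pick the sample that is
    farthest from every sample already labeled or picked. `labeled` must be non-empty."""
    chosen = list(labeled)
    picks = []
    for _ in range(k):
        candidates = [i for i in range(len(features)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: min(math.dist(features[i], features[j]) for j in chosen))
        chosen.append(best)
        picks.append(best)
    return picks

pool = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.70, 0.20, 0.10], [0.34, 0.33, 0.33]]
print(select_from_pool(pool, k=2))           # [3, 1]
print(should_query(pool[0], threshold=0.5))  # False
```

Pool-based selection needs scores for the whole pool before it chooses, while the stream-based check only needs the current sample and a threshold, which is why stream-based sampling suits data that arrives continuously. The threshold value here is an assumption you would tune to your labeling budget.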
The Active Learning Process
The following step-by-step process shows how active learning works:
- Step #1: Start with a small set of labeled data: The process begins with a small labeled seed set and a typically much larger pool of unlabeled data.
- Step #2: Train a machine learning model: The labeled data is used to train an initial model, which then makes predictions on the unlabeled data.
- Step #3: Select the most informative samples: Based on these predictions, the query strategy selects the most informative unlabeled samples for labeling.
- Step #4: Label the selected samples: Human annotators manually label the selected samples.
- Step #5: Retrain the machine learning model: The newly labeled samples are added to the training data, and the model is retrained on the expanded dataset.
- Step #6: Repeat steps 3-5: The loop of selecting, labeling, and retraining continues until a stopping criterion is met, such as an exhausted labeling budget or model performance that no longer improves.
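The six steps can be sketched end-to-end in plain Python. This is a toy under stated assumptions, not a production pipeline: the unlabeled pool is the integers 0 to 100, a hypothetical `oracle` function stands in for the human annotator, and the "model" is a one-dimensional threshold classifier. The seed set must contain at least one example of each class. All names are illustrative.

```python
import math

def oracle(x: int) -> int:
    """Hypothetical stand-in for the human annotator (Step 4).
    Hidden ground truth of this toy problem: label 1 if x >= 30."""
    return 1 if x >= 30 else 0

def train(labeled: dict) -> float:
    """Toy 1-D 'model': put the decision threshold halfway between the
    largest labeled negative and the smallest labeled positive.
    Requires at least one labeled example of each class."""
    lo = max(x for x, y in labeled.items() if y == 0)
    hi = min(x for x, y in labeled.items() if y == 1)
    return (lo + hi) / 2

def predict_proba(threshold: float, x: int, scale: float = 5.0) -> float:
    """P(label = 1 | x) as a logistic curve centred on the threshold."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / scale))

def active_learning(pool, seed, rounds, batch_size=1):
    # Step 1: small labeled seed set (must contain both classes)
    labeled = {x: oracle(x) for x in seed}
    # Step 2: train the initial model
    threshold = train(labeled)
    # Step 6: repeat steps 3-5; the stopping criterion here is a fixed number of rounds
    for _ in range(rounds):
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        # Step 3: uncertainty sampling, i.e. predicted probability closest to 0.5
        unlabeled.sort(key=lambda x: abs(predict_proba(threshold, x) - 0.5))
        # Step 4: ask the annotator to label the selected batch
        for x in unlabeled[:batch_size]:
            labeled[x] = oracle(x)
        # Step 5: retrain on the expanded labeled set
        threshold = train(labeled)
    return labeled, threshold

pool = list(range(101))  # 101 unlabeled points: the integers 0..100
labeled, threshold = active_learning(pool, seed=[0, 100], rounds=10)
print(len(labeled), threshold)  # 12 29.5
```

Because uncertainty sampling always queries the point nearest the current threshold, on this toy problem it behaves like a binary search: 12 labels pin the threshold to 29.5, whereas labeling the whole pool would take 101. Real models and data are far less tidy, so the savings in practice vary.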
Real-World Examples of Active Learning
Medical Image Analysis
Active learning in medical image analysis has been extensively researched, with several studies showing improved accuracy with less labeled data. In one study, researchers used active learning frameworks for medical image segmentation.
They matched the accuracy of training on the full datasets while using only 22.69% and 48.85% of the available data for the two datasets, respectively. The datasets were composed of MRI scans and CT scans of tumors.
Object Detection and Counting
Active learning is increasingly applied to image recognition. In object detection, it can improve the detection of rare objects in a dataset. For instance, a custom object detection model can be trained on a very small dataset and then iteratively select the most informative images for labeling, which helps the model learn to detect rare objects more accurately.
Rare object detection is important in manufacturing, where quality control applications identify defective products automatically. Other applications include security and surveillance, for detecting suspicious behavior and unauthorized access.
In environmental monitoring, rare object detection can be used to identify unusual species or changes in water quality or air pollution.
Active learning is widely used to train computer vision models for autonomous driving. For example, a model can be trained on a small set of labeled data and then iteratively select the most informative samples for labeling, such as frames with hard-to-recognize road objects, pedestrians, or traffic signs, to improve the vehicle’s perception.
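In object detection, the model produces several predictions per image, while annotators usually label whole images, so per-box uncertainties have to be combined into one image-level score before ranking. The sketch below assumes each detection comes with a class-probability vector and aggregates per-box entropy by max, mean, or sum, one simple family of choices among several used in practice. The names `image_score` and `rank_images` and the frame IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one detection's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def image_score(detections: List[Sequence[float]], agg: str = "max") -> float:
    """Aggregate per-box uncertainty into a single image-level score."""
    if not detections:
        return 0.0  # simplification: images without detections rank last
    scores = [entropy(p) for p in detections]
    if agg == "max":
        return max(scores)
    if agg == "mean":
        return sum(scores) / len(scores)
    if agg == "sum":
        return sum(scores)
    raise ValueError(f"unknown aggregation: {agg}")

def rank_images(preds: Dict[str, List[Sequence[float]]], k: int,
                agg: str = "max") -> List[str]:
    """Return the IDs of the k images with the highest aggregated uncertainty."""
    return sorted(preds, key=lambda img: image_score(preds[img], agg),
                  reverse=True)[:k]

# Hypothetical predictions: one class-probability vector per detected box
preds = {
    "frame_001": [[0.97, 0.02, 0.01]],
    "frame_002": [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]],
    "frame_003": [],
}
print(rank_images(preds, k=1))  # ['frame_002']
```

Max favors images containing a single very uncertain object, which suits rare-object mining; sum favors crowded images with many moderately uncertain boxes. Scoring images without detections as 0.0 is a simplification, since such images may still contain objects the model missed.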
Benefits of Active Learning
When creating a new labeled dataset, human data scientists and annotators must review and annotate large numbers of images. This image annotation process is time-consuming and a barrier to the deployment of new computer vision solutions, particularly for rarely occurring objects.
- Reduced Labeling Costs: Active learning can significantly reduce annotation costs, because only the samples the model selects need to be labeled.
- Data Reduction: Active learning can require significantly fewer labeled data points than passive learning on a randomly sampled dataset while still reaching comparable accuracy.
- Improved Model Performance: Active learning can improve model performance by selecting the most informative samples for labeling, which can help the model learn more efficiently and accurately.
- Faster Time to Market: Active learning can reduce the time to market for machine learning applications by enabling models to be trained more quickly and efficiently.
Concepts Related to Active Learning
Active learning is related to several other concepts in machine learning:
- Semi-supervised learning is a type of machine learning in which the AI algorithm is trained on both labeled and unlabeled data. Active learning can be used as a strategy for selecting which examples to label in semi-supervised learning.
- Reinforcement learning is a machine learning technique where an agent learns to make decisions based on rewards and punishments. Both fields face an exploration problem, namely choosing what to learn from next, and reinforcement learning has also been used to learn the query strategy of an active learner.
- Transfer learning is a machine learning approach in which knowledge learned in one task is applied to another task. The two are often combined: a pretrained model serves as the starting point, and active learning selects which examples from the new task to label.
Active learning is a powerful technique for improving the efficiency of machine learning. By selecting the most informative examples to learn from, it reduces the amount of labeled data required to train a model while maintaining, or even improving, model accuracy.
Active learning techniques have already been used in a variety of real-world applications. As the amount of unlabeled data continues to grow, active learning is likely to become an increasingly important tool in the machine learning toolbox.