
What Does It Mean to Have “Humans-in-the-Loop”?


Human-in-the-loop (HITL) is a machine-learning (ML) training technique that incorporates human feedback into the ML training process. It is an iterative approach: the user interacts with a machine-learning algorithm, such as a computer vision (CV) system, and provides feedback on its outputs. This allows the artificial intelligence (AI) model to adapt with each round of feedback.

Machine learning (ML) and artificial intelligence (AI) have become state-of-the-art techniques for many tasks, including computer vision. However, building such systems comes with many unique challenges. Techniques like human-in-the-loop suggest that incorporating user knowledge into the system can be beneficial. This can mean more accurate results and more efficient machine-learning processes, mainly because humans bring domain knowledge that the model lacks.

In this comprehensive overview, we will explore human-in-the-loop machine learning for computer vision tasks. We will explain its key principles, its applications in computer vision, benefits, and challenges, as well as best practices.

Understanding Human-in-the-Loop Machine Learning

HITL ML is becoming an increasingly important area of research because integrating human knowledge and experience can produce more accurate models at lower cost. In a typical workflow for building a machine-learning model, training can be one of the most resource-intensive steps in both time and computation power. However, if developers can get the model to learn from human feedback, training can become faster and more accurate. In this section, we will look at what human-in-the-loop ML is and how it works.

What is Human-in-the-loop Machine Learning

The human-in-the-loop concept is an extensive research area at the intersection of computer science, cognitive science, and psychology. When building machine-learning models, three main steps create a cycle, or loop, where humans can intervene.

Machine Learning Development Cycle. Source.

  • First is data preprocessing, where developers and data scientists prepare the data so that it is suitable for machine learning models.
  • Second is data modeling, or learning, where the model is fitted to the data using techniques like backpropagation.
  • Last, the developer evaluates the results, modifies the data or model accordingly, and repeats the cycle.

Human-machine interaction in these steps can produce better learning outcomes, especially since the results of ML models are often hard to predict. Human-in-the-loop is the integration of human knowledge into the ML cycle, and it can happen at different steps of the cycle. For example, humans can take part in data preprocessing by labeling the data iteratively while the model trains. Next, let’s discuss the different roles humans can play in the ML cycle.
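To make the cycle concrete, here is a minimal sketch of the three steps with a human labeling data in batches while the model is refit. Everything in it is an assumption for illustration: the one-dimensional "features", the nearest-centroid toy model, and the names `hitl_cycle` and `ask_human` are hypothetical and do not come from any specific framework.

```python
from typing import Callable, Dict, List, Tuple

Labeled = List[Tuple[float, int]]


def fit_centroids(labeled: Labeled) -> Dict[int, float]:
    """Toy 'model': the mean feature value of each class seen so far."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids: Dict[int, float], x: float) -> int:
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(centroids: Dict[int, float], data: Labeled) -> float:
    return sum(predict(centroids, x) == y for x, y in data) / len(data)


def hitl_cycle(pool: List[float], ask_human: Callable[[float], int],
               validation: Labeled, batch_size: int = 2,
               target: float = 0.95, max_rounds: int = 10):
    """Repeat: human labels a batch, model is refit, results are checked."""
    pool = list(pool)
    labeled: Labeled = []
    centroids: Dict[int, float] = {}
    rounds = 0
    while pool and rounds < max_rounds:
        # Step 1: data preprocessing, with a human labeling the next batch.
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((x, ask_human(x)) for x in batch)
        # Step 2: modeling, here a simple refit on all labels so far.
        centroids = fit_centroids(labeled)
        rounds += 1
        # Step 3: evaluate and decide whether another round is needed.
        if accuracy(centroids, validation) >= target:
            break
    return centroids, rounds
```

In a real computer vision project, `ask_human` would be an annotation interface and `fit_centroids` would be a training run of a neural network, but the control flow of the loop stays the same.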

Roles of Humans in the Machine Learning Cycle

Recent developments in deep learning have given AI an irreplaceable role in many fields. Consequently, human-in-the-loop machine learning is gaining importance. Several studies have identified the stages where manual human intervention can be applied. The following are the main ones.

  • Data Processing
  • Model Training and Inference
  • System Construction and Application

Each of these is a stage in the pipeline where humans can interact with the system and affect the AI’s learning outcomes. In computer vision (CV), data processing is an important step: humans annotate and label the data, and they clean and analyze it to ensure its quality and to find ways to improve model performance.

A human-in-the-loop data processing pipeline. Source.

For model training and inference, humans can actively refine the output of machine learning models through active learning. A human expert interacts with the model and corrects its output through feedback. In computer vision, image restoration is one task that can be improved this way: a human iteratively fixes the output image, adds it back to the dataset, and retrains the model.
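The correct-and-retrain loop described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a toy 1-nearest-neighbour classifier on single numbers instead of an image restoration model, and the names `correction_round` and `expert` are hypothetical.

```python
from typing import Callable, List, Tuple

Dataset = List[Tuple[float, str]]


def nearest_label(dataset: Dataset, x: float) -> str:
    """1-nearest-neighbour prediction over (feature, label) pairs."""
    return min(dataset, key=lambda pair: abs(pair[0] - x))[1]


def correction_round(dataset: Dataset, new_inputs: List[float],
                     expert: Callable[[float, str], str]):
    """Predict, let the expert fix wrong outputs, and feed the fixes back.

    `expert(x, predicted)` returns the label the human accepts, which is
    either the prediction itself or a correction.
    """
    updated = list(dataset)
    corrections = 0
    for x in new_inputs:
        predicted = nearest_label(dataset, x)
        reviewed = expert(x, predicted)
        if reviewed != predicted:
            corrections += 1
        # The reviewed sample goes back into the training data.
        updated.append((x, reviewed))
    # For 1-NN, "retraining" is simply using the updated dataset.
    return updated, corrections
```

Running a second round on similar inputs should need fewer corrections than the first, which is the effect the loop is meant to produce.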

In system construction and application, humans can design systems and user interfaces around machine learning models and incorporate their domain knowledge into the system’s decision-making processes. Humans can also actively function as supervisors and users, by monitoring the system’s performance, providing feedback, and participating in decision-making.

Applications of Human-in-the-loop in Computer Vision

Our primary focus in this article is the application of human-in-the-loop in computer vision. Computer vision (CV) models rely on deep learning architectures built from artificial neural networks, such as convolutional neural networks (CNNs). However, these methods can run into limitations in some scenarios. To improve these models, we can integrate human feedback into the deep learning pipeline, which makes the system more accurate and better at its task. In this section, we will explore the applications of human-in-the-loop in CV.

Image Classification and Object Detection

Image classification, image recognition, and object detection are some of the most fundamental topics in computer vision. These fields have received significant attention in recent years, which has improved performance and efficiency at all levels. Image classification and object detection are related tasks: classification assigns a class label to an image, while detection also locates objects of one or more classes (individuals, vehicles, animals, etc.) within it. A human’s role in these tasks can be to verify the detected objects or to mark objects that the model fails to detect automatically. The model is then trained on the supplementary objects annotated by people.
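As a rough sketch of that verification step, the snippet below routes low-confidence detections to a human reviewer and merges the result with any objects the reviewer adds by hand. The `Detection` type, the `accept_at` threshold, and both function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Detection:
    label: str
    score: float  # model confidence in [0, 1]
    box: Box


def split_for_review(detections: List[Detection], accept_at: float = 0.8):
    """Auto-accept confident detections, send the rest to a human."""
    auto = [d for d in detections if d.score >= accept_at]
    review = [d for d in detections if d.score < accept_at]
    return auto, review


def merge_annotations(auto: List[Detection], verified: List[Detection],
                      missed: List[Tuple[str, Box]]) -> List[Tuple[str, Box]]:
    """Combine accepted, human-verified, and human-added boxes.

    `missed` holds objects the model did not detect and a person drew.
    The merged list is what the model would be retrained on.
    """
    return [(d.label, d.box) for d in auto + verified] + list(missed)
```

The threshold is a design choice: a higher value sends more detections to the reviewer, which costs more human time but lets fewer wrong boxes into the training data.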

The general human-in-the-loop frameworks for model training and inferencing in Computer Vision. Source.

However, integrating human feedback and verification into object detection and image classification is challenging, so many approaches have been introduced to address it. One early approach tackles the cost and time that grow with the number of iterations in human-in-the-loop frameworks. The researchers used active learning to minimize human annotation time and to optimize the model with annotation cost in mind.
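One simple way to picture cost-aware selection is a greedy pick of the images with the best ratio of expected usefulness to predicted annotation cost, within a fixed time budget. This is a generic sketch and not the method of the cited work; the informativeness scores, the cost values in seconds, and the function name are all assumptions.

```python
from typing import List, Tuple

# (image_id, informativeness, predicted_annotation_cost_in_seconds)
Candidate = Tuple[str, float, float]


def select_within_budget(candidates: List[Candidate], budget: float):
    """Greedily pick images with the highest value per unit of cost."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen: List[str] = []
    spent = 0.0
    for image_id, _info, cost in ranked:
        # Skip anything that would push the total over the budget.
        if spent + cost <= budget:
            chosen.append(image_id)
            spent += cost
    return chosen, spent
```

With a 30-second budget, an image that is informative but takes 30 seconds to annotate can lose out to two cheaper images that together teach the model more per second of human time.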

The early interactive object detection framework. The numbers denote the predicted annotation cost. Source.

However, this approach is dated, and computer vision tasks have become much more complex, which limits how well it holds up. Newer research has introduced more efficient and better-suited frameworks. One study proposed an efficient human-in-the-loop object detection framework composed of bi-directional Deep SORT and annotation-free segment identification (AFSID). Bi-directional Deep SORT improves object tracking by running the Deep SORT algorithm both forward and backward. AFSID analyzes videos to identify segments where objects are likely tracked accurately, eliminating the need for human annotation in those sections.

A schematic diagram of a proposed human-in-the-loop object detection framework. Source.

Semantic Segmentation and Instance Segmentation

Image segmentation is a crucial task in computer vision. Its popularity has grown rapidly in recent years because it underpins a wide variety of computer vision applications. Semantic segmentation is the more general task: it classifies each pixel by semantic meaning and treats all objects of the same category as one. Instance segmentation, on the other hand, differentiates between objects of the same class and makes a separate prediction for each.
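The difference is easiest to see in the output format. In the sketch below, a semantic mask marks every "object" pixel with the same value, while the instance map gives each separate object its own id. Connected-component labeling is used here only as a stand-in to produce the instance map; real instance segmentation models do not work this way, and the function name is hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def label_instances(mask: Grid) -> Tuple[Grid, int]:
    """Give each 4-connected foreground region its own id (0 = background)."""
    rows, cols = len(mask), len(mask[0])
    ids = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or ids[r][c] != 0:
                continue  # background, or already part of an instance
            next_id += 1
            ids[r][c] = next_id
            stack = [(r, c)]
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and ids[ny][nx] == 0:
                        ids[ny][nx] = next_id
                        stack.append((ny, nx))
    return ids, next_id


# Semantic mask: both objects share class value 1.
semantic = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
instances, count = label_instances(semantic)
```

Here the semantic mask cannot tell the two objects apart, while the instance map labels them 1 and 2.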

However, since this task requires pixel-wise accuracy, it can be difficult to integrate human feedback into the loop. A few approaches have been introduced as human-in-the-loop solutions for segmentation. One approach uses human-in-the-loop data processing. The researchers identify subsets of the data that are visually much harder for a segmentation model. Experts then refine this list further, ensuring high-quality ‘gotchas’ for the model. Finally, the model is retrained on these tricky examples, which improves its performance.
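A minimal sketch of that workflow is shown below. It assumes that "hard" can be approximated by a low per-image quality score from the current model, which is our simplification and not necessarily how the researchers selected their subsets. The function names and the `expert_keeps` callback are hypothetical.

```python
from typing import Callable, Dict, List


def propose_hard_examples(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k images the current model handles worst.

    `scores` maps an image id to a quality score where lower is worse.
    """
    return sorted(scores, key=scores.get)[:k]


def curate(candidates: List[str],
           expert_keeps: Callable[[str], bool]) -> List[str]:
    """Let an expert drop candidates that are not useful hard cases,
    for example images with wrong ground truth."""
    return [image_id for image_id in candidates if expert_keeps(image_id)]
```

The curated list is then what the model is retrained on. Keeping the expert step separate from the automatic ranking makes it clear which part of the selection is human judgment.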

Here the researchers compare two subsets of the data with (A) being much harder visually. Source.

Image segmentation can be very useful for medical images. Precise pixel-wise classification supports accurate diagnosis and treatment planning, such as outlining tumors. Newer research in human-in-the-loop for computer vision has introduced other ways to interact and collaborate with CV models. One study proposed using conditional generative adversarial networks (cGANs) to produce the initial segmentation and rank each result by how easy or hard it is. Experts then step in to label the difficult cases, which are used to improve the model.

Learning interactive cGAN in a loop. Source.

Human-in-the-loop frameworks can also work with other CV tasks, such as image restoration and video segmentation. Next, let’s look at the benefits and challenges of human-in-the-loop in computer vision.

Benefits and Challenges of Human-in-the-loop in Computer Vision

We have seen multiple ways a human can collaborate with computer vision models. From data processing to system construction, humans can play an important role in improving them. In this section, we will explore how human-in-the-loop can benefit CV models and look at quantitative results. We will also look into the challenges of human-in-the-loop approaches for computer vision.

More recently, a growing number of researchers are making efforts to incorporate human knowledge into ML systems. This gives us the data we need to study the effects and challenges of these approaches.

The increasing research interest in human-in-the-loop. Source.

As we can see, the number of publications in the HITL field is growing year over year. Now, let’s look at the difference HITL has made for different CV models.

Advantages of Human-in-the-loop

Human-in-the-loop aims to improve machine-learning outcomes from multiple aspects. Let’s delve into these benefits and explore their quantitative impact.

  • Improved Accuracy and Performance: Models can struggle with complex scenarios like blurry images or rare object classes, especially if trained on limited datasets. Human intervention helps the model learn and adapt to these edge cases, improving its overall performance.
  • Faster and More Efficient Training: Active learning and semi-supervised learning are widely studied and used in computer vision. These methods let humans provide input on the most informative examples, which streamlines training and saves time.
  • Increased Interpretability: Bias and AI explainability are major concerns in computer vision. Human input helps identify and address bias in AI systems, while also making the model’s decision-making process more explainable.

Integrating humans into the computer vision loop has many benefits, from better performance to more reliable, trustworthy, and ethically sound AI systems. Now, let’s discuss some quantitative results from research.

Overview of representative works in human-in-the-loop CV. Source.

The table above lists representative studies with their task type (OD: Object Detection, IR: Image Restoration, IS: Image Segmentation, IE: Image Enhancement, VOS: Video Object Segmentation), motivation, and quantitative results.

  • Human-Machine Collaboration for Medical Image Segmentation (Ravanbakhsh et al., 2020): Accuracy improved from 0.645 to 0.846 with the proposed framework.
  • Interactive Video Object Segmentation in the Wild (Benard et al., 2017): This paper proposed a human-in-the-loop framework for video object segmentation that improved the Intersection over Union (IoU) score from 0.504 to 0.822.
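For readers unfamiliar with the metric, IoU is the area where the predicted and ground-truth regions overlap divided by the area covered by either of them. The sketch below computes it for small binary masks and also shows how to express the reported gains in relative terms. The helper names are our own, and the exact evaluation protocol of each paper may differ from this plain definition.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = background


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def relative_gain(before: float, after: float) -> float:
    """Improvement as a fraction of the starting score."""
    return (after - before) / before


# Figures quoted in the list above.
iou_gain = relative_gain(0.504, 0.822)  # roughly a 63% relative gain
acc_gain = relative_gain(0.645, 0.846)  # roughly a 31% relative gain
```

Looking at the relative gain as well as the absolute difference helps when comparing improvements that start from different baselines.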

That said, human-in-the-loop frameworks often face challenges. Let’s explore these next.

Challenges of Human-in-the-loop for CV

While human-in-the-loop offers many advantages, it also comes with fundamental challenges. These range from building general-purpose systems to integrating human input effectively. Let’s explore some key challenges when implementing HITL for computer vision tasks.

  • Effective Human-Image Interaction: It is challenging to let people interact with images effectively beyond simple labeling. Researchers therefore have to focus on ways to add human experience and knowledge to the model throughout the cycle. One direction under study is multi-modal approaches, which could bridge this gap and enhance the interaction process.
  • Knowledge Input: Figuring out how models can learn from more abstract human knowledge, such as reasoning and design principles, remains a challenge in all HITL approaches.
  • Sample Selection: We have mentioned how researchers use difficult examples and edge cases to refine and improve the model performance. However, finding metrics to identify such images is not always straightforward. Confidence-based methods work well for classification tasks, but for other vision tasks like segmentation or object detection, it gets tricky.
  • General Frameworks: While some platforms have been developed to encompass the human-AI interaction such as Prodigy and Labelbox, creating a single human-in-the-loop system that can handle a variety of computer vision tasks remains an open challenge.
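The confidence-based selection mentioned under Sample Selection can be sketched for a classifier as follows. The three scores below (least confidence, margin, and entropy) are standard uncertainty measures from the active learning literature. The function names and the example probabilities are our own, and `most_uncertain` is a hypothetical helper.

```python
import math
from typing import Dict, List


def least_confidence(probs: List[float]) -> float:
    """1 minus the top class probability. Higher means less certain."""
    return 1.0 - max(probs)


def margin(probs: List[float]) -> float:
    """Gap between the two most likely classes. Lower means less certain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats. Higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions: Dict[str, List[float]], k: int) -> List[str]:
    """Pick the k images whose predicted class distribution is least certain."""
    return sorted(predictions, key=lambda i: entropy(predictions[i]),
                  reverse=True)[:k]
```

These scores need one probability distribution per image. That is why they transfer less directly to segmentation or detection, where a single image produces many pixel-level or box-level predictions that first have to be aggregated somehow.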

However, development in the human-in-the-loop field is ongoing and promises even greater possibilities for collaboration between humans and AI.

The Future of HITL For Computer Vision

We have seen the potential of human-in-the-loop for computer vision tasks. Integrating human intelligence into the ML development cycle can greatly improve the quality of training data for computer vision models, leading to more accurate and adaptable models, and this has the potential to reshape industry standards. Humans cannot process and analyze huge datasets of images or videos, but computer vision can, even in real time.

However, humans can still intervene at different stages of this process and improve the outcome. By improving the accuracy and precision of these models, we can expect to enhance fields such as medical imaging and autonomous vehicles. We also discussed the challenges of such systems, like striking the right balance between human intervention and automation. Addressing these challenges will help mitigate human error and ethical concerns, which will be crucial to the responsible and effective deployment of HITL systems.

Despite the challenges, the benefits of human-in-the-loop machine learning are undeniable. The collaborative relationship between humans and AI can create computer vision systems that are more accurate, transparent, and trustworthy. Lastly, the future of human-in-the-loop machine learning in computer vision is bright and full of possibilities to empower us with AI systems that are more capable than ever.

FAQs

Q1. What is Human-in-the-Loop Machine Learning (HITL)?

HITL is all about integrating human expertise and feedback into the machine-learning process to improve model performance and adaptability.

Q2. How is Human-in-the-loop ML Applied?

Human feedback and input can be integrated into different stages of the machine-learning process. These include data processing (annotation, labeling, etc.), training and inference (which form a loop of running the model, refining the data, and retraining), and the system construction stage.

Q3. How does human input help in computer vision tasks like object detection?

It’s like having a teacher double-check your work. Humans can verify the model outputs, spot undetected objects, and refine tricky situations like blurry images or rare objects for the model to learn better.

Q4. What is the future of HITL ML in computer vision?

HITL ML has the potential to revolutionize computer vision by creating more accurate, transparent, and trustworthy AI systems that can tackle complex real-world problems.