Image annotation plays a significant role in computer vision, the technology that allows computers to gain high-level understanding from digital images or videos and to see and interpret visual information just like humans.
Computer vision technology powers remarkable AI applications such as self-driving cars, tumor detection, and unmanned aerial vehicles. However, most of these applications would not be possible without image annotation.
Image annotation is a primary step in the creation of most computer vision models. It is necessary for datasets to be useful components of machine learning and deep learning for computer vision.
This article provides a detailed dive into the purpose of image annotation, types of image annotation, and techniques used to make image annotation possible.
In particular, this article will discuss:
- Definition of image annotation. What is image annotation, and why is it needed?
- Process of annotating images. How to successfully annotate an image dataset.
- Types of image annotation. Popular algorithms and distinct strategies for annotating images.
What is Image Annotation?
Image annotation is the process of labeling images of a dataset to train a machine learning model. Therefore, image annotation is used to label the features you need your system to recognize. Training an ML model with labeled data is called Supervised Learning.
The annotation task usually involves manual work, sometimes with computer-assisted help. A Machine Learning engineer predetermines the labels, known as “classes”, and provides the image-specific information to the computer vision model. After the model is trained and deployed, it will predict and recognize those predetermined features in new images that have not been annotated yet.
Why is Image Annotation needed?
Labeling images is necessary for functional datasets because it lets the training model know what the important parts of the image are (classes) so that it can later use those notes to identify those classes in new, never-before-seen images.
Video annotation is based on the concept of image annotation. For video annotation, features are manually labeled on every video frame (image) to train a machine learning model for video detection. Hence, the dataset for a video detection model is composed of images rather than videos.
How does Image Annotation work?
To annotate images, you can use any open source or freeware data annotation tool. The Computer Vision Annotation Tool (CVAT) is probably the most popular open-source image annotation tool.
While dealing with a large amount of data, a trained workforce will be required to annotate the images. The annotation tools offer different sets of features to annotate single or multiple frames efficiently.
Labels are applied to objects within an image using any of the annotation techniques explained below; the number of labels per image may vary depending on the use case.
How to Annotate Images?
- Step #1: Prepare your image dataset.
- Step #2: Specify the class labels of objects to detect.
- Step #3: In every image, draw a box around the object you want to detect.
- Step #4: Select the class label for every box you drew.
- Step #5: Export the annotations in the required format (COCO JSON, YOLO, etc.)
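As an illustration of Step #5, the sketch below writes a single hypothetical box annotation to COCO-style JSON using only the standard library. The file name, class name, and coordinates are made up for the example; real exports are usually handled by the annotation tool itself.

```python
import json

# Hypothetical dataset: one image, one class, one box (COCO bbox = [x, y, width, height])
images = [{"id": 1, "file_name": "dog_001.jpg", "width": 640, "height": 480}]
categories = [{"id": 1, "name": "dog"}]
annotations = [
    {
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [120, 80, 200, 150],  # top-left x, top-left y, box width, box height
        "area": 200 * 150,
        "iscrowd": 0,
    }
]

coco = {"images": images, "annotations": annotations, "categories": categories}
with open("annotations.json", "w") as f:
    json.dump(coco, f, indent=2)
```

The three top-level lists (images, annotations, categories) are the core of the COCO format; tools like CVAT can export this structure directly.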
Free Image Annotation Tools
We tested the top free software tools for image annotation tasks. Here is an overview to help you choose the right image annotation tool.
MakeSense.AI
The online, web-based image annotation tool MakeSense.AI is free to use under the GPLv3 license. GitHub Stars: 1.8k
- No installation is required; the tool is fully online.
- Makesense.ai supports multiple annotation shapes.
- A good option for beginners, this annotation tool walks the user through the annotation process.
- The annotation tool features a modern interface and new, time-saving add-ons that are appealing for large datasets.
CVAT – Computer Vision Annotation Tool
Developed by Intel researchers, CVAT is a popular open-source tool for image annotation. GitHub Stars: 5.7k
- This annotation tool requires some manual installation, as the source is hosted on GitHub.
- Once it is set up, it provides more tools and features than others, for example, shortcuts and a label shape creator.
- CVAT supports add-ons like TensorFlow Object Detection and Deep Learning Deployment Toolkit.
LabelImg
Written in Python, LabelImg is a popular barebones graphical image annotation tool. GitHub Stars: 14.7k
- The installation is relatively simple and is generally done through a command prompt/terminal.
- The image annotation tool is great for datasets under 10,000 images, as it requires a lot of manual interaction and is made to help annotate datasets for object detection models.
- The simple interface makes it easy to use, which makes it a good tool for beginner ML programmers, and many well-documented tutorials are available.
Types of Image Annotation
Image annotation is frequently used for image classification, object detection, object recognition, and image segmentation models in machine learning and computer vision. It is the technique used to create reliable datasets for the models to train on and thus is useful for supervised and semi-supervised machine learning models.
For more information on the distinction between supervised and unsupervised machine learning models, we recommend Introduction to Semi-Supervised Machine Learning Models and Self-Supervised Learning: What It Is, Examples and Methods for Computer Vision. In those articles, we discuss their differences and why some models require annotated datasets while others don’t.
Different purposes of image annotation (image classification, object detection, etc.) require different annotation techniques in order to develop effective datasets.
1. Image Classification
Image classification is a machine learning task in which each image receives a single label identifying the entire image. The annotation process for image classification models aims at recognizing the presence of objects of the same class across images in the dataset.
It is used to train an AI model to identify an object in an unlabeled image that looks similar to classes in annotated images that were used to train the model. Annotating images for image classification is also referred to as tagging. Thus, image classification aims to simply identify the presence of a particular object and name its predefined class.
An example of an image classification model is where different animals are “detected” within input images. In this example, the annotator would be provided with a set of images of different animals and asked to classify each image with a label based on the specific animal species. The animal species, in this case, would be the class, and the image is the input.
Providing the annotated images as data to a computer vision model trains the model for the unique visual characteristic of each type of animal. Thereby, the model would be able to classify new unannotated animal images into the relevant species.
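In code, the annotations for such a classification dataset can be as simple as a mapping from image file to class label. The sketch below (file names and species labels are hypothetical) writes the mapping to a CSV file, along with the numeric class indices that most training pipelines expect:

```python
import csv

# Hypothetical image-level labels: one class per image ("tagging")
labels = {
    "img_001.jpg": "cat",
    "img_002.jpg": "dog",
    "img_003.jpg": "horse",
}

# Assign each class a stable numeric index for training
classes = sorted(set(labels.values()))
class_to_idx = {name: i for i, name in enumerate(classes)}

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "class", "class_idx"])
    for filename, name in labels.items():
        writer.writerow([filename, name, class_to_idx[name]])
```

Note that, unlike the detection formats discussed later, no coordinates are stored: the label applies to the whole image.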
2. Object Detection and Object Recognition
Object detection or recognition models take image classification one step further to find the presence, location, and the number of objects in an image. For this type of model, the image annotation process requires boundaries to be drawn around every detected object in each image, allowing us to locate the exact position and number of objects present in an image. Therefore, the main difference is that classes are detected within an image rather than the entire image being classified as one class (Image Classification).
The class location is a parameter in addition to the class, whereas in image classification, the class location within the image is irrelevant because the entire image is identified as one class. Objects can be annotated within an image using labels such as bounding boxes or polygons.
One of the most common examples of object detection is people detection. It requires the computing device to continuously analyze frames to identify specific object features and recognize present objects as persons. Object detection can also be used to detect any anomaly by tracking the change in the features over a certain period of time.
3. Image Segmentation
Image segmentation is a type of image annotation that involves partitioning an image into multiple segments. Image segmentation is used to locate objects and boundaries (lines, curves, etc.) in images. It is performed at the pixel level, allocating each pixel within an image to a specific object or class. It is used for projects requiring higher accuracy in classifying inputs.
Image segmentation is further divided into the following three classes:
- Semantic segmentation assigns every pixel to a class without distinguishing between individual objects of the same class. This method is used when great precision regarding the presence, location, and size or shape of the objects within an image is needed, but not the number of instances.
- Instance segmentation identifies the presence, location, number, and size or shape of the objects within an image. Therefore, instance segmentation helps to label every single object’s presence within an image.
- Panoptic segmentation combines both semantic and instance segmentation. Accordingly, panoptic segmentation provides data labeled for background (semantic segmentation) and the object (instance segmentation) within an image.
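A pixel-level annotation can be sketched as a 2D mask the same size as the image. In the toy example below (pure Python, hypothetical 6×4 image with two "person" regions), the semantic mask stores one class id per pixel, while the instance mask additionally separates the two objects of the same class:

```python
W, H = 6, 4

# Semantic mask: 0 = background, 1 = "person"; both people share class id 1
semantic = [[0] * W for _ in range(H)]
# Instance mask: each object gets its own id, even within the same class
instance = [[0] * W for _ in range(H)]

# Paint two small "person" regions (hypothetical pixel coordinates)
for y in range(1, 3):
    for x in range(0, 2):   # first person
        semantic[y][x] = 1
        instance[y][x] = 1
    for x in range(3, 5):   # second person
        semantic[y][x] = 1
        instance[y][x] = 2

# Semantic segmentation sees one class; instance segmentation sees two objects
semantic_ids = {v for row in semantic for v in row if v}
instance_ids = {v for row in instance for v in row if v}
```

A panoptic annotation would keep both pieces of information: the class of every pixel (including background) plus the instance id of every object pixel.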
4. Boundary Recognition
This type of image annotation identifies lines or boundaries of objects within an image. Boundaries may include the edges of a particular object or regions of topography present in the image.
Once an image is properly annotated, it can be used to identify similar patterns in unannotated images. Boundary recognition plays a significant role in the safe operation of self-driving cars.
In image annotation, different annotation shapes are used depending on the selected technique. In addition to closed shapes such as boxes and polygons, techniques like lines, splines, and landmarking can also be used for image annotation.
The following are popular image annotation techniques that are used based on the use case.
1. Bounding Boxes
The bounding box is the most commonly used annotation shape in computer vision. Bounding boxes are rectangular boxes used to define the location of the object within an image. They can be either two-dimensional (2D) or three-dimensional (3D).
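Different tools store bounding boxes differently. For example, the YOLO text format records each box as a class index followed by center coordinates and box size, all normalized by the image dimensions. A small conversion sketch (the pixel values are hypothetical):

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert pixel corner coordinates to YOLO's normalized
    (center_x, center_y, width, height) representation."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return cx, cy, w, h

# A 200x100-pixel box with top-left corner (100, 100) in a 640x480 image
box = to_yolo(100, 100, 300, 200, 640, 480)
line = "0 " + " ".join(f"{v:.4f}" for v in box)  # class index 0, then the box
```

Because the coordinates are normalized to the 0–1 range, the same annotation line remains valid if the image is resized.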
2. Polygons
Polygons are used to annotate irregular objects within an image. Annotators mark each of the vertices of the intended object to outline its edges.
3. Landmarking
Landmarking is used to identify fundamental points of interest within an image. Such points are referred to as landmarks or key points. Landmarking is significant in face recognition.
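Landmark annotations are typically stored as named (x, y) points, often with a per-point visibility flag, similar to the COCO keypoint convention. A small sketch with hypothetical facial-landmark coordinates:

```python
# Hypothetical facial landmarks: name -> (x, y, visible)
landmarks = {
    "left_eye":    (210, 120, 1),
    "right_eye":   (270, 118, 1),
    "nose_tip":    (240, 160, 1),
    "mouth_left":  (215, 200, 1),
    "mouth_right": (265, 198, 0),  # occluded point, flagged as not visible
}

# Flatten to the [x1, y1, v1, x2, y2, v2, ...] layout used by COCO keypoints
flat = [value for point in landmarks.values() for value in point]
num_visible = sum(1 for _, _, vis in landmarks.values() if vis)
```

Keeping a fixed, named ordering of points matters: the model learns that, say, the first pair of coordinates is always the left eye.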
4. Lines and Splines
Lines and splines annotate the image with straight or curved lines. This is significant for boundary recognition to annotate sidewalks, road marks, and other boundary indicators.
Image annotation is the task of annotating an image with data labels. The annotation task usually involves manual work with computer-assisted help.
Image annotation tools such as the popular Computer Vision Annotation Tool CVAT help to provide information about an image that can be used to train computer vision models.
If you want to learn more about Computer Vision, I recommend reading the following articles: