What is Image Annotation? (Easy-to-understand Guide)

Image annotation plays a significant role in computer vision, the technology that allows computers to gain a high-level understanding of digital images or videos and to interpret visual information much as humans do.

Computer vision technology enables several astonishing AI applications like self-driving cars, tumor detection, and unmanned aerial vehicles. However, most of these remarkable applications of computer vision would not be possible without image annotation.

Image annotation is a primary step in the creation of most computer vision models. It is necessary for datasets to be useful components of machine learning and deep learning for computer vision.

This article provides a detailed dive into the purpose of image annotation, types of image annotation, and techniques used to make image annotation possible.

In particular, this article will discuss:

  1. Definition of image annotation. What is image annotation, and why is it needed?
  2. Process of annotating images. How to successfully annotate an image dataset.
  3. Types of image annotation. Popular algorithms and distinct strategies for annotating images.

What is Image Annotation?

Image annotation is the process of labeling the images of a dataset to train a machine learning model. The labels mark the features you need your system to recognize. Training an ML model with labeled data is called supervised learning.

The annotation task usually involves manual work, sometimes with computer-assisted help. A Machine Learning engineer predetermines the labels, known as “classes”, and provides the image-specific information to the computer vision model. After the model is trained and deployed, it will predict and recognize those predetermined features in new images that have not been annotated yet.

Popular annotated image datasets are the Microsoft COCO (Common Objects in Context) dataset, with 2.5 million labeled instances in 328k images, and Google’s Open Images Dataset (OID), with approximately 9 million pre-annotated images.


An annotated image of the MS COCO Dataset
Why is Image Annotation needed?

Labeling images is necessary for functional datasets because it lets the training model know what the important parts of the image are (classes) so that it can later use those notes to identify those classes in new, never-before-seen images.

Object detection example: three detected classes (“bicycle”, “dog”, and “truck”) out of the 80 pre-trained COCO classes, using the real-time algorithm YOLO.
Video Annotation

Video annotation is based on the concept of image annotation. For video annotation, features are manually labeled on every video frame (image) to train a machine learning model for video detection. Hence, the dataset for a video detection model consists of the images of the individual video frames.

The video below shows video-based real-time object detection and tracking with deep learning. The application was built on the Computer Vision Platform Viso Suite.

When do I need to annotate images for Computer Vision?

To train and develop computer vision algorithms based on deep neural networks (DNN), data annotation is needed in cases where pre-trained models are not specific or accurate enough.

As mentioned before, enormous public image datasets with millions of image annotations are available (COCO, OID, etc.). For common and standardized object detection problems (e.g., person detection), an algorithm trained on a massive public dataset (a pre-trained algorithm) provides very good results, and the benefits of additional labeling do not justify the high additional costs.

However, in some situations, image annotation is essential:

  • New Tasks: Image annotation is important when AI is applied to new tasks for which no appropriate annotated data is available. For example, in industrial automation, computer vision is frequently applied to detect specific items and their condition.
  • Restricted Data: While plenty of data is available on the internet, some image data requires a license agreement, and its use may be restricted for the development of commercial computer vision products. In some areas, such as medical imaging, manual data annotation raises privacy concerns when sensitive visuals (faces, identifiable attributes, etc.) are involved. Another challenge is the use of images that contain a company’s intellectual property.

How does Image Annotation work?

To annotate images, you can use any open source or freeware data annotation tool. The Computer Vision Annotation Tool (CVAT) is probably the most popular open-source image annotation tool.

When dealing with a large amount of data, a trained workforce is required to annotate the images. Annotation tools offer different sets of features to annotate single or multiple frames efficiently.

Labels are applied to the objects within an image using any of the annotation techniques explained below; the number of labels per image varies with the use case.


CVAT stands for Computer Vision Annotation Tool. It is a free online video and image annotation tool for computer vision.
How to Annotate Images?
  • Step #1: Prepare your image dataset.
  • Step #2: Specify the class labels of objects to detect.
  • Step #3: In every image, draw a box around the object you want to detect.
  • Step #4: Select the class label for every box you drew.
  • Step #5: Export the annotations in the required format (COCO JSON, YOLO, etc.)
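
Step #5 can be illustrated with a short Python sketch. It assumes that each box was drawn as pixel corner coordinates (x_min, y_min, x_max, y_max); the function names are hypothetical and not part of any annotation tool. The first function writes a box as a YOLO-style text line (class index followed by the normalized center, width, and height), the second as a COCO-style annotation entry (bbox as [x, y, width, height] in pixels).

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Format one box as a YOLO-style label line.

    box is (x_min, y_min, x_max, y_max) in pixels; the output values
    are normalized to the 0..1 range by the image size.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def to_coco_annotation(ann_id, image_id, category_id, box):
    """Build one COCO-style annotation entry with bbox = [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x_min, y_min, width, height],
        "area": width * height,
        "iscrowd": 0,
    }


# Hypothetical example: one box in a 640x480 image.
box = (100, 200, 300, 400)
print(to_yolo_line(0, box, 640, 480))
print(to_coco_annotation(1, 1, 1, box))
```

In practice, annotation tools write these files for you; the sketch only shows how the same box is expressed differently in each format.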

Free Image Annotation Tools

We tested the top free software tools for image annotation tasks. Here are the image annotation tools we recommend.


MakeSense.AI

The online, web-based image annotation tool MakeSense.AI is free to use under the GPLv3 license. GitHub Stars: 1.8k

  • No installation is required; the tool is fully online.
  • MakeSense.AI supports multiple annotation shapes.
  • A good option for beginners, this annotation tool walks the user through the annotation process.
  • The annotation tool features a modern interface and new, time-saving add-ons that are appealing for large datasets.


Free Image Annotation Tool MakeSense.AI – Source
CVAT – Computer Vision Annotation Tool

Developed by Intel researchers, CVAT is a popular open-source tool for image annotation. GitHub Stars: 5.7k

  • This annotation tool requires some manual installation, as it is distributed via GitHub.
  • Once it is set up, it provides more tools and features than others, for example, shortcuts and a label shape creator.
  • CVAT supports add-ons like TensorFlow Object Detection and Deep Learning Deployment Toolkit.


CVAT (Computer Vision Annotation Tool) is an image labeling tool from Intel

LabelImg

Written in Python, LabelImg is a popular barebones graphical image annotation tool. GitHub Stars: 14.7k

  • The installation is relatively simple and is generally done through a command prompt/terminal.
  • The image annotation tool is best suited to datasets under 10,000 images because it requires a lot of manual interaction. It is designed for annotating datasets for object detection models.
  • The simple interface makes it easy to use, which makes it a good tool for beginner ML programmers, and many well-documented tutorials are available.


Image Labeling Tool for Computer Vision and Machine Learning
LabelImg is a free tool for labeling – Source

How long does Image Annotation take?

The time needed to annotate images greatly depends on the complexity of the images, the number of objects, the complexity of the annotations (polygon vs. boxes), and the required accuracy and level of detail.

Even image annotation companies usually find it hard to say how long an annotation project will take until some samples have been labeled and an estimate can be based on the results. Even then, there is no guarantee that the annotation quality and consistency allow precise estimations. While automated and semi-automated image annotation tools help to accelerate the process, a human element is still required to ensure a consistent quality level.

In general, simple objects with fewer control points (window, door, sign, lamp) require far less time to annotate than region-based objects with more control points (fork, wineglass, sky). Tools with semi-automatic image annotation, in which a deep learning model creates preliminary annotations, help to improve both annotation speed and quality.

Read our article about CVAT, a tool that provides semi-automatic image annotation features.

Types of Image Annotation

Image annotation is frequently used to build models for image classification, object detection, object recognition, and image segmentation. It is the technique used to create reliable datasets for the models to train on and thus is useful for supervised and semi-supervised machine learning models.

For more information on the distinction between supervised and unsupervised machine learning models, we recommend Introduction to Semi-Supervised Machine Learning Models and Self-Supervised Learning: What It Is, Examples and Methods for Computer Vision. In those articles, we discuss their differences and why some models require annotated datasets while others don’t.

The purposes of image annotation (image classification, object detection, etc.) require different techniques of image annotation in order to develop effective datasets.

1. Image Classification

Image classification is a machine learning task in which each image receives a single label that identifies the entire image. The image annotation process for image classification models aims at recognizing the presence of similar objects in images of the dataset.

It is used to train an AI model to identify an object in an unlabeled image that looks similar to the classes in the annotated images used to train the model. Annotating images for image classification is also referred to as tagging. Thus, image classification aims simply to identify the presence of a particular object and name its predefined class.

An example is an image classification model that recognizes different animals in input images. In this example, the annotator would be given a set of images of different animals and asked to label each image with the specific animal species. The animal species, in this case, would be the class, and the image is the input.

Providing the annotated images as data to a computer vision model trains it to recognize the unique visual characteristics of each type of animal. The model would thereby be able to classify new, unannotated animal images into the relevant species.
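
For illustration, a classification annotation can be as simple as one class name per image file. The following is a minimal sketch; the file names, the class list, and the function name are made up for this example. It shows how such tags are typically turned into integer class indices before training.

```python
# Hypothetical class list, predetermined before annotation starts.
CLASSES = ["cat", "dog", "horse"]

# Hypothetical annotations: exactly one label per image.
labels = {
    "img_001.jpg": "dog",
    "img_002.jpg": "cat",
    "img_003.jpg": "dog",
    "img_004.jpg": "horse",
}


def encode_labels(labels, classes):
    """Map each image's class name to its integer index in `classes`."""
    class_to_index = {name: index for index, name in enumerate(classes)}
    unknown = sorted(set(labels.values()) - set(classes))
    if unknown:
        # A label outside the predetermined classes is an annotation error.
        raise ValueError(f"labels not in class list: {unknown}")
    return {filename: class_to_index[name] for filename, name in labels.items()}


encoded = encode_labels(labels, CLASSES)
print(encoded)
```

Checking labels against the predetermined class list catches typos such as "dgo" early, before they silently create an extra class.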

2. Object Detection and Object Recognition

Object detection or recognition models take image classification one step further to find the presence, location, and number of objects in an image. For this type of model, the image annotation process requires boundaries to be drawn around every object in each image, which makes it possible to determine the exact position and number of objects present. Therefore, the main difference is that classes are detected within an image rather than the entire image being classified as one class (image classification).

The class location is a parameter in addition to the class, whereas in image classification, the class location within the image is irrelevant because the entire image is identified as one class. Objects can be annotated within an image using labels such as bounding boxes or polygons.

One of the most common examples of object detection is people detection. It requires the computing device to continuously analyze frames to identify specific object features and recognize present objects as persons. Object detection can also be used to detect any anomaly by tracking the change in the features over a certain period of time.
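
The difference from image classification shows up directly in the annotation data: an image carries a list of labeled boxes instead of a single label. The sketch below uses made-up boxes in (x_min, y_min, x_max, y_max) pixel coordinates and hypothetical function names to show how presence, number, and location can be read from such annotations.

```python
from collections import Counter

# Hypothetical annotations for one image: a class label plus a box
# given as (x_min, y_min, x_max, y_max) in pixels.
image_annotations = [
    {"label": "person", "box": (34, 50, 120, 310)},
    {"label": "person", "box": (200, 62, 280, 300)},
    {"label": "bicycle", "box": (150, 180, 330, 320)},
]


def count_objects(annotations):
    """Number of annotated objects per class."""
    return Counter(a["label"] for a in annotations)


def locations_of(annotations, label):
    """Boxes of all objects that belong to the given class."""
    return [a["box"] for a in annotations if a["label"] == label]


def present_classes(annotations):
    """Classes that occur at least once (location is ignored)."""
    return sorted({a["label"] for a in annotations})


print(count_objects(image_annotations))
print(locations_of(image_annotations, "person"))
print(present_classes(image_annotations))
```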


People image annotation example
Image annotation to train a people detection model
3. Image Segmentation

Image segmentation is a type of image annotation that involves partitioning an image into multiple segments. Image segmentation is used to locate objects and boundaries (lines, curves, etc.) in images. It is performed at the pixel level, allocating each pixel within an image to a specific object or class. It is used for projects requiring higher accuracy in classifying inputs.

Image segmentation is further divided into the following three types:

    • Semantic segmentation assigns every pixel to a class and depicts the boundaries between regions of different classes; objects of the same class are not separated from one another. This method is used when great precision regarding the presence, location, and size or shape of the objects within an image is needed.
    • Instance segmentation identifies the presence, location, number, and size or shape of the objects within an image. It labels every individual object separately, even when several objects belong to the same class.
    • Panoptic segmentation combines both semantic and instance segmentation. Accordingly, panoptic segmentation provides data labeled for the background (semantic segmentation) and for the individual objects (instance segmentation) within an image.
Image Segmentation – the difference between semantic segmentation and instance segmentation
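
The three types can be compared on a tiny example. The sketch below is only an illustration under simple assumptions: masks are plain nested lists, class 1 stands for a hypothetical "person" class, 0 is background, and the function names are made up. Real datasets store masks in more compact encodings.

```python
# Semantic mask: one class ID per pixel (0 = background, 1 = person).
# Both people share the same value, so they cannot be told apart.
semantic = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance mask: one object ID per pixel (0 = no object).
instance = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def to_panoptic(semantic_mask, instance_mask):
    """Combine both masks into (class_id, instance_id) pairs per pixel."""
    return [
        [(c, i) for c, i in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic_mask, instance_mask)
    ]


def count_instances(panoptic_mask, class_id):
    """Number of distinct objects of one class."""
    ids = {i for row in panoptic_mask for c, i in row if c == class_id and i != 0}
    return len(ids)


def pixel_area(panoptic_mask, class_id):
    """Number of pixels labeled with one class, across all objects."""
    return sum(1 for row in panoptic_mask for c, _ in row if c == class_id)


panoptic = to_panoptic(semantic, instance)
print(count_instances(panoptic, 1), pixel_area(panoptic, 1))
```

The semantic mask alone yields the pixel area of the class, the instance mask alone yields the number of objects, and the combined panoptic mask yields both.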
4. Boundary Recognition

This type of image annotation identifies lines or boundaries of objects within an image. Boundaries may include the edges of a particular object or regions of topography present in the image.

Once images are properly annotated, they can be used to train a model that identifies similar patterns in unannotated images. Boundary recognition plays a significant role in the safe operation of self-driving cars.


Image annotation with polygon and rectangle shapes

Annotation Shapes

In image annotation, different annotation shapes are used to annotate an image based on the selected technique. In addition to shapes, annotation techniques like lines, splines, and landmarking can also be used for image annotation.

The following are popular image annotation techniques that are used based on the use case.

1. Bounding Boxes

The bounding box is the most commonly used annotation shape in computer vision. Bounding boxes are rectangular boxes used to define the location of the object within an image. They can be either two-dimensional (2D) or three-dimensional (3D).
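
A 2D bounding box is commonly stored either as two corners or as one corner plus width and height. The following sketch shows one possible representation; the class and method names are hypothetical and not taken from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned 2D box stored as two corners, in pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @classmethod
    def from_xywh(cls, x, y, width, height):
        """Build a box from its top-left corner plus width and height."""
        return cls(x, y, x + width, y + height)

    def to_xywh(self):
        """Return (x, y, width, height)."""
        return (self.x_min, self.y_min,
                self.x_max - self.x_min, self.y_max - self.y_min)

    @property
    def area(self):
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def contains(self, x, y):
        """True if the point lies inside the box or on its border."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


box = BoundingBox.from_xywh(40, 60, 100, 50)
print(box, box.area, box.contains(50, 70))
```

Mixing up the two conventions is a frequent source of shifted or oversized boxes, so it helps to convert explicitly at the boundary between tools.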

2. Polygons

Polygons are used to annotate irregular objects within an image. The annotator marks each vertex of the intended object, and the connected vertices trace its edges.
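
A polygon annotation is simply an ordered list of vertices. As a minimal sketch with hypothetical function names and made-up coordinates, the code below computes the enclosed area with the shoelace formula and derives the smallest axis-aligned rectangle around the polygon.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    vertices is an ordered list of (x, y) points; the last point is
    implicitly connected back to the first.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box(vertices):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical triangular object, in pixel coordinates.
triangle = [(10, 10), (50, 10), (10, 40)]
print(polygon_area(triangle))
print(enclosing_box(triangle))
```

For this triangle, the polygon covers only half of its enclosing rectangle, which illustrates why polygons describe irregular objects more tightly than rectangles.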

3. Landmarking

This is used to identify fundamental points of interest within an image. Such points are referred to as landmarks or key points. Landmarking is significant in face recognition.
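
Landmark annotations are often stored as a flat list of (x, y, v) triplets in a fixed order, where v is a visibility flag; the COCO keypoint format follows this layout with v = 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. The sketch below assumes that layout; the five facial landmark names, the coordinates, and the function names are made up for illustration.

```python
# Hypothetical landmark order for a face; the order must be fixed
# so that every triplet can be matched to its landmark.
LANDMARK_NAMES = ["left_eye", "right_eye", "nose", "mouth_left", "mouth_right"]

NOT_LABELED, LABELED_HIDDEN, LABELED_VISIBLE = 0, 1, 2


def flatten_keypoints(points, names):
    """Turn {name: (x, y, v)} into a flat [x1, y1, v1, x2, y2, v2, ...] list.

    Landmarks missing from `points` are written as (0, 0, NOT_LABELED).
    """
    flat = []
    for name in names:
        x, y, v = points.get(name, (0, 0, NOT_LABELED))
        flat.extend([x, y, v])
    return flat


def count_labeled(flat):
    """Number of landmarks with a visibility flag greater than zero."""
    return sum(1 for v in flat[2::3] if v > 0)


# One annotated face; the right mouth corner was not labeled.
face = {
    "left_eye": (120, 95, LABELED_VISIBLE),
    "right_eye": (168, 96, LABELED_VISIBLE),
    "nose": (144, 125, LABELED_VISIBLE),
    "mouth_left": (126, 155, LABELED_HIDDEN),
}

keypoints = flatten_keypoints(face, LANDMARK_NAMES)
print(keypoints, count_labeled(keypoints))
```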

4. Lines and Splines

Lines and splines annotate the image with straight or curved lines. They are useful in boundary recognition for annotating sidewalks, road markings, and other boundary indicators.


Image annotation with polylines

What’s Next

Image annotation is the task of annotating an image with data labels. The annotation task usually involves manual work with computer-assisted help.

Image annotation tools such as the popular Computer Vision Annotation Tool CVAT help to provide information about an image that can be used to train computer vision models.
