The computer vision annotation tool CVAT provides a powerful solution for image annotation in computer vision. Computational vision is the research field that uses machines to collect and analyze images and videos to extract information from processed visual data. Modern vision systems use algorithms based on machine learning, deep learning especially, that need to be trained on images annotated by humans (supervised learning). CVAT is a tool to create image and video annotations efficiently.
This article will cover the following topics:
- What is CVAT, and who provides it?
- Review and key features of CVAT
- How to use the computer vision annotation tool?
- Semi-automatic Image Annotation features and AI tools
What is CVAT?
Who developed CVAT?
CVAT is being developed and used by Intel for computer vision image annotation. It is developed based on feedback from professional data annotation teams to make image annotation more streamlined for supervised problems in machine learning.
For training deep neural networks that are the core of AI vision, data scientists and computer vision professionals depend on a large amount of annotated data. Intel originally developed CVAT for internal use to provide a better method for large-scale annotation of thousands of images. This annotation process is very laborious and takes hundreds or thousands of hours. Therefore, the CVAT tool was designed to accelerate the process of annotating videos and images for use in training computer vision algorithms. CVAT provides automatic labeling and semi-automated image annotation to speed up the annotation process and expedite annotation services (more about this later).
Where can I try CVAT?
CVAT is free and can be hosted as a web-based online annotation tool. You can try it online at cvat.org without downloading any dependencies or packages. The online CVAT demo is limited to 500 MB of uploaded data and 10 tasks per user, and installation analytics are disabled.
What is Image Annotation?
The training of deep learning models, for example, for object detection and object recognition, requires extensive image collections with ground truth labels. Image annotation is the process of creating those labels on images from a dataset that can be used for model training (supervised learning). Those labels provide information about the object classes present in each image and their shape, locations, and additional attributes such as pose.
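To make the idea of a label concrete, here is a minimal sketch of what a single annotation record could hold: the object class, its location as a bounding box, and extra attributes such as pose. The `BoxAnnotation` class and its field names are hypothetical and chosen for illustration only; they are not a CVAT data structure.

```python
from dataclasses import dataclass, field


@dataclass
class BoxAnnotation:
    """One labeled object in one image (hypothetical structure, not CVAT's)."""
    image: str      # file name of the annotated image
    label: str      # object class, e.g. "person"
    xmin: float     # left edge of the bounding box, in pixels
    ymin: float     # top edge
    xmax: float     # right edge
    ymax: float     # bottom edge
    attributes: dict = field(default_factory=dict)  # additional info such as pose

    def area(self) -> float:
        """Box area in square pixels; degenerate boxes yield 0."""
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


example = BoxAnnotation("street_001.jpg", "person", 120, 40, 180, 200,
                        {"pose": "standing"})
print(example.label, example.area(), example.attributes)
```

A dataset for supervised learning is then essentially a long list of such records, one per object per image.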
To learn more about image annotation and how it works, check out our article:
What is Image Annotation? (Guide).
What is an image annotation tool?
Image annotation tools such as CVAT facilitate the annotation of images or video frames by creating workflows, managing classes, and providing shapes (rectangles, polygons, etc.) to indicate the exact location of objects of each class. Such annotation tools can be run on a local computer or as web-based tools that allow collaboration between team members.
How to annotate images faster
Image annotation to develop and train algorithms is a long and time-consuming process that can be very costly. Therefore, it shouldn’t be the AI engineers who annotate images but either an internal annotation team or an external image annotation company.
- Image annotation services are provided by specialized companies that coordinate a workforce of qualified people and set up workflows to annotate images fast. Annotation services are costly but provide reliable quality, which impacts the algorithm’s accuracy.
- Outsourcing companies provide the workforce to annotate images quickly using the tools that are provided to them. This way is comparatively cost-efficient, but the quality may not be sufficient if the annotators were not instructed well enough.
- Tools for internal data annotation like CVAT help an in-house team annotate images efficiently and speed up the process. The software tool was developed to quickly assign new tasks and manage the work process, which makes it easy to balance the price and quality of the work.
CVAT Software Review
The CVAT interface makes the application remarkably easy to use for beginners and experts. The image and video annotation software can be used entirely web-based without the need to install a local client. It supports work scenarios for both individuals and teams. Compared to other image annotation tools, CVAT provides many features (semi-automatic annotation, 3D annotation, key frame interpolation, etc.) but is still very intuitive to use.
Advantages of CVAT
- Advantage #1: CVAT is web-based; there is no installation of an application needed to annotate data.
- Advantage #2: Users can collaborate and create a public task to split the work between other users.
- Advantage #3: Automatic annotation in CVAT allows users to employ interpolation between keyframes.
- Advantage #4: CVAT is suitable for integration into computer vision platforms, for example, Viso Suite.
Limitations of CVAT
- Limitation #1: Browser support is limited; CVAT requires the use of Google Chrome.
- Limitation #2: Lack of source code documentation can make it challenging to understand the tool’s inner workings.
- Limitation #3: Testing checks have to be done manually, slowing the development process.
Key Features of CVAT
Use the integrated features for typical annotation tasks such as automation. The most important automation tools are “copy and propagate” for objects, interpolation, automatic annotation using the TensorFlow Object Detection API or other models, visual settings, shortcuts, filters, and more.
CVAT can be used to interpolate bounding boxes and attributes between multiple key frames. This is used to automatically annotate a set of images, for example to not draw the same bounding box multiple times.
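The idea behind keyframe interpolation can be sketched in a few lines. The example below assumes plain linear interpolation between two manually annotated keyframes; CVAT's actual implementation may differ, and the function name and the `(xtl, ytl, xbr, ybr)` box layout are choices made for this illustration.

```python
def interpolate_box(frame, key_a, key_b):
    """Linearly interpolate a bounding box between two keyframes.

    key_a and key_b are (frame_index, (xtl, ytl, xbr, ybr)) pairs,
    with key_a strictly earlier than key_b.
    """
    frame_a, box_a = key_a
    frame_b, box_b = key_b
    if frame_a >= frame_b:
        raise ValueError("key_a must come before key_b")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two keyframes")
    # t runs from 0.0 at the first keyframe to 1.0 at the second one
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


# The annotator draws the box only on frames 0 and 10 ...
first = (0, (10, 10, 50, 50))
last = (10, (30, 20, 70, 60))

# ... and the boxes for the frames in between are computed.
for frame in range(0, 11, 5):
    print(frame, interpolate_box(frame, first, last))
```

With this approach, only two boxes are drawn by hand and the remaining frames are filled in automatically; the annotator then corrects individual frames where the object does not move linearly.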
Attribute annotation mode
The attribute annotation mode of CVAT is optimized for image classification. It speeds up the process of attribute annotation by focusing on just one exact attribute.
The segmentation mode is used for annotation with polygons for semantic segmentation and instance segmentation. Optimized visual settings help to facilitate the annotation work.
Annotation import and export
In CVAT, you can upload annotations or dump annotations (download). There are multiple annotation formats to choose from; the formats below are supported for import and export:
- CVAT for images (annotation)
- CVAT for video (interpolation)
- Datumaro (only export)
- PASCAL VOC
- Segmentation masks from PASCAL VOC
- MS COCO Object Detection
- LabelMe 3.0
- WIDER Face
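As a rough illustration of working with exported annotations, the sketch below reads bounding boxes from a small XML snippet and converts them to the `[x, y, width, height]` box convention used by MS COCO. The snippet is a simplified assumption modeled on the "CVAT for images" export (boxes stored as top-left and bottom-right corners); a real export contains more metadata, so check an actual file before relying on these element and attribute names.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed sample of an image-annotation export.
SAMPLE_XML = """
<annotations>
  <image id="0" name="frame_000.jpg" width="640" height="480">
    <box label="car" xtl="100.0" ytl="150.0" xbr="300.0" ybr="250.0" occluded="0"/>
    <box label="person" xtl="320.5" ytl="90.0" xbr="360.5" ybr="210.0" occluded="1"/>
  </image>
</annotations>
"""


def boxes_to_coco_style(xml_text):
    """Return one record per box with a COCO-style [x, y, width, height] bbox."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
            xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
            records.append({
                "image": image.get("name"),
                "label": box.get("label"),
                # corners (top-left, bottom-right) -> origin plus size
                "bbox": [xtl, ytl, xbr - xtl, ybr - ytl],
            })
    return records


records = boxes_to_coco_style(SAMPLE_XML)
for record in records:
    print(record)
```

In practice, exporting directly in the target format from CVAT avoids this conversion; a script like this is mainly useful for custom checks or statistics on the annotations.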
What annotation shapes are available in CVAT?
CVAT offers the following shapes with which to annotate images:
- Rectangle or Bounding box
- Cuboid in 3D tasks
Use cases of CVAT
In the past 10 years, artificial neural networks (ANN) have shown great success in computer vision applications. Neural network-based solutions for computational vision depend on visual data (pictures, photos, videos, depth maps) to train an AI algorithm for image recognition tasks. When AI engineers develop neural network algorithms, they often face the problem of insufficient reliable training data to serve as ground truth examples for model training. The amount of such data influences the prediction quality of the algorithm.
CVAT Medical Image Annotation Tool
AI is a significant technology in medicine, especially in times of the COVID-19 pandemic, so there is a high demand for image annotation in medical use cases. CVAT is one of few image annotation tools that can label DICOM data (Digital Imaging and Communications in Medicine), a standard for storing medical images and data in .dcm files. Hence, CVAT is an alternative to simple annotation tools such as md.ai or to feature-rich data annotation solutions that come with restrictions for commercial use (medseg.ai).
While CVAT was not originally developed to support the .dcm format, it is possible to use CVAT to annotate medical images. It’s quite challenging since DICOM files may contain complex data with different content, such as CT (computed tomography), CR (computed radiography), LEN (lensometry), MR (magnetic resonance), and others, with a huge number of different attributes or tags specified. Some medical imaging data can include multiple images (slices) whose values often cannot be interpreted as regular pixels since they are defined as physical values measured by a certain device.
The CVAT development team at Intel used the Python module of a library to convert DICOM files to regular images. Find a complete tutorial on how to use CVAT for medical image annotation here.
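To illustrate why such a conversion step is needed, the sketch below maps stored values to physical values and then to 8-bit gray levels that can be saved as a regular image. It does not use the library mentioned above and is not the CVAT team's converter. The parameter names mirror the DICOM attributes Rescale Slope, Rescale Intercept, Window Center, and Window Width; the default values and the simplified linear windowing are assumptions for this example.

```python
def to_8bit(stored_values, slope=1.0, intercept=0.0, center=40.0, width=400.0):
    """Convert a 2D list of stored values to 8-bit gray levels (0-255).

    Step 1: physical value = stored value * slope + intercept.
    Step 2: clip to the window [center - width/2, center + width/2].
    Step 3: scale the window linearly to the range 0..255.
    """
    low = center - width / 2
    high = center + width / 2
    image = []
    for row in stored_values:
        out_row = []
        for value in row:
            physical = value * slope + intercept
            clipped = min(max(physical, low), high)
            out_row.append(round((clipped - low) / (high - low) * 255))
        image.append(out_row)
    return image


# One row of made-up stored values from a CT-like slice (intercept -1024).
slice_row = [[0, 1024, 2000]]
print(to_8bit(slice_row, slope=1.0, intercept=-1024.0))
```

Values below the window become black and values above it become white, which is why the same slice can look very different depending on the chosen window.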
How data annotation with CVAT works
- Step #1: Create an annotation task by providing a name, then specify the labels using the constructor: enter each label and set its color. Find more details here.
- Step #2: Provide the files (bulk images or video) loaded from a local computer, from your network from a connected file share, or a remote source via URL.
- Step #3: Create and open the task, then select a job link in the jobs list. Next, choose the correct section for your task type and start annotating with the annotation shapes (bounding box, polygon, etc.).
- Step #4: To download the annotations (dump annotation), save your changes first and select “Export task dataset” from the menu. Select the dump annotation format to start the download. Find more here.
For a detailed step-by-step guide, check out the official documentation here.
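When many tasks share the same labels, it can help to keep the label list from Step #1 in a small script rather than retyping it. The sketch below builds and validates a list of labels with colors. The `build_label_spec` helper and the `name`/`color` keys are assumptions made for this illustration, so compare the output with what the label constructor in your CVAT version expects before using it.

```python
import json
import re


def build_label_spec(labels):
    """Build a list of label entries from a {name: "#rrggbb"} mapping.

    Hypothetical helper: the "name" and "color" keys are assumptions,
    not a confirmed CVAT schema.
    """
    spec = []
    for name, color in labels.items():
        if not name.strip():
            raise ValueError("label name must not be empty")
        if not re.fullmatch(r"#[0-9a-fA-F]{6}", color):
            raise ValueError(f"invalid color for {name!r}: {color!r}")
        spec.append({"name": name, "color": color.lower()})
    return spec


spec = build_label_spec({"car": "#FF0000", "person": "#00ff00"})
print(json.dumps(spec, indent=2))
```

Validating names and colors up front catches typos before annotators start working with a mislabeled task.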
Semi-Automatic Image Annotation Tools in CVAT
CVAT is optimized for semi-automatic and automatic image annotation with deep learning models. The use of AI tools requires that corresponding models are available in the Models section.
Create polygons semi-automatically with interactors. The interaction uses a deep learning model to get a mask for an object using positive points and negative points to determine the shape of the polygon (positive points are those related to the object). After placing the required number of points (depending on the model), the request is sent to the server to create a polygon. The created polygon can be adjusted by manually setting or removing points.
Deep Extreme Cut (DEXTR)
The deep extreme cut (DEXTR) model uses the information about extreme points of an object to get its mask that is then converted to a polygon. On CPU, this is the fastest interactor.
Inside-outside guidance is a model that uses a bounding box and points (inside/outside) to create a mask and create the polygon. Create the automated annotation with a bounding box that wraps the object. Set positive and negative points to tell the model where the object is and where the background is.
Automatic Image Annotation Tools in CVAT
There are different ways for automation in image annotation with CVAT. The two prominent use cases involve 1) preliminary annotations for multiple images or 2) model-based annotations in one image frame.
Create preliminary annotations for tasks
Automatic image annotation uses deep learning models to create preliminary annotations and speed up the annotation process. In CVAT, both pre-installed models and manually uploaded models can be used and managed from the Models section.
Automated annotation in one image frame
Detectors are used to automatically annotate image frame data with deep learning models that support specific labels. CVAT supports the automated detection of objects. Select the DL model, match the model’s labels with the labels in your task and click annotate.
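The label-matching step can be pictured as a simple mapping from the model's class names to the labels defined in the task. The sketch below is a conceptual illustration only: the detection dictionaries, the label names, and the `map_detections` helper are hypothetical and do not represent CVAT's internal code.

```python
def map_detections(detections, label_map):
    """Rename model labels to task labels and drop unmapped detections."""
    mapped = []
    for detection in detections:
        task_label = label_map.get(detection["label"])
        if task_label is None:
            continue  # the task has no matching label for this model class
        mapped.append({**detection, "label": task_label})
    return mapped


# Hypothetical model output: label plus (xtl, ytl, xbr, ybr) box.
detections = [
    {"label": "automobile", "box": (100, 150, 300, 250)},
    {"label": "pedestrian", "box": (320, 90, 360, 210)},
    {"label": "traffic light", "box": (500, 20, 520, 70)},
]

# Model label -> task label, as chosen by the user before annotating.
label_map = {"automobile": "car", "pedestrian": "person"}

for detection in map_detections(detections, label_map):
    print(detection)
```

Model classes without a counterpart in the task are skipped, so only objects relevant to the task end up as preliminary annotations.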
Read more on how to use automated image annotation tasks with CVAT here.
CVAT provides a free and easy-to-use image and video annotation tool for regular and commercial use. Read more about other topics related to computer vision, machine learning, deep learning, and AI.