Autonomous underwater vehicles (AUVs) are unmanned underwater robots that follow pre-programmed missions or navigate different waters on their own. These robots are usually equipped with cameras, sonars, and depth sensors, allowing them to navigate autonomously and collect valuable data in challenging underwater environments. Unlike remotely operated vehicles (ROVs), AUVs do not require continuous input from operators, and with the development of AI, these vehicles are more capable than ever. AI has enabled AUVs to navigate complex underwater environments, make intelligent decisions, and perform various tasks with minimal human intervention.
In this article, we’ll delve into AI in AUVs. We’ll explore the key AI technologies that enable them, examine real-world applications, and walk through a hands-on tutorial on underwater object detection.
About us: Viso Suite is the end-to-end platform for building, deploying, and scaling visual AI. It makes it possible for enterprise teams to implement AI solutions like people tracking, defect detection, and intrusion alerting seamlessly into their business processes. To learn more about Viso Suite, book a demo with our team.
AI Technologies for Autonomous Underwater Vehicles (AUVs)
Artificial intelligence (AI) and machine learning (ML) have been transforming various industries, including autonomous vehicles. Whether it is self-driving cars or AUVs, AI technologies like computer vision (CV) turn those ideas into reality. CV is a field of AI that enables machines to understand the world through vision. There are multiple ways a machine can “see,” including techniques like depth estimation, object detection, recognition, and scene understanding. This section explores the AI technologies engineers use for autonomous underwater vehicles.
Computer Vision (CV)
Computer vision is one of the main AI applications in AUVs, but underwater vision is a challenging task, and several factors can degrade it. Objects are less visible underwater because natural illumination is weaker and light travels differently through water. So, high-quality cameras capable of capturing clear images in low-light conditions are a requirement for effective computer vision. Furthermore, depth affects not only visibility but also the hardware: deep waters exert high pressures, and equipment must be able to withstand them.
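Before a detection model ever runs, AUV imagery is often preprocessed to compensate for the poor illumination and color cast described above. Below is a minimal sketch, assuming OpenCV is installed and a sample frame exists at the hypothetical path "frame.jpg", that applies CLAHE (Contrast Limited Adaptive Histogram Equalization), one common enhancement step for underwater images.

import cv2

# Load a sample underwater frame (hypothetical path)
img = cv2.imread("frame.jpg")

# Convert to LAB so we equalize only the lightness channel,
# leaving the color channels untouched
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

# CLAHE boosts local contrast while limiting noise amplification
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
l_eq = clahe.apply(l)

# Merge back and return to BGR for downstream detection models
enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
cv2.imwrite("frame_enhanced.jpg", enhanced)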
With these challenges addressed, AI can start analyzing footage and performing a wide range of tasks. Following are some of the computer vision tasks autonomous underwater vehicles perform.
- Object Detection
- Image Classification
- Underwater Mapping
Most popular underwater object detection models build on general-purpose detectors like YOLOv8. These models are based on convolutional neural networks (CNNs), a type of artificial neural network (ANN) that works well for vision tasks like classification and detection. However, researchers fine-tune these models and develop variations that perform better on underwater object detection tasks. Some modifications include adding a cross-stage multi-branch (CSMB) module and a large kernel spatial pyramid (LKSP) module.
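As a rough illustration of that fine-tuning workflow (not the CSMB/LKSP research variants themselves), here is how a stock YOLOv8 model could be fine-tuned on an underwater dataset with the Ultralytics API; "underwater.yaml" and the image path are placeholders.

from ultralytics import YOLO

# Start from a pretrained checkpoint instead of training from scratch
model = YOLO("yolov8n.pt")

# Fine-tune on an underwater dataset described by a YOLO-format config;
# "underwater.yaml" is a placeholder for your own dataset file
model.train(data="underwater.yaml", epochs=50, imgsz=640)

# Run the fine-tuned model on a new frame (placeholder path)
results = model("underwater_frame.jpg")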
Other tasks include underwater mapping, where autonomous underwater vehicles (AUVs) play a crucial role. AUVs enable the creation of detailed 3D maps of the ocean floor and underwater structures. This process often involves combining computer vision techniques with other sensor data, such as sonar and depth sensors. Depth estimation can be used to generate depth maps, which in turn feed 3D reconstruction and mapping.
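As one hedged example of the depth-estimation piece, a monocular model such as MiDaS can turn a single frame into a relative depth map that could feed 3D reconstruction. The sketch below assumes PyTorch and OpenCV are installed, torch.hub can download the model, and "frame.jpg" is a placeholder path.

import cv2
import torch

# Load a small MiDaS model and its matching input transforms via torch.hub
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# Read a frame (placeholder path) and convert BGR -> RGB
img = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Upsample the low-resolution prediction back to the frame size
    depth_map = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

print(depth_map.shape)  # one relative-depth value per pixel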
Navigation and Path Planning
Artificial intelligence becomes particularly useful for tasks like navigation and path planning. The underwater environment, especially at great depths, presents various challenges. These include poor communication, which makes it hard for a surface operator to navigate the waters accurately. Additionally, underwater environments vary widely, making adaptability key to navigating correctly; this includes always taking the best path for energy consumption and mission goals. AI enables these capabilities by providing algorithms and techniques that allow AUVs to adapt to dynamic conditions and make intelligent decisions.
Before an autonomous underwater vehicle can navigate, it needs to understand its surroundings. This is usually done with environment modeling, using techniques like underwater mapping. Such a model includes obstacles, currents, and other relevant features of the environment. Once the environment is modeled, the AUV needs to plan a path from its starting point to its destination, balancing factors like energy consumption and the mission goal. This requires a variety of path-planning algorithms that optimize against certain criteria. Following are some of those algorithms.
- A* Search
- Dijkstra’s Algorithm
- Reinforcement Learning
- Neural Networks
- Swarm Intelligence
- Genetic Algorithms
Each algorithm benefits path planning differently. For example, neural networks are a great way to optimize for adaptability by learning complex relationships between sensor data and optimal control actions. Swarm intelligence is especially useful when multiple AUVs share data for cooperative tasks. Researchers also use classical algorithms like A* and Dijkstra’s, which find the optimal path for a given cost and work well in environments with well-defined obstacles, as in the sketch below.
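To make the classical approach concrete, here is a minimal A* sketch over a 2D occupancy grid. This is a simplification: real AUV planners work in 3D and fold currents and energy cost into the edge weights.

import heapq

def a_star(grid, start, goal):
    """Shortest path on a 2D grid where 1 = obstacle, 0 = free."""
    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    # Each entry: (estimated total cost f, cost so far g, node, path)
    open_set = [(h(start), 0, start, [start])]
    visited = set()
    while open_set:
        _, cost, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                heapq.heappush(open_set, (cost + 1 + h((r, c)), cost + 1,
                                          (r, c), path + [(r, c)]))
    return None  # no path found

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(a_star(grid, (0, 0), (3, 3)))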
Underwater Mapping
Underwater mapping can be undertaken from different platforms, such as ships, autonomous underwater vehicles, or even low-flying aircraft. The vehicle must be equipped with devices like sonars, sensors, and cameras. The data from these devices can then be turned into maps, and AI techniques can enhance the maps and accelerate their creation in several ways.
- Occupancy Grids
- Depth Estimation
- Oceanographic Data Integration
As seen in the image above, accurate depth maps can be created using sensor and sonar data. AI algorithms process this data to update an occupancy grid, providing a representation of the obstacles in the environment. Combined with deep learning techniques like depth estimation and 3D reconstruction, this data can be used to create detailed maps of the underwater environment, and it makes the map highly customizable and adaptable, as the simplified sketch below illustrates.
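As a toy illustration of the occupancy-grid idea, each cell holds a log-odds belief that it is occupied, nudged up or down as sonar returns arrive. The sketch below uses made-up sensor values and NumPy; the increments are tuning constants, not values from any real AUV.

import numpy as np

# 50 x 50 grid of log-odds values; 0.0 means "unknown" (p = 0.5)
grid = np.zeros((50, 50))
L_OCC, L_FREE = 0.85, -0.4  # log-odds increments (tuning constants)

def update_cell(grid, row, col, hit):
    """Shift a cell's belief toward occupied (hit) or free (miss)."""
    grid[row, col] += L_OCC if hit else L_FREE

# Pretend a sonar ping reported an obstacle at (10, 12)
# and free space along the beam before it
for col in range(12):
    update_cell(grid, 10, col, hit=False)
update_cell(grid, 10, 12, hit=True)

# Convert log-odds back to occupancy probability for mapping
prob = 1 / (1 + np.exp(-grid))
print(f"P(occupied) at (10, 12): {prob[10, 12]:.2f}")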
For example, researchers might enrich the mapping process with additional data like current forecasts, water temperatures, and wave speeds and lengths. Underwater mapping is essential to understanding the environment beneath the oceans and seas; it helps with path planning, but also with things like tsunami risk assessment. Let’s explore more applications of AUVs in the next section.
Applications of AI-Powered Autonomous Underwater Vehicles
AI-powered AUVs are essential to many applications in the water. This section will explore some of the most impactful ways AI AUVs are being used in industries and underwater research.
Oceanographic Research
AI-powered AUVs are crucial for oceanographic research, providing a more autonomous and efficient way to collect and analyze vast amounts of ocean data. AI can analyze sensor data to measure parameters like temperature, salinity, and currents, and even detect the presence of specific marine organisms. What is transformative is the ability of modern AI algorithms to analyze this data and provide insights in real time, as in the toy example below.
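As a simple sketch of that onboard analysis (synthetic data, NumPy only, with an arbitrary threshold), a rolling z-score can flag anomalous temperature readings in a sensor stream as they arrive.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic temperature stream (deg C) with one injected spike
temps = rng.normal(4.0, 0.05, 500)
temps[300] += 1.5  # e.g., a hydrothermal anomaly

window = 50
for t in range(window, len(temps)):
    recent = temps[t - window:t]
    # How many standard deviations is this reading from the recent mean?
    z = (temps[t] - recent.mean()) / (recent.std() + 1e-9)
    if abs(z) > 4:  # the threshold is a tuning choice
        print(f"Anomaly at sample {t}: {temps[t]:.2f} C (z = {z:.1f})")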
The ocean is a vast and complex environment; researchers have explored and studied only about 5% of it. However, current developments in AI are enabling more capable autonomous underwater vehicles that facilitate and accelerate discovery and research. Additionally, AI data analysis helps identify subtle changes in ocean currents, track the movement of schools of fish, and even identify interesting underwater geological formations.
Environmental Monitoring
Environmental monitoring is another area where AI-powered AUVs are making a significant impact. Researchers deploy them to monitor the health of underwater ecosystems, assess pollution levels, and inspect underwater infrastructure. An AUV surveying a coral reef, for instance, could identify signs of coral bleaching, detect the presence of invasive species, or monitor changes in water quality that might threaten the reef’s health. Engineers can also adapt the vehicle’s structure to mimic the biology and physics of fish, allowing it to operate longer in the environment and gather more accurate data.
In another scenario, an AI-powered AUV could be used to inspect underwater pipelines or cables, identifying signs of corrosion, damage, or potential leaks. This type of proactive monitoring can help prevent costly repairs or even environmental disasters.
Underwater Archaeology
Underwater archaeology is a fascinating field that often involves exploring and documenting shipwrecks, ancient ruins, and other historical sites hidden under the seas. AI-powered AUVs are providing new tools for archaeologists to investigate these sites without disturbing them. Autonomous underwater vehicles are also used to create 3D models of those sites and their structures. AI algorithms can analyze the collected images and sensor data to identify potential artifacts or digitally reconstruct a ship, allowing us to create interesting simulations.
Light AUVs (LAUVs) are a popular tool archaeologists use to explore and understand historical sites underwater. Those sites are usually fragile and can be hard to navigate, but LAUVs provide a non-invasive approach. This not only helps preserve delicate underwater sites but also allows for a better, more comprehensive view with techniques like lighting correction.
Those are only a few interesting applications of AUVs in research and engineering; there are many more. In the next section, we will walk through a step-by-step tutorial to build an underwater object detection model.
Hands-on Tutorial: Underwater Object Detection For Autonomous Underwater Vehicles
The autonomous mechanism in AUVs primarily uses reinforcement learning and computer vision, combined with hardware like sensors and cameras. These vehicles are usually sent on missions with specific goals, like inspecting submarines for damage or looking for a particular species in the ocean. Almost any mission can use object detection capabilities to increase efficiency. This tutorial will use object detection to look for plastic waste underwater.
Collecting The Data
For this tutorial, we will use Kaggle, Python, and YOLOv5. Kaggle provides the space to collect data, process it, and train the model, and it hosts a wide collection of datasets usable for autonomous underwater vehicles. Since our mission objective is to detect and find waste, we will use one specific dataset here. Our first step is to start the Kaggle notebook and load the dataset into it. Then we can import the libraries we need.
import os
import yaml
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
Now let’s look at what kind of objects are included in this dataset. Those are called classes, and we can find them in the “data.yaml” file. The following Python code defines the path, finds the “data.yaml” file, and prints all the classes.
dataset_path = "/kaggle/input/underwater-plastic-pollution-detection/underwater_plastics"

with open(os.path.join(dataset_path, "data.yaml"), 'r') as f:
    data = yaml.safe_load(f)

class_list = data['names']
print("Classes in the dataset:", class_list)
This dataset includes the following classes: ['Mask', 'can', 'cellphone', 'electronics', 'gbottle', 'glove', 'metal', 'misc', 'net', 'pbag', 'pbottle', 'plastic', 'rod', 'sunglasses', 'tire']. However, we will not need all of them, so in the next section we will process the dataset and keep only the classes we need. But first, let’s look at some samples. The following code shows a defined number of samples from the dataset along with their annotated bounding boxes.
def visualize_samples(dataset_path, num_samples=10):
    with open(os.path.join(dataset_path, "data.yaml"), 'r') as f:
        data = yaml.safe_load(f)
    class_list = data['names']

    image_dir = os.path.join(dataset_path, "train", "images")
    label_dir = os.path.join(dataset_path, "train", "labels")

    for i in range(num_samples):
        image_file = os.listdir(image_dir)[i]
        label_file = image_file[:-4] + ".txt"
        image_path = os.path.join(image_dir, image_file)
        label_path = os.path.join(label_dir, label_file)

        img = Image.open(image_path)
        fig, ax = plt.subplots(1)
        ax.imshow(img)

        with open(label_path, 'r') as f:
            lines = f.readlines()

        for line in lines:
            # YOLO labels are normalized: class x_center y_center width height
            class_id, x_center, y_center, width, height = map(float, line.strip().split())
            class_name = class_list[int(class_id)]
            # Convert normalized coordinates to pixel coordinates
            x_min = (x_center - width / 2) * img.width
            y_min = (y_center - height / 2) * img.height
            bbox_width = width * img.width
            bbox_height = height * img.height
            rect = patches.Rectangle((x_min, y_min), bbox_width, bbox_height,
                                     linewidth=1, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            ax.text(x_min, y_min, class_name, color='r')
        plt.show()

visualize_samples(dataset_path)
Following are some samples of classes we are interested in.
As mentioned previously, it is better to take only the needed classes from the dataset. For the mission goal, the following classes seem most relevant: ["can", "cellphone", "net", "pbag", "pbottle", "Mask", "tire"]. Next, let’s process the data to extract the classes we need.
Data Processing
In this section, we will take a list of classes from the dataset to use later in training the YOLOv5 model. The mission goal is to detect trash and waste for removal. The dataset has many classes, but we only want a handful of them. With Python, we can extract the needed classes and organize them in a new folder. For this, I have prepared a simple Python function that takes a dataset and extracts the needed classes into a new output folder.
import os
import shutil
import yaml
from pathlib import Path
from tqdm import tqdm

def extract_classes(dataset_path, classes_to_extract, output_dir):
    """
    Extracts specified classes from the dataset into a new dataset.

    Args:
        dataset_path (str): Path to the dataset directory
        classes_to_extract (list): List of class names to extract
        output_dir (str): Path to the output directory for the new dataset
    """
    dataset_path = Path(dataset_path)
    output_dir = Path(output_dir)

    # Read class names from yaml
    try:
        with open(dataset_path / "data.yaml", 'r') as f:
            data = yaml.safe_load(f)
        class_list = data['names']
        # Map old class indices to new ones, preserving the order of
        # classes_to_extract (a set would not guarantee a stable order)
        class_indices = {class_list.index(name): new_id
                         for new_id, name in enumerate(classes_to_extract)
                         if name in class_list}
        if not class_indices:
            raise ValueError(f"None of the specified classes {classes_to_extract} found in dataset")
    except FileNotFoundError:
        raise FileNotFoundError(f"Could not find data.yaml in {dataset_path}")
    except KeyError:
        raise KeyError("data.yaml does not contain 'names' field")

    # Create output structure
    output_dir.mkdir(parents=True, exist_ok=True)

    # Copy data.yaml with only the extracted classes
    new_data = data.copy()
    new_data['names'] = classes_to_extract
    new_data['nc'] = len(classes_to_extract)
    with open(output_dir / "data.yaml", 'w') as f:
        yaml.dump(new_data, f)

    # Process each split
    for split in ['train', 'valid', 'test']:
        split_dir = dataset_path / split
        if not split_dir.exists():
            print(f"Warning: {split} directory not found, skipping...")
            continue

        # Create output directories for this split
        out_split = output_dir / split
        out_images = out_split / 'images'
        out_labels = out_split / 'labels'
        out_images.mkdir(parents=True, exist_ok=True)
        out_labels.mkdir(parents=True, exist_ok=True)

        # Process label files first to identify needed images
        label_files = list((split_dir / 'labels').glob('*.txt'))
        needed_images = set()

        print(f"Processing {split} split...")
        for label_path in tqdm(label_files):
            keep_file = False
            new_lines = []
            try:
                with open(label_path, 'r') as f:
                    lines = f.readlines()
                for line in lines:
                    parts = line.strip().split()
                    if not parts:
                        continue
                    class_id = int(parts[0])
                    if class_id in class_indices:
                        # Remap class ID to its new index
                        new_class_id = class_indices[class_id]
                        new_lines.append(f"{new_class_id} {' '.join(parts[1:])}\n")
                        keep_file = True
                if keep_file:
                    needed_images.add(label_path.stem)
                    # Write the filtered label file
                    with open(out_labels / label_path.name, 'w') as f:
                        f.writelines(new_lines)
            except Exception as e:
                print(f"Error processing {label_path}: {str(e)}")
                continue

        # Copy only the images we need
        image_dir = split_dir / 'images'
        if not image_dir.exists():
            print(f"Warning: images directory not found for {split}")
            continue
        for img_path in image_dir.glob('*'):
            if img_path.stem in needed_images:
                try:
                    shutil.copy2(img_path, out_images / img_path.name)
                except Exception as e:
                    print(f"Error copying {img_path}: {str(e)}")

    print("Extraction complete!")

    # Print statistics
    print("\nDataset statistics:")
    for split in ['train', 'valid', 'test']:
        if (output_dir / split).exists():
            n_images = len(list((output_dir / split / 'images').glob('*')))
            print(f"{split}: {n_images} images")
Deep learning datasets usually split the data into three folders: training, validation, and testing. In the code above, we go to each of those folders, find the images folder and the labels folder, and extract the images with the labels we want. Now we can call this function as follows.
classes_to_extract = ["can", "cellphone", "net", "pbag", "pbottle", "Mask", "tire"]
output_dir = "/kaggle/working/extracted_dataset"

extract_classes(dataset_path, classes_to_extract, output_dir)
Now we are ready to use the extracted dataset to train a YOLOv5 model in the next section.
Train Model
We will first start by downloading the YOLOv5 repository and installing the needed libraries.
!git clone https://github.com/ultralytics/yolov5
!pip install -r yolov5/requirements.txt
Now we can import the installed libraries.
import os
import yaml
from pathlib import Path
import shutil
import torch
from PIL import Image
from tqdm import tqdm
Lastly, before starting the training script, let’s prepare our data to match the model’s requirements: setting the image size, updating the data configuration file “data.yaml”, and defining a few important parameters for the YOLOv5 model.
DATASET_PATH = Path("/kaggle/working/extracted_dataset")
IMG_SIZE = 640
BATCH_SIZE = 16
EPOCHS = 50

# Read original data.yaml
with open(DATASET_PATH / 'data.yaml', 'r') as f:
    data = yaml.safe_load(f)

# Create new YAML configuration
train_path = str(DATASET_PATH / 'train')
val_path = str(DATASET_PATH / 'valid')
nc = len(data['names'])  # number of classes
names = data['names']    # class names

yaml_content = {
    'path': str(DATASET_PATH),
    'train': train_path,
    'val': val_path,
    'nc': nc,
    'names': names
}

# Save the YAML file
yaml_path = DATASET_PATH / 'dataset.yaml'
with open(yaml_path, 'w') as f:
    yaml.dump(yaml_content, f, sort_keys=False)

print(f"Created dataset config at {yaml_path}")
print(f"Number of classes: {nc}")
print(f"Classes: {names}")
Great! Now we can use the train.py script from the YOLOv5 repository we cloned to train the model. However, this is not training from scratch, as that would require extensive time and resources. Instead, we start from a pre-trained checkpoint, YOLOv5s (small); this model is efficient and practical to deploy on a trash-collecting autonomous underwater vehicle. We have also defined the number of epochs the model will train for. Following is how we use the defined parameters with the training script.
# train.py lives inside the cloned yolov5/ directory
!python yolov5/train.py \
  --img {IMG_SIZE} \
  --batch {BATCH_SIZE} \
  --epochs {EPOCHS} \
  --data {yaml_path} \
  --weights yolov5s.pt \
  --workers 4 \
  --cache
This process takes around 30 minutes for 50 epochs; reducing the epochs speeds it up but might give less accurate results. After training, we can run inference with the trained model on a few images from outside the dataset to test how it generalizes. The following code loads the trained model.
from ultralytics import YOLO

# Load a model
model = YOLO("/kaggle/working/yolov5/runs/train/exp2/weights/best.pt")
Next, let’s try it out!
results = model("/kaggle/input/test-AUVs_underwater_pollution/image.jpg")
results[0].show()
Following are some results.
The Future Of Autonomous Underwater Vehicles
The advancements in AI and AUVs have opened up new possibilities for underwater exploration and research. AI algorithms are enabling AUVs to become more intelligent and capable of operating with greater autonomy. This is particularly important in underwater environments where communication is limited and the ability to adapt to dynamic conditions is crucial. Furthermore, the future of AUVs is promising, with potential applications in various fields.
In oceanographic research, AI-powered AUVs can explore vast and uncharted places, collecting valuable data and providing insights into the mysteries of our oceans. In environmental monitoring, AUVs can play a crucial role in assessing pollution levels, monitoring underwater ecosystems, and protecting marine biodiversity. Moreover, AUVs can be used for underwater infrastructure inspection, such as pipelines, cables, and offshore platforms, ensuring their integrity and preventing potential hazards.
As AI technology continues to advance, we can expect AUVs to become even more sophisticated and capable of performing complex tasks with minimal human intervention. This will not only expand their applications in research and industry but also open up new possibilities for underwater exploration and discovery.