Vision

Overview

An approach to robot perception that aims to minimize hardware cost and onboard compute.

Segmentation

A real-time semantic segmentation model, DSANet [2], was used to segment outdoor environments efficiently and accurately.

It enables the robot to interpret its surroundings, distinguishing areas where movement is permitted (marked in green) from obstacles (marked in red).
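The green/red split can be illustrated as a post-processing step on a per-pixel class map. This is a minimal sketch, not the project's code: the class IDs, the set of classes treated as traversable, and the helper names (`traversability_mask`, `colorize`, `traversable_fraction`) are hypothetical, and a real DSANet output would be a large array rather than nested lists.

```python
from typing import List, Set, Tuple

# Hypothetical class IDs; the real label set depends on how DSANet was trained.
ROAD, GRASS, PATH, BUILDING, TREE, PERSON = range(6)

# Assumption: which classes count as areas where movement is permitted.
TRAVERSABLE: Set[int] = {ROAD, GRASS, PATH}

GREEN = (0, 255, 0)  # movement permitted
RED = (255, 0, 0)    # obstacle


def traversability_mask(labels: List[List[int]]) -> List[List[bool]]:
    """Collapse a per-pixel class map into permitted (True) vs obstacle (False)."""
    return [[label in TRAVERSABLE for label in row] for row in labels]


def colorize(mask: List[List[bool]]) -> List[List[Tuple[int, int, int]]]:
    """Render the mask with the green/red convention used for visualization."""
    return [[GREEN if ok else RED for ok in row] for row in mask]


def traversable_fraction(mask: List[List[bool]]) -> float:
    """Share of pixels where movement is permitted (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```

For example, `traversability_mask([[0, 0, 4], [1, 5, 3]])` returns `[[True, True, False], [True, False, False]]`.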

Combining the segmentation output with depth estimation makes it possible to generate a virtual plane on which the robot is allowed to navigate.
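One way such a plane could be obtained is to back-project the traversable pixels into 3D using the depth map and fit a plane to them. This is a sketch under stated assumptions, not the project's method: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a camera frame with y pointing down, and a ground model y = a*x + b*z + c fitted by ordinary least squares. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def back_project(depth: Sequence[Sequence[float]], mask: Sequence[Sequence[bool]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Turn traversable pixels with valid depth into 3D camera-frame points (pinhole model)."""
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, ok) in enumerate(zip(depth_row, mask_row)):
            if ok and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x + b*z + c, solving the normal equations by Cramer's rule."""
    if len(points) < 3:
        raise ValueError("need at least 3 points")
    sxx = sum(x * x for x, _, _ in points)
    sxz = sum(x * z for x, _, z in points)
    szz = sum(z * z for _, _, z in points)
    sx = sum(x for x, _, _ in points)
    sz = sum(z for _, _, z in points)
    sxy = sum(x * y for x, y, _ in points)
    szy = sum(z * y for _, y, z in points)
    sy = sum(y for _, y, _ in points)
    a_mat = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, float(len(points))]]
    rhs = [sxy, szy, sy]
    det = _det3(a_mat)
    if abs(det) < 1e-12:
        raise ValueError("degenerate point set (e.g. all points on one line)")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a_mat]  # replace one column with the right-hand side
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return coeffs[0], coeffs[1], coeffs[2]
```

Fitting points sampled exactly from y = 0.1x + 0.05z + 1.2 recovers those coefficients up to floating-point error. With noisy monocular depth, a robust estimator such as RANSAC would likely be a better choice than plain least squares.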

Depth Estimation

To avoid the cost of depth cameras and LiDAR hardware, a real-time monocular depth estimation model, FastDepth [1], was used.

The model uses a lightweight encoder-decoder architecture designed for low latency.
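FastDepth's encoder is based on MobileNet, whose efficiency comes largely from depthwise separable convolutions. The arithmetic below illustrates why that makes the network lightweight. It is a worked example only: the layer size (56x56 output, 3x3 kernel, 128 channels in and out) is an arbitrary illustrative choice, not FastDepth's actual configuration.

```python
def standard_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates for a regular k x k convolution producing an h x w x c_out map."""
    return h * w * k * k * c_in * c_out


def separable_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer size only; not taken from FastDepth's actual configuration.
    std = standard_conv_macs(56, 56, 3, 128, 128)
    sep = separable_conv_macs(56, 56, 3, 128, 128)
    print(std, sep, round(std / sep, 1))  # 462422016 54992896 8.4
```

The cost ratio simplifies to 1/c_out + 1/k², so for a 3x3 kernel the separable form needs roughly 8 to 9 times fewer multiply-accumulates once the channel count is large.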

It can run at 60-70 frames per second (fps) on the robot's hardware; however, it is intentionally capped at 18 fps to leave resources for other concurrent tasks.
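How the 18 fps cap is enforced is not described here. A minimal sketch, assuming each camera frame arrives with a timestamp in seconds, is a limiter that only passes frames to the model when the next processing slot is due. `FrameRateLimiter` and `should_process` are hypothetical names.

```python
from typing import Optional


class FrameRateLimiter:
    """Pass at most max_fps frames per second through to inference."""

    def __init__(self, max_fps: float) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self.min_interval = 1.0 / max_fps
        self._next_due: Optional[float] = None

    def should_process(self, timestamp: float) -> bool:
        """Return True if the frame captured at `timestamp` should be run through the model."""
        if self._next_due is None:
            self._next_due = timestamp  # first frame is always processed
        if timestamp < self._next_due:
            return False  # too early; skip this frame
        # Advance on a fixed schedule so the average rate approaches max_fps.
        self._next_due += self.min_interval
        # After a stall, resynchronize instead of bursting through queued frames.
        if self._next_due <= timestamp:
            self._next_due = timestamp + self.min_interval
        return True
```

With a 60 fps camera, `FrameRateLimiter(18.0)` accepts 18 of the 60 frames spanning one second. Scheduling slots on a fixed grid, rather than measuring from the last accepted frame, avoids dropping to a lower rate (about 15 fps here) when frame timestamps do not line up with the cap.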

Application-Specific Vision Models (ASVMs)

Wave is designed to carry out tasks in various industries, including agriculture, oil and gas, and mining, with an initial focus on monitoring activities.

Because each industry has different demands, a specialized vision model must be developed for each sector. These models do not run onboard in real time; instead, the camera footage is uploaded and processed on a cloud instance.

One of these models was developed to identify bunches of grapes and estimate attributes such as disease, ripeness, weight, and type.

The pipeline uses Faster R-CNN, fine-tuned on the WGISD dataset [3], to generate bounding boxes around candidate grape bunches. A second network, based on ResNet-34 and trained with multi-task learning using hard parameter sharing on a combination of datasets [3], [4], [5], and [6], then estimates the specified attributes for each bounding box.
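The two-stage structure and the idea of hard parameter sharing can be sketched without the real networks. In this illustration the Faster R-CNN detector and the ResNet-34 trunk are replaced by plain callables, and the toy trunk, the attribute heads (`ripeness`, `weight_g`), the score threshold, and the loss weights are all hypothetical. The point is the shape of the pipeline: one shared trunk computes features once per crop, several task-specific heads read from them, and training minimizes a weighted sum of the per-task losses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
Image = List[List[int]]          # toy grayscale image, row-major


def crop(image: Image, box: Box) -> Image:
    """Cut the detected region out of the frame."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


@dataclass
class MultiTaskModel:
    """Hard parameter sharing: one shared trunk feeds every task-specific head."""
    trunk: Callable[[Image], Sequence[float]]              # stand-in for the shared backbone
    heads: Dict[str, Callable[[Sequence[float]], object]]  # one small head per attribute

    def predict(self, patch: Image) -> Dict[str, object]:
        features = self.trunk(patch)  # computed once, reused by all heads
        return {task: head(features) for task, head in self.heads.items()}


def analyze(image: Image,
            detect: Callable[[Image], List[Tuple[Box, float]]],
            model: MultiTaskModel,
            score_threshold: float = 0.5) -> List[Dict[str, object]]:
    """Stage 1: detect candidate bunches. Stage 2: estimate attributes for each box."""
    results = []
    for box, score in detect(image):
        if score < score_threshold:
            continue  # drop low-confidence detections
        results.append({"box": box, **model.predict(crop(image, box))})
    return results


def combined_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Training objective for the shared network: weighted sum of per-task losses."""
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())


# Toy stand-ins so the sketch runs end to end (hypothetical, not trained models).
def toy_trunk(patch: Image) -> List[float]:
    pixels = [v for row in patch for v in row]
    return [sum(pixels) / len(pixels), float(len(pixels))]


TOY_HEADS: Dict[str, Callable[[Sequence[float]], object]] = {
    "ripeness": lambda f: "ripe" if f[0] > 100 else "unripe",
    "weight_g": lambda f: 0.5 * f[1],
}
```

With the toy stand-ins, calling `analyze` on the image `[[200, 200, 10, 10], [200, 200, 10, 10]]` with a detector that returns `[((0, 0, 2, 2), 0.9), ((2, 0, 4, 2), 0.3)]` and `MultiTaskModel(toy_trunk, TOY_HEADS)` keeps only the first detection and returns `[{'box': (0, 0, 2, 2), 'ripeness': 'ripe', 'weight_g': 2.0}]`.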

References

[1] FastDepth: Fast Monocular Depth Estimation on Embedded Systems

[2] DSANet: Dilated Spatial Attention for Real-time Semantic Segmentation in Urban Street Scenes

[3] Embrapa Wine Grape Instance Segmentation Dataset - Embrapa WGISD

[4] GrapesNet: Indian RGB & RGB-D vineyard image datasets for deep learning applications

[5] Grape CS-ML Database

[6] GrapesNet: A grapevine leaves dataset for early detection and classification of esca disease in vineyards through machine learning