Improved Pothole Detection Using YOLOv7 and ESRGAN
Nirmal Kumar Rout, Gyanateet Dutta, Varun Sinha, Arghadeep Dey, Subhrangshu Mukherjee, Gopal Gupta
Abstract- Potholes are common road hazards that damage vehicles and pose a safety risk to drivers. Convolutional
Neural Networks (CNNs) are widely used in industry for object detection based on Deep Learning methods and have benefited from significant progress in
hardware and software implementations. In this paper, we propose an algorithm that enables the use of low-resolution cameras, or
low-resolution images and video feed, for automatic pothole detection by applying Super Resolution (SR) through Super Resolution Generative Adversarial
Networks (SRGANs). We first establish a baseline pothole detection performance on low-quality and high-quality dashcam images using
a You Only Look Once (YOLO) network, namely the YOLOv7 network. We then illustrate and examine the speed and accuracy gained over this
baseline after upscaling the low-quality images.
Index Terms- CNN, Deep Learning, Pothole Detection, YOLOv7, ESRGAN, Transfer Learning.
1 INTRODUCTION
POTHOLES are a major issue on roads worldwide, causing
damage to vehicles and posing a safety risk to drivers.
Automated pothole detection systems can help to identify and
repair potholes more efficiently, but the use of low-resolution
cameras or low-quality video feed can be a challenge. In this
paper, we propose a novel approach for improving the
performance of pothole detection using low-resolution cameras
or low-quality images and video feed. Our approach involves
using an Enhanced Super-Resolution Generative Adversarial
Network (ESRGAN) [1] to enhance the resolution of low-
quality images and video feed, and then applying the You
Only Look Once (YOLOv7) [2] object detection algorithm to
detect potholes in the enhanced images. We compare the speed
and accuracy of our approach to a baseline pothole detection
system using YOLOv7 on high-quality images and show that it
provides a significant improvement in both areas. We also
demonstrate that our approach can be applied to a range of
different road conditions and pothole types. One of the major
advantages of our approach is its cost-effectiveness. ESRGAN
can be used to improve the resolution of low-quality images
and video feed from low-cost cameras, rather than requiring
the use of high-resolution cameras with expensive sensors.
This can greatly reduce the cost of implementing pothole
detection systems, especially in resource-constrained settings.
To validate the effectiveness of our approach, we conduct a
series of experiments on a medium-sized dataset of dash-cam
images and video feed from a variety of international locations
that reflect real-life scenarios. Our results show that the use of
ESRGAN and YOLOv7 can significantly improve the
performance of pothole detection systems and provide a
reliable solution for detecting potholes in low-resolution
images and video feed. This has the potential to greatly
enhance the efficiency and effectiveness of pothole repair
efforts and improve road safety for drivers worldwide.
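The two-stage idea described above (enhance first, then detect) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: `upscale_nearest` is a nearest-neighbour stand-in for an ESRGAN generator, `dummy_detector` is a hypothetical placeholder for a YOLOv7 model, and the scale factor of 4 and the box format (x1, y1, x2, y2, confidence) are assumed for illustration only.

```python
from typing import Callable, List, Tuple

import numpy as np

# Assumed box format: (x1, y1, x2, y2, confidence), in pixel coordinates.
Box = Tuple[float, float, float, float, float]


def upscale_nearest(image: np.ndarray, scale: int) -> np.ndarray:
    """Placeholder upscaler: repeats each pixel `scale` times per axis.

    Stands in for an ESRGAN generator; it adds no real detail.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)


def dummy_detector(image: np.ndarray) -> List[Box]:
    """Hypothetical detector: returns one box covering the whole frame.

    Stands in for a trained YOLOv7 model.
    """
    height, width = image.shape[:2]
    return [(0.0, 0.0, float(width), float(height), 0.9)]


def detect_with_super_resolution(
    image: np.ndarray,
    upscaler: Callable[[np.ndarray, int], np.ndarray],
    detector: Callable[[np.ndarray], List[Box]],
    scale: int = 4,
) -> List[Box]:
    """Upscale a low-resolution frame, detect on the enhanced frame,
    and map the boxes back to the original frame's coordinates."""
    enhanced = upscaler(image, scale)
    boxes = detector(enhanced)
    return [
        (x1 / scale, y1 / scale, x2 / scale, y2 / scale, conf)
        for (x1, y1, x2, y2, conf) in boxes
    ]
```

In practice, the two callables would be replaced by a trained super-resolution model and a trained detector. Dividing the box coordinates by the scale factor keeps the detections aligned with the original low-resolution frame.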
Nirmal Kumar Rout is with the School of Electronics Engineering, KIIT
University, Bhubaneswar, India. E-mail: nkrout@kiit.ac.in.
Gyanateet Dutta, Varun Sinha, Subhrangshu Mukherjee, Arghadeep
Dey, and Gopal Gupta are with the School of Electronics Engineering, KIIT
University, Bhubaneswar, India. E-mail: {1930198, 1930055,
1930053, 1930069, 1930020}@kiit.ac.in.
2 RELATED WORKS
A number of approaches have been proposed in the literature
for automated pothole detection. Earlier approaches [3]
required 3-D equipment, which can be very expensive and is not
suitable for all purposes. These techniques frequently
use image data taken by digital cameras [4, 5] and depth
cameras, as well as thermal technology and lasers. Recent approaches
rely on machine learning and deep learning
algorithms to process images and detect potholes.
Techniques based on convolutional neural networks (CNNs)
are widely used for extracting pothole features from
images because they accurately model non-linear
patterns, perform automatic feature extraction, and are
robust to unnecessary noise and other image
conditions in road images [6]. Although CNNs have been
used in many approaches [7, 8, 9], they are ineffective in
certain scenarios, such as detecting objects that are small
relative to the image. This can be solved by using high-resolution
(HR) images for detection, but the computational cost
of processing them is then too high, because CNNs are
very memory-intensive and require significant
computation time. To address this issue, Chen et al.
[10] suggested using smaller input images or image patches
from HR images to train the network. Their method is a
two-stage system in which a localization network (LCNN) is
first employed to locate the pothole region in the
image, and a part-based classification network (PCNN) is then
used to predict the classes. A recent study by
Salcedo et al. [11] developed a road maintenance prioritization
system for India using deep learning models such as UNet
(with ResNet34 as the encoder), EfficientNet, and
YOLOv5 on the Indian Driving Dataset (IDD). The study by
Silva et al. [11] employed YOLOv4 to detect road damage
in a dataset of images taken from the overhead view of an
airborne drone. The study experimentally evaluated the
accuracy and applicability of YOLOv4 for
recognizing highway road damage and reported an accuracy of
95%. The work proposed by Mohammad et al. [12] consisted
of a system that uses an edge platform, the AI kit (OAK-D),
with frameworks such as YOLOv1, YOLOv2, YOLOv3,
YOLOv4, Tiny-YOLOv5, and SSD-MobileNet V2.
Anup et al. [13] proposed a 1D Convolutional Neural