Abstract
Nowadays, an image must be verified or analyzed keenly for further
processing methods. Few numbers of images can be analyzed manually for a particular
object or a human being. But when millions and millions of images are present in a
dataset, and every single one of them must be verified and classified based on the
objects present in the image, it is necessary to find an algorithm or a technique to assist
this process. Of many types and applications of object detection, this research aims at
pedestrian detection. Pedestrian detection is a special way that only aims to detect the
human beings in the uploaded image. The Mask R-CNN algorithm is used in pedestrian
detection. The Mask R-CNN model is a Deep Learning (DL) model that can detect a
certain object from an image and can also be used for image augmentation. The Mask
R-CNN stands for Mask Regional Convolutional Neural Network (CNN). This is one
of the classification techniques which can be designed using the Convolutional
Network theories. The Mask R-CNN algorithm is the updated version of the Faster RCNN. The main difference between the faster R-CNN and the Mask R-CNN is that the
mask R-CNN can bind the borders of the object detected while the faster R-CNN uses a
box to identify the object. One of the major advantages of mask R-CNN is that it can
provide high-quality image augmentation and is also one of the fastest image
segmentation algorithms. This model can be easily implemented when compared to
other object detection techniques. Python contains various inbuilt dependencies which
can be installed with the help of repositories. These dependencies are used to design the
model. The model is then trained and tested with various inputs for better accuracy.
Image segmentation is used in various fields like medical imaging, video surveillance,
traffic control, etc., so the mask R-CNN technique would be an extremely efficient DL
algorithm.