Abstract
The term “object detection” refers to a computer vision technique for
recognising specific types of objects present in visual media. One important
application of the technique is autonomous driving, where the task
is to detect the various objects present in a single image frame. These objects
belong to multiple classes, such as trucks, bikes, persons, cars, dogs, and cats. For this
task, we combine object localization and classification, since multiple objects must be
located in the image. Many existing Deep Learning techniques build on
established architectures such as VGG-16 and InceptionV3. These techniques are
a reasonable solution, but the response time of these architectures may
be too slow, as autonomous vehicles have to react in less than 0.02 milliseconds
in order to avoid collisions. Using YOLO, we predict the classes
and the bounding-box coordinates of the objects in a single pass of the model, detecting
multiple objects in the image rather than focusing only on regions of interest,
as earlier models did. Built on Convolutional Neural Networks, YOLO is
fast and accurate and is less likely to produce localization
errors.
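
To illustrate what predicting classes and box coordinates in a single pass can look like, the following is a minimal sketch rather than the implementation used in this work. It assumes a simplified YOLO-style output: an S x S grid in which each cell holds one box as [x, y, w, h, confidence] followed by per-class probabilities. The label set, the grid values, and the function names decode_grid and make_example_grid are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical label set, used only for this illustration.
CLASSES = ["car", "person", "dog"]

Box = Tuple[float, float, float, float]
Detection = Tuple[str, float, Box]


def decode_grid(grid: List[List[List[float]]], threshold: float = 0.5) -> List[Detection]:
    """Turn one S x S grid of predictions into a list of detections.

    Assumed cell layout: [x, y, w, h, confidence, p_class0, p_class1, ...],
    with (x, y) the box centre relative to the cell and (w, h) relative to
    the whole image.
    """
    s = len(grid)
    detections = []
    for row in range(s):
        for col in range(s):
            x, y, w, h, conf, *class_probs = grid[row][col]
            # Pick the most probable class for this cell.
            best = max(range(len(class_probs)), key=class_probs.__getitem__)
            # Class-specific score: box confidence times class probability.
            score = conf * class_probs[best]
            if score < threshold:
                continue
            # Convert the cell-relative centre to image-relative coordinates.
            cx = (col + x) / s
            cy = (row + y) / s
            detections.append((CLASSES[best], score, (cx, cy, w, h)))
    return detections


def make_example_grid(s: int = 2) -> List[List[List[float]]]:
    """Build a hand-made grid standing in for the output of one forward pass."""
    empty = [0.0] * (5 + len(CLASSES))
    grid = [[list(empty) for _ in range(s)] for _ in range(s)]
    grid[0][1] = [0.5, 0.5, 0.4, 0.3, 0.9, 0.8, 0.1, 0.1]  # a car, top right
    grid[1][0] = [0.5, 0.5, 0.2, 0.6, 0.8, 0.1, 0.9, 0.0]  # a person, bottom left
    return grid


if __name__ == "__main__":
    for label, score, box in decode_grid(make_example_grid()):
        print(label, round(score, 2), box)
```

All objects in the frame are read out of a single output grid, so no separate pass over candidate regions is needed. A practical detector would also predict several boxes per cell and apply non-maximum suppression, both of which are omitted here for brevity.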