Abstract
Traditionally, image retrieval has relied on a text-based approach, in which the
user queries metadata or textual information such as keywords, tags, or
descriptions. The effectiveness and utility of this approach for solving image
retrieval problems in the digital realm are limited. We introduce an
innovative method that relies on visual content for image retrieval. Various visual
aspects of the image, including color, texture, shape, and more, are employed to
identify relevant images. The choice of the most suitable feature significantly
influences the system's performance. The convolutional neural network (CNN) is an
important machine learning model for this purpose. Creating an efficient new CNN
model requires considerable time and computational resources. Many CNN models,
however, have already been pre-trained on large image datasets such as ImageNet,
which contains millions of images. We can use these pre-trained CNN models by
transferring their learned knowledge to our specific content-based image retrieval task.
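As a minimal sketch of how features taken from a pre-trained model could drive retrieval, the following ranks database images by their similarity to a query. The feature vectors here are toy values standing in for CNN descriptors, and both the helper names (`cosine_similarity`, `rank_images`) and the choice of cosine similarity are assumptions made for illustration, not details stated in this chapter.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_images(query_features, database):
    """Return (image_name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_features, features))
        for name, features in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for CNN features (hypothetical values).
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
ranking = rank_images([1.0, 0.0, 0.0], database)
for name, score in ranking:
    print(name, round(score, 4))
```

In practice the vectors would come from a layer of the pre-trained network and would have hundreds or thousands of dimensions, but the ranking step stays the same.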
In this chapter, we propose the use of an efficient pre-trained CNN model, ResNet,
for content-based image retrieval (CBIR). The experiment was conducted by
applying a pre-trained ResNet model to the Paris 6K and Oxford 5K datasets. The
performance of similar-image retrieval was measured and compared with the
state-of-the-art AlexNet model. We found that the AlexNet architecture takes a longer time
to reach accurate results. The ResNet architecture does not need to fire all neurons
at every epoch, which significantly reduces training time and improves accuracy. In the
ResNet architecture, once a feature has been extracted, it is not extracted again;
the network instead tries to learn a new feature. To measure performance, we used
mean average precision (mAP), obtaining 92.12% on Paris 6K and 84.81% on Oxford 5K.
We also report the mean precision at different ranks: at the first rank, for example,
we obtain 100% on Paris 6K and 97.06% on Oxford 5K.
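The two kinds of score reported above can be illustrated with a short sketch, assuming the averaged metric is the standard mean average precision. Each query is represented by a ranked list of 0/1 relevance flags, and the sketch assumes every relevant image appears somewhere in that ranking. The function names and the toy data are hypothetical, and the official evaluation protocols for Paris 6K and Oxford 5K may differ in details such as the treatment of ambiguous images.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of precision@i over the ranks i that hold a relevant image."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Average precision averaged over all queries."""
    return sum(average_precision(run) for run in runs) / len(runs)


def mean_precision_at_k(runs, k):
    """Precision at rank k averaged over all queries."""
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Toy relevance lists for two queries (1 = relevant, 0 = not relevant).
runs = [
    [1, 0, 1, 0],
    [0, 1, 1],
]
print("mAP:", mean_average_precision(runs))
print("mean P@1:", mean_precision_at_k(runs, 1))
```

A mean precision of 100% at the first rank therefore means that the top-ranked result was relevant for every query in the set.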