Abstract
Extracting information from images is now central to many new inventions.
Although several simple methods exist for extracting information from images,
their feasibility and accuracy are critical. Feature extraction is one of the
simplest and most significant of these processes. Many scientific
approaches are derived by experts from the extracted features to support the
conclusions of their work. Mathematical procedures, like scientific methods, play an
important role in image analysis. The Bag of Visual Words (BoVW) model [1, 2, 3] is one
such procedure, and it helps determine how similar the images in a group are. In the
BoVW model, each image is characterised by a set of visual words, which are
subsequently aggregated into one histogram per image [4]. The difference between two
histograms indicates how similar the corresponding images are. The reweighting scheme
known as Term Frequency – Inverse Document Frequency (TF-IDF) [5] refines this procedure.
The overall weighting [6] of all words in each histogram is calculated before
reweighting. In the traditional approach, the images are transformed into a matrix
called the cost matrix, which is constructed using two mathematical methods: Euclidean
distance and cosine distance. The main purpose of computing these distances is to detect
similarity between the histograms. Furthermore, the histograms are normalized and both
distances are calculated. A visual representation is also generated. The two
mathematical methods are compared to determine which is more appropriate for checking
resemblance. The strategy identified as the better solution on the basis of the findings
can aid fraud detection in digital signatures, image processing, and image classification.
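
The pipeline summarised above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The IDF form log(N / df), the L1 normalization, the helper names, and the toy histograms are all assumptions made for illustration only.

```python
import math

def tfidf_reweight(histograms):
    """Reweight visual-word counts by an assumed IDF of log(N / df)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images in which each visual word occurs.
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]
    return [[h[w] * idf[w] for w in range(n_words)] for h in histograms]

def l1_normalize(h):
    """Scale a histogram so its entries sum to 1 (assumed normalization)."""
    total = sum(h)
    return [x / total for x in h] if total > 0 else list(h)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen here for empty histograms
    return 1.0 - dot / (norm_a * norm_b)

def cost_matrix(histograms, distance):
    """Pairwise distance matrix between all histograms."""
    return [[distance(a, b) for b in histograms] for a in histograms]

# Toy histograms over 4 visual words (made-up counts, illustration only).
histograms = [
    [3, 0, 1, 2],
    [6, 0, 2, 4],  # same proportions as the first image
    [0, 5, 1, 0],  # shares only the third word with the others
]

weighted = [l1_normalize(h) for h in tfidf_reweight(histograms)]
euclid = cost_matrix(weighted, euclidean_distance)
cosine = cost_matrix(weighted, cosine_distance)
```

In this toy case the third visual word occurs in every image, so its assumed IDF weight is zero and it no longer contributes to either distance. The first two images then have identical normalized histograms, and both cost matrices report a distance of approximately zero between them.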