Abstract
Introduction: Image captioning is an important task at the intersection of computer vision and natural language processing. In recent years, deep neural networks have become an increasingly popular tool for generating descriptive captions for images.
Method: However, these models often produce captions that are generic and repetitive. Beam search is a well-known decoding technique for generating image descriptions efficiently and effectively. The algorithm maintains a set of partial captions and iteratively extends each one with the most probable next words until a complete caption is generated. This set of partial captions, known as the beam, is updated at each step according to the predicted next-word probabilities, so that only the highest-scoring candidates are retained. This paper presents an image caption generation system based on beam search. The system uses a deep neural network architecture that is trained to encode the image and generate captions.
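The beam update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `next_word_probs` is a hypothetical stand-in for the trained network's next-word distribution (a real captioning model would also condition it on the encoded image), the `<end>` token and the toy probability table are invented for illustration, and candidates are ranked by cumulative log-probability, which is a common but not the only scoring choice.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "<end>"  # hypothetical end-of-caption token

# Hypothetical decoder interface: maps the words generated so far to a
# probability for each candidate next word. A real captioning model would
# also condition this distribution on the encoded image.
NextWordFn = Callable[[Tuple[str, ...]], Dict[str, float]]
Hypothesis = Tuple[Tuple[str, ...], float]  # (words, cumulative log-prob)


def beam_search(next_word_probs: NextWordFn, beam_width: int = 3,
                max_len: int = 20) -> List[Hypothesis]:
    """Return up to beam_width captions, best first, with log-probabilities."""
    beam: List[Hypothesis] = [((), 0.0)]  # start from the empty caption
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every partial caption in the beam with every next word.
        candidates: List[Hypothesis] = []
        for words, score in beam:
            for word, p in next_word_probs(words).items():
                if p > 0:
                    candidates.append((words + (word,), score + math.log(p)))

        # Keep only the beam_width highest-scoring extensions; those that
        # end with END are complete, so this simple variant lets the beam
        # shrink as captions finish.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for words, score in candidates[:beam_width]:
            if words[-1] == END:
                finished.append((words[:-1], score))
            else:
                beam.append((words, score))
        if not beam:
            break

    finished.extend(beam)  # unfinished captions if max_len was reached
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Invented toy probabilities for illustration only (not from the paper).
TOY_TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"a": 0.6, "two": 0.4},
    ("a",): {"dog": 0.4, "cat": 0.3, "man": 0.3},
    ("two",): {"dogs": 0.9, "cats": 0.1},
}


def toy_next_word_probs(words: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not listed (here, every two-word prefix) ends the caption.
    return TOY_TABLE.get(words, {END: 1.0})


for width in (2, 1):
    for words, score in beam_search(toy_next_word_probs, beam_width=width):
        print(width, " ".join(words), round(math.exp(score), 2))
```

With `beam_width=2` the sketch ranks "two dogs" (probability 0.4 × 0.9 = 0.36) ahead of "a dog" (0.6 × 0.4 = 0.24). With `beam_width=1` it reduces to greedy decoding and returns only "a dog", because it commits to "a" at the first step and never revisits that choice.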
Results: This architecture combines the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Beam search is then applied to produce complete captions, yielding a more diverse and descriptive set of captions than traditional greedy decoding. Experimental results indicate that the proposed system outperforms existing image caption generation techniques in both the precision and the diversity of the generated captions.
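For reference, the contrast between greedy decoding and beam search can be written in standard notation. The symbols here are introduced for illustration and are not taken from the paper: I is the input image, y = (y_1, ..., y_T) a caption, V the vocabulary, and B the beam width. Both decoders score a caption by its summed log-probability, but greedy decoding commits to the single best word at each step, whereas beam search retains the B best partial captions; with B = 1 the two coincide.

```latex
\begin{aligned}
\log p(y \mid I) &= \sum_{t=1}^{T} \log p(y_t \mid y_{<t}, I) \\
\text{greedy:}\quad \hat{y}_t &= \arg\max_{w \in V} \, p(w \mid \hat{y}_{<t}, I) \\
\text{beam:}\quad \mathcal{B}_t &= \operatorname{top}_B \bigl\{ (y_{<t}, w) : y_{<t} \in \mathcal{B}_{t-1},\; w \in V \bigr\}
  \ \text{ranked by } \sum_{s=1}^{t} \log p(y_s \mid y_{<s}, I)
\end{aligned}
```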
Conclusion: These results demonstrate the effectiveness of beam search in improving the performance of image caption generation systems.