Abstract
Background: Instant access to desired information is a key element in building an intelligent environment that creates value for people and steers us towards Society 5.0. Online newspapers are one such example, providing instant access to information anywhere and anytime on our mobiles, tablets, laptops, desktops, etc. However, when it comes to searching for a specific advertisement, online newspapers do not provide easy advertisement search options, and there are no specialized search portals that support keyword-based advertisement search across multiple online newspapers. As a result, finding a specific advertisement requires a sequential manual search across a range of online newspapers.
Objective: This research paper proposes a keyword-based advertisement search framework that provides instant access to relevant advertisements from online English newspapers in a category of the reader’s choice.
Methods: First, an image extraction algorithm is proposed that identifies and extracts images from online newspapers without relying on any rules about advertisement placement and/or size. Next, a deep learning Convolutional Neural Network (CNN) model named ‘Adv_Recognizer’ is proposed to separate advertisement images from non-advertisement images. Another CNN model, ‘Adv_Classifier’, is proposed to classify advertisement images into four predefined categories. Finally, the Optical Character Recognition (OCR) technique is used to perform keyword-based advertisement searches in various categories across multiple newspapers.
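The final stage of the pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the earlier stages (extraction, ‘Adv_Recognizer’, ‘Adv_Classifier’, OCR) have already produced a text string and a category label for each advertisement, and all data-layout details, function names, and category names here are hypothetical.

```python
# Hypothetical sketch of the keyword-based advertisement search step.
# Each advertisement record is assumed to carry the source newspaper,
# the category assigned by the classifier, and the OCR-extracted text.

def search_ads(ads, keyword, category=None):
    """Return advertisements whose OCR text contains the keyword.

    ads      -- list of dicts: {"newspaper": ..., "category": ..., "text": ...}
    keyword  -- search term entered by the reader
    category -- optional filter on one of the predefined categories
    """
    keyword = keyword.lower()
    return [
        ad for ad in ads
        if (category is None or ad["category"] == category)
        and keyword in ad["text"].lower()
    ]

# Usage: search for "apartment" in a (hypothetical) real-estate category
# across advertisements gathered from multiple newspapers.
ads = [
    {"newspaper": "Paper A", "category": "real_estate",
     "text": "2BHK apartment for sale, city centre"},
    {"newspaper": "Paper B", "category": "automobile",
     "text": "Used sedan, single owner"},
]
hits = search_ads(ads, "apartment", category="real_estate")
```

A case-insensitive substring match is used here purely for brevity; any OCR-backed search over the classified advertisements follows the same shape of filtering by category and then by keyword.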
Results: The proposed image extraction algorithm easily extracts all types of well-bounded images from different online newspapers, and it was used to create an ‘English newspaper image dataset’ of 11,000 images, including advertisements and non-advertisements. The proposed ‘Adv_Recognizer’ model separates advertisement and non-advertisement images with an accuracy of around 97.8%, and the proposed ‘Adv_Classifier’ model classifies advertisements into the four predefined categories with an accuracy of around 73.5%.
Conclusion: The proposed framework will help newspaper readers perform exhaustive advertisement searches across a range of online English newspapers in a category of their own interest. It will also help in carrying out advertisement analysis and studies.
Keywords: Advertisement image classification, convolutional neural networks (CNN), newspaper advertisements, newspaper layout segmentation, optical character recognition (OCR), residual networks (ResNet), transfer learning.
Graphical Abstract