Abstract
In this chapter, we briefly review online learning algorithms for content-based multimedia annotation that scale to large multimedia collections and to the large sets of semantic concepts associated with them. Multimedia search can use annotated semantic concepts for efficient content-based indexing, which is a promising direction toward truly content-based multimedia search. However, because of the large number of multimedia samples and semantic concepts, existing techniques for automatic multimedia annotation cannot handle a large-scale multimedia corpus and concept set, in terms of either annotation accuracy or computation cost. To enable large-scale semantic concept annotation, a practical multimedia annotation method ought to be scalable along both the multimedia sample dimension and the concept label dimension. In real-world cases, an initial pre-labeled training set is available, from which a preliminary multi-label classifier is built, and large numbers of unlabeled multimedia samples then arrive consecutively in batches. For each arriving batch, a multi-label active learning engine selects a set of unlabeled samples, together with a selected set of labels, and asks data labelers to confirm those labels. An online learner then updates the existing classifier by taking the newly labeled sample-label pairs into account. This process repeats until all data have arrived. During the process, new labels, even ones without any pre-labeled training samples, can be incorporated at any time. In this chapter, we review large-scale online active annotation for Internet multimedia in terms of these two basic techniques: active learning and online computing. By combining the two techniques in a unified framework, scalable multimedia annotation can be achieved in an online manner, so that both annotation accuracy and efficiency can be significantly improved.
Keywords: large-scale multimedia search and mining, online learning, multi-label annotation, multimedia sample dimension, concept label dimension, sample-label pair, multi-label active learning, large-scale online active annotation, Correlative Multi-Label, 2D Active Learning
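The batch-wise process summarized in the abstract (select sample-label pairs, obtain label confirmation, update the classifier online) can be illustrated with a minimal sketch. It is not the method reviewed in this chapter: it assumes one logistic model per label, plain uncertainty sampling as a stand-in for the selection criterion, and a single stochastic-gradient step as the online update. All names (`OnlineMultiLabelModel`, `select_pairs`, `annotate_stream`, `oracle`, `budget`) are hypothetical.

```python
import math
from collections import defaultdict


class OnlineMultiLabelModel:
    """Assumed model: one logistic classifier per label, updated pair by pair."""

    def __init__(self, n_features, lr=0.5):
        self.n_features = n_features
        self.lr = lr
        # An unseen label gets a zero weight vector on first use, so a new
        # concept can join without any pre-labeled training samples.
        self.weights = defaultdict(lambda: [0.0] * n_features)

    def predict_proba(self, x, label):
        z = sum(w * xi for w, xi in zip(self.weights[label], x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label, y):
        # One stochastic-gradient step on the logistic loss for this pair.
        err = y - self.predict_proba(x, label)
        w = self.weights[label]
        for i, xi in enumerate(x):
            w[i] += self.lr * err * xi


def select_pairs(model, batch, labels, budget):
    """Pick the `budget` sample-label pairs with predictions closest to 0.5."""
    scored = [
        (abs(model.predict_proba(x, label) - 0.5), idx, label)
        for idx, x in enumerate(batch)
        for label in labels
    ]
    scored.sort()
    return [(idx, label) for _, idx, label in scored[:budget]]


def annotate_stream(model, batches, labels, oracle, budget):
    """For each arriving batch: select pairs, ask the labeler, update online."""
    queried = 0
    for batch in batches:
        for idx, label in select_pairs(model, batch, labels, budget):
            y = oracle(batch[idx], label)  # labeler confirmation: 1 or 0
            model.update(batch[idx], label, y)
            queried += 1
    return queried
```

Because selection operates on sample-label pairs instead of whole samples, the labeling budget is spent only on the labels the current classifier is least sure about, which is the intuition behind scaling along both the sample dimension and the label dimension.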