Explainable video topics for content taxonomy : a multimodal retrieval approach to industry-compliant contextual advertising

de Silva, Waruna and Fernando, Anil (2025) Explainable video topics for content taxonomy : a multimodal retrieval approach to industry-compliant contextual advertising. IEEE Access, 13. pp. 30597-30612. ISSN 2169-3536 (https://doi.org/10.1109/ACCESS.2025.3542562)

Text. Filename: De-Silva-Fernando-IEEEA-2025-Explainable-video-topics-for-content-taxonomy.pdf
Final Published Version
License: Creative Commons Attribution 4.0


Abstract

As video consumption has grown in recent years, so has the need for contextual advertising methods that improve user engagement and ad relevance on advertisement-based video-on-demand platforms. Traditional behavior-based advertisement targeting is waning, particularly because recent strict privacy policies favor user consent and privacy. This study proposes an approach that integrates advanced natural language processing with multimodal analysis for video contextual advertising. To this end, transformer-based architectures (specifically BERTopic), computer vision techniques, and large language models are used to extract sets of topics from visual and textual video data automatically and systematically. The proposed framework efficiently decodes the content taxonomy of videos across different noise levels and languages. Empirical analysis on the YouTube-8M dataset shows the approach's potential to change the paradigm in video advertising. Built to be scalable and easily adaptable, the solution handles diverse and complex user-generated content well, making it suitable for a wide range of applications across media platforms.

ORCID iDs

de Silva, Waruna and Fernando, Anil ORCID: https://orcid.org/0000-0002-2158-2367