Context-aware semantic video coding using pretrained motion-adaptive models and bidirectional prediction
Samarathunga, Prabhath and Ganearachchi, Yasith and Fernando, Thanuj and Jayasingam, Adhuran and Sivalingam, Thushan and Rajatheva, Nandana and Fernando, Anil (2026) Context-aware semantic video coding using pretrained motion-adaptive models and bidirectional prediction. IEEE Access, 14. 69730 - 69755. ISSN 2169-3536 (https://doi.org/10.1109/access.2026.3686329)
Text: Samarathunga-etal-2026-Context-aware-semantic-video-coding-using-pretrained-motion-adaptive-models.pdf (Final Published Version, 3MB)
Abstract
The rapid growth of video traffic and high-resolution real-time services exposes fundamental limitations in both conventional hybrid codecs and fully learned video compression methods, particularly in scalability, computational efficiency, and temporal modeling capability. Existing semantic video coding approaches partially address these challenges by combining learned representations with standardized residual coding. However, their reliance on per-group-of-pictures online training and simplistic temporal prediction limits practical deployment and generalization. This paper proposes a context-conditioned semantic video coding framework that fundamentally redefines hybrid semantic coding by eliminating per-group-of-pictures training through a shared pretrained foundation model with motion-adaptive specialization. A hierarchical bidirectional prediction architecture is introduced to explicitly model long-range temporal dependencies, while motion-aligned contextual references generated via bidirectional frame interpolation provide semantic guidance for reconstruction. This context-conditioned decoding mechanism reduces ambiguity in latent interpretation and significantly improves temporal coherence and prediction accuracy. To further enhance coding efficiency, a differential semantic latent representation is proposed and entropy-coded using Deep Context-Adaptive Binary Arithmetic Coding, while a Versatile Video Coding-based residual pathway guarantees reconstruction fidelity. The overall system is formulated within a unified context-conditioned rate–distortion optimization framework, bridging semantic and conventional coding paradigms. Comprehensive evaluations across diverse benchmark datasets demonstrate consistent and significant rate–distortion gains, achieving an average Bjøntegaard delta rate reduction of 22.03% over the Versatile Video Coding Test Model and competitive performance relative to state-of-the-art learned codecs. 
These results establish context-conditioned semantic coding with pretrained models and hierarchical bidirectional prediction as a scalable and deployable direction for next-generation video compression.
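The abstract reports its gains as a Bjøntegaard delta rate (BD-rate), the average bitrate change at equal quality between two codecs. As a rough illustration of how such a figure is commonly obtained, the sketch below fits a cubic polynomial to log-rate as a function of PSNR for each codec and compares the mean log-rate over the shared PSNR range. This is not the authors' evaluation script: the function name `bd_rate` and the rate/PSNR points are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec relative to the anchor.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of PSNR for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Compare only over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Mean log-rate over [lo, hi] via the antiderivative of each fit.
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical rate (kbps) / PSNR (dB) points, for illustration only.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
# Test codec built to use 20% fewer bits at every quality level.
test_rate = [0.8 * r for r in anchor_rate]
test_psnr = list(anchor_psnr)

saving = bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr)
print(f"BD-rate: {saving:.2f}%")
```

Because the hypothetical test codec is constructed to spend 20% fewer bits at each PSNR point, the computed BD-rate comes out close to -20%. Real evaluations use measured rate and quality points per sequence and then average across sequences.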
ORCID iDs
Samarathunga, Prabhath; Ganearachchi, Yasith (ORCID: https://orcid.org/0000-0002-8337-3739); Fernando, Thanuj; Jayasingam, Adhuran; Sivalingam, Thushan; Rajatheva, Nandana; Fernando, Anil (ORCID: https://orcid.org/0000-0002-2158-2367)
Item type: Article
ID code: 96177
Dates:
  12 May 2026: Published
  22 April 2026: Published Online
  1 April 2026: Accepted
Subjects: Science > Mathematics > Electronic computers. Computer science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 05 May 2026 13:24
Last modified: 03 Jun 2026 16:01
URI: https://strathprints.strath.ac.uk/id/eprint/96177






