Context-aware semantic video coding using pretrained motion-adaptive models and bidirectional prediction

Samarathunga, Prabhath and Ganearachchi, Yasith and Fernando, Thanuj and Jayasingam, Adhuran and Sivalingam, Thushan and Rajatheva, Nandana and Fernando, Anil (2026) Context-aware semantic video coding using pretrained motion-adaptive models and bidirectional prediction. IEEE Access, 14. 69730 - 69755. ISSN 2169-3536 (https://doi.org/10.1109/access.2026.3686329)

Text. Filename: Samarathunga-etal-2026-Context-aware-semantic-video-coding-using-pretrained-motion-adaptive-models.pdf
Final Published Version
License: Creative Commons Attribution 4.0

Abstract

The rapid growth of video traffic and high-resolution real-time services exposes fundamental limitations in both conventional hybrid codecs and fully learned video compression methods, particularly in scalability, computational efficiency, and temporal modeling capability. Existing semantic video coding approaches partially address these challenges by combining learned representations with standardized residual coding. However, their reliance on per-group-of-pictures online training and simplistic temporal prediction limits practical deployment and generalization. This paper proposes a context-conditioned semantic video coding framework that fundamentally redefines hybrid semantic coding by eliminating per-group-of-pictures training through a shared pretrained foundation model with motion-adaptive specialization. A hierarchical bidirectional prediction architecture is introduced to explicitly model long-range temporal dependencies, while motion-aligned contextual references generated via bidirectional frame interpolation provide semantic guidance for reconstruction. This context-conditioned decoding mechanism reduces ambiguity in latent interpretation and significantly improves temporal coherence and prediction accuracy. To further enhance coding efficiency, a differential semantic latent representation is proposed and entropy-coded using Deep Context-Adaptive Binary Arithmetic Coding, while a Versatile Video Coding-based residual pathway guarantees reconstruction fidelity. The overall system is formulated within a unified context-conditioned rate–distortion optimization framework, bridging semantic and conventional coding paradigms. Comprehensive evaluations across diverse benchmark datasets demonstrate consistent and significant rate–distortion gains, achieving an average Bjøntegaard delta rate reduction of 22.03% over the Versatile Video Coding Test Model and competitive performance relative to state-of-the-art learned codecs. 
These results establish context-conditioned semantic coding with pretrained models and hierarchical bidirectional prediction as a scalable and deployable direction for next-generation video compression.

ORCID iDs

Samarathunga, Prabhath; Ganearachchi, Yasith (ORCID: https://orcid.org/0000-0002-8337-3739); Fernando, Thanuj; Jayasingam, Adhuran; Sivalingam, Thushan; Rajatheva, Nandana; Fernando, Anil (ORCID: https://orcid.org/0000-0002-2158-2367)