Latent prediction-based generative semantic communication for video transmission in wireless networks

Lokumarambage, Maheshi and Sivalingam, Thushan and Dong, Feng and Rajatheva, Nandana and Fernando, Anil (2026) Latent prediction-based generative semantic communication for video transmission in wireless networks. IEEE Open Journal of the Communications Society, 7. pp. 3974-3986. ISSN 2644-125X (https://doi.org/10.1109/OJCOMS.2026.3684230)

Final Published Version
License: Creative Commons Attribution 4.0

Download (1MB)| Preview

Abstract

The increasing dominance of video traffic in intelligent sensing and control applications is pushing modern wireless networks toward their capacity limits. Classical information theory sets fundamental bounds on channel capacity, so further gains require rethinking what information is transmitted. Semantic communication (SemCom) addresses this by sending only the semantics of the intended message. This paper presents a SemCom framework that leverages latent-space procedural video prediction with world-model-guided temporal dynamics. Instead of transmitting pixel data, the transmitter encodes high-level semantic representations of context frames and sends them through the physical channel. At the receiver, a temporal transformer predicts future latent states. The framework jointly optimizes perceptual, adversarial, and temporal objectives to preserve both visual quality and trajectory consistency under channel impairments. Experiments on video sequences from the BAIR robot pushing dataset show that the proposed method achieves lower normalized endpoint and velocity errors than a learned baseline that combines real-time intermediate flow estimation (RIFE) with better portable graphics (BPG), while requiring a lower bit rate than traditional codecs. The results indicate that treating temporal dynamics as semantics in the communication process enables efficient, anticipatory video transmission suited to applications such as telerobotic, underwater, and autonomous systems.
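The transmit-latents-then-predict idea in the abstract (encode context frames into latents, send the latents over a noisy channel, predict the next latent at the receiver, score the prediction by endpoint error) can be illustrated with a toy sketch. This is not the authors' implementation. The average-pooling "encoder", the AWGN channel model, the constant-velocity extrapolator standing in for the temporal transformer, and the function names encode, awgn, predict_next and endpoint_error are all hypothetical; the endpoint error here is a plain, unnormalized Euclidean distance.

```python
import math
import random
from typing import List

Vector = List[float]


def encode(frame: Vector, latent_dim: int) -> Vector:
    """Hypothetical semantic encoder: average-pool a flattened frame into latent_dim values."""
    if not 0 < latent_dim <= len(frame):
        raise ValueError("latent_dim must be between 1 and len(frame)")
    step = len(frame) // latent_dim
    return [sum(frame[i * step:(i + 1) * step]) / step for i in range(latent_dim)]


def awgn(latent: Vector, snr_db: float, rng: random.Random) -> Vector:
    """Assumed channel model: additive white Gaussian noise at the given per-symbol SNR (dB)."""
    power = sum(x * x for x in latent) / len(latent)
    if power == 0.0:
        return list(latent)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in latent]


def predict_next(context: List[Vector]) -> Vector:
    """Stand-in for the temporal transformer: constant-velocity extrapolation
    from the last two received latents (z_next = 2 * z_t - z_{t-1})."""
    prev, last = context[-2], context[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def endpoint_error(pred: Vector, target: Vector) -> float:
    """Euclidean distance between predicted and true latent (unnormalized)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "video": each frame is the previous one shifted by +1, i.e. linear motion.
    frames = [[float(t + i) for i in range(8)] for t in range(4)]
    sent = [encode(f, latent_dim=4) for f in frames[:3]]        # transmitter
    received = [awgn(z, snr_db=20.0, rng=rng) for z in sent]    # channel
    pred = predict_next(received)                               # receiver
    print(endpoint_error(pred, encode(frames[3], latent_dim=4)))
```

Only a few latent values cross the channel instead of every pixel, which is the bandwidth argument the abstract makes; a learned predictor replaces the linear extrapolation when motion is not constant.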

ORCID iDs

Lokumarambage, Maheshi, Sivalingam, Thushan, Dong, Feng, Rajatheva, Nandana and Fernando, Anil (ORCID: https://orcid.org/0000-0002-2158-2367)