Generative AI-based vector quantized end-to-end semantic communication system for wireless image transmission
Lokumarambage, Maheshi and Sivalingam, Thushan and Dong, Feng and Rajatheva, Nandana and Fernando, Anil (2025) Generative AI-based vector quantized end-to-end semantic communication system for wireless image transmission. IEEE Transactions on Machine Learning in Communications and Networking, 3. pp. 1050-1074. ISSN 2831-316X (https://doi.org/10.1109/TMLCN.2025.3607891)
Full text: Final Published Version (PDF, 5MB)
Abstract
Semantic communication (SemCom) systems enhance transmission efficiency by conveying semantic information in lieu of raw data. However, designing these systems is challenging: the semantic source coding must represent information robustly beyond the training dataset, performance must remain channel-agnostic, and the system must be robust to both channel noise and semantic noise. We propose a novel generative artificial intelligence (AI) based SemCom architecture conditioned on a quantized latent. The system maps each quantized vector to a vector in the learned codebook and transmits only the index of that codebook vector, which reduces the communication overhead on the wireless channel. The learned codebook serves as the shared knowledge base. The encoder is designed with a novel spatial attention mechanism based on image energy, focusing on object edges. The critic assesses the realism of generated data relative to the original distribution using the Wasserstein distance. The model introduces novel contrastive objectives at multiple levels, including pixel, latent, perceptual, and task output, tailored for noisy wireless semantic communication. We validated the proposed model for transmission quality and robustness with low-density parity-check (LDPC) coding; it outperforms the better portable graphics (BPG) baselines, particularly at low signal-to-noise ratio (SNR) levels (< 5 dB). Additionally, it shows comparable results to joint source-channel coding (JSCC) with lower complexity and latency. The model is validated for both human perception and machine perception-oriented task utility. The model effectively transmits high-resolution images without requiring additional error correction at the receiver. We propose a novel semantic-based matrix to evaluate the robustness to noise and task-specific semantic distortion.
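The index-based transmission described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy codebook, the latent vectors, the squared Euclidean nearest-neighbour rule, and all function names (nearest_index, encode, decode, bits_per_index) are assumptions made for illustration. In the paper the codebook is learned, and the encoder and generative decoder are neural networks, which are omitted here.

```python
def nearest_index(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def encode(latents, codebook):
    """Transmitter side: replace each latent vector by the index of its nearest codeword."""
    return [nearest_index(z, codebook) for z in latents]


def decode(indices, codebook):
    """Receiver side: look the received indices up in the shared codebook."""
    return [codebook[k] for k in indices]


def bits_per_index(codebook_size):
    """Bits needed to send one index with a fixed-length code (at least 1)."""
    return max(1, (codebook_size - 1).bit_length())


# Hypothetical toy codebook (4 codewords of dimension 2), shared by both ends.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs before quantization.
latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

indices = encode(latents, CODEBOOK)         # only these integers cross the channel
reconstructed = decode(indices, CODEBOOK)   # quantized latents recovered at the receiver

print(indices, reconstructed, bits_per_index(len(CODEBOOK)))
```

With this toy setup, each two-dimensional latent is sent as a 2-bit index instead of two floating-point values. The saving comes from both ends holding the same codebook, which is the role the abstract assigns to the shared knowledge base.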
ORCID iDs
Lokumarambage, Maheshi, Sivalingam, Thushan, Dong, Feng, Rajatheva, Nandana and Fernando, Anil
ORCID: https://orcid.org/0000-0002-2158-2367;
Item type: Article
ID code: 94180
Dates:
17 September 2025: Published
9 September 2025: Published Online
31 August 2025: Accepted
10 February 2025: Submitted
Subjects: Science > Mathematics > Electronic computers. Computer science > Other topics, A-Z
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 16 Sep 2025 08:12
Last modified: 17 Nov 2025 22:19
URI: https://strathprints.strath.ac.uk/id/eprint/94180