ActiveEye: enabling continuous and responsive video understanding for smart eyewear systems
Xu, Zhenyu and Lu, Tianlin and Zhao, Yingying and Wang, Yujiang and Dong, Mingzhi and Chang, Yuhu and Lv, Qin and Dick, Robert P. and Yang, Fan and Lu, Tun and Gu, Ning and Shang, Li (2025) ActiveEye: enabling continuous and responsive video understanding for smart eyewear systems. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9 (4), 228. ISSN 2474-9567 (https://doi.org/10.1145/3770641)
Abstract
Integrating vision-language models (VLMs) with wearable devices offers great potential for continuous and responsive video understanding, a key capability for applications such as smart eyewear-based conversational assistants. However, achieving this on resource-constrained devices is challenging due to the high energy demands of continuous spatial-temporal sampling and transmission. We propose ActiveEye, a VLM designed for energy-efficient and responsive video understanding. ActiveEye separates visual and motion semantic representations and incorporates an active perception-based feedback path to adaptively adjust spatial-temporal sampling and transmission rates. Implemented as a wearable-mobile-cloud system, ActiveEye is evaluated for energy efficiency, real-time semantic change detection, and video understanding in both laboratory and field studies. Using the EgoSchema dataset, ActiveEye reduces the front-end energy consumption by 49.14%, supporting 8.37 hours of continuous operation on a 2.1 Wh battery. It achieves the highest F1 score (0.80) and the lowest average time difference (1.30 s) compared with heuristic-based event detection algorithms, validating its timely semantic detection. Furthermore, ActiveEye achieves a visual question answering (VQA) accuracy of 61.6%, which is comparable to state-of-the-art VLM agents, despite their reliance on larger language decoders and more computationally intensive frame selection strategies. Two rounds of in-field user evaluations further confirm its effectiveness in real-world settings, demonstrating its practical viability as a continuous and responsive video understanding system, conversational assistant, and wearable companion.
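The abstract's central mechanism is an active perception-based feedback path that raises spatial-temporal sampling when semantic change is detected and lowers it otherwise. The sketch below illustrates that control idea only; the cosine-distance change score, the SamplingController class, and all thresholds and decay rules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an active perception feedback loop for adaptive
# spatial-temporal sampling. All names and constants are hypothetical.

import numpy as np


def semantic_change(prev_feat: np.ndarray, curr_feat: np.ndarray) -> float:
    """Cosine distance between consecutive frame embeddings, used here as
    an assumed proxy for the semantic change signal the feedback path
    would consume."""
    denom = np.linalg.norm(prev_feat) * np.linalg.norm(curr_feat) + 1e-8
    return 1.0 - float(prev_feat @ curr_feat) / denom


class SamplingController:
    """Maps the change score to a frame rate (temporal sampling) and a
    resolution scale (spatial sampling), bounded to stay within an
    energy budget."""

    def __init__(self, min_fps: float = 1.0, max_fps: float = 15.0,
                 threshold: float = 0.2):
        self.min_fps, self.max_fps = min_fps, max_fps
        self.threshold = threshold
        self.fps = min_fps  # start in the low-power regime

    def update(self, change: float) -> tuple[float, float]:
        if change > self.threshold:
            self.fps = self.max_fps  # burst sampling on semantic change
        else:
            # no change detected: decay toward the idle frame rate
            self.fps = max(self.min_fps, self.fps * 0.5)
        scale = 1.0 if change > self.threshold else 0.5  # spatial downscale
        return self.fps, scale


# Usage example with random stand-in embeddings:
rng = np.random.default_rng(0)
f0, f1 = rng.normal(size=512), rng.normal(size=512)
ctrl = SamplingController()
fps, scale = ctrl.update(semantic_change(f0, f1))
print(f"frame rate: {fps} fps, resolution scale: {scale}")
```

A threshold-plus-decay controller like this keeps the front end in its cheapest sampling mode by default and only spends energy when the scene appears to change, which is the qualitative behavior the abstract attributes to ActiveEye's feedback path.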
ORCID iDs
Xu, Zhenyu; Lu, Tianlin; Zhao, Yingying (ORCID: https://orcid.org/0000-0001-5902-1306); Wang, Yujiang; Dong, Mingzhi; Chang, Yuhu; Lv, Qin; Dick, Robert P.; Yang, Fan; Lu, Tun; Gu, Ning; Shang, Li
Item type: Article
ID code: 94959
Dates: Accepted: 19 September 2025; Published: 2 December 2025
Subjects: Science > Mathematics > Electronic computers. Computer science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 10 Dec 2025 10:42
Last modified: 16 Jan 2026 01:51
URI: https://strathprints.strath.ac.uk/id/eprint/94959