Classifying Tor traffic encrypted payload using machine learning

Choorod, Pitpimon and Weir, George and Fernando, Anil (2024) Classifying Tor traffic encrypted payload using machine learning. IEEE Access, 12. 19418 - 19431. ISSN 2169-3536 (https://doi.org/10.1109/access.2024.3356073)

Text. Filename: Choorod-etal-IEEE-Access-2024-Classifying-Tor-traffic-encrypted-payload-using-machine-learning.pdf
Final Published Version
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0

Abstract

Tor, a network offering Internet anonymity, has both beneficial and potentially malicious applications, creating a need for efficient Tor traffic monitoring. Most current traffic classification methods rely on flow-based features, but these can be unreliable due to factors such as asymmetric routing, and computing features over multiple packets can introduce processing delays. Recognising that Tor applies multi-layered encryption, unlike nonTor encrypted payloads, our study explored distinct patterns in their encrypted data. We introduced a novel method using Deep Packet Inspection and machine learning to differentiate between Tor and nonTor traffic based solely on the encrypted payload. In the first strand of our research, we investigated hex character analysis of Tor and nonTor encrypted payloads through statistical testing across 8 groups of application types. This investigation revealed a significant differentiation rate of 94.53% between Tor and nonTor traffic. In the second strand, we aimed to distinguish Tor and nonTor traffic using machine learning based on encrypted payload features. This feature-based approach proved effective, attaining an average classification accuracy of 95.65% across the 8 groups of applications. This study thereby contributes to the efficient classification of Tor and nonTor traffic through features derived solely from a single encrypted payload packet, independent of its position in the traffic flow.
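
As a rough illustration of what payload-only features might look like, the sketch below computes the relative frequency of each of the 16 hex characters in a single packet payload. The function name `hex_char_frequencies` and the choice of relative frequency as the feature are assumptions made for illustration; the abstract does not specify the exact features, statistical test, or classifier used in the study.

```python
from collections import Counter

# The 16 characters that can appear in a lowercase hex encoding.
HEX_CHARS = "0123456789abcdef"


def hex_char_frequencies(payload: bytes) -> list[float]:
    """Return the relative frequency of each hex character in one payload.

    Hypothetical feature extractor: the output is a 16-element vector
    ordered as in HEX_CHARS. It uses only the bytes of a single packet,
    so it does not depend on the packet's position in a flow.
    """
    hex_text = payload.hex()  # bytes.hex() yields lowercase hex digits
    if not hex_text:
        # Empty payload: no characters to count.
        return [0.0] * len(HEX_CHARS)
    counts = Counter(hex_text)
    total = len(hex_text)
    return [counts.get(ch, 0) / total for ch in HEX_CHARS]


# Example: a two-byte payload whose hex encoding is "00ff".
features = hex_char_frequencies(b"\x00\xff")
```

A vector like this could be compared across Tor and nonTor samples with a statistical test, or passed as input to a classifier; which test or model suits the data is left open here.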