A new semantic attribute deep learning with a linguistic attribute hierarchy for spam detection

He, Hongmei and Watson, Tim and Maple, Carsten and Mehnen, Jörn and Tiwari, Ashutosh; (2017) A new semantic attribute deep learning with a linguistic attribute hierarchy for spam detection. In: 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, Piscataway, NJ. ISBN 9781509061822 (https://doi.org/10.1109/IJCNN.2017.7966343)

Text. Filename: He_etal_IJCNN2017_A_new_semantic_attribute_deep_learning_with_a_linguistic_attribute.pdf
Accepted Author Manuscript
License: Creative Commons Attribution-NonCommercial 3.0

Abstract

The massive increase in spam poses a serious threat to email and SMS, which have become important means of communication. Spam not only annoys users but is also a security threat. Machine learning techniques have been widely used for spam detection. In this paper, we propose another form of deep learning, a linguistic attribute hierarchy embedded with linguistic decision trees, for spam detection, and examine the effect of the semantic attributes represented by the linguistic attribute hierarchy on spam detection. A case study on the SMS message database from the UCI machine learning repository shows that a linguistic attribute hierarchy embedded with linguistic decision trees provides a transparent approach to in-depth analysis of the impact of attributes on spam detection. This approach can not only efficiently tackle the ‘curse of dimensionality’ in spam detection with a large number of attributes, but also improve spam detection performance when the semantic attributes are organised into a proper hierarchy.