Classifying suspicious content using frequency analysis
Gellineau, Obika and Weir, George R. S.; Weir, George and Ishikawa, S. and Poonpol, K., eds. (2011) Classifying suspicious content using frequency analysis. In: Corpora and Language Technologies in Teaching, Learning and Research. University of Strathclyde Publishing. ISBN 9780947649821
Text: Gellineau_Weir_2011_Classifying_suspicious_content_using_frequency.pdf (Accepted Author Manuscript, 92kB)
Abstract
This paper details an experiment exploring the use of chi by degrees of freedom (CBDF) and Log-Likelihood statistical similarity measures, applied to single-word and bigram frequencies, as a means of discriminating subject content in order to classify samples of chat texts as dangerous, suspicious or innocent. The control for these comparisons was a set of manually ranked sample texts that were rated in terms of eleven subject categories (five considered dangerous and six considered harmless). Results from this manual rating of chat text samples were then compared with the ranked lists generated using the CBDF and Log-Likelihood measures, for both word and bigram frequencies. This was achieved by combining currently available textual analysis tools with a newly implemented software application. Our results show that the CBDF method using word frequencies gave the discrimination closest to the human-rated samples.
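As a rough illustration of the two measures named in the abstract, the sketch below computes a per-item Log-Likelihood score and a CBDF score between two frequency lists. It assumes the formulations commonly used in corpus linguistics (the G2 log-likelihood over a two-corpus contingency table, and chi-squared over the most frequent items of the joint corpus divided by the degrees of freedom); the record does not confirm that the paper's software used exactly these forms. The function names `ngram_counts`, `log_likelihood` and `cbdf`, and the `top_n` cut-off, are hypothetical and chosen for this example only.

```python
import math
from collections import Counter


def ngram_counts(tokens, n=1):
    """Count n-grams (n=1 for single words, n=2 for bigrams) as tuples."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def log_likelihood(a, b, total_a, total_b):
    """G2 score for one item with observed counts a and b in two texts.

    Assumes both totals are positive. Expected counts come from the
    pooled relative frequency of the item.
    """
    pooled = (a + b) / (total_a + total_b)
    e1, e2 = total_a * pooled, total_b * pooled
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def cbdf(sample, reference, top_n=500):
    """Chi-squared over the top_n joint items, divided by degrees of freedom.

    Assumes both Counters are non-empty. Lower values suggest more
    similar frequency profiles.
    """
    n1, n2 = sum(sample.values()), sum(reference.values())
    joint = sample + reference
    items = [item for item, _ in joint.most_common(top_n)]
    chi2 = 0.0
    for item in items:
        o1, o2 = sample[item], reference[item]
        e1 = n1 * (o1 + o2) / (n1 + n2)
        e2 = n2 * (o1 + o2) / (n1 + n2)
        chi2 += (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return chi2 / max(len(items) - 1, 1)
```

Under these assumptions, a chat sample could be scored against one reference list per subject category and the categories ranked by score; passing `n=2` to `ngram_counts` gives the bigram variant of the same comparison.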
ORCID iDs
Gellineau, Obika and Weir, George R. S. ORCID: https://orcid.org/0000-0002-6264-4480; Weir, George, Ishikawa, S. and Poonpol, K., eds.
Item type: Book Section
ID code: 54902
Dates: 2011 (Published)
Subjects: Science > Mathematics > Computer software
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 11 Dec 2015 04:12
Last modified: 11 Nov 2024 14:59
URI: https://strathprints.strath.ac.uk/id/eprint/54902