Detecting word substitutions in text

Fong, SeWong and Roussinov, Dmitri and Skillicorn, David (2008) Detecting word substitutions in text. IEEE Transactions on Knowledge and Data Engineering, 20 (8). pp. 1067-1076. ISSN 1045-9227

PDF - Draft Version

    Abstract

    Searching for words on a watchlist is one way in which large-scale surveillance of communication can be done, for example in intelligence and counterterrorism settings. One obvious defense is to replace words that might attract attention to a message with other, more innocuous, words. For example, the sentence "the attack will be tomorrow" might be altered to "the complex will be tomorrow", since 'complex' is a word whose frequency is close to that of 'attack'. Such substitutions are readily detectable by humans since they do not make sense. We address the problem of detecting such substitutions automatically, by looking for discrepancies between words and their contexts, and using only syntactic information. We define a set of measures, each of which is quite weak, but which together produce per-sentence detection rates around 90% with false positive rates around 10%. Rules for combining per-sentence detection into per-message detection can reduce the false positive and false negative rates for messages to practical levels. We test the approach using sentences from the Enron email and Brown corpora, representing informal and formal text respectively.
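
    The abstract's claim that per-message rules can improve on per-sentence rates can be illustrated with simple arithmetic. The sketch below is not the authors' combination rule: it assumes a hypothetical "flag the message if at least k of its n sentences are flagged" rule, assumes sentences are classified independently, and uses the approximate 90% and 10% per-sentence figures quoted above. The names `at_least_k`, `n`, and `k` are illustrative only.

```python
from math import comb


def at_least_k(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent trials succeed,
    where each trial succeeds with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Approximate per-sentence rates quoted in the abstract.
SENTENCE_DETECTION = 0.90
SENTENCE_FALSE_POSITIVE = 0.10

# Hypothetical rule (an assumption, not taken from the paper):
# flag a 5-sentence message when 3 or more of its sentences are flagged.
n, k = 5, 3

# Message in which every sentence contains a substitution.
message_detection = at_least_k(n, k, SENTENCE_DETECTION)

# Message that contains no substitutions at all.
message_false_positive = at_least_k(n, k, SENTENCE_FALSE_POSITIVE)

print(f"per-message detection rate:      {message_detection:.5f}")
print(f"per-message false positive rate: {message_false_positive:.5f}")
```

    Under these assumptions the per-message detection rate rises to about 99.1% and the false positive rate falls below 1%. Real messages mix altered and unaltered sentences and sentence-level errors are unlikely to be fully independent, so these figures are only an indication of why combining sentence-level decisions helps.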

    Item type: Article
    ID code: 34651
    Keywords: pointwise mutual information, textual analysis, counterterrorism, word frequencies, data mining, co-occurrence, Library Science. Information Science, Computational Theory and Mathematics, Information Systems, Computer Science Applications
    Subjects: Bibliography. Library Science. Information Resources > Library Science. Information Science
    Department: Faculty of Science > Computer and Information Sciences
    Depositing user: Pure Administrator
    Date Deposited: 20 Oct 2011 11:34
    Last modified: 05 Sep 2014 13:37
    URI: http://strathprints.strath.ac.uk/id/eprint/34651
