Cloud-based textual analysis as a basis for document classification

Weir, George and Owoeye, Kolade and Oberacker, Alice and Alshahrani, Haya; Zine-Dine, Khalid and Smari, Waleed W., eds. (2018) Cloud-based textual analysis as a basis for document classification. In: 2018 International Conference on High Performance Computing & Simulation (HPCS). IEEE, FRA, pp. 629-633. ISBN 9781538678787 (https://doi.org/10.1109/HPCS.2018.00110)

Text. Filename: Weir_etal_ICHPCS_2018_Cloud_based_textual_analysis_as_a_basis_for_document_classification.pdf
Accepted Author Manuscript

Abstract

Growing trends in data mining and developments in machine learning have encouraged interest in analytical techniques that can contribute insights on data characteristics. The present paper describes an approach to textual analysis that generates extensive quantitative data on target documents, with output including frequency data on tokens, types, parts-of-speech and word n-grams. These analytical results enrich the available source data and have proven useful in several contexts as a basis for automating manual classification tasks. In the following, we introduce the Posit textual analysis toolset and detail its use in data enrichment as input to supervised learning tasks, including automating the identification of extremist Web content. Next, we describe the extension of this approach to the Arabic language. Thereafter, we recount the move of these analytical facilities from local operation to a Cloud-based service. This transition affords easy remote access for other researchers seeking to explore the application of such data enrichment to their own text-based data sets.

ORCID iDs

Weir, George, Owoeye, Kolade, Oberacker, Alice and Alshahrani, Haya; Zine-Dine, Khalid and Smari, Waleed W.

Item metadata

Item type:	Book Section
ID code:	66459
Dates:	1 November 2018 (Published); 1 November 2018 (Published Online); 1 May 2018 (Accepted)
Subjects:	Science > Mathematics > Electronic computers. Computer science
Department:	Faculty of Science > Computer and Information Sciences
Depositing user:	Pure Administrator
Date deposited:	21 Dec 2018 12:26
Last modified:	02 Feb 2025 20:42
Related URLs:	Scopus publication Publisher
URI:	https://strathprints.strath.ac.uk/id/eprint/66459