Machine learning for literature classification during systematic literature review – establishing the minimum threshold for labelling papers

Venugopal, Vivek and Ates, Aylin and McKiernan, Peter (2022) Machine learning for literature classification during systematic literature review – establishing the minimum threshold for labelling papers. In: 36th Annual Conference of the British Academy of Management, 2022-08-31 - 2022-09-02, Alliance Manchester Business School.

Text. Filename: Venugopal_etal_BAM_2022_Machine_learning_for_literature_classification_during_systematic_literature_review.pdf
Accepted Author Manuscript
License: Strathprints license 1.0


Abstract

Taking inspiration from the use of machine learning for literature classification in medicine, this paper explores the use of machine learning to aid the classification of documents during systematic literature reviews in business and management studies. The performance of two machine learning models, SVM and logistic regression, is compared. The dataset is a labelled dataset on weak signal literature. The data is iteratively split into training and testing sets with the aim of minimising the training set. The models were evaluated on sensitivity (recall), precision, specificity, accuracy, and F1 score to find the optimal training split. The optimal value was found to be between 40% and 50%, meaning that only 40% to 50% of the dataset needed to be labelled for the machine learning model to predict the labels for the rest of the dataset. Although machine learning will not eliminate the labour involved in systematic literature reviews, it will reduce both the labour and the time required.
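The five evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code: the names Metrics and evaluate, the encoding of relevant papers as 1 and irrelevant papers as 0, and the toy labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float  # recall: share of relevant papers that were found
    precision: float    # share of predicted-relevant papers that are relevant
    specificity: float  # share of irrelevant papers correctly rejected
    accuracy: float     # share of all papers classified correctly
    f1: float           # harmonic mean of precision and sensitivity


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is absent from the test set.
    return numerator / denominator if denominator else 0.0


def evaluate(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Compute the five measures for binary labels (1 = relevant, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    accuracy = _ratio(tp + tn, len(pairs))
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return Metrics(sensitivity, precision, specificity, accuracy, f1)


# Toy example (made-up labels, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
result = evaluate(y_true, y_pred)
print(result)
```

In a screening setting, sensitivity is usually the measure to watch most closely, because a missed relevant paper is harder to recover than an irrelevant paper that is read and discarded. To reproduce the kind of sweep the abstract describes, evaluate could be called once per candidate training fraction on the predictions for the held-out portion.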