Machine learning techniques for automated software fault detection via dynamic execution data: empirical evaluation study

Almaghairbe, Rafig and Roper, Marc and Almabruk, Tahani; (2020) Machine learning techniques for automated software fault detection via dynamic execution data : empirical evaluation study. In: Proceedings of the 6th International Conference on Engineering and MIS 2020, ICEMIS 2020. ACM, New York, NY., pp. 1-12. ISBN 9781450377362

Accepted Author Manuscript


    The biggest obstacle to automated software testing is the construction of test oracles. Today, it is possible to generate an enormous number of test cases for an arbitrary system and reach a remarkably high level of coverage, but the effectiveness of these test cases is limited by the availability of test oracles that can distinguish failing executions. Previous work by the authors has explored the use of unsupervised and semi-supervised learning techniques to develop test oracles, so that the correctness of software outputs and behaviours on new test cases can be predicted [1], [2], [10], and experimental results demonstrate the promise of this approach. In this paper, we present an evaluation study of test oracles based on machine-learning approaches applied to dynamic execution data (firstly, input/output pairs and secondly, amalgamations of input/output pairs and execution traces) by comparing their effectiveness with an existing technique from the specification mining domain (the data invariant detector Daikon [5]). The two approaches are evaluated on a range of mid-sized systems and compared in terms of their fault detection ability and false positive rate. The empirical study also discusses the major limitations and the most important properties related to the application of machine learning techniques as test oracles in practice, and gives a road map for future research directions to tackle some of the discussed limitations, such as accuracy and scalability. The results show that in most cases semi-supervised learning techniques performed far better as automated test classifiers than Daikon (especially when input/output pairs were augmented with their execution traces). However, there is one system on which our strategy struggles and Daikon performed far better.
Furthermore, unsupervised learning techniques performed on a par with Daikon in several cases, particularly when input/output pairs were used together with execution traces.
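
The semi-supervised idea described above — learn only from executions believed to be passing, then flag executions that deviate from that learned behaviour as suspected failures — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature encoding of input/output pairs, the distance measure, and the threshold choice are all assumptions made for the example.

```python
from statistics import mean

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(dim) for dim in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train_oracle(passing_vectors, k=2.0):
    """Fit a one-class oracle on executions labelled as passing.

    Returns a classifier that flags an execution as a suspected failure
    when it lies more than k times the average passing distance from
    the passing centroid (the threshold rule is an assumption).
    """
    c = centroid(passing_vectors)
    dists = [distance(v, c) for v in passing_vectors]
    spread = mean(dists) or 1e-9  # avoid a zero threshold
    threshold = k * spread
    def classify(vector):
        # True => suspected failing execution
        return distance(vector, c) > threshold
    return classify

# Feature vectors here stand in for encoded input/output pairs, optionally
# extended with execution-trace event counts (hypothetical data).
passing = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1], [1.0, 1.0]]
oracle = train_oracle(passing)
print(oracle([1.05, 0.95]))  # close to passing behaviour -> False
print(oracle([9.0, -4.0]))   # far from passing behaviour  -> True
```

In practice, as the paper's results suggest, the quality of such a classifier depends heavily on what goes into the feature vectors: augmenting input/output pairs with execution-trace information gives the oracle more behavioural signal to separate failing from passing runs.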