Picture of person typing on laptop with programming code visible on the laptop screen

World class computing and information science research at Strathclyde...

The Strathprints institutional repository is a digital archive of University of Strathclyde's Open Access research outputs. Strathprints provides access to thousands of Open Access research papers by University of Strathclyde researchers, including by researchers from the Department of Computer & Information Sciences involved in mathematically structured programming, similarity and metric search, computer security, software systems, combinatronics and digital health.

The Department also includes the iSchool Research Group, which performs leading research into socio-technical phenomena and topics such as information retrieval and information seeking behaviour.


Comparing text-based and dependence-based approaches for determining the origins of bugs

Davies, Steven and Roper, Marc and Wood, Murray (2014) Comparing text-based and dependence-based approaches for determining the origins of bugs. Journal of Software: Evolution and Process, 26 (1). pp. 107-139.

Text (Davies-etal-JOSEAP2015-determining-the-origins-of-bugs)
Davies_etal_JOSEAP2015_determining_the_origins_of_bugs.pdf - Accepted Author Manuscript

Download (875kB) | Preview


Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness.