Picture of athlete cycling

Open Access research with a real impact on health...

The Strathprints institutional repository is a digital archive of University of Strathclyde's Open Access research outputs. Strathprints provides access to thousands of Open Access research papers by Strathclyde researchers, including by researchers from the Physical Activity for Health Group based within the School of Psychological Sciences & Health. Research here seeks to better understand how and why physical activity improves health, gain a better understanding of the amount, intensity, and type of physical activity needed for health benefits, and evaluate the effect of interventions to promote physical activity.

Explore open research content by Physical Activity for Health...

Comparing text-based and dependence-based approaches for determining the origins of bugs

Davies, Steven and Roper, Marc and Wood, Murray (2014) Comparing text-based and dependence-based approaches for determining the origins of bugs. Journal of Software: Evolution and Process, 26 (1). pp. 107-139.

Text (Davies-etal-JOSEAP2015-determining-the-origins-of-bugs)
Davies_etal_JOSEAP2015_determining_the_origins_of_bugs.pdf - Accepted Author Manuscript

Download (875kB) | Preview


Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness.