Using Developer Information as a Factor for Fault Prediction
May 20, 2007
Elaine Weyuker, Tom Ostrand, Bob Bell
AT&T Labs – Research
Goal: To determine which files of a software system with multiple releases are particularly likely to contain large numbers of faults.
Why is this important? Because identifying these files should allow us to build highly dependable software systems more economically, by letting us better allocate testing effort and resources, including personnel. In short, it lets us prioritize testing.
Infrastructure
Projects use an integrated change management/version control system. Any change to the software requires that a modification request (MR) be opened. Each MR records information such as the reason for the change, a description of the change, a severity rating, the actual change, and the development stage during which the MR was initiated.
Explanatory Variables
  • Size of file: log(KLOC).
  • Age of file: 0, 1, 2-4, or >4.
  • Whether the file is new to the current release and, if not, whether it was changed during the prior release.
  • Sqrt(number of changes in the previous release).
  • Sqrt(number of changes two releases ago).
  • Sqrt(number of faults in the previous release).
  • Programming language used.
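The variables listed above can be sketched as a small feature-construction step. This is only an illustration under stated assumptions: the `FileHistory` record and its field names are hypothetical, the logarithm is taken to be the natural log (the slides do not give a base), and file age is assumed to be counted in releases.

```python
import math
from dataclasses import dataclass


@dataclass
class FileHistory:
    """Hypothetical per-file record; all field names are assumptions."""
    loc: int              # lines of code in the current release
    age: int              # assumed: number of prior releases containing the file
    changes_prev: int     # changes made in the previous release
    changes_two_ago: int  # changes made two releases ago
    faults_prev: int      # faults found in the previous release
    language: str         # programming language of the file


def age_bucket(age: int) -> str:
    """Map a file age onto the categories 0, 1, 2-4, >4."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 1:
        return str(age)
    return "2-4" if age <= 4 else ">4"


def explanatory_variables(f: FileHistory) -> dict:
    """Build the explanatory variables for one file."""
    if f.loc <= 0:
        raise ValueError("loc must be positive to take log(KLOC)")
    if f.age == 0:
        status = "new"
    else:
        status = "changed" if f.changes_prev > 0 else "unchanged"
    return {
        "log_kloc": math.log(f.loc / 1000.0),  # natural log is an assumption
        "age_bucket": age_bucket(f.age),
        "status": status,
        "sqrt_changes_prev": math.sqrt(f.changes_prev),
        "sqrt_changes_two_ago": math.sqrt(f.changes_two_ago),
        "sqrt_faults_prev": math.sqrt(f.faults_prev),
        "language": f.language,
    }
```

The square-root and log transforms compress large counts and sizes, so a file with 100 changes is not treated as 100 times more fault-prone than a file with one change.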
Systems Studied

System Type           Period Covered   Faults in Identified 20% of Files
Inventory             4 years          83%
Provisioning          2 years          83%
Voice Response        2.25 years       75%
Maintenance Support   9 years          84%
Maintenance Support System
  • Developed and maintained by a different company.
  • Very mature system: 9 years of field data.
  • The 20% of the files identified by our model contained 84% of the faults.
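The "20% of the files contained 84% of the faults" figure can be reproduced, in outline, by ranking files by predicted fault count and measuring the share of actual faults that falls in the top fifth. The sketch below is an illustration only: the function name, the dictionary inputs, and the rounding used to pick the number of files are assumptions, not the authors' procedure.

```python
def percent_faults_in_top_files(predicted, actual, fraction=0.20):
    """Percentage of actual faults found in the top `fraction` of files.

    predicted: dict mapping file name -> predicted fault count
    actual:    dict mapping file name -> faults actually found
    """
    if not predicted:
        raise ValueError("no files to rank")
    # Number of files to select; rounding to the nearest file is an assumption.
    n_selected = max(1, round(fraction * len(predicted)))
    # Rank files from most to least fault-prone according to the prediction.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    selected = ranked[:n_selected]
    total_faults = sum(actual.values())
    if total_faults == 0:
        return 0.0
    captured = sum(actual.get(name, 0) for name in selected)
    return 100.0 * captured / total_faults
```

With ten files, two are selected; if those two hold 8 of the 10 actual faults, the function returns 80.0.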
Adding Developer Information to Improve Predictions for Changed Files
  • The number of developers who modified the file during the prior release.
  • The number of new developers who modified the file during the prior release.
  • The cumulative number of distinct developers who modified the file during all releases through the prior release.
NB: We don’t know who created the file.
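The three developer counts above could be derived from change records along the following lines. This is a minimal sketch: the `Change` record and its fields are hypothetical, and a "new" developer is assumed to mean one who had not modified this particular file in any earlier release. Because the file's creator is not known, only recorded modifications are counted.

```python
from typing import Iterable, NamedTuple


class Change(NamedTuple):
    """Hypothetical change record extracted from the MR data."""
    file: str
    release: int
    developer: str


def developer_metrics(changes: Iterable[Change], file: str, current_release: int) -> dict:
    """Compute the three developer counts for `file` as of `current_release`."""
    prior = current_release - 1
    in_prior = set()      # developers who modified the file in the prior release
    before_prior = set()  # developers who modified it in any earlier release
    for c in changes:
        if c.file != file:
            continue
        if c.release == prior:
            in_prior.add(c.developer)
        elif c.release < prior:
            before_prior.add(c.developer)
        # Changes in the current release or later are ignored: they are
        # not available when the prediction is made.
    return {
        "developers_prev": len(in_prior),
        # Assumption: "new" means no earlier recorded change to this file.
        "new_developers_prev": len(in_prior - before_prior),
        "cumulative_developers": len(in_prior | before_prior),
    }
```

Only changes made before the release being predicted are used, so the counts are available at prediction time.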
Cumulative Number of Developers After 20 Releases (526 Files, Mean 3.54)
Mean Cumulative Number of Developers by File Age (Age 20 = 3.54)
Proportion of Changed Files with Multiple Developers by File Age
Proportion of Changed Files with at Least 1 New Developer by File Age
Percentage of Faults in Identified 20% of Files

Release Number    W/O Developers   With Developers
6-10              78               79
11-15             71               73
16-20             84               86
21-25             89               88
26-30             90               91
31-35             92               92
Mean, Rel 6-35    83.9             84.9
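Conclusions
  • Using developer information helps, but only slightly: over releases 6-35, the mean percentage of faults in the identified 20% of files rises from 83.9 to 84.9.
  • Factors like size and whether or not the file is new or changed are much more important.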
