And now for something completely different ...
Code coverage. The pragmatic approach. Александр Ильин (Alexandre Iline), Java Quality Architect, alexandre.iline@oracle.com
What is it about? Should testing stop at 100% coverage? Should 100% be the goal? How else can code coverage information be used? What is it not about? Tools.
Preface
What is code coverage data for? To measure the extent to which source code is covered during testing. Consequently… code coverage (CC) is a measure of how much source code is covered during testing. And finally… testing is a set of activities aimed at proving that the system under test behaves as expected.
CC: how to get it • Create a template: a collection of all the code there is to cover • “Instrument” the source code, compiled code, or bytecode: insert instructions that drop data into a file, onto the network, etc. • Run the tests and collect the data (this may require changing the environment) • Generate a report: HTML, DB, etc.
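The four steps can be illustrated with a deliberately tiny, hand-made sketch. Real tools instrument bytecode automatically; here the “template” is just a fixed list of block names, the “instrumentation” is a hand-inserted call to a hypothetical `Probe.hit(id)`, and the “report” is plain text. All class and method names are illustrative assumptions.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical probe collector; a real tool would drop this data into a file or onto the network.
final class Probe {
    // The "template": every block there is to cover, known before the run.
    static final List<String> TEMPLATE = List.of("abs:entry", "abs:negative", "abs:exit");
    private static final BitSet HITS = new BitSet(TEMPLATE.size());

    static void hit(int blockId) { HITS.set(blockId); }

    static void reset() { HITS.clear(); }

    // Fraction of template blocks executed at least once.
    static double blockCoverage() {
        return (double) HITS.cardinality() / TEMPLATE.size();
    }

    // The "report": every template block with its status.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < TEMPLATE.size(); i++) {
            sb.append(HITS.get(i) ? "covered   " : "UNCOVERED ").append(TEMPLATE.get(i)).append('\n');
        }
        return sb.toString();
    }
}

// Code under test after "instrumentation": a probe call at the start of each block.
final class Subject {
    static int abs(int x) {
        Probe.hit(0);
        if (x < 0) {
            Probe.hit(1);
            x = -x;
        }
        Probe.hit(2);
        return x;
    }
}

final class CoverageDemo {
    public static void main(String[] args) {
        Subject.abs(5);   // "run testing": a single test with a positive input
        System.out.print(Probe.report());
        System.out.println("block coverage = " + Probe.blockCoverage());
    }
}
```

The template has to exist before the run: without it, code that is never executed would simply be absent from the report instead of showing up as uncovered.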
CC: kinds of coverage • Block / primitive block • Line • Condition/branch/predicate • Entry/exit • Method • Path/sequence
CC: how to use it for test base improvement • 1: Measure (previous slide) • Perform analysis: find what code you need to cover • Develop more tests: find what tests you need to develop • GOTO 1. This loop is performed repeatedly, so resource efficiency is really important. • Find dead code, also starting from a measurement (previous slide)
Mis-usages
CC: how not to use it (mis-usages) • “Must get to 100%.” Maybe not. • “100% means no more testing.” No, it does not. • “CC does not mean a thing.” It does mean a fair amount if it is used properly. • “There is that tool which would generate tests for us, and we're done.” Nope.
Mis-usages: Test generation
Test generation: “We present a new symbolic execution tool, ####, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs.” (#### tool documentation)
Test generation (cont.): if (b != 3) { double a = 1 / (b - 2); } else { … }
Test generation (cont.): if (b != 3) { double a = 1 / (b - 3); } else { … } Reminder: testing is a set of activities aimed at proving that the system under test behaves as expected.
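A hedged Java rendering of the first snippet makes the gap visible. With an `int` parameter, the inputs `b = 0` and `b = 3` execute both branches, so a tool aiming only at coverage can stop there, yet `b = 2` throws `ArithmeticException` (integer division by zero). Even a tool that does hit `b = 2` can only record the exception as observed behavior; whether it is a defect, and whether `b - 3` was meant, depends on the requirements. The class and method names are made up for illustration.

```java
// Illustrative version of the slide snippet; compute() and its return values are hypothetical.
final class GeneratedTestDemo {
    // Mirrors: if (b != 3) { double a = 1 / (b - 2); } else { ... }
    static double compute(int b) {
        if (b != 3) {
            double a = 1 / (b - 2);   // integer division: throws ArithmeticException when b == 2
            return a;
        } else {
            return 0.0;               // stands in for the "..." branch
        }
    }

    // True if compute(b) completes without throwing.
    static boolean runs(int b) {
        try {
            compute(b);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // These two inputs cover both branches (100% branch coverage), and both pass.
        System.out.println("b=0 runs: " + runs(0));
        System.out.println("b=3 runs: " + runs(3));
        // The input that matters is not required by the coverage goal.
        System.out.println("b=2 runs: " + runs(2));
    }
}
```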
Test generation: conclusion. Generated tests do not verify conformance of the code to the requirements (*). Hence, code coverage from generated tests should not be mixed with code coverage from regular functional tests. (*) The same is true for all static analysis techniques.
Who watches the watchmen? • Test logic has to be right • There is no way to verify the logic: no metrics, no approaches, no techniques • Code review is the only way • It is the sole responsibility of the test developer
Mis-usages: What does 100% coverage mean?
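100% block/line coverage [code example on slide; values shown: 1 false]
100% branch coverage [code example on slide; values shown: 1 true, -1 false]
100% domain coverage [code example on slide; values shown: 0 0 .1 √.1 -.1 Exception 0 e]
100% sequence coverage [code example on slide; values shown: (-1,-1) 1 (1,1) (0,0) 1 NaN b (-1,1) (1,-1) 1 1]
100% sequence coverage [code example on slide; values shown: (-1,-1) 1 (1,1) (0,0) 1 NaN b (-1,1) (1,-1) -1 -1] But… isPositive(float) has a defect!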
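The slide's `isPositive(float)` code is not recoverable, so here is a hypothetical stand-in with the same kind of trap. Two tests give 100% line and branch coverage and both pass, yet the method returns `true` for `Float.NaN`, because every ordered comparison with NaN is false.

```java
// Hypothetical isPositive(float); not the code from the original slide.
final class IsPositiveDemo {
    static boolean isPositive(float f) {
        if (f <= 0) {          // NaN <= 0 is false, so NaN falls through
            return false;
        }
        return true;           // defect: reached for NaN as well as for positive values
    }

    public static void main(String[] args) {
        // Two tests: every line and both branches are executed.
        System.out.println(isPositive(1f));          // true
        System.out.println(isPositive(-1f));         // false
        // Not required by line or branch coverage:
        System.out.println(isPositive(Float.NaN));   // true, which is wrong
    }
}
```

Writing the body as `return f > 0;` handles NaN correctly, since `NaN > 0` is false; no coverage criterion would have pointed at the difference.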
100% sequence coverage • Has conceptual problems: code semantics, loops • Requires one of the two: assume the libraries have no errors, or do it in depth, including the libraries • Very expensive: a lot of sequences, 2^(number of branches) generally speaking, and the data is very hard to analyze
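The growth of sequences with the number of branches can be checked with a small sketch: a method made of n independent decisions in sequence. Two tests (all conditions true, then all false) reach 100% branch coverage, while covering every path needs one test per combination, 2^n in total. The method and the enumeration are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

final class PathCountDemo {
    // n independent decisions in sequence; records which way each one went ("T"/"F").
    static String runPath(boolean[] conditions) {
        StringBuilder path = new StringBuilder();
        for (boolean c : conditions) {
            if (c) {
                path.append('T');
            } else {
                path.append('F');
            }
        }
        return path.toString();
    }

    // Enumerates every input combination and counts the distinct paths taken.
    static int countPaths(int n) {
        Set<String> paths = new HashSet<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] conditions = new boolean[n];
            for (int i = 0; i < n; i++) {
                conditions[i] = ((mask >> i) & 1) == 1;
            }
            paths.add(runPath(conditions));
        }
        return paths.size();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            // Branch coverage needs 2 tests for any n; path coverage needs 2^n.
            System.out.println(n + " decisions: 2 tests for all branches, "
                    + countPaths(n) + " tests for all paths");
        }
    }
}
```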
100% coverage: conclusion. 100% block/line/branch/path coverage, even if reachable, does not prove much. Hence… there is no need to try to get there unless...
100% coverage: conclusion. Most importantly... code coverage only measures coverage of the code that has been written. It says nothing about required functionality that was never implemented.
Mis-usages: Target value
Block coverage target value
100% coverage: conclusion. 100% block/line/branch/path coverage, even if reachable, does not prove much. Hence… there is no need to try to get there unless… 100% is the target value, which could happen if the cost of a bug is really high and/or the product is really small.
Target value: conclusion. The true target value for block/line/branch/path coverage comes from ROI, which is really hard to calculate and justify.
Usages
CC: how to use it • Test base improvement: right, but how do you select which tests to develop first? • Dead code: barely an artifact • Metric: better have a good metric • Control over code development • Control flow analysis
CC as a metric
What makes a good metric? • Simple to explain: so that you can explain to your boss why it is important to spend resources on it • Simple to work towards: so that you know what to do to improve it • Has a clear goal: so that you can tell how far along you are
Is CC a good metric? • Simple to explain (+): it is a metric of the quality of testing • Simple to work towards (+): it is (relatively) easy to map uncovered code to missing tests • Has a clear goal (-): nope, the ROI is too complicated to calculate
Filter code coverage… to leave only the code that should be covered completely. Examples: • Public API coverage • UI coverage • Controller code coverage • “Important code” coverage
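Filtering can be sketched as follows, assuming per-method coverage data is available as a map from a method signature to covered/total block counts. The `MethodCoverage` record and the signature strings are hypothetical; real coverage tools have their own report formats. The predicate decides which methods belong to the filtered set, and the result is the coverage over that set only.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical per-method coverage record: blocks covered out of blocks total.
record MethodCoverage(int coveredBlocks, int totalBlocks) {}

final class FilteredCoverage {
    // Block coverage computed only over methods accepted by the filter.
    static double coverage(Map<String, MethodCoverage> data, Predicate<String> filter) {
        int covered = 0;
        int total = 0;
        for (Map.Entry<String, MethodCoverage> e : data.entrySet()) {
            if (filter.test(e.getKey())) {
                covered += e.getValue().coveredBlocks();
                total += e.getValue().totalBlocks();
            }
        }
        // An empty filtered set is treated as vacuously fully covered.
        return total == 0 ? 1.0 : (double) covered / total;
    }

    public static void main(String[] args) {
        Map<String, MethodCoverage> data = Map.of(
                "org.myorg.api.Parser.parse(String)", new MethodCoverage(8, 10),
                "org.myorg.api.Parser.reset()", new MethodCoverage(2, 2),
                "org.myorg.impl.Cache.evict()", new MethodCoverage(0, 6));
        // Overall coverage mixes in code nobody promised to cover completely.
        System.out.println(coverage(data, sig -> true));
        // Filtered coverage: only the (hypothetical) public API package.
        System.out.println(coverage(data, sig -> sig.startsWith("org.myorg.api.")));
    }
}
```

The filtered number has a clear goal (100%), which is exactly what the unfiltered number lacks.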
Public API (*) is the set of program elements suggested for use by the public documentation. For example: all functions and variables described in the documentation. For a Java library: all public and protected methods and fields mentioned in the library javadoc. For the Java SDK: … of all public classes in the java and javax packages. (*) Only applicable to a library or an SDK.
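For a Java library, a first approximation of this set can be computed with reflection: the public and protected methods declared in public classes. This sketch assumes the classes are already loaded and ignores the “mentioned in the javadoc” condition, which would need a doclet or some documentation index; `PublicApi` is a made-up name.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: public API approximated as public/protected methods of public classes.
final class PublicApi {
    // Builds a readable signature such as "java.util.AbstractList.removeRange(int,int)".
    static String signature(Class<?> c, Method m) {
        StringBuilder sig = new StringBuilder(c.getName()).append('.').append(m.getName()).append('(');
        Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) sig.append(',');
            sig.append(params[i].getSimpleName());
        }
        return sig.append(')').toString();
    }

    // Public and protected methods declared directly in a public class.
    static List<String> publicApiMethods(Class<?> c) {
        List<String> result = new ArrayList<>();
        if (!Modifier.isPublic(c.getModifiers())) {
            return result;   // non-public classes are not part of the documented API
        }
        for (Method m : c.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if ((Modifier.isPublic(mod) || Modifier.isProtected(mod)) && !m.isSynthetic()) {
                result.add(signature(c, m));
            }
        }
        result.sort(null);   // natural (alphabetical) order
        return result;
    }

    public static void main(String[] args) {
        // AbstractList has both public methods and protected ones such as removeRange(int,int).
        publicApiMethods(java.util.AbstractList.class).forEach(System.out::println);
    }
}
```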
Public API
True Public API (c) is the set of program elements that can be accessed directly by a library user: the public API plus all extensions of the public API in non-public classes.
True public API example [slide figure: ArrayList.java shown next to “My code”]
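A likely case behind the ArrayList example is its private iterator class: user code calls `list.iterator().next()`, which runs a method of a non-public class. The following is a hedged reflection sketch of the “extensions of public API in non-public classes” part. For a non-public class, it keeps the methods that implement or override a public or protected method declared in a public supertype. `TruePublicApi` is a made-up name, and the matching is by name and parameter types only.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TruePublicApi {
    static String signature(Class<?> c, Method m) {
        StringBuilder sig = new StringBuilder(c.getName()).append('.').append(m.getName()).append('(');
        Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) sig.append(',');
            sig.append(params[i].getSimpleName());
        }
        return sig.append(')').toString();
    }

    // All public supertypes (classes and interfaces) of c, excluding c itself.
    static Set<Class<?>> publicSupertypes(Class<?> c) {
        Set<Class<?>> result = new LinkedHashSet<>();
        Deque<Class<?>> todo = new ArrayDeque<>();
        todo.push(c);
        while (!todo.isEmpty()) {
            Class<?> t = todo.pop();
            if (t != c && Modifier.isPublic(t.getModifiers())) {
                result.add(t);
            }
            if (t.getSuperclass() != null) {
                todo.push(t.getSuperclass());
            }
            for (Class<?> i : t.getInterfaces()) {
                todo.push(i);
            }
        }
        return result;
    }

    // Methods of a non-public class that a library user can still reach through a public supertype.
    static List<String> publicExtensions(Class<?> c) {
        List<String> result = new ArrayList<>();
        if (Modifier.isPublic(c.getModifiers())) {
            return result;   // public classes are handled by the ordinary public API filter
        }
        Set<Class<?>> supers = publicSupertypes(c);
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue;
            }
            for (Class<?> s : supers) {
                try {
                    int mod = s.getDeclaredMethod(m.getName(), m.getParameterTypes()).getModifiers();
                    if (Modifier.isPublic(mod) || Modifier.isProtected(mod)) {
                        result.add(signature(c, m));
                        break;
                    }
                } catch (NoSuchMethodException e) {
                    // s does not declare this method; try the next supertype
                }
            }
        }
        result.sort(null);
        return result;
    }

    public static void main(String[] args) {
        // ArrayList.iterator() returns an instance of a private nested class.
        Class<?> itr = new ArrayList<String>().iterator().getClass();
        publicExtensions(itr).forEach(System.out::println);
    }
}
```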
UI coverage
UI coverage. In a way, the equivalent of public API coverage, but for a UI product: • % of UI elements shown: display coverage • % of user actions performed: action coverage. Only “action coverage” can be obtained from CC data (*). (*) For the UI toolkits the presenter is familiar with.
Action coverage: • javax.swing.Action.actionPerformed(ActionEvent) • javafx.event.EventHandler.handle(Event) • org.myorg.NodeAction.actionPerformed(ActionEvent) • org.myorg.NodeAction.nodeActionPerformed(Node myNode)
“Controller” code coverage • Model: contains the domain logic • View: implements user interaction • Controller: maps the two; it only contains code that is called as a result of view actions and model feedback. The controller has very little boilerplate code, which makes it a good candidate for 100% block coverage.
“Important” code • Development/SQE marks a class or method as important • We use an annotation: @CriticalForCoverage • The list of methods marked as important is obtained • We do that with an annotation processor, during the main compilation • CC data is filtered by the method list • The goal is 100%
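The talk obtains the method list with an annotation processor during compilation. To stay short, the sketch below swaps that for runtime reflection, which assumes `@CriticalForCoverage` is declared with `RUNTIME` retention; the talk does not give its definition, so the declaration here is an assumption. The coverage input (a map from `Class.method` to covered/not covered) and the `Billing`/`Ledger` classes are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Assumed declaration; only the annotation name comes from the talk.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface CriticalForCoverage {}

final class CriticalMethods {
    // "Class.method" names of methods that are annotated, or whose class is annotated.
    static Set<String> critical(Class<?>... classes) {
        Set<String> result = new TreeSet<>();
        for (Class<?> c : classes) {
            boolean wholeClass = c.isAnnotationPresent(CriticalForCoverage.class);
            for (Method m : c.getDeclaredMethods()) {
                if (!m.isSynthetic() && (wholeClass || m.isAnnotationPresent(CriticalForCoverage.class))) {
                    result.add(c.getSimpleName() + "." + m.getName());
                }
            }
        }
        return result;
    }

    // Critical methods with no recorded coverage; the goal is an empty set (100%).
    static Set<String> uncovered(Set<String> critical, Map<String, Boolean> covered) {
        Set<String> result = new TreeSet<>();
        for (String m : critical) {
            if (!covered.getOrDefault(m, false)) {
                result.add(m);
            }
        }
        return result;
    }
}

// Hypothetical product code.
final class Billing {
    @CriticalForCoverage
    static long total(long net, long tax) { return net + tax; }

    static String describe() { return "billing"; }
}

@CriticalForCoverage
final class Ledger {
    static void post() {}

    static void rollback() {}
}
```

A compile-time processor, as in the talk, has the advantage that the list is produced once per build and does not require loading the product classes in the coverage tooling.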
Examples of non-generic metrics • BPEL elements • JavaFX properties (a property in JavaFX is something you can set, get, and bind) • Project type coverage in NetBeans • Insert your own
CC as a metric: conclusion. There are multiple ways to filter CC data down to a set of code that needs to be covered in full. There are generic metrics, and it is also possible to introduce product-specific ones. Such metrics are easy to use, although not always straightforward to obtain.
Test prioritization
Test prioritization. “100500 uncovered lines of code! OMG! Where do I start?” One answer is a metric: • Pick a metric • Develop tests. “Metrics for managers. Me no manager! Me write code!” Then consider mapping CC data to a few other source code characteristics.
Age of the code. New code had better be tested before it gets to the customer (this improves the bug escape rate, BTW). Old code is more likely to have been tested by users already, or not to be used by users at all.
What's a bug escape metric? The ratio of defects that sneaked out unnoticed: (# defects not found before release) / (# defects in the product). In theory: (# defects found after release) / (# defects found after release + # defects found before release). In practice: # defects found after release.
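A worked example of the ratio, assuming the counts of defects found before and after release are known: defects found after release divided by all defects found. The numbers in `main` are invented for illustration.

```java
final class BugEscape {
    // Defects found after release / all defects found (before + after).
    static double escapeRatio(int foundBeforeRelease, int foundAfterRelease) {
        int total = foundBeforeRelease + foundAfterRelease;
        if (total == 0) {
            throw new IllegalArgumentException("no defects recorded");
        }
        return (double) foundAfterRelease / total;
    }

    public static void main(String[] args) {
        // Invented numbers: 45 defects found in testing, 5 reported by customers.
        System.out.println(escapeRatio(45, 5));   // 0.1
    }
}
```

The ratio can only be computed some time after the release, once customer reports have had time to come in, which is one reason the plain count of post-release defects is often used instead.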
Number of changes. The more times a piece of code was changed, the more atomic improvements/bug fixes were implemented in it. Hence… a higher risk that a regression was introduced.
Number of lines changed. The more lines changed, the more testing the code needs. Best of all: the number of uncovered lines that were changed in the last release.
Bug density. Assuming all the pieces were tested equally well… many known bugs means there are probably even more: • hidden behind the known ones • introduced as regressions while fixing the existing ones
Code complexity. Assuming the same engineering talent and the same technology… the more complex the code is, the more bugs are likely to be there. Any complexity metric would work, from class size to cyclomatic complexity.
Putting it together. A formula: (1 - cc) * (a1*x1 + a2*x2 + a3*x3 + ...), where cc is the code coverage (0 to 1), xi is a risk of bug discovery in a piece of code (one of the characteristics above), and ai is its coefficient.
Putting it together. (1 - cc) * (a1*x1 + a2*x2 + a3*x3 + ...): the pieces of code with the highest value are the first to cover. • Choose the coefficients • Develop tests • Collect statistics on bug escape • Adjust the coefficients • Continue
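A minimal sketch of the formula (1 - cc) * (a1*x1 + a2*x2 + ...) used for ranking. It assumes each risk characteristic has already been normalized to a comparable scale (for example 0 to 1); the `CodeUnit` record, the characteristics, and the coefficient values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A piece of code (class, method, file) with its coverage (0..1) and risk characteristics.
record CodeUnit(String name, double coverage, double[] risks) {}

final class TestPriority {
    // (1 - cc) * (a1*x1 + a2*x2 + ...)
    static double score(CodeUnit u, double[] coefficients) {
        if (u.risks().length != coefficients.length) {
            throw new IllegalArgumentException("one coefficient per risk characteristic");
        }
        double weightedRisk = 0;
        for (int i = 0; i < coefficients.length; i++) {
            weightedRisk += coefficients[i] * u.risks()[i];
        }
        return (1 - u.coverage()) * weightedRisk;
    }

    // Highest score first: these are the first candidates for new tests.
    static List<CodeUnit> prioritize(List<CodeUnit> units, double[] coefficients) {
        List<CodeUnit> sorted = new ArrayList<>(units);
        sorted.sort(Comparator.comparingDouble((CodeUnit u) -> score(u, coefficients)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical characteristics: x1 = newness of code, x2 = number of changes, x3 = bug density.
        double[] a = {0.5, 0.3, 0.2};
        List<CodeUnit> units = List.of(
                new CodeUnit("Parser", 0.40, new double[] {0.9, 0.8, 0.5}),
                new CodeUnit("Logger", 0.10, new double[] {0.1, 0.1, 0.0}),
                new CodeUnit("Cache", 0.90, new double[] {1.0, 1.0, 1.0}));
        for (CodeUnit u : prioritize(units, a)) {
            System.out.printf("%s %.3f%n", u.name(), score(u, a));
        }
    }
}
```

With these invented numbers the order is Parser, Cache, Logger: the poorly covered Logger ends up last because its risk characteristics are low. The coefficient-adjustment loop from the slide would change `a` between releases based on observed bug escape.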
Test prioritization: conclusion. CC alone may not give enough information. It needs to be accompanied by other characteristics of the source code to make a decision. A few other characteristics can be used simultaneously.
Test prioritization: execution
Decrease test execution time. Exclude tests that do not add coverage (*). But be careful! Remember that CC is not everything, and even 100% coverage does not mean a lot. While excluding tests, use some orthogonal measurement as well, such as specification coverage. (*) Requires “test scales”, i.e., per-test coverage data.
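Given test scales, modelled here (as an assumption) as a map from a test name to the set of block IDs it covers, the tests that add no coverage can be found with a greedy pass: repeatedly keep the test that covers the most not-yet-covered blocks. This is the classic greedy set-cover heuristic, not necessarily what any particular tool does, and the caveat above applies: a dropped test may still check something coverage cannot see.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TestReduction {
    // Greedy set cover: returns the tests to keep, in the order they were picked.
    static List<String> keep(Map<String, Set<Integer>> testScales) {
        Set<Integer> covered = new HashSet<>();
        List<String> kept = new ArrayList<>();
        Map<String, Set<Integer>> remaining = new LinkedHashMap<>(testScales);
        while (true) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<Integer>> e : remaining.entrySet()) {
                Set<Integer> gain = new HashSet<>(e.getValue());
                gain.removeAll(covered);
                if (gain.size() > bestGain) {   // ties go to the earlier test
                    best = e.getKey();
                    bestGain = gain.size();
                }
            }
            if (best == null) {
                return kept;   // no remaining test adds coverage
            }
            covered.addAll(remaining.remove(best));
            kept.add(best);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> scales = new LinkedHashMap<>();
        scales.put("testParse", Set.of(1, 2, 3, 4));
        scales.put("testParseEmpty", Set.of(1, 2));
        scales.put("testPrint", Set.of(5, 6));
        // testParseEmpty adds nothing beyond testParse, so it is a candidate for exclusion.
        System.out.println(keep(scales));   // [testParse, testPrint]
    }
}
```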
Control flow analysis. Study the coverage report to see which test exercises which code (*). Recommended for developers. (*) Also requires “test scales”.
Controlled code changes. Do not allow commits unless all the new/changed code is covered. This requires committing the tests together with the changes.
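Such a commit gate reduces to a set difference: changed lines minus covered lines must be empty. The sketch assumes both sets are already available as `path:line` keys, for example parsed from a diff and from a coverage report produced by running the new tests; that parsing, and the key format itself, are assumptions left out of the sketch.

```java
import java.util.Set;
import java.util.TreeSet;

final class CommitGate {
    // Changed lines (as "path:line" keys) that no test executed.
    static Set<String> uncoveredChanges(Set<String> changedLines, Set<String> coveredLines) {
        Set<String> missing = new TreeSet<>(changedLines);
        missing.removeAll(coveredLines);
        return missing;
    }

    // The commit is accepted only if every new or changed line is covered.
    static boolean accept(Set<String> changedLines, Set<String> coveredLines) {
        return uncoveredChanges(changedLines, coveredLines).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> changed = Set.of("src/Parser.java:42", "src/Parser.java:43", "src/Cache.java:10");
        Set<String> covered = Set.of("src/Parser.java:42", "src/Parser.java:43", "src/Parser.java:44");
        System.out.println(accept(changed, covered));             // false
        System.out.println(uncoveredChanges(changed, covered));   // [src/Cache.java:10]
    }
}
```

In a real gate the changed lines would first be restricted to executable lines; otherwise an edited comment or blank line could never be covered and would block the commit.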
Code coverage: conclusion • 100% CC does not guarantee that the code works right • 100% CC may not be needed • It is possible to build good metrics with CC • CC helps with prioritization of test development • Other source code characteristics can be used together with CC
Coverage data is not free • Do just as much as you can consume (*) • It requires infrastructure work • It requires some development • It requires some analysis. (*) The rule of thumb.
Coverage data is not free • Do just as much as you can consume • It requires infrastructure work • It requires some development • It requires some analysis • Do just a little bit more than you can consume (*) • Otherwise, how do you know how much you can consume? (*) The rule of thumb.