Code Analysis: Run-Time Error Prediction
Overview
• Introduction
• Existing solutions
• Run-time errors
• Design
• Implementation
• Future work
Code Analysis
The difference between project success and failure.
• If there is going to be a program, there has to be construction.
• Code is often the only accurate description of the software available.
• Code must follow coding standards and code conventions.
Source Code Conventions
• 80% of the lifetime cost of a piece of software goes to maintenance.
• Hardly any software is maintained for its whole life by the original author.
• Code conventions improve the readability of the software.
• Source code, like any other product, should be well packaged.
Code-Optimization-Based Analysis
• Code verification and run-time error prediction at compile time, using syntax-directed translation.
• Predicts run-time errors without executing the program or writing test cases.
• Works on intermediate code.
Existing Solutions
Possible Run-Time Errors
1) Detecting uninitialized variables
   Using variables before they have been initialized by the program can cause unpredictable results.
2) Detecting overflows, underflows, and divide-by-zeros
Consider the pseudo-code: X = X/(X-Y)
Identifying all possible causes of error in this operation:
• X and Y may not be initialized.
• X-Y may overflow or underflow.
• X and Y may be equal and cause a division by zero.
• X/(X-Y) may overflow or underflow.
Figure: all possible values of x and y in program p. If a pair of values (x, y) falls on the black line (x = y), there is a divide-by-zero error.
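The failure modes listed for X = X/(X-Y) can be written down as explicit preconditions. The following is a minimal sketch, not the analyzer's method: `checked_quotient` is a hypothetical helper, and it assumes X and Y are C `int` values. It checks at run time what the analyzer aims to prove at compile time. Uninitialized operands cannot be detected this way in C, so that case is left to static analysis.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores x / (x - y) in *out only when every
 * intermediate step is well defined for int; returns false otherwise. */
bool checked_quotient(int x, int y, int *out)
{
    /* x - y overflows when y < 0 and x > INT_MAX + y,
     * and underflows when y > 0 and x < INT_MIN + y. */
    if ((y < 0 && x > INT_MAX + y) || (y > 0 && x < INT_MIN + y))
        return false;

    int d = x - y;
    if (d == 0)                      /* x == y: division by zero */
        return false;
    if (x == INT_MIN && d == -1)     /* INT_MIN / -1 overflows */
        return false;

    *out = x / d;
    return true;
}
```

Each `return false` corresponds to one bullet of the list: overflow or underflow of X-Y, division by zero, and overflow of the quotient.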
3) Detecting incorrect argument data types and incorrect number of arguments
• Arguments are checked for type and for the correct order of occurrence.
• Requires both the calling program and the called program to be compiled with a special compiler option.
• Checks can be made to determine whether the number and types of arguments in function (and subroutine) calls are consistent with the actual function definitions.
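A consistency check between a call and a function definition might look like the sketch below. The types `ArgType`, `Signature`, and `CallCheck` and the function `check_call` are hypothetical names introduced for illustration. The sketch assumes the analyzer has already recorded each definition's parameter types, for example in its symbol table.

```c
#include <stddef.h>

/* Hypothetical, deliberately small set of argument types. */
typedef enum { T_INT, T_DOUBLE, T_CHAR_PTR } ArgType;

/* Hypothetical record of a function definition (at most 4 parameters). */
typedef struct {
    const char *name;
    size_t nparams;
    ArgType params[4];
} Signature;

typedef enum { CALL_OK, CALL_WRONG_COUNT, CALL_WRONG_TYPE } CallCheck;

/* Compares the argument types at a call site with the definition,
 * position by position, so a wrong order is reported as a type error. */
CallCheck check_call(const Signature *sig, const ArgType *args, size_t nargs)
{
    if (nargs != sig->nparams)
        return CALL_WRONG_COUNT;
    for (size_t i = 0; i < nargs; i++)
        if (args[i] != sig->params[i])
            return CALL_WRONG_TYPE;
    return CALL_OK;
}
```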
4) Detecting errors with strings at run-time
• A string must have a null terminator at the end of its meaningful data. A common mistake is not allocating room for this extra character. This can also be a problem with dynamic allocation; the correct form reserves one extra byte:
    char *copy_str = malloc(strlen(orig_str) + 1);
    strcpy(copy_str, orig_str);
• The strlen() function returns a count of the data characters, which does not include the null terminator.
• With dynamic allocation, omitting the extra byte may corrupt the heap.
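The allocation pattern on this slide can be wrapped in a small function, sketched here under the assumption that the caller owns and later frees the result. `copy_string` is a hypothetical name. The sketch also checks the result of `malloc`, which the slide's two-line fragment leaves out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of orig_str, or NULL if
 * allocation fails. The caller is responsible for calling free(). */
char *copy_string(const char *orig_str)
{
    size_t len = strlen(orig_str);      /* data characters only */
    char *copy_str = malloc(len + 1);   /* +1 for the null terminator */
    if (copy_str == NULL)
        return NULL;
    memcpy(copy_str, orig_str, len + 1); /* copies the terminator too */
    return copy_str;
}
```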
a. Detecting out-of-bounds indexing of statically and dynamically allocated arrays
   A common run-time error is reading or writing an array outside its declared bounds.
b. Detecting out-of-bounds pointer references
   A common run-time error in C and C++ programs occurs when a pointer points to memory outside its associated memory block.
Pseudo-code for out-of-bounds references
    int A[5], i, a;
    int *p;
    for (i = 0; i < 5; i++)
        A[i] = i;
    p = A;
    for (i = 0; i <= 5; i++)   /* six increments: p ends up past the end of A */
        p++;
    a = *p;                    /* out-of-bounds reading using pointers */
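One way to make such a reference checkable is to carry the block's length alongside the pointer. This is a sketch only: `IntSpan` and `span_read` are hypothetical names, and the approach is a run-time guard rather than the compile-time prediction this deck is after.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": base address plus element count. */
typedef struct {
    const int *base;
    size_t len;
} IntSpan;

/* Reads element `index` into *out; refuses any index outside the block. */
bool span_read(IntSpan s, size_t index, int *out)
{
    if (index >= s.len)
        return false;        /* would be an out-of-bounds read */
    *out = s.base[index];
    return true;
}
```

With a five-element array, a read at index 5 or 6 is rejected instead of touching memory outside the block.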
5) Detecting memory allocation and deallocation errors
• A memory deallocation error occurs when a portion of memory is deallocated more than once.
• Another common source of errors in C and C++ programs is an attempt to use a dangling pointer. A dangling pointer is a pointer to storage that is no longer allocated.
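A common defensive idiom against both errors is to reset a pointer immediately after freeing it. The macro below is a sketch, and `FREE_AND_CLEAR` is a hypothetical name. It protects only the one pointer variable passed to it, so any other alias of the same block still dangles and still needs analysis.

```c
#include <stdlib.h>

/* Hypothetical helper macro: frees the block and clears the pointer.
 * A second use on the same variable calls free(NULL), which is a no-op,
 * so the double deallocation is avoided for that variable. */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

After `FREE_AND_CLEAR(buf)`, a test such as `if (buf != NULL)` tells the caller whether the storage is still allocated.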
6) Detecting memory leaks
• A program has a memory leak if, during execution, it loses its ability to address a portion of memory because of a programming error. This can happen when:
• A pointer points to a location in memory, and then all the pointers to this location are set to point somewhere else.
• A function or subroutine allocates memory during its execution, does not deallocate it upon exit, and all pointers to this memory are destroyed.
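A very small run-time approximation of leak detection is to count allocations that have not yet been released. The sketch below uses hypothetical wrapper names (`tracked_malloc`, `tracked_free`, `leaked_blocks`) and assumes all allocation in the program goes through them. It reports how many blocks are still live, not which pointers were lost.

```c
#include <stdlib.h>

static long live_blocks = 0;   /* allocations not yet released */

/* Hypothetical wrapper: allocates and counts the block as live. */
void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Hypothetical wrapper: releases the block and updates the count. */
void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Number of blocks still allocated; nonzero at exit suggests a leak. */
long leaked_blocks(void)
{
    return live_blocks;
}
```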
Source Code Analyzer Predicates
• Reliable: proven free of run-time errors under all operating conditions within the scope.
• Faulty: proven faulty each time the operation is executed.
• Dead: proven unreachable (may indicate a functional issue).
• Unproven: unproven code section, or beyond the scope of the analyzer.
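To illustrate how the four predicates could be assigned, the sketch below classifies a single division with respect to divide-by-zero only. It assumes the analyzer has already computed a range of possible values for the divisor. The `Interval` type and `classify_division` are hypothetical; the sketch is not taken from any particular tool.

```c
typedef enum { RELIABLE, FAULTY, DEAD, UNPROVEN } Verdict;

/* Hypothetical abstract value: every run-time value v satisfies lo <= v <= hi. */
typedef struct {
    long lo;
    long hi;
} Interval;

/* Classifies one division by the range of its divisor. */
Verdict classify_division(Interval divisor, int reachable)
{
    if (!reachable)
        return DEAD;                          /* never executed */
    if (divisor.lo == 0 && divisor.hi == 0)
        return FAULTY;                        /* divisor is always zero */
    if (divisor.lo > 0 || divisor.hi < 0)
        return RELIABLE;                      /* zero is excluded */
    return UNPROVEN;                          /* zero is possible, not certain */
}
```

For X/(X-Y), the divisor range is the range of X-Y. If that range contains zero along with other values, the operation stays Unproven.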
Specifications
• Why Java for developing the analyzer?
Specifications
• Why C/C++ as the input language?
Design for Code Analyzer
Figure: Input program (C file) → Lexical Analyzer → Parser → IC (SDT) Generation → Run-Time Error Predictions, with a Symbol Table drawn alongside the analysis stages.
Analysis of Code
• Input program
• Lexical analysis: Stream Tokenizer
• Parser:
    Condition = "(" Expression ("=="|"!="|">"|"<"|">="|"<=") Expression ")"
    Expression = Term {("+"|"-") Term}
    Term = Factor {("*"|"/") Factor}
    Factor = number | identifier |
• Intermediate code generation: postfix evaluation
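The Expression, Term, and Factor rules map directly onto a recursive-descent parser that emits postfix. The following is a minimal sketch in C and does not reproduce the deck's implementation. It assumes a Factor is only a number or an identifier, omits the Condition rule, and uses hypothetical names (`Parser`, `to_postfix`).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state: input cursor plus a postfix output buffer. */
typedef struct {
    const char *src;   /* remaining input */
    char *out;         /* postfix output */
    size_t cap, len;   /* buffer capacity and used length */
    int ok;            /* cleared on syntax error or buffer overflow */
} Parser;

static void skip_spaces(Parser *p)
{
    while (isspace((unsigned char)*p->src))
        p->src++;
}

/* Appends one token to the output, separated by a single space. */
static void emit(Parser *p, const char *text, size_t n)
{
    if (p->len + n + 2 > p->cap) { p->ok = 0; return; }
    if (p->len > 0)
        p->out[p->len++] = ' ';
    memcpy(p->out + p->len, text, n);
    p->len += n;
    p->out[p->len] = '\0';
}

/* Factor = number | identifier */
static void factor(Parser *p)
{
    skip_spaces(p);
    const char *start = p->src;
    while (isalnum((unsigned char)*p->src) || *p->src == '_')
        p->src++;
    if (p->src == start) { p->ok = 0; return; }
    emit(p, start, (size_t)(p->src - start));
}

/* Term = Factor {("*"|"/") Factor} */
static void term(Parser *p)
{
    factor(p);
    skip_spaces(p);
    while (p->ok && (*p->src == '*' || *p->src == '/')) {
        const char *op = p->src++;
        factor(p);
        if (p->ok) emit(p, op, 1);   /* operator follows its operands */
        skip_spaces(p);
    }
}

/* Expression = Term {("+"|"-") Term} */
static void expression(Parser *p)
{
    term(p);
    skip_spaces(p);
    while (p->ok && (*p->src == '+' || *p->src == '-')) {
        const char *op = p->src++;
        term(p);
        if (p->ok) emit(p, op, 1);
        skip_spaces(p);
    }
}

/* Returns 1 and fills `out` with postfix on success, 0 on a syntax error. */
int to_postfix(const char *src, char *out, size_t cap)
{
    if (cap == 0) return 0;
    out[0] = '\0';
    Parser p = { src, out, cap, 0, 1 };
    expression(&p);
    skip_spaces(&p);
    return p.ok && *p.src == '\0';
}
```

Because Term is parsed inside Expression, `*` and `/` bind tighter than `+` and `-`, so `a + b * 2` becomes `a b 2 * +`.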
3-Address Code Generation
Source file:
    Test(n){
    int b,a,n,j;
    if(j<n)
    {
    a=a+b;}
    }
Target:
    argument  operator  operand 1  operand 2  result
    0         <         j          n
    1         if        0          goto l0
    2         +         a          b
    3         =         a          2
    l0:
The operands 0 and 2 in rows 1 and 3 refer to the results of rows 0 and 2.
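A table like this can be held in memory as an array of records whose operands are either names or references to earlier rows. The sketch below is one possible layout, with hypothetical type and function names. It writes the branch as `ifFalse`, on the assumption that the jump to `l0` is taken when row 0 evaluates to false; the slide writes it simply as `if`.

```c
#include <stdio.h>

/* Hypothetical operand: nothing, a name, or a reference to an earlier row. */
typedef enum { ARG_NONE, ARG_NAME, ARG_REF } ArgKind;

typedef struct {
    ArgKind kind;
    const char *name;  /* used when kind == ARG_NAME */
    int ref;           /* used when kind == ARG_REF */
} Arg;

typedef struct {
    const char *op;
    Arg a1, a2;
} Triple;

/* The code for: if (j < n) { a = a + b; }  (label l0 follows row 3). */
static const Triple example[4] = {
    { "<",       { ARG_NAME, "j", 0 },  { ARG_NAME, "n", 0 }  },
    { "ifFalse", { ARG_REF,  NULL, 0 }, { ARG_NAME, "l0", 0 } },
    { "+",       { ARG_NAME, "a", 0 },  { ARG_NAME, "b", 0 }  },
    { "=",       { ARG_NAME, "a", 0 },  { ARG_REF,  NULL, 2 } },
};

static void format_arg(Arg a, char *buf, size_t cap)
{
    switch (a.kind) {
    case ARG_NAME: snprintf(buf, cap, "%s", a.name);  break;
    case ARG_REF:  snprintf(buf, cap, "(%d)", a.ref); break;
    default:       snprintf(buf, cap, "-");           break;
    }
}

/* Renders one row as "index: op operand1 operand2". */
void format_triple(int index, const Triple *t, char *buf, size_t cap)
{
    char a1[32], a2[32];
    format_arg(t->a1, a1, sizeof a1);
    format_arg(t->a2, a2, sizeof a2);
    snprintf(buf, cap, "%d: %s %s %s", index, t->op, a1, a2);
}
```

Row references are printed in parentheses, for example `(2)`, to keep them distinct from numeric constants.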
Work Done: Intermediate Code
Further Work
• Evaluation of intermediate code to perform data-flow and control-flow analysis.
• Prediction of run-time errors using intermediate code.
• Use of code optimization techniques such as constant folding to predict code behavior.
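As a rough illustration of how constant folding could expose a run-time error before execution, the sketch below folds one binary operation whose operands may or may not be known at compile time. All names are hypothetical. Overflow checking of the folded arithmetic is left out for brevity, so it assumes small operand values.

```c
typedef enum { FOLD_CONST, FOLD_UNKNOWN, FOLD_DIV_BY_ZERO } FoldStatus;

/* Hypothetical operand: `known` is 0 when the value is not a compile-time constant. */
typedef struct {
    int known;
    long value;
} Operand;

/* Folds `a op b` when both operands are constants. A constant zero divisor
 * is reported even if the dividend is unknown. Overflow is not checked. */
FoldStatus fold_binary(char op, Operand a, Operand b, long *result)
{
    if (op == '/' && b.known && b.value == 0)
        return FOLD_DIV_BY_ZERO;
    if (!a.known || !b.known)
        return FOLD_UNKNOWN;

    switch (op) {
    case '+': *result = a.value + b.value; return FOLD_CONST;
    case '-': *result = a.value - b.value; return FOLD_CONST;
    case '*': *result = a.value * b.value; return FOLD_CONST;
    case '/': *result = a.value / b.value; return FOLD_CONST;
    default:  return FOLD_UNKNOWN;         /* operator not handled here */
    }
}
```

Applied to X/(X-Y), folding X-Y to the constant 0 would mark the division as a predicted divide-by-zero without running the program.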
