BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program Stopped Working With A Newer Compiler
The document discusses undefined behavior in C/C++ programming, highlighting its definitions, examples, and implications for compiler optimization. It outlines various cases of undefined behavior, such as signed integer overflow and divide by zero, and emphasizes the importance of detecting these issues through compiler warnings and the Undefined Behavior Sanitizer (ubsan). The conclusion reiterates that undefined behavior can lead to subtle bugs and security vulnerabilities, urging programmers to be vigilant in their coding practices.
1.
Undefined Behavior and Compiler Optimization: Why Your Program Stopped Working With A Newer Compiler
Presented by: Kugan Vivekanandarajah and Yvan Roux (BKK16-503)
Date: March 11, 2016
Event: Linaro Connect BKK16
2.
Outline
● What is undefined behavior
● Examples of undefined behavior
● Detecting undefined behavior
● Conclusion
3.
How many times does this loop iterate?
```c
int d[16];

int SATD (void)
{
  int satd = 0, dd, k;
  for (dd = d[k = 0]; k < 16; dd = d[++k])
    {
      satd += (dd < 0 ? -dd : dd);
    }
  return satd;
}
```
4.
How many times does this loop iterate?
● An infinite loop is generated from code that does not look infinite
● The SPEC CPU2006 benchmark 464.h264ref goes into an infinite loop with GCC 4.8
● https://gcc.gnu.org/PR53073
5.
How many times does this loop iterate?
● The C standard says:
○ It is legal for a pointer to point to one element past the end of an array
○ Accessing that location is undefined
● After the last iteration, dd = d[++k] reads d[16] before the k < 16 test runs
● The compiler can therefore assume that "k" can never be 16 at the point of k < 16, so the test is always true and the loop never exits
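A minimal well-defined rewrite of the loop, offered as a sketch and not as the fix that went into 464.h264ref: the bound is tested before d[k] is read, so d[16] is never touched. The ternary is replaced by abs (), which, like -dd, still overflows for INT_MIN, so elements are assumed to be in range.

```c
#include <stdlib.h>

int d[16];

/* Sum of absolute values of d[0..15]. The loop condition checks k
   before d[k] is read, so there is no out-of-bounds access. */
int SATD (void)
{
  int satd = 0;
  for (int k = 0; k < 16; k++)
    satd += abs (d[k]);  /* assumes no element equals INT_MIN */
  return satd;
}
```

With the read moved inside the body, k < 16 really does bound every access, and the compiler has no undefined access to reason from.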
6.
What is undefined behavior?
● ISO Standard definition: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"
● C FAQ definition: "Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended"
● Undefined behavior may lead to very subtle bugs or have a critical impact on security
○ Programmers must avoid it
7.
Why does undefined behavior exist?
● C/C++ is designed to be an efficient low-level programming language
● Undefined behavior gives compiler writers freedom to optimize
○ Example: X*2/2 can be optimized to X, provided X*2 does not overflow
● Safe languages like Java have little undefined behavior
○ Safe and reproducible behavior across implementations
○ At the expense of performance
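To make the X*2/2 example concrete, here is a sketch assuming 32-bit int and unsigned. With signed int the compiler may fold x * 2 / 2 to x, because an overflowing x * 2 would be undefined. With unsigned the multiplication wraps modulo 2^32, which is well defined, so the fold is not allowed and the high bit is lost. The function names are hypothetical.

```c
/* Signed: an optimizing compiler may simplify this to `return x;`,
   because x * 2 overflowing would be undefined behavior. */
int s_times2_div2 (int x)
{
  return x * 2 / 2;
}

/* Unsigned: x * 2u wraps modulo 2^N, which is well defined, so the
   compiler must keep the multiplication and the division. */
unsigned u_times2_div2 (unsigned x)
{
  return x * 2u / 2u;
}
```

For 0x80000000u the unsigned version returns 0, which is exactly the result the signed fold is allowed to ignore.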
10.
Other kinds of behavior
● Implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made
● Unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
● Locale-specific behavior: behavior that depends on local conventions of nationality, culture, and language that each implementation documents
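A small sketch contrasting two of these categories; all names are hypothetical. The order in which the operands of + are evaluated is unspecified, but either order gives the same, well-defined sum. Whether plain char is signed is implementation-defined: each compiler picks one answer and documents it. Neither is undefined, so the program remains valid.

```c
#include <limits.h>

static int call_order[2];
static int call_count;

static int first (void)  { call_order[call_count++] = 1; return 1; }
static int second (void) { call_order[call_count++] = 2; return 2; }

/* Unspecified behavior: first () and second () may be called in either
   order, but the result is 3 either way. */
int sum_calls (void)
{
  call_count = 0;
  return first () + second ();
}

/* Implementation-defined behavior: the signedness of plain char is
   chosen, and documented, by the implementation. */
int char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```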
11.
Examples of undefined behavior
● Signed integer overflow
● Shifting an n-bit integer by n or more bits
● Divide by zero
● Dereferencing a NULL pointer
● Pointer arithmetic that wraps
● Accessing an object through a pointer of an incompatible type (two pointers of different types that alias)
● Reading an uninitialized variable
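The aliasing item is the only one not revisited on a later slide, so here is a sketch assuming a 32-bit IEEE-754 float. Reading a float's storage through a uint32_t pointer violates the aliasing rules; copying the bytes with memcpy is well defined and is usually compiled to a single move. (GCC also offers -fno-strict-aliasing to disable optimizations based on this rule.) The function name float_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f. The tempting
       return *(uint32_t *) &f;
   reads a float object through a uint32_t lvalue, which is undefined.
   memcpy copies the object representation instead. */
uint32_t float_bits (float f)
{
  uint32_t u;
  _Static_assert (sizeof (uint32_t) == sizeof (float), "assumes 32-bit float");
  memcpy (&u, &f, sizeof u);
  return u;
}
```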
12.
How to detect undefined behavior
● Compiler warnings
○ The compiler can issue a warning at compile time when it can statically detect some kinds of wrongdoing
● Undefined Behavior Sanitizer (ubsan)
○ GCC has supported ubsan since version 4.9
■ A run-time checker for the C and C++ languages
○ Some undefined behavior is not statically obvious
■ Input might come from the user or from other modules, so it cannot be detected at compile time
■ Ubsan can help here, given the right input
13.
What will be the output when int is 32 bits?
```c
#include <stdio.h>

int foo (int a)
{
  if (a + 100 > a)
    printf ("%d GT %d\n", a + 100, a);
  else
    printf ("%d LT %d\n", a + 100, a);
  return 0;
}

int main ()
{
  foo (100);
  foo (0x7fffffff);
  return 0;
}
```
14.
Signed integer overflow - PR30475
● Output at -O0:
```
200 GT 100
-2147483549 LT 2147483647
```
● Output at -O2:
```
200 GT 100
-2147483549 GT 2147483647
```
15.
Signed integer overflow - PR30475
● According to the C and C++ language standards, overflow of a signed value is undefined behavior
○ A correct (standard-conforming) C/C++ program must never generate signed overflow
● The compiler may therefore assume that (a + 100 > a) in the example is always true
● -fwrapv or -fno-strict-overflow disables this assumption
● -Wstrict-overflow warns about the assumption, and ubsan reports the overflow at run time:
```
gcc -O2 t.c -Wstrict-overflow
t.c: In function ‘foo’:
t.c:4:5: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
```
```
gcc -O2 t.c -fsanitize=undefined ; ./a.out
200 GT 100
t.c:5:7: runtime error: signed integer overflow: 2147483647 + 100 cannot be represented in type 'int'
-2147483549 GT 2147483647
```
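The check in foo can be written so that it never performs the overflowing addition: compare against the limits first. A minimal sketch, assuming the goal is only to detect whether a + b overflows int; the helper name add_overflows is hypothetical. GCC (from version 5) and Clang also provide __builtin_add_overflow for the same purpose.

```c
#include <limits.h>
#include <stdbool.h>

/* True if a + b would overflow int. Only in-range subtractions are
   performed, so the check itself never has undefined behavior. */
bool add_overflows (int a, int b)
{
  return (b > 0 && a > INT_MAX - b)
      || (b < 0 && a < INT_MIN - b);
}
```

In foo, add_overflows (a, 100) is true for 0x7fffffff and false for 100, independently of the optimization level.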
16.
Signed integer overflow and security implications
● A common defence against many software security problems is validating input
● If the validating code itself relies on undefined behavior, the program can still be vulnerable
● An example from https://lwn.net/Articles/278137/ (here the overflow is in pointer arithmetic; forming a pointer outside the array is likewise undefined, so the compiler may assume buffer + len < buffer is always false):
```c
if (buffer + len >= buffer_end || buffer + len < buffer)
  die_a_gory_death ("len is out of range\n");
```
will be optimized into:
```c
if (buffer + len >= buffer_end)
  die_a_gory_death ("len is out of range\n");
```
○ If "len" comes from the user, it can then be used to overflow the buffer (and exploit it)
● Some of these overflow checks are common idioms for preventing buffer overflows in security-sensitive code
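One way to avoid the problem is to compare sizes instead of forming buffer + len at all. A minimal sketch, assuming buffer and buffer_end point into the same array with buffer <= buffer_end; the function name len_in_range is hypothetical. The pointer difference is well defined under that assumption, and the comparison is done on size_t, which cannot overflow here.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if buffer + len would stay below buffer_end. No out-of-range
   pointer is ever formed, so there is nothing for the compiler to
   assume away. */
bool len_in_range (const char *buffer, const char *buffer_end, size_t len)
{
  return len < (size_t) (buffer_end - buffer);
}
```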
17.
What will be the output for this?
```c
#include <stdio.h>

int foo (int x, int y)
{
  x >>= (sizeof (int) << y);
  return x;
}

int main ()
{
  printf ("%d\n", foo (1000, 3));
  return 0;
}
```
18.
Shifting an n-bit integer by n or more bits is undefined - PR48418
● With a 32-bit int, sizeof (int) << 3 is 4 << 3 = 32, so x is shifted by its full width
```
gcc t.c -O0; ./a.out
1000
gcc t.c -O2; ./a.out
0
gcc -O2 t.c -fsanitize=undefined ; ./a.out
t.c:5:4: runtime error: shift exponent 32 is too large for 32-bit type 'int'
1000
```
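If a shift count can reach the type's width, the out-of-range case has to be given a meaning explicitly. A sketch under one possible choice (the result is 0, as if the bits had all been shifted out), using unsigned so that the in-range shift is also well defined; the name safe_shr is hypothetical.

```c
#include <limits.h>

/* Logical right shift where counts >= the width of unsigned yield 0
   instead of invoking undefined behavior. */
unsigned safe_shr (unsigned x, unsigned n)
{
  return n < sizeof x * CHAR_BIT ? x >> n : 0u;
}
```

safe_shr (1000u, 32) gives 0 at every optimization level, rather than 1000 at -O0 and 0 at -O2.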
19.
What will be the output for this?
```c
#include <stdio.h>

int testdiv (int i, int k)
{
  if (k == 0)
    printf ("found divide by zero\n");
  return (i / k);
}

int main ()
{
  int i = testdiv (1, 0);
  return (i);
}
```
20.
Divide by zero - PR29968
● Divide by zero is undefined
● Based on an error found with PostgreSQL 8.1.5 on Solaris 9 SPARC with GCC 4.1
● Since k is used as a divisor, the compiler assumed "k" cannot be zero
○ The print statement is optimized away
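The fix is to make sure the division is not reached when it would be undefined, rather than only reporting it. A minimal sketch, assuming the caller supplies a fallback value; the name checked_div and the fallback parameter are hypothetical. It also rejects INT_MIN / -1, whose quotient does not fit in int and is undefined as well.

```c
#include <limits.h>
#include <stdio.h>

/* Returns i / k, or fallback when the division would be undefined:
   k == 0, or INT_MIN / -1. */
int checked_div (int i, int k, int fallback)
{
  if (k == 0 || (i == INT_MIN && k == -1))
    {
      fprintf (stderr, "refusing undefined division %d / %d\n", i, k);
      return fallback;
    }
  return i / k;
}
```

Because the return happens before the division, the compiler cannot use the division to conclude that k is non-zero on the error path.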
21.
Divide by zero - PR29968
● An example from Linux (http://www.spinics.net/linux/fedora/linux-security-module/msg12814.html):
```c
if (!msize)
  msize = 1 / msize; /* provoke a signal */
```
○ The division by zero is intentional, but because it is undefined, nothing guarantees that the signal is raised; the compiler may remove or change the division
22.
What will be the output for this?
```c
unsigned char* addr = (unsigned char*)0xfffffffe;
unsigned len = 4;
if (addr + len < addr) {
  printf( "wraps\n");
} else {
  printf( "no wrap\n");
}
```
23.
Pointer arithmetic that wraps - PR54365 (for ARM)
```
gcc -O2 t.c; ./a.out
no wrap
gcc t.c; ./a.out
wraps
```
● The current version of ubsan does not model this
○ No errors are flagged
○ Not all kinds of undefined behavior are modelled in ubsan or as compiler warnings
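If the intent really is to ask whether an address range wraps around the address space, the arithmetic can be done on integers instead of pointers, because unsigned wraparound is well defined. A sketch assuming a flat address space where converting a pointer to uintptr_t yields its address (the conversion itself is implementation-defined); the name range_wraps is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr + len would wrap past UINTPTR_MAX. The sum is never
   computed; the subtraction UINTPTR_MAX - len cannot wrap. */
bool range_wraps (uintptr_t addr, uintptr_t len)
{
  return addr > UINTPTR_MAX - len;
}
```

On a 32-bit target, range_wraps ((uintptr_t) addr, len) reports a wrap for the slide's 0xfffffffe and 4 regardless of the optimization level.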
24.
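Dereferencing a NULL pointer
● Example from the Linux kernel (https://lwn.net/Articles/342330/):
```c
static unsigned int tun_chr_poll (struct file *file, poll_table * wait)
{
  struct tun_file *tfile = file->private_data;
  struct tun_struct *tun = __tun_get (tfile);
  struct sock *sk = tun->sk;
  unsigned int mask = 0;

  if (!tun)
    return POLLERR;
  ….
}
```
● tun->sk dereferences tun before the NULL check; since dereferencing NULL is undefined, the compiler may assume tun is non-NULL and delete the if (!tun) check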
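The ordering fix can be sketched outside the kernel with hypothetical stand-in types; struct sock and struct tun_struct below are minimal placeholders, not the kernel definitions, and the mask computation is elided just as on the slide. POLLERR comes from POSIX <poll.h>. The point is only that the NULL test happens before any dereference.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for the kernel structures. */
struct sock { int unused; };
struct tun_struct { struct sock *sk; };

unsigned int poll_sketch (struct tun_struct *tun)
{
  struct sock *sk;
  unsigned int mask = 0;

  if (!tun)          /* test first ... */
    return POLLERR;
  sk = tun->sk;      /* ... then dereference: tun is known non-NULL here */
  (void) sk;         /* the real function computes mask from sk; elided */
  return mask;
}
```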
25.
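Calling a NULL object - PR68853
● GCC 6 exposes undefined behavior in the Chromium V8 garbage collector
○ Calling a member function through a NULL object pointer is undefined, so GCC 6 assumes the this pointer is never NULL and removes checks against it
○ -fno-delete-null-pointer-checks gets this code working again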
26.
Reading an uninitialized variable
● Reading an uninitialized variable is undefined behavior
○ The compiler can assign any value to the variable and to expressions derived from it
● http://kqueue.org/blog/2012/06/25/more-randomness-or-less/
```c
struct timeval tv;
unsigned long junk;

gettimeofday (&tv, NULL);
srandom ((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);
```
○ When compiled with a version of LLVM, the entire seed computation is optimized away
○ The results of gettimeofday () and getpid () are not used at all
○ srandom () is called with some garbage value
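A sketch of the seed computation without the uninitialized junk term, so that every input is a defined value and none of it can be optimized away. The helper names mix_seed and make_seed are hypothetical, and this is not presented as a good source of randomness, only as well-defined code; the result would be passed to srandom () as before.

```c
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Combines only initialized values; there is no "junk" term. */
unsigned long mix_seed (unsigned long pid, unsigned long sec, unsigned long usec)
{
  return (pid << 16) ^ sec ^ usec;
}

unsigned long make_seed (void)
{
  struct timeval tv;
  gettimeofday (&tv, NULL);
  return mix_seed ((unsigned long) getpid (),
                   (unsigned long) tv.tv_sec,
                   (unsigned long) tv.tv_usec);
}
```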
27.
Conclusion
● Code with undefined behavior is not standard-conforming
○ Compilers assume that undefined behavior is not present in the code while optimizing
● Undefined behavior can be subtle and is not always obvious
● Use compiler warnings and ubsan to detect it
○ Unfortunately, not all kinds of undefined behavior are modelled in ubsan
○ Know the compiler flags (if any) that disable optimizations relying on undefined behavior, such as -fwrapv, -fno-strict-overflow and -fno-delete-null-pointer-checks