* * * * * “You can't fix the bug until you have filled out form 27B/6.” I swear, I just cannot adjust to the new development process. Last month, as I was starting work on “Project: Bradenburg [1],” I ended up using GCC (GNU's Not Unix Compiler Collection) 11, and just on a whim, I decided to try GCC 11 on “Project: Lumbergh [2]” and that's when I found an issue—“Project: Lumbergh” crashed immediately. It took about ten minutes to track down the issue—undefined behavior! Effectively, “Project: Lumbergh” was taking a pointer to a variable in an inner scope and using it outside said scope (don't worry, there's an example of what I'm talking below). I duly recorded a bug in Jira [3], but did not check in the patch as the other developer, CZ, was doing a bunch of work on “Project: Lumbergh” and I wanted to run the bug by him first just so he knew about it. In retrospect, I think I should have just checked in the patch and asked forgiveness, because the ensuing “process” nearly killed me. First off, the bug basically languished due to the holliday season and everybody taking vacation and what not. When I brought up the bug on Monday, CZ demanded a test case to show it failing. Problem—there was no reason to create a specific test case, because any test would have shown the crash. Second problem—it's undefined behavior—it only shows (as far as I know) when compiled with GCC 11. We don't use GCC 11 in production, and the GCC version we use in production produces code that runs, thus not finding the bug any ealier. CZ tried using valgrind [4] to locate the issue and couldn't. Of course CZ couldn't—the executable worked. As I tried to state, it only fails when compiled with GCC 11 (which we as a company don't use yet). I spent the week trying to convince him to just accept the patch because it removed undefined behavior. CZ was adamant in trying to reproduce the issue for himself. Because with our Corporate Overlords, all bugs must be reproducable, must have a test case, and must pass said test case, before it will be patched. Even in the case of blatant undefined behavior! **ARE YOU XXXXXXX KIDDING ME? IS OUR CODE SO XXXXXXX FRAGILE THAT MOVING THREE VARIABLES DECLARATIONS WILL XXXXXXX BREAK THE PROGRAM? IS THAT WHAT YOU ARE SO XXXXX­XX AFRAID OF? XXXXX­XXXXX­XXXXX­XXXXX!** [Calm down! Breath! In … out … in … out. —Editor] Sigh. Eventually, I was given the go-ahead to submit the patch. Here was my revision notes: > Bug fix XXXXX­X—remove undefined behavior > > This was found using GCC 11, and it's very hard to reproduce, since the > code invoked undefined behavior by storing pointers to data in a scope that > technically no longer exists when it's referenced. > > The reason this is hard to find depends upon how the compiler lays out the > stack frame for the given function. It can, for instance, collect all the > variable created at all the scopes in the function and set aside space for > all of them at once, which in that case the code will work without issue. > Or it could collect all the variables from all the scopes, and create just > enough space for all the variables, allowing the variables from one scope > to reuse memory for another scope, in which case, it may work, or not, > depending upon intervening scopes. Or, the compiler can create the scoped > variables when the code enters the scope, in which case the code may lead > to undeterministic behavior, depending upon the code following the scope. > > An example: > > -----[ C ]----- > struct bar > { > char b[MAX]; > }; > > struct foo > { > char f[MAX]; > struct bar *sub; > }; > > static int foo(A a, B b, C c) > { > struct foo f; > bool flag; > > flag = to_display(f.f,sizeof(f.f),a); > if (flag) > { > struct bar b; > > b.b = to_other_display(b.b,sizeof(b.b),b,c); > f.sub = &b; > } > else > f.sub = NULL; > > if (check_this()) > { > int x = tointeger(b); > char buf[MAX]; > > to_c_display(buf,sizeof(buf),c); > syslog(LOG_DEBUG,"check=%d c=%s",x,buf); > } > > do_something(&f); > return 0; > } > -----[ END OF LINE ]----- > > In this sample function, compiler A may collect all the local variables > into a single block at the start of the function: > > -----[ C ]----- > struct foo f; > bool flag; > struct bar b; > int x; > char buf[MAX]; > -----[ END OF LINE ]----- > > and even though this provokes undefined behavior, it will still work, and > valgrind won't find an issue. Compiler B might create space for all the > variables, but could reuse the space for the two sub blocks: > > -----[ C ]----- > struct foo f; > bool flag; > union > { > struct bar b; > struct { int x ; char buf[MAX]; } > }; > -----[ END OF LINE ]----- > > (not valid C syntax, but it should give the intent I'm going for). In this > case, the space pointed to by b could be overwritten if check_this() > returns true. Compiler C may only create the space as needed, so the > resulting assembly code may look something like: > > -----[ Assembly ]----- > foo > push rbp > mov rbp,rsp > sub rsp,sizeof(struct foo) + sizeof(flag) + padding > … > cmp al,[rbp+flag] > jz foo_skip > > sub rsp,sizeof(struct bar) + padding > … > lea rbx,[rbp - f]; > lea rdx,[rbp - b]; > mov [rbx + sub],rdx > … > add rsp,sizeof(struct bar) + padding > > foo_skip: > … > call check_this > tst al > jz foo_skip2 > > sub rsp,sizeof(int) + sizeof(buf) + padding > … > add rsp,sizeof(int) + sizeof(buf) + padding > > foo_skip2: > > call do_something > -----[ END OF LINE ]----- > > And in this case, if check_this() even returns false, data in struct bar > could be overwritten just with the call to check_this() itself. > > This is the very definition of “undefined behavior” and in this case, it's > might have been hard to find, even with tools like valgrind, since it > really depends upon the code generated by the compiler. I think we're lucky > in that GCC 11 layed out the code such that it crashed and thus, I was able > to isolate the issue long before it becomes important. > [1] gopher://gopher.conman.org/0Phlog:2021/12/10.1 [2] gopher://gopher.conman.org/0Phlog:2018/09/11.2 [3] https://www.atlassian.com/software/jira [4] https://valgrind.org/ Email author at sean@conman.org .