Detecting Memory Protection Errors in GNU-R using Static Checking
The C source code of R packages and of the GNU-R interpreter interacts directly with objects allocated on the garbage-collected R heap. This C code must maintain an explicit root set of "protected" objects, known as the pointer protection stack, because the garbage collector (GC) does not know about pointers stored on the C stack or in CPU registers. However, not every pointer held in a C local variable needs to be protected. We may know from the context that the object is reachable from other roots, that none of the functions we call allocate (allocation may trigger a GC), or that the functions we call take the pointer as an argument and protect it themselves. Based on such knowledge, the C code in R generally avoids protecting pointers unnecessarily. Maintaining the pointer protection stack is error-prone: for example, if a function that previously did not allocate is changed so that it allocates, pointer protection has to be revisited in every function that calls it, directly or recursively. We have implemented a bug-finding tool using the LLVM framework that automatically looks for protection errors using static analysis and static checking. The tool cannot find all errors and it produces false alarms, but using it we have found and fixed roughly a hundred true protection errors in the GNU-R interpreter. This talk will focus on how the tool works internally, and may also be of interest to R core developers and to implementers of R packages that use C.
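To make the failure mode concrete, here is a toy model in C. It is not R's real memory manager. It has a tiny heap, a protection stack that serves as the only root set, and a collector that runs on every allocation. All names (`toy_alloc`, `toy_protect`, `toy_unprotect`, `toy_gc`) are hypothetical stand-ins for the `PROTECT`/`UNPROTECT` discipline of R's C API. The sketch shows a pointer kept only in a C local being collected across an allocating call, and how protecting it fixes the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model with hypothetical names: a fixed-size heap, a protection stack
   that is the ONLY root set, and a collector that frees every cell not on
   that stack. It mimics the PROTECT/UNPROTECT discipline, not R's real GC. */
#define HEAP_SIZE 8
#define STACK_SIZE 8

typedef struct { bool live; int value; } Cell;

static Cell heap[HEAP_SIZE];
static Cell *protect_stack[STACK_SIZE];
static int protect_top = 0;

static void toy_protect(Cell *c) {
    assert(protect_top < STACK_SIZE);
    protect_stack[protect_top++] = c;
}

static void toy_unprotect(int n) {
    assert(n <= protect_top);
    protect_top -= n;
}

/* Cells hold no pointers, so a cell survives only if it is on the stack.
   Pointers kept only in C locals are invisible here, as they are to R's GC. */
static void toy_gc(void) {
    bool marked[HEAP_SIZE] = {false};
    for (int i = 0; i < protect_top; i++)
        marked[protect_stack[i] - heap] = true;
    for (int i = 0; i < HEAP_SIZE; i++)
        if (!marked[i]) heap[i].live = false;
}

/* Collects on EVERY allocation (similar in spirit to R's gctorture), then
   hands out the first free cell, possibly one that was just freed. */
static Cell *toy_alloc(int value) {
    toy_gc();
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].live) {
            heap[i].live = true;
            heap[i].value = value;
            return &heap[i];
        }
    }
    return NULL; /* heap exhausted */
}

/* BUG: x lives only in a C local while toy_alloc runs the collector. */
static int buggy_sum(void) {
    Cell *x = toy_alloc(40);
    Cell *y = toy_alloc(2); /* frees x and reuses its cell for y */
    return x->value + y->value;
}

/* FIX: x is on the protection stack across the allocating call. */
static int fixed_sum(void) {
    Cell *x = toy_alloc(40);
    toy_protect(x);
    Cell *y = toy_alloc(2);
    int sum = x->value + y->value;
    toy_unprotect(1);
    return sum;
}
```

In this toy, `buggy_sum()` returns 4 rather than 42, because the second allocation collects `x` and hands its cell to `y`, while `fixed_sum()` returns 42 and leaves the protection stack empty. In real R a collection does not happen on every allocation, which is why such bugs tend to appear only intermittently.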
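The reasoning a static checker has to automate can be sketched on a hypothetical straight-line mini-IR. Every allocating call is treated as a possible GC point. Any variable that holds a heap pointer but is not on the simulated protection stack at that point becomes "possibly stale", and a later use of it is reported. This is a toy illustration under those assumptions, not the tool's actual algorithm. It ignores the branches, loops, and interprocedural reasoning that a checker for real C code has to deal with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mini-IR for straight-line code; not the real tool's input. */
typedef enum { OP_ALLOC, OP_PROTECT, OP_UNPROTECT, OP_USE } OpKind;

typedef struct {
    OpKind kind;
    int arg; /* variable id for ALLOC/PROTECT/USE (ALLOC with -1: result
                discarded); number of entries to pop for UNPROTECT */
} Op;

#define MAX_VARS 16

static bool is_protected(const int *stack, int top, int var) {
    for (int s = 0; s < top; s++)
        if (stack[s] == var) return true;
    return false;
}

/* Returns the number of uses of possibly collected pointers. */
static int check_protection(const Op *ops, int n) {
    bool holds[MAX_VARS] = {false}; /* variable holds a heap pointer */
    bool stale[MAX_VARS] = {false}; /* a GC may have freed it since assignment */
    int stack[MAX_VARS];            /* simulated protection stack of variable ids */
    int top = 0, errors = 0;

    for (int i = 0; i < n; i++) {
        const Op *op = &ops[i];
        switch (op->kind) {
        case OP_ALLOC:
            /* Any allocation may run the GC: unprotected pointers become stale. */
            for (int v = 0; v < MAX_VARS; v++)
                if (holds[v] && !is_protected(stack, top, v))
                    stale[v] = true;
            if (op->arg >= 0) { /* the fresh result is assigned to a variable */
                holds[op->arg] = true;
                stale[op->arg] = false;
            }
            break;
        case OP_PROTECT:
            assert(top < MAX_VARS);
            stack[top++] = op->arg;
            break;
        case OP_UNPROTECT:
            assert(op->arg <= top);
            top -= op->arg;
            break;
        case OP_USE:
            if (stale[op->arg]) {
                fprintf(stderr, "op %d: variable %d may have been collected\n",
                        i, op->arg);
                errors++;
            }
            break;
        }
    }
    return errors;
}
```

For example, the sequence `{OP_ALLOC, 0}, {OP_ALLOC, 1}, {OP_USE, 0}` is reported once, and inserting `{OP_PROTECT, 0}` after the first allocation silences the report. A callee that starts to allocate corresponds to inserting an `{OP_ALLOC, -1}` step, which makes previously accepted uses reportable. Like any such check, it can also raise false alarms, for instance when the object is in fact reachable from another root.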