Overview
Internally, we’ve gotten requests for our clang-based compiler to help users mitigate their code against “skip-fault attacks”. I’ve come up with a solution that I would deem sufficient, but I wanted to reach out for feedback. In particular, my goal is to upstream my changes, and if there are any problems with the design, I would like to remedy them sooner rather than later.
Background
Users of our compiler are required to protect some parts of their code against a “skip-fault” attack, in which an attacker with physical access to a piece of hardware causes the microprocessor to “skip” a single instruction. The code they need to protect includes, for instance:
- Return instructions
- Branches, and conditional code feeding into those branches
- Reading/Writing specific regions of memory
- Calling functions
For simple cases like return instructions, all we’ve been asked to do is duplicate the branch instruction, like so:
    bx lr    ->    bx lr
                   bx lr

We’ve been told that while skipping a single instruction is a risk we should account for, skipping more than one has been deemed outside our margin of risk, and we shouldn’t need to plan for it.
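To make the single-skip threat model concrete, here is a minimal sketch in C. It is a toy model, not real ARM semantics: the opcode names and the idea that a skipped return "falls through" past the end of the function are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a function body is a list of opcodes, and
   "falling through" means a skipped return let execution run past the
   end of the function. */
enum op { OP_NOP, OP_RET };

/* Runs `code`, skipping the instruction at index `skip` (pass `n` for a
   fault-free run). Returns 1 if a return executed, 0 if we fell through. */
int returns_cleanly(const enum op *code, size_t n, size_t skip) {
    assert(code != NULL);
    for (size_t pc = 0; pc < n; pc++) {
        if (pc == skip) continue;          /* the attacker's single skip */
        if (code[pc] == OP_RET) return 1;
    }
    return 0;
}

/* Returns 1 if no single-instruction skip can make `code` fall through. */
int survives_any_single_skip(const enum op *code, size_t n) {
    for (size_t skip = 0; skip <= n; skip++) {
        if (!returns_cleanly(code, n, skip)) return 0;
    }
    return 1;
}
```

In this model a body ending in a single `OP_RET` fails the check, because skipping that one instruction falls through, while a body ending in two consecutive `OP_RET`s passes. Skipping two instructions would defeat it, which matches the stated margin of risk.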
Interface
The frontend is not set in stone and is subject to change, but I am currently moving forward on the assumption that I’ll be adding a new pragma of some kind, or a C++11-style (`[[...]]`) statement attribute.
    #pragma skip_fault frame_code|body
    int foo(int *arr, int n) {
      int acc = 0;
      for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
          acc += arr[i];
        } else {
          acc -= arr[i];
        }
      }
      return acc;
    }

This pragma will allow users to specify which aspects of their code they want skip-fault mitigated. Ideally, users would be able to request mitigation independently for different aspects of a function, such as the frame code, any control-flow instructions, etc. The pragma should ideally also work on a region of code, not just a whole function, like so:
    int foo(int *arr, int n) {
      int acc = 0;
      for (int i = 0; i < n; i++) {
        #pragma skip_fault control_flow
        if (i % 2 == 0) {
          acc += arr[i];
        } else {
          acc -= arr[i];
        }
      }
      return acc;
    }

In the above, only the if statement will get the appropriate mitigations, and the rest of the function will be generated as normal.
I imagine this pragma will add some metadata to the instructions, which will then be operated on after ISel.
Hand waving: I’m currently not sure how the metadata is expected to live past ISel.
LLVM Implementation
The plan for LLVM is to massage the IR to a point where all instructions that have the skip-fault metadata can just be naively duplicated. This involves making sure that all relevant instructions are idempotent. In my mind, this means that we must guarantee that for any given instruction, there should be no overlap between the registers being def’d and used. The appropriate massaging I believe can happen in two passes:
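The idempotency condition can be illustrated with a small C model. This is only a sketch: the `add_instr` struct, the four-register file, and the helper names are hypothetical and stand in for real machine instructions.

```c
#include <assert.h>

enum { NUM_REGS = 4 };

/* Hypothetical three-operand instruction:
   regs[def] = regs[use0] + regs[use1]. */
struct add_instr { int def, use0, use1; };

/* The condition from the text: the def'd register is not also a use. */
int is_idempotent(struct add_instr in) {
    return in.def != in.use0 && in.def != in.use1;
}

void exec_add(int regs[NUM_REGS], struct add_instr in) {
    assert(in.def >= 0 && in.def < NUM_REGS);
    assert(in.use0 >= 0 && in.use0 < NUM_REGS);
    assert(in.use1 >= 0 && in.use1 < NUM_REGS);
    regs[in.def] = regs[in.use0] + regs[in.use1];
}

/* Executes `in` `times` times starting from the given r0 and r1,
   and returns the value left in the def'd register. */
int run_repeated(struct add_instr in, int r0, int r1, int times) {
    int regs[NUM_REGS] = { r0, r1, 0, 0 };
    for (int i = 0; i < times; i++) exec_add(regs, in);
    return regs[in.def];
}
```

With `r0 = 5` and `r1 = 3`, the tied form `{0, 0, 1}` (r0 = r0 + r1) yields 8 when executed once but 11 when executed twice, so duplicating it changes the result. The untied form `{2, 0, 1}` (r2 = r0 + r1) yields 8 either way, which is what makes naive duplication safe.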
- Tied-Def Rewrite Pass
  - This pass is meant to remove all instances of instructions that use tied-defs, and rewrite them with instructions that don’t use tied-defs.
  - Since tied-defs by definition must have a single register be def’d and used, we must eliminate all of them, and replace them with idempotent instructions.
  - This pass should ideally run before register allocation, as the rewritten forms of the tied-def instructions will certainly use more registers.
- Register Allocation Fixup Pass

        ;; We may often run into a situation like the following:
        %0 = add killed %1, %2
        ;; Where an instruction has a killed register as one of its inputs,
        ;; which encourages the register allocator to reuse whatever register
        ;; was killed for one of the def'd registers, like so:
        $r0 = add $r0, $r1
        ;; The plan for the RA fixup pass was to go through register kills,
        ;; and instead move the kills to a separate instruction:
        %0 = add %1, %2
        kill %1
        ;; This way, the register allocator is forced to use distinct
        ;; registers for %0, %1, and %2.
        $r0 = add $r1, $r2
        kill $r1

  - This pass would need to run right before the register allocator, and would need another target hook in the optimized regalloc pipeline, because the scheduler pass that runs before regalloc will certainly shift around instructions, which may cause the kill instructions to move out of place.
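Why moving the kill forces distinct registers can be sketched with a toy live-range model in C. The numbering scheme here (each instruction reads at one point and writes at the next) is an assumption chosen for illustration, and is not meant to describe LLVM's actual live interval data structures.

```c
#include <assert.h>

/* Toy model (an assumption, not LLVM's data structures): instruction i
   reads its operands at point 2*i and writes its result at point 2*i + 1. */
struct live_range { int start, end; };    /* inclusive points */

int read_point(int instr)  { return 2 * instr; }
int write_point(int instr) { return 2 * instr + 1; }

/* A value defined at `def_instr` and last read (killed) at `kill_instr`. */
struct live_range make_range(int def_instr, int kill_instr) {
    assert(def_instr < kill_instr);
    struct live_range r = { write_point(def_instr), read_point(kill_instr) };
    return r;
}

/* Two values may share a physical register only if their ranges are
   disjoint; this returns 1 when they overlap. */
int interferes(struct live_range a, struct live_range b) {
    return a.start <= b.end && b.start <= a.end;
}
```

If %1 is killed by the same instruction that defines %0 (say instruction 2), its range ends at the read point just before %0's range begins, so the two do not interfere and may share a register. Moving the kill to a separate instruction 3 pushes %1's range past %0's definition, the ranges overlap, and the allocator has to pick different registers.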
The actual naive duplication pass should run post-RA, but before - maybe even right before - prologue epilogue insertion, as that’s the first pass to care about the size of the function.
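A minimal sketch of the duplication step itself, again as a C model rather than an LLVM pass. The `instr` struct, its `mitigate` flag (standing in for the skip-fault metadata), and the exhaustive single-skip checker are all hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_REGS = 4, MAX_INSTRS = 16 };

/* Hypothetical instruction: regs[def] = regs[use0] + regs[use1].
   `mitigate` stands in for the skip-fault metadata. */
struct instr { int def, use0, use1, mitigate; };

/* Emits every flagged instruction twice; returns the new length.
   `out` must have room for MAX_INSTRS entries. */
size_t duplicate_flagged(const struct instr *in, size_t n, struct instr *out) {
    assert(2 * n <= MAX_INSTRS);
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        out[len++] = in[i];
        if (in[i].mitigate) out[len++] = in[i];
    }
    return len;
}

/* Runs the program on `regs`, skipping index `skip` (pass `n` for no fault). */
void run(const struct instr *code, size_t n, size_t skip, int regs[NUM_REGS]) {
    for (size_t pc = 0; pc < n; pc++) {
        if (pc == skip) continue;
        regs[code[pc].def] = regs[code[pc].use0] + regs[code[pc].use1];
    }
}

/* Returns 1 if every single-skip run ends in the same register state
   as the fault-free run of the same code. */
int skip_safe(const struct instr *code, size_t n, const int init[NUM_REGS]) {
    int want[NUM_REGS], got[NUM_REGS];
    for (int r = 0; r < NUM_REGS; r++) want[r] = init[r];
    run(code, n, n, want);
    for (size_t skip = 0; skip < n; skip++) {
        for (int r = 0; r < NUM_REGS; r++) got[r] = init[r];
        run(code, n, skip, got);
        for (int r = 0; r < NUM_REGS; r++)
            if (got[r] != want[r]) return 0;
    }
    return 1;
}
```

For the program `r2 = r0 + r1; r3 = r2 + r1`, the undoubled code is not skip-safe, while the doubled code is. A doubled tied instruction such as `r0 = r0 + r1` still fails the check, which is why the rewrite and fixup passes have to run first.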
I’m a bit conflicted about the described implementation, and would like some feedback on the following if possible.
- I like the above design for how localized it is, but I wonder if skip-fault mitigation should be a more comprehensive change. Perhaps skip-fault mitigation should be an opt-level kind of option, where all passes should be aware of skip-fault mitigation. Should ISel be aware that we are compiling with skip-fault in mind, and not generate instructions with tied-defs?
- My rewrite pass doesn’t prevent future passes from generating instructions that have tied-defs. If any optimization pass happens between it and the duplication pass, we miss out on certain instructions being duplicated.
- Maybe instead of adding kills to the IR, the register allocator could be modified to extend live ranges by a single instruction? Would this be cleaner than modifying the IR?