On Tue, Jan 24, 2017 at 08:52:22PM +0000, Maciej W. Rozycki wrote: > On Tue, 24 Jan 2017, James Hogan wrote: > > > > Can't for example the low-level exception handling entry/exit code be > > > moved out of the way of the EVA overlap range and then all watchpoints > > > masked for the duration of kernel mode execution? This would be quite > > > expensive, however it could only be executed if a task flag indicates > > > watchpoints are being used. > > > > That doesn't cover data watches. RAM would still need accessing, e.g. to > > save/restore the watch state from the thread context, or even to read > > the task flag, and stack accesses in C code. > > All the critical data structures would have to be outside the EVA > overlap. This in itself is awkward. If a SoC supports multiple RAM sizes, e.g. up to 2GB, you might want a single EVA kernel that could support both. Normally you could just go with a 2GB compatible layout (for the sake of argument, lets say RAM cached @ kernel VA 0x40000000 .. 0xBFFFFFFF, ignoring BEV overlays for now), but if less than 1GB is fitted then none of that RAM is outside of the user overlap range. > > > The only safe way for it to work would be to somehow disable or inhibit > > watchpoints before clearing EXL, and re-enable them after setting EXL, > > though you'd still get a loop of deferred watchpoints if it hit on the > > way out to user mode unless cleared at the last moment before ERET. > > Ah, I forgot about CP0.Cause.WP -- is it not enough to clear the bit to > have any pending exception cancelled? If so (and the architecture manual > is actually clear that it is) then it looks like we have a solution and we > don't have to place any code or data specially, although it'll have to be > carefully coded. > > > > Alternatively perhaps we could clobber > > > CP0.EntryHi.ASID, at least temporarily; that would be cheaper. > > > > Kernel mode still needs to access the user address space. > > Sure, that's why it would have to be temporary. Low-level exception > entry/exit code is not supposed to have a need to access user memory. Ah I see, really temporary. > > So we can put aside a certain ASID value, say 0 (for easy pasting with > INS from $0), and never use it for a real context. Then it can be cleared > right away at the general exception entry point if EVA is in use, say: > > mfc0 $k0, $10 > ins $k0, $zero, 0, 10 > mtc0 $k0, $10 > > (there'll be a hazard here, but we can clear it later on if needed). > There is no neeed to save the old ASID as we can retrieve the original > from our data structures. > > Then we can proceed with the usual switch to the kernel mode, switching > stacks, saving registers, etc. We can then check CP0.Cause.WP and stash > it away for further processing if needed (though discarding it would I > think be the usual if not only choice) and clear, with a hazard barrier, > right before CP0.Status.EXL is cleared. > > Now that we're in the regular kernel mode, with ASID still set to 0, we > can check if process tracing has been enabled and if so, then iterate over > the watchpoints registers masking them all. At this point we can restore > the correct ASID in CP0.EntryHi and proceed with the exception handler. > > And then we'd do the reverse in the exception epilogue, only restoring > the ASID as the last instruction before final ERET. > > > Alternatively we could set WatchHi.ASID to a reserved one, and only > > clear/set the WatchHi.G bit (to bypass the ASID match) at the first/last > > moment while EXL=1. It still wouldn't protect against code watches > > around there exposing the kernel address of that code by the resulting > > hang though, so would need to move the ebase out of the overlap range > > too (which would have to be platform specific). > > You'd still have to iterate over all WatchHi registers, a variable number > up to 8 architecturally, which I think would be too expensive for the > common exception path. > > Poking at ASID as I described above is just a couple of instructions at > entry and exit, and the rest would only be done if tracing is active. > Plus you don't actually have to move anything away, except from the final > ERET, though likely not even that, owing to the delayed nature of an ASID > update. Probably, so long as you ignore QEMU. > > So can you find a flaw in my proposal so far? not a functional one. > We'll have to think about > the TLB refill handler yet though. A deferred watch from refill handler (e.g. page tables) would I think trigger an immediate watch exception on eret, and get cleared / ignored. It would probably make enough of a timing difference for userland to reliably detect (in order to probe where the process' page tables are in kernel virtual memory, to be able to mount a more successful attack given some other vulnerability). Cheers James