On Fri, 2018-01-12 at 18:05 +0000, Andrew Cooper wrote:
>
> If you unconditionally fill the RSB on every entry to supervisor mode,
> then there are never guest-controlled RSB values to be found.
>
> With that property (and IBRS to protect Skylake+), you shouldn't need
> RSB filling anywhere in the middle.

Yes, that's right.

We have a choice. We can do it on kernel entry (in the interrupt,
syscall and NMI paths), and that's nice and easy and really safe
because we know there's *never* a bad RSB entry lurking while we're in
the kernel.

The alternative, which is what we seem to be leaning towards now in the
latest tables from Dave (https://goo.gl/pXbvBE and
https://goo.gl/Grbuhf), is to do it on context switch, when we might be
switching from a shallow call stack to a deeper one. That has much
better performance characteristics for processes which make
non-sleeping syscalls.

The caveat with the latter approach is that we do depend on the fact
that context switches are the only call/return imbalance in the kernel.
But that's OK; we don't have a longjmp or anything else like that,
especially not one that goes into a *deeper* call stack. Do we?
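
The difference between the two fill points can be sketched with a toy
model. This is an illustration only, not kernel code: `ToyRSB`,
`simulate`, the 16-entry depth, the call depths of 2 and 5, and the
"user"/"kernel"/"safe" tags are all assumptions made up for the example.
It treats the RSB as a circular buffer that userspace may have filled
with its own entries, and records whose entry each kernel return
consumes.

```python
RSB_DEPTH = 16  # assumed depth, for illustration only


class ToyRSB:
    """Circular return stack buffer; each slot records who planted it."""

    def __init__(self):
        # Worst case: userspace ran last and controls every slot.
        self.slots = ["user"] * RSB_DEPTH
        self.top = 0
        self.fills = 0

    def call(self, origin):
        # A call pushes a predicted return target onto the buffer.
        self.top = (self.top + 1) % RSB_DEPTH
        self.slots[self.top] = origin

    def ret(self):
        # A return consumes the top slot; on underflow it wraps around
        # and reuses whatever stale entry is sitting there.
        origin = self.slots[self.top]
        self.top = (self.top - 1) % RSB_DEPTH
        return origin

    def fill(self):
        # Overwrite every slot with a harmless target.
        for _ in range(RSB_DEPTH):
            self.call("safe")
        self.fills += 1

    def run_userspace(self):
        # Userspace may re-poison the whole buffer between syscalls.
        for _ in range(RSB_DEPTH):
            self.call("user")


def simulate(policy, nonsleeping_syscalls=0):
    """policy is "none", "entry" or "context_switch" (hypothetical names)."""
    rsb = ToyRSB()
    consumed = []
    # Syscalls that return without sleeping: calls and returns balance.
    for _ in range(nonsleeping_syscalls):
        if policy == "entry":
            rsb.fill()
        rsb.call("kernel")
        consumed.append(rsb.ret())
        rsb.run_userspace()
    # A final syscall from task A that ends in a context switch.
    if policy == "entry":
        rsb.fill()
    for _ in range(2):  # task A is only 2 calls deep
        rsb.call("kernel")
    if policy == "context_switch":
        rsb.fill()
    for _ in range(5):  # task B was suspended 5 calls deep
        consumed.append(rsb.ret())
    return consumed, rsb.fills


if __name__ == "__main__":
    for policy in ("none", "entry", "context_switch"):
        consumed, fills = simulate(policy, nonsleeping_syscalls=100)
        print(policy, "user entry consumed:", "user" in consumed,
              "fills:", fills)
```

In this model neither fill policy ever lets a kernel return consume a
"user" entry, but with 100 non-sleeping syscalls the entry policy fills
101 times and the context-switch policy fills once. The model also bakes
in the caveat above: it assumes the context switch is the only place
where returns outnumber the calls made since kernel entry.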