On Tue, 2018-01-23 at 11:44 +0100, Ingo Molnar wrote:
> * David Woodhouse wrote:
>
> > Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get
> > to the point where we can patch *that* out into a NOP... or are you saying we
> > already can?
>
> Yes, we already can and do patch the 'call __fentry__/ mcount' call site into a
> NOP today - all 50,000+ call sites on a typical distro kernel.
>
> We did so for a long time - this is all a well established, working mechanism.

That's neat; I'd missed that.

> > But this is a digression. I was being pedantic about the "0 cycles" but sure,
> > this would be perfectly tolerable.
>
> It's not a digression in two ways:
>
> - I wanted to make it clear that for distro kernels it _is_ a zero cycles overhead
>   mechanism for non-SkyLake CPUs, literally.
>
> - I noticed that Meltdown and the CR3 writes for PTI appears to have established a
>   kind of ... insensitivity and numbness to kernel micro-costs, which peaked with
>   the per-syscall MSR write nonsense patch of the SkyLake workaround.
>   That attitude is totally unacceptable to me as x86 maintainer and yes, still
>   every cycle counts.

Yeah, absolutely. But here we're talking about the overhead on non-SKL,
and on non-SKL the IBRS overhead is zero too (well, again not precisely
zero because it turns into NOPs).

You're absolutely right that we shouldn't stop counting cycles. I've
already noted that on SKL IBRS is actually a lot faster than on earlier
generations, and we also get back some of the overhead by turning the
retpoline into a bare jmp again. We haven't *forgotten* about
performance.

I'd like to see your solution once the details are sorted out, and see
proper benchmarks, both microbenchmarks and real workloads, comparing
the two. And then make a reasoned decision based on that, and on how
happy we are with the theoretical holes that your solution leaves, in
the cold light of day.

We should also look at whether we want to set STIBP too, which is
somewhat orthogonal to using IBRS to protect the kernel, and could end
up with some of the same MSR writes (at least setting to zero) on some
of the same code paths.
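
To illustrate why IBRS and STIBP can end up sharing MSR writes: both are bits in the same register, IA32_SPEC_CTRL (MSR 0x48), with IBRS at bit 0 and STIBP at bit 1. The following is only a rough userspace sketch of composing that value, not code from any posted patch. The helper names spec_ctrl_value() and spec_ctrl_needs_write() are hypothetical, and no real WRMSR is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural constants (Intel SDM): one MSR carries both controls. */
#define MSR_IA32_SPEC_CTRL 0x48u
#define SPEC_CTRL_IBRS     (1ull << 0)  /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP    (1ull << 1)  /* Single Thread Indirect Branch Predictors */

/* Hypothetical helper: compose the value that would be written to the MSR. */
static uint64_t spec_ctrl_value(bool ibrs, bool stibp)
{
    uint64_t v = 0;

    if (ibrs)
        v |= SPEC_CTRL_IBRS;
    if (stibp)
        v |= SPEC_CTRL_STIBP;

    /* Only the two bits discussed here are modelled in this sketch. */
    assert((v & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) == 0);
    return v;
}

/*
 * Hypothetical helper: a transition only costs a WRMSR when the wanted
 * value differs from the current one. Clearing both bits on a path that
 * already clears IBRS is therefore the same single write of zero.
 */
static bool spec_ctrl_needs_write(uint64_t cur, uint64_t want)
{
    return cur != want;
}
```

Under these assumptions, a path that already writes zero to clear IBRS would clear STIBP in the same write, which is the overlap mentioned above.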