On Fri, Aug 08, 2014 at 12:43:40PM -0400, Steven Rostedt wrote:
> On Fri, 8 Aug 2014 18:27:14 +0200
> Peter Zijlstra wrote:
>
> > On Fri, Aug 08, 2014 at 10:58:58AM -0400, Steven Rostedt wrote:
> >
> > > > > No, they are also used by optimized kprobes. This is why optimized
> > > > > kprobes depend on !CONFIG_PREEMPT. [ added Masami to the discussion ].
> > > >
> > > > How do those work? Is that one where the INT3 relocates the instruction
> > > > stream into an alternative 'text' and that JMPs back into the original
> > > > stream at the end?
> > >
> > > No, it's where we replace the 'int3' with a jump to a trampoline that
> > > simulates an INT3. Speeds things up quite a bit.
> >
> > OK, so the trivial 'fix' for that is to patch the probe site like:
> >
> >   preempt_disable();      INC   GS:%__preempt_count
> >   call trampoline;        CALL  0xDEADBEEF
> >   preempt_enable();       DEC   GS:%__preempt_count
> >                           JNZ   1f
> >                           CALL  ___preempt_schedule
> >                         1f:
> >
> > At which point the preempt_disable/enable() are the read side primitives
> > and call_rcu_sched/synchronize_sched are sufficient to release it.
> >
> > With the per-cpu preempt count stuff we have on x86 that is 4
> > instructions for the preempt_*() stuff -- they're 'big' instructions
> > though, since 3 have memops and 2 have a segment prefix.
> >
>
> Now the question is, how do you do that atomically? And safely.
> Currently, all we replace at the call sites is a nop that is added by
> gcc -pg and us replacing the call mcount with it. That looks much more
> complex than our current solution.

Same way kprobes already does it. You can place that kprobe anywhere as
long as the function is long enough. The JMP you write for the optimized
kprobes is often longer than the instruction it's patching.

So you start by writing the INT3, which is atomic; after that you 'copy'
the original text you're going to destroy into the tail of the
trampoline, followed by a 'return' JMP. Then you write the tail end of
the above sequence, and finally you 'fixup' the first instruction by
removing the INT3.

But yes, I'm not sure that's going to work for the mcount thing, but it
will work for kprobes just fine, since it's already doing this afaik.
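
To make the ordering above concrete, here is a rough userspace sketch that
replays the same steps on plain byte buffers. It is not the kernel's code:
the names poke_via_int3() and encode_jmp_rel32(), the MAX_PATCH limit and
the 'pretend' addresses are all made up for illustration. It assumes the
displaced bytes are whole, position-independent instructions, and it leaves
out the cross-CPU serialization and the INT3 handler a real implementation
would need between the steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE      0xCC  /* x86 one-byte breakpoint */
#define JMP_REL32_OPCODE 0xE9  /* x86 JMP rel32 */
#define JMP_REL32_SIZE   5
#define MAX_PATCH        32    /* arbitrary limit, for this sketch only */

/* Encode "JMP rel32" into out[0..4] for a jump located at 'from' going to 'to'. */
void encode_jmp_rel32(uint8_t *out, uint32_t from, uint32_t to)
{
	/* Displacement is relative to the next instruction; wraps mod 2^32. */
	uint32_t rel = to - (from + JMP_REL32_SIZE);

	out[0] = JMP_REL32_OPCODE;
	out[1] = (uint8_t)(rel);
	out[2] = (uint8_t)(rel >> 8);
	out[3] = (uint8_t)(rel >> 16);
	out[4] = (uint8_t)(rel >> 24);
}

/*
 * Replace len bytes at site[] with new_code[], going through an INT3.
 * site_addr and tail_addr are the pretend addresses of site[] and
 * tramp_tail[]; tramp_tail needs room for len + JMP_REL32_SIZE bytes.
 * Returns 0 on success, -1 if len is out of range.
 */
int poke_via_int3(uint8_t *site, uint32_t site_addr,
		  const uint8_t *new_code, size_t len,
		  uint8_t *tramp_tail, uint32_t tail_addr)
{
	uint8_t saved[MAX_PATCH];

	assert(site && new_code && tramp_tail);
	if (len < 1 || len > MAX_PATCH)
		return -1;

	/* Remember the text we are about to destroy (reading is harmless). */
	memcpy(saved, site, len);

	/* Step 1: single-byte write; anything arriving here now hits INT3. */
	site[0] = INT3_OPCODE;

	/* Step 2: displaced text goes to the trampoline tail, then JMP back. */
	memcpy(tramp_tail, saved, len);
	encode_jmp_rel32(tramp_tail + len, tail_addr + (uint32_t)len,
			 site_addr + (uint32_t)len);

	/* Step 3: write the tail of the new sequence, leaving the INT3. */
	memcpy(site + 1, new_code + 1, len - 1);

	/* Step 4: fix up the first byte, which removes the INT3. */
	site[0] = new_code[0];
	return 0;
}
```

The point of the ordering is that the first byte of the site is only ever
the old opcode, the INT3, or the new opcode, so nothing can start executing
at the site and see a half-written sequence.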