This is the V2 patchset of the kprobes jump optimization (a.k.a. OPTPROBES) for powerpc. Since kprobes is an indispensable tool for kernel developers, improving its performance is important. Currently, kprobes inserts a trap instruction to probe a running kernel. Jump optimization allows kprobes to replace the trap with a branch, which drastically reduces the probe overhead.

In this series, conditional branch instructions are not considered for optimization, as they have to be assessed carefully on SMP systems.

The kprobe placed on the kretprobe_trampoline at boot time is also optimized in this series; patch 4/4 provides this.

The first two patches can go in independently of the rest of the series. The helper functions they introduce are used in patch 3/4.

Performance:
============
An optimized kprobe on powerpc is 1.05 to 4.7 times faster than a regular kprobe.

Example:
A probe was placed at offset 0x50 in _do_fork().
*Time Diff here is the difference in time between just before hitting the probe and just after the probed instruction. mftb() is used in kernel/fork.c for this purpose.
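As a rough cross-check of the overhead reduction, the 20 Time Diff samples from each run in the log below can be averaged. The sketch that follows is plain userspace C; the array and function names are illustrative only and are not part of the series. With these samples the means come to about 513 and 223 timebase ticks, a ratio of roughly 2.3, which lies inside the range quoted above. It compares raw mftb() ticks only; converting to wall-clock time would need the timebase frequency, which the log does not state.

```c
#include <stddef.h>

/* Time Diff samples (mftb ticks) transcribed from the log in this letter. */
static const unsigned int unoptimized_ticks[] = {
	0x1f0, 0x1ee, 0x203, 0x1ec, 0x200, 0x1f0, 0x1ee, 0x212, 0x206, 0x1f3,
	0x1ec, 0x1fb, 0x1f0, 0x1f9, 0x283, 0x24d, 0x1ea, 0x1f0, 0x200, 0x1ed,
};
static const unsigned int optimized_ticks[] = {
	0x103, 0x181, 0x15e, 0x0f0, 0x0d0, 0x0ad, 0x0e0, 0x0be, 0x0c3, 0x0c7,
	0x0c0, 0x0c0, 0x0da, 0x0bb, 0x0c4, 0x0c0, 0x0cd, 0x0cd, 0x0cb, 0x0f0,
};

#define NSAMPLES(a) (sizeof(a) / sizeof((a)[0]))

/* Sum of n tick counts. */
unsigned long sum_ticks(const unsigned int *v, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum;
}

/* Mean unoptimized Time Diff divided by mean optimized Time Diff. */
double mean_speedup(void)
{
	double unopt = (double)sum_ticks(unoptimized_ticks,
					 NSAMPLES(unoptimized_ticks)) /
		       NSAMPLES(unoptimized_ticks);
	double opt = (double)sum_ticks(optimized_ticks,
				       NSAMPLES(optimized_ticks)) /
		     NSAMPLES(optimized_ticks);

	return unopt / opt;
}
```

A mean hides the spread (for example, the 0x283 outlier in the unoptimized run), so per-sample minimum and maximum would be worth reporting alongside it.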
# echo 0 > /proc/sys/debug/kprobes-optimization
Kprobes globally unoptimized
[ 233.607120] Time Diff = 0x1f0
[ 233.608273] Time Diff = 0x1ee
[ 233.609228] Time Diff = 0x203
[ 233.610400] Time Diff = 0x1ec
[ 233.611335] Time Diff = 0x200
[ 233.612552] Time Diff = 0x1f0
[ 233.613386] Time Diff = 0x1ee
[ 233.614547] Time Diff = 0x212
[ 233.615570] Time Diff = 0x206
[ 233.616819] Time Diff = 0x1f3
[ 233.617773] Time Diff = 0x1ec
[ 233.618944] Time Diff = 0x1fb
[ 233.619879] Time Diff = 0x1f0
[ 233.621066] Time Diff = 0x1f9
[ 233.621999] Time Diff = 0x283
[ 233.623281] Time Diff = 0x24d
[ 233.624172] Time Diff = 0x1ea
[ 233.625381] Time Diff = 0x1f0
[ 233.626358] Time Diff = 0x200
[ 233.627572] Time Diff = 0x1ed

# echo 1 > /proc/sys/debug/kprobes-optimization
Kprobes globally optimized
[ 70.797075] Time Diff = 0x103
[ 70.799102] Time Diff = 0x181
[ 70.801861] Time Diff = 0x15e
[ 70.803466] Time Diff = 0xf0
[ 70.804348] Time Diff = 0xd0
[ 70.805653] Time Diff = 0xad
[ 70.806477] Time Diff = 0xe0
[ 70.807725] Time Diff = 0xbe
[ 70.808541] Time Diff = 0xc3
[ 70.810191] Time Diff = 0xc7
[ 70.811007] Time Diff = 0xc0
[ 70.812629] Time Diff = 0xc0
[ 70.813640] Time Diff = 0xda
[ 70.814915] Time Diff = 0xbb
[ 70.815726] Time Diff = 0xc4
[ 70.816955] Time Diff = 0xc0
[ 70.817778] Time Diff = 0xcd
[ 70.818999] Time Diff = 0xcd
[ 70.820099] Time Diff = 0xcb
[ 70.821333] Time Diff = 0xf0

Implementation:
===============
The trap instruction is replaced by a branch to a detour buffer. Because a relative branch instruction on the Power architecture can only reach targets within ±32 MB, the detour buffer slot is allocated from a reserved area, which ensures that the branch stays within that range. The current kprobes instruction caches allocate memory for instruction slots with module_alloc(), and that memory is always beyond the ±32 MB range.

The detour buffer contains a call to optimized_callback(), which in turn calls the pre_handler().
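The ±32 MB limit follows from the encoding of the relative `b` instruction (primary opcode 18): its 24-bit LI field is a word offset, which becomes a signed 26-bit byte offset once shifted left by two. Below is a minimal userspace sketch, under stated assumptions, of how such a range check and the other eligibility rules described in this letter could fit together. The names offset_in_branch_range(), make_rel_branch(), is_cond_branch() and can_optimize() are hypothetical and are not the code in this series. The sketch also assumes the branch back to the probed function sits in the last word of the detour slot, and it omits the analyse_instr() emulation check entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 26-bit, word-aligned byte offset: -32 MB .. +32 MB - 4. */
bool offset_in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/* Encode `b to` placed at address `from`; returns 0 if it cannot reach. */
uint32_t make_rel_branch(uint64_t from, uint64_t to)
{
	/* Relies on two's-complement wrap when converting to int64_t. */
	int64_t offset = (int64_t)(to - from);

	if (!offset_in_branch_range(offset))
		return 0;
	return 0x48000000u | ((uint32_t)offset & 0x03fffffcu);
}

/*
 * Branch-conditional family, classified by opcode only: bc (16), and
 * bclr/bcctr/bctar (19 with XO 16/528/560). This conservatively also
 * flags blr and bctr, which are bclr/bcctr with BO = 0b10100.
 */
bool is_cond_branch(uint32_t insn)
{
	uint32_t opcode = insn >> 26;

	if (opcode == 16)
		return true;
	if (opcode == 19) {
		switch ((insn >> 1) & 0x3ff) {	/* extended opcode */
		case 16:	/* bclr  */
		case 528:	/* bcctr */
		case 560:	/* bctar */
			return true;
		}
	}
	return false;
}

/*
 * Hypothetical eligibility check: the probe site must reach the slot,
 * the branch back (assumed at the slot's last word) must reach
 * probe + 4, and the probed instruction must not be a conditional branch.
 */
bool can_optimize(uint64_t probe, uint64_t slot, uint64_t slot_bytes,
		  uint32_t insn)
{
	uint64_t back = slot + slot_bytes - 4;

	if (is_cond_branch(insn))
		return false;
	return make_rel_branch(probe, slot) != 0 &&
	       make_rel_branch(back, probe + 4) != 0;
}
```

Both directions are checked because the detour buffer must branch back to the instruction after the probe. This is also why a slot obtained from module_alloc(), which is out of reach of the kernel text, cannot be used.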
Once the pre_handler() has run, the original instruction is emulated from the detour buffer itself. The detour buffer also contains a branch back to the normal flow of execution, taken after the probed instruction has been emulated.

Before preparing the optimization, kprobes inserts a regular (breakpoint instruction) kprobe at the specified address. So, even if the kprobe cannot be optimized, it simply remains a normal kprobe.

Limitations:
============
- The number of probes that can be optimized is limited by the size of the reserved area.
- Currently, only instructions that can be emulated using analyse_instr() are candidates for optimization.
- Conditional branch instructions are not optimized.
- Probes in the kernel module region are not currently considered for optimization.

Links to the V1 patchset:
https://lkml.org/lkml/2016/9/7/171
https://lkml.org/lkml/2016/9/7/174
https://lkml.org/lkml/2016/9/7/172
https://lkml.org/lkml/2016/9/7/173

Changes from v1:
- Merged the three patches in V1 into a single patch.
- Addressed the comments from Masami.
- Implemented some helper functions in separate patches.
- Implemented optimization for the kprobe placed on the kretprobe_trampoline at boot time.

Kindly let me know your suggestions and comments.

Thanks,
-Anju

Anju T Sudhakar (2):
  arch/powerpc: Implement Optprobes
  arch/powerpc: Optimize kprobe in kretprobe_trampoline

Naveen N. Rao (2):
  powerpc: asm/ppc-opcode.h: introduce __PPC_SH64()
  powerpc: add helper to check if offset is within rel branch range

 .../features/debug/optprobes/arch-support.txt |   2 +-
 arch/powerpc/Kconfig                          |   1 +
 arch/powerpc/include/asm/code-patching.h      |   1 +
 arch/powerpc/include/asm/kprobes.h            |  23 +-
 arch/powerpc/include/asm/ppc-opcode.h         |   1 +
 arch/powerpc/include/asm/sstep.h              |   1 +
 arch/powerpc/kernel/Makefile                  |   1 +
 arch/powerpc/kernel/kprobes.c                 |   8 +
 arch/powerpc/kernel/optprobes.c               | 332 +++++++++++++++++++++
 arch/powerpc/kernel/optprobes_head.S          | 135 +++++++++
 arch/powerpc/lib/code-patching.c              |  24 +-
 arch/powerpc/lib/sstep.c                      |  22 ++
 arch/powerpc/net/bpf_jit.h                    |  11 +-
 13 files changed, 553 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/kernel/optprobes.c
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

-- 
2.7.4