* [PATCH 0/3] OPTPROBES for powerpc
@ 2016-09-07  9:33 Anju T Sudhakar
From: Anju T Sudhakar @ 2016-09-07  9:33 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant, mahesh,
	mhiramat, Anju T Sudhakar

This is the patchset for kprobes jump optimization (a.k.a. OPTPROBES)
on powerpc. Kprobes is an indispensable tool for kernel developers,
so improving its performance is of considerable importance.

Currently kprobes inserts a trap instruction to probe a running kernel.
Jump optimization allows kprobes to replace the trap with a branch,
reducing the probe overhead drastically.

In this series, conditional branch instructions are not considered for
optimization as they have to be assessed carefully in SMP systems.


Performance:
=============
On powerpc, an optimized kprobe is 1.05 to 4.7 times faster than a
normal (trap-based) kprobe.

Example:

Placed a probe at offset 0x50 in _do_fork().
*Time Diff here is the difference between the timestamp taken just
before the probe is hit and the one taken just after the probed
instruction. mftb() is employed in kernel/fork.c for this purpose,
roughly as sketched below.
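
The measurement hook itself is not part of the posted patches; the
following is a minimal sketch, assuming timebase reads were placed
around the probe site in kernel/fork.c (variable names and exact
placement are assumptions):

	unsigned long t1, t2;

	t1 = mftb();	/* timebase read just before the probe site */
	/* ... the probed instruction at _do_fork+0x50 executes here ... */
	t2 = mftb();	/* timebase read after the probed instruction */
	pr_info("Time Diff = 0x%lx\n", t2 - t1);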

# echo 0 > /proc/sys/debug/kprobes-optimization
Kprobes globally unoptimized
[  233.607120] Time Diff = 0x1f0
[  233.608273] Time Diff = 0x1ee
[  233.609228] Time Diff = 0x203
[  233.610400] Time Diff = 0x1ec
[  233.611335] Time Diff = 0x200
[  233.612552] Time Diff = 0x1f0
[  233.613386] Time Diff = 0x1ee
[  233.614547] Time Diff = 0x212
[  233.615570] Time Diff = 0x206
[  233.616819] Time Diff = 0x1f3
[  233.617773] Time Diff = 0x1ec
[  233.618944] Time Diff = 0x1fb
[  233.619879] Time Diff = 0x1f0
[  233.621066] Time Diff = 0x1f9
[  233.621999] Time Diff = 0x283
[  233.623281] Time Diff = 0x24d
[  233.624172] Time Diff = 0x1ea
[  233.625381] Time Diff = 0x1f0
[  233.626358] Time Diff = 0x200
[  233.627572] Time Diff = 0x1ed

# echo 1 > /proc/sys/debug/kprobes-optimization
Kprobes globally optimized
[   70.797075] Time Diff = 0x103
[   70.799102] Time Diff = 0x181
[   70.801861] Time Diff = 0x15e
[   70.803466] Time Diff = 0xf0
[   70.804348] Time Diff = 0xd0
[   70.805653] Time Diff = 0xad
[   70.806477] Time Diff = 0xe0
[   70.807725] Time Diff = 0xbe
[   70.808541] Time Diff = 0xc3
[   70.810191] Time Diff = 0xc7
[   70.811007] Time Diff = 0xc0
[   70.812629] Time Diff = 0xc0
[   70.813640] Time Diff = 0xda
[   70.814915] Time Diff = 0xbb
[   70.815726] Time Diff = 0xc4
[   70.816955] Time Diff = 0xc0
[   70.817778] Time Diff = 0xcd
[   70.818999] Time Diff = 0xcd
[   70.820099] Time Diff = 0xcb
[   70.821333] Time Diff = 0xf0

Implementation:
===================

The trap instruction is replaced by a branch to a detour buffer. To
work within the limited reach of a branch instruction on the Power
architecture, the detour buffer slot is allocated from a reserved
area; this ensures that the branch stays within the ±32MB range.
Patch 2/3 provides this. The current kprobes insn caches allocate the
memory area for insn slots with module_alloc(), which always lands
beyond the ±32MB range.
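
Concretely, the reachability requirement reduces to the check below.
This is a sketch of the same arithmetic that patch 2/3 performs in
arch_prepare_optimized_kprobe(); the helper name is illustrative:

/*
 * A 'b' instruction encodes a signed, word-aligned offset in its
 * 24-bit LI field (a 26-bit byte displacement), so the target must
 * lie within [-0x2000000, 0x1fffffc] bytes of the branch site.
 */
static bool branch_reachable(unsigned long from, unsigned long to)
{
	long offset = (long)to - (long)from;

	return offset >= -0x2000000 && offset <= 0x1fffffc &&
	       !(offset & 0x3);
}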

The detour buffer contains a call to optimized_callback(), which in
turn calls the pre_handler(). Once the pre-handler has run, the
original instruction is emulated from the detour buffer itself. The
detour buffer also ends with a branch back to the normal work flow
after the probed instruction has been emulated. Before preparing the
optimization, kprobes inserts the original (trap-based) kprobe at the
specified address, so if the kprobe cannot be optimized it simply
remains a normal kprobe.
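
Nothing changes for kprobes users: a probe registered the usual way
is optimized transparently when possible. A minimal sketch, using the
probe point from the measurements above (the handler and names are
illustrative):

static int sample_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "_do_fork",
	.offset		= 0x50,
	.pre_handler	= sample_pre_handler,
};

/*
 * register_kprobe(&kp) arms a normal (trap-based) kprobe first; the
 * kprobes optimizer later replaces the trap with a branch if
 * arch_prepare_optimized_kprobe() succeeds.
 */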

Limitations:
==============
- The number of probes which can be optimized is limited by the size
  of the reserved area.
- Only instructions which can be emulated are candidates for
  optimization.
- Conditional branch instructions are not optimized.
- Probes in the kernel module region are not currently considered for
  optimization.

RFC patchset for optprobes: https://lkml.org/lkml/2016/5/31/375
			    https://lkml.org/lkml/2016/5/31/376
			    https://lkml.org/lkml/2016/5/31/377
			    https://lkml.org/lkml/2016/5/31/378 

Changes from RFC-v3:

- Optimization of kprobes (in the case of branch instructions) is
  limited to unconditional branch instructions only, since conditional
  branches have to be assessed carefully on SMP systems.
- create_return_branch() is omitted.
- Comments by Masami are addressed.
 

Anju T Sudhakar (3):
  arch/powerpc : Add detour buffer support for optprobes
  arch/powerpc : optprobes for powerpc core
  arch/powerpc : Enable optprobes support in powerpc

 .../features/debug/optprobes/arch-support.txt      |   2 +-
 arch/powerpc/Kconfig                               |   1 +
 arch/powerpc/include/asm/kprobes.h                 |  24 ++
 arch/powerpc/include/asm/sstep.h                   |   1 +
 arch/powerpc/kernel/Makefile                       |   1 +
 arch/powerpc/kernel/optprobes.c                    | 329 +++++++++++++++++++++
 arch/powerpc/kernel/optprobes_head.S               | 119 ++++++++
 arch/powerpc/lib/sstep.c                           |  21 ++
 8 files changed, 497 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/optprobes.c
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

-- 
2.7.4

* [PATCH 1/3] arch/powerpc : Add detour buffer support for optprobes
@ 2016-09-07  9:33 ` Anju T Sudhakar
From: Anju T Sudhakar @ 2016-09-07  9:33 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant, mahesh,
	mhiramat, Anju T Sudhakar

The detour buffer contains instructions to create an in-memory
pt_regs. After the pre-handler has executed, a call is made for
instruction emulation. The NIP is known only after the probed
instruction has been executed, hence a branch instruction is created
to the NIP returned by emulate_step() (see the sketch below).

The instruction slot for the detour buffer is allocated from the
reserved area. For the time being, 64KB is reserved in memory for
this purpose.
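
For context, a sketch of how the branch-back target can be derived,
modelled on can_optimize() in patch 2/3 (error handling elided; this
helper does not appear in the patches themselves):

/*
 * emulate_step() advances regs->nip past the probed instruction, so
 * the updated nip is the address the detour buffer branches back to.
 */
static unsigned long branch_back_target(struct kprobe *p)
{
	struct pt_regs regs;

	memcpy(&regs, current_pt_regs(), sizeof(regs));
	regs.nip = (unsigned long)p->addr;
	if (emulate_step(&regs, *p->ainsn.insn) != 1)
		return 0;	/* cannot be emulated, so not optimizable */
	return regs.nip;
}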

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kprobes.h   |  24 +++++++
 arch/powerpc/kernel/optprobes_head.S | 119 +++++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index 2c9759bd..2109ce03 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,7 +38,25 @@ struct pt_regs;
 struct kprobe;
 
 typedef ppc_opcode_t kprobe_opcode_t;
+
+extern kprobe_opcode_t optinsn_slot;
+/* Optinsn template address */
+extern kprobe_opcode_t optprobe_template_entry[];
+extern kprobe_opcode_t optprobe_template_call_handler[];
+extern kprobe_opcode_t optprobe_template_call_emulate[];
+extern kprobe_opcode_t optprobe_template_ret[];
+extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_kp_addr[];
+extern kprobe_opcode_t optprobe_template_op_address1[];
+extern kprobe_opcode_t optprobe_template_end[];
+
 #define MAX_INSN_SIZE 1
+#define MAX_OPTIMIZED_LENGTH	4
+#define MAX_OPTINSN_SIZE				\
+	(((unsigned long)&optprobe_template_end -	\
+	(unsigned long)&optprobe_template_entry) /	\
+	sizeof(kprobe_opcode_t))
+#define RELATIVEJUMP_SIZE	4
 
 #ifdef PPC64_ELF_ABI_v2
 /* PPC64 ABIv2 needs local entry point */
@@ -124,6 +142,12 @@ struct kprobe_ctlblk {
 	struct prev_kprobe prev_kprobe;
 };
 
+struct arch_optimized_insn {
+	kprobe_opcode_t copied_insn[1];
+	/* detour buffer */
+	kprobe_opcode_t *insn;
+};
+
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 					unsigned long val, void *data);
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
new file mode 100644
index 0000000..73db1df
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes_head.S
@@ -0,0 +1,119 @@
+/*
+ * Code to prepare detour buffer for optprobes in Kernel.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/ptrace.h>
+#include <asm/asm-offsets.h>
+
+#define	OPT_SLOT_SIZE	65536
+
+.balign	2
+.global optinsn_slot
+optinsn_slot:
+	/* Reserve an area to allocate slots for detour buffer */
+	.space	OPT_SLOT_SIZE
+
+/* Create an in-memory pt_regs */
+.global optprobe_template_entry
+optprobe_template_entry:
+	stdu	r1,-INT_FRAME_SIZE(r1)
+	SAVE_GPR(0,r1)
+	/* Save the previous SP into stack */
+	addi	r0,r1,INT_FRAME_SIZE
+	std	r0,GPR1(r1)
+	SAVE_10GPRS(2,r1)
+	SAVE_10GPRS(12,r1)
+	SAVE_10GPRS(22,r1)
+	/* Save SPRS */
+	mfmsr	r5
+	std	r5,_MSR(r1)
+	li	r5,0
+	std	r5,ORIG_GPR3(r1)
+	std	r5,_TRAP(r1)
+	std	r5,RESULT(r1)
+	mfctr	r5
+	std	r5,_CTR(r1)
+	mflr	r5
+	std	r5,_LINK(r1)
+	mfspr	r5,SPRN_XER
+	std	r5,_XER(r1)
+	mfcr	r5
+	std	r5,_CCR(r1)
+	lbz     r5,PACASOFTIRQEN(r13)
+	std     r5,SOFTE(r1)
+	mfdar	r5
+	std	r5,_DAR(r1)
+	mfdsisr	r5
+	std	r5,_DSISR(r1)
+
+/* Save p->addr into stack */
+.global optprobe_template_kp_addr
+optprobe_template_kp_addr:
+	nop
+	nop
+	nop
+	nop
+	nop
+	std	r3,_NIP(r1)
+
+/* Pass parameters for optimized_callback */
+.global optprobe_template_op_address1
+optprobe_template_op_address1:
+	nop
+	nop
+	nop
+	nop
+	nop
+	addi	r4,r1,STACK_FRAME_OVERHEAD
+
+/* Branch to optimized_callback() */
+.global optprobe_template_call_handler
+optprobe_template_call_handler:
+	nop
+	/* Pass parameters for instruction emulation */
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+.global optprobe_template_insn
+optprobe_template_insn:
+	nop
+	nop
+
+/* Branch to instruction emulation  */
+.global optprobe_template_call_emulate
+optprobe_template_call_emulate:
+	nop
+	/* Restore the registers */
+	ld	r5,_MSR(r1)
+	mtmsr	r5
+	ld	r5,_CTR(r1)
+	mtctr	r5
+	ld	r5,_LINK(r1)
+	mtlr	r5
+	ld	r5,_XER(r1)
+	mtxer	r5
+	ld	r5,_CCR(r1)
+	mtcr	r5
+	ld	r5,_DAR(r1)
+	mtdar	r5
+	ld	r5,_DSISR(r1)
+	mtdsisr	r5
+	REST_GPR(0,r1)
+	REST_10GPRS(2,r1)
+	REST_10GPRS(12,r1)
+	REST_10GPRS(22,r1)
+	/* Restore the previous SP */
+	addi	r1,r1,INT_FRAME_SIZE
+
+/* Jump back to the normal workflow from trampoline */
+.global optprobe_template_ret
+optprobe_template_ret:
+	nop
+.global optprobe_template_end
+optprobe_template_end:
-- 
1.8.3.1

* [PATCH 2/3] arch/powerpc : optprobes for powerpc core
@ 2016-09-07  9:33 ` Anju T Sudhakar
From: Anju T Sudhakar @ 2016-09-07  9:33 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant, mahesh,
	mhiramat, Anju T Sudhakar

Instructions which can be emulated are candidates for optimization.
Before optimizing, ensure that the address range between the allocated
detour buffer and the instruction being probed is within the ±32MB
range.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/sstep.h |   1 +
 arch/powerpc/kernel/optprobes.c  | 329 +++++++++++++++++++++++++++++++++++++++
 arch/powerpc/lib/sstep.c         |  21 +++
 3 files changed, 351 insertions(+)
 create mode 100644 arch/powerpc/kernel/optprobes.c

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index d3a42cc..cd5f6ab 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -25,6 +25,7 @@ struct pt_regs;
 
 /* Emulate instructions that cause a transfer of control. */
 extern int emulate_step(struct pt_regs *regs, unsigned int instr);
+extern int optprobe_conditional_branch_check(unsigned int instr);
 
 enum instruction_type {
 	COMPUTE,		/* arith/logical/CR op, etc. */
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
new file mode 100644
index 0000000..7983d07
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes.c
@@ -0,0 +1,329 @@
+/*
+ * Code for Kernel probes Jump optimization.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kprobes.h>
+#include <linux/jump_label.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <asm/kprobes.h>
+#include <asm/ptrace.h>
+#include <asm/cacheflush.h>
+#include <asm/code-patching.h>
+#include <asm/sstep.h>
+
+DEFINE_INSN_CACHE_OPS(ppc_optinsn)
+
+#define TMPL_CALL_HDLR_IDX	\
+	(optprobe_template_call_handler - optprobe_template_entry)
+#define TMPL_EMULATE_IDX	\
+	(optprobe_template_call_emulate - optprobe_template_entry)
+#define TMPL_RET_IDX	\
+	(optprobe_template_ret - optprobe_template_entry)
+#define TMPL_KP_IDX	\
+	(optprobe_template_kp_addr - optprobe_template_entry)
+#define TMPL_OP1_IDX	\
+	(optprobe_template_op_address1 - optprobe_template_entry)
+#define TMPL_INSN_IDX	\
+	(optprobe_template_insn - optprobe_template_entry)
+#define TMPL_END_IDX	\
+	(optprobe_template_end - optprobe_template_entry)
+
+static bool insn_page_in_use;
+
+static void *__ppc_alloc_insn_page(void)
+{
+	if (insn_page_in_use)
+		return NULL;
+	insn_page_in_use = true;
+	return &optinsn_slot;
+}
+
+static void __ppc_free_insn_page(void *page __maybe_unused)
+{
+	insn_page_in_use = false;
+}
+
+struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
+	.mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
+	.pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
+	/* insn_size initialized later */
+	.alloc = __ppc_alloc_insn_page,
+	.free = __ppc_free_insn_page,
+	.nr_garbage = 0,
+};
+
+kprobe_opcode_t *ppc_get_optinsn_slot(struct optimized_kprobe *op)
+{
+	/*
+	 * The insn slot is allocated from the reserved
+	 * area(ie &optinsn_slot).We are not optimizing probes
+	 * at module_addr now.
+	 */
+	if (is_kernel_addr((unsigned long)op->kp.addr))
+		return get_ppc_optinsn_slot();
+	return NULL;
+}
+
+static void ppc_free_optinsn_slot(struct optimized_kprobe *op)
+{
+	if (!op->optinsn.insn)
+		return;
+	if (is_kernel_addr((unsigned long)op->kp.addr))
+		free_ppc_optinsn_slot(op->optinsn.insn, 0);
+}
+
+static unsigned long can_optimize(struct kprobe *p)
+{
+	struct pt_regs *regs;
+	unsigned int instr;
+
+	/*
+	 * Not optimizing the kprobe placed by
+	 * kretprobe during boot time
+	 */
+	if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
+		return 0;
+
+	regs = kmalloc(sizeof(*regs), GFP_KERNEL);
+	if (!regs)
+		return -ENOMEM;
+	memset(regs, 0, sizeof(struct pt_regs));
+	memcpy(regs, current_pt_regs(), sizeof(struct pt_regs));
+	regs->nip = (unsigned long)p->addr;
+	instr = *p->ainsn.insn;
+
+	/* Ensure the instruction can be emulated */
+	if (emulate_step(regs, instr) != 1)
+		return 0;
+	/* Conditional branches are not optimized */
+	if (optprobe_conditional_branch_check(instr) != 1)
+		return 0;
+	return regs->nip;
+}
+
+static void
+optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
+{
+	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (kprobe_running()) {
+		kprobes_inc_nmissed_count(&op->kp);
+	} else {
+		__this_cpu_write(current_kprobe, &op->kp);
+		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+		opt_pre_handler(&op->kp, regs);
+		__this_cpu_write(current_kprobe, NULL);
+	}
+	local_irq_restore(flags);
+}
+NOKPROBE_SYMBOL(optimized_callback);
+
+void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
+{
+	ppc_free_optinsn_slot(op);
+	op->optinsn.insn = NULL;
+}
+
+/*
+ * emulate_step() requires insn to be emulated as
+ * second parameter. Load register 'r4' with the
+ * instruction.
+ */
+void create_load_emulate_insn(unsigned int insn, kprobe_opcode_t *addr)
+{
+	u32 instr, instr2;
+
+	/* synthesize addis r4,0,(insn)@h */
+	 instr = 0x3c000000 | 0x800000 | ((insn >> 16) & 0xffff);
+	*addr++ = instr;
+
+	/* ori r4,r4,(insn)@l */
+	instr2 = 0x60000000 | 0x40000 | 0x800000;
+	instr2 = instr2 | (insn & 0xffff);
+	*addr = instr2;
+}
+
+/*
+ * optimized_kprobe structure is required as a parameter
+ * for invoking optimized_callback() from detour buffer.
+ * Load this value into register 'r3'.
+ */
+void create_load_address_insn(unsigned long val, kprobe_opcode_t *addr)
+{
+	u32 instr1, instr2, instr3, instr4, instr5;
+	/*
+	 * 64bit immediate load into r3.
+	 * lis r3,(op)@highest
+	 */
+	instr1 = 0x3c000000 | 0x600000 | ((val >> 48) & 0xffff);
+	*addr++ = instr1;
+
+	/* ori r3,r3,(op)@higher */
+	instr2 = 0x60000000 | 0x30000 | 0x600000 | ((val >> 32) & 0xffff);
+	*addr++ = instr2;
+
+	/* rldicr r3,r3,32,31 */
+	instr3 = 0x78000004 | 0x30000 | 0x600000 | ((32 & 0x1f) << 11);
+	instr3 = instr3 | ((31 & 0x1f) << 6) | ((32 & 0x20) >> 4);
+	*addr++ = instr3;
+
+	/* oris r3,r3,(op)@h */
+	instr4 = 0x64000000 |  0x30000 | 0x600000 | ((val >> 16) & 0xffff);
+	*addr++ = instr4;
+
+	/* ori r3,r3,(op)@l */
+	instr5 = 0x60000000 | 0x30000 | 0x600000 | (val & 0xffff);
+	*addr = instr5;
+}
+
+int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
+{
+	kprobe_opcode_t *buff, branch, branch2, branch3;
+	long rel_chk, ret_chk;
+	unsigned long nip;
+
+	kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
+	op->optinsn.insn = NULL;
+	nip = can_optimize(p);
+
+	if (!nip)
+		return -EILSEQ;
+
+	/* Allocate instruction slot for detour buffer */
+	buff = ppc_get_optinsn_slot(op);
+	if (!buff)
+		return -ENOMEM;
+
+	/*
+	 * OPTPROBE use a 'b' instruction to branch to optinsn.insn.
+	 *
+	 * The target address has to be relatively nearby, to permit use
+	 * of branch instruction in powerpc because the address is specified
+	 * in an immediate field in the instruction opcode itself, ie 24 bits
+	 * in the opcode specify the address. Therefore the address gap should
+	 * be 32MB on either side of the current instruction.
+	 */
+	rel_chk = (long)buff - (unsigned long)p->addr;
+	if (rel_chk < -0x2000000 || rel_chk > 0x1fffffc || rel_chk & 0x3) {
+		ppc_free_optinsn_slot(op);
+		return -ERANGE;
+	}
+	/* Check the return address is also within 32MB range */
+	ret_chk = (long)(buff + TMPL_RET_IDX) - (unsigned long)nip;
+	if (ret_chk < -0x2000000 || ret_chk > 0x1fffffc || ret_chk & 0x3) {
+		ppc_free_optinsn_slot(op);
+		return -ERANGE;
+	}
+
+	/* Do Copy arch specific instance from template */
+	memcpy(buff, optprobe_template_entry,
+	       TMPL_END_IDX * sizeof(kprobe_opcode_t));
+
+	/* Load address into register */
+	create_load_address_insn((unsigned long)p->addr, buff + TMPL_KP_IDX);
+	create_load_address_insn((unsigned long)op, buff + TMPL_OP1_IDX);
+
+	/*
+	 * Create a branch to the optimized_callback function.
+	 * optimized_callback, points to the global entry point.
+	 * Add +8, to create a branch to the LEP of the function.
+	 */
+	branch = create_branch((unsigned int *)buff + TMPL_CALL_HDLR_IDX,
+			       (unsigned long)optimized_callback + 8,
+				BRANCH_SET_LINK);
+
+	/* Place the branch instr into the trampoline */
+	buff[TMPL_CALL_HDLR_IDX] = branch;
+
+	/* Load instruction to be emulated into relevant register */
+	create_load_emulate_insn(*p->ainsn.insn, buff + TMPL_INSN_IDX);
+
+	/*
+	 * Create a branch instruction into the emulate_step.
+	 * Add +8, to create the branch to LEP of emulate_step().
+	 */
+	branch3 = create_branch((unsigned int *)buff + TMPL_EMULATE_IDX,
+				(unsigned long)emulate_step + 8,
+				BRANCH_SET_LINK);
+	buff[TMPL_EMULATE_IDX] = branch3;
+
+	/* Create a branch for jumping back */
+	branch2 = create_branch((unsigned int *)buff + TMPL_RET_IDX,
+				(unsigned long)nip, 0);
+	buff[TMPL_RET_IDX] = branch2;
+
+	op->optinsn.insn = buff;
+	smp_mb();
+	return 0;
+}
+
+int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
+{
+	return optinsn->insn != NULL;
+}
+
+/*
+ * Here,kprobe opt always replace one instruction (4 bytes
+ * aligned and 4 bytes long). It is impossible to encounter another
+ * kprobe in the address range. So always return 0.
+ */
+int arch_check_optimized_kprobe(struct optimized_kprobe *op)
+{
+	return 0;
+}
+
+void arch_optimize_kprobes(struct list_head *oplist)
+{
+	struct optimized_kprobe *op;
+	struct optimized_kprobe *tmp;
+
+	unsigned int branch;
+
+	list_for_each_entry_safe(op, tmp, oplist, list) {
+		/*
+		 * Backup instructions which will be replaced
+		 * by jump address
+		 */
+		memcpy(op->optinsn.copied_insn, op->kp.addr,
+		       RELATIVEJUMP_SIZE);
+		branch = create_branch((unsigned int *)op->kp.addr,
+				       (unsigned long)op->optinsn.insn, 0);
+		*op->kp.addr = branch;
+		list_del_init(&op->list);
+	}
+}
+
+void arch_unoptimize_kprobe(struct optimized_kprobe *op)
+{
+	arch_arm_kprobe(&op->kp);
+}
+
+void arch_unoptimize_kprobes(struct list_head *oplist,
+			     struct list_head *done_list)
+{
+	struct optimized_kprobe *op;
+	struct optimized_kprobe *tmp;
+
+	list_for_each_entry_safe(op, tmp, oplist, list) {
+		arch_unoptimize_kprobe(op);
+		list_move(&op->list, done_list);
+	}
+}
+
+int arch_within_optimized_kprobe(struct optimized_kprobe *op,
+				 unsigned long addr)
+{
+	return 0;
+}
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3362299..c4b8259 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2018,3 +2018,24 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
 	regs->nip = truncate_if_32bit(regs->msr, regs->nip + 4);
 	return 1;
 }
+
+/* Before optimizing, ensure that the probed instruction is not a
+ * conditional branch instruction
+ */
+int __kprobes optprobe_conditional_branch_check(unsigned int instr)
+{
+	unsigned int opcode;
+
+	opcode = instr >> 26;
+	if (opcode == 16)
+		return 0;
+	if (opcode == 19) {
+		switch ((instr >> 1) & 0x3ff) {
+		case 16:        /* bclr, bclrl */
+		case 528:       /* bcctr, bcctrl */
+		case 560:       /* bctar, bctarl */
+			return 0;
+		}
+	}
+	return 1;
+}
-- 
1.8.3.1

* [PATCH 3/3] arch/powerpc : Enable optprobes support in powerpc
@ 2016-09-07  9:33 ` Anju T Sudhakar
From: Anju T Sudhakar @ 2016-09-07  9:33 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant, mahesh,
	mhiramat, Anju T Sudhakar

Mark optprobe 'ok' for powerpc

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
---
 Documentation/features/debug/optprobes/arch-support.txt | 2 +-
 arch/powerpc/Kconfig                                    | 1 +
 arch/powerpc/kernel/Makefile                            | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/features/debug/optprobes/arch-support.txt b/Documentation/features/debug/optprobes/arch-support.txt
index b8999d8..45bc99d 100644
--- a/Documentation/features/debug/optprobes/arch-support.txt
+++ b/Documentation/features/debug/optprobes/arch-support.txt
@@ -27,7 +27,7 @@
     |       nios2: | TODO |
     |    openrisc: | TODO |
     |      parisc: | TODO |
-    |     powerpc: | TODO |
+    |     powerpc: |  ok  |
     |        s390: | TODO |
     |       score: | TODO |
     |          sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a5e0b47..136ca35 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -104,6 +104,7 @@ config PPC
 	select HAVE_IOREMAP_PROT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_LITTLE_ENDIAN
 	select HAVE_KPROBES
+	select HAVE_OPTPROBES
 	select HAVE_ARCH_KGDB
 	select HAVE_KRETPROBES
 	select HAVE_ARCH_TRACEHOOK
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe4c075..33667d3 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -98,6 +98,7 @@ endif
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
+obj-$(CONFIG_OPTPROBES)		+= optprobes.o optprobes_head.o
 obj-$(CONFIG_UPROBES)		+= uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
-- 
2.7.4

* Re: [PATCH 0/3] OPTPROBES for powerpc
@ 2016-09-08 16:11 ` Masami Hiramatsu
From: Masami Hiramatsu @ 2016-09-08 16:11 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ananth, naveen.n.rao, paulus, srikar,
	benh, mpe, hemant, mahesh, mhiramat

Hi Anju,

On Wed,  7 Sep 2016 15:03:09 +0530
Anju T Sudhakar <anju@linux.vnet.ibm.com> wrote:

> This is the patchset for kprobes jump optimization (a.k.a. OPTPROBES)
> on powerpc. Kprobes is an indispensable tool for kernel developers,
> so improving its performance is of considerable importance.
> 
> Currently kprobes inserts a trap instruction to probe a running kernel.
> Jump optimization allows kprobes to replace the trap with a branch,
> reducing the probe overhead drastically.

Thank you for updating the series :)
I'll check that.

> 
> In this series, conditional branch instructions are not considered for
> optimization as they have to be assessed carefully in SMP systems.

So, what kind of problems are there on PPC? (Can the condition flag be
changed by another cpu?)

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: [PATCH 2/3] arch/powerpc : optprobes for powerpc core
@ 2016-09-08 16:47   ` Masami Hiramatsu
From: Masami Hiramatsu @ 2016-09-08 16:47 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ananth, naveen.n.rao, paulus, srikar,
	benh, mpe, hemant, mahesh, mhiramat

On Wed,  7 Sep 2016 15:03:11 +0530
Anju T Sudhakar <anju@linux.vnet.ibm.com> wrote:

> [...]
> +static unsigned long can_optimize(struct kprobe *p)
> +{
> +	struct pt_regs *regs;
> +	unsigned int instr;
> +
> +	/*
> +	 * Not optimizing the kprobe placed by
> +	 * kretprobe during boot time
> +	 */
> +	if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
> +		return 0;
> +
> +	regs = kmalloc(sizeof(*regs), GFP_KERNEL);
> +	if (!regs)
> +		return -ENOMEM;
> +	memset(regs, 0, sizeof(struct pt_regs));
> +	memcpy(regs, current_pt_regs(), sizeof(struct pt_regs));
> +	regs->nip = (unsigned long)p->addr;
> +	instr = *p->ainsn.insn;
> +
> +	/* Ensure the instruction can be emulated */
> +	if (emulate_step(regs, instr) != 1)
> +		return 0;
> +	/* Conditional branches are not optimized */
> +	if (optprobe_conditional_branch_check(instr) != 1)
> +		return 0;
> +	return regs->nip;

Could you free regs here? Or allocate it on stack.

> [...]
> +int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
> +{
> +	kprobe_opcode_t *buff, branch, branch2, branch3;
> +	long rel_chk, ret_chk;
> +	unsigned long nip;
> +
> +	kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
> +	op->optinsn.insn = NULL;
> +	nip = can_optimize(p);
> +
> +	if (!nip)
> +		return -EILSEQ;
> +
> +	/* Allocate instruction slot for detour buffer */
> +	buff = ppc_get_optinsn_slot(op);
> +	if (!buff)
> +		return -ENOMEM;
> +
> +	/*
> +	 * OPTPROBE use a 'b' instruction to branch to optinsn.insn.
> +	 *
> +	 * The target address has to be relatively nearby, to permit use
> +	 * of branch instruction in powerpc because the address is specified
> +	 * in an immediate field in the instruction opcode itself, ie 24 bits
> +	 * in the opcode specify the address. Therefore the address gap should
> +	 * be 32MB on either side of the current instruction.
> +	 */
> +	rel_chk = (long)buff - (unsigned long)p->addr;
> +	if (rel_chk < -0x2000000 || rel_chk > 0x1fffffc || rel_chk & 0x3) {
> +		ppc_free_optinsn_slot(op);

This doesn't work because op->optinsn.insn is NULL here. (buff is assigned
at the end of this function)

> +		return -ERANGE;
> +	}
> +	/* Check the return address is also within 32MB range */
> +	ret_chk = (long)(buff + TMPL_RET_IDX) - (unsigned long)nip;
> +	if (ret_chk < -0x2000000 || ret_chk > 0x1fffffc || ret_chk & 0x3) {
> +		ppc_free_optinsn_slot(op);

ditto.

> +		return -ERANGE;
> +	}
> [...]
> +void arch_optimize_kprobes(struct list_head *oplist)
> +{
> +	struct optimized_kprobe *op;
> +	struct optimized_kprobe *tmp;
> +
> +	unsigned int branch;
> +
> +	list_for_each_entry_safe(op, tmp, oplist, list) {
> +		/*
> +		 * Backup instructions which will be replaced
> +		 * by jump address
> +		 */
> +		memcpy(op->optinsn.copied_insn, op->kp.addr,
> +		       RELATIVEJUMP_SIZE);
> +		branch = create_branch((unsigned int *)op->kp.addr,
> +				       (unsigned long)op->optinsn.insn, 0);
> +		*op->kp.addr = branch;

Hmm, wouldn't we have to use patch_instruction() here?
(It seems ppc kprobe implementation should also be updated to use it...)

> +		list_del_init(&op->list);
> +	}
> +}
> [...]
> +int arch_within_optimized_kprobe(struct optimized_kprobe *op,
> +				 unsigned long addr)
> +{
> +	return 0;

Here, please check the address range as same as arm32 optprobe implementation.

e.g.
        return ((unsigned long)op->kp.addr <= addr &&
                (unsigned long)op->kp.addr + RELATIVEJUMP_SIZE > addr);


Thank you,

> [...]

-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: [PATCH 2/3] arch/powerpc : optprobes for powerpc core
@ 2016-09-09 10:49     ` Anju T Sudhakar
From: Anju T Sudhakar @ 2016-09-09 10:49 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, linuxppc-dev, ananth, naveen.n.rao, paulus, srikar,
	benh, mpe, hemant, mahesh

Hi Masami,


Thank you for reviewing the patch.


On Thursday 08 September 2016 10:17 PM, Masami Hiramatsu wrote:
> On Wed,  7 Sep 2016 15:03:11 +0530
> Anju T Sudhakar <anju@linux.vnet.ibm.com> wrote:
>
>> [...]
>> +static unsigned long can_optimize(struct kprobe *p)
>> +{
>> +	struct pt_regs *regs;
>> +	unsigned int instr;
>> +
>> +	/*
>> +	 * Not optimizing the kprobe placed by
>> +	 * kretprobe during boot time
>> +	 */
>> +	if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
>> +		return 0;
>> +
>> +	regs = kmalloc(sizeof(*regs), GFP_KERNEL);
>> +	if (!regs)
>> +		return -ENOMEM;
>> +	memset(regs, 0, sizeof(struct pt_regs));
>> +	memcpy(regs, current_pt_regs(), sizeof(struct pt_regs));
>> +	regs->nip = (unsigned long)p->addr;
>> +	instr = *p->ainsn.insn;
>> +
>> +	/* Ensure the instruction can be emulated */
>> +	if (emulate_step(regs, instr) != 1)
>> +		return 0;
>> +	/* Conditional branches are not optimized */
>> +	if (optprobe_conditional_branch_check(instr) != 1)
>> +		return 0;
>> +	return regs->nip;
> Could you free regs here? Or allocate it on stack.

Yes, 'regs' can be freed here.
>
>> [...]
>> +int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
>> +{
>> +	kprobe_opcode_t *buff, branch, branch2, branch3;
>> +	long rel_chk, ret_chk;
>> +	unsigned long nip;
>> +
>> +	kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
>> +	op->optinsn.insn = NULL;
>> +	nip = can_optimize(p);
>> +
>> +	if (!nip)
>> +		return -EILSEQ;
>> +
>> +	/* Allocate instruction slot for detour buffer */
>> +	buff = ppc_get_optinsn_slot(op);
>> +	if (!buff)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * OPTPROBE use a 'b' instruction to branch to optinsn.insn.
>> +	 *
>> +	 * The target address has to be relatively nearby, to permit use
>> +	 * of branch instruction in powerpc because the address is specified
>> +	 * in an immediate field in the instruction opcode itself, ie 24 bits
>> +	 * in the opcode specify the address. Therefore the address gap should
>> +	 * be 32MB on either side of the current instruction.
>> +	 */
>> +	rel_chk = (long)buff - (unsigned long)p->addr;
>> +	if (rel_chk < -0x2000000 || rel_chk > 0x1fffffc || rel_chk & 0x3) {
>> +		ppc_free_optinsn_slot(op);
> This doesn't work because op->optinsn.insn is NULL here. (buff is assigned
> at the end of this function)

Yes, you are right; 'buff' needs to be freed here.
Thank you for pointing this out.
>> +		return -ERANGE;
>> +	}
>> +	/* Check the return address is also within 32MB range */
>> +	ret_chk = (long)(buff + TMPL_RET_IDX) - (unsigned long)nip;
>> +	if (ret_chk < -0x2000000 || ret_chk > 0x1fffffc || ret_chk & 0x3) {
>> +		ppc_free_optinsn_slot(op);
> ditto.
>
>> +		return -ERANGE;
>> +	}
>> [...]
>> +void arch_optimize_kprobes(struct list_head *oplist)
>> +{
>> +	struct optimized_kprobe *op;
>> +	struct optimized_kprobe *tmp;
>> +
>> +	unsigned int branch;
>> +
>> +	list_for_each_entry_safe(op, tmp, oplist, list) {
>> +		/*
>> +		 * Backup instructions which will be replaced
>> +		 * by jump address
>> +		 */
>> +		memcpy(op->optinsn.copied_insn, op->kp.addr,
>> +		       RELATIVEJUMP_SIZE);
>> +		branch = create_branch((unsigned int *)op->kp.addr,
>> +				       (unsigned long)op->optinsn.insn, 0);
>> +		*op->kp.addr = branch;
> Hmm, wouldn't we have to use patch_instruction() here?
> (It seems ppc kprobe implementation should also be updated to use it...)
>> +		list_del_init(&op->list);
>> +	}
>> +}
>> [...]
>> +int arch_within_optimized_kprobe(struct optimized_kprobe *op,
>> +				 unsigned long addr)
>> +{
>> +	return 0;
> Here, please check the address range as same as arm32 optprobe implementation.
>
> e.g.
>          return ((unsigned long)op->kp.addr <= addr &&
>                  (unsigned long)op->kp.addr + RELATIVEJUMP_SIZE > addr);
>
>
> Thank you,

Do we really need this? The only case in which this check will succeed
is if kp.addr is not a multiple of 4, which is not a valid address at
all on Power. So should we check for that again here?



Thanks and regards
-Anju

* Re: [PATCH 2/3] arch/powerpc : optprobes for powerpc core
@ 2016-09-09 15:34       ` Masami Hiramatsu
From: Masami Hiramatsu @ 2016-09-09 15:34 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ananth, naveen.n.rao, paulus, srikar,
	benh, mpe, hemant, mahesh

Hi Anju,

On Fri, 9 Sep 2016 16:19:41 +0530
Anju T Sudhakar <anju@linux.vnet.ibm.com> wrote:
> [...]
> >> +int arch_within_optimized_kprobe(struct optimized_kprobe *op,
> >> +				 unsigned long addr)
> >> +{
> >> +	return 0;
> > Here, please check the address range as same as arm32 optprobe implementation.
> >
> > e.g.
> >          return ((unsigned long)op->kp.addr <= addr &&
> >                  (unsigned long)op->kp.addr + RELATIVEJUMP_SIZE > addr);
> >
> >
> > Thank you,
> 
> Do we really need this? The only case this check will succeed is if  
> kp.addr is not a multiple of 4, which is not a valid address at all  
> onPower. So should we again check here for that?

Yes, since that is an exported function, it can be used from other
parts of the kernel for other purposes (e.g. someone may want to use it
for debugging). Please do not optimize the code only for the current
implementation, but for the generic use case.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>
