linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/3] OPTPROBES for powerpc
@ 2016-05-19 15:10 Anju T
  2016-05-19 15:10 ` [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes Anju T
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Anju T @ 2016-05-19 15:10 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: anju, ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant,
	mahesh, mhiramat

Here are the RFC patchset of the kprobes jump optimization
(a.k.a OPTPROBES)for powerpc. Kprobe being an inevitable tool
for kernel developers,enhancing the performance of kprobe has
got much importance.

Currently kprobes inserts a trap instruction to probe a running kernel.
Jump optimization allows kprobes to replace the trap with a branch,reducing
the probe overhead drastically.

Performance:
=============
An optimized kprobe in powerpc is 1.05 to 4.7 times faster than a kprobe.

Example:

Placed a probe at an offset 0x50 in _do_fork().
*Time Diff here is, difference in time before hitting the probe and after the probed instruction.
mftb() is employed in kernel/fork.c for this purpose.


# echo 0 > /proc/sys/debug/kprobes-optimization 
Kprobes globally unoptimized

[  233.607120] Time Diff = 0x1f0
[  233.608273] Time Diff = 0x1ee
[  233.609228] Time Diff = 0x203
[  233.610400] Time Diff = 0x1ec
[  233.611335] Time Diff = 0x200
[  233.612552] Time Diff = 0x1f0
[  233.613386] Time Diff = 0x1ee
[  233.614547] Time Diff = 0x212
[  233.615570] Time Diff = 0x206
[  233.616819] Time Diff = 0x1f3
[  233.617773] Time Diff = 0x1ec
[  233.618944] Time Diff = 0x1fb
[  233.619879] Time Diff = 0x1f0
[  233.621066] Time Diff = 0x1f9
[  233.621999] Time Diff = 0x283
[  233.623281] Time Diff = 0x24d
[  233.624172] Time Diff = 0x1ea
[  233.625381] Time Diff = 0x1f0
[  233.626358] Time Diff = 0x200
[  233.627572] Time Diff = 0x1ed

# echo 1 > /proc/sys/debug/kprobes-optimization 
Kprobes globally optimized

[   70.797075] Time Diff = 0x103
[   70.799102] Time Diff = 0x181
[   70.801861] Time Diff = 0x15e
[   70.803466] Time Diff = 0xf0
[   70.804348] Time Diff = 0xd0
[   70.805653] Time Diff = 0xad
[   70.806477] Time Diff = 0xe0
[   70.807725] Time Diff = 0xbe
[   70.808541] Time Diff = 0xc3
[   70.810191] Time Diff = 0xc7
[   70.811007] Time Diff = 0xc0
[   70.812629] Time Diff = 0xc0
[   70.813640] Time Diff = 0xda
[   70.814915] Time Diff = 0xbb
[   70.815726] Time Diff = 0xc4
[   70.816955] Time Diff = 0xc0
[   70.817778] Time Diff = 0xcd
[   70.818999] Time Diff = 0xcd
[   70.820099] Time Diff = 0xcb
[   70.821333] Time Diff = 0xf0

Implementation:
===================

The trap instruction is replaced by a branch to a detour buffer.
To address the limitation of branch instruction in power architecture
detour buffer slot is allocated from a reserved area . This will ensure
that the branch is within +/- 32 MB range. Patch 2/3 furnishes this.
The current kprobes insn caches  allocate memory area for insn slots
with module_alloc(). This will always be beyond +/- 32MB range.
Hence for allocating and freeing  slots from this reserved area
ppc_get_optinsn_slot() and ppc_free_optinsns_slot() are introduced.

The detour buffer contains a call to optimized_callback() which in turn
call the pre_handler(). Once the pre-handler is run, the original instruction
is emulated from the detour buffer itself. Also the detour buffer is equipped
with a branch back to the normal work flow after the probed instruction is emulated.
Before preparing optimization, Kprobes inserts original(user-defined) kprobe on the
specified address. So, even if the kprobe is not possible to be optimized, it just uses
a normal kprobe.

Limitations:
==============

- Number of probes which can be optimized is limited by the size of the area reserved.

        * TODO: Have a template based implementation that will alleviate the probe count by
          using a lesser space from the reserved area for optimization.

- Currently instructions which can be emulated are the only candidates for optimization.



Changes from RFC-v1:
-----------------------
- Detour buffer memory reservation code moved to optprobes.c
- optimized_callback() is marked as NOKPROBE_SYMBOL.
- Return NULL when there is no more slots to allocate from detour buffer.
- Other comments by Masami are addressed.


Kindly let me know your suggestions and comments.

Thanks
-Anju


Anju T (3):
  arch/powerpc : Add detour buffer support for optprobes
  arch/powerpc : optprobes for powerpc core
  arch/powerpc : Enable optprobes support in powerpc

 .../features/debug/optprobes/arch-support.txt      |   2 +-
 arch/powerpc/Kconfig                               |   1 +
 arch/powerpc/include/asm/kprobes.h                 |  25 ++
 arch/powerpc/kernel/Makefile                       |   1 +
 arch/powerpc/kernel/optprobes.c                    | 474 +++++++++++++++++++++
 arch/powerpc/kernel/optprobes_head.S               | 108 +++++
 6 files changed, 610 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/optprobes.c
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

-- 
2.1.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes
  2016-05-19 15:10 [RFC PATCH v2 0/3] OPTPROBES for powerpc Anju T
@ 2016-05-19 15:10 ` Anju T
  2016-05-24 10:15   ` Madhavan Srinivasan
                     ` (2 more replies)
  2016-05-19 15:10 ` [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core Anju T
  2016-05-19 15:10 ` [RFC PATCH v2 3/3] arch/powerpc : Enable optprobes support in powerpc Anju T
  2 siblings, 3 replies; 9+ messages in thread
From: Anju T @ 2016-05-19 15:10 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: anju, ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant,
	mahesh, mhiramat

Detour buffer contains instructions to create an in memory pt_regs.
After the execution of prehandler a call is made for instruction emulation.
The NIP is decided after the probed instruction is executed. Hence a branch
instruction is created to the NIP returned by emulate_step().

Instruction slot for detour buffer is allocated from
the reserved area. For the time being 64KB is reserved
in memory for this purpose.

Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kprobes.h   |  25 ++++++++
 arch/powerpc/kernel/optprobes_head.S | 108 +++++++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+)
 create mode 100644 arch/powerpc/kernel/optprobes_head.S

diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index 039b583..3e4c998 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,7 +38,25 @@ struct pt_regs;
 struct kprobe;
 
 typedef ppc_opcode_t kprobe_opcode_t;
+
+extern kprobe_opcode_t optinsn_slot;
+/* Optinsn template address */
+extern kprobe_opcode_t optprobe_template_entry[];
+extern kprobe_opcode_t optprobe_template_call_handler[];
+extern kprobe_opcode_t optprobe_template_call_emulate[];
+extern kprobe_opcode_t optprobe_template_ret_branch[];
+extern kprobe_opcode_t optprobe_template_ret[];
+extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_op_address1[];
+extern kprobe_opcode_t optprobe_template_op_address2[];
+extern kprobe_opcode_t optprobe_template_end[];
+
 #define MAX_INSN_SIZE 1
+#define MAX_OPTIMIZED_LENGTH    4
+#define MAX_OPTINSN_SIZE				\
+	((unsigned long)&optprobe_template_end -	\
+	(unsigned long)&optprobe_template_entry)
+#define RELATIVEJUMP_SIZE       4
 
 #ifdef CONFIG_PPC64
 #if defined(_CALL_ELF) && _CALL_ELF == 2
@@ -129,5 +147,12 @@ struct kprobe_ctlblk {
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 					unsigned long val, void *data);
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
+
+struct arch_optimized_insn {
+	kprobe_opcode_t copied_insn[1];
+	/* detour buffer */
+	kprobe_opcode_t *insn;
+};
+
 #endif /* __KERNEL__ */
 #endif	/* _ASM_POWERPC_KPROBES_H */
diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
new file mode 100644
index 0000000..ce32aec
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes_head.S
@@ -0,0 +1,108 @@
+/*
+ * Code to prepare detour buffer for optprobes in kernel.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/ptrace.h>
+#include <asm/asm-offsets.h>
+
+.global optinsn_slot
+optinsn_slot:
+	/* Reserve an area to allocate slots for detour buffer */
+	.space	65536
+.global optprobe_template_entry
+optprobe_template_entry:
+	stdu	r1,-INT_FRAME_SIZE(r1)
+	SAVE_GPR(0,r1)
+	/* Save the previous SP into stack */
+	addi	r0,r1,INT_FRAME_SIZE
+	std	0,GPR1(r1)
+	SAVE_2GPRS(2,r1)
+	SAVE_8GPRS(4,r1)
+	SAVE_10GPRS(12,r1)
+	SAVE_10GPRS(22,r1)
+	/* Save SPRS */
+	mfcfar	r5
+	std	r5,_NIP(r1)
+	mfmsr	r5
+	std	r5,_MSR(r1)
+	mfctr	r5
+	std	r5,_CTR(r1)
+	mflr	r5
+	std	r5,_LINK(r1)
+	mfspr	r5,SPRN_XER
+	std	r5,_XER(r1)
+	li	r5,0
+	std	r5,_TRAP(r1)
+	mfdar	r5
+	std	r5,_DAR(r1)
+	mfdsisr	r5
+	std	r5,_DSISR(r1)
+	/* Pass parameters for optimized_callback */
+.global optprobe_template_op_address1
+optprobe_template_op_address1:
+	nop
+	nop
+	nop
+	nop
+	nop
+	addi	r4,r1,STACK_FRAME_OVERHEAD
+	/* Branch to the prehandler */
+.global optprobe_template_call_handler
+optprobe_template_call_handler:
+	nop
+	/* Pass parameters for instruction emulation */
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+.global optprobe_template_insn
+optprobe_template_insn:
+	nop
+	nop
+	/* Branch to instruction emulation  */
+.global optprobe_template_call_emulate
+optprobe_template_call_emulate:
+	nop
+.global optprobe_template_op_address2
+optprobe_template_op_address2:
+	nop
+	nop
+	nop
+	nop
+	nop
+	addi	r4,r1,STACK_FRAME_OVERHEAD
+	/* Branch to create_return_branch() function */
+.global optprobe_template_ret_branch
+optprobe_template_ret_branch:
+	nop
+	/* Restore the registers */
+	ld	r5,_MSR(r1)
+	mtmsr	r5
+	ld	r5,_CTR(r1)
+	mtctr	r5
+	ld	r5,_LINK(r1)
+	mtlr	r5
+	ld	r5,_XER(r1)
+	mtxer	r5
+	ld	r5,_DAR(r1)
+	mtdar	r5
+	ld	r5,_DSISR(r1)
+	mtdsisr	r5
+	REST_GPR(0,r1)
+	REST_2GPRS(2,r1)
+	REST_8GPRS(4,r1)
+	REST_10GPRS(12,r1)
+	REST_10GPRS(22,r1)
+	/* Restore the previous SP */
+	addi	r1,r1,INT_FRAME_SIZE
+	/* Jump back to the normal workflow from trampoline */
+.global optprobe_template_ret
+optprobe_template_ret:
+	nop
+.global optprobe_template_end
+optprobe_template_end:
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core
  2016-05-19 15:10 [RFC PATCH v2 0/3] OPTPROBES for powerpc Anju T
  2016-05-19 15:10 ` [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes Anju T
@ 2016-05-19 15:10 ` Anju T
  2016-05-20 12:37   ` Masami Hiramatsu
  2016-05-19 15:10 ` [RFC PATCH v2 3/3] arch/powerpc : Enable optprobes support in powerpc Anju T
  2 siblings, 1 reply; 9+ messages in thread
From: Anju T @ 2016-05-19 15:10 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: anju, ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant,
	mahesh, mhiramat

ppc_get_optinsn_slot() and ppc_free_optinsn_slot() are
geared towards the allocation and freeing of memory from 
the area reserved for detour buffer.

Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/optprobes.c | 480 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 480 insertions(+)
 create mode 100644 arch/powerpc/kernel/optprobes.c

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
new file mode 100644
index 0000000..bb61e18
--- /dev/null
+++ b/arch/powerpc/kernel/optprobes.c
@@ -0,0 +1,480 @@
+/*
+ * Code for Kernel probes Jump optimization.
+ *
+ * Copyright 2016, Anju T, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kprobes.h>
+#include <linux/jump_label.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <asm/kprobes.h>
+#include <asm/ptrace.h>
+#include <asm/cacheflush.h>
+#include <asm/code-patching.h>
+#include <asm/sstep.h>
+
+#define SLOT_SIZE 65536
+#define TMPL_CALL_HDLR_IDX	\
+	(optprobe_template_call_handler - optprobe_template_entry)
+#define TMPL_EMULATE_IDX	\
+	(optprobe_template_call_emulate - optprobe_template_entry)
+#define TMPL_RET_BRANCH_IDX	\
+	(optprobe_template_ret_branch - optprobe_template_entry)
+#define TMPL_RET_IDX	\
+	(optprobe_template_ret - optprobe_template_entry)
+#define TMPL_OP1_IDX	\
+	(optprobe_template_op_address1 - optprobe_template_entry)
+#define TMPL_OP2_IDX	\
+	(optprobe_template_op_address2 - optprobe_template_entry)
+#define TMPL_INSN_IDX	\
+	(optprobe_template_insn - optprobe_template_entry)
+#define TMPL_END_IDX	\
+	(optprobe_template_end - optprobe_template_entry)
+
+struct kprobe_ppc_insn_page {
+	struct list_head list;
+	kprobe_opcode_t *insns;	/* Page of instruction slots */
+	struct kprobe_insn_cache *cache;
+	int nused;
+	int ngarbage;
+	char slot_used[];
+};
+
+#define PPC_KPROBE_INSN_PAGE_SIZE(slots)	\
+	(offsetof(struct kprobe_ppc_insn_page, slot_used) +	\
+	(sizeof(char) * (slots)))
+
+enum ppc_kprobe_slot_state {
+	SLOT_CLEAN = 0,
+	SLOT_DIRTY = 1,
+	SLOT_USED = 2,
+};
+
+static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
+	.mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
+	.pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
+	/* .insn_size is initialized later */
+	.nr_garbage = 0,
+};
+
+static int ppc_slots_per_page(struct kprobe_insn_cache *c)
+{
+	/*
+	 * Here the #slots per page differs from x86 as we have
+	 * only 64KB reserved.
+	 */
+	return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t));
+}
+
+/* Return 1 if all garbages are collected, otherwise 0. */
+static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx)
+{
+	kip->slot_used[idx] = SLOT_CLEAN;
+	kip->nused--;
+	return 0;
+}
+
+static int collect_garbage_slots(struct kprobe_insn_cache *c)
+{
+	struct kprobe_ppc_insn_page *kip, *next;
+
+	/* Ensure no-one is interrupted on the garbages */
+	synchronize_sched();
+
+	list_for_each_entry_safe(kip, next, &c->pages, list) {
+		int i;
+
+		if (kip->ngarbage == 0)
+			continue;
+		kip->ngarbage = 0;	/* we will collect all garbages */
+		for (i = 0; i < ppc_slots_per_page(c); i++) {
+			if (kip->slot_used[i] == SLOT_DIRTY &&
+			    collect_one_slot(kip, i))
+				break;
+		}
+	}
+	c->nr_garbage = 0;
+	return 0;
+}
+
+kprobe_opcode_t  *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c)
+{
+	struct kprobe_ppc_insn_page *kip;
+	kprobe_opcode_t *slot = NULL;
+
+	mutex_lock(&c->mutex);
+	list_for_each_entry(kip, &c->pages, list) {
+		if (kip->nused < ppc_slots_per_page(c)) {
+			int i;
+
+			for (i = 0; i < ppc_slots_per_page(c); i++) {
+				if (kip->slot_used[i] == SLOT_CLEAN) {
+					kip->slot_used[i] = SLOT_USED;
+					kip->nused++;
+					slot = kip->insns + (i * c->insn_size);
+					goto out;
+				}
+			}
+			/* kip->nused reached max value. */
+			kip->nused = ppc_slots_per_page(c);
+			WARN_ON(1);
+		}
+		if (!list_empty(&c->pages)) {
+			pr_info("No more slots to allocate\n");
+			return NULL;
+		}
+	}
+	kip = kmalloc(PPC_KPROBE_INSN_PAGE_SIZE(ppc_slots_per_page(c)),
+		      GFP_KERNEL);
+	if (!kip)
+		goto out;
+	/*
+	 * Allocate from the reserved area so as to
+	 * ensure the range will be within +/-32MB
+	 */
+	kip->insns = &optinsn_slot;
+	if (!kip->insns) {
+		kfree(kip);
+		goto out;
+	}
+	INIT_LIST_HEAD(&kip->list);
+	memset(kip->slot_used, SLOT_CLEAN, ppc_slots_per_page(c));
+	kip->slot_used[0] = SLOT_USED;
+	kip->nused = 1;
+	kip->ngarbage = 0;
+	kip->cache = c;
+	list_add(&kip->list, &c->pages);
+	slot = kip->insns;
+out:
+	mutex_unlock(&c->mutex);
+	return slot;
+}
+
+kprobe_opcode_t *ppc_get_optinsn_slot(struct optimized_kprobe *op)
+{
+	/*
+	 * The insn slot is allocated from the reserved
+	 * area(ie &optinsn_slot).We are not optimizing probes
+	 * at module_addr now.
+	 */
+	kprobe_opcode_t *slot = NULL;
+
+	if (is_kernel_addr(op->kp.addr))
+		slot = __ppc_get_optinsn_slot(&kprobe_ppc_optinsn_slots);
+	else
+		pr_info("Kprobe can not be optimized\n");
+	return slot;
+}
+
+void __ppc_free_optinsn_slot(struct kprobe_insn_cache *c,
+			     kprobe_opcode_t *slot, int dirty)
+{
+	struct kprobe_ppc_insn_page *kip;
+
+	mutex_lock(&c->mutex);
+
+	list_for_each_entry(kip, &c->pages, list) {
+		long idx = ((long)slot - (long)kip->insns) /
+				(c->insn_size * sizeof(kprobe_opcode_t));
+		if (idx >= 0 && idx < ppc_slots_per_page(c)) {
+			WARN_ON(kip->slot_used[idx] != SLOT_USED);
+			if (dirty) {
+				kip->slot_used[idx] = SLOT_DIRTY;
+				kip->ngarbage++;
+				if (++c->nr_garbage > ppc_slots_per_page(c))
+					collect_garbage_slots(c);
+			} else
+				collect_one_slot(kip, idx);
+			goto out;
+		}
+	}
+	/* Could not free this slot. */
+	WARN_ON(1);
+out:
+	mutex_unlock(&c->mutex);
+}
+
+static void ppc_free_optinsn_slot(struct optimized_kprobe *op)
+{
+	if (!op->optinsn.insn)
+		return;
+	if (is_kernel_addr((unsigned long)op->kp.addr))
+		__ppc_free_optinsn_slot(&kprobe_ppc_optinsn_slots,
+					op->optinsn.insn, 0);
+}
+
+static void
+__arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
+{
+	if (op->optinsn.insn) {
+		ppc_free_optinsn_slot(op);
+		op->optinsn.insn = NULL;
+	}
+}
+
+static int can_optimize(struct kprobe *p)
+{
+	struct pt_regs *regs;
+	unsigned int instr;
+	int r;
+
+	/*
+	 * Not optimizing the kprobe placed by
+	 * kretprobe during boot time
+	 */
+	if ((kprobe_opcode_t)p->addr == (kprobe_opcode_t)&kretprobe_trampoline)
+		return 0;
+
+	regs = current_pt_regs();
+	instr = *(p->ainsn.insn);
+
+	/* Ensure the instruction can be emulated*/
+	r = emulate_step(regs, instr);
+	if (r != 1)
+		return 0;
+
+	return 1;
+}
+
+static void
+create_return_branch(struct optimized_kprobe *op, struct pt_regs *regs)
+{
+	/*
+	 * Create a branch back to the return address
+	 * after the probed instruction is emulated
+	 */
+
+	kprobe_opcode_t branch, *buff;
+	unsigned long ret;
+
+	ret = regs->nip;
+	buff = op->optinsn.insn;
+
+	branch = create_branch((unsigned int *)buff + TMPL_RET_IDX,
+			       (unsigned long)ret, 0);
+	buff[TMPL_RET_IDX] = branch;
+}
+
+static void
+optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
+{
+	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (kprobe_running())
+		kprobes_inc_nmissed_count(&op->kp);
+	else {
+		__this_cpu_write(current_kprobe, &op->kp);
+		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+		opt_pre_handler(&op->kp, regs);
+		__this_cpu_write(current_kprobe, NULL);
+	}
+	local_irq_restore(flags);
+}
+NOKPROBE_SYMBOL(optimized_callback);
+
+void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
+{
+	 __arch_remove_optimized_kprobe(op, 1);
+}
+
+void  create_insn(unsigned int insn, kprobe_opcode_t *addr)
+{
+	u32 instr, instr2;
+
+	/*
+	 * emulate_step() requires insn to be emulated as
+	 * second parameter. Hence r4 should be loaded
+	 * with 'insn'.
+	 * synthesize addis r4,0,(insn)@h
+	 */
+	instr = 0x3c000000 | 0x800000 | ((insn >> 16) & 0xffff);
+	*addr++ = instr;
+
+	/* ori r4,r4,(insn)@l */
+	instr2 = 0x60000000 | 0x40000 | 0x800000;
+	instr2 = instr2 | (insn & 0xffff);
+	*addr = instr2;
+}
+
+void create_load_address_insn(struct optimized_kprobe *op,
+			      kprobe_opcode_t *addr, kprobe_opcode_t *addr2)
+{
+	u32 instr1, instr2, instr3, instr4, instr5;
+	unsigned long val;
+
+	val = op;
+
+	/*
+	 * Optimized_kprobe structure is required as a parameter
+	 * for invoking optimized_callback() and create_return_branch()
+	 * from detour buffer. Hence need to have a 64bit immediate
+	 * load into r3.
+	 *
+	 * lis r3,(op)@highest
+	 */
+	instr1 = 0x3c000000 | 0x600000 | ((val >> 48) & 0xffff);
+	*addr++ = instr1;
+	*addr2++ = instr1;
+
+	/* ori r3,r3,(op)@higher */
+	instr2 = 0x60000000 | 0x30000 | 0x600000 | ((val >> 32) & 0xffff);
+	*addr++ = instr2;
+	*addr2++ = instr2;
+
+	/* rldicr r3,r3,32,31 */
+	instr3 = 0x78000004 | 0x30000 | 0x600000 | ((32 & 0x1f) << 11);
+	instr3 = instr3 | ((31 & 0x1f) << 6) | ((32 & 0x20) >> 4);
+	*addr++ = instr3;
+	*addr2++ = instr3;
+
+	/* oris r3,r3,(op)@h */
+	instr4 = 0x64000000 |  0x30000 | 0x600000 | ((val >> 16) & 0xffff);
+	*addr++ = instr4;
+	*addr2++ = instr4;
+
+	/* ori r3,r3,(op)@l */
+	instr5 =  0x60000000 | 0x30000 | 0x600000 | (val & 0xffff);
+	*addr = instr5;
+	*addr2 = instr5;
+}
+
+int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
+{
+	kprobe_opcode_t *buff, branch, branch2, branch3;
+	long rel_chk, ret_chk;
+
+	kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
+	op->optinsn.insn = NULL;
+
+	if (!can_optimize(p))
+		return -EILSEQ;
+
+	/* Allocate instruction slot for detour buffer*/
+	buff = ppc_get_optinsn_slot(op);
+	if (!buff)
+		return -ENOMEM;
+
+	/*
+	 * OPTPROBE use a 'b' instruction to branch to optinsn.insn.
+	 *
+	 * The target address has to be relatively nearby, to permit use
+	 * of branch instruction in powerpc because the address is specified
+	 * in an immediate field in the instruction opcode itself, ie 24 bits
+	 * in the opcode specify the address. Therefore the address gap should
+	 * be 32MB on either side of the current instruction.
+	 */
+	rel_chk = (long)buff -
+			(unsigned long)p->addr;
+	if (rel_chk < -0x2000000 || rel_chk > 0x1fffffc || rel_chk & 0x3) {
+		ppc_free_optinsn_slot(op);
+		return -ERANGE;
+	}
+	/*
+	 * For the time being assume that the return address is NIP+4.
+	 * TODO : check the return address is also within 32MB range for
+	 * cases where NIP is not NIP+4.ie the NIP is decided after emulation.
+	 */
+	ret_chk = (long)(buff + TMPL_RET_IDX) - (unsigned long)(p->addr) + 4;
+	if (ret_chk < -0x2000000 || ret_chk > 0x1fffffc || ret_chk & 0x3) {
+		ppc_free_optinsn_slot(op);
+		return -ERANGE;
+	}
+
+	/* Do Copy arch specific instance from template*/
+	memcpy(buff, optprobe_template_entry,
+	       TMPL_END_IDX * sizeof(kprobe_opcode_t));
+	create_load_address_insn(op, buff + TMPL_OP1_IDX, buff + TMPL_OP2_IDX);
+
+	/* Create a branch to the optimized_callback function */
+	branch = create_branch((unsigned int *)buff + TMPL_CALL_HDLR_IDX,
+			       (unsigned long)optimized_callback + 8,
+				BRANCH_SET_LINK);
+
+	/* Place the branch instr into the trampoline */
+	buff[TMPL_CALL_HDLR_IDX] = branch;
+	create_insn(*(p->ainsn.insn), buff + TMPL_INSN_IDX);
+
+	/*Create a branch instruction into the emulate_step*/
+	branch3 = create_branch((unsigned int *)buff + TMPL_EMULATE_IDX,
+				(unsigned long)emulate_step + 8,
+				BRANCH_SET_LINK);
+	buff[TMPL_EMULATE_IDX] = branch3;
+
+	/* Create a branch for jumping back*/
+	branch2 = create_branch((unsigned int *)buff + TMPL_RET_BRANCH_IDX,
+				(unsigned long)create_return_branch + 8,
+				BRANCH_SET_LINK);
+	buff[TMPL_RET_BRANCH_IDX] = branch2;
+
+	op->optinsn.insn = buff;
+	return 0;
+}
+
+int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
+{
+	return optinsn->insn;
+}
+
+/*
+ * Here,kprobe opt always replace one instruction (4 bytes
+ * aligned and 4 bytes long). It is impossible to encounter another
+ * kprobe in the address range. So always return 0.
+ */
+int arch_check_optimized_kprobe(struct optimized_kprobe *op)
+{
+	return 0;
+}
+
+void arch_optimize_kprobes(struct list_head *oplist)
+{
+	struct optimized_kprobe *op;
+	struct optimized_kprobe *tmp;
+
+	unsigned int branch;
+
+	list_for_each_entry_safe(op, tmp, oplist, list) {
+	/*
+	* Backup instructions which will be replaced
+	* by jump address
+	*/
+		memcpy(op->optinsn.copied_insn, op->kp.addr,
+		       RELATIVEJUMP_SIZE);
+		branch = create_branch((unsigned int *)op->kp.addr,
+					(unsigned long)op->optinsn.insn, 0);
+		*op->kp.addr = branch;
+		list_del_init(&op->list);
+	}
+}
+
+void arch_unoptimize_kprobe(struct optimized_kprobe *op)
+{
+	arch_arm_kprobe(&op->kp);
+}
+
+void arch_unoptimize_kprobes(struct list_head *oplist,
+			     struct list_head *done_list)
+{
+	struct optimized_kprobe *op;
+	struct optimized_kprobe *tmp;
+
+	list_for_each_entry_safe(op, tmp, oplist, list) {
+		arch_unoptimize_kprobe(op);
+		list_move(&op->list, done_list);
+	}
+}
+
+int arch_within_optimized_kprobe(struct optimized_kprobe *op,
+				 unsigned long addr)
+{
+	return 0;
+}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH v2 3/3] arch/powerpc : Enable optprobes support in powerpc
  2016-05-19 15:10 [RFC PATCH v2 0/3] OPTPROBES for powerpc Anju T
  2016-05-19 15:10 ` [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes Anju T
  2016-05-19 15:10 ` [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core Anju T
@ 2016-05-19 15:10 ` Anju T
  2 siblings, 0 replies; 9+ messages in thread
From: Anju T @ 2016-05-19 15:10 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: anju, ananth, naveen.n.rao, paulus, srikar, benh, mpe, hemant,
	mahesh, mhiramat

Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
---
 Documentation/features/debug/optprobes/arch-support.txt | 2 +-
 arch/powerpc/Kconfig                                    | 1 +
 arch/powerpc/kernel/Makefile                            | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/features/debug/optprobes/arch-support.txt b/Documentation/features/debug/optprobes/arch-support.txt
index b8999d8..45bc99d 100644
--- a/Documentation/features/debug/optprobes/arch-support.txt
+++ b/Documentation/features/debug/optprobes/arch-support.txt
@@ -27,7 +27,7 @@
     |       nios2: | TODO |
     |    openrisc: | TODO |
     |      parisc: | TODO |
-    |     powerpc: | TODO |
+    |     powerpc: |  ok  |
     |        s390: | TODO |
     |       score: | TODO |
     |          sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c0..a87c9b1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -104,6 +104,7 @@ config PPC
 	select HAVE_IOREMAP_PROT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_LITTLE_ENDIAN
 	select HAVE_KPROBES
+	select HAVE_OPTPROBES
 	select HAVE_ARCH_KGDB
 	select HAVE_KRETPROBES
 	select HAVE_ARCH_TRACEHOOK
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..7994e22 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -99,6 +99,7 @@ endif
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
+obj-$(CONFIG_OPTPROBES)		+= optprobes.o optprobes_head.o
 obj-$(CONFIG_UPROBES)		+= uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core
  2016-05-19 15:10 ` [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core Anju T
@ 2016-05-20 12:37   ` Masami Hiramatsu
  2016-05-24  6:06     ` Anju T
  0 siblings, 1 reply; 9+ messages in thread
From: Masami Hiramatsu @ 2016-05-20 12:37 UTC (permalink / raw)
  To: Anju T
  Cc: linux-kernel, linuxppc-dev, ananth, naveen.n.rao, paulus, srikar,
	benh, mpe, hemant, mahesh, mhiramat

Hi Anju,

Please see my comments below,

On Thu, 19 May 2016 20:40:39 +0530
Anju T <anju@linux.vnet.ibm.com> wrote:

> ppc_get_optinsn_slot() and ppc_free_optinsn_slot() are
> geared towards the allocation and freeing of memory from 
> the area reserved for detour buffer.
> 
> Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/optprobes.c | 480 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 480 insertions(+)
>  create mode 100644 arch/powerpc/kernel/optprobes.c
> 
> diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
> new file mode 100644
> index 0000000..bb61e18
> --- /dev/null
> +++ b/arch/powerpc/kernel/optprobes.c
> @@ -0,0 +1,480 @@
> +/*
> + * Code for Kernel probes Jump optimization.
> + *
> + * Copyright 2016, Anju T, IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/kprobes.h>
> +#include <linux/jump_label.h>
> +#include <linux/types.h>
> +#include <linux/slab.h>
> +#include <linux/list.h>
> +#include <asm/kprobes.h>
> +#include <asm/ptrace.h>
> +#include <asm/cacheflush.h>
> +#include <asm/code-patching.h>
> +#include <asm/sstep.h>
> +
> +#define SLOT_SIZE 65536
> +#define TMPL_CALL_HDLR_IDX	\
> +	(optprobe_template_call_handler - optprobe_template_entry)
> +#define TMPL_EMULATE_IDX	\
> +	(optprobe_template_call_emulate - optprobe_template_entry)
> +#define TMPL_RET_BRANCH_IDX	\
> +	(optprobe_template_ret_branch - optprobe_template_entry)
> +#define TMPL_RET_IDX	\
> +	(optprobe_template_ret - optprobe_template_entry)
> +#define TMPL_OP1_IDX	\
> +	(optprobe_template_op_address1 - optprobe_template_entry)
> +#define TMPL_OP2_IDX	\
> +	(optprobe_template_op_address2 - optprobe_template_entry)
> +#define TMPL_INSN_IDX	\
> +	(optprobe_template_insn - optprobe_template_entry)
> +#define TMPL_END_IDX	\
> +	(optprobe_template_end - optprobe_template_entry)
> +
> +struct kprobe_ppc_insn_page {
> +	struct list_head list;
> +	kprobe_opcode_t *insns;	/* Page of instruction slots */
> +	struct kprobe_insn_cache *cache;
> +	int nused;
> +	int ngarbage;
> +	char slot_used[];
> +};
> +
> +#define PPC_KPROBE_INSN_PAGE_SIZE(slots)	\
> +	(offsetof(struct kprobe_ppc_insn_page, slot_used) +	\
> +	(sizeof(char) * (slots)))
> +
> +enum ppc_kprobe_slot_state {
> +	SLOT_CLEAN = 0,
> +	SLOT_DIRTY = 1,
> +	SLOT_USED = 2,
> +};
> +
> +static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
> +	.mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
> +	.pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
> +	/* .insn_size is initialized later */
> +	.nr_garbage = 0,
> +};
> +
> +static int ppc_slots_per_page(struct kprobe_insn_cache *c)
> +{
> +	/*
> +	 * Here the #slots per page differs from x86 as we have
> +	 * only 64KB reserved.
> +	 */
> +	return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t));
> +}
> +
> +/* Return 1 if all garbages are collected, otherwise 0. */
> +static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx)
> +{
> +	kip->slot_used[idx] = SLOT_CLEAN;
> +	kip->nused--;
> +	return 0;
> +}
> +
> +static int collect_garbage_slots(struct kprobe_insn_cache *c)
> +{
> +	struct kprobe_ppc_insn_page *kip, *next;
> +
> +	/* Ensure no-one is interrupted on the garbages */
> +	synchronize_sched();
> +
> +	list_for_each_entry_safe(kip, next, &c->pages, list) {
> +		int i;
> +
> +		if (kip->ngarbage == 0)
> +			continue;
> +		kip->ngarbage = 0;	/* we will collect all garbages */
> +		for (i = 0; i < ppc_slots_per_page(c); i++) {
> +			if (kip->slot_used[i] == SLOT_DIRTY &&
> +			    collect_one_slot(kip, i))
> +				break;
> +		}
> +	}
> +	c->nr_garbage = 0;
> +	return 0;
> +}
> +
> +kprobe_opcode_t  *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c)
> +{
> +	struct kprobe_ppc_insn_page *kip;
> +	kprobe_opcode_t *slot = NULL;
> +
> +	mutex_lock(&c->mutex);
> +	list_for_each_entry(kip, &c->pages, list) {
> +		if (kip->nused < ppc_slots_per_page(c)) {
> +			int i;
> +
> +			for (i = 0; i < ppc_slots_per_page(c); i++) {
> +				if (kip->slot_used[i] == SLOT_CLEAN) {
> +					kip->slot_used[i] = SLOT_USED;
> +					kip->nused++;
> +					slot = kip->insns + (i * c->insn_size);
> +					goto out;
> +				}
> +			}
> +			/* kip->nused reached max value. */
> +			kip->nused = ppc_slots_per_page(c);
> +			WARN_ON(1);
> +		}
> +		if (!list_empty(&c->pages)) {
> +			pr_info("No more slots to allocate\n");
> +			return NULL;
> +		}

Ah, at this point this condition is always true!
What I meant was: we should do this check right after the loop
(list_for_each_entry), not inside it.

BTW, why not reuse the current insn_slot code?
Maybe you just need the following functions to allocate the
scratch pad, instead of cloning that code (i.e. the code above).

----
static bool __ppc_insn_page_used;
/*
 * Allocate from the reserved area so as to
 * ensure the range will be within +/-32MB
 */
static void *__ppc_alloc_insn_page(void)
{
	__ppc_insn_page_used = true;
	return &optinsn_slot;
}

static void __ppc_free_insn_page(void *page __maybe_unused)
{
	__ppc_insn_page_used = false;
}

> +static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
> +	.mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
	.alloc = __ppc_alloc_insn_page,
	.free = __ppc_free_insn_page,
> +	.pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
> +	/* .insn_size is initialized later */
> +	.nr_garbage = 0,
> +};
----- 

This can simplify the code.

> +	}
> +	kip = kmalloc(PPC_KPROBE_INSN_PAGE_SIZE(ppc_slots_per_page(c)),
> +		      GFP_KERNEL);
> +	if (!kip)
> +		goto out;
> +	/*
> +	 * Allocate from the reserved area so as to
> +	 * ensure the range will be within +/-32MB
> +	 */
> +	kip->insns = &optinsn_slot;
> +	if (!kip->insns) {
> +		kfree(kip);
> +		goto out;
> +	}
> +	INIT_LIST_HEAD(&kip->list);
> +	memset(kip->slot_used, SLOT_CLEAN, ppc_slots_per_page(c));
> +	kip->slot_used[0] = SLOT_USED;
> +	kip->nused = 1;
> +	kip->ngarbage = 0;
> +	kip->cache = c;
> +	list_add(&kip->list, &c->pages);
> +	slot = kip->insns;
> +out:
> +	mutex_unlock(&c->mutex);
> +	return slot;
> +}
> +
> +kprobe_opcode_t *ppc_get_optinsn_slot(struct optimized_kprobe *op)
> +{
> +	/*
> +	 * The insn slot is allocated from the reserved
> +	 * area(ie &optinsn_slot).We are not optimizing probes
> +	 * at module_addr now.
> +	 */
> +	kprobe_opcode_t *slot = NULL;
> +
> +	if (is_kernel_addr(op->kp.addr))
> +		slot = __ppc_get_optinsn_slot(&kprobe_ppc_optinsn_slots);
> +	else
> +		pr_info("Kprobe can not be optimized\n");
> +	return slot;
> +}
> +
> +void __ppc_free_optinsn_slot(struct kprobe_insn_cache *c,
> +			     kprobe_opcode_t *slot, int dirty)
> +{
> +	struct kprobe_ppc_insn_page *kip;
> +
> +	mutex_lock(&c->mutex);
> +
> +	list_for_each_entry(kip, &c->pages, list) {
> +		long idx = ((long)slot - (long)kip->insns) /
> +				(c->insn_size * sizeof(kprobe_opcode_t));
> +		if (idx >= 0 && idx < ppc_slots_per_page(c)) {
> +			WARN_ON(kip->slot_used[idx] != SLOT_USED);
> +			if (dirty) {
> +				kip->slot_used[idx] = SLOT_DIRTY;
> +				kip->ngarbage++;
> +				if (++c->nr_garbage > ppc_slots_per_page(c))
> +					collect_garbage_slots(c);
> +			} else
> +				collect_one_slot(kip, idx);
> +			goto out;
> +		}
> +	}
> +	/* Could not free this slot. */
> +	WARN_ON(1);
> +out:
> +	mutex_unlock(&c->mutex);
> +}
> +
> +static void ppc_free_optinsn_slot(struct optimized_kprobe *op)
> +{
> +	if (!op->optinsn.insn)
> +		return;
> +	if (is_kernel_addr((unsigned long)op->kp.addr))
> +		__ppc_free_optinsn_slot(&kprobe_ppc_optinsn_slots,
> +					op->optinsn.insn, 0);
> +}
> +
> +static void
> +__arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
> +{
> +	if (op->optinsn.insn) {
> +		ppc_free_optinsn_slot(op);
> +		op->optinsn.insn = NULL;
> +	}
> +}
> +
> +static int can_optimize(struct kprobe *p)
> +{
> +	struct pt_regs *regs;
> +	unsigned int instr;
> +	int r;
> +
> +	/*
> +	 * Not optimizing the kprobe placed by
> +	 * kretprobe during boot time
> +	 */
> +	if ((kprobe_opcode_t)p->addr == (kprobe_opcode_t)&kretprobe_trampoline)
> +		return 0;
> +
> +	regs = current_pt_regs();
> +	instr = *(p->ainsn.insn);
> +
> +	/* Ensure the instruction can be emulated*/
> +	r = emulate_step(regs, instr);
> +	if (r != 1)
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static void
> +create_return_branch(struct optimized_kprobe *op, struct pt_regs *regs)
> +{
> +	/*
> +	 * Create a branch back to the return address
> +	 * after the probed instruction is emulated
> +	 */
> +
> +	kprobe_opcode_t branch, *buff;
> +	unsigned long ret;
> +
> +	ret = regs->nip;
> +	buff = op->optinsn.insn;
> +
> +	branch = create_branch((unsigned int *)buff + TMPL_RET_IDX,
> +			       (unsigned long)ret, 0);
> +	buff[TMPL_RET_IDX] = branch;
> +}
> +
> +static void
> +optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
> +{
> +	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +
> +	if (kprobe_running())
> +		kprobes_inc_nmissed_count(&op->kp);
> +	else {
> +		__this_cpu_write(current_kprobe, &op->kp);
> +		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +		opt_pre_handler(&op->kp, regs);
> +		__this_cpu_write(current_kprobe, NULL);
> +	}
> +	local_irq_restore(flags);
> +}
> +NOKPROBE_SYMBOL(optimized_callback);
> +
> +void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
> +{
> +	 __arch_remove_optimized_kprobe(op, 1);
> +}
> +
> +void  create_insn(unsigned int insn, kprobe_opcode_t *addr)
> +{
> +	u32 instr, instr2;
> +
> +	/*
> +	 * emulate_step() requires insn to be emulated as
> +	 * second parameter. Hence r4 should be loaded
> +	 * with 'insn'.
> +	 * synthesize addis r4,0,(insn)@h
> +	 */
> +	instr = 0x3c000000 | 0x800000 | ((insn >> 16) & 0xffff);
> +	*addr++ = instr;
> +
> +	/* ori r4,r4,(insn)@l */
> +	instr2 = 0x60000000 | 0x40000 | 0x800000;
> +	instr2 = instr2 | (insn & 0xffff);
> +	*addr = instr2;
> +}
> +
> +void create_load_address_insn(struct optimized_kprobe *op,
> +			      kprobe_opcode_t *addr, kprobe_opcode_t *addr2)
> +{
> +	u32 instr1, instr2, instr3, instr4, instr5;
> +	unsigned long val;
> +
> +	val = op;
> +
> +	/*
> +	 * Optimized_kprobe structure is required as a parameter
> +	 * for invoking optimized_callback() and create_return_branch()
> +	 * from detour buffer. Hence need to have a 64bit immediate
> +	 * load into r3.
> +	 *
> +	 * lis r3,(op)@highest
> +	 */
> +	instr1 = 0x3c000000 | 0x600000 | ((val >> 48) & 0xffff);
> +	*addr++ = instr1;
> +	*addr2++ = instr1;
> +
> +	/* ori r3,r3,(op)@higher */
> +	instr2 = 0x60000000 | 0x30000 | 0x600000 | ((val >> 32) & 0xffff);
> +	*addr++ = instr2;
> +	*addr2++ = instr2;
> +
> +	/* rldicr r3,r3,32,31 */
> +	instr3 = 0x78000004 | 0x30000 | 0x600000 | ((32 & 0x1f) << 11);
> +	instr3 = instr3 | ((31 & 0x1f) << 6) | ((32 & 0x20) >> 4);
> +	*addr++ = instr3;
> +	*addr2++ = instr3;
> +
> +	/* oris r3,r3,(op)@h */
> +	instr4 = 0x64000000 |  0x30000 | 0x600000 | ((val >> 16) & 0xffff);
> +	*addr++ = instr4;
> +	*addr2++ = instr4;
> +
> +	/* ori r3,r3,(op)@l */
> +	instr5 =  0x60000000 | 0x30000 | 0x600000 | (val & 0xffff);
> +	*addr = instr5;
> +	*addr2 = instr5;
> +}
> +
> +int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
> +{
> +	kprobe_opcode_t *buff, branch, branch2, branch3;
> +	long rel_chk, ret_chk;
> +
> +	kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
> +	op->optinsn.insn = NULL;
> +
> +	if (!can_optimize(p))
> +		return -EILSEQ;
> +
> +	/* Allocate instruction slot for detour buffer*/
> +	buff = ppc_get_optinsn_slot(op);
> +	if (!buff)
> +		return -ENOMEM;
> +
> +	/*
> +	 * OPTPROBE use a 'b' instruction to branch to optinsn.insn.
> +	 *
> +	 * The target address has to be relatively nearby, to permit use
> +	 * of branch instruction in powerpc because the address is specified
> +	 * in an immediate field in the instruction opcode itself, ie 24 bits
> +	 * in the opcode specify the address. Therefore the address gap should
> +	 * be 32MB on either side of the current instruction.
> +	 */
> +	rel_chk = (long)buff -
> +			(unsigned long)p->addr;

There is still an unnecessary line break here...

> +	if (rel_chk < -0x2000000 || rel_chk > 0x1fffffc || rel_chk & 0x3) {
> +		ppc_free_optinsn_slot(op);
> +		return -ERANGE;
> +	}
> +	/*
> +	 * For the time being assume that the return address is NIP+4.
> +	 * TODO : check the return address is also within 32MB range for
> +	 * cases where NIP is not NIP+4.ie the NIP is decided after emulation.
> +	 */
> +	ret_chk = (long)(buff + TMPL_RET_IDX) - (unsigned long)(p->addr) + 4;
> +	if (ret_chk < -0x2000000 || ret_chk > 0x1fffffc || ret_chk & 0x3) {
> +		ppc_free_optinsn_slot(op);
> +		return -ERANGE;
> +	}
> +
> +	/* Do Copy arch specific instance from template*/
> +	memcpy(buff, optprobe_template_entry,
> +	       TMPL_END_IDX * sizeof(kprobe_opcode_t));
> +	create_load_address_insn(op, buff + TMPL_OP1_IDX, buff + TMPL_OP2_IDX);
> +
> +	/* Create a branch to the optimized_callback function */
> +	branch = create_branch((unsigned int *)buff + TMPL_CALL_HDLR_IDX,
> +			       (unsigned long)optimized_callback + 8,
> +				BRANCH_SET_LINK);
> +
> +	/* Place the branch instr into the trampoline */
> +	buff[TMPL_CALL_HDLR_IDX] = branch;
> +	create_insn(*(p->ainsn.insn), buff + TMPL_INSN_IDX);
> +
> +	/*Create a branch instruction into the emulate_step*/
> +	branch3 = create_branch((unsigned int *)buff + TMPL_EMULATE_IDX,
> +				(unsigned long)emulate_step + 8,
> +				BRANCH_SET_LINK);
> +	buff[TMPL_EMULATE_IDX] = branch3;
> +
> +	/* Create a branch for jumping back*/
> +	branch2 = create_branch((unsigned int *)buff + TMPL_RET_BRANCH_IDX,
> +				(unsigned long)create_return_branch + 8,
> +				BRANCH_SET_LINK);
> +	buff[TMPL_RET_BRANCH_IDX] = branch2;
> +
> +	op->optinsn.insn = buff;
> +	return 0;
> +}
> +
> +int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
> +{
> +	return optinsn->insn;
> +}
> +
> +/*
> + * Here,kprobe opt always replace one instruction (4 bytes
> + * aligned and 4 bytes long). It is impossible to encounter another
> + * kprobe in the address range. So always return 0.
> + */
> +int arch_check_optimized_kprobe(struct optimized_kprobe *op)
> +{
> +	return 0;
> +}
> +
> +void arch_optimize_kprobes(struct list_head *oplist)
> +{
> +	struct optimized_kprobe *op;
> +	struct optimized_kprobe *tmp;
> +
> +	unsigned int branch;
> +
> +	list_for_each_entry_safe(op, tmp, oplist, list) {
> +	/*
> +	* Backup instructions which will be replaced
> +	* by jump address
> +	*/

This comment should also be indented.

> +		memcpy(op->optinsn.copied_insn, op->kp.addr,
> +		       RELATIVEJUMP_SIZE);
> +		branch = create_branch((unsigned int *)op->kp.addr,
> +					(unsigned long)op->optinsn.insn, 0);
> +		*op->kp.addr = branch;
> +		list_del_init(&op->list);
> +	}
> +}
> +
> +void arch_unoptimize_kprobe(struct optimized_kprobe *op)
> +{
> +	arch_arm_kprobe(&op->kp);
> +}
> +
> +void arch_unoptimize_kprobes(struct list_head *oplist,
> +			     struct list_head *done_list)
> +{
> +	struct optimized_kprobe *op;
> +	struct optimized_kprobe *tmp;
> +
> +	list_for_each_entry_safe(op, tmp, oplist, list) {
> +		arch_unoptimize_kprobe(op);
> +		list_move(&op->list, done_list);
> +	}
> +}
> +
> +int arch_within_optimized_kprobe(struct optimized_kprobe *op,
> +				 unsigned long addr)
> +{
> +	return 0;
> +}
> -- 
> 2.1.0
> 

Thanks!


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core
  2016-05-20 12:37   ` Masami Hiramatsu
@ 2016-05-24  6:06     ` Anju T
  0 siblings, 0 replies; 9+ messages in thread
From: Anju T @ 2016-05-24  6:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, linuxppc-dev, ananth, naveen.n.rao, paulus, srikar,
	benh, mpe, hemant, mahesh

Hi,
On Friday 20 May 2016 06:07 PM, Masami Hiramatsu wrote:
> Hi Anju,
>
> Please see my comments below,
>
> On Thu, 19 May 2016 20:40:39 +0530
> Anju T <anju@linux.vnet.ibm.com> wrote:
>
>> ppc_get_optinsn_slot() and ppc_free_optinsn_slot() are
>> geared towards the allocation and freeing of memory from
>> the area reserved for detour buffer.
>>
>> Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/kernel/optprobes.c | 480 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 480 insertions(+)
>>   create mode 100644 arch/powerpc/kernel/optprobes.c
>>
>> diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
>> new file mode 100644
>> index 0000000..bb61e18
>> --- /dev/null
>> +++ b/arch/powerpc/kernel/optprobes.c
>> @@ -0,0 +1,480 @@
>> +/*
>> + * Code for Kernel probes Jump optimization.
>> + *
>> + * Copyright 2016, Anju T, IBM Corp.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +
>> +#include <linux/kprobes.h>
>> +#include <linux/jump_label.h>
>> +#include <linux/types.h>
>> +#include <linux/slab.h>
>> +#include <linux/list.h>
>> +#include <asm/kprobes.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/cacheflush.h>
>> +#include <asm/code-patching.h>
>> +#include <asm/sstep.h>
>> +
>> +#define SLOT_SIZE 65536
>> +#define TMPL_CALL_HDLR_IDX	\
>> +	(optprobe_template_call_handler - optprobe_template_entry)
>> +#define TMPL_EMULATE_IDX	\
>> +	(optprobe_template_call_emulate - optprobe_template_entry)
>> +#define TMPL_RET_BRANCH_IDX	\
>> +	(optprobe_template_ret_branch - optprobe_template_entry)
>> +#define TMPL_RET_IDX	\
>> +	(optprobe_template_ret - optprobe_template_entry)
>> +#define TMPL_OP1_IDX	\
>> +	(optprobe_template_op_address1 - optprobe_template_entry)
>> +#define TMPL_OP2_IDX	\
>> +	(optprobe_template_op_address2 - optprobe_template_entry)
>> +#define TMPL_INSN_IDX	\
>> +	(optprobe_template_insn - optprobe_template_entry)
>> +#define TMPL_END_IDX	\
>> +	(optprobe_template_end - optprobe_template_entry)
>> +
>> +struct kprobe_ppc_insn_page {
>> +	struct list_head list;
>> +	kprobe_opcode_t *insns;	/* Page of instruction slots */
>> +	struct kprobe_insn_cache *cache;
>> +	int nused;
>> +	int ngarbage;
>> +	char slot_used[];
>> +};
>> +
>> +#define PPC_KPROBE_INSN_PAGE_SIZE(slots)	\
>> +	(offsetof(struct kprobe_ppc_insn_page, slot_used) +	\
>> +	(sizeof(char) * (slots)))
>> +
>> +enum ppc_kprobe_slot_state {
>> +	SLOT_CLEAN = 0,
>> +	SLOT_DIRTY = 1,
>> +	SLOT_USED = 2,
>> +};
>> +
>> +static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
>> +	.mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
>> +	.pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
>> +	/* .insn_size is initialized later */
>> +	.nr_garbage = 0,
>> +};
>> +
>> +static int ppc_slots_per_page(struct kprobe_insn_cache *c)
>> +{
>> +	/*
>> +	 * Here the #slots per page differs from x86 as we have
>> +	 * only 64KB reserved.
>> +	 */
>> +	return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t));
>> +}
>> +
>> +/* Return 1 if all garbages are collected, otherwise 0. */
>> +static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx)
>> +{
>> +	kip->slot_used[idx] = SLOT_CLEAN;
>> +	kip->nused--;
>> +	return 0;
>> +}
>> +
>> +static int collect_garbage_slots(struct kprobe_insn_cache *c)
>> +{
>> +	struct kprobe_ppc_insn_page *kip, *next;
>> +
>> +	/* Ensure no-one is interrupted on the garbages */
>> +	synchronize_sched();
>> +
>> +	list_for_each_entry_safe(kip, next, &c->pages, list) {
>> +		int i;
>> +
>> +		if (kip->ngarbage == 0)
>> +			continue;
>> +		kip->ngarbage = 0;	/* we will collect all garbages */
>> +		for (i = 0; i < ppc_slots_per_page(c); i++) {
>> +			if (kip->slot_used[i] == SLOT_DIRTY &&
>> +			    collect_one_slot(kip, i))
>> +				break;
>> +		}
>> +	}
>> +	c->nr_garbage = 0;
>> +	return 0;
>> +}
>> +
>> +kprobe_opcode_t  *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c)
>> +{
>> +	struct kprobe_ppc_insn_page *kip;
>> +	kprobe_opcode_t *slot = NULL;
>> +
>> +	mutex_lock(&c->mutex);
>> +	list_for_each_entry(kip, &c->pages, list) {
>> +		if (kip->nused < ppc_slots_per_page(c)) {
>> +			int i;
>> +
>> +			for (i = 0; i < ppc_slots_per_page(c); i++) {
>> +				if (kip->slot_used[i] == SLOT_CLEAN) {
>> +					kip->slot_used[i] = SLOT_USED;
>> +					kip->nused++;
>> +					slot = kip->insns + (i * c->insn_size);
>> +					goto out;
>> +				}
>> +			}
>> +			/* kip->nused reached max value. */
>> +			kip->nused = ppc_slots_per_page(c);
>> +			WARN_ON(1);
>> +		}
>> +		if (!list_empty(&c->pages)) {
>> +			pr_info("No more slots to allocate\n");
>> +			return NULL;
>> +		}
> Ah, at this point this condition is always true!
> What I meant was: we should do this check right after the loop
> (list_for_each_entry), not inside it.
>
> BTW, why not reuse the current insn_slot code?
> Maybe you just need the following functions to allocate the
> scratch pad, instead of cloning that code (i.e. the code above).
>
> ----
> static bool __ppc_insn_page_used;
> /*
>   * Allocate from the reserved area so as to
>   * ensure the range will be within +/-32MB
>   */
> static void *__ppc_alloc_insn_page(void)
> {
> 	__ppc_insn_page_used = true;
> 	return &optinsn_slot;
> }
>
> static void __ppc_free_insn_page(void *page __maybe_unused)
> {
> 	__ppc_insn_page_used = false;
> }
>
>> +static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
>> +	.mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
> 	.alloc = __ppc_alloc_insn_page,
> 	.free = __ppc_free_insn_page,
>> +	.pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
>> +	/* .insn_size is initialized later */
>> +	.nr_garbage = 0,
>> +};
> -----
>
> This can simplify the code.


Yes, this we can do.

>> +	}
>> +	kip = kmalloc(PPC_KPROBE_INSN_PAGE_SIZE(ppc_slots_per_page(c)),
>> +		      GFP_KERNEL);
>> +	if (!kip)
>> +		goto out;
>> +	/*
>> +	 * Allocate from the reserved area so as to
>> +	 * ensure the range will be within +/-32MB
>> +	 */
>> +	kip->insns = &optinsn_slot;
>> +	if (!kip->insns) {
>> +		kfree(kip);
>> +		goto out;
>> +	}
>> +	INIT_LIST_HEAD(&kip->list);
>> +	memset(kip->slot_used, SLOT_CLEAN, ppc_slots_per_page(c));
>> +	kip->slot_used[0] = SLOT_USED;
>> +	kip->nused = 1;
>> +	kip->ngarbage = 0;
>> +	kip->cache = c;
>> +	list_add(&kip->list, &c->pages);
>> +	slot = kip->insns;
>> +out:
>> +	mutex_unlock(&c->mutex);
>> +	return slot;
>> +}
>> +
>> +kprobe_opcode_t *ppc_get_optinsn_slot(struct optimized_kprobe *op)
>> +{
>> +	/*
>> +	 * The insn slot is allocated from the reserved
>> +	 * area(ie &optinsn_slot).We are not optimizing probes
>> +	 * at module_addr now.
>> +	 */
>> +	kprobe_opcode_t *slot = NULL;
>> +
>> +	if (is_kernel_addr(op->kp.addr))
>> +		slot = __ppc_get_optinsn_slot(&kprobe_ppc_optinsn_slots);
>> +	else
>> +		pr_info("Kprobe can not be optimized\n");
>> +	return slot;
>> +}
>> +
>> +void __ppc_free_optinsn_slot(struct kprobe_insn_cache *c,
>> +			     kprobe_opcode_t *slot, int dirty)
>> +{
>> +	struct kprobe_ppc_insn_page *kip;
>> +
>> +	mutex_lock(&c->mutex);
>> +
>> +	list_for_each_entry(kip, &c->pages, list) {
>> +		long idx = ((long)slot - (long)kip->insns) /
>> +				(c->insn_size * sizeof(kprobe_opcode_t));
>> +		if (idx >= 0 && idx < ppc_slots_per_page(c)) {
>> +			WARN_ON(kip->slot_used[idx] != SLOT_USED);
>> +			if (dirty) {
>> +				kip->slot_used[idx] = SLOT_DIRTY;
>> +				kip->ngarbage++;
>> +				if (++c->nr_garbage > ppc_slots_per_page(c))
>> +					collect_garbage_slots(c);
>> +			} else
>> +				collect_one_slot(kip, idx);
>> +			goto out;
>> +		}
>> +	}
>> +	/* Could not free this slot. */
>> +	WARN_ON(1);
>> +out:
>> +	mutex_unlock(&c->mutex);
>> +}
>> +
>> +static void ppc_free_optinsn_slot(struct optimized_kprobe *op)
>> +{
>> +	if (!op->optinsn.insn)
>> +		return;
>> +	if (is_kernel_addr((unsigned long)op->kp.addr))
>> +		__ppc_free_optinsn_slot(&kprobe_ppc_optinsn_slots,
>> +					op->optinsn.insn, 0);
>> +}
>> +
>> +static void
>> +__arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
>> +{
>> +	if (op->optinsn.insn) {
>> +		ppc_free_optinsn_slot(op);
>> +		op->optinsn.insn = NULL;
>> +	}
>> +}
>> +
>> +static int can_optimize(struct kprobe *p)
>> +{
>> +	struct pt_regs *regs;
>> +	unsigned int instr;
>> +	int r;
>> +
>> +	/*
>> +	 * Not optimizing the kprobe placed by
>> +	 * kretprobe during boot time
>> +	 */
>> +	if ((kprobe_opcode_t)p->addr == (kprobe_opcode_t)&kretprobe_trampoline)
>> +		return 0;
>> +
>> +	regs = current_pt_regs();
>> +	instr = *(p->ainsn.insn);
>> +
>> +	/* Ensure the instruction can be emulated*/
>> +	r = emulate_step(regs, instr);
>> +	if (r != 1)
>> +		return 0;
>> +
>> +	return 1;
>> +}
>> +
>> +static void
>> +create_return_branch(struct optimized_kprobe *op, struct pt_regs *regs)
>> +{
>> +	/*
>> +	 * Create a branch back to the return address
>> +	 * after the probed instruction is emulated
>> +	 */
>> +
>> +	kprobe_opcode_t branch, *buff;
>> +	unsigned long ret;
>> +
>> +	ret = regs->nip;
>> +	buff = op->optinsn.insn;
>> +
>> +	branch = create_branch((unsigned int *)buff + TMPL_RET_IDX,
>> +			       (unsigned long)ret, 0);
>> +	buff[TMPL_RET_IDX] = branch;
>> +}
>> +
>> +static void
>> +optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
>> +{
>> +	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>> +	unsigned long flags;
>> +
>> +	local_irq_save(flags);
>> +
>> +	if (kprobe_running())
>> +		kprobes_inc_nmissed_count(&op->kp);
>> +	else {
>> +		__this_cpu_write(current_kprobe, &op->kp);
>> +		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
>> +		opt_pre_handler(&op->kp, regs);
>> +		__this_cpu_write(current_kprobe, NULL);
>> +	}
>> +	local_irq_restore(flags);
>> +}
>> +NOKPROBE_SYMBOL(optimized_callback);
>> +
>> +void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
>> +{
>> +	 __arch_remove_optimized_kprobe(op, 1);
>> +}
>> +
>> +void  create_insn(unsigned int insn, kprobe_opcode_t *addr)
>> +{
>> +	u32 instr, instr2;
>> +
>> +	/*
>> +	 * emulate_step() requires insn to be emulated as
>> +	 * second parameter. Hence r4 should be loaded
>> +	 * with 'insn'.
>> +	 * synthesize addis r4,0,(insn)@h
>> +	 */
>> +	instr = 0x3c000000 | 0x800000 | ((insn >> 16) & 0xffff);
>> +	*addr++ = instr;
>> +
>> +	/* ori r4,r4,(insn)@l */
>> +	instr2 = 0x60000000 | 0x40000 | 0x800000;
>> +	instr2 = instr2 | (insn & 0xffff);
>> +	*addr = instr2;
>> +}
>> +
>> +void create_load_address_insn(struct optimized_kprobe *op,
>> +			      kprobe_opcode_t *addr, kprobe_opcode_t *addr2)
>> +{
>> +	u32 instr1, instr2, instr3, instr4, instr5;
>> +	unsigned long val;
>> +
>> +	val = op;
>> +
>> +	/*
>> +	 * Optimized_kprobe structure is required as a parameter
>> +	 * for invoking optimized_callback() and create_return_branch()
>> +	 * from detour buffer. Hence need to have a 64bit immediate
>> +	 * load into r3.
>> +	 *
>> +	 * lis r3,(op)@highest
>> +	 */
>> +	instr1 = 0x3c000000 | 0x600000 | ((val >> 48) & 0xffff);
>> +	*addr++ = instr1;
>> +	*addr2++ = instr1;
>> +
>> +	/* ori r3,r3,(op)@higher */
>> +	instr2 = 0x60000000 | 0x30000 | 0x600000 | ((val >> 32) & 0xffff);
>> +	*addr++ = instr2;
>> +	*addr2++ = instr2;
>> +
>> +	/* rldicr r3,r3,32,31 */
>> +	instr3 = 0x78000004 | 0x30000 | 0x600000 | ((32 & 0x1f) << 11);
>> +	instr3 = instr3 | ((31 & 0x1f) << 6) | ((32 & 0x20) >> 4);
>> +	*addr++ = instr3;
>> +	*addr2++ = instr3;
>> +
>> +	/* oris r3,r3,(op)@h */
>> +	instr4 = 0x64000000 |  0x30000 | 0x600000 | ((val >> 16) & 0xffff);
>> +	*addr++ = instr4;
>> +	*addr2++ = instr4;
>> +
>> +	/* ori r3,r3,(op)@l */
>> +	instr5 =  0x60000000 | 0x30000 | 0x600000 | (val & 0xffff);
>> +	*addr = instr5;
>> +	*addr2 = instr5;
>> +}
>> +
>> +int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
>> +{
>> +	kprobe_opcode_t *buff, branch, branch2, branch3;
>> +	long rel_chk, ret_chk;
>> +
>> +	kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
>> +	op->optinsn.insn = NULL;
>> +
>> +	if (!can_optimize(p))
>> +		return -EILSEQ;
>> +
>> +	/* Allocate instruction slot for detour buffer*/
>> +	buff = ppc_get_optinsn_slot(op);
>> +	if (!buff)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * OPTPROBE use a 'b' instruction to branch to optinsn.insn.
>> +	 *
>> +	 * The target address has to be relatively nearby, to permit use
>> +	 * of branch instruction in powerpc because the address is specified
>> +	 * in an immediate field in the instruction opcode itself, ie 24 bits
>> +	 * in the opcode specify the address. Therefore the address gap should
>> +	 * be 32MB on either side of the current instruction.
>> +	 */
>> +	rel_chk = (long)buff -
>> +			(unsigned long)p->addr;
> There is still an unnecessary line break here...

Oh, OK. I thought you were referring to the line above the if
condition. :-)
I will remove this.
>
>> +	if (rel_chk < -0x2000000 || rel_chk > 0x1fffffc || rel_chk & 0x3) {
>> +		ppc_free_optinsn_slot(op);
>> +		return -ERANGE;
>> +	}
>> +	/*
>> +	 * For the time being assume that the return address is NIP+4.
>> +	 * TODO : check the return address is also within 32MB range for
>> +	 * cases where NIP is not NIP+4.ie the NIP is decided after emulation.
>> +	 */
>> +	ret_chk = (long)(buff + TMPL_RET_IDX) - (unsigned long)(p->addr) + 4;
>> +	if (ret_chk < -0x2000000 || ret_chk > 0x1fffffc || ret_chk & 0x3) {
>> +		ppc_free_optinsn_slot(op);
>> +		return -ERANGE;
>> +	}
>> +
>> +	/* Do Copy arch specific instance from template*/
>> +	memcpy(buff, optprobe_template_entry,
>> +	       TMPL_END_IDX * sizeof(kprobe_opcode_t));
>> +	create_load_address_insn(op, buff + TMPL_OP1_IDX, buff + TMPL_OP2_IDX);
>> +
>> +	/* Create a branch to the optimized_callback function */
>> +	branch = create_branch((unsigned int *)buff + TMPL_CALL_HDLR_IDX,
>> +			       (unsigned long)optimized_callback + 8,
>> +				BRANCH_SET_LINK);
>> +
>> +	/* Place the branch instr into the trampoline */
>> +	buff[TMPL_CALL_HDLR_IDX] = branch;
>> +	create_insn(*(p->ainsn.insn), buff + TMPL_INSN_IDX);
>> +
>> +	/*Create a branch instruction into the emulate_step*/
>> +	branch3 = create_branch((unsigned int *)buff + TMPL_EMULATE_IDX,
>> +				(unsigned long)emulate_step + 8,
>> +				BRANCH_SET_LINK);
>> +	buff[TMPL_EMULATE_IDX] = branch3;
>> +
>> +	/* Create a branch for jumping back*/
>> +	branch2 = create_branch((unsigned int *)buff + TMPL_RET_BRANCH_IDX,
>> +				(unsigned long)create_return_branch + 8,
>> +				BRANCH_SET_LINK);
>> +	buff[TMPL_RET_BRANCH_IDX] = branch2;
>> +
>> +	op->optinsn.insn = buff;
>> +	return 0;
>> +}
>> +
>> +int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
>> +{
>> +	return optinsn->insn;
>> +}
>> +
>> +/*
>> + * Here,kprobe opt always replace one instruction (4 bytes
>> + * aligned and 4 bytes long). It is impossible to encounter another
>> + * kprobe in the address range. So always return 0.
>> + */
>> +int arch_check_optimized_kprobe(struct optimized_kprobe *op)
>> +{
>> +	return 0;
>> +}
>> +
>> +void arch_optimize_kprobes(struct list_head *oplist)
>> +{
>> +	struct optimized_kprobe *op;
>> +	struct optimized_kprobe *tmp;
>> +
>> +	unsigned int branch;
>> +
>> +	list_for_each_entry_safe(op, tmp, oplist, list) {
>> +	/*
>> +	* Backup instructions which will be replaced
>> +	* by jump address
>> +	*/
> This comment should also be indented.

Sure. :-)


Thanks
Anju.
>
>> +		memcpy(op->optinsn.copied_insn, op->kp.addr,
>> +		       RELATIVEJUMP_SIZE);
>> +		branch = create_branch((unsigned int *)op->kp.addr,
>> +					(unsigned long)op->optinsn.insn, 0);
>> +		*op->kp.addr = branch;
>> +		list_del_init(&op->list);
>> +	}
>> +}
>> +
>> +void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>> +{
>> +	arch_arm_kprobe(&op->kp);
>> +}
>> +
>> +void arch_unoptimize_kprobes(struct list_head *oplist,
>> +			     struct list_head *done_list)
>> +{
>> +	struct optimized_kprobe *op;
>> +	struct optimized_kprobe *tmp;
>> +
>> +	list_for_each_entry_safe(op, tmp, oplist, list) {
>> +		arch_unoptimize_kprobe(op);
>> +		list_move(&op->list, done_list);
>> +	}
>> +}
>> +
>> +int arch_within_optimized_kprobe(struct optimized_kprobe *op,
>> +				 unsigned long addr)
>> +{
>> +	return 0;
>> +}
>> -- 
>> 2.1.0
>>
> Thanks!
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes
  2016-05-19 15:10 ` [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes Anju T
@ 2016-05-24 10:15   ` Madhavan Srinivasan
  2016-05-24 11:14   ` Naveen N. Rao
       [not found]   ` <201605241015.u4OAFAnB038915@mx0a-001b2d01.pphosted.com>
  2 siblings, 0 replies; 9+ messages in thread
From: Madhavan Srinivasan @ 2016-05-24 10:15 UTC (permalink / raw)
  To: Anju T, linux-kernel, linuxppc-dev
  Cc: ananth, mahesh, paulus, mhiramat, naveen.n.rao, hemant, srikar



On Thursday 19 May 2016 08:40 PM, Anju T wrote:
> Detour buffer contains instructions to create an in memory pt_regs.
> After the execution of prehandler a call is made for instruction emulation.
> The NIP is decided after the probed instruction is executed. Hence a branch
> instruction is created to the NIP returned by emulate_step().
>
> Instruction slot for detour buffer is allocated from
> the reserved area. For the time being 64KB is reserved
> in memory for this purpose.
>
> Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kprobes.h   |  25 ++++++++
>  arch/powerpc/kernel/optprobes_head.S | 108 +++++++++++++++++++++++++++++++++++
>  2 files changed, 133 insertions(+)
>  create mode 100644 arch/powerpc/kernel/optprobes_head.S
>
> diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
> index 039b583..3e4c998 100644
> --- a/arch/powerpc/include/asm/kprobes.h
> +++ b/arch/powerpc/include/asm/kprobes.h
> @@ -38,7 +38,25 @@ struct pt_regs;
>  struct kprobe;
>  
>  typedef ppc_opcode_t kprobe_opcode_t;
> +
> +extern kprobe_opcode_t optinsn_slot;
> +/* Optinsn template address */
> +extern kprobe_opcode_t optprobe_template_entry[];
> +extern kprobe_opcode_t optprobe_template_call_handler[];
> +extern kprobe_opcode_t optprobe_template_call_emulate[];
> +extern kprobe_opcode_t optprobe_template_ret_branch[];
> +extern kprobe_opcode_t optprobe_template_ret[];
> +extern kprobe_opcode_t optprobe_template_insn[];
> +extern kprobe_opcode_t optprobe_template_op_address1[];
> +extern kprobe_opcode_t optprobe_template_op_address2[];
> +extern kprobe_opcode_t optprobe_template_end[];
> +
>  #define MAX_INSN_SIZE 1
> +#define MAX_OPTIMIZED_LENGTH    4
> +#define MAX_OPTINSN_SIZE				\
> +	((unsigned long)&optprobe_template_end -	\
> +	(unsigned long)&optprobe_template_entry)
> +#define RELATIVEJUMP_SIZE       4
>  
>  #ifdef CONFIG_PPC64
>  #if defined(_CALL_ELF) && _CALL_ELF == 2
> @@ -129,5 +147,12 @@ struct kprobe_ctlblk {
>  extern int kprobe_exceptions_notify(struct notifier_block *self,
>  					unsigned long val, void *data);
>  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
> +
> +struct arch_optimized_insn {
> +	kprobe_opcode_t copied_insn[1];
> +	/* detour buffer */
> +	kprobe_opcode_t *insn;
> +};
> +
>  #endif /* __KERNEL__ */
>  #endif	/* _ASM_POWERPC_KPROBES_H */
> diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
> new file mode 100644
> index 0000000..ce32aec
> --- /dev/null
> +++ b/arch/powerpc/kernel/optprobes_head.S
> @@ -0,0 +1,108 @@
> +/*
> + * Code to prepare detour buffer for optprobes in kernel.
> + *
> + * Copyright 2016, Anju T, IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <asm/ppc_asm.h>
> +#include <asm/ptrace.h>
> +#include <asm/asm-offsets.h>
> +
> +.global optinsn_slot
> +optinsn_slot:
> +	/* Reserve an area to allocate slots for detour buffer */
> +	.space	65536

Can we replace "65536" with SLOT_SIZE?

> +.global optprobe_template_entry
> +optprobe_template_entry:
> +	stdu	r1,-INT_FRAME_SIZE(r1)
> +	SAVE_GPR(0,r1)
> +	/* Save the previous SP into stack */
> +	addi	r0,r1,INT_FRAME_SIZE
> +	std	0,GPR1(r1)
> +	SAVE_2GPRS(2,r1)
> +	SAVE_8GPRS(4,r1)
> +	SAVE_10GPRS(12,r1)
> +	SAVE_10GPRS(22,r1)
> +	/* Save SPRS */
> +	mfcfar	r5
> +	std	r5,_NIP(r1)
> +	mfmsr	r5
> +	std	r5,_MSR(r1)
> +	mfctr	r5
> +	std	r5,_CTR(r1)
> +	mflr	r5
> +	std	r5,_LINK(r1)
> +	mfspr	r5,SPRN_XER
> +	std	r5,_XER(r1)
> +	li	r5,0
> +	std	r5,_TRAP(r1)
> +	mfdar	r5
> +	std	r5,_DAR(r1)
> +	mfdsisr	r5
> +	std	r5,_DSISR(r1)

We are missing _CCR (CR register)?
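For reference, the missing save could look something like this (a sketch only — it assumes the usual `_CCR` asm-offset and the scratch register already used above):

```asm
	/* Sketch, not in the patch: save CR into the in-memory pt_regs */
	mfcr	r5
	std	r5,_CCR(r1)
```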

Why the tab spacing before the comment? Also,
a new line before the comment would be good.

> +	/* Pass parameters for optimized_callback */
> +.global optprobe_template_op_address1
> +optprobe_template_op_address1:
> +	nop
> +	nop
> +	nop
> +	nop
> +	nop
> +	addi	r4,r1,STACK_FRAME_OVERHEAD
Same here.

> +	/* Branch to the prehandler */
> +.global optprobe_template_call_handler
> +optprobe_template_call_handler:
> +	nop
> +	/* Pass parameters for instruction emulation */
> +	addi	r3,r1,STACK_FRAME_OVERHEAD

Same here, and for the rest of the patch too.

> +.global optprobe_template_insn
> +optprobe_template_insn:
> +	nop
> +	nop
> +	/* Branch to instruction emulation  */
> +.global optprobe_template_call_emulate
> +optprobe_template_call_emulate:
> +	nop
> +.global optprobe_template_op_address2
> +optprobe_template_op_address2:
> +	nop
> +	nop
> +	nop
> +	nop
> +	nop
> +	addi	r4,r1,STACK_FRAME_OVERHEAD
> +	/* Branch to create_return_branch() function */
> +.global optprobe_template_ret_branch
> +optprobe_template_ret_branch:
> +	nop
> +	/* Restore the registers */
> +	ld	r5,_MSR(r1)
> +	mtmsr	r5
> +	ld	r5,_CTR(r1)
> +	mtctr	r5
> +	ld	r5,_LINK(r1)
> +	mtlr	r5
> +	ld	r5,_XER(r1)
> +	mtxer	r5
> +	ld	r5,_DAR(r1)
> +	mtdar	r5
> +	ld	r5,_DSISR(r1)
> +	mtdsisr	r5

Again, CR is missing here.
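A sketch of the matching restore (assuming CR was saved to `_CCR` on entry; not in the patch as posted):

```asm
	/* Sketch, not in the patch: restore CR from pt_regs */
	ld	r5,_CCR(r1)
	mtcr	r5
```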

> +	REST_GPR(0,r1)
> +	REST_2GPRS(2,r1)
> +	REST_8GPRS(4,r1)
> +	REST_10GPRS(12,r1)
> +	REST_10GPRS(22,r1)
> +	/* Restore the previous SP */
> +	addi	r1,r1,INT_FRAME_SIZE
> +	/* Jump back to the normal workflow from trampoline */
> +.global optprobe_template_ret
> +optprobe_template_ret:
> +	nop
> +.global optprobe_template_end
> +optprobe_template_end:


* Re: [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes
  2016-05-19 15:10 ` [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes Anju T
  2016-05-24 10:15   ` Madhavan Srinivasan
@ 2016-05-24 11:14   ` Naveen N. Rao
       [not found]   ` <201605241015.u4OAFAnB038915@mx0a-001b2d01.pphosted.com>
  2 siblings, 0 replies; 9+ messages in thread
From: Naveen N. Rao @ 2016-05-24 11:14 UTC (permalink / raw)
  To: Anju T
  Cc: linux-kernel, linuxppc-dev, ananth, paulus, srikar, benh, mpe,
	hemant, mahesh, mhiramat

On 2016/05/19 08:40PM, Anju T wrote:
> Detour buffer contains instructions to create an in-memory pt_regs.
> After the pre-handler executes, a call is made for instruction emulation.
> The NIP is decided after the probed instruction is executed, hence a branch
> instruction is created to the NIP returned by emulate_step().
> 
> Instruction slot for detour buffer is allocated from
> the reserved area. For the time being 64KB is reserved
> in memory for this purpose.
> 
> Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kprobes.h   |  25 ++++++++
>  arch/powerpc/kernel/optprobes_head.S | 108 +++++++++++++++++++++++++++++++++++
>  2 files changed, 133 insertions(+)
>  create mode 100644 arch/powerpc/kernel/optprobes_head.S
> 
> diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
> index 039b583..3e4c998 100644
> --- a/arch/powerpc/include/asm/kprobes.h
> +++ b/arch/powerpc/include/asm/kprobes.h
> @@ -38,7 +38,25 @@ struct pt_regs;
>  struct kprobe;
> 
>  typedef ppc_opcode_t kprobe_opcode_t;
> +
> +extern kprobe_opcode_t optinsn_slot;
> +/* Optinsn template address */
> +extern kprobe_opcode_t optprobe_template_entry[];
> +extern kprobe_opcode_t optprobe_template_call_handler[];
> +extern kprobe_opcode_t optprobe_template_call_emulate[];
> +extern kprobe_opcode_t optprobe_template_ret_branch[];
> +extern kprobe_opcode_t optprobe_template_ret[];
> +extern kprobe_opcode_t optprobe_template_insn[];
> +extern kprobe_opcode_t optprobe_template_op_address1[];
> +extern kprobe_opcode_t optprobe_template_op_address2[];
> +extern kprobe_opcode_t optprobe_template_end[];
> +
>  #define MAX_INSN_SIZE 1
> +#define MAX_OPTIMIZED_LENGTH    4
> +#define MAX_OPTINSN_SIZE				\
> +	((unsigned long)&optprobe_template_end -	\
> +	(unsigned long)&optprobe_template_entry)
> +#define RELATIVEJUMP_SIZE       4
> 
>  #ifdef CONFIG_PPC64
>  #if defined(_CALL_ELF) && _CALL_ELF == 2
> @@ -129,5 +147,12 @@ struct kprobe_ctlblk {
>  extern int kprobe_exceptions_notify(struct notifier_block *self,
>  					unsigned long val, void *data);
>  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
> +
> +struct arch_optimized_insn {
> +	kprobe_opcode_t copied_insn[1];
> +	/* detour buffer */
> +	kprobe_opcode_t *insn;
> +};
> +
>  #endif /* __KERNEL__ */
>  #endif	/* _ASM_POWERPC_KPROBES_H */
> diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
> new file mode 100644
> index 0000000..ce32aec
> --- /dev/null
> +++ b/arch/powerpc/kernel/optprobes_head.S
> @@ -0,0 +1,108 @@
> +/*
> + * Code to prepare detour buffer for optprobes in kernel.
> + *
> + * Copyright 2016, Anju T, IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <asm/ppc_asm.h>
> +#include <asm/ptrace.h>
> +#include <asm/asm-offsets.h>
> +
> +.global optinsn_slot
> +optinsn_slot:
> +	/* Reserve an area to allocate slots for detour buffer */
> +	.space	65536
> +.global optprobe_template_entry
> +optprobe_template_entry:
> +	stdu	r1,-INT_FRAME_SIZE(r1)
> +	SAVE_GPR(0,r1)
> +	/* Save the previous SP into stack */
> +	addi	r0,r1,INT_FRAME_SIZE
> +	std	0,GPR1(r1)
> +	SAVE_2GPRS(2,r1)
> +	SAVE_8GPRS(4,r1)
> +	SAVE_10GPRS(12,r1)
> +	SAVE_10GPRS(22,r1)
> +	/* Save SPRS */
> +	mfcfar	r5
> +	std	r5,_NIP(r1)

I think this will not work. Looking through the code, I don't see us 
saving/restoring CFAR across interrupts.

Also, per the ISA:
"The Come-From Address Register (CFAR) is a 64-bit
register. When an rfebb, rfid, or rfscv instruction is
executed, the register is set to the effective address of
the instruction."

So, it looks like we actually can't save/restore it anyway.

Regardless, NIP is the same as the kprobe'd address, so you should be 
able to either use immediate loads or load the address from the kprobe
structure.
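For illustration, the five nops at optprobe_template_op_address1 could be patched at arm time with the usual ppc64 five-instruction immediate load. This is only a sketch — the register choice and the address 0xc000000012345678 below are made up:

```asm
	lis	r3,0xc000		/* address bits 63:48 */
	ori	r3,r3,0x0000		/* address bits 47:32 */
	rldicr	r3,r3,32,31		/* sldi r3,r3,32: shift into the high word */
	oris	r3,r3,0x1234		/* address bits 31:16 */
	ori	r3,r3,0x5678		/* address bits 15:0 */
```

Note that the sign extension done by lis is harmless here, since the subsequent rldicr shifts the low 32 bits into the high word and zeroes the rest.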


- Naveen


* Re: [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes
       [not found]   ` <201605241015.u4OAFAnB038915@mx0a-001b2d01.pphosted.com>
@ 2016-05-30  8:51     ` Naveen N. Rao
  0 siblings, 0 replies; 9+ messages in thread
From: Naveen N. Rao @ 2016-05-30  8:51 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: Anju T, linux-kernel, linuxppc-dev, ananth, mahesh, paulus,
	mhiramat, hemant, srikar

On 2016/05/24 03:45PM, Madhavan Srinivasan wrote:
> 
> 
> On Thursday 19 May 2016 08:40 PM, Anju T wrote:
> > Detour buffer contains instructions to create an in-memory pt_regs.
> > After the pre-handler executes, a call is made for instruction emulation.
> > The NIP is decided after the probed instruction is executed, hence a branch
> > instruction is created to the NIP returned by emulate_step().
> >
> > Instruction slot for detour buffer is allocated from
> > the reserved area. For the time being 64KB is reserved
> > in memory for this purpose.
> >
> > Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/kprobes.h   |  25 ++++++++
> >  arch/powerpc/kernel/optprobes_head.S | 108 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 133 insertions(+)
> >  create mode 100644 arch/powerpc/kernel/optprobes_head.S
> >
> > diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
> > index 039b583..3e4c998 100644
> > --- a/arch/powerpc/include/asm/kprobes.h
> > +++ b/arch/powerpc/include/asm/kprobes.h
> > @@ -38,7 +38,25 @@ struct pt_regs;
> >  struct kprobe;
> >  
> >  typedef ppc_opcode_t kprobe_opcode_t;
> > +
> > +extern kprobe_opcode_t optinsn_slot;
> > +/* Optinsn template address */
> > +extern kprobe_opcode_t optprobe_template_entry[];
> > +extern kprobe_opcode_t optprobe_template_call_handler[];
> > +extern kprobe_opcode_t optprobe_template_call_emulate[];
> > +extern kprobe_opcode_t optprobe_template_ret_branch[];
> > +extern kprobe_opcode_t optprobe_template_ret[];
> > +extern kprobe_opcode_t optprobe_template_insn[];
> > +extern kprobe_opcode_t optprobe_template_op_address1[];
> > +extern kprobe_opcode_t optprobe_template_op_address2[];
> > +extern kprobe_opcode_t optprobe_template_end[];
> > +
> >  #define MAX_INSN_SIZE 1
> > +#define MAX_OPTIMIZED_LENGTH    4
> > +#define MAX_OPTINSN_SIZE				\
> > +	((unsigned long)&optprobe_template_end -	\
> > +	(unsigned long)&optprobe_template_entry)
> > +#define RELATIVEJUMP_SIZE       4
> >  
> >  #ifdef CONFIG_PPC64
> >  #if defined(_CALL_ELF) && _CALL_ELF == 2
> > @@ -129,5 +147,12 @@ struct kprobe_ctlblk {
> >  extern int kprobe_exceptions_notify(struct notifier_block *self,
> >  					unsigned long val, void *data);
> >  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
> > +
> > +struct arch_optimized_insn {
> > +	kprobe_opcode_t copied_insn[1];
> > +	/* detour buffer */
> > +	kprobe_opcode_t *insn;
> > +};
> > +
> >  #endif /* __KERNEL__ */
> >  #endif	/* _ASM_POWERPC_KPROBES_H */
> > diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
> > new file mode 100644
> > index 0000000..ce32aec
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/optprobes_head.S
> > @@ -0,0 +1,108 @@
> > +/*
> > + * Code to prepare detour buffer for optprobes in kernel.
> > + *
> > + * Copyright 2016, Anju T, IBM Corp.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#include <asm/ppc_asm.h>
> > +#include <asm/ptrace.h>
> > +#include <asm/asm-offsets.h>
> > +

You should also add an align directive here:

.align 2

> > +.global optinsn_slot
> > +optinsn_slot:
> > +	/* Reserve an area to allocate slots for detour buffer */
> > +	.space	65536
> 
> Can we replace "65536" with SLOT_SIZE?

Yes, a macro will be good to have.
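Something along these lines, perhaps (SLOT_SIZE is a name we are suggesting, not something already in the patch):

```asm
#define SLOT_SIZE	65536	/* area reserved for detour-buffer slots */

.global optinsn_slot
optinsn_slot:
	.space	SLOT_SIZE
```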

> 
> > +.global optprobe_template_entry
> > +optprobe_template_entry:
> > +	stdu	r1,-INT_FRAME_SIZE(r1)
> > +	SAVE_GPR(0,r1)
> > +	/* Save the previous SP into stack */
> > +	addi	r0,r1,INT_FRAME_SIZE
> > +	std	0,GPR1(r1)
> > +	SAVE_2GPRS(2,r1)
> > +	SAVE_8GPRS(4,r1)
> > +	SAVE_10GPRS(12,r1)
> > +	SAVE_10GPRS(22,r1)
> > +	/* Save SPRS */
> > +	mfcfar	r5
> > +	std	r5,_NIP(r1)
> > +	mfmsr	r5
> > +	std	r5,_MSR(r1)
> > +	mfctr	r5
> > +	std	r5,_CTR(r1)
> > +	mflr	r5
> > +	std	r5,_LINK(r1)
> > +	mfspr	r5,SPRN_XER
> > +	std	r5,_XER(r1)
> > +	li	r5,0
> > +	std	r5,_TRAP(r1)
> > +	mfdar	r5
> > +	std	r5,_DAR(r1)
> > +	mfdsisr	r5
> > +	std	r5,_DSISR(r1)
> 
> We are missing _CCR (CR register)?

Yes, and I think we need to ensure we fill reasonable values for all 
registers in pt_regs including softe (from paca), orig_gpr3 (0?) and 
result (0).
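As a sketch, assuming the usual asm-offsets names (PACASOFTIRQEN, SOFTE, ORIG_GPR3, RESULT — verify against asm-offsets.c), that could be:

```asm
	/* Sketch: fill the remaining pt_regs fields */
	lbz	r5,PACASOFTIRQEN(r13)	/* softe from the paca (byte field) */
	std	r5,SOFTE(r1)
	li	r5,0
	std	r5,ORIG_GPR3(r1)
	std	r5,RESULT(r1)
```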

> 
> Why the tab spacing before the comment? Also,
> a new line before the comment would be good.

... new line before *each section* will be good :)

> 
> > +	/* Pass parameters for optimized_callback */
> > +.global optprobe_template_op_address1
> > +optprobe_template_op_address1:

So, the general style is to align labels at the start of a line and to 
align everything else as per the code indentation.

In this case though, I think it's best to use a macro for better 
readability. Something like:

#define _GLOB(name)	.global name;		\
			name:
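With that, the template labels above would read, for example:

```asm
_GLOB(optprobe_template_op_address1)
	nop
	nop
	nop
	nop
	nop
	addi	r4,r1,STACK_FRAME_OVERHEAD
```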


- Naveen


end of thread, other threads:[~2016-05-30  8:53 UTC | newest]

Thread overview: 9+ messages
2016-05-19 15:10 [RFC PATCH v2 0/3] OPTPROBES for powerpc Anju T
2016-05-19 15:10 ` [RFC PATCH v2 1/3] arch/powerpc : Add detour buffer support for optprobes Anju T
2016-05-24 10:15   ` Madhavan Srinivasan
2016-05-24 11:14   ` Naveen N. Rao
     [not found]   ` <201605241015.u4OAFAnB038915@mx0a-001b2d01.pphosted.com>
2016-05-30  8:51     ` Naveen N. Rao
2016-05-19 15:10 ` [RFC PATCH v2 2/3] arch/powerpc : optprobes for powerpc core Anju T
2016-05-20 12:37   ` Masami Hiramatsu
2016-05-24  6:06     ` Anju T
2016-05-19 15:10 ` [RFC PATCH v2 3/3] arch/powerpc : Enable optprobes support in powerpc Anju T
