* [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops
@ 2014-03-31 19:43 Oleg Nesterov
  2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:43 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

Hello.

x86 cannot handle rip-relative jmp/call instructions; the probed
task can be killed by a general protection fault. I'll describe this
in more detail when I send the fixes. For now I am sending the
preparations, which (I hope) make sense anyway; please review.

My main concern is 3/7. I know absolutely nothing about instruction
decoding, so I can only guess what, for example, OPCODE1() == 0xff or
MODRM_REG() == 2 actually means. Please review.

Ananth, David, please ack/nack the first change, it affects powerpc/arm.

Oleg.

 arch/x86/include/asm/uprobes.h |    7 +-
 arch/x86/kernel/uprobes.c      |  343 ++++++++++++++++++++--------------------
 kernel/events/uprobes.c        |   23 +---
 3 files changed, 176 insertions(+), 197 deletions(-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep()
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
@ 2014-03-31 19:43 ` Oleg Nesterov
  2014-04-01  1:41   ` Masami Hiramatsu
                     ` (2 more replies)
  2014-03-31 19:43 ` [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() Oleg Nesterov
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:43 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
not only does it not help, it actively harms.

It can only help to avoid arch_uprobe_skip_sstep() if it was already
called before and failed. But this is ugly: if we want to know whether
we can emulate this instruction or not, we should do this analysis in
arch_uprobe_analyze_insn(), not when we hit this probe for the first
time.

And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
fail or not depending on the task/register state: if this insn can be
emulated but, say, put_user() fails, we need to xol it this time, but
this doesn't mean we shouldn't try to emulate it when this or another
thread hits this bp next time.

And this is the actual reason for this change. We need to emulate the
"call" insn, but push(return-address) can obviously fail.

Per-arch notes:

	x86: __skip_sstep() can only emulate "rep;nop". With this
	     change it will be called every time and most probably
	     for no reason.

	     This will be fixed by the next changes. We need to
	     change this suboptimal code anyway.

	arm: Should not be affected. It has its own "bool simulate"
	     flag checked in arch_uprobe_skip_sstep().

	ppc: Looks like it can emulate almost everything. Does it
	     actually need to record the fact that emulate_step()
	     failed? Hopefully not. But if yes, it can add a ppc-
	     specific flag into arch_uprobe.

TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c |   23 ++---------------------
 1 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 307d87c..7a3e14e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -60,8 +60,6 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN	0
-/* Can skip singlestep */
-#define UPROBE_SKIP_SSTEP	1
 
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
@@ -491,12 +489,9 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 	uprobe->offset = offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
-	/* For now assume that the instruction need not be single-stepped */
-	__set_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
 
 	/* add to uprobes_tree, sorted on inode:offset */
 	cur_uprobe = insert_uprobe(uprobe);
-
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
 		kfree(uprobe);
@@ -1628,20 +1623,6 @@ bool uprobe_deny_signal(void)
 	return true;
 }
 
-/*
- * Avoid singlestepping the original instruction if the original instruction
- * is a NOP or can be emulated.
- */
-static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
-{
-	if (test_bit(UPROBE_SKIP_SSTEP, &uprobe->flags)) {
-		if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
-			return true;
-		clear_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
-	}
-	return false;
-}
-
 static void mmf_recalc_uprobes(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -1859,13 +1840,13 @@ static void handle_swbp(struct pt_regs *regs)
 		goto out;
 
 	handler_chain(uprobe, regs);
-	if (can_skip_sstep(uprobe, regs))
+	if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
 		goto out;
 
 	if (!pre_ssout(uprobe, regs, bp_vaddr))
 		return;
 
-	/* can_skip_sstep() succeeded, or restart if can't singlestep */
+	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
 out:
 	put_uprobe(uprobe);
 }
-- 
1.5.5.1



* [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
  2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
@ 2014-03-31 19:43 ` Oleg Nesterov
  2014-04-01  2:00   ` Masami Hiramatsu
  2014-04-03 16:33   ` Srikar Dronamraju
  2014-03-31 19:44 ` [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn() Oleg Nesterov
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:43 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

No functional changes, preparation.

Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:

	- Do not call insn_get_opcode() again; it was already called
	  by validate_insn_bits().

	- Move "case 0xea" up. This way "case 0xff" can fall through
	  to the default case.

	- Change "case 0xff" to use a nested "switch (MODRM_REG)";
	  this way the code looks a bit simpler.

	- Make the comments look consistent.

While at it, kill the initialization of rip_rela_target_address and
->fixups; we can rely on kzalloc(). We will add new members to
arch_uprobe, so it is better to assume that everything is zero by
default.

TODO: cleanup/fix the mess in validate_insn_bits() paths:

	- validate_insn_64bits() and validate_insn_32bits() should be
	  unified.

	- "ifdef" is not used consistently; if good_insns_64 depends
	  on CONFIG_X86_64, then good_insns_32 should probably depend
	  on CONFIG_X86_32/EMULATION.

	- the usage of mm->context.ia32_compat looks wrong if the task
	  is TIF_X32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |  110 +++++++++++++++++++--------------------------
 1 files changed, 47 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 2ed8459..098e56e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -53,7 +53,7 @@
 #define OPCODE1(insn)		((insn)->opcode.bytes[0])
 #define OPCODE2(insn)		((insn)->opcode.bytes[1])
 #define OPCODE3(insn)		((insn)->opcode.bytes[2])
-#define MODRM_REG(insn)		X86_MODRM_REG(insn->modrm.value)
+#define MODRM_REG(insn)		X86_MODRM_REG((insn)->modrm.value)
 
 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
 	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
@@ -229,63 +229,6 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
 	return -ENOTSUPP;
 }
 
-/*
- * Figure out which fixups arch_uprobe_post_xol() will need to perform, and
- * annotate arch_uprobe->fixups accordingly.  To start with,
- * arch_uprobe->fixups is either zero or it reflects rip-related fixups.
- */
-static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
-{
-	bool fix_ip = true, fix_call = false;	/* defaults */
-	int reg;
-
-	insn_get_opcode(insn);	/* should be a nop */
-
-	switch (OPCODE1(insn)) {
-	case 0x9d:
-		/* popf */
-		auprobe->fixups |= UPROBE_FIX_SETF;
-		break;
-	case 0xc3:		/* ret/lret */
-	case 0xcb:
-	case 0xc2:
-	case 0xca:
-		/* ip is correct */
-		fix_ip = false;
-		break;
-	case 0xe8:		/* call relative - Fix return addr */
-		fix_call = true;
-		break;
-	case 0x9a:		/* call absolute - Fix return addr, not ip */
-		fix_call = true;
-		fix_ip = false;
-		break;
-	case 0xff:
-		insn_get_modrm(insn);
-		reg = MODRM_REG(insn);
-		if (reg == 2 || reg == 3) {
-			/* call or lcall, indirect */
-			/* Fix return addr; ip is correct. */
-			fix_call = true;
-			fix_ip = false;
-		} else if (reg == 4 || reg == 5) {
-			/* jmp or ljmp, indirect */
-			/* ip is correct. */
-			fix_ip = false;
-		}
-		break;
-	case 0xea:		/* jmp absolute -- ip is correct */
-		fix_ip = false;
-		break;
-	default:
-		break;
-	}
-	if (fix_ip)
-		auprobe->fixups |= UPROBE_FIX_IP;
-	if (fix_call)
-		auprobe->fixups |= UPROBE_FIX_CALL;
-}
-
 #ifdef CONFIG_X86_64
 /*
  * If arch_uprobe->insn doesn't use rip-relative addressing, return
@@ -318,7 +261,6 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct ins
 	if (mm->context.ia32_compat)
 		return;
 
-	auprobe->rip_rela_target_address = 0x0;
 	if (!insn_rip_relative(insn))
 		return;
 
@@ -421,16 +363,58 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
  */
 int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
 {
-	int ret;
 	struct insn insn;
+	bool fix_ip = true, fix_call = false;
+	int ret;
 
-	auprobe->fixups = 0;
 	ret = validate_insn_bits(auprobe, mm, &insn);
-	if (ret != 0)
+	if (ret)
 		return ret;
 
+	/*
+	 * Figure out which fixups arch_uprobe_post_xol() will need to perform,
+	 * and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
+	 * is either zero or it reflects rip-related fixups.
+	 */
 	handle_riprel_insn(auprobe, mm, &insn);
-	prepare_fixups(auprobe, &insn);
+
+	switch (OPCODE1(&insn)) {
+	case 0x9d:		/* popf */
+		auprobe->fixups |= UPROBE_FIX_SETF;
+		break;
+	case 0xc3:		/* ret or lret -- ip is correct */
+	case 0xcb:
+	case 0xc2:
+	case 0xca:
+		fix_ip = false;
+		break;
+	case 0xe8:		/* call relative - Fix return addr */
+		fix_call = true;
+		break;
+	case 0x9a:		/* call absolute - Fix return addr, not ip */
+		fix_call = true;
+		fix_ip = false;
+		break;
+	case 0xea:		/* jmp absolute -- ip is correct */
+		fix_ip = false;
+		break;
+	case 0xff:
+		insn_get_modrm(&insn);
+		switch (MODRM_REG(&insn)) {
+		case 2: case 3:			/* call or lcall, indirect */
+			fix_call = true;
+		case 4: case 5:			/* jmp or ljmp, indirect */
+			fix_ip = false;
+		}
+		break;
+	default:
+		break;
+	}
+
+	if (fix_ip)
+		auprobe->fixups |= UPROBE_FIX_IP;
+	if (fix_call)
+		auprobe->fixups |= UPROBE_FIX_CALL;
 
 	return 0;
 }
-- 
1.5.5.1



* [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
  2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
  2014-03-31 19:43 ` [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() Oleg Nesterov
@ 2014-03-31 19:44 ` Oleg Nesterov
  2014-04-01  3:17   ` Masami Hiramatsu
  2014-03-31 19:44 ` [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg Oleg Nesterov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:44 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only the "0xff" and "default" cases need the UPROBE_FIX_RIP_
logic. Move the call site into the "default" case and change the
"0xff" case to fall through.

We are going to add various hooks to handle the rip-relative
jmp/call instructions (and more); we need this change to ensure
that the new code can't conflict with the is_riprel_insn() code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 098e56e..d72dfbf 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -376,8 +376,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	 * and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
 	 * is either zero or it reflects rip-related fixups.
 	 */
-	handle_riprel_insn(auprobe, mm, &insn);
-
 	switch (OPCODE1(&insn)) {
 	case 0x9d:		/* popf */
 		auprobe->fixups |= UPROBE_FIX_SETF;
@@ -406,9 +404,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 		case 4: case 5:			/* jmp or ljmp, indirect */
 			fix_ip = false;
 		}
-		break;
+		/* fall through */
 	default:
-		break;
+		handle_riprel_insn(auprobe, mm, &insn);
 	}
 
 	if (fix_ip)
-- 
1.5.5.1



* [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (2 preceding siblings ...)
  2014-03-31 19:44 ` [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn() Oleg Nesterov
@ 2014-03-31 19:44 ` Oleg Nesterov
  2014-04-01  3:10   ` Masami Hiramatsu
  2014-04-03 16:33   ` Srikar Dronamraju
  2014-03-31 19:44 ` [PATCH 5/7] uprobes/x86: Gather "riprel" functions together Oleg Nesterov
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:44 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

Kill the "mm->context.ia32_compat" check in handle_riprel_insn(); if
it is true, insn_rip_relative() must return false. validate_insn_bits()
passes "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.

Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index d72dfbf..382a69d 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -253,14 +253,11 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
  *  - The displacement is always 4 bytes.
  */
 static void
-handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
+handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
 {
 	u8 *cursor;
 	u8 reg;
 
-	if (mm->context.ia32_compat)
-		return;
-
 	if (!insn_rip_relative(insn))
 		return;
 
@@ -314,7 +311,6 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct ins
 		cursor++;
 		memmove(cursor, cursor + insn->displacement.nbytes, insn->immediate.nbytes);
 	}
-	return;
 }
 
 static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn)
@@ -343,7 +339,7 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	return validate_insn_64bits(auprobe, insn);
 }
 #else /* 32-bit: */
-static void handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
+static void handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
 {
 	/* No RIP-relative addressing on 32-bit */
 }
@@ -406,7 +402,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 		}
 		/* fall through */
 	default:
-		handle_riprel_insn(auprobe, mm, &insn);
+		handle_riprel_insn(auprobe, &insn);
 	}
 
 	if (fix_ip)
-- 
1.5.5.1



* [PATCH 5/7] uprobes/x86: Gather "riprel" functions together
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (3 preceding siblings ...)
  2014-03-31 19:44 ` [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg Oleg Nesterov
@ 2014-03-31 19:44 ` Oleg Nesterov
  2014-04-01  3:21   ` Masami Hiramatsu
  2014-04-02 19:53   ` Jim Keniston
  2014-03-31 19:44 ` [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks Oleg Nesterov
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:44 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up
to the closely related handle_riprel_insn().

This way the code is simpler to read and understand, and it reduces
the number of ifdefs.

TODO: rename them somehow to make the naming consistent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |  118 ++++++++++++++++++++-------------------------
 1 files changed, 53 insertions(+), 65 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 382a69d..e8d7b38 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -313,6 +313,48 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
 	}
 }
 
+/*
+ * If we're emulating a rip-relative instruction, save the contents
+ * of the scratch register and store the target address in that register.
+ */
+static void
+pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
+				struct arch_uprobe_task *autask)
+{
+	if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
+		autask->saved_scratch_register = regs->ax;
+		regs->ax = current->utask->vaddr;
+		regs->ax += auprobe->rip_rela_target_address;
+	} else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
+		autask->saved_scratch_register = regs->cx;
+		regs->cx = current->utask->vaddr;
+		regs->cx += auprobe->rip_rela_target_address;
+	}
+}
+
+static void
+handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
+{
+	if (auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
+		struct arch_uprobe_task *autask;
+
+		autask = &current->utask->autask;
+		if (auprobe->fixups & UPROBE_FIX_RIP_AX)
+			regs->ax = autask->saved_scratch_register;
+		else
+			regs->cx = autask->saved_scratch_register;
+
+		/*
+		 * The original instruction includes a displacement, and so
+		 * is 4 bytes longer than what we've just single-stepped.
+		 * Fall through to handle stuff like "jmpq *...(%rip)" and
+		 * "callq *...(%rip)".
+		 */
+		if (correction)
+			*correction += 4;
+	}
+}
+
 static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn)
 {
 	insn_init(insn, auprobe->insn, true);
@@ -339,9 +381,19 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	return validate_insn_64bits(auprobe, insn);
 }
 #else /* 32-bit: */
+/*
+ * No RIP-relative addressing on 32-bit
+ */
 static void handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
 {
-	/* No RIP-relative addressing on 32-bit */
+}
+static void pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
+				struct arch_uprobe_task *autask)
+{
+}
+static void handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs,
+					long *correction)
+{
 }
 
 static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,  struct insn *insn)
@@ -413,34 +465,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	return 0;
 }
 
-#ifdef CONFIG_X86_64
-/*
- * If we're emulating a rip-relative instruction, save the contents
- * of the scratch register and store the target address in that register.
- */
-static void
-pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
-				struct arch_uprobe_task *autask)
-{
-	if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
-		autask->saved_scratch_register = regs->ax;
-		regs->ax = current->utask->vaddr;
-		regs->ax += auprobe->rip_rela_target_address;
-	} else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
-		autask->saved_scratch_register = regs->cx;
-		regs->cx = current->utask->vaddr;
-		regs->cx += auprobe->rip_rela_target_address;
-	}
-}
-#else
-static void
-pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
-				struct arch_uprobe_task *autask)
-{
-	/* No RIP-relative addressing on 32-bit */
-}
-#endif
-
 /*
  * arch_uprobe_pre_xol - prepare to execute out of line.
  * @auprobe: the probepoint information.
@@ -490,42 +514,6 @@ static int adjust_ret_addr(unsigned long sp, long correction)
 	return 0;
 }
 
-#ifdef CONFIG_X86_64
-static bool is_riprel_insn(struct arch_uprobe *auprobe)
-{
-	return ((auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) != 0);
-}
-
-static void
-handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
-{
-	if (is_riprel_insn(auprobe)) {
-		struct arch_uprobe_task *autask;
-
-		autask = &current->utask->autask;
-		if (auprobe->fixups & UPROBE_FIX_RIP_AX)
-			regs->ax = autask->saved_scratch_register;
-		else
-			regs->cx = autask->saved_scratch_register;
-
-		/*
-		 * The original instruction includes a displacement, and so
-		 * is 4 bytes longer than what we've just single-stepped.
-		 * Fall through to handle stuff like "jmpq *...(%rip)" and
-		 * "callq *...(%rip)".
-		 */
-		if (correction)
-			*correction += 4;
-	}
-}
-#else
-static void
-handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
-{
-	/* No RIP-relative addressing on 32-bit */
-}
-#endif
-
 /*
  * If xol insn itself traps and generates a signal(Say,
  * SIGILL/SIGSEGV/etc), then detect the case where a singlestepped
-- 
1.5.5.1



* [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (4 preceding siblings ...)
  2014-03-31 19:44 ` [PATCH 5/7] uprobes/x86: Gather "riprel" functions together Oleg Nesterov
@ 2014-03-31 19:44 ` Oleg Nesterov
  2014-04-01  3:24   ` Masami Hiramatsu
  2014-04-03 16:34   ` Srikar Dronamraju
  2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:44 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in the arch_uprobe_pre/post_xol()
functions so that the UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.

Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |   30 ++++++++++++++----------------
 1 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index e8d7b38..7dd65dc 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -472,19 +472,18 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
  */
 int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
-	struct arch_uprobe_task *autask;
+	struct uprobe_task *utask = current->utask;
 
-	autask = &current->utask->autask;
-	autask->saved_trap_nr = current->thread.trap_nr;
+	regs->ip = utask->xol_vaddr;
+	utask->autask.saved_trap_nr = current->thread.trap_nr;
 	current->thread.trap_nr = UPROBE_TRAP_NR;
-	regs->ip = current->utask->xol_vaddr;
-	pre_xol_rip_insn(auprobe, regs, autask);
 
-	autask->saved_tf = !!(regs->flags & X86_EFLAGS_TF);
+	utask->autask.saved_tf = !!(regs->flags & X86_EFLAGS_TF);
 	regs->flags |= X86_EFLAGS_TF;
 	if (test_tsk_thread_flag(current, TIF_BLOCKSTEP))
 		set_task_blockstep(current, false);
 
+	pre_xol_rip_insn(auprobe, regs, &utask->autask);
 	return 0;
 }
 
@@ -558,22 +557,13 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
  */
 int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
-	struct uprobe_task *utask;
+	struct uprobe_task *utask = current->utask;
 	long correction;
 	int result = 0;
 
 	WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
 
-	utask = current->utask;
 	current->thread.trap_nr = utask->autask.saved_trap_nr;
-	correction = (long)(utask->vaddr - utask->xol_vaddr);
-	handle_riprel_post_xol(auprobe, regs, &correction);
-	if (auprobe->fixups & UPROBE_FIX_IP)
-		regs->ip += correction;
-
-	if (auprobe->fixups & UPROBE_FIX_CALL)
-		result = adjust_ret_addr(regs->sp, correction);
-
 	/*
 	 * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
 	 * so we can get an extra SIGTRAP if we do not clear TF. We need
@@ -584,6 +574,14 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	else if (!(auprobe->fixups & UPROBE_FIX_SETF))
 		regs->flags &= ~X86_EFLAGS_TF;
 
+	correction = (long)(utask->vaddr - utask->xol_vaddr);
+	handle_riprel_post_xol(auprobe, regs, &correction);
+	if (auprobe->fixups & UPROBE_FIX_IP)
+		regs->ip += correction;
+
+	if (auprobe->fixups & UPROBE_FIX_CALL)
+		result = adjust_ret_addr(regs->sp, correction);
+
 	return result;
 }
 
-- 
1.5.5.1



* [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (5 preceding siblings ...)
  2014-03-31 19:44 ` [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks Oleg Nesterov
@ 2014-03-31 19:44 ` Oleg Nesterov
  2014-04-01  6:52   ` Masami Hiramatsu
                     ` (2 more replies)
  2014-04-02 19:58 ` [PATCH 0/7] uprobes/x86: introduce " Jim Keniston
  2014-04-03 20:00 ` [PATCH 8-9/7] " Oleg Nesterov
  8 siblings, 3 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-03-31 19:44 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add new uprobe_xol_ops instances to handle the insns
which need special processing (rip-relative jmp/call at least).

TODO: An error from adjust_ret_addr() shouldn't be silently ignored;
we should teach the arch_uprobe_post_xol() or handle_singlestep()
paths to restart the probed insn in this case. And probably "adjust"
can be simplified and turned into set_ret_addr(). It seems that we do
not really need copy_from_user(); we can always calculate the value
we need to write into *regs->sp.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/uprobes.h |    7 ++-
 arch/x86/kernel/uprobes.c      |  107 +++++++++++++++++++++++++--------------
 2 files changed, 74 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 3087ea9..9f8210b 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -33,12 +33,17 @@ typedef u8 uprobe_opcode_t;
 #define UPROBE_SWBP_INSN		0xcc
 #define UPROBE_SWBP_INSN_SIZE		   1
 
+struct uprobe_xol_ops;
+
 struct arch_uprobe {
-	u16				fixups;
 	union {
 		u8			insn[MAX_UINSN_BYTES];
 		u8			ixol[MAX_UINSN_BYTES];
 	};
+
+	u16				fixups;
+	const struct uprobe_xol_ops	*ops;
+
 #ifdef CONFIG_X86_64
 	unsigned long			rip_rela_target_address;
 #endif
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 7dd65dc..a822de5 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -402,6 +402,64 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
 }
 #endif /* CONFIG_X86_64 */
 
+struct uprobe_xol_ops {
+	bool	(*emulate)(struct arch_uprobe *, struct pt_regs *);
+	int	(*pre_xol)(struct arch_uprobe *, struct pt_regs *);
+	int	(*post_xol)(struct arch_uprobe *, struct pt_regs *);
+};
+
+static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+	pre_xol_rip_insn(auprobe, regs, &current->utask->autask);
+	return 0;
+}
+
+/*
+ * Adjust the return address pushed by a call insn executed out of line.
+ */
+static int adjust_ret_addr(unsigned long sp, long correction)
+{
+	int rasize, ncopied;
+	long ra = 0;
+
+	if (is_ia32_task())
+		rasize = 4;
+	else
+		rasize = 8;
+
+	ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
+	if (unlikely(ncopied))
+		return -EFAULT;
+
+	ra += correction;
+	ncopied = copy_to_user((void __user *)sp, &ra, rasize);
+	if (unlikely(ncopied))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+	struct uprobe_task *utask = current->utask;
+	long correction = (long)(utask->vaddr - utask->xol_vaddr);
+	int ret = 0;
+
+	handle_riprel_post_xol(auprobe, regs, &correction);
+	if (auprobe->fixups & UPROBE_FIX_IP)
+		regs->ip += correction;
+
+	if (auprobe->fixups & UPROBE_FIX_CALL)
+		ret = adjust_ret_addr(regs->sp, correction);
+
+	return ret;
+}
+
+static struct uprobe_xol_ops default_xol_ops = {
+	.pre_xol  = default_pre_xol_op,
+	.post_xol = default_post_xol_op,
+};
+
 /**
  * arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
  * @mm: the probed address space.
@@ -462,6 +520,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (fix_call)
 		auprobe->fixups |= UPROBE_FIX_CALL;
 
+	auprobe->ops = &default_xol_ops;
 	return 0;
 }
 
@@ -483,33 +542,8 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	if (test_tsk_thread_flag(current, TIF_BLOCKSTEP))
 		set_task_blockstep(current, false);
 
-	pre_xol_rip_insn(auprobe, regs, &utask->autask);
-	return 0;
-}
-
-/*
- * This function is called by arch_uprobe_post_xol() to adjust the return
- * address pushed by a call instruction executed out of line.
- */
-static int adjust_ret_addr(unsigned long sp, long correction)
-{
-	int rasize, ncopied;
-	long ra = 0;
-
-	if (is_ia32_task())
-		rasize = 4;
-	else
-		rasize = 8;
-
-	ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
-	if (unlikely(ncopied))
-		return -EFAULT;
-
-	ra += correction;
-	ncopied = copy_to_user((void __user *)sp, &ra, rasize);
-	if (unlikely(ncopied))
-		return -EFAULT;
-
+	if (auprobe->ops->pre_xol)
+		return auprobe->ops->pre_xol(auprobe, regs);
 	return 0;
 }
 
@@ -558,11 +592,8 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
 int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
 	struct uprobe_task *utask = current->utask;
-	long correction;
-	int result = 0;
 
 	WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
-
 	current->thread.trap_nr = utask->autask.saved_trap_nr;
 	/*
 	 * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
@@ -574,15 +605,9 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	else if (!(auprobe->fixups & UPROBE_FIX_SETF))
 		regs->flags &= ~X86_EFLAGS_TF;
 
-	correction = (long)(utask->vaddr - utask->xol_vaddr);
-	handle_riprel_post_xol(auprobe, regs, &correction);
-	if (auprobe->fixups & UPROBE_FIX_IP)
-		regs->ip += correction;
-
-	if (auprobe->fixups & UPROBE_FIX_CALL)
-		result = adjust_ret_addr(regs->sp, correction);
-
-	return result;
+	if (auprobe->ops->post_xol)
+		return auprobe->ops->post_xol(auprobe, regs);
+	return 0;
 }
 
 /* callback routine for handling exceptions. */
@@ -640,6 +665,10 @@ static bool __skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
 	int i;
 
+	if (auprobe->ops->emulate)
+		return auprobe->ops->emulate(auprobe, regs);
+
+	/* TODO: move this code into ->emulate() hook */
 	for (i = 0; i < MAX_UINSN_BYTES; i++) {
 		if (auprobe->insn[i] == 0x66)
 			continue;
-- 
1.5.5.1
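[Editorial note: the dispatch pattern introduced by the hunk above — an ops table whose hooks may be NULL — can be exercised in isolation. The following is a toy, user-space sketch; the struct names mimic the patch, but pt_regs/arch_uprobe are stand-in mock types, not the kernel's.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types, just to show the shape. */
struct pt_regs { unsigned long ip; };
struct arch_uprobe;

struct uprobe_xol_ops {
	bool	(*emulate)(struct arch_uprobe *, struct pt_regs *);
	int	(*pre_xol)(struct arch_uprobe *, struct pt_regs *);
	int	(*post_xol)(struct arch_uprobe *, struct pt_regs *);
};

struct arch_uprobe { const struct uprobe_xol_ops *ops; };

/* Mirrors arch_uprobe_pre_xol(): a NULL hook simply means "nothing to do". */
static int call_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
	if (auprobe->ops->pre_xol)
		return auprobe->ops->pre_xol(auprobe, regs);
	return 0;
}

static int bump_ip(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
	(void)auprobe;
	regs->ip += 1;	/* pretend fixup */
	return 0;
}

static const struct uprobe_xol_ops with_pre_xol = { .pre_xol = bump_ip };
static const struct uprobe_xol_ops no_hooks = { 0 };
```

The point of the NULL checks is that future instruction-specific ops (e.g. for rip-relative jmp/call) can implement only the hooks they need.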


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep()
  2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
@ 2014-04-01  1:41   ` Masami Hiramatsu
  2014-04-01 15:39   ` David Long
  2014-04-03 16:32   ` Srikar Dronamraju
  2 siblings, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  1:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:43), Oleg Nesterov wrote:
> UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
> patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
> not only does it not help, it harms.
> 
> It can only help to avoid arch_uprobe_skip_sstep() if it was already
> called before and failed. But this is ugly, if we want to know whether
> we can emulate this instruction or not we should do this analysis in
> arch_uprobe_analyze_insn(), not when we hit this probe for the first
> time.
> 
> And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
> fail or not depending on the task/register state: if this insn can be
> emulated but, say, put_user() fails, we need to xol it this time, but
> this doesn't mean we shouldn't try to emulate it when this or another
> thread hits this bp next time.
> 
> And this is the actual reason for this change. We need to emulate the
> "call" insn, but push(return-address) can obviously fail.
> 
> Per-arch notes:
> 
> 	x86: __skip_sstep() can only emulate "rep;nop". With this
> 	     change it will be called every time and most probably
> 	     for no reason.
> 
> 	     This will be fixed by the next changes. We need to
> 	     change this suboptimal code anyway.
> 
> 	arm: Should not be affected. It has its own "bool simulate"
> 	     flag checked in arch_uprobe_skip_sstep().
> 
> 	ppc: Looks like it can emulate almost everything. Does it
> 	     actually need to record the fact that emulate_step()
> 	     failed? Hopefully not. But if yes, it can add the ppc-
> 	     specific flag into arch_uprobe.
> 
> TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Looks reasonable to me.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you,

> ---
>  kernel/events/uprobes.c |   23 ++---------------------
>  1 files changed, 2 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 307d87c..7a3e14e 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -60,8 +60,6 @@ static struct percpu_rw_semaphore dup_mmap_sem;
>  
>  /* Have a copy of original instruction */
>  #define UPROBE_COPY_INSN	0
> -/* Can skip singlestep */
> -#define UPROBE_SKIP_SSTEP	1
>  
>  struct uprobe {
>  	struct rb_node		rb_node;	/* node in the rb tree */
> @@ -491,12 +489,9 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>  	uprobe->offset = offset;
>  	init_rwsem(&uprobe->register_rwsem);
>  	init_rwsem(&uprobe->consumer_rwsem);
> -	/* For now assume that the instruction need not be single-stepped */
> -	__set_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
>  
>  	/* add to uprobes_tree, sorted on inode:offset */
>  	cur_uprobe = insert_uprobe(uprobe);
> -
>  	/* a uprobe exists for this inode:offset combination */
>  	if (cur_uprobe) {
>  		kfree(uprobe);
> @@ -1628,20 +1623,6 @@ bool uprobe_deny_signal(void)
>  	return true;
>  }
>  
> -/*
> - * Avoid singlestepping the original instruction if the original instruction
> - * is a NOP or can be emulated.
> - */
> -static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
> -{
> -	if (test_bit(UPROBE_SKIP_SSTEP, &uprobe->flags)) {
> -		if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
> -			return true;
> -		clear_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
> -	}
> -	return false;
> -}
> -
>  static void mmf_recalc_uprobes(struct mm_struct *mm)
>  {
>  	struct vm_area_struct *vma;
> @@ -1859,13 +1840,13 @@ static void handle_swbp(struct pt_regs *regs)
>  		goto out;
>  
>  	handler_chain(uprobe, regs);
> -	if (can_skip_sstep(uprobe, regs))
> +	if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
>  		goto out;
>  
>  	if (!pre_ssout(uprobe, regs, bp_vaddr))
>  		return;
>  
> -	/* can_skip_sstep() succeeded, or restart if can't singlestep */
> +	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
>  out:
>  	put_uprobe(uprobe);
>  }
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
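[Editorial note: the argument in the changelog above — that a transient emulation failure must not permanently disable emulation — can be made concrete with a small user-space model. This is an illustrative sketch, not kernel code; `transient_fault` stands in for a failing put_user().]

```c
#include <assert.h>
#include <stdbool.h>

/* Emulation depends on transient task state (e.g. a faulting put_user()),
 * so a one-time failure must not disable emulation forever. */
static bool transient_fault;

static bool arch_emulate(void)
{
	return !transient_fault;	/* fails only while the fault holds */
}

/* Old scheme (UPROBE_SKIP_SSTEP): clear a cached flag on the first
 * failure and never try to emulate again. */
static bool skip_flag = true;

static bool old_can_skip(void)
{
	if (skip_flag) {
		if (arch_emulate())
			return true;
		skip_flag = false;	/* sticky: emulation lost forever */
	}
	return false;
}

/* New scheme: just ask the arch code every time we hit the breakpoint. */
static bool new_can_skip(void)
{
	return arch_emulate();
}
```

Once the transient fault clears, only the new scheme recovers; the old one keeps single-stepping out of line, which is exactly the harm the changelog describes.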




* Re: [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()
  2014-03-31 19:43 ` [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() Oleg Nesterov
@ 2014-04-01  2:00   ` Masami Hiramatsu
  2014-04-03 16:33   ` Srikar Dronamraju
  1 sibling, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  2:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:43), Oleg Nesterov wrote:
> No functional changes, preparation.
> 
> Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
> with the following modifications:
> 
> 	- Do not call insn_get_opcode() again, it was already called
> 	  by validate_insn_bits().
> 
> 	- Move "case 0xea" up. This way "case 0xff" can fall through
> 	  to default case.
> 
> 	- change "case 0xff" to use the nested "switch (MODRM_REG)",
> 	  this way the code looks a bit simpler.
> 
> 	- Make the comments look consistent.

Interesting. Similar cleanup should be applied to kprobes too. :)

> 
> While at it, kill the initialization of rip_rela_target_address and
> ->fixups, we can rely on kzalloc(). We will add new members into
> arch_uprobe, and it is better to assume that everything is zero by
> default.
> 
> TODO: cleanup/fix the mess in validate_insn_bits() paths:
> 
> 	- validate_insn_64bits() and validate_insn_32bits() should be
> 	  unified.
> 
> 	- "ifdef" is not used consistently; if good_insns_64 depends
> 	  on CONFIG_X86_64, then probably good_insns_32 should depend
> 	  on CONFIG_X86_32/EMULATION
> 
> 	- the usage of mm->context.ia32_compat looks wrong if the task
> 	  is TIF_X32.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

> ---
>  arch/x86/kernel/uprobes.c |  110 +++++++++++++++++++--------------------------
>  1 files changed, 47 insertions(+), 63 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 2ed8459..098e56e 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -53,7 +53,7 @@
>  #define OPCODE1(insn)		((insn)->opcode.bytes[0])
>  #define OPCODE2(insn)		((insn)->opcode.bytes[1])
>  #define OPCODE3(insn)		((insn)->opcode.bytes[2])
> -#define MODRM_REG(insn)		X86_MODRM_REG(insn->modrm.value)
> +#define MODRM_REG(insn)		X86_MODRM_REG((insn)->modrm.value)
>  
>  #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
>  	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
> @@ -229,63 +229,6 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
>  	return -ENOTSUPP;
>  }
>  
> -/*
> - * Figure out which fixups arch_uprobe_post_xol() will need to perform, and
> - * annotate arch_uprobe->fixups accordingly.  To start with,
> - * arch_uprobe->fixups is either zero or it reflects rip-related fixups.
> - */
> -static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
> -{
> -	bool fix_ip = true, fix_call = false;	/* defaults */
> -	int reg;
> -
> -	insn_get_opcode(insn);	/* should be a nop */
> -
> -	switch (OPCODE1(insn)) {
> -	case 0x9d:
> -		/* popf */
> -		auprobe->fixups |= UPROBE_FIX_SETF;
> -		break;
> -	case 0xc3:		/* ret/lret */
> -	case 0xcb:
> -	case 0xc2:
> -	case 0xca:
> -		/* ip is correct */
> -		fix_ip = false;
> -		break;
> -	case 0xe8:		/* call relative - Fix return addr */
> -		fix_call = true;
> -		break;
> -	case 0x9a:		/* call absolute - Fix return addr, not ip */
> -		fix_call = true;
> -		fix_ip = false;
> -		break;
> -	case 0xff:
> -		insn_get_modrm(insn);
> -		reg = MODRM_REG(insn);
> -		if (reg == 2 || reg == 3) {
> -			/* call or lcall, indirect */
> -			/* Fix return addr; ip is correct. */
> -			fix_call = true;
> -			fix_ip = false;
> -		} else if (reg == 4 || reg == 5) {
> -			/* jmp or ljmp, indirect */
> -			/* ip is correct. */
> -			fix_ip = false;
> -		}
> -		break;
> -	case 0xea:		/* jmp absolute -- ip is correct */
> -		fix_ip = false;
> -		break;
> -	default:
> -		break;
> -	}
> -	if (fix_ip)
> -		auprobe->fixups |= UPROBE_FIX_IP;
> -	if (fix_call)
> -		auprobe->fixups |= UPROBE_FIX_CALL;
> -}
> -
>  #ifdef CONFIG_X86_64
>  /*
>   * If arch_uprobe->insn doesn't use rip-relative addressing, return
> @@ -318,7 +261,6 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct ins
>  	if (mm->context.ia32_compat)
>  		return;
>  
> -	auprobe->rip_rela_target_address = 0x0;
>  	if (!insn_rip_relative(insn))
>  		return;
>  
> @@ -421,16 +363,58 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
>   */
>  int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
>  {
> -	int ret;
>  	struct insn insn;
> +	bool fix_ip = true, fix_call = false;
> +	int ret;
>  
> -	auprobe->fixups = 0;
>  	ret = validate_insn_bits(auprobe, mm, &insn);
> -	if (ret != 0)
> +	if (ret)
>  		return ret;
>  
> +	/*
> +	 * Figure out which fixups arch_uprobe_post_xol() will need to perform,
> +	 * and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
> +	 * is either zero or it reflects rip-related fixups.
> +	 */
>  	handle_riprel_insn(auprobe, mm, &insn);
> -	prepare_fixups(auprobe, &insn);
> +
> +	switch (OPCODE1(&insn)) {
> +	case 0x9d:		/* popf */
> +		auprobe->fixups |= UPROBE_FIX_SETF;
> +		break;
> +	case 0xc3:		/* ret or lret -- ip is correct */
> +	case 0xcb:
> +	case 0xc2:
> +	case 0xca:
> +		fix_ip = false;
> +		break;
> +	case 0xe8:		/* call relative - Fix return addr */
> +		fix_call = true;
> +		break;
> +	case 0x9a:		/* call absolute - Fix return addr, not ip */
> +		fix_call = true;
> +		fix_ip = false;
> +		break;
> +	case 0xea:		/* jmp absolute -- ip is correct */
> +		fix_ip = false;
> +		break;
> +	case 0xff:
> +		insn_get_modrm(&insn);
> +		switch (MODRM_REG(&insn)) {
> +		case 2: case 3:			/* call or lcall, indirect */
> +			fix_call = true;
> +		case 4: case 5:			/* jmp or ljmp, indirect */
> +			fix_ip = false;
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (fix_ip)
> +		auprobe->fixups |= UPROBE_FIX_IP;
> +	if (fix_call)
> +		auprobe->fixups |= UPROBE_FIX_CALL;
>  
>  	return 0;
>  }
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
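[Editorial note: the nested switch on MODRM_REG() in the patch above relies on the x86 ModRM encoding, where for opcode 0xff the "reg" field selects the operation (/2 call, /3 lcall, /4 jmp, /5 ljmp, per the Intel SDM opcode map). A minimal sketch of that field extraction:]

```c
#include <assert.h>

/* ModRM byte layout: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).
 * This is what X86_MODRM_REG()/MODRM_REG() compute in the kernel. */
static unsigned int modrm_reg(unsigned char modrm)
{
	return (modrm >> 3) & 7;
}
```

So "MODRM_REG() == 2" in the patch means the 0xff instruction is an indirect near call, and 4 means an indirect near jmp.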




* Re: [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg
  2014-03-31 19:44 ` [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg Oleg Nesterov
@ 2014-04-01  3:10   ` Masami Hiramatsu
  2014-04-03 16:33   ` Srikar Dronamraju
  1 sibling, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  3:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:44), Oleg Nesterov wrote:
> Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
> it is true insn_rip_relative() must return false. validate_insn_bits()
> passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
> checks insn->x86_64.
> 
> Also, remove the no longer needed "struct mm_struct *mm" argument and
> the unnecessary "return" at the end.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

This looks good to me.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you,

> ---
>  arch/x86/kernel/uprobes.c |   10 +++-------
>  1 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index d72dfbf..382a69d 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -253,14 +253,11 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
>   *  - The displacement is always 4 bytes.
>   */
>  static void
> -handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
> +handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
>  {
>  	u8 *cursor;
>  	u8 reg;
>  
> -	if (mm->context.ia32_compat)
> -		return;
> -
>  	if (!insn_rip_relative(insn))
>  		return;
>  
> @@ -314,7 +311,6 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct ins
>  		cursor++;
>  		memmove(cursor, cursor + insn->displacement.nbytes, insn->immediate.nbytes);
>  	}
> -	return;
>  }
>  
>  static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn)
> @@ -343,7 +339,7 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	return validate_insn_64bits(auprobe, insn);
>  }
>  #else /* 32-bit: */
> -static void handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
> +static void handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
>  {
>  	/* No RIP-relative addressing on 32-bit */
>  }
> @@ -406,7 +402,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  		}
>  		/* fall through */
>  	default:
> -		handle_riprel_insn(auprobe, mm, &insn);
> +		handle_riprel_insn(auprobe, &insn);
>  	}
>  
>  	if (fix_ip)
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




* Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
  2014-03-31 19:44 ` [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn() Oleg Nesterov
@ 2014-04-01  3:17   ` Masami Hiramatsu
  2014-04-01 14:33     ` Oleg Nesterov
  0 siblings, 1 reply; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  3:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:44), Oleg Nesterov wrote:
> arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
> but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
> Move the callsite into "default" case and change the "0xff" case to
> fall-through.
> 
> We are going to add the various hooks to handle the rip-relative
> jmp/call instructions (and more), we need this change to enforce the
> fact that the new code can't conflict with is_riprel_insn() code.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Hmm, this does not seem obviously reasonable at this point.
However, the code itself is not wrong. Could you merge
this change into the patch that adds the new hooks?

Thank you,

> ---
>  arch/x86/kernel/uprobes.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 098e56e..d72dfbf 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -376,8 +376,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	 * and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
>  	 * is either zero or it reflects rip-related fixups.
>  	 */
> -	handle_riprel_insn(auprobe, mm, &insn);
> -
>  	switch (OPCODE1(&insn)) {
>  	case 0x9d:		/* popf */
>  		auprobe->fixups |= UPROBE_FIX_SETF;
> @@ -406,9 +404,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  		case 4: case 5:			/* jmp or ljmp, indirect */
>  			fix_ip = false;
>  		}
> -		break;
> +		/* fall through */
>  	default:
> -		break;
> +		handle_riprel_insn(auprobe, mm, &insn);
>  	}
>  
>  	if (fix_ip)
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




* Re: [PATCH 5/7] uprobes/x86: Gather "riprel" functions together
  2014-03-31 19:44 ` [PATCH 5/7] uprobes/x86: Gather "riprel" functions together Oleg Nesterov
@ 2014-04-01  3:21   ` Masami Hiramatsu
  2014-04-02 19:53   ` Jim Keniston
  1 sibling, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  3:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:44), Oleg Nesterov wrote:
> Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up
> to the closely related handle_riprel_insn().
> 
> This way it is simpler to read and understand this code, and this
> lessens the number of ifdef's.
> 
> TODO: rename them somehow to make the naming consistent.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Looks good to me:)

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you,

> ---
>  arch/x86/kernel/uprobes.c |  118 ++++++++++++++++++++-------------------------
>  1 files changed, 53 insertions(+), 65 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 382a69d..e8d7b38 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -313,6 +313,48 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
>  	}
>  }
>  
> +/*
> + * If we're emulating a rip-relative instruction, save the contents
> + * of the scratch register and store the target address in that register.
> + */
> +static void
> +pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
> +				struct arch_uprobe_task *autask)
> +{
> +	if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
> +		autask->saved_scratch_register = regs->ax;
> +		regs->ax = current->utask->vaddr;
> +		regs->ax += auprobe->rip_rela_target_address;
> +	} else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
> +		autask->saved_scratch_register = regs->cx;
> +		regs->cx = current->utask->vaddr;
> +		regs->cx += auprobe->rip_rela_target_address;
> +	}
> +}
> +
> +static void
> +handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
> +{
> +	if (auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
> +		struct arch_uprobe_task *autask;
> +
> +		autask = &current->utask->autask;
> +		if (auprobe->fixups & UPROBE_FIX_RIP_AX)
> +			regs->ax = autask->saved_scratch_register;
> +		else
> +			regs->cx = autask->saved_scratch_register;
> +
> +		/*
> +		 * The original instruction includes a displacement, and so
> +		 * is 4 bytes longer than what we've just single-stepped.
> +		 * Fall through to handle stuff like "jmpq *...(%rip)" and
> +		 * "callq *...(%rip)".
> +		 */
> +		if (correction)
> +			*correction += 4;
> +	}
> +}
> +
>  static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn)
>  {
>  	insn_init(insn, auprobe->insn, true);
> @@ -339,9 +381,19 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	return validate_insn_64bits(auprobe, insn);
>  }
>  #else /* 32-bit: */
> +/*
> + * No RIP-relative addressing on 32-bit
> + */
>  static void handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
>  {
> -	/* No RIP-relative addressing on 32-bit */
> +}
> +static void pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
> +				struct arch_uprobe_task *autask)
> +{
> +}
> +static void handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs,
> +					long *correction)
> +{
>  }
>  
>  static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,  struct insn *insn)
> @@ -413,34 +465,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  	return 0;
>  }
>  
> -#ifdef CONFIG_X86_64
> -/*
> - * If we're emulating a rip-relative instruction, save the contents
> - * of the scratch register and store the target address in that register.
> - */
> -static void
> -pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
> -				struct arch_uprobe_task *autask)
> -{
> -	if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
> -		autask->saved_scratch_register = regs->ax;
> -		regs->ax = current->utask->vaddr;
> -		regs->ax += auprobe->rip_rela_target_address;
> -	} else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
> -		autask->saved_scratch_register = regs->cx;
> -		regs->cx = current->utask->vaddr;
> -		regs->cx += auprobe->rip_rela_target_address;
> -	}
> -}
> -#else
> -static void
> -pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
> -				struct arch_uprobe_task *autask)
> -{
> -	/* No RIP-relative addressing on 32-bit */
> -}
> -#endif
> -
>  /*
>   * arch_uprobe_pre_xol - prepare to execute out of line.
>   * @auprobe: the probepoint information.
> @@ -490,42 +514,6 @@ static int adjust_ret_addr(unsigned long sp, long correction)
>  	return 0;
>  }
>  
> -#ifdef CONFIG_X86_64
> -static bool is_riprel_insn(struct arch_uprobe *auprobe)
> -{
> -	return ((auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) != 0);
> -}
> -
> -static void
> -handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
> -{
> -	if (is_riprel_insn(auprobe)) {
> -		struct arch_uprobe_task *autask;
> -
> -		autask = &current->utask->autask;
> -		if (auprobe->fixups & UPROBE_FIX_RIP_AX)
> -			regs->ax = autask->saved_scratch_register;
> -		else
> -			regs->cx = autask->saved_scratch_register;
> -
> -		/*
> -		 * The original instruction includes a displacement, and so
> -		 * is 4 bytes longer than what we've just single-stepped.
> -		 * Fall through to handle stuff like "jmpq *...(%rip)" and
> -		 * "callq *...(%rip)".
> -		 */
> -		if (correction)
> -			*correction += 4;
> -	}
> -}
> -#else
> -static void
> -handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
> -{
> -	/* No RIP-relative addressing on 32-bit */
> -}
> -#endif
> -
>  /*
>   * If xol insn itself traps and generates a signal(Say,
>   * SIGILL/SIGSEGV/etc), then detect the case where a singlestepped
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
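[Editorial note: the pre/post pair moved around in the patch above implements the UPROBE_FIX_RIP_AX "scratch register" trick. The sketch below models only the arithmetic, in user space: the exact value stored in rip_rela_target_address is computed by handle_riprel_insn() and is treated as an opaque offset here.]

```c
#include <assert.h>

/* Toy model of the UPROBE_FIX_RIP_AX dance: before single-stepping the
 * modified copy, %rax is hijacked to hold the rip-relative target; after
 * the step it is restored, and the ip correction grows by the 4-byte
 * displacement that was stripped from the copy. */
struct regs { unsigned long ax; };

static unsigned long saved_scratch;

static void pre_xol_rip(struct regs *r, unsigned long vaddr,
			unsigned long rip_rela_target)
{
	saved_scratch = r->ax;
	r->ax = vaddr + rip_rela_target;	/* effective target address */
}

static void post_xol_rip(struct regs *r, long *correction)
{
	r->ax = saved_scratch;
	/* the copy is 4 bytes shorter than the original insn */
	*correction += 4;
}
```

This is why handle_riprel_post_xol() takes `long *correction`: callers that also apply UPROBE_FIX_IP need the extra 4 bytes folded into the same adjustment.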




* Re: [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks
  2014-03-31 19:44 ` [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks Oleg Nesterov
@ 2014-04-01  3:24   ` Masami Hiramatsu
  2014-04-03 16:34   ` Srikar Dronamraju
  1 sibling, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  3:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:44), Oleg Nesterov wrote:
> No functional changes. Preparation to simplify the review of the next
> change. Just reorder the code in arch_uprobe_pre/post_xol() functions
> so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.
> 
> Also change arch_uprobe_pre_xol() to use utask instead of autask, to
> make the code more symmetrical with arch_uprobe_post_xol().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you!

> ---
>  arch/x86/kernel/uprobes.c |   30 ++++++++++++++----------------
>  1 files changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index e8d7b38..7dd65dc 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -472,19 +472,18 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
>   */
>  int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  {
> -	struct arch_uprobe_task *autask;
> +	struct uprobe_task *utask = current->utask;
>  
> -	autask = &current->utask->autask;
> -	autask->saved_trap_nr = current->thread.trap_nr;
> +	regs->ip = utask->xol_vaddr;
> +	utask->autask.saved_trap_nr = current->thread.trap_nr;
>  	current->thread.trap_nr = UPROBE_TRAP_NR;
> -	regs->ip = current->utask->xol_vaddr;
> -	pre_xol_rip_insn(auprobe, regs, autask);
>  
> -	autask->saved_tf = !!(regs->flags & X86_EFLAGS_TF);
> +	utask->autask.saved_tf = !!(regs->flags & X86_EFLAGS_TF);
>  	regs->flags |= X86_EFLAGS_TF;
>  	if (test_tsk_thread_flag(current, TIF_BLOCKSTEP))
>  		set_task_blockstep(current, false);
>  
> +	pre_xol_rip_insn(auprobe, regs, &utask->autask);
>  	return 0;
>  }
>  
> @@ -558,22 +557,13 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
>   */
>  int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  {
> -	struct uprobe_task *utask;
> +	struct uprobe_task *utask = current->utask;
>  	long correction;
>  	int result = 0;
>  
>  	WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
>  
> -	utask = current->utask;
>  	current->thread.trap_nr = utask->autask.saved_trap_nr;
> -	correction = (long)(utask->vaddr - utask->xol_vaddr);
> -	handle_riprel_post_xol(auprobe, regs, &correction);
> -	if (auprobe->fixups & UPROBE_FIX_IP)
> -		regs->ip += correction;
> -
> -	if (auprobe->fixups & UPROBE_FIX_CALL)
> -		result = adjust_ret_addr(regs->sp, correction);
> -
>  	/*
>  	 * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
>  	 * so we can get an extra SIGTRAP if we do not clear TF. We need
> @@ -584,6 +574,14 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  	else if (!(auprobe->fixups & UPROBE_FIX_SETF))
>  		regs->flags &= ~X86_EFLAGS_TF;
>  
> +	correction = (long)(utask->vaddr - utask->xol_vaddr);
> +	handle_riprel_post_xol(auprobe, regs, &correction);
> +	if (auprobe->fixups & UPROBE_FIX_IP)
> +		regs->ip += correction;
> +
> +	if (auprobe->fixups & UPROBE_FIX_CALL)
> +		result = adjust_ret_addr(regs->sp, correction);
> +
>  	return result;
>  }
>  
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
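[Editorial note: the `correction` computed in arch_uprobe_post_xol() above translates addresses produced while executing at the xol slot back into the probed address space. A user-space sketch of just that arithmetic, with made-up example addresses:]

```c
#include <assert.h>

/* After single-stepping at xol_vaddr, any address the insn produced
 * (the new ip, or the return address a call pushed) is off by the
 * distance between the slot and the original probe address. */
static long compute_correction(unsigned long vaddr, unsigned long xol_vaddr)
{
	return (long)(vaddr - xol_vaddr);
}

/* What adjust_ret_addr() does to the value at regs->sp, minus the
 * copy_from_user()/copy_to_user() plumbing. */
static unsigned long fix_ret_addr(unsigned long pushed_ra, long correction)
{
	return pushed_ra + correction;
}
```

A call of length 5 executed at the slot pushes xol_vaddr + 5; after the fixup it must point just past the original instruction, i.e. vaddr + 5.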




* Re: [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
  2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
@ 2014-04-01  6:52   ` Masami Hiramatsu
  2014-04-01 14:44     ` Oleg Nesterov
  2014-04-01 15:01   ` Oleg Nesterov
  2014-04-02 19:46   ` Jim Keniston
  2 siblings, 1 reply; 34+ messages in thread
From: Masami Hiramatsu @ 2014-04-01  6:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

(2014/04/01 4:44), Oleg Nesterov wrote:
> Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
> move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
> set of methods and change arch_uprobe_pre/post_xol() accordingly.
> 
> This way we can add the new uprobe_xol_ops's to handle the insns
> which need the special processing (rip-relative jmp/call at least).
> 
> TODO: An error from adjust_ret_addr() shouldn't be silently ignored,
> we should teach arch_uprobe_post_xol() or handle_singlestep() paths
> to restart the probed insn in this case. And probably "adjust" can
> be simplified and turned into set_ret_addr(). It seems that we do
> not really need copy_from_user(), we can always calculate the value
> we need to write into *regs->sp.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/include/asm/uprobes.h |    7 ++-
>  arch/x86/kernel/uprobes.c      |  107 +++++++++++++++++++++++++--------------
>  2 files changed, 74 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index 3087ea9..9f8210b 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -33,12 +33,17 @@ typedef u8 uprobe_opcode_t;
>  #define UPROBE_SWBP_INSN		0xcc
>  #define UPROBE_SWBP_INSN_SIZE		   1
>  
> +struct uprobe_xol_ops;
> +
>  struct arch_uprobe {
> -	u16				fixups;
>  	union {
>  		u8			insn[MAX_UINSN_BYTES];
>  		u8			ixol[MAX_UINSN_BYTES];
>  	};
> +
> +	u16				fixups;
> +	const struct uprobe_xol_ops	*ops;
> +
>  #ifdef CONFIG_X86_64
>  	unsigned long			rip_rela_target_address;
>  #endif
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 7dd65dc..a822de5 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -402,6 +402,64 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  }
>  #endif /* CONFIG_X86_64 */
>  
> +struct uprobe_xol_ops {
> +	bool	(*emulate)(struct arch_uprobe *, struct pt_regs *);
> +	int	(*pre_xol)(struct arch_uprobe *, struct pt_regs *);
> +	int	(*post_xol)(struct arch_uprobe *, struct pt_regs *);
> +};
> +
> +static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +	pre_xol_rip_insn(auprobe, regs, &current->utask->autask);
> +	return 0;
> +}
> +
> +/*
> + * Adjust the return address pushed by a call insn executed out of line.
> + */
> +static int adjust_ret_addr(unsigned long sp, long correction)
> +{
> +	int rasize, ncopied;
> +	long ra = 0;
> +
> +	if (is_ia32_task())
> +		rasize = 4;
> +	else
> +		rasize = 8;
> +
> +	ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
> +	if (unlikely(ncopied))
> +		return -EFAULT;
> +
> +	ra += correction;
> +	ncopied = copy_to_user((void __user *)sp, &ra, rasize);
> +	if (unlikely(ncopied))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +	struct uprobe_task *utask = current->utask;
> +	long correction = (long)(utask->vaddr - utask->xol_vaddr);
> +	int ret = 0;
> +
> +	handle_riprel_post_xol(auprobe, regs, &correction);
> +	if (auprobe->fixups & UPROBE_FIX_IP)
> +		regs->ip += correction;
> +
> +	if (auprobe->fixups & UPROBE_FIX_CALL)
> +		ret = adjust_ret_addr(regs->sp, correction);
> +
> +	return ret;
> +}
> +
> +static struct uprobe_xol_ops defaule_xop_ops = {
> +	.pre_xol  = default_pre_xol_op,
> +	.post_xol = default_post_xol_op,

If there is no user of ops->emulate, I think it should not be defined now.
You can add it when you really need it. :)

> +};
> +
>  /**
>   * arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
>   * @mm: the probed address space.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
  2014-04-01  3:17   ` Masami Hiramatsu
@ 2014-04-01 14:33     ` Oleg Nesterov
  2014-04-01 16:39       ` Oleg Nesterov
  0 siblings, 1 reply; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-01 14:33 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

On 04/01, Masami Hiramatsu wrote:
>
> (2014/04/01 4:44), Oleg Nesterov wrote:
> > arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
> > but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
> > Move the callsite into "default" case and change the "0xff" case to
> > fall-through.
> >
> > We are going to add the various hooks to handle the rip-relative
> > jmp/call instructions (and more), we need this change to enforce the
> > fact that the new code can't conflict with is_riprel_insn() code.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
Hmm, this does not seem obviously reasonable at this point.
However, the code itself is not wrong. Could you merge
this change into the new hooks patch?

Good point, I'll send v2.

I'd still prefer to do this in a separate patch if you do not object,
and this patch should come after "add uprobe_xol_ops". In this case
it should be clearer why we need this change, and the changelog
can tell more. Say, it can mention that otherwise uprobe_abort_xol()
can be confused until at least we add uprobe_xol_ops->abort().

And probably I'll add another patch into this series, "restart if
->post_xol() fails" (see "TODO:" in 7/7). We need this change anyway
to emulate the "call" insn. We could fix this before we add the hooks,
but after 7/7 the change will be more simple/clear.

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
  2014-04-01  6:52   ` Masami Hiramatsu
@ 2014-04-01 14:44     ` Oleg Nesterov
  0 siblings, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-01 14:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

On 04/01, Masami Hiramatsu wrote:
>
> (2014/04/01 4:44), Oleg Nesterov wrote:
> > +
> > +static struct uprobe_xol_ops defaule_xop_ops = {
> > +	.pre_xol  = default_pre_xol_op,
> > +	.post_xol = default_post_xol_op,
>
> If there is no ops->emulate, I think it should not be defined now.
> You can add it when you really need it. :)

Yes, but the next patch will need it. I do not really mind, but if you
do not object I'd prefer to define it in this patch (note that it also
changes skip_sstep() to check ops->emulate even if it has no users yet).
This way the next change will be a little bit simpler.

In the longer term we will (probably) need ops->abort as well, but currently
we can assume handle_riprel_post_xol() in arch_uprobe_abort_xol() can do
nothing wrong even if ->ops != &defaule_xol_ops (one of the reasons for 3/7).

Oleg.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
  2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
  2014-04-01  6:52   ` Masami Hiramatsu
@ 2014-04-01 15:01   ` Oleg Nesterov
  2014-04-02 19:46   ` Jim Keniston
  2 siblings, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-01 15:01 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

On 03/31, Oleg Nesterov wrote:
>
> +static struct uprobe_xol_ops defaule_xop_ops = {
> +	.pre_xol  = default_pre_xol_op,
> +	.post_xol = default_post_xol_op,
> +};

...

> +	auprobe->ops = &defaule_xop_ops;

Cough. At least the name is the same ;) But I obviously meant "default_xol_ops".
Another reason for v2.

Oleg.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep()
  2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
  2014-04-01  1:41   ` Masami Hiramatsu
@ 2014-04-01 15:39   ` David Long
  2014-04-03 16:32   ` Srikar Dronamraju
  2 siblings, 0 replies; 34+ messages in thread
From: David Long @ 2014-04-01 15:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	Denys Vlasenko, Frank Ch. Eigler, Jim Keniston, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

On 03/31/14 15:43, Oleg Nesterov wrote:
> UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
> patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
> not only does it not help, it harms.
>
> It can only help to avoid arch_uprobe_skip_sstep() if it was already
> called before and failed. But this is ugly, if we want to know whether
> we can emulate this instruction or not we should do this analysis in
> arch_uprobe_analyze_insn(), not when we hit this probe for the first
> time.
>
> And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
> fail or not depending on the task/register state; if this insn can be
> emulated but, say, put_user() fails, we need to xol it this time, but
> this doesn't mean we shouldn't try to emulate it when this or another
> thread hits this bp next time.
>
> And this is the actual reason for this change. We need to emulate the
> "call" insn, but push(return-address) can obviously fail.
>
> Per-arch notes:
>
> 	x86: __skip_sstep() can only emulate "rep;nop". With this
> 	     change it will be called every time and most probably
> 	     for no reason.
>
> 	     This will be fixed by the next changes. We need to
> 	     change this suboptimal code anyway.
>
> 	arm: Should not be affected. It has its own "bool simulate"
> 	     flag checked in arch_uprobe_skip_sstep().
>
> 	ppc: Looks like it can emulate almost everything. Does it
> 	     actually need to record the fact that emulate_step()
> 	     failed? Hopefully not. But if yes, it can add the ppc-
> 	     specific flag into arch_uprobe.
>
> TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn(),
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>   kernel/events/uprobes.c |   23 ++---------------------
>   1 files changed, 2 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 307d87c..7a3e14e 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -60,8 +60,6 @@ static struct percpu_rw_semaphore dup_mmap_sem;
>
>   /* Have a copy of original instruction */
>   #define UPROBE_COPY_INSN	0
> -/* Can skip singlestep */
> -#define UPROBE_SKIP_SSTEP	1
>
>   struct uprobe {
>   	struct rb_node		rb_node;	/* node in the rb tree */
> @@ -491,12 +489,9 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>   	uprobe->offset = offset;
>   	init_rwsem(&uprobe->register_rwsem);
>   	init_rwsem(&uprobe->consumer_rwsem);
> -	/* For now assume that the instruction need not be single-stepped */
> -	__set_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
>
>   	/* add to uprobes_tree, sorted on inode:offset */
>   	cur_uprobe = insert_uprobe(uprobe);
> -
>   	/* a uprobe exists for this inode:offset combination */
>   	if (cur_uprobe) {
>   		kfree(uprobe);
> @@ -1628,20 +1623,6 @@ bool uprobe_deny_signal(void)
>   	return true;
>   }
>
> -/*
> - * Avoid singlestepping the original instruction if the original instruction
> - * is a NOP or can be emulated.
> - */
> -static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
> -{
> -	if (test_bit(UPROBE_SKIP_SSTEP, &uprobe->flags)) {
> -		if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
> -			return true;
> -		clear_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
> -	}
> -	return false;
> -}
> -
>   static void mmf_recalc_uprobes(struct mm_struct *mm)
>   {
>   	struct vm_area_struct *vma;
> @@ -1859,13 +1840,13 @@ static void handle_swbp(struct pt_regs *regs)
>   		goto out;
>
>   	handler_chain(uprobe, regs);
> -	if (can_skip_sstep(uprobe, regs))
> +	if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
>   		goto out;
>
>   	if (!pre_ssout(uprobe, regs, bp_vaddr))
>   		return;
>
> -	/* can_skip_sstep() succeeded, or restart if can't singlestep */
> +	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
>   out:
>   	put_uprobe(uprobe);
>   }
>

This looks OK to me.  I've tested it with my ARM uprobes/kprobes patch 
and there are no regressions.

Reviewed-by: David A. Long <dave.long@linaro.org>


-dl


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
  2014-04-01 14:33     ` Oleg Nesterov
@ 2014-04-01 16:39       ` Oleg Nesterov
  2014-04-02 17:57         ` Jim Keniston
  0 siblings, 1 reply; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-01 16:39 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jim Keniston,
	Jonathan Lebon, linux-kernel

On 04/01, Oleg Nesterov wrote:
>
> Good point, I'll send v2.

Masami, et al, I was going to send v2 plus the next short RFC series
which fixes the problem today, but it turns out that I have no time.
Will try to do this tomorrow.

So let me explain the problem, and how (I think) it should be solved.
Unfortunately, I do not even know the terminology, so firstly I have
to explain you the things I recently learned when I investigated the
bug report ;)

Lets only discuss the "call" instruction (the one which starts with
"e8" byte), because (compared to "jmp") the fix is more complicated.
This instruction simply does

	push rip
	rip += offset_encoded_in_insn	// ignoring the length of insn

To simplify, suppose that we probe such an insn at virtual address
0x1000 and offset_encoded_in_insn == 0x10.

When this task hits the probe, the kernel simply copies this (unmodified)
insn into xol area, and the task does single step. Lets suppose that
the virtual address of xol slot == 0x2000. This means that regs->ip
becomes 0x2000 + 0x10 = 0x2010, and we only need to adjust ->ip and
the return address.

Note that the new ->ip == 0x2010 can point to nowhere, to the unmapped
area or it can point to the kernel memory. This still works because
the task doesn't even try to execute the next insn at this address.

But! this does NOT work if the new ->ip is non-canonical (not sure this
is the common term; I mean the hole starting at 0x0000800000000000,
which I didn't know about until recently ;). In this case CPU generates
#GP and do_general_protection() kills the task which tries to execute
this insn out-of-line.

The test-case I wrote to ensure:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It dies if you probe "probe_func" (although this test-case is not very
reliable, randomize_va_space/etc can change the placement of xol area).

Now let me describe how I am going to fix this. Please let me know if
you think this can't work or you see the better solution.

To simplify, lets ignore the fact we have lib/insn.c. The first change
will add

	bool call_emulate_op(...)
	{
		// push return adress
		if (put_user(regs->ip + 5, (long __user *)(regs->sp - 8)))
			return false;

		regs->sp -= 8;
		regs->ip += 5 + *(s32*)(auprobe->insn + 1);
		return true;
	}

and change "case 0xe8" in arch_uprobe_analyze_insn() to setup
->ops = call_xol_ops.

But this is obviously incomplete because put_user() can fail. In this case
we could send a signal and restart, but this is not simple. We do not know
which signal (say, SIGSEGV or SIGBUS ?), we do not know what should we put
into siginfo, etc. And we do not want to emulate __do_page_fault ;)

So the 2nd patch will do the following:

	1. Add "long call_offset" into "struct arch_uprobe". (yes, we
	   should also add a "union" into arch_uprobe, but lets ignore
	   the details).

	2. change "case 0xe8" in arch_uprobe_analyze_insn() to also do

		s32 *offset = (void*)(auprobe->insn + 1);

		arch_uprobe->call_offset = *offset;
		/*
		 * Turn this insn into
		 *
		 *	1: call 1b;
		 *
		 * This is only needed to expand the stack if emulate
		 * fails. We do not turn it into, say, "pushf" because
		 * we do not want to change the 1st byte used by
		 * set_orig_insn().
		 */
		*offset = -5;

	3. Change call_emulate_op() to use arch_uprobe->call_offset

	4. Add

		//
		// In practice, this will be almost never called. put_user()
		// in call_emulate_op() failed, single-step can only succeed
		// if another thread expanded our stack.
		//
		int call_post_xol_op(...)
		{
			regs->sp += 8;
			// tell the caller to restart
			return -EAGAIN;
		}

Once again, if this can work we need more changes to handle jmp's/etc. But
lets discuss this later. I am thinking in horror about conditional jmp ;)
In fact this should be simple, just I do not know (yet) how to parse such an
insn, and I simply do not know if lib/insn.c can help me to figure out which
flag in regs->flags ->emulate() should check.

So this all looks a bit complicated, but I do not see a simpler solution.
Well, we could avoid ->emulate() altogether, we could just mangle the
problematic instructions and complicate arch_uprobe_post_xol(), but I do
not think this would be better. And if nothing else, ->emulate is a) simple
and b) should likely succeed and make the probing faster.

But perhaps I missed something?

Oleg.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
  2014-04-01 16:39       ` Oleg Nesterov
@ 2014-04-02 17:57         ` Jim Keniston
  2014-04-02 19:14           ` Oleg Nesterov
  0 siblings, 1 reply; 34+ messages in thread
From: Jim Keniston @ 2014-04-02 17:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Ingo Molnar, Srikar Dronamraju,
	Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jonathan Lebon, linux-kernel

On Tue, 2014-04-01 at 18:39 +0200, Oleg Nesterov wrote:
> On 04/01, Oleg Nesterov wrote:
> >
> > Good point, I'll send v2.
> 
> Masami, et al, I was going to send v2 plus the next short RFC series
> which fixes the problem today, but it turns out that I have no time.
> Will try to do this tomorrow.
> 
> So let me explain the problem, and how (I think) it should be solved.
> Unfortunately, I do not even know the terminology, so firstly I have
> to explain you the things I recently learned when I investigated the
> bug report ;)
> 
[problem description and proposed solution snipped]

Thanks for your work on this.  I think your analysis is correct.  As you
say, emulating calls is tricky because of the possibility that the call
will incur a page fault when it grows the stack.  Your best solution
might be to emulate jumps, but rewrite call instructions using a scratch
register, similar to how we handle rip-relative instructions.

> 
> Once again, if this can work we need more changes to handle jmp's/etc. But
> lets discuss this later. I am thinking in horror about conditional jmp ;)
> In fact this should be simple, just I do not know (yet) how to parse such an
> insn, and I simply do not know if lib/insn.c can help me to figure out which
> flag in regs->flags ->emulate() should check.

Emulating jumps (including conditional jumps) shouldn't be all that much
code.  In case you haven't already found it, the "AMD64 Architecture
Programmer's Manual, Volume 3" provides the sort of info you need.
Looking at the "Jcc" section, I see dozens of names of conditional-jump
instructions; but they're all aliases for the same 16 variations, which
branch based on various combinations of 5 flags in regs->eflags (OF, CF,
ZF, SF, PF).

One thing about emulating jumps is that if the task has block stepping
enabled, then a trap is expected on every successful branch.  I'd have
to poke around a while to figure out how to know whether block stepping
is enabled.  set_task_blockstep() should point the way.

> 
> So this all looks a bit complicated, but I do not see a simpler solution.
> Well, we could avoid ->emulate() altogether, we could just mangle the
> problematic instructions and complicate arch_uprobe_post_xol(), but I do
> not think this would be better. And if nothing else, ->emulate is a) simple
> and b) should likely succeed and make the probing faster.
> 
> But perhaps I missed something?
> 
> Oleg.
> 

Jim Keniston
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
  2014-04-02 17:57         ` Jim Keniston
@ 2014-04-02 19:14           ` Oleg Nesterov
  0 siblings, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-02 19:14 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Masami Hiramatsu, Ingo Molnar, Srikar Dronamraju,
	Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jonathan Lebon, linux-kernel

On 04/02, Jim Keniston wrote:
>
> On Tue, 2014-04-01 at 18:39 +0200, Oleg Nesterov wrote:
>
> > So let me explain the problem, and how (I think) it should be solved.
> > Unfortunately, I do not even know the terminology, so firstly I have
> > to explain you the things I recently learned when I investigated the
> > bug report ;)
> >
> [problem description and proposed solution snipped]
>
> Thanks for your work on this.  I think your analysis is correct.

Great, thanks!

> As you
> say, emulating calls is tricky because of the possibility that the call
> will incur a page fault when it grows the stack.  Your best solution
> might be to emulate jumps,

Yes,

> but rewrite call instructions using a scratch
> register, similar to how we handle rip-relative instructions.

Yes, this is what I meant when I said that we can avoid ->emulate in
this case, mangle insn, and complicate post_xol(). But so far I do not
think this would be better.

OK. Let me actually finish and send the fixes, then we can discuss this
again and see if another approach makes more sense.

Sorry, I was distracted again, so I need more time. Will try to send tomorrow.

> > Once again, if this can work we need more changes to handle jmp's/etc. But
> > lets discuss this later. I am thinking in horror about conditional jmp ;)
> > In fact this should be simple, just I do not know (yet) how to parse such an
> > insn, and I simply do not know if lib/insn.c can help me to figure out which
> > flag in regs->flags ->emulate() should check.
>
> Emulating jumps (including conditional jumps) shouldn't be all that much
> code.  In case you haven't already found it, the "AMD64 Architecture
> Programmer's Manual, Volume 3" provides the sort of info you need.

Thanks. I'll try to read it, but most probably I'll come here with the
stupid questions anyway.

> One thing about emulating jumps is that if the task has block stepping
> enabled, then a trap is expected on every successful branch.

Yes, but probably we can do this later. Note that uprobes doesn't play
nice with TIF_BLOCKSTEP anyway, see the comment in arch_uprobe_post_xol:

	/*
	 * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
	 * so we can get an extra SIGTRAP if we do not clear TF. We need
	 * to examine the opcode to make it right.
	 */

So I think that at least the initial version can safely ignore this problem.

Oleg.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
  2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
  2014-04-01  6:52   ` Masami Hiramatsu
  2014-04-01 15:01   ` Oleg Nesterov
@ 2014-04-02 19:46   ` Jim Keniston
  2014-04-03 19:49     ` Oleg Nesterov
  2 siblings, 1 reply; 34+ messages in thread
From: Jim Keniston @ 2014-04-02 19:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

On Mon, 2014-03-31 at 21:44 +0200, Oleg Nesterov wrote:
...
> +/*
> + * Adjust the return address pushed by a call insn executed out of line.
> + */
> +static int adjust_ret_addr(unsigned long sp, long correction)
> +{
> +	int rasize, ncopied;
> +	long ra = 0;
> +
> +	if (is_ia32_task())
> +		rasize = 4;
> +	else
> +		rasize = 8;
> +
> +	ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
> +	if (unlikely(ncopied))
> +		return -EFAULT;
> +
> +	ra += correction;
> +	ncopied = copy_to_user((void __user *)sp, &ra, rasize);
> +	if (unlikely(ncopied))
> +		return -EFAULT;
> +
> +	return 0;
> +}

This isn't your bug, Oleg -- you're just moving code -- but consider
taking this opportunity to fix it...

"ncopied" is a misnomer here.  copy_from_user() and copy_to_user()
return the number of bytes that could NOT be copied.  Once upon a time
(in uprobes's pre-upstream days), this was called "nleft" -- i.e., the
number of bytes left uncopied.  A more accurate name like "nleft" or
"nmissed" or "nr_uncopied" might yield less confusion in the future --
or just dispense with the variable altogether.

arch_uretprobe_hijack_return_addr() has this same problem, although
there we need the variable, because if zero bytes of the return address
are overwritten, we can fail more gracefully.

Jim


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/7] uprobes/x86: Gather "riprel" functions together
  2014-03-31 19:44 ` [PATCH 5/7] uprobes/x86: Gather "riprel" functions together Oleg Nesterov
  2014-04-01  3:21   ` Masami Hiramatsu
@ 2014-04-02 19:53   ` Jim Keniston
  2014-04-03 19:50     ` Oleg Nesterov
  1 sibling, 1 reply; 34+ messages in thread
From: Jim Keniston @ 2014-04-02 19:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

On Mon, 2014-03-31 at 21:44 +0200, Oleg Nesterov wrote:
...
> +static void
> +handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
> +{
> +	if (auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
> +		struct arch_uprobe_task *autask;
> +
> +		autask = &current->utask->autask;
> +		if (auprobe->fixups & UPROBE_FIX_RIP_AX)
> +			regs->ax = autask->saved_scratch_register;
> +		else
> +			regs->cx = autask->saved_scratch_register;
> +
> +		/*
> +		 * The original instruction includes a displacement, and so
> +		 * is 4 bytes longer than what we've just single-stepped.
> +		 * Fall through to handle stuff like "jmpq *...(%rip)" and
> +		 * "callq *...(%rip)".
> +		 */
> +		if (correction)
> +			*correction += 4;
> +	}
> +}

This is another case of a glitch in the code being moved.  Since this
code was moved to its own function, the "Fall through" comment makes no
sense.  Maybe
		 * Caller may need to apply other fixups to handle stuff
		 * like "jmpq *...(%rip)" and "callq *...(%rip)".

Jim


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (6 preceding siblings ...)
  2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
@ 2014-04-02 19:58 ` Jim Keniston
  2014-04-03 20:00 ` [PATCH 8-9/7] " Oleg Nesterov
  8 siblings, 0 replies; 34+ messages in thread
From: Jim Keniston @ 2014-04-02 19:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

On Mon, 2014-03-31 at 21:43 +0200, Oleg Nesterov wrote:
> Hello.
> 
> x86 can not handle the rip-relative jmp/call instructions; the probed
> task can be killed by general protection fault. I'll describe this in
> more details when I send the fixes. Now I am sending the preparations
> which (I hope) make sense anyway, please review.
> 
> My main concern is 3/7. I know absolutely nothing about instruction
> decoding, so I can only guess what, for example, OPCODE1() == 0xff or
> MODRM_REG() == 2 actually means. Please review.
> 
> Ananth, David, please ack/nack the first change, it affects powerpc/arm.
> 
> Oleg.
> 
>  arch/x86/include/asm/uprobes.h |    7 +-
>  arch/x86/kernel/uprobes.c      |  343 ++++++++++++++++++++--------------------
>  kernel/events/uprobes.c        |   23 +---
>  3 files changed, 176 insertions(+), 197 deletions(-)
> 

I've reviewed all 7 patches.  Aside from a couple of nits (noted
elsewhere) that Oleg inherited, it looks good so far.

Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep()
  2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
  2014-04-01  1:41   ` Masami Hiramatsu
  2014-04-01 15:39   ` David Long
@ 2014-04-03 16:32   ` Srikar Dronamraju
  2 siblings, 0 replies; 34+ messages in thread
From: Srikar Dronamraju @ 2014-04-03 16:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, David Long,
	Denys Vlasenko, Frank Ch. Eigler, Jim Keniston, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

> UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
> patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
> not only does it not help, it harms.
> 
> It can only help to avoid arch_uprobe_skip_sstep() if it was already
> called before and failed. But this is ugly, if we want to know whether
> we can emulate this instruction or not we should do this analysis in
> arch_uprobe_analyze_insn(), not when we hit this probe for the first
> time.
> 
> And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
> fail or not depending on the task/register state; if this insn can be
> emulated but, say, put_user() fails, we need to xol it this time, but
> this doesn't mean we shouldn't try to emulate it when this or another
> thread hits this bp next time.
> 
> And this is the actual reason for this change. We need to emulate the
> "call" insn, but push(return-address) can obviously fail.
> 
> Per-arch notes:
> 
> 	x86: __skip_sstep() can only emulate "rep;nop". With this
> 	     change it will be called every time and most probably
> 	     for no reason.
> 
> 	     This will be fixed by the next changes. We need to
> 	     change this suboptimal code anyway.
> 
> 	arm: Should not be affected. It has its own "bool simulate"
> 	     flag checked in arch_uprobe_skip_sstep().
> 
> 	ppc: Looks like it can emulate almost everything. Does it
> 	     actually need to record the fact that emulate_step()
> 	     failed? Hopefully not. But if yes, it can add the ppc-
> 	     specific flag into arch_uprobe.
> 
> TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn(),
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()
  2014-03-31 19:43 ` [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() Oleg Nesterov
  2014-04-01  2:00   ` Masami Hiramatsu
@ 2014-04-03 16:33   ` Srikar Dronamraju
  1 sibling, 0 replies; 34+ messages in thread
From: Srikar Dronamraju @ 2014-04-03 16:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, David Long,
	Denys Vlasenko, Frank Ch. Eigler, Jim Keniston, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2014-03-31 21:43:59]:

> No functional changes, preparation.
> 
> Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
> with the following modifications:
> 
> 	- Do not call insn_get_opcode() again, it was already called
> 	  by validate_insn_bits().
> 
> 	- Move "case 0xea" up. This way "case 0xff" can fall through
> 	  to default case.
> 
> 	- change "case 0xff" to use the nested "switch (MODRM_REG)",
> 	  this way the code looks a bit simpler.
> 
> 	- Make the comments look consistent.
> 
> While at it, kill the initialization of rip_rela_target_address and
> ->fixups, we can rely on kzalloc(). We will add the new members into
> arch_uprobe, it would be better to assume that everything is zero by
> default.
> 
> TODO: cleanup/fix the mess in validate_insn_bits() paths:
> 
> 	- validate_insn_64bits() and validate_insn_32bits() should be
> 	  unified.
> 
> 	- "ifdef" is not used consistently; if good_insns_64 depends
> 	  on CONFIG_X86_64, then probably good_insns_32 should depend
> 	  on CONFIG_X86_32/EMULATION
> 
> 	- the usage of mm->context.ia32_compat looks wrong if the task
> 	  is TIF_X32.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg
  2014-03-31 19:44 ` [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg Oleg Nesterov
  2014-04-01  3:10   ` Masami Hiramatsu
@ 2014-04-03 16:33   ` Srikar Dronamraju
  1 sibling, 0 replies; 34+ messages in thread
From: Srikar Dronamraju @ 2014-04-03 16:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, David Long,
	Denys Vlasenko, Frank Ch. Eigler, Jim Keniston, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2014-03-31 21:44:05]:

> Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
> it is true insn_rip_relative() must return false. validate_insn_bits()
> passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
> checks insn->x86_64.
> 
> Also, remove the no longer needed "struct mm_struct *mm" argument and
> the unnecessary "return" at the end.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks
  2014-03-31 19:44 ` [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks Oleg Nesterov
  2014-04-01  3:24   ` Masami Hiramatsu
@ 2014-04-03 16:34   ` Srikar Dronamraju
  1 sibling, 0 replies; 34+ messages in thread
From: Srikar Dronamraju @ 2014-04-03 16:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, David Long,
	Denys Vlasenko, Frank Ch. Eigler, Jim Keniston, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2014-03-31 21:44:12]:

> No functional changes. Preparation to simplify the review of the next
> change. Just reorder the code in arch_uprobe_pre/post_xol() functions
> so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.
> 
> Also change arch_uprobe_pre_xol() to use utask instead of autask, to
> make the code more symmetrical with arch_uprobe_post_xol().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
  2014-04-02 19:46   ` Jim Keniston
@ 2014-04-03 19:49     ` Oleg Nesterov
  0 siblings, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-03 19:49 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

On 04/02, Jim Keniston wrote:
>
> On Mon, 2014-03-31 at 21:44 +0200, Oleg Nesterov wrote:
> ...
> > +/*
> > + * Adjust the return address pushed by a call insn executed out of line.
> > + */
> > +static int adjust_ret_addr(unsigned long sp, long correction)
> > +{
> > +	int rasize, ncopied;
> > +	long ra = 0;
> > +
> > +	if (is_ia32_task())
> > +		rasize = 4;
> > +	else
> > +		rasize = 8;
> > +
> > +	ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
> > +	if (unlikely(ncopied))
> > +		return -EFAULT;
> > +
> > +	ra += correction;
> > +	ncopied = copy_to_user((void __user *)sp, &ra, rasize);
> > +	if (unlikely(ncopied))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
>
> This isn't your bug, Oleg -- you're just moving code -- but consider
> taking this opportunity to fix it...
>
> "ncopied" is a misnomer here.  copy_from_user() and copy_to_user()
> return the number of bytes that could NOT be copied.

Yes, thanks. I'll try to cleanup this later. I am not sure yet, but
perhaps I will change adjust_ret_addr() and hijack_return_addr() to
use a couple of new get/put_user helpers, because ->call_emulate()
needs to check is_ia32_task() and write to *sp too.

Thanks.

Oleg.



* Re: [PATCH 5/7] uprobes/x86: Gather "riprel" functions together
  2014-04-02 19:53   ` Jim Keniston
@ 2014-04-03 19:50     ` Oleg Nesterov
  0 siblings, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-03 19:50 UTC (permalink / raw)
  To: Jim Keniston
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	David Long, Denys Vlasenko, Frank Ch. Eigler, Jonathan Lebon,
	Masami Hiramatsu, linux-kernel

On 04/02, Jim Keniston wrote:
>
> On Mon, 2014-03-31 at 21:44 +0200, Oleg Nesterov wrote:
> ...
> > +static void
> > +handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
> > +{
> > +	if (auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
> > +		struct arch_uprobe_task *autask;
> > +
> > +		autask = &current->utask->autask;
> > +		if (auprobe->fixups & UPROBE_FIX_RIP_AX)
> > +			regs->ax = autask->saved_scratch_register;
> > +		else
> > +			regs->cx = autask->saved_scratch_register;
> > +
> > +		/*
> > +		 * The original instruction includes a displacement, and so
> > +		 * is 4 bytes longer than what we've just single-stepped.
> > +		 * Fall through to handle stuff like "jmpq *...(%rip)" and
> > +		 * "callq *...(%rip)".
> > +		 */
> > +		if (correction)
> > +			*correction += 4;
> > +	}
> > +}
>
> This is another case of a glitch in the code being moved.  Since this
> code was moved to its own function, the "Fall through" comment makes no
> sense.  Maybe
> 		 * Caller may need to apply other fixups to handle stuff
> 		 * like "jmpq *...(%rip)" and "callq *...(%rip)".

Thanks, updated the comment in v2.

Oleg.



* [PATCH 8-9/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops
  2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
                   ` (7 preceding siblings ...)
  2014-04-02 19:58 ` [PATCH 0/7] uprobes/x86: introduce " Jim Keniston
@ 2014-04-03 20:00 ` Oleg Nesterov
  2014-04-03 20:00   ` [PATCH 8/7] uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails Oleg Nesterov
  2014-04-03 20:01   ` [PATCH 9/7] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible Oleg Nesterov
  8 siblings, 2 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-03 20:00 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

On 03/31, Oleg Nesterov wrote:
>
> when I send the fixes.

Damn. All I can say is: sorry for the delay. Still no fixes.

I only finished v2, but I do not have time to add the acks I got and
resend. Apart from cosmetic changes the resulting code is the same,
plus the 2 new patches I am sending in reply to this message.

Just in case, let me repeat: I really want to separate these preps
from the actual fixes, because I believe they make sense in any case,
no matter how we actually fix the problem.

Oleg.



* [PATCH 8/7] uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails
  2014-04-03 20:00 ` [PATCH 8-9/7] " Oleg Nesterov
@ 2014-04-03 20:00   ` Oleg Nesterov
  2014-04-03 20:01   ` [PATCH 9/7] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible Oleg Nesterov
  1 sibling, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-03 20:00 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and can lead to hard-to-debug problems.

1. Change handle_singlestep() to loudly complain and send SIGILL.

   Note: this only affects x86, ppc/arm can't fail.

2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
   avoid TF games if it is going to return an error.

   This can help to analyze the problem; if nothing else, we should
   not report ->ip = xol_slot in the core-file.

   Note: this means that handle_riprel_post_xol() can be called twice,
   but this is fine because it is idempotent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |   16 ++++++++++++----
 kernel/events/uprobes.c   |    8 +++++++-
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 43df132..3bdfb6e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -594,6 +594,15 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	struct uprobe_task *utask = current->utask;
 
 	WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
+
+	if (auprobe->ops->post_xol) {
+		int err = auprobe->ops->post_xol(auprobe, regs);
+		if (err) {
+			arch_uprobe_abort_xol(auprobe, regs);
+			return err;
+		}
+	}
+
 	current->thread.trap_nr = utask->autask.saved_trap_nr;
 	/*
 	 * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
@@ -605,8 +614,6 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	else if (!(auprobe->fixups & UPROBE_FIX_SETF))
 		regs->flags &= ~X86_EFLAGS_TF;
 
-	if (auprobe->ops->post_xol)
-		return auprobe->ops->post_xol(auprobe, regs);
 	return 0;
 }
 
@@ -641,8 +648,9 @@ int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val,
 
 /*
  * This function gets called when XOL instruction either gets trapped or
- * the thread has a fatal signal, so reset the instruction pointer to its
- * probed address.
+ * the thread has a fatal signal, or if arch_uprobe_post_xol() failed.
+ * Reset the instruction pointer to its probed address for the potential
+ * restart or for post mortem analysis.
  */
 void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 7a3e14e..2adbc97 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1858,10 +1858,11 @@ out:
 static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
 {
 	struct uprobe *uprobe;
+	int err = 0;
 
 	uprobe = utask->active_uprobe;
 	if (utask->state == UTASK_SSTEP_ACK)
-		arch_uprobe_post_xol(&uprobe->arch, regs);
+		err = arch_uprobe_post_xol(&uprobe->arch, regs);
 	else if (utask->state == UTASK_SSTEP_TRAPPED)
 		arch_uprobe_abort_xol(&uprobe->arch, regs);
 	else
@@ -1875,6 +1876,11 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
 	spin_lock_irq(&current->sighand->siglock);
 	recalc_sigpending(); /* see uprobe_deny_signal() */
 	spin_unlock_irq(&current->sighand->siglock);
+
+	if (unlikely(err)) {
+		uprobe_warn(current, "execute the probed insn, sending SIGILL.");
+		force_sig_info(SIGILL, SEND_SIG_FORCED, current);
+	}
 }
 
 /*
-- 
1.5.5.1




* [PATCH 9/7] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible
  2014-04-03 20:00 ` [PATCH 8-9/7] " Oleg Nesterov
  2014-04-03 20:00   ` [PATCH 8/7] uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails Oleg Nesterov
@ 2014-04-03 20:01   ` Oleg Nesterov
  1 sibling, 0 replies; 34+ messages in thread
From: Oleg Nesterov @ 2014-04-03 20:01 UTC (permalink / raw)
  To: Ingo Molnar, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, David Long, Denys Vlasenko,
	Frank Ch. Eigler, Jim Keniston, Jonathan Lebon, Masami Hiramatsu,
	linux-kernel

SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application is buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can even be correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART, even though currently
this is the only possible error.

Note: this is not "perfect": we actually do not want the extra
handler_chain() after restart, but I think this is the best we can
realistically do without too many uglifications.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 3bdfb6e..78c4bb3 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -443,16 +443,17 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
 {
 	struct uprobe_task *utask = current->utask;
 	long correction = (long)(utask->vaddr - utask->xol_vaddr);
-	int ret = 0;
 
 	handle_riprel_post_xol(auprobe, regs, &correction);
 	if (auprobe->fixups & UPROBE_FIX_IP)
 		regs->ip += correction;
 
-	if (auprobe->fixups & UPROBE_FIX_CALL)
-		ret = adjust_ret_addr(regs->sp, correction);
+	if (auprobe->fixups & UPROBE_FIX_CALL) {
+		if (adjust_ret_addr(regs->sp, correction))
+			return -ERESTART;
+	}
 
-	return ret;
+	return 0;
 }
 
 static struct uprobe_xol_ops default_xol_ops = {
@@ -599,6 +600,12 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 		int err = auprobe->ops->post_xol(auprobe, regs);
 		if (err) {
 			arch_uprobe_abort_xol(auprobe, regs);
+			/*
+			 * Restart the probed insn. ->post_xol() must ensure
+			 * this is really possible if it returns -ERESTART.
+			 */
+			if (err == -ERESTART)
+				return 0;
 			return err;
 		}
 	}
-- 
1.5.5.1




end of thread, other threads:[~2014-04-03 20:25 UTC | newest]

Thread overview: 34+ messages
2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
2014-04-01  1:41   ` Masami Hiramatsu
2014-04-01 15:39   ` David Long
2014-04-03 16:32   ` Srikar Dronamraju
2014-03-31 19:43 ` [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() Oleg Nesterov
2014-04-01  2:00   ` Masami Hiramatsu
2014-04-03 16:33   ` Srikar Dronamraju
2014-03-31 19:44 ` [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn() Oleg Nesterov
2014-04-01  3:17   ` Masami Hiramatsu
2014-04-01 14:33     ` Oleg Nesterov
2014-04-01 16:39       ` Oleg Nesterov
2014-04-02 17:57         ` Jim Keniston
2014-04-02 19:14           ` Oleg Nesterov
2014-03-31 19:44 ` [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg Oleg Nesterov
2014-04-01  3:10   ` Masami Hiramatsu
2014-04-03 16:33   ` Srikar Dronamraju
2014-03-31 19:44 ` [PATCH 5/7] uprobes/x86: Gather "riprel" functions together Oleg Nesterov
2014-04-01  3:21   ` Masami Hiramatsu
2014-04-02 19:53   ` Jim Keniston
2014-04-03 19:50     ` Oleg Nesterov
2014-03-31 19:44 ` [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks Oleg Nesterov
2014-04-01  3:24   ` Masami Hiramatsu
2014-04-03 16:34   ` Srikar Dronamraju
2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
2014-04-01  6:52   ` Masami Hiramatsu
2014-04-01 14:44     ` Oleg Nesterov
2014-04-01 15:01   ` Oleg Nesterov
2014-04-02 19:46   ` Jim Keniston
2014-04-03 19:49     ` Oleg Nesterov
2014-04-02 19:58 ` [PATCH 0/7] uprobes/x86: introduce " Jim Keniston
2014-04-03 20:00 ` [PATCH 8-9/7] " Oleg Nesterov
2014-04-03 20:00   ` [PATCH 8/7] uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails Oleg Nesterov
2014-04-03 20:01   ` [PATCH 9/7] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible Oleg Nesterov
