All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: ptrace: sign-extend eax with orig_eax>=0
@ 2009-09-23  0:46 Roland McGrath
  2009-09-23  1:31 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  0:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: jan.kratochvil, Oleg Nesterov, x86, linux-kernel, Andrew Morton,
	Linus Torvalds

The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on
32-bit task) behaves differently than a native 32-bit kernel.  When
setting a register state of orig_eax>=0 and eax=-ERESTART* when the
debugged task is NOT on its way out of a 32-bit syscall, the task will
fail to do the syscall restart logic that it should do.

Test case available at http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys-trap.c?cvsroot=systemtap

This happens because the 32-bit ptrace syscall sets eax=0xffffffff
when it sets orig_eax>=0.  The resuming task will not sign-extend this
for the -ERESTART* check because TS_COMPAT is not set.  (So the task
thinks it is restarting after a 64-bit syscall, not a 32-bit one.)

The fix is to have 32-bit ptrace calls sign-extend eax when orig_eax>=0.
The long comment in the change explains the scenarios and caveats fully.

Reported-by: Jan.Kratochvil@redhat.com
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/ptrace.c |   55 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 8d7d5c9..ecb7a49 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1120,7 +1120,6 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 	R32(edi, di);
 	R32(esi, si);
 	R32(ebp, bp);
-	R32(eax, ax);
 	R32(eip, ip);
 	R32(esp, sp);
 
@@ -1130,6 +1129,60 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 		 * causes (long)orig_ax < 0 tests to fire correctly.
 		 */
 		regs->orig_ax = (long) (s32) value;
+
+		/*
+		 * Whenever setting orig_eax to indicate a system call in
+		 * progress, make sure an eax value set by the debugger gets
+		 * sign-extended so that any ax == -ERESTART* tests fire
+		 * correctly.
+		 *
+		 * When those tests (in handle_signal) are done directly
+		 * after an actual 32-bit syscall, then TS_COMPAT is set and
+		 * so syscall_get_error() does sign-extension.  However, the
+		 * debugger sometimes saves that state and then restores it
+		 * later with the intent of picking up the old thread state
+		 * that can be about to do syscall restart.
+		 *
+		 * When it's a 32-bit debugger, that truncates ax to 32 bits.
+		 * If the debugger restores thread state and resumes after a
+		 * ptrace stop when the child was not doing a new syscall, it
+		 * will not have TS_COMPAT set to make syscall_get_error()
+		 * notice and do the sign-extension.
+		 *
+		 * We can't have syscall_get_error() always sign-extend,
+		 * since that's wrong for 64-bit syscalls.  We want it to
+		 * check TS_COMPAT rather than TIF_IA32 to avoid a false
+		 * positive in the oddball case of a 32-bit task doing a
+		 * syscall from a 64-bit code segment.  In the "restored
+		 * thread state" case, it has no way to know whether the
+		 * restored state refers to a 32-bit or 64-bit syscall.
+		 *
+		 * So we can't win 'em all.  We assume that if you are using
+		 * a 32-bit debugger, you don't really care about arcane
+		 * interference with a child trying to use 64-bit syscalls.
+		 * (Just use a 64-bit debugger on it instead!)  What we do
+		 * here makes a 32-bit debugger fiddling a 32-bit task
+		 * consistent with what happens on a native 32-bit kernel.
+		 *
+		 * NOTE!  Since we have no similar logic in putreg(), we
+		 * just expect a 64-bit debugger to save/restore the full
+		 * 64 bits.  If a 64-bit debugger were to treat a 32-bit
+		 * task differently and save/restore only 32 bits per
+		 * register, it would have to grok orig_eax >= 0 and know
+		 * to sign-extend its saved eax when setting it as 64 bits.
+		 */
+		if (regs->orig_ax >= 0)
+			regs->ax = (long) (s32) regs->ax;
+		break;
+
+	case offsetof(struct user32, regs.eax):
+		/*
+		 * As above, for either order of setting both ax and orig_ax.
+		 */
+		if (regs->orig_ax >= 0)
+			regs->ax = (long) (s32) value;
+		else
+			regs->ax = value;
 		break;
 
 	case offsetof(struct user32, regs.eflags):

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: ptrace: sign-extend eax with orig_eax>=0
  2009-09-23  0:46 [PATCH] x86: ptrace: sign-extend eax with orig_eax>=0 Roland McGrath
@ 2009-09-23  1:31 ` Linus Torvalds
  2009-09-23  2:40   ` Roland McGrath
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2009-09-23  1:31 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, jan.kratochvil,
	Oleg Nesterov, x86, linux-kernel, Andrew Morton



On Tue, 22 Sep 2009, Roland McGrath wrote:
>
> The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on
> 32-bit task) behaves differently than a native 32-bit kernel.  When
> setting a register state of orig_eax>=0 and eax=-ERESTART* when the
> debugged task is NOT on its way out of a 32-bit syscall, the task will
> fail to do the syscall restart logic that it should do.

Hmm. This really smells extremely hacky to me.

I see what you're doing, and I understand why, but it all makes me 
shudder. I think we already do the wrong thing for 'orig_ax', and that we 
should probably simply test just the low 32 bits of it (looks to be easily 
done by just making the return type of 'syscall_get_nr()' be 'int') rather 
than have that special "let's sign-extend orig_ax" code in ptrace.

I also get the feeling that the TS_COMPAT testing is just hacky to begin 
with. I think it was broken to do things that way, and I have this gut 
feel that we really should have hidden the "am I a 32-bit or 64-bit 
syscall" in that orig_ax field instead (ie make the 64-bit system calls 
set a bit or whatever). That's the field we've always used for system call 
flagging, and TS_COMPAT was broken.

In this example, the problem for ptrace seems to be that TS_COMPAT ends up 
being that "extra" bit of information that isn't visible in the register 
set.

But that's a bigger separate cleanup. In the meantime, I really get the 
feeling that your patch is nasty, and could be replaced with something 
like the following instead.. It just sets the TS_COMPAT bit if we use a 
32-bit interface to set the system call number to positive (ie "inside 
system call").

THE BELOW IS TOTALLY UNTESTED! I haven't really thought it through. I just 
reacted very negatively to your patch, and my gut feel is that the code is 
fundamentally doing something wrong. The below may not work, and may be 
seriously broken and miss the point, but I think it conceptually comes 
closer to how things _should_ work.

		Linus

---
 arch/x86/include/asm/syscall.h |    2 +-
 arch/x86/kernel/ptrace.c       |    8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index d82f39b..f2a0631 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,7 +16,7 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 
-static inline long syscall_get_nr(struct task_struct *task,
+static inline int syscall_get_nr(struct task_struct *task,
 				  struct pt_regs *regs)
 {
 	/*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 8d7d5c9..90b67db 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1124,13 +1124,19 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 	R32(eip, ip);
 	R32(esp, sp);
 
-	case offsetof(struct user32, regs.orig_eax):
+	case offsetof(struct user32, regs.orig_eax): {
+		struct thread_info *thread = task_thread_info(child);
 		/*
 		 * Sign-extend the value so that orig_eax = -1
 		 * causes (long)orig_ax < 0 tests to fire correctly.
 		 */
 		regs->orig_ax = (long) (s32) value;
+		if ((s32) value >= 0)
+			thread->status |= TS_COMPAT;
+		else
+			thread->status &= ~TS_COMPAT;
 		break;
+	}
 
 	case offsetof(struct user32, regs.eflags):
 		return set_flags(child, value);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: ptrace: sign-extend eax with orig_eax>=0
  2009-09-23  1:31 ` Linus Torvalds
@ 2009-09-23  2:40   ` Roland McGrath
  2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
  0 siblings, 1 reply; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  2:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, jan.kratochvil,
	Oleg Nesterov, x86, linux-kernel, Andrew Morton

> Hmm. This really smells extremely hacky to me.

Agreed.  Anything with a half-page comment about why it's two-thirds
right and only one-third wrong is not what we'd choose when we could
think of better options.

> I see what you're doing, and I understand why, but it all makes me 
> shudder. I think we already do the wrong thing for 'orig_ax', and that we 
> should probably simply test just the low 32 bits of it (looks to be easily 
> done by just making the return type of 'syscall_get_nr()' be 'int') rather 
> than have that special "let's sign-extend orig_ax" code in ptrace.

Yes, that seems mostly equivalent.  By having orig_ax sign-extended in
ptrace unconditionally (i.e. even for 64-bit tasks), you've already said
that the semantics of orig_ax are that you only get 32 bits of meaningful
value in there.  In fact, that's what you said explicitly when we discussed
that change.  There is no reason for syscall_get_nr() not to return int.

I said "mostly equivalent".  The only difference I can see is that
today 64-bit ptrace callers (and core dumps, etc.), can see the high
bits of orig_ax and probably expect to see 64 1 bits in the "no
syscall" case.  So if we punt all the existing sign-extension and say
that orig_ax has only 32 meaningful bits internally in all uses, then
the ABI-preserving change is to make getreg() always sign-extend the
32-bit orig_ax value so -1 is -1LL in userland.

I wonder if you want to say the same thing about all machines generally,
that the syscall number can only have 32 meaningful bits.  That seems
reasonable to me.  Then it would be appropriate to change the type in
asm-generic/syscall.h; that file is never really used, but it sets the
function signatures that other arch's files are written to match.

> I also get the feeling that the TS_COMPAT testing is just hacky to begin 
> with. I think it was broken to do things that way, and I have this gut 
> feel that we really should have hidden the "am I a 32-bit or 64-bit 
> syscall" in that orig_ax field instead (ie make the 64-bit system calls 
> set a bit or whatever). That's the field we've always used for system call 
> flagging, and TS_COMPAT was broken.

On the whole I agree.  hpa and I talked about this very idea last
year.  We never got around to doing anything about it, since the
previous hack at the time had fixed the known bug.  Doing this
internally is probably pretty easy and is nice for all the other cases
with TS_COMPAT checks now.  Unfortunately, it's a userland ABI issue
to change this such that any new special meaning of some orig_ax bits
can be seen by userland so as to help this latest case.

What hpa and I discussed in fact was more than just a 32/64 flag, but
a complete "which entry path" indicator.  i.e. int80 vs sysenter vs
AMD 32-bit "syscall" (if we ever supported that) vs 64-bit "syscall"
vs non-syscall (exception/interrupt) entries.  That would have a bit
or two of information even for 32-bit.  (We figured that limiting an
actual syscall # to 24 bits or so would be fine.)  But the utility of
distinguishing those paths (aside from 32 vs 64 and syscall vs not) is
purely theoretical, which is to say I don't think we even have any
contrived idea what you'd use it for.

> In this example, the problem for ptrace seems to be that TS_COMPAT ends up 
> being that "extra" bit of information that isn't visible in the register 
> set.

Agreed.  All else being equal, I would prefer to have this bit appear in
the regset layout somewhere.  Unfortunately I don't see how we can both
make userland implicitly win by saving and restoring it blindly, and be
ABI-compatible with existing userland that knows the orig_ax meaning today.

> But that's a bigger separate cleanup. In the meantime, I really get the 
> feeling that your patch is nasty, and could be replaced with something 
> like the following instead.. It just sets the TS_COMPAT bit if we use a 
> 32-bit interface to set the system call number to positive (ie "inside 
> system call").

I think I may have been tempted toward that but didn't really consider it.
The only real reason is that TS_* bits are "thread-synchronous" and I
thought there was no precedent for touching them on any task != current.
(Given I understand how ptrace is the special situation where it could be
safe.)  But, in fact, ptrace can already fiddle TS_USEDFPU.  So it's no
worse than that, anyway.  So it's better than bad, it's good!

> THE BELOW IS TOTALLY UNTESTED! I haven't really thought it through. I just 
> reacted very negatively to your patch, and my gut feel is that the code is 
> fundamentally doing something wrong. The below may not work, and may be 
> seriously broken and miss the point, but I think it conceptually comes 
> closer to how things _should_ work.

I agree.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 0/4] x86: clean up orig_ax handling
  2009-09-23  2:40   ` Roland McGrath
@ 2009-09-23  6:18     ` Roland McGrath
  2009-09-23  6:19       ` [PATCH 1/4] asm-generic: syscall_get_nr returns int Roland McGrath
                         ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  6:18 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Linus Torvalds, jan.kratochvil, Oleg Nesterov, x86, linux-kernel,
	Andrew Morton

Here is the aforementioned other tack on this.  

I said earlier that getreg() (i.e. 64-bit ptrace/core-dump fetches)
should sign-extend the low 32 bits of orig_ax up.  But I've changed my
mind.  It's true that today you can store 0xffffffff via either 64-bit
ptrace or 32-bit ptrace and then read back -1 via 64-bit ptrace.  (This
wasn't always so, and so we can hope that no debugger really depends on
it.)  What seems more important is that tracing and core dumps correctly
show the full orig_ax value incoming in %rax from userland, since
%rax=__NR_foo|(1UL<<32) behaves differently (i.e. -ENOSYS) than
%eax=_NR_foo in actual fact when user-mode does "syscall" with those
values.  In a bogon case like that, you would like to have traces/dumps
tell you why the task is not making a proper syscall rather than lie
about what register bits it entered the kernel with.

Patches 1-3 change no ptrace-tests outcomes, i.e. don't regress on the
test cases that went with the original sign-extension changes.  They
reintroduce e.g. the ability to blindly read and write back the whole
regset when at syscall-entry tracing with %rax=__NR_foo|(1UL<<32) and
have that fail with -ENOSYS as it would without tracing rather than
perturb the tracee to call sys_foo instead.  (Not that this is useful.)

Patch 4 does Linus's fix for the outstanding bug.  I've verified it works.


Thanks,
Roland
---
The following changes since commit 7fa07729e439a6184bd824746d06a49cca553f15:
  Linus Torvalds (1):
        Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland.git x86/orig_ax

Roland McGrath (4):
      asm-generic: syscall_get_nr returns int
      x86: syscall_get_nr returns int
      x86: ptrace: do not sign-extend orig_ax on write
      x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0

 arch/x86/include/asm/syscall.h |   14 +++++++-------
 arch/x86/kernel/ptrace.c       |   21 ++++++++-------------
 include/asm-generic/syscall.h  |    8 ++++++--
 3 files changed, 21 insertions(+), 22 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] asm-generic: syscall_get_nr returns int
  2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
@ 2009-09-23  6:19       ` Roland McGrath
  2009-09-23  6:20       ` [PATCH 2/4] x86: " Roland McGrath
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  6:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Linus Torvalds, jan.kratochvil, Oleg Nesterov, x86, linux-kernel,
	Andrew Morton

Only 32 bits of system call number are meaningful, so make the
specification for syscall_get_nr() be to return int, not long.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 include/asm-generic/syscall.h |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index ea8087b..5c122ae 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -1,7 +1,7 @@
 /*
  * Access to user system call parameters and results
  *
- * Copyright (C) 2008 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2008-2009 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -32,9 +32,13 @@ struct pt_regs;
  * If @task is not executing a system call, i.e. it's blocked
  * inside the kernel for a fault or signal, returns -1.
  *
+ * Note this returns int even on 64-bit machines.  Only 32 bits of
+ * system call number can be meaningful.  If the actual arch value
+ * is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
+ *
  * It's only valid to call this when @task is known to be blocked.
  */
-long syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
+int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
 
 /**
  * syscall_rollback - roll back registers after an aborted system call

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] x86: syscall_get_nr returns int
  2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
  2009-09-23  6:19       ` [PATCH 1/4] asm-generic: syscall_get_nr returns int Roland McGrath
@ 2009-09-23  6:20       ` Roland McGrath
  2009-09-23  6:20       ` [PATCH 3/4] x86: ptrace: do not sign-extend orig_ax on write Roland McGrath
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  6:20 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Linus Torvalds, jan.kratochvil, Oleg Nesterov, x86, linux-kernel,
	Andrew Morton

Make syscall_get_nr() return int, so we always sign-extend
the low 32 bits of orig_ax in checks.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 arch/x86/include/asm/syscall.h |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index d82f39b..8d33bc5 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -1,7 +1,7 @@
 /*
  * Access to user system call parameters and results
  *
- * Copyright (C) 2008 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2008-2009 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -16,13 +16,13 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 
-static inline long syscall_get_nr(struct task_struct *task,
-				  struct pt_regs *regs)
+/*
+ * Only the low 32 bits of orig_ax are meaningful, so we return int.
+ * This importantly ignores the high bits on 64-bit, so comparisons
+ * sign-extend the low 32 bits.
+ */
+static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
 {
-	/*
-	 * We always sign-extend a -1 value being set here,
-	 * so this is always either -1L or a syscall number.
-	 */
 	return regs->orig_ax;
 }
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] x86: ptrace: do not sign-extend orig_ax on write
  2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
  2009-09-23  6:19       ` [PATCH 1/4] asm-generic: syscall_get_nr returns int Roland McGrath
  2009-09-23  6:20       ` [PATCH 2/4] x86: " Roland McGrath
@ 2009-09-23  6:20       ` Roland McGrath
  2009-09-23  6:21       ` [PATCH 4/4] x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0 Roland McGrath
  2009-09-23 15:41       ` [PATCH 0/4] x86: clean up orig_ax handling Ingo Molnar
  4 siblings, 0 replies; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  6:20 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Linus Torvalds, jan.kratochvil, Oleg Nesterov, x86, linux-kernel,
	Andrew Morton

The high 32 bits of orig_ax will be ignored when it matters,
so don't fiddle them when setting it.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 arch/x86/kernel/ptrace.c |   19 +------------------
 1 files changed, 1 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 8d7d5c9..52222fa 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -325,16 +325,6 @@ static int putreg(struct task_struct *child,
 		return set_flags(child, value);
 
 #ifdef CONFIG_X86_64
-	/*
-	 * Orig_ax is really just a flag with small positive and
-	 * negative values, so make sure to always sign-extend it
-	 * from 32 bits so that it works correctly regardless of
-	 * whether we come from a 32-bit environment or not.
-	 */
-	case offsetof(struct user_regs_struct, orig_ax):
-		value = (long) (s32) value;
-		break;
-
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_OF(child))
 			return -EIO;
@@ -1121,17 +1111,10 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 	R32(esi, si);
 	R32(ebp, bp);
 	R32(eax, ax);
+	R32(orig_eax, orig_ax);
 	R32(eip, ip);
 	R32(esp, sp);
 
-	case offsetof(struct user32, regs.orig_eax):
-		/*
-		 * Sign-extend the value so that orig_eax = -1
-		 * causes (long)orig_ax < 0 tests to fire correctly.
-		 */
-		regs->orig_ax = (long) (s32) value;
-		break;
-
 	case offsetof(struct user32, regs.eflags):
 		return set_flags(child, value);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0
  2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
                         ` (2 preceding siblings ...)
  2009-09-23  6:20       ` [PATCH 3/4] x86: ptrace: do not sign-extend orig_ax on write Roland McGrath
@ 2009-09-23  6:21       ` Roland McGrath
  2009-09-23 15:41       ` [PATCH 0/4] x86: clean up orig_ax handling Ingo Molnar
  4 siblings, 0 replies; 11+ messages in thread
From: Roland McGrath @ 2009-09-23  6:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Linus Torvalds, jan.kratochvil, Oleg Nesterov, x86, linux-kernel,
	Andrew Morton

The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on
32-bit task) behaves differently than a native 32-bit kernel.  When
setting a register state of orig_eax>=0 and eax=-ERESTART* when the
debugged task is NOT on its way out of a 32-bit syscall, the task will
fail to do the syscall restart logic that it should do.

Test case available at http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys-trap.c?cvsroot=systemtap

This happens because the 32-bit ptrace syscall sets eax=0xffffffff
when it sets orig_eax>=0.  The resuming task will not sign-extend this
for the -ERESTART* check because TS_COMPAT is not set.  (So the task
thinks it is restarting after a 64-bit syscall, not a 32-bit one.)

The fix is to have 32-bit ptrace calls set TS_COMPAT when setting
orig_eax>=0.  This ensures that the 32-bit syscall restart logic
will apply when the child resumes.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 arch/x86/kernel/ptrace.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 52222fa..7b058a2 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1111,10 +1111,22 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 	R32(esi, si);
 	R32(ebp, bp);
 	R32(eax, ax);
-	R32(orig_eax, orig_ax);
 	R32(eip, ip);
 	R32(esp, sp);
 
+	case offsetof(struct user32, regs.orig_eax):
+		/*
+		 * A 32-bit debugger setting orig_eax means to restore
+		 * the state of the task restarting a 32-bit syscall.
+		 * Make sure we interpret the -ERESTART* codes correctly
+		 * in case the task is not actually still sitting at the
+		 * exit from a 32-bit syscall with TS_COMPAT still set.
+		 */
+		regs->orig_ax = value;
+		if (syscall_get_nr(child, regs) >= 0)
+			task_thread_info(child)->status |= TS_COMPAT;
+		break;
+
 	case offsetof(struct user32, regs.eflags):
 		return set_flags(child, value);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] x86: clean up orig_ax handling
  2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
                         ` (3 preceding siblings ...)
  2009-09-23  6:21       ` [PATCH 4/4] x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0 Roland McGrath
@ 2009-09-23 15:41       ` Ingo Molnar
  2009-09-23 15:53         ` Linus Torvalds
  4 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2009-09-23 15:41 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	jan.kratochvil, Oleg Nesterov, x86, linux-kernel, Andrew Morton


* Roland McGrath <roland@redhat.com> wrote:

> Here is the aforementioned other tack on this.  
> 
> I said earlier that getreg() (i.e. 64-bit ptrace/core-dump fetches)
> should sign-extend the low 32 bits of orig_ax up.  But I've changed my
> mind.  It's true that today you can store 0xffffffff via either 64-bit
> ptrace or 32-bit ptrace and then read back -1 via 64-bit ptrace.  (This
> wasn't always so, and so we can hope that no debugger really depends on
> it.)  What seems more important is that tracing and core dumps correctly
> show the full orig_ax value incoming in %rax from userland, since
> %rax=__NR_foo|(1UL<<32) behaves differently (i.e. -ENOSYS) than
> %eax=_NR_foo in actual fact when user-mode does "syscall" with those
> values.  In a bogon case like that, you would like to have traces/dumps
> tell you why the task is not making a proper syscall rather than lie
> about what register bits it entered the kernel with.
> 
> Patches 1-3 change no ptrace-tests outcomes, i.e. don't regress on the
> test cases that went with the original sign-extension changes.  They
> reintroduce e.g. the ability to blindly read and write back the whole
> regset when at syscall-entry tracing with %rax=__NR_foo|(1UL<<32) and
> have that fail with -ENOSYS as it would without tracing rather than
> perturb the tracee to call sys_foo instead.  (Not that this is useful.)
> 
> Patch 4 does Linus's fix for the outstanding bug.  I've verified it works.
> 
> 
> Thanks,
> Roland
> ---
> The following changes since commit 7fa07729e439a6184bd824746d06a49cca553f15:
>   Linus Torvalds (1):
>         Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland.git x86/orig_ax
> 
> Roland McGrath (4):
>       asm-generic: syscall_get_nr returns int
>       x86: syscall_get_nr returns int
>       x86: ptrace: do not sign-extend orig_ax on write
>       x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0
> 
>  arch/x86/include/asm/syscall.h |   14 +++++++-------
>  arch/x86/kernel/ptrace.c       |   21 ++++++++-------------
>  include/asm-generic/syscall.h  |    8 ++++++--
>  3 files changed, 21 insertions(+), 22 deletions(-)

Linus, this series looks good to me. Do you want to pull this directly 
or should we test this for a few days in the x86 tree? (either solution 
is fine to me)

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] x86: clean up orig_ax handling
  2009-09-23 15:41       ` [PATCH 0/4] x86: clean up orig_ax handling Ingo Molnar
@ 2009-09-23 15:53         ` Linus Torvalds
  2009-09-23 16:21           ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2009-09-23 15:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Roland McGrath, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	jan.kratochvil, Oleg Nesterov, x86, linux-kernel, Andrew Morton



On Wed, 23 Sep 2009, Ingo Molnar wrote:
> 
> Linus, this series looks good to me. Do you want to pull this directly 
> or should we test this for a few days in the x86 tree? (either solution 
> is fine to me)

I already pulled it, since I'm used to pulling ptrace fixes from Roland,

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] x86: clean up orig_ax handling
  2009-09-23 15:53         ` Linus Torvalds
@ 2009-09-23 16:21           ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2009-09-23 16:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roland McGrath, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	jan.kratochvil, Oleg Nesterov, x86, linux-kernel, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 23 Sep 2009, Ingo Molnar wrote:
> > 
> > Linus, this series looks good to me. Do you want to pull this 
> > directly or should we test this for a few days in the x86 tree? 
> > (either solution is fine to me)
> 
> I already pulled it, since I'm used to pulling ptrace fixes from 
> Roland,

Ok, thanks!

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-09-23 16:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-23  0:46 [PATCH] x86: ptrace: sign-extend eax with orig_eax>=0 Roland McGrath
2009-09-23  1:31 ` Linus Torvalds
2009-09-23  2:40   ` Roland McGrath
2009-09-23  6:18     ` [PATCH 0/4] x86: clean up orig_ax handling Roland McGrath
2009-09-23  6:19       ` [PATCH 1/4] asm-generic: syscall_get_nr returns int Roland McGrath
2009-09-23  6:20       ` [PATCH 2/4] x86: " Roland McGrath
2009-09-23  6:20       ` [PATCH 3/4] x86: ptrace: do not sign-extend orig_ax on write Roland McGrath
2009-09-23  6:21       ` [PATCH 4/4] x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0 Roland McGrath
2009-09-23 15:41       ` [PATCH 0/4] x86: clean up orig_ax handling Ingo Molnar
2009-09-23 15:53         ` Linus Torvalds
2009-09-23 16:21           ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.