linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
@ 2016-06-18 10:21 Andy Lutomirski
  2016-06-18 13:55 ` Pedro Alves
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-18 10:21 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Kees Cook, Borislav Petkov, Oleg Nesterov, Andy Lutomirski

A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.

- If the tracee is stopped in a 32-bit syscall, this has no effect
  as TS_COMPAT will already be set.

- If the tracee is stopped on entry to a 64-bit syscall, this could
  cause problems: in_compat_syscall() etc will be out of sync with
  the actual syscall table in use.  I can imagine this bre aking
  audit.  (It can't meaningfully break seccomp, as a malicious
  tracer can already fully bypass seccomp.)  I could also imagine it
  subtly confusing the implementations of a few syscalls.

 - If the tracee is stopped in a non-syscall context, then TS_COMPAT
   will end up set in user mode, which isn't supposed to happen.  The results
   are likely to be similar to the 64-bit syscall entry case.

Fix it by deleting the code.

Here's my understanding of the previous intent.  Suppose a
32-bit tracee makes a syscall that returns one of the -ERESTART
codes.  A 32-bit tracer saves away its register state.  The tracee
resumes, returns from the system call, and gets stopped again for a
non-syscall reason (e.g. a signal).  Then the tracer tries to roll
back the tracee's state by writing all of the saved registers back.

The result of this sequence of events will be that the tracee's
registers' low bits match what they were at the end of the syscall
but TS_COMPAT will be clear.  This will cause syscall_get_error() to
return a *positive* number, because we zero-extend registers poked
by 32-bit tracers instead of sign-extending them.  As a result, with
this change, syscall restart won't happen, whereas, historically, it
would have happened.

As far as I can tell, this corner case isn't very important, and I
also don't see how one would reasonably have triggered it in the
first place.  In particular, syscall restart happens before ptrace
hooks in the syscall exit path, so a tracer should never see the
problematic pre-restart syscall state in the first place.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Oleg, you often have good insight into ptrace oddities.  Do you think I'm
correct that this can safely be deleted?

Kees, I think that either this or a similar fix is important to make the
seccomp reordering series be fully effective.

Ingo, this is intended for x86/urgent.  I deliberately didn't cc stable,
as the bug this fixes seems minor enough that I think we can wait a while
to backport it.

arch/x86/kernel/ptrace.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 600edd225e81..bde57caf9aa9 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -922,16 +922,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 	R32(esp, sp);
 
 	case offsetof(struct user32, regs.orig_eax):
-		/*
-		 * A 32-bit debugger setting orig_eax means to restore
-		 * the state of the task restarting a 32-bit syscall.
-		 * Make sure we interpret the -ERESTART* codes correctly
-		 * in case the task is not actually still sitting at the
-		 * exit from a 32-bit syscall with TS_COMPAT still set.
-		 */
 		regs->orig_ax = value;
-		if (syscall_get_nr(child, regs) >= 0)
-			task_thread_info(child)->status |= TS_COMPAT;
 		break;
 
 	case offsetof(struct user32, regs.eflags):
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 10:21 [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace Andy Lutomirski
@ 2016-06-18 13:55 ` Pedro Alves
  2016-06-18 14:41   ` Pedro Alves
  2016-06-18 17:02   ` Andy Lutomirski
  2016-06-18 17:48 ` Kees Cook
  2016-06-19 21:19 ` Oleg Nesterov
  2 siblings, 2 replies; 19+ messages in thread
From: Pedro Alves @ 2016-06-18 13:55 UTC (permalink / raw)
  To: Andy Lutomirski, x86
  Cc: linux-kernel, Kees Cook, Borislav Petkov, Oleg Nesterov

On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
> A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
> 
> - If the tracee is stopped in a 32-bit syscall, this has no effect
>   as TS_COMPAT will already be set.
> 
> - If the tracee is stopped on entry to a 64-bit syscall, this could
>   cause problems: in_compat_syscall() etc will be out of sync with
>   the actual syscall table in use.  I can imagine this bre aking
>   audit.  (It can't meaningfully break seccomp, as a malicious
>   tracer can already fully bypass seccomp.)  I could also imagine it
>   subtly confusing the implementations of a few syscalls.
> 
>  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>    will end up set in user mode, which isn't supposed to happen.  The results
>    are likely to be similar to the 64-bit syscall entry case.
> 
> Fix it by deleting the code.
> 
> Here's my understanding of the previous intent.  Suppose a
> 32-bit tracee makes a syscall that returns one of the -ERESTART
> codes.  A 32-bit tracer saves away its register state.  The tracee
> resumes, returns from the system call, and gets stopped again for a
> non-syscall reason (e.g. a signal).  Then the tracer tries to roll
> back the tracee's state by writing all of the saved registers back.
> 
> The result of this sequence of events will be that the tracee's
> registers' low bits match what they were at the end of the syscall
> but TS_COMPAT will be clear.  This will cause syscall_get_error() to
> return a *positive* number, because we zero-extend registers poked
> by 32-bit tracers instead of sign-extending them.  As a result, with
> this change, syscall restart won't happen, whereas, historically, it
> would have happened.
> 
> As far as I can tell, this corner case isn't very important, and I

I believe it's actually very much very important for gdb, for restoring
the inferior state when the user calls a function in the inferior, with:

 (gdb) print foo()

Some background here:

 http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0

This hunk being mentioned in this thread a couple years ago too:

 https://www.sourceware.org/ml/gdb/2014-04/msg00095.html

Please don't break this use case ( and fix the one reported in
that thread :-) ).

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 13:55 ` Pedro Alves
@ 2016-06-18 14:41   ` Pedro Alves
  2016-06-18 17:02   ` Andy Lutomirski
  1 sibling, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-06-18 14:41 UTC (permalink / raw)
  To: Andy Lutomirski, x86
  Cc: linux-kernel, Kees Cook, Borislav Petkov, Oleg Nesterov

On 06/18/2016 02:55 PM, Pedro Alves wrote:

> This hunk being mentioned in this thread a couple years ago too:
> 
>  https://www.sourceware.org/ml/gdb/2014-04/msg00095.html
> 
> Please don't break this use case ( and fix the one reported in
> that thread :-) ).

BTW, there was a follow up v2 patch to that last url, here:

 [PATCH v2] Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB
 https://sourceware.org/ml/gdb/2014-05/msg00004.html

(for some reason, I can only find the ping on the lkml, not the
 original post, though it was apparently CCed.)

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 13:55 ` Pedro Alves
  2016-06-18 14:41   ` Pedro Alves
@ 2016-06-18 17:02   ` Andy Lutomirski
  2016-06-19 22:09     ` Andy Lutomirski
  2016-06-20 10:07     ` Pedro Alves
  1 sibling, 2 replies; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-18 17:02 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Oleg Nesterov, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds

On Jun 18, 2016 6:56 AM, "Pedro Alves" <pedro@palves.net> wrote:
>
> On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
> > A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
> >
> > - If the tracee is stopped in a 32-bit syscall, this has no effect
> >   as TS_COMPAT will already be set.
> >
> > - If the tracee is stopped on entry to a 64-bit syscall, this could
> >   cause problems: in_compat_syscall() etc will be out of sync with
> >   the actual syscall table in use.  I can imagine this bre aking
> >   audit.  (It can't meaningfully break seccomp, as a malicious
> >   tracer can already fully bypass seccomp.)  I could also imagine it
> >   subtly confusing the implementations of a few syscalls.
> >
> >  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
> >    will end up set in user mode, which isn't supposed to happen.  The results
> >    are likely to be similar to the 64-bit syscall entry case.
> >
> > Fix it by deleting the code.
> >
> > Here's my understanding of the previous intent.  Suppose a
> > 32-bit tracee makes a syscall that returns one of the -ERESTART
> > codes.  A 32-bit tracer saves away its register state.  The tracee
> > resumes, returns from the system call, and gets stopped again for a
> > non-syscall reason (e.g. a signal).  Then the tracer tries to roll
> > back the tracee's state by writing all of the saved registers back.
> >
> > The result of this sequence of events will be that the tracee's
> > registers' low bits match what they were at the end of the syscall
> > but TS_COMPAT will be clear.  This will cause syscall_get_error() to
> > return a *positive* number, because we zero-extend registers poked
> > by 32-bit tracers instead of sign-extending them.  As a result, with
> > this change, syscall restart won't happen, whereas, historically, it
> > would have happened.
> >
> > As far as I can tell, this corner case isn't very important, and I
>
> I believe it's actually very much very important for gdb, for restoring
> the inferior state when the user calls a function in the inferior, with:
>
>  (gdb) print foo()
>
> Some background here:
>
>  http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0

Yuck.  I should have dug in to the history.  Why not just
unconditionally sign-extend eax when set by a 32-bit tracer?

Do you know how to acquire a copy of erestartsys-trap.c?  The old
links appear to be broken.

Also, while I have your attention: when gdb restores old state like
this, does it do it with individual calls to PTRACE_POKEUSER or does
it use SETREGSET or similar to do it all at once?  I'm asking because
I have some other code (fsgsbase) that's on hold until I can figure
out how to keep it from breaking gdb if and when gdb writes to fs and
fs_base.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 10:21 [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace Andy Lutomirski
  2016-06-18 13:55 ` Pedro Alves
@ 2016-06-18 17:48 ` Kees Cook
  2016-06-19 21:19 ` Oleg Nesterov
  2 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2016-06-18 17:48 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, linux-kernel, Borislav Petkov, Oleg Nesterov

On Sat, Jun 18, 2016 at 3:21 AM, Andy Lutomirski <luto@kernel.org> wrote:
> A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
>
> - If the tracee is stopped in a 32-bit syscall, this has no effect
>   as TS_COMPAT will already be set.
>
> - If the tracee is stopped on entry to a 64-bit syscall, this could
>   cause problems: in_compat_syscall() etc will be out of sync with
>   the actual syscall table in use.  I can imagine this bre aking
>   audit.  (It can't meaningfully break seccomp, as a malicious
>   tracer can already fully bypass seccomp.)  I could also imagine it
>   subtly confusing the implementations of a few syscalls.
>
>  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>    will end up set in user mode, which isn't supposed to happen.  The results
>    are likely to be similar to the 64-bit syscall entry case.
>
> Fix it by deleting the code.
>
> Here's my understanding of the previous intent.  Suppose a
> 32-bit tracee makes a syscall that returns one of the -ERESTART
> codes.  A 32-bit tracer saves away its register state.  The tracee
> resumes, returns from the system call, and gets stopped again for a
> non-syscall reason (e.g. a signal).  Then the tracer tries to roll
> back the tracee's state by writing all of the saved registers back.
>
> The result of this sequence of events will be that the tracee's
> registers' low bits match what they were at the end of the syscall
> but TS_COMPAT will be clear.  This will cause syscall_get_error() to
> return a *positive* number, because we zero-extend registers poked
> by 32-bit tracers instead of sign-extending them.  As a result, with
> this change, syscall restart won't happen, whereas, historically, it
> would have happened.
>
> As far as I can tell, this corner case isn't very important, and I
> also don't see how one would reasonably have triggered it in the
> first place.  In particular, syscall restart happens before ptrace
> hooks in the syscall exit path, so a tracer should never see the
> problematic pre-restart syscall state in the first place.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>
> Oleg, you often have good insight into ptrace oddities.  Do you think I'm
> correct that this can safely be deleted?
>
> Kees, I think that either this or a similar fix is important to make the
> seccomp reordering series be fully effective.

For good measure, just to repeat what I said in the other thread:
yeah, this TS_COMPAT injection needs to be fixed, though I don't view
it as critical for seccomp since the major bypass situation will be
fixed already. The TS_COMPAT interaction isn't bad (since the arch
state is saved before the seccomp calls), but it can confuse a filter
intentionally trying to hit this bug via RET_SECCOMP_TRACE, but then
it would only be a problem for a filter that was either ignoring arch
tests or doing biarch filtering, which is extremely uncommon.

-Kees

>
> Ingo, this is intended for x86/urgent.  I deliberately didn't cc stable,
> as the bug this fixes seems minor enough that I think we can wait a while
> to backport it.
>
> arch/x86/kernel/ptrace.c | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 600edd225e81..bde57caf9aa9 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -922,16 +922,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
>         R32(esp, sp);
>
>         case offsetof(struct user32, regs.orig_eax):
> -               /*
> -                * A 32-bit debugger setting orig_eax means to restore
> -                * the state of the task restarting a 32-bit syscall.
> -                * Make sure we interpret the -ERESTART* codes correctly
> -                * in case the task is not actually still sitting at the
> -                * exit from a 32-bit syscall with TS_COMPAT still set.
> -                */
>                 regs->orig_ax = value;
> -               if (syscall_get_nr(child, regs) >= 0)
> -                       task_thread_info(child)->status |= TS_COMPAT;
>                 break;
>
>         case offsetof(struct user32, regs.eflags):
> --
> 2.7.4
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 10:21 [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace Andy Lutomirski
  2016-06-18 13:55 ` Pedro Alves
  2016-06-18 17:48 ` Kees Cook
@ 2016-06-19 21:19 ` Oleg Nesterov
  2016-06-19 22:23   ` Andy Lutomirski
  2016-06-20  6:12   ` Andy Lutomirski
  2 siblings, 2 replies; 19+ messages in thread
From: Oleg Nesterov @ 2016-06-19 21:19 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, linux-kernel, Kees Cook, Borislav Petkov

Let me first thank Pedro who has already replied!

And I have to admit I will need to re-read his explanations after
sleep to (try to) convince myself I fully understans the problems ;)
Too late for me.

Right now I have nothing to add, but

On 06/18, Andy Lutomirski wrote:
>
> @@ -922,16 +922,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
>  	R32(esp, sp);
>  
>  	case offsetof(struct user32, regs.orig_eax):
> -		/*
> -		 * A 32-bit debugger setting orig_eax means to restore
> -		 * the state of the task restarting a 32-bit syscall.
> -		 * Make sure we interpret the -ERESTART* codes correctly
> -		 * in case the task is not actually still sitting at the
> -		 * exit from a 32-bit syscall with TS_COMPAT still set.
> -		 */
>  		regs->orig_ax = value;
> -		if (syscall_get_nr(child, regs) >= 0)
> -			task_thread_info(child)->status |= TS_COMPAT;

I agree it would be nice to remove this code, but then it is not clear
how/when we should sign-extend regs->ax..

And this leads to another question, why do we actually need to set/clear
TS_COMPAT in set_personality_ia32() ??

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 17:02   ` Andy Lutomirski
@ 2016-06-19 22:09     ` Andy Lutomirski
  2016-06-20 10:27       ` Pedro Alves
  2016-06-20 15:24       ` Oleg Nesterov
  2016-06-20 10:07     ` Pedro Alves
  1 sibling, 2 replies; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-19 22:09 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Oleg Nesterov, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds

On Sat, Jun 18, 2016 at 10:02 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Jun 18, 2016 6:56 AM, "Pedro Alves" <pedro@palves.net> wrote:
>>
>> On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
>> > A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
>> >
>> > - If the tracee is stopped in a 32-bit syscall, this has no effect
>> >   as TS_COMPAT will already be set.
>> >
>> > - If the tracee is stopped on entry to a 64-bit syscall, this could
>> >   cause problems: in_compat_syscall() etc will be out of sync with
>> >   the actual syscall table in use.  I can imagine this bre aking
>> >   audit.  (It can't meaningfully break seccomp, as a malicious
>> >   tracer can already fully bypass seccomp.)  I could also imagine it
>> >   subtly confusing the implementations of a few syscalls.
>> >
>> >  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>> >    will end up set in user mode, which isn't supposed to happen.  The results
>> >    are likely to be similar to the 64-bit syscall entry case.
>> >
>> > Fix it by deleting the code.
>> >
>> > Here's my understanding of the previous intent.  Suppose a
>> > 32-bit tracee makes a syscall that returns one of the -ERESTART
>> > codes.  A 32-bit tracer saves away its register state.  The tracee
>> > resumes, returns from the system call, and gets stopped again for a
>> > non-syscall reason (e.g. a signal).  Then the tracer tries to roll
>> > back the tracee's state by writing all of the saved registers back.
>> >
>> > The result of this sequence of events will be that the tracee's
>> > registers' low bits match what they were at the end of the syscall
>> > but TS_COMPAT will be clear.  This will cause syscall_get_error() to
>> > return a *positive* number, because we zero-extend registers poked
>> > by 32-bit tracers instead of sign-extending them.  As a result, with
>> > this change, syscall restart won't happen, whereas, historically, it
>> > would have happened.
>> >
>> > As far as I can tell, this corner case isn't very important, and I
>>
>> I believe it's actually very much very important for gdb, for restoring
>> the inferior state when the user calls a function in the inferior, with:
>>
>>  (gdb) print foo()
>>
>> Some background here:
>>
>>  http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0
>
> Yuck.  I should have dug in to the history.  Why not just
> unconditionally sign-extend eax when set by a 32-bit tracer?
>
> Do you know how to acquire a copy of erestartsys-trap.c?  The old
> links appear to be broken.
>
> Also, while I have your attention: when gdb restores old state like
> this, does it do it with individual calls to PTRACE_POKEUSER or does
> it use SETREGSET or similar to do it all at once?  I'm asking because
> I have some other code (fsgsbase) that's on hold until I can figure
> out how to keep it from breaking gdb if and when gdb writes to fs and
> fs_base.

OK, I studied this a bit.  What a mess!  Here's what's going on AFAICT:

1. For reasons that probably made sense, the kernel delivers signals
and triggers ptrace before handling syscall restart.  This means that
-ERESTART_RESTARTBLOCK, etc is visible to userspace.  We could
plausibly get away with changing that, but it seems quite risky.

2. As a result of (1), gdb (quite reasonably) expects that it can
snapshot user state on signal delivery, adjust regs to call a
function, and then restore user state.

3. Presumably as a result of (2), we do syscall restart if indicated
by the register state on ptrace resume even if we're *not* resuming a
syscall.

4. Also as a result of (1), gdb expects that writing -1 to orig_eax
via POKEUSER or similar will *disable* syscall restart, which is
necessary to get function calling on syscall exit to work.

The combination of (1) and (4) means that we need to skip syscall
restart if orig_eax == -1 (in a 32-bit signed sense).  The combination
of (1) and (2) means that we need to enable syscall restart if
orig_eax > 0 (in a 32-bit signed sense) and eax contains a
-ERESTARTxyz code (again in a signed sense).  And, for extra fun, we
need to make sure that, if we set __NR_restart_syscall, we set the
correct value based on the syscall that we're restarting.

The latter bit is a mess and is probably broken on current kernels for
64-bit gdb attached to a 32-bit process.  (Is it?  All of this stuff
is a bit of a pain to test.)

Here's my proposal:

Step 1: for 4.7 and for -stable, introduce TS_I386_REGS_POKED.  Set it
in putreg32.  Use it in syscall_get_error, get_nr_restart_syscall,
etc.  Clear it in do_signal.  This gets rid of the arch confusion
while (hopefully) preserving current behavior.

Step 2: Optionally, for 4.8, consider cleaning up the whole mess.
First, change putreg32 to sign-extend orig_eax and eax and just get
rid of the TS_COMPAT checks in syscall_get_nr and syscall_get_error.
Then we either change the 64-bit -ERESTART_BLOCK to a different number
or encode the desired restart bitness in restart_block or similar.

I wonder if we could actually get away with doing syscall restart
processing before ptrace invocation.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-19 21:19 ` Oleg Nesterov
@ 2016-06-19 22:23   ` Andy Lutomirski
  2016-06-20  6:12   ` Andy Lutomirski
  1 sibling, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-19 22:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Kees Cook, Borislav Petkov

On Sun, Jun 19, 2016 at 2:19 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> Let me first thank Pedro who has already replied!
>
> And I have to admit I will need to re-read his explanations after
> sleep to (try to) convince myself I fully understans the problems ;)
> Too late for me.
>
> Right now I have nothing to add, but
>
> On 06/18, Andy Lutomirski wrote:
>>
>> @@ -922,16 +922,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
>>       R32(esp, sp);
>>
>>       case offsetof(struct user32, regs.orig_eax):
>> -             /*
>> -              * A 32-bit debugger setting orig_eax means to restore
>> -              * the state of the task restarting a 32-bit syscall.
>> -              * Make sure we interpret the -ERESTART* codes correctly
>> -              * in case the task is not actually still sitting at the
>> -              * exit from a 32-bit syscall with TS_COMPAT still set.
>> -              */
>>               regs->orig_ax = value;
>> -             if (syscall_get_nr(child, regs) >= 0)
>> -                     task_thread_info(child)->status |= TS_COMPAT;
>
> I agree it would be nice to remove this code, but then it is not clear
> how/when we should sign-extend regs->ax..
>
> And this leads to another question, why do we actually need to set/clear
> TS_COMPAT in set_personality_ia32() ??

I have no idea.  Legacy junk?  Maybe so audit sees execution of a
64-bit task as a 64-bit sys_execve return even if the task was 32-bit
before execve?

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-19 21:19 ` Oleg Nesterov
  2016-06-19 22:23   ` Andy Lutomirski
@ 2016-06-20  6:12   ` Andy Lutomirski
  2016-06-20 15:31     ` Oleg Nesterov
  2016-06-20 17:53     ` the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent Oleg Nesterov
  1 sibling, 2 replies; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-20  6:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Kees Cook, Borislav Petkov

On Sun, Jun 19, 2016 at 2:19 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> Let me first thank Pedro who has already replied!
>
> And I have to admit I will need to re-read his explanations after
> sleep to (try to) convince myself I fully understans the problems ;)
> Too late for me.
>
> Right now I have nothing to add, but
>
> On 06/18, Andy Lutomirski wrote:
>>
>> @@ -922,16 +922,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
>>       R32(esp, sp);
>>
>>       case offsetof(struct user32, regs.orig_eax):
>> -             /*
>> -              * A 32-bit debugger setting orig_eax means to restore
>> -              * the state of the task restarting a 32-bit syscall.
>> -              * Make sure we interpret the -ERESTART* codes correctly
>> -              * in case the task is not actually still sitting at the
>> -              * exit from a 32-bit syscall with TS_COMPAT still set.
>> -              */
>>               regs->orig_ax = value;
>> -             if (syscall_get_nr(child, regs) >= 0)
>> -                     task_thread_info(child)->status |= TS_COMPAT;
>
> I agree it would be nice to remove this code, but then it is not clear
> how/when we should sign-extend regs->ax..
>
> And this leads to another question, why do we actually need to set/clear
> TS_COMPAT in set_personality_ia32() ??

Something's clearly buggy there, considering that
set_personality_64bit() does *not* clear it.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-18 17:02   ` Andy Lutomirski
  2016-06-19 22:09     ` Andy Lutomirski
@ 2016-06-20 10:07     ` Pedro Alves
  2016-06-20 11:12       ` Jan Kratochvil
  1 sibling, 1 reply; 19+ messages in thread
From: Pedro Alves @ 2016-06-20 10:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds, Jan Kratochvil, Pedro Alves

On 06/18/2016 06:02 PM, Andy Lutomirski wrote:

> Yuck.  I should have dug in to the history.  Why not just
> unconditionally sign-extend eax when set by a 32-bit tracer?

No idea.

> 
> Do you know how to acquire a copy of erestartsys-trap.c?  The old
> links appear to be broken.

That's part of the ptrace testsuite project, still in cvs, though the url changed:

 $ https://sourceware.org/systemtap/wiki/utrace/tests

 $ cvs -d :pserver:anoncvs:anoncvs@sourceware.org:/cvs/systemtap co ptrace-tests

Can't seem to find a cvsweb interface for that.

I think it'd be great to move these to the selftests infrastructure
directly in the kernel tree.  However, nobody's has ever managed to
find energy for that.

> 
> Also, while I have your attention: when gdb restores old state like
> this, does it do it with individual calls to PTRACE_POKEUSER or does
> it use SETREGSET or similar to do it all at once?  I'm asking because
> I have some other code (fsgsbase) that's on hold until I can figure
> out how to keep it from breaking gdb if and when gdb writes to fs and
> fs_base.
> 

It depends on which register you're accessing, and on kernel version.
But on a recent kernel, it should be using PTRACE_SETREGS / PTRACE_SETREGSET,
thus storing a whole register set in one go.  (And it's likely we could
get rid of PTRACE_POKE fallback paths by now.)
To write to the debug registers (dr0-dr7), PTRACE_POKEUSER is always used.

There's code that coordinates with glibc's libthread_db.so that ends
up _reading_ fs_base/gs_base, and gdb uses PTRACE_PEEKUSER for that,
though there's a pending series that changes it, exposing fs_base/gs_base
as just another register in gdb's register cache:

 https://www.sourceware.org/ml/gdb-patches/2015-11/msg00076.html
 https://www.sourceware.org/ml/gdb-patches/2015-11/msg00077.html

Guess that makes fs_base/gs_base user-writable, if the kernel allows it.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-19 22:09     ` Andy Lutomirski
@ 2016-06-20 10:27       ` Pedro Alves
  2016-06-20 15:24       ` Oleg Nesterov
  1 sibling, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-06-20 10:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds, Pedro Alves

On 06/19/2016 11:09 PM, Andy Lutomirski wrote:
> 
> The latter bit is a mess and is probably broken on current kernels for
> 64-bit gdb attached to a 32-bit process.  (Is it?  All of this stuff
> is a bit of a pain to test.)

The testcase at:

 https://sourceware.org/ml/gdb/2014-05/msg00004.html

still fails for me on Fedora 23 with git master gdb.

Nevermind the misleading URL, that's a kernel patch.

 $ gcc -g -m32 interrupt.c -o interrupt.32
 ...
 (gdb) r
 Starting program: /home/pedro/tmp/interrupt.32 
 talk to me baby
 ^C
 Program received signal SIGINT, Interrupt.
 0xf7fd9d49 in __kernel_vsyscall ()
 (gdb) p func1()
 $1 = 4
 (gdb) c
 Continuing.
 Unknown error 512
 [Inferior 1 (process 20252) exited with code 01]
 (gdb) 

That was a 64-bit gdb.

Note it doesn't fail with fedora 23's gdb, because of a 
fedora-local workaround.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-20 10:07     ` Pedro Alves
@ 2016-06-20 11:12       ` Jan Kratochvil
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kratochvil @ 2016-06-20 11:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds, Pedro Alves, Roland McGrath

On Mon, 20 Jun 2016 12:07:56 +0200, Pedro Alves wrote:
> On 06/18/2016 06:02 PM, Andy Lutomirski wrote:
> > Yuck.  I should have dug in to the history.  Why not just
> > unconditionally sign-extend eax when set by a 32-bit tracer?
> 
> No idea.

Roland McGrath knows why he wrote it that way, Cced.

What if eax contains an address in 2GB..4GB range?  I guess currently the
orig_eax check tries to verify %eax cannot contain an address.


Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-19 22:09     ` Andy Lutomirski
  2016-06-20 10:27       ` Pedro Alves
@ 2016-06-20 15:24       ` Oleg Nesterov
  2016-06-20 16:30         ` Andy Lutomirski
  1 sibling, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2016-06-20 15:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pedro Alves, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds

On 06/19, Andy Lutomirski wrote:
>
> On Sat, Jun 18, 2016 at 10:02 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Step 1: for 4.7 and for -stable, introduce TS_I386_REGS_POKED.  Set it
> in putreg32.  Use it in syscall_get_error, get_nr_restart_syscall,
> etc.  Clear it in do_signal.

do_signal() won't be necessarily called...

> I wonder if we could actually get away with doing syscall restart
> processing before ptrace invocation.

How? this doesn't look possible or I misunderstood.

How about the simple change below for now? IIRC 32-bit task can't use
"syscall" so if syscall_get_nr() >= 0 then even the wrong TS_COMPAT is
not that bad, even if it "leaks" to user-mode.

nobody should use, say, in_ia32_syscall() unless we know that "in syscall"
is actually true. Hmm, arch/x86/kernel/uprobes.c does and this is wrong
regardless, I'll send the fix.

Oleg.

--- x/arch/x86/kernel/ptrace.c
+++ x/arch/x86/kernel/ptrace.c
@@ -930,7 +930,7 @@ static int putreg32(struct task_struct *
 		 * exit from a 32-bit syscall with TS_COMPAT still set.
 		 */
 		regs->orig_ax = value;
-		if (syscall_get_nr(child, regs) >= 0)
+		if (syscall_get_nr(child, regs) >= 0 && !user_64bit_mode(regs))
 			task_thread_info(child)->status |= TS_COMPAT;
 		break;
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-20  6:12   ` Andy Lutomirski
@ 2016-06-20 15:31     ` Oleg Nesterov
  2016-06-20 17:53     ` the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent Oleg Nesterov
  1 sibling, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2016-06-20 15:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Kees Cook, Borislav Petkov

On 06/19, Andy Lutomirski wrote:
>
> On Sun, Jun 19, 2016 at 2:19 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > And this leads to another question, why do we actually need to set/clear
> > TS_COMPAT in set_personality_ia32() ??
>
> Something's clearly buggy there, considering that
> set_personality_64bit() does *not* clear it.

Yes, yes, I too noticed this, and this doesn't match the "if (x32)"
branch in set_personality_ia32().

But I think we do not really need to clear this bit. And probably
set_personality_ia32() doesn't need to play with TS_COMPAT.

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-20 16:30         ` Andy Lutomirski
@ 2016-06-20 16:14           ` Oleg Nesterov
  2016-06-20 17:25             ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2016-06-20 16:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pedro Alves, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds

On 06/20, Andy Lutomirski wrote:
>
> On Mon, Jun 20, 2016 at 8:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > How about the simple change below for now? IIRC 32-bit task can't use
> > "syscall" so if syscall_get_nr() >= 0 then even the wrong TS_COMPAT is
> > not that bad, even if it "leaks" to user-mode.
>
> Hmm.  That should fix the minor security issue, but it will even
> further break cross-arch tracing: now a 32-bit tracer tracing a 64-bit
> task that does int $0x80 will malfunction even more than it would
> have.

This is broken in any case. I mean, a 32-bit debugger can't really
debug a 64-bit task.

I don't think this change makes the things really worse.

> Also, it relies on bizarre arch details IMO.

Heh,  it looks as if your patch do not ;)

> I think I prefer my version, coming momentarily.

I disagree... I don't really understand why do we need the additional
complications for the minimal fix which doesn't look very nice anyway.

But I won't argue, and your patch looks correct to me.

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-20 15:24       ` Oleg Nesterov
@ 2016-06-20 16:30         ` Andy Lutomirski
  2016-06-20 16:14           ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-20 16:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pedro Alves, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds

On Mon, Jun 20, 2016 at 8:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/19, Andy Lutomirski wrote:
>>
>> On Sat, Jun 18, 2016 at 10:02 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> Step 1: for 4.7 and for -stable, introduce TS_I386_REGS_POKED.  Set it
>> in putreg32.  Use it in syscall_get_error, get_nr_restart_syscall,
>> etc.  Clear it in do_signal.
>
> do_signal() won't be necessarily called...

True.  But I should have said "clear it in prepare_exit_to_usermode",
and the patch I'm just about to send does that.
>
>> I wonder if we could actually get away with doing syscall restart
>> processing before ptrace invocation.
>
> How? this doesn't look possible or I misunderstood.
>
> How about the simple change below for now? IIRC 32-bit task can't use
> "syscall" so if syscall_get_nr() >= 0 then even the wrong TS_COMPAT is
> not that bad, even if it "leaks" to user-mode.

Hmm.  That should fix the minor security issue, but it will even
further break cross-arch tracing: now a 32-bit tracer tracing a 64-bit
task that does int $0x80 will malfunction even more than it would
have.  Also, it relies on bizarre arch details IMO.

I think I prefer my version, coming momentarily.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
  2016-06-20 16:14           ` Oleg Nesterov
@ 2016-06-20 17:25             ` Andy Lutomirski
  0 siblings, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2016-06-20 17:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pedro Alves, Kees Cook, Borislav Petkov, linux-kernel, X86 ML,
	Linus Torvalds

On Mon, Jun 20, 2016 at 9:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/20, Andy Lutomirski wrote:
>>
>> On Mon, Jun 20, 2016 at 8:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >
>> > How about the simple change below for now? IIRC 32-bit task can't use
>> > "syscall" so if syscall_get_nr() >= 0 then even the wrong TS_COMPAT is
>> > not that bad, even if it "leaks" to user-mode.
>>
>> Hmm.  That should fix the minor security issue, but it will even
>> further break cross-arch tracing: now a 32-bit tracer tracing a 64-bit
>> task that does int $0x80 will malfunction even more than it would
>> have.
>
> This is broken in any case. I mean, a 32-bit debugger can't really
> debug a 64-bit task.
>
> I don't think this change makes the things really worse.
>
>> Also, it relies on bizarre arch details IMO.
>
> Heh,  it looks as if your patch do not ;)
>
>> I think I prefer my version, coming momentarily.
>
> I disagree... I don't really understand why do we need the additional
> complications for the minimal fix which doesn't look very nice anyway.
>
> But I won't argue, and your patch looks correct to me.

Part of the reason is that I actually want to fix this in two parts.
After my TS_REGS_POKED_I386 patch, I have a follow-up patch that I
want to polish and test a bit more that's intended to get the
64-bit-tracer/32-bit-tracee case right, but it probably needs extra
testing, is less likely to make 4.8, and is definitely not for
4.7/stab.e.  That patch deletes TS_REGS_POKED_I386 again.  So by
having the minimal fix be more mindless, I'm more comfortable with the
backport.  I'll send out the second patch soon.


--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent
  2016-06-20  6:12   ` Andy Lutomirski
  2016-06-20 15:31     ` Oleg Nesterov
@ 2016-06-20 17:53     ` Oleg Nesterov
  2016-06-21 19:01       ` Kees Cook
  1 sibling, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2016-06-20 17:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Kees Cook, Borislav Petkov

On 06/19, Andy Lutomirski wrote:
>
> Something's clearly buggy there,

The usage of __X32_SYSCALL_BIT doesn't look right too. Nothing serious
but still.

Damn, initially I thought I have found the serious bug in entry_64.S
and it took me some time to understand why my exploit doesn't work ;)
So I learned that

	andl    $__SYSCALL_MASK, %eax

in entry_SYSCALL_64_fastpath() zero-extends %rax and thus

	cmpl    $__NR_syscall_max, %eax
	...
	call    *sys_call_table(, %rax, 8)

is correct (rax <= __NR_syscall_max).

OK, so entry_64.S simply "ignores" the upper bits if CONFIG_X86_X32_ABI.
Fine, but this doesn't match the

	if (likely((nr & __SYSCALL_MASK) < NR_syscalls))

check in do_syscall_64(). So this test-case

	#include <stdio.h>

	int main(void)
	{
		// __NR_exit == 0x3c
		asm volatile ("movq $0xFFFFFFFF0000003c, %rax; syscall");

		printf("I didn't exit because I am traced\n");

		return 0;
	}

silently exits if not traced, otherwise it calls printf().

Should we do something or we do not care?

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent
  2016-06-20 17:53     ` the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent Oleg Nesterov
@ 2016-06-21 19:01       ` Kees Cook
  0 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2016-06-21 19:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel, Borislav Petkov

On Mon, Jun 20, 2016 at 10:53 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/19, Andy Lutomirski wrote:
>>
>> Something's clearly buggy there,
>
> The usage of __X32_SYSCALL_BIT doesn't look right too. Nothing serious
> but still.
>
> Damn, initially I thought I have found the serious bug in entry_64.S
> and it took me some time to understand why my exploit doesn't work ;)
> So I learned that
>
>         andl    $__SYSCALL_MASK, %eax
>
> in entry_SYSCALL_64_fastpath() zero-extends %rax and thus
>
>         cmpl    $__NR_syscall_max, %eax
>         ...
>         call    *sys_call_table(, %rax, 8)
>
> is correct (rax <= __NR_syscall_max).
>
> OK, so entry_64.S simply "ignores" the upper bits if CONFIG_X86_X32_ABI.
> Fine, but this doesn't match the
>
>         if (likely((nr & __SYSCALL_MASK) < NR_syscalls))
>
> check in do_syscall_64(). So this test-case
>
>         #include <stdio.h>
>
>         int main(void)
>         {
>                 // __NR_exit == 0x3c
>                 asm volatile ("movq $0xFFFFFFFF0000003c, %rax; syscall");
>
>                 printf("I didn't exit because I am traced\n");
>
>                 return 0;
>         }
>
> silently exits if not traced, otherwise it calls printf().
>
> Should we do something or we do not care?

The slow path has seccomp, so there's no filter bypass with this. I
think it should get corrected, just for proper behavior, but it
currently looks harmless. It does, technically, double the attack
surface for userspace ROPish attacks since now the top half of the
register can be F instead of 0, but that's probably not a very big
deal.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-06-21 19:01 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-18 10:21 [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace Andy Lutomirski
2016-06-18 13:55 ` Pedro Alves
2016-06-18 14:41   ` Pedro Alves
2016-06-18 17:02   ` Andy Lutomirski
2016-06-19 22:09     ` Andy Lutomirski
2016-06-20 10:27       ` Pedro Alves
2016-06-20 15:24       ` Oleg Nesterov
2016-06-20 16:30         ` Andy Lutomirski
2016-06-20 16:14           ` Oleg Nesterov
2016-06-20 17:25             ` Andy Lutomirski
2016-06-20 10:07     ` Pedro Alves
2016-06-20 11:12       ` Jan Kratochvil
2016-06-18 17:48 ` Kees Cook
2016-06-19 21:19 ` Oleg Nesterov
2016-06-19 22:23   ` Andy Lutomirski
2016-06-20  6:12   ` Andy Lutomirski
2016-06-20 15:31     ` Oleg Nesterov
2016-06-20 17:53     ` the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent Oleg Nesterov
2016-06-21 19:01       ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).