From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751785AbcFSWKV (ORCPT <rfc822;w@1wt.eu>);
	Sun, 19 Jun 2016 18:10:21 -0400
Received: from mail-vk0-f41.google.com ([209.85.213.41]:36395 "EHLO
	mail-vk0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751364AbcFSWKM (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Sun, 19 Jun 2016 18:10:12 -0400
MIME-Version: 1.0
In-Reply-To: <CALCETrW9guyn4-SqKaPPvcr7cQ5DskwVSKu+WZhMkG5NZHciAg@mail.gmail.com>
References: <94bda8cd5f326ae5591c80fb5d7c1c22624accec.1466244711.git.luto@kernel.org>
 <b9f316ca-5f79-91f5-1d19-773679778357@palves.net> <CALCETrW9guyn4-SqKaPPvcr7cQ5DskwVSKu+WZhMkG5NZHciAg@mail.gmail.com>
From: Andy Lutomirski <luto@amacapital.net>
Date: Sun, 19 Jun 2016 15:09:41 -0700
Message-ID: <CALCETrUBSGtvbnokoFWMW_CNmz-9R+ry85hd3NGkH8vMQtUdjw@mail.gmail.com>
Subject: Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace
To: Pedro Alves <pedro@palves.net>
Cc: Oleg Nesterov <oleg@redhat.com>, Kees Cook <keescook@chromium.org>,
        Borislav Petkov <bp@alien8.de>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        X86 ML <x86@kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jun 18, 2016 at 10:02 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Jun 18, 2016 6:56 AM, "Pedro Alves" <pedro@palves.net> wrote:
>>
>> On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
>> > A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
>> >
>> > - If the tracee is stopped in a 32-bit syscall, this has no effect
>> >   as TS_COMPAT will already be set.
>> >
>> > - If the tracee is stopped on entry to a 64-bit syscall, this could
>> >   cause problems: in_compat_syscall() etc will be out of sync with
>> >   the actual syscall table in use.  I can imagine this bre aking
>> >   audit.  (It can't meaningfully break seccomp, as a malicious
>> >   tracer can already fully bypass seccomp.)  I could also imagine it
>> >   subtly confusing the implementations of a few syscalls.
>> >
>> >  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>> >    will end up set in user mode, which isn't supposed to happen.  The results
>> >    are likely to be similar to the 64-bit syscall entry case.
>> >
>> > Fix it by deleting the code.
>> >
>> > Here's my understanding of the previous intent.  Suppose a
>> > 32-bit tracee makes a syscall that returns one of the -ERESTART
>> > codes.  A 32-bit tracer saves away its register state.  The tracee
>> > resumes, returns from the system call, and gets stopped again for a
>> > non-syscall reason (e.g. a signal).  Then the tracer tries to roll
>> > back the tracee's state by writing all of the saved registers back.
>> >
>> > The result of this sequence of events will be that the tracee's
>> > registers' low bits match what they were at the end of the syscall
>> > but TS_COMPAT will be clear.  This will cause syscall_get_error() to
>> > return a *positive* number, because we zero-extend registers poked
>> > by 32-bit tracers instead of sign-extending them.  As a result, with
>> > this change, syscall restart won't happen, whereas, historically, it
>> > would have happened.
>> >
>> > As far as I can tell, this corner case isn't very important, and I
>>
>> I believe it's actually very much very important for gdb, for restoring
>> the inferior state when the user calls a function in the inferior, with:
>>
>>  (gdb) print foo()
>>
>> Some background here:
>>
>>  http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0
>
> Yuck.  I should have dug in to the history.  Why not just
> unconditionally sign-extend eax when set by a 32-bit tracer?
>
> Do you know how to acquire a copy of erestartsys-trap.c?  The old
> links appear to be broken.
>
> Also, while I have your attention: when gdb restores old state like
> this, does it do it with individual calls to PTRACE_POKEUSER or does
> it use SETREGSET or similar to do it all at once?  I'm asking because
> I have some other code (fsgsbase) that's on hold until I can figure
> out how to keep it from breaking gdb if and when gdb writes to fs and
> fs_base.

OK, I studied this a bit.  What a mess!  Here's what's going on AFAICT:

1. For reasons that probably made sense, the kernel delivers signals
and triggers ptrace before handling syscall restart.  This means that
-ERESTART_RESTARTBLOCK, etc is visible to userspace.  We could
plausibly get away with changing that, but it seems quite risky.

2. As a result of (1), gdb (quite reasonably) expects that it can
snapshot user state on signal delivery, adjust regs to call a
function, and then restore user state.

3. Presumably as a result of (2), we do syscall restart if indicated
by the register state on ptrace resume even if we're *not* resuming a
syscall.

4. Also as a result of (1), gdb expects that writing -1 to orig_eax
via POKEUSER or similar will *disable* syscall restart, which is
necessary to get function calling on syscall exit to work.

The combination of (1) and (4) means that we need to skip syscall
restart if orig_eax == -1 (in a 32-bit signed sense).  The combination
of (1) and (2) means that we need to enable syscall restart if
orig_eax > 0 (in a 32-bit signed sense) and eax contains a
-ERESTARTxyz code (again in a signed sense).  And, for extra fun, we
need to make sure that, if we set __NR_restart_syscall, we set the
correct value based on the syscall that we're restarting.

The latter bit is a mess and is probably broken on current kernels for
64-bit gdb attached to a 32-bit process.  (Is it?  All of this stuff
is a bit of a pain to test.)

Here's my proposal:

Step 1: for 4.7 and for -stable, introduce TS_I386_REGS_POKED.  Set it
in putreg32.  Use it in syscall_get_error, get_nr_restart_syscall,
etc.  Clear it in do_signal.  This gets rid of the arch confusion
while (hopefully) preserving current behavior.

Step 2: Optionally, for 4.8, consider cleaning up the whole mess.
First, change putreg32 to sign-extend orig_eax and eax and just get
rid of the TS_COMPAT checks in syscall_get_nr and syscall_get_error.
Then we either change the 64-bit -ERESTART_BLOCK to a different number
or encode the desired restart bitness in restart_block or similar.

I wonder if we could actually get away with doing syscall restart
processing before ptrace invocation.

--Andy