From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751785AbcFSWKV (ORCPT ); Sun, 19 Jun 2016 18:10:21 -0400 Received: from mail-vk0-f41.google.com ([209.85.213.41]:36395 "EHLO mail-vk0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751364AbcFSWKM (ORCPT ); Sun, 19 Jun 2016 18:10:12 -0400 MIME-Version: 1.0 In-Reply-To: References: <94bda8cd5f326ae5591c80fb5d7c1c22624accec.1466244711.git.luto@kernel.org> From: Andy Lutomirski Date: Sun, 19 Jun 2016 15:09:41 -0700 Message-ID: Subject: Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace To: Pedro Alves Cc: Oleg Nesterov , Kees Cook , Borislav Petkov , "linux-kernel@vger.kernel.org" , X86 ML , Linus Torvalds Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 18, 2016 at 10:02 AM, Andy Lutomirski wrote: > On Jun 18, 2016 6:56 AM, "Pedro Alves" wrote: >> >> On 06/18/2016 11:21 AM, Andy Lutomirski wrote: >> > A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax. >> > >> > - If the tracee is stopped in a 32-bit syscall, this has no effect >> > as TS_COMPAT will already be set. >> > >> > - If the tracee is stopped on entry to a 64-bit syscall, this could >> > cause problems: in_compat_syscall() etc will be out of sync with >> > the actual syscall table in use. I can imagine this bre aking >> > audit. (It can't meaningfully break seccomp, as a malicious >> > tracer can already fully bypass seccomp.) I could also imagine it >> > subtly confusing the implementations of a few syscalls. >> > >> > - If the tracee is stopped in a non-syscall context, then TS_COMPAT >> > will end up set in user mode, which isn't supposed to happen. The results >> > are likely to be similar to the 64-bit syscall entry case. >> > >> > Fix it by deleting the code. >> > >> > Here's my understanding of the previous intent. Suppose a >> > 32-bit tracee makes a syscall that returns one of the -ERESTART >> > codes. A 32-bit tracer saves away its register state. The tracee >> > resumes, returns from the system call, and gets stopped again for a >> > non-syscall reason (e.g. a signal). Then the tracer tries to roll >> > back the tracee's state by writing all of the saved registers back. >> > >> > The result of this sequence of events will be that the tracee's >> > registers' low bits match what they were at the end of the syscall >> > but TS_COMPAT will be clear. This will cause syscall_get_error() to >> > return a *positive* number, because we zero-extend registers poked >> > by 32-bit tracers instead of sign-extending them. As a result, with >> > this change, syscall restart won't happen, whereas, historically, it >> > would have happened. >> > >> > As far as I can tell, this corner case isn't very important, and I >> >> I believe it's actually very much very important for gdb, for restoring >> the inferior state when the user calls a function in the inferior, with: >> >> (gdb) print foo() >> >> Some background here: >> >> http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0 > > Yuck. I should have dug in to the history. Why not just > unconditionally sign-extend eax when set by a 32-bit tracer? > > Do you know how to acquire a copy of erestartsys-trap.c? The old > links appear to be broken. > > Also, while I have your attention: when gdb restores old state like > this, does it do it with individual calls to PTRACE_POKEUSER or does > it use SETREGSET or similar to do it all at once? I'm asking because > I have some other code (fsgsbase) that's on hold until I can figure > out how to keep it from breaking gdb if and when gdb writes to fs and > fs_base. OK, I studied this a bit. What a mess! Here's what's going on AFAICT: 1. For reasons that probably made sense, the kernel delivers signals and triggers ptrace before handling syscall restart. This means that -ERESTART_RESTARTBLOCK, etc is visible to userspace. We could plausibly get away with changing that, but it seems quite risky. 2. As a result of (1), gdb (quite reasonably) expects that it can snapshot user state on signal delivery, adjust regs to call a function, and then restore user state. 3. Presumably as a result of (2), we do syscall restart if indicated by the register state on ptrace resume even if we're *not* resuming a syscall. 4. Also as a result of (1), gdb expects that writing -1 to orig_eax via POKEUSER or similar will *disable* syscall restart, which is necessary to get function calling on syscall exit to work. The combination of (1) and (4) means that we need to skip syscall restart if orig_eax == -1 (in a 32-bit signed sense). The combination of (1) and (2) means that we need to enable syscall restart if orig_eax > 0 (in a 32-bit signed sense) and eax contains a -ERESTARTxyz code (again in a signed sense). And, for extra fun, we need to make sure that, if we set __NR_restart_syscall, we set the correct value based on the syscall that we're restarting. The latter bit is a mess and is probably broken on current kernels for 64-bit gdb attached to a 32-bit process. (Is it? All of this stuff is a bit of a pain to test.) Here's my proposal: Step 1: for 4.7 and for -stable, introduce TS_I386_REGS_POKED. Set it in putreg32. Use it in syscall_get_error, get_nr_restart_syscall, etc. Clear it in do_signal. This gets rid of the arch confusion while (hopefully) preserving current behavior. Step 2: Optionally, for 4.8, consider cleaning up the whole mess. First, change putreg32 to sign-extend orig_eax and eax and just get rid of the TS_COMPAT checks in syscall_get_nr and syscall_get_error. Then we either change the 64-bit -ERESTART_BLOCK to a different number or encode the desired restart bitness in restart_block or similar. I wonder if we could actually get away with doing syscall restart processing before ptrace invocation. --Andy