From: Andy Lutomirski
Date: Thu, 9 Jul 2015 18:33:59 -0700
Subject: Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
To: Andy Lutomirski
Cc: X86 ML, "linux-kernel@vger.kernel.org", Frédéric Weisbecker,
    Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
    Kees Cook, Brian Gerst, Linus Torvalds
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 9, 2015 at 3:41 PM, Andy Lutomirski wrote:
> On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski wrote:
>> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
>> vm86 mode was completely broken under ptrace, for example, because
>> we'd never make it to v8086 mode.
>>
>> This code is still a huge, scary mess, but at least it's no longer
>> tangled with the exit-to-userspace loop.
>
> This patch is incorrect.  Brian, what's the ETA for your vm86 cleanup?
> If it's very soon, then I'll see if I can rely on it.  If not, I'll
> have to come up with a way to fix this patch.
>
> Grr.  The kernel state when handle_vm86_trap is called is absurd right
> now.  Somehow we're supposed to survive do_trap, send a signal
> corresponding to the outside-vm86 state, and exit vm86 cleanly (with
> ax = 0), all before returning to user mode.  I doubt these semantics
> are even intentional.
>
> This code sucks.

OK, I have a version that seems to work.  It comes with a much better
selftest, too.  I'll send it shortly.
Brian, would it make sense to base your work on top of it?

Now that I've looked at this stuff, if I were designing Linux support
for v8086 mode, I'd do it very differently.  There wouldn't be a vm86
syscall at all.  Instead you'd call sigaltstack, then raise a signal,
set X86_EFLAGS_VM, and return.  The kernel would handle X86_EFLAGS_VM
being set by setting TIF_V8086 and adjusting sp0.  On entry, TIF_V8086
would move the segment registers from the hardware frame into pt_regs
and, on exit, TIF_V8086 would move them back.  Clearing X86_EFLAGS_VM
(via ptrace, signal delivery, or sigreturn) would sanitize the segment
registers.  SYSENTER would be safe, so the SYSENTER_CS hack wouldn't
be needed.  Of course, we'd lose the CPU state, so the user would have
to be careful.

And that's it.  There wouldn't be any emulation -- user code could
emulate syscalls all by itself in a signal handler.  Exiting v8086
mode would be straightforward -- just do anything that would raise a
signal.

Of course, this isn't at all ABI-compatible with the current turd, and
v8086 mode isn't really that useful, so this is just idle retroactive
speculation.  But the TIF_V8086 trick would still be useful to let us
get rid of all the awful hacks in the trap and exit code.

--Andy
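
[Editorial sketch, not part of the original message.]  The TIF_V8086
idea above can be modeled in plain user-space C to make the data flow
concrete.  This is only a rough illustration under stated assumptions,
not kernel code and not anything from the patch series: struct
hw_frame, struct model_regs, struct model_task and the model_*()
functions are all hypothetical names, TIF_V8086 itself exists only in
the proposal above, the sp0 adjustment is left out, and "sanitize" is
assumed here to mean loading null selectors.  The only real constant
is X86_EFLAGS_VM (EFLAGS bit 17).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_VM 0x00020000u /* EFLAGS.VM, bit 17 */

/* Hypothetical model of the frame the CPU pushes on an interrupt or
 * exception taken from v8086 mode: the usual five words plus the four
 * data segment registers. */
struct hw_frame {
	uint32_t ip, cs, flags, sp, ss;
	uint32_t es, ds, fs, gs; /* present only when coming from v8086 */
};

/* Hypothetical stand-in for pt_regs; only the fields used here. */
struct model_regs {
	uint32_t flags;
	uint16_t es, ds, fs, gs;
};

struct model_task {
	bool tif_v8086; /* hypothetical TIF_V8086 flag */
	struct model_regs regs;
};

/* Entry: with TIF_V8086 set, copy the segment registers from the
 * hardware frame into the saved register state. */
static void model_entry(struct model_task *t, const struct hw_frame *hw)
{
	assert(t && hw);
	t->regs.flags = hw->flags;
	if (!t->tif_v8086)
		return;
	t->regs.es = (uint16_t)hw->es;
	t->regs.ds = (uint16_t)hw->ds;
	t->regs.fs = (uint16_t)hw->fs;
	t->regs.gs = (uint16_t)hw->gs;
}

/* Exit: with TIF_V8086 set, move the segment registers back into the
 * hardware frame before returning to v8086 mode. */
static void model_exit(const struct model_task *t, struct hw_frame *hw)
{
	assert(t && hw);
	hw->flags = t->regs.flags;
	if (!t->tif_v8086)
		return;
	hw->es = t->regs.es;
	hw->ds = t->regs.ds;
	hw->fs = t->regs.fs;
	hw->gs = t->regs.gs;
}

/* Any path that rewrites the saved flags (ptrace, signal delivery,
 * sigreturn) funnels through here.  Setting VM sets TIF_V8086;
 * clearing VM clears it and sanitizes the segment registers
 * (assumption: null selectors). */
static void model_set_flags(struct model_task *t, uint32_t new_flags)
{
	bool was_vm = (t->regs.flags & X86_EFLAGS_VM) != 0;
	bool now_vm = (new_flags & X86_EFLAGS_VM) != 0;

	t->regs.flags = new_flags;
	t->tif_v8086 = now_vm;
	if (was_vm && !now_vm)
		t->regs.es = t->regs.ds = t->regs.fs = t->regs.gs = 0;
}
```

In this model, a signal handler that sets X86_EFLAGS_VM in its saved
flags and returns corresponds to calling model_set_flags() with the VM
bit set; any later signal delivery corresponds to calling it with the
bit clear, which is what makes leaving v8086 mode "just raise a signal".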