From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754840AbbGJP12 (ORCPT <rfc822;w@1wt.eu>);
	Fri, 10 Jul 2015 11:27:28 -0400
Received: from mail-oi0-f51.google.com ([209.85.218.51]:35545 "EHLO
	mail-oi0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754775AbbGJP1R (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 10 Jul 2015 11:27:17 -0400
MIME-Version: 1.0
In-Reply-To: <CALCETrXb_A4s=ORYDEv4j1--tQsqKeHkyaKbL6cUhDa1FxpG6A@mail.gmail.com>
References: <cover.1436383168.git.luto@kernel.org>
	<c3bfa8ab98867fe07521a55765b85b3d4b582579.1436383168.git.luto@kernel.org>
	<CALCETrWrJ1CJ6cqpHzOGHSx3BRZ2nYG8dDZWabbaj1n-=YBvng@mail.gmail.com>
	<CALCETrXb_A4s=ORYDEv4j1--tQsqKeHkyaKbL6cUhDa1FxpG6A@mail.gmail.com>
Date: Fri, 10 Jul 2015 11:27:16 -0400
Message-ID: <CAMzpN2j08HroWeAXxJitwNvdGsEvi7ao2PB12bLCqLy1_Nex-g@mail.gmail.com>
Subject: Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to
 32bit mode directly
From: Brian Gerst <brgerst@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        =?UTF-8?B?RnLDqWTDqXJpYyBXZWlzYmVja2Vy?= <fweisbec@gmail.com>,
        Rik van Riel <riel@redhat.com>, Oleg Nesterov <oleg@redhat.com>,
        Denys Vlasenko <vda.linux@googlemail.com>,
        Borislav Petkov <bp@alien8.de>, Kees Cook <keescook@chromium.org>,
        Linus Torvalds <torvalds@linux-foundation.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 9, 2015 at 9:33 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jul 9, 2015 at 3:41 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
>>> vm86 mode was completely broken under ptrace, for example, because
>>> we'd never make it to v8086 mode.
>>>
>>> This code is still a huge, scary mess, but at least it's no longer
>>> tangled with the exit-to-userspace loop.
>>
>> This patch is incorrect.  Brian, what's the ETA for your vm86 cleanup?
>>  If it's very soon, then I'll see if I can rely on it.  If not, I'll
>> have to come up with a way to fix this patch.
>>
>> Grr.  The kernel state when handle_vm86_trap is called is absurd right
>> now.  Somehow we're supposed to survive do_trap, send a signal
>> corresponding to the outside-vm86 state, and exit vm86 cleanly (with
>> ax = 0), all before returning to user mode.  I doubt these semantics
>> are even intentional.
>>
>> This code sucks.
>
> OK, I have a version that seems to work.  It comes with a much better
> selftest, too.  I'll send it shortly.
>
> Brian, would it make sense to base your work on top of it?
>
> Now that I've looked at this stuff, if I were designing Linux support
> for v8086 mode, I'd do it very differently.  There wouldn't be a vm86
> syscall at all.  Instead you'd call sigaltstack, then raise a signal,
> set X86_EFLAGS_VM, and return.
>
> The kernel would handle X86_EFLAGS_VM being set by setting TIF_V8086
> and adjusting sp0.  On entry, TIF_V8086 would move the segment
> registers from the hardware frame into pt_regs and, on exit, TIF_V8086
> would move them back.  Clearing X86_EFLAGS_VM (via ptrace, signal
> delivery, or sigreturn) would sanitize the segment registers.
>
> SYSENTER would be safe, so the SYSENTER_CS hack wouldn't be needed.
> Of course, we'd lose the CPU state, so the user would have to be
> careful.
>
> And that's it.  There wouldn't be any emulation -- user code could
> emulate syscalls all by itself in a signal handler.  Exiting v8086
> mode would be straightforward -- just do anything that would raise a
> signal.
>
> Of course, this isn't at all ABI-compatible with the current turd, and
> v8086 mode isn't really that useful, so this is just idle retroactive
> speculation.  But the TIF_V8086 trick would still be useful to let us
> get rid of all the awful hacks in the trap and exit code.

I'll post my patches tonight when I get home.  It would probably make
more sense for you to base your work on mine, since that should
eliminate the need for you to touch any vm86 code.

I fixed the signal issue by checking whether the VM flag is set in
handle_signal() and swapping the register state there, before pushing
the signal frame.  That is only possible once the need to jump back
into the exit asm routines has been removed.  work_notifysig_v86 is
gone too as a result.
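
To make the idea concrete, here is a rough userspace sketch of the
check-and-swap step, not the actual patch.  Everything prefixed with
sketch_ is a hypothetical name invented for illustration, the register
set is heavily simplified, and the real handle_signal() does much more.
The only fact carried over from the hardware is that X86_EFLAGS_VM is
bit 17 of EFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 of EFLAGS: the CPU is in virtual-8086 mode when set. */
#define X86_EFLAGS_VM 0x00020000u

/* Hypothetical, simplified register set (the real pt_regs has more). */
struct sketch_regs {
	uint32_t ip, sp, flags, cs, ss;
};

/* Hypothetical per-task state used only by this sketch. */
struct sketch_task {
	struct sketch_regs regs;     /* live user register state */
	struct sketch_regs saved32;  /* 32-bit state saved on vm86 entry */
	struct sketch_regs vm86_out; /* v8086 state stashed on exit */
};

/*
 * Called before the signal frame is built.  If the task was running in
 * v8086 mode, stash the v8086 registers and restore the 32-bit state so
 * that the frame describes the outside-vm86 context.  Returns true if a
 * swap happened.
 */
static bool sketch_leave_vm86_for_signal(struct sketch_task *t)
{
	if (!(t->regs.flags & X86_EFLAGS_VM))
		return false;           /* ordinary 32-bit task, nothing to do */

	t->vm86_out = t->regs;          /* keep the v8086 state for the caller */
	t->regs = t->saved32;           /* switch back to the 32-bit registers */
	return true;
}
```

Because the swap happens in C before the frame is pushed, nothing in
this path has to jump back into the exit asm routines.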

--
Brian Gerst