From: Andy Lutomirski
Date: Fri, 27 Mar 2015 14:37:58 -0700
Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address
To: Ingo Molnar
Cc: Brian Gerst, Denys Vlasenko, Borislav Petkov,
 "the arch/x86 maintainers", Linux Kernel Mailing List
In-Reply-To: <20150327113125.GA14778@gmail.com>
References: <1427373731-13056-1-git-send-email-dvlasenk@redhat.com> <20150327113125.GA14778@gmail.com>

On Fri, Mar 27, 2015 at 4:31 AM, Ingo Molnar wrote:
>
> * Brian Gerst wrote:
>
>> On Thu, Mar 26, 2015 at 8:42 AM, Denys Vlasenko wrote:
>> > This change makes the check exact (no more false positives
>> > on kernel addresses).
>> >
>> > It isn't really important to be fully correct here -
>> > almost all addresses we'll ever see will be userspace ones,
>> > but OTOH it looks cheap enough: the new code uses two more
>> > ALU ops but preserves %rcx, so we don't need to reload it
>> > from pt_regs->cx again.
>> > At the disassembly level, the changes are:
>> >
>> > cmp %rcx,0x80(%rsp)  ->  mov 0x80(%rsp),%r11; cmp %rcx,%r11
>> > shr $0x2f,%rcx       ->  shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
>> > mov 0x58(%rsp),%rcx  ->  (eliminated)
>> >
>> > Signed-off-by: Denys Vlasenko
>> > CC: Borislav Petkov
>> > CC: x86@kernel.org
>> > CC: linux-kernel@vger.kernel.org
>> > ---
>> >
>> > Andy, I'm undecided myself on the merits of doing this.
>> > If you like it, feel free to take it in your tree.
>> > I trimmed the CC list to not bother too many people with this
>> > trivial and quite possibly "useless churn"-class change.
>> >
>> >  arch/x86/kernel/entry_64.S | 23 ++++++++++++-----------
>> >  1 file changed, 12 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> > index bf9afad..a36d04d 100644
>> > --- a/arch/x86/kernel/entry_64.S
>> > +++ b/arch/x86/kernel/entry_64.S
>> > @@ -688,26 +688,27 @@ retint_swapgs:		/* return to user-space */
>> >  	 * a completely clean 64-bit userspace context.
>> >  	 */
>> >  	movq RCX(%rsp),%rcx
>> > -	cmpq %rcx,RIP(%rsp)		/* RCX == RIP */
>> > +	movq RIP(%rsp),%r11
>> > +	cmpq %rcx,%r11			/* RCX == RIP */
>> >  	jne opportunistic_sysret_failed
>> >
>> >  	/*
>> >  	 * On Intel CPUs, sysret with non-canonical RCX/RIP will #GP
>> >  	 * in kernel space.  This essentially lets the user take over
>> > -	 * the kernel, since userspace controls RSP.  It's not worth
>> > -	 * testing for canonicalness exactly -- this check detects any
>> > -	 * of the 17 high bits set, which is true for non-canonical
>> > -	 * or kernel addresses.  (This will pessimize vsyscall=native.
>> > -	 * Big deal.)
>> > +	 * the kernel, since userspace controls RSP.
>> >  	 *
>> > -	 * If virtual addresses ever become wider, this will need
>> > +	 * If the width of the "canonical tail" ever becomes variable, this will need
>> >  	 * to be updated to remain correct on both old and new CPUs.
>> >  	 */
>> > +	.ifne __VIRTUAL_MASK_SHIFT - 47
>> > +	.error "virtual address width changed -- sysret checks need update"
>> > +	.endif
>> > -	shr $__VIRTUAL_MASK_SHIFT, %rcx
>> > -	jnz opportunistic_sysret_failed
>> > +	/* Change top 16 bits to be a sign-extension of the rest */
>> > +	shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
>> > +	sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
>> > +	/* If this changed %rcx, it was not canonical */
>> > +	cmpq	%rcx, %r11
>> > +	jne opportunistic_sysret_failed
>> >
>> >  	cmpq $__USER_CS,CS(%rsp)	/* CS must match SYSRET */
>> >  	jne opportunistic_sysret_failed
>>
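(For readers following along: the shl/sar pair above is an ordinary
sign-extension test.  A minimal C sketch of the equivalent check,
illustrative only and assuming 48-bit virtual addresses, i.e.
__VIRTUAL_MASK_SHIFT == 47:

    #include <stdbool.h>
    #include <stdint.h>

    static bool is_canonical_48(uint64_t addr)
    {
        /* Shift bits 63:48 out the top, then arithmetic-shift
         * copies of bit 47 back in -- mirroring the shl/sar pair. */
        uint64_t sext = (uint64_t)((int64_t)(addr << 16) >> 16);

        /* Canonical iff sign extension changes nothing. */
        return sext == addr;
    }

So 0x00007fffffffffff and 0xffff800000000000 pass while
0x0000800000000000 fails: exactly the "did the shift pair change
%rcx" test in the patch, with no false positives on kernel
addresses.)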
>> Would it be possible to skip this check entirely on AMD
>> processors?  It's my understanding that AMD correctly issues the
>> #GP from CPL3, causing a stack switch.
>
> This needs a testcase, I suspect.

IMO one decent way to write the test case would be to extend the
sigreturn test I just submitted.  For each n, do raise(SIGUSR1), then
change RCX and RIP to 2^n.  Return and catch the SIGSEGV, then restore
the original RIP.  Repeat with 2^n replaced by 2^n-1 and ~(2^n-1).
(A rough sketch of that loop is at the end of this mail.)

The only real trick is that we need to make sure that there's no
actual executable code at any of these addresses.

--Andy

>
>> Looking at the AMD docs, sysret doesn't even check for a canonical
>> address.  The #GP is probably from the instruction fetch at the
>> non-canonical address instead of from sysret itself.
>
> I suspect it's similar to what would happen if we tried a RET to a
> non-canonical address: the fetch fails and the JMP gets the #GP?
>
> In that sense it's the fault of the return instruction.
>
> Thanks,
>
> 	Ingo

--
Andy Lutomirski
AMA Capital Management, LLC
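(Appendix, for concreteness: a rough sketch of the probe loop
described above.  This is a hypothetical illustration, not the actual
selftest; it assumes x86_64 Linux with glibc's ucontext, and it
glosses over guaranteeing that the probed addresses are unmapped --
e.g. 0x400000 = 2^22 is a non-PIE binary's text address, so build
with -pie or validate the probe list first.)

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static volatile uint64_t test_addr;          /* address under test */
static volatile greg_t saved_rip, saved_rcx; /* real context to return to */

static void sigusr1(int sig, siginfo_t *si, void *ctx_void)
{
	ucontext_t *ctx = ctx_void;

	/* Save where we really were, then aim the saved RCX/RIP at the
	 * address under test; sigreturn will try to resume there,
	 * exercising the kernel's opportunistic-sysret path. */
	saved_rip = ctx->uc_mcontext.gregs[REG_RIP];
	saved_rcx = ctx->uc_mcontext.gregs[REG_RCX];
	ctx->uc_mcontext.gregs[REG_RIP] = (greg_t)test_addr;
	ctx->uc_mcontext.gregs[REG_RCX] = (greg_t)test_addr;
}

static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
{
	ucontext_t *ctx = ctx_void;

	/* The expected fault at the bogus address: restore the saved
	 * context so execution continues where raise() left off. */
	ctx->uc_mcontext.gregs[REG_RIP] = saved_rip;
	ctx->uc_mcontext.gregs[REG_RCX] = saved_rcx;
}

int main(void)
{
	struct sigaction sa = { .sa_flags = SA_SIGINFO };

	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = sigusr1;
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		exit(1);
	sa.sa_sigaction = sigsegv;
	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		exit(1);

	for (int n = 0; n < 64; n++) {
		uint64_t probes[] = {
			1ULL << n,            /* 2^n      */
			(1ULL << n) - 1,      /* 2^n-1    */
			~((1ULL << n) - 1),   /* ~(2^n-1) */
		};

		for (int i = 0; i < 3; i++) {
			test_addr = probes[i];
			raise(SIGUSR1);       /* probe one address */
		}
	}

	printf("survived all probes\n");
	return 0;
}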