From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754347AbbHRWko (ORCPT <rfc822;w@1wt.eu>);
	Tue, 18 Aug 2015 18:40:44 -0400
Received: from mail-ob0-f181.google.com ([209.85.214.181]:35393 "EHLO
	mail-ob0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752464AbbHRWkl (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 18 Aug 2015 18:40:41 -0400
MIME-Version: 1.0
In-Reply-To: <20150818223409.GB12858@lerouge>
References: <60e90901eee611e59e958bfdbbe39969b4f88fe5.1435952415.git.luto@kernel.org>
 <tip-02bc7768fe447ae305e924b931fa629073a4a1b9@git.kernel.org>
 <20150811223827.GB15639@lerouge> <CALCETrVo+FoiWoRJXAtcONm9itV1FsogCDx8U1SussBpr_K9NA@mail.gmail.com>
 <20150811232234.GD15639@lerouge> <CALCETrVN00n0MJ-F1idkE7i4NA6H_kN69ztsTqH6sOpJHJ0NMQ@mail.gmail.com>
 <20150812133217.GB21542@lerouge> <CALCETrVMtAuckNBctBbXUPpTNRV1PoPufJUQuSuZ8V5vQgFW6A@mail.gmail.com>
 <20150818223409.GB12858@lerouge>
From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 18 Aug 2015 15:40:20 -0700
Message-ID: <CALCETrVD1EyH075Lwja4rCLg9OXSB7ZgQtP_5ZwFypf3b9q9sQ@mail.gmail.com>
Subject: Re: [tip:x86/asm] x86/asm/entry/64: Migrate error and IRQ exit work
 to C and remove old assembly code
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>, Rik van Riel <riel@redhat.com>,
        Borislav Petkov <bp@alien8.de>, Peter Zijlstra <peterz@infradead.org>,
        Brian Gerst <brgerst@gmail.com>,
        Denys Vlasenko <vda.linux@googlemail.com>,
        Kees Cook <keescook@chromium.org>,
        Thomas Gleixner <tglx@linutronix.de>, Oleg Nesterov <oleg@redhat.com>,
        Andrew Lutomirski <luto@kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Ingo Molnar <mingo@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "linux-tip-commits@vger.kernel.org" 
	<linux-tip-commits@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 18, 2015 at 3:34 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Wed, Aug 12, 2015 at 07:59:44AM -0700, Andy Lutomirski wrote:
>> On Wed, Aug 12, 2015 at 6:32 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> > Right, and doing it the way we did previously was safe wrt. that.
>> >
>> > Can't we have exceptions slow path just like the way we do it in syscalls?
>> >
>> > Then the exception slow path would just do:
>> >
>> >     if TIF_NOHZ
>> >        ctx = exception_enter()
>> >     exception_handler()
>> >     if TIF_NOHZ
>> >        exception_exit(ctx)
>>
>> What's the purpose of TIF_NOHZ right now?  For syscalls, it makes
>> sense, but is there any case in which TIF_NOHZ is set on one CPU but
>> not on another CPU?  It might make sense to get the performance back
>> using static keys instead of TIF_NOHZ.
>
> Sure if we can manage to do that. The nice thing about TIF flags is that
> they are a single check that is always there.
>

True, although my patch loses that benefit for the fast compat entries
due to the syscall arg fault stuff (what a mess!).

>>
>> If we switched back to exception_enter, we'd have to remember the
>> previous state, and, with a single exception right now, I think that's
>> unnecessary.
>>
>> I think there are only three states we can be in at exception entry:
>> user (and user_mode(regs)), kernel (and kernel_mode(regs)), or
>> NMI-like.
>
> But we can have user && (!user_mode(regs)) if exception happens on exception
> entry code.

I sure hope not, unless it nests inside an NMI-like thing.  It's
conceivable that this might happen due to perf NMIs causing a failed
MSR read or similar.  We might need to relax the assertions to check
that we're either in kernel or NMI context.  If so, that's
straightforward.  Meanwhile no one has reported this happening.

>
>> In the user case, the new code is correct.  In the kernel
>> case, the new code is also correct.  In the NMI case (if we're nested
>> in an NMI or similar entry)) then it is and was the responsibility of
>> the NMI-like entry to call rcu_nmi_enter(), and things that nest
>> inside that shouldn't touch context tracking (with the possible
>> exception of calling rcu_nmi_enter() again).
>>
>> In current -tip, there's a slight hole in this due to syscalls, and I'll fix it.
>
> There must be a check for context tracking enabled anyway. So why can't
> we just just do in exception entry code:
>
>        if (exception_slow_path()) {
>            exception_enter()
>            exception_handler()
>            exception_exit()
>        } else {
>            normal stuff
>        }
>
> Especially if we can manage to implement static keys in ASM, this will sum up to
> a single one.

There isn't really an exception slow path.  There's already a branch
for user vs kernel (in the CPL sense), and with my patches, there's no
additional branch for previous context tracking state.

>
>> >> The latter is annoying, but the entry code needs to deal with it
>> >> anyway.  For example, any exception early in NMI is currently really
>> >> bad.  Non-IST exceptions very early in SYSCALL are fatal.
>> >> Non-paranoid exceptions outside swapgs are fatal.  Etc.
>> >
>> > Sure but that doesn't mean I'm happy with introducing new fragile path
>> > like those. Especially as we have a way to fix without more overhead.
>>
>> I think my approach can work with even less overhead: there are fewer
>> branches due to checking the previous state.
>>
>> >> > Also as long as there is at least one instruction between entry to the kernel
>> >> > and context tracking noting it, there is a risk for an exception. Hence entry
>> >> > code will never be atomic enough to avoid this kind of bugs.
>> >>
>> >> By that argument, we're doomed.  Non-IST exceptions outside swapgs are fatal.
>> >
>> > Does that concern only error_entry() exceptions?
>>
>> Yes, but the set of paranoid_entry exceptions is shrinking.  In -tip, there are:
>>
>> NMI: NMI is special and will call rcu_nmi_enter().  Nothing's changing here.
>>
>> MCE: Once upon a time, MCE was simply buggy.  As of 4.0 (IIRC) MCE
>> from kernel mode calls rcu_nmi_enter().
>>
>> BP: This is going away, I think.  #BP should stop being special by 4.4.
>>
>> DB: That's the only weird case.  Patches to prevent instruction
>> breakpoints in entry code are already in -tip.  The only thing left is
>> kernel watchpoints, and we need to do something about that.
>
> So now we can't set a breakpoint on syscall entry anymore?
>
> I'm still nervous with all that.

We haven't done anything that would make breakpoints on syscall entry
less safe than they were, but we now disallow the breakpoints.  In the
future, we might take advantage of that change.

-- 
Andy Lutomirski
AMA Capital Management, LLC