From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752154AbcJDCKJ (ORCPT <rfc822;w@1wt.eu>);
        Mon, 3 Oct 2016 22:10:09 -0400
Received: from mail-vk0-f52.google.com ([209.85.213.52]:33930 "EHLO
        mail-vk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751393AbcJDCKH (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 3 Oct 2016 22:10:07 -0400
MIME-Version: 1.0
In-Reply-To: <1475544579.21644.37.camel@redhat.com>
References: <1475353895-22175-1-git-send-email-riel@redhat.com>
 <1475353895-22175-3-git-send-email-riel@redhat.com> <CALCETrWnhA4Gd5AxDEJz7d-VQNDN7O8ayyna7fz3=HNtNnARxQ@mail.gmail.com>
 <1475366900.21644.8.camel@redhat.com> <CALCETrWzYa1F7ucq=gMUhEbY+=uqSAaOgw1qZA3VkKrCo_7XOg@mail.gmail.com>
 <1475529679.4622.27.camel@redhat.com> <CALCETrX_A71XYJxSxprd_ujHFUK_M7UfmCXeNnP1iGx17CSu8A@mail.gmail.com>
 <1475544579.21644.37.camel@redhat.com>
From: Andy Lutomirski <luto@amacapital.net>
Date: Mon, 3 Oct 2016 19:09:46 -0700
Message-ID: <CALCETrVQetdSzd9t1njEXpSiX-4xoGjG-fTfQKwfb=x4Jxpx4A@mail.gmail.com>
Subject: Re: [PATCH RFC 2/5] x86,fpu: delay FPU register loading until switch
 to userspace
To: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        X86 ML <x86@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
        Paolo Bonzini <pbonzini@redhat.com>, Ingo Molnar <mingo@redhat.com>,
        Andrew Lutomirski <luto@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
        Borislav Petkov <bp@suse.de>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 3, 2016 at 6:29 PM, Rik van Riel <riel@redhat.com> wrote:
> On Mon, 2016-10-03 at 14:36 -0700, Andy Lutomirski wrote:
>> On Mon, Oct 3, 2016 at 2:21 PM, Rik van Riel <riel@redhat.com> wrote:
>> >
>> > >
>> > > One of my objections to the current code structure is that I find
>> > > it
>> > > quite hard to keep track of all of the possible states of the fpu
>> > > that
>> > > we have.  Off the top of my head, we have:
>> > >
>> > > 1. In-cpu state is valid and in-memory state is invalid.  (This
>> > > is
>> > > the
>> > > case when we're running user code, for example.)
>> > > 2. In-memory state is valid.
>> > > 2a. In-cpu state is *also* valid for some particular task.  In
>> > > this
>> > > state, no one may write to either FPU state copy for that task
>> > > lest
>> > > they get out of sync.
>> > > 2b. In-cpu state is not valid.
>> >
>> > It gets more fun with ptrace. Look at xfpregs_get
>> > and xfpregs_set, which call fpu__activate_fpstate_read
>> > and fpu__activate_fpstate_write respectively.
>> >
>> > It looks like those functions already deal with the
>> > cases you outline above.
>>
>> Hmm, maybe they could be used.  The names are IMO atrocious.  I had
>> to
>> read the docs and the code to figure out WTF "activating" the
>> "fpstate" even meant.
>
> If you have good naming ideas, I'd be willing to submit
> patches to rename them :)
>
>> > >
>> > > Rik, your patches further complicate this by making it possible
>> > > to
>> > > have the in-cpu state be valid for a non-current task even when
>> > > non-idle.
>> >
>> > However, we never need to examine the in-cpu state for
>> > a task that is not currently running, or being scheduled
>> > to in the context switch code.
>> >
>> > We always save the FPU register state when a task is
>> > context switched out, so for a non-current task, we
>> > still only need to consider whether the in-memory state
>> > is valid or not.
>> >
>> > The in-cpu state does not matter.
>> >
>> > fpu__activate_fpstate_write will disable lazy restore,
>> > and ensure a full restore from memory.
>>
>> Something has to fix up owner_ctx so that the previous thread doesn't
>> think its CPU state is still valid.  I don't think
>> activate_fpstate_write does it, because nothing should (I think) be
>> calling it as a mattor of course on context switch.
>
> These functions are called on tasks that are ptraced,
> and currently stopped. fpu__activate_fpstate_write
> sets fpu->last_cpu to -1, which will cause the next
> context switch to the task to load the fpstate into
> the registers.
>
> The status in the struct fpu keeps track of the status
> of things, though maybe not as explicitly as you would
> want.
>
> Having two separate status booleans for "registers valid"
> and "memory valid" may make more sense.

I have no problem with the concept of "owner_ctx", and I think it's a
perfectly reasonable data structure.  My problem with it is that it's
subtle and knowledge of it is spread all over the place.  Just going
with "registers valid" in a variable won't work, I think, because
there's nowhere to put it.  We need to be able to delete a struct fpu
while that struct fpu might have a valid copy in a different cpu's
registers.

Anyway, feel free to tell me that I'm making this too difficult :)

>
> Running the task in user space, would have to ensure
> "registers valid" is true, and make "memory valid"
> false, because userspace could write to the registers.
>
> Doing a ptrace fpstate_read would make "memory valid"
> true, but does not need to invalidate "registers valid".
>
> Doing a ptrace fpstate_write would make "memory valid"
> true, but would invalidate "registers valid".
>
> Context switching away from a task would make "memory
> valid" true, by virtue of copying the fpstate to
> memory.
>
> Ingo, would you have any objection to patches tracking
> the validity of both register and memory states
> independently?
>
> We can get rid of fpu.counter, since nobody uses it
> any more.

We should definitely do this.

Maybe getting in some cleanups first (my lazy fpu deletion,
fpu.counter removal, etc) first is the way to go.

>
>> > >  This is why I'm thinking that we should maybe revisit the
>> > > API a bit to make this clearer and to avoid scattering around all
>> > > the
>> > > state manipulation quite so much.  I propose, as a straw-man:
>> > >
>> > > user_fpregs_copy_to_cpu() -- Makes the CPU state be valid for
>> > > current.
>> > > After calling this, we're in state 1 or 2a.
>> > >
>> > > user_fpregs_move_to_cpu() -- Makes the CPU state be valid and
>> > > writable
>> > > for current.  Invalidates any in-memory copy.  After calling
>> > > this,
>> > > we're in state 1.
>> >
>> > How can we distinguish between these two?
>> >
>> > If the task is running, it can change its FPU
>> > registers at any time, without any obvious thing
>> > to mark the transition from state 2a to state 1.
>>
>> That's true now, but I don't think it's really true any more with
>> your
>> patches.  If a task is running in *kernel* mode, then I think that
>> pretty much all of the combinations are possible.  When the task
>> actually enters user mode, we have to make sure we're in state 1 so
>> the CPU regs belong to the task and are writable.
>
> That is fair.
>
>> > It sounds like the first function you name would
>> > only be useful if we prevented a switch to user
>> > space, which could immediately do whatever it wants
>> > with the registers.
>> >
>> > Is there a use case for user_fpregs_copy_to_cpu()?
>>
>> With PKRU, any user memory access would need
>> user_fpregs_copy_to_cpu().  I think we should get rid of this and
>> just
>> switch to using RDPKRU and WRPKRU to eagerly update PKRU in every
>> context switch to avoid this.  Other than that, I can't think of any
>> use cases off the top of my head unless the signal code wanted to
>> play
>> games with this stuff.
>
> You are right, read_pkru() and write_pkru() can only deal with
> the pkru state being present in registers. Is this because of an
> assumption in the code, or because of a hardware requirement?
>
>> > > user_fpregs_copy_to_memory() -- Makes the in-memory state be
>> > > valid
>> > > for
>> > > current.  After calling this, we're in state 2a or 2b.
>> > >
>> > > user_fpregs_move_to_memory() -- Makes the in-memory state be
>> > > valid
>> > > and
>> > > writable for current.  After calling this, we're in state 2b.
>> >
>> > We can only ensure that the in memory copy stays
>> > valid, by stopping the task from writing to the
>> > in-CPU copy.
>> >
>> > This sounds a lot like fpu__activate_fpstate_read
>> >
>> > What we need after copying the contents to memory is
>> > a way to invalidate the in-CPU copy, if changes were
>> > made to the in-memory copy.
>>
>> My proposal is that we invalidate the CPU state *before* writing the
>> in-memory state, not after.
>
> Modifying the in-memory FPU state can only be done on a task
> that is not currently running in user space. As long as the
> modification is done before task runs again, that is safe.
>
> I am probably missing some special case you are concerned about.
>
>> > This is already done by fpu__activate_fpstate_write
>> >
>> > >
>> > > Scheduling out calls user_fpregs_copy_to_memory(), as do all the
>> > > MPX
>> > > and PKRU memory state readers.  MPX and PKRU will do
>> > > user_fpregs_move_to_memory() before writing to memory.
>> > > kernel_fpu_begin() does user_fpregs_move_to_memory().
>> >
>> > This is done by switch_fpu_prepare.
>> >
>> > Can you think of any other place in the kernel where we
>> > need a 1 -> 2a transition, besides context switching and
>> > ptrace/xfpregs_get?
>>
>> Anything else that tries to read task xstate from memory, i.e. MPX
>> and
>> PKRU.  (Although if we switch to eager-switched PKRU, then PKRU stops
>> mattering for this purpose.)
>>
>> Actually, I don't see any way your patches can be compatible with
>> PKRU
>> without switching to eager-switched PKRU.
>
> My patches preserve eager saving of the FPU state, they only
> make restore lazy. That is 1 -> 2a transitions continue to
> be eager.
>
> --
> All Rights Reversed.


-- 
Andy Lutomirski
AMA Capital Management, LLC