Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

From: Andy Lutomirski <luto@kernel.org>
To: David Laight <David.Laight@aculab.com>
Cc: Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"ganeshgr@chelsio.com" <ganeshgr@chelsio.com>,
	"nirranjan@chelsio.com" <nirranjan@chelsio.com>,
	"indranil@chelsio.com" <indranil@chelsio.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Eric Biggers <ebiggers3@gmail.com>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Date: Wed, 21 Mar 2018 00:39:53 +0000
Message-ID: <CALCETrV8d5HD2q_qmiRfs-vB=JkiDm3yDx-5KcYqu7yEQrdu0A@mail.gmail.com>
In-Reply-To: <c1eac43cd8e143d09477a34ed6de6302@AcuMS.aculab.com>

On Tue, Mar 20, 2018 at 3:10 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Andy Lutomirski
>> Sent: 20 March 2018 14:57
> ...
>> I'd rather see us finally finish the work that Rik started to rework
>> this differently.  I'd like kernel_fpu_begin() to look like:
>>
>> if (test_thread_flag(TIF_NEED_FPU_RESTORE)) {
>>   return; // we're already okay.  maybe we need to check in_interrupt() or something, though?
>> } else {
>>   XSAVES/XSAVEOPT/XSAVE;
>>   set_thread_flag(TIF_NEED_FPU_RESTORE);
>> }
>>
>> and kernel_fpu_end() does nothing at all.
>
> I guess it might need to set (clear?) the CFLAGS bit for a process
> that isn't using the fpu at all - which seems a sensible feature.

What do you mean "CFLAGS"?

But we no longer have any concept of "isn't using the fpu at all" --
we got rid of that.

>
>> We take the full performance hit for a *single* kernel_fpu_begin() on
>> an otherwise short syscall or interrupt, but there's no additional
>> cost for more of them or for long-enough-running things that we
>> schedule in the middle.
>
> It might be worth adding a parameter to kernel_fpu_begin() to indicate
> which registers are needed, and a return value to say what has been
> granted.
> Then a driver could request AVX2 (for example) and use a fallback path
> if the register set isn't available (for any reason).
> A call from an ISR could always fail.

Last time I benchmarked it, XSAVEC on none of the state wasn't a whole
lot faster than XSAVEC for all of it.

>
>> As I remember, the main hangup was that this interacts a bit oddly
>> with PKRU, but that's manageable.
>
> WTF PKRU ??

PKRU is uniquely demented.  All the rest of the XSAVEC state only
affects code that explicitly references that state.  PKRU affects
every single access to user pages, so we need PKRU to match the
current task at all times in the kernel.  This means that, if we start
deferring XRSTORS until prepare_exit_to_usermode(), we need to start
restoring PKRU using WRPKRU in __switch_to().  Of course, *that*
interacts a bit oddly with XINUSE, but maybe we don't care.

Phooey on you, Intel, for putting PKRU into xstate and not giving a
fast instruction to control XINUSE.