linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Len Brown <lenb@kernel.org>
To: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Borislav Petkov <bp@suse.de>, Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, X86 ML <x86@kernel.org>,
	"Brown, Len" <len.brown@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Liu, Jing2" <jing2.liu@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE
Date: Mon, 24 May 2021 19:10:57 -0400	[thread overview]
Message-ID: <CAJvTdKnrFSS0fvhNz5mb9v8epEVtphUesEUV0hhNErMBK5HNHQ@mail.gmail.com> (raw)
In-Reply-To: <20210523193259.26200-16-chang.seok.bae@intel.com>

On Sun, May 23, 2021 at 3:39 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> N.B. This interface is currently under active discussion on LKML. This [v4]]
> proposal implements a per-task system call with GET/PUT semantics. A
> per-process system call without PUT semantics may be superior.

There is a better way...

In the v5 strawman here, every library routine that uses AMX
must precede that use with a system call
requesting permission from the kernel,
and then a system call yielding permission.
These calls must be always present because
this API requires that it be invoked per task,
but the library has no idea if the current thread has called the API or not.

The two concepts at play are "fine grained permission" and "buffer management".
We can improve on this by treating them as dependent, rather than equivalent.

The reality is that process-wide, rather than task-specific permission
is sufficient.
For if there exists a use-case where an administrator would want to grant
AMX access to one task in a process, but deny it from another, I have
yet to imagine it,
and I pity the application/library programmer who would have to face it.

Further, I submit that granting permission to an application,
but then denying permission to a subsequent request is unrealistic.
Applications will have already initialized their threads
and data structures based on the capabilities of the hardware.
Asking them to cope with such a change at run time is not reasonable.

The reality is that if they actually look at a failed return code,
they will exit.
If they ignore the return code and proceed, they will trap and crash.
Not likely will programmers be excited, or willing, to write code
to actually handle dynamic loss of permission.

The down-side of permission (whether it be per-process as proposed here,
or per-task, as in the v5 patch) is that to check if AMX is enabled, user-space
must check three things instead of two:

1. CPUID has AMX
2. XCR0 has AMX
3. Linux permission has been requested and granted to this process

If we accept that fine-grained permission is required, I don't see a practical
or efficient way to do it without #3.  (No, our hardware can not trap CPUID
reads, or avoid VMEXIT on XCR0 changes)

And so I propose:
1. Per-process permission (not per-task).
2. Permission, once granted, remains valid for the lifetime of that process.

And so any access to AMX before this process-wide permission is
granted will fail,
and any access to AMX after process-side permission is granted will succeed.

Period.

Which brings us to context switch buffer management.

After per-process permission is granted, we have two options on how
to handle context switch buffer management, both have merit:

1. On-demand.  Any task in a process that has AMX permission can touch AMX.
When it does, it takes a #NM, the kernel allocates the 8KB buffer, disarms XFD
and returns.  This is how v4 of this patch series worked.

The first benefit of on-demand is that user-space isn't mandated to do any
more Linux-specific system calls after the per-process permission is granted.

The second benefit of on-demand is that a process with 1,000 threads
and only 8 of them
in a pool actually touch AMX, then 8 buffers will be allocated, not 1,000.

The dis-advantage of on-demand is that there is no buffer release mechanism --
the buffer lives as long as the task lives.  Though, per previous conversation,
a future kernel could easily implement a buffer re-claim mechanism
behind the scenes
since the kernel is empowered to re-arm XFD for whatever reason it wants...

2. Synchronous allocation.  Any task in the process that has AMX permission can
make a 2nd system call to request that the kernel synchronously allocate the
8KB buffer for that task. *

* This could also be implemented to mean "allocate for this task
and upon the creation of all future threads in this process".
Though doing so could result in cases where the kernel allocates
many more buffers than are actually necessary.

The benefit of synchronous allocation is that after an application has created
all of its threads, it knows that it would have got a synchronous error code
from this special system call if any of them failed to allocate.

Note that detection of allocation failure upon thread creation
could also be implemented using dynamic allocation by simply touching
AMX state --
except the failure would be delivered through a signal,
rather than an error code from a special system call.

Although it is possible, and it is implemented in v5, I don't advocate that
synchronous allocation have a matching synchronous de-allocation.
I don't think programmers will use it; and I don't see a case for
complicating the kernel code with reference counters that are unused.

So the value proposition for synchronous allocation is thin, but it exists.

My recommendation is to implement both #1 and #2, and to document
that #1 may fail you with a signal, while #2 under the same scenario
would fail you with a return code.  If you are a programmer that prefers
that error code rather than writing a signal handler, then use the
new system call.

-Len Brown
Intel Open Source Technology Center

  reply	other threads:[~2021-05-24 23:11 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-23 19:32 [PATCH v5 00/28] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 01/28] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 02/28] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 03/28] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 04/28] x86/fpu/xstate: Modify the context restore helper " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 05/28] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 06/28] x86/fpu/xstate: Add new variables to indicate dynamic xstate buffer size Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 07/28] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 08/28] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 09/28] x86/fpu/xstate: Introduce helpers to manage the xstate buffer dynamically Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 10/28] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 11/28] x86/fpu/xstate: Update the xstate save function to support dynamic states Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 12/28] x86/fpu/xstate: Update the xstate buffer address finder " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 13/28] x86/fpu/xstate: Update the xstate context copy function " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 14/28] x86/fpu/xstate: Prevent unauthorised use of dynamic user state Chang S. Bae
2021-06-16 16:17   ` Dave Hansen
2021-06-16 16:27   ` Dave Hansen
2021-06-16 18:12     ` Andy Lutomirski
2021-06-16 18:47       ` Bae, Chang Seok
2021-06-16 19:01         ` Dave Hansen
2021-06-16 19:23           ` Bae, Chang Seok
2021-06-16 19:28             ` Dave Hansen
2021-06-16 19:37               ` Bae, Chang Seok
2021-06-28 10:11               ` Liu, Jing2
2021-06-29 17:43           ` Bae, Chang Seok
2021-06-29 17:54             ` Dave Hansen
2021-06-29 18:35               ` Bae, Chang Seok
2021-06-29 18:50                 ` Dave Hansen
2021-06-29 19:13                   ` Bae, Chang Seok
2021-06-29 19:26                     ` Dave Hansen
2021-05-23 19:32 ` [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE Chang S. Bae
2021-05-24 23:10   ` Len Brown [this message]
2021-05-25 17:27     ` Borislav Petkov
2021-05-25 17:33       ` Dave Hansen
2021-05-26  0:38     ` Len Brown
2021-05-27 11:14       ` second, sync-alloc syscall Borislav Petkov
2021-05-27 13:59         ` Len Brown
2021-05-27 19:35           ` Andy Lutomirski
2021-05-25 15:46   ` [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE Dave Hansen
2021-05-23 19:32 ` [PATCH v5 16/28] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 17/28] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 18/28] x86/fpu/xstate: Disable xstate support if an inconsistent state is detected Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 19/28] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 20/28] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 21/28] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-05-24  3:09   ` Andy Lutomirski
2021-05-24 17:37     ` Len Brown
2021-05-24 18:13       ` Andy Lutomirski
2021-05-24 18:21         ` Len Brown
2021-05-25  3:44           ` Andy Lutomirski
2021-05-23 19:32 ` [PATCH v5 22/28] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 23/28] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 24/28] x86/fpu/xstate: Use per-task xstate mask for saving xstate in signal frame Chang S. Bae
2021-05-24  3:15   ` Andy Lutomirski
2021-05-24 18:06     ` Len Brown
2021-05-25  4:47       ` Andy Lutomirski
2021-05-25 14:04         ` Len Brown
2021-05-23 19:32 ` [PATCH v5 25/28] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-05-24  3:25   ` Andy Lutomirski
2021-05-24 18:15     ` Len Brown
2021-05-24 18:29       ` Dave Hansen
2021-05-25  4:46       ` Andy Lutomirski
2021-05-23 19:32 ` [PATCH v5 26/28] selftest/x86/amx: Test case for AMX state copy optimization in signal delivery Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 27/28] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 28/28] x86/fpu/amx: Clear the AMX state when appropriate Chang S. Bae
2021-05-24  3:13   ` Andy Lutomirski
2021-05-24 14:10     ` Dave Hansen
2021-05-24 17:32       ` Len Brown
2021-05-24 17:39         ` Dave Hansen
2021-05-24 18:24           ` Len Brown
2021-05-27 11:56             ` Peter Zijlstra
2021-05-27 14:02               ` Len Brown
2021-05-24 14:06   ` Dave Hansen
2021-05-24 17:34     ` Len Brown
2021-05-24 21:11       ` [PATCH v5-fix " Chang S. Bae

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJvTdKnrFSS0fvhNz5mb9v8epEVtphUesEUV0hhNErMBK5HNHQ@mail.gmail.com \
    --to=lenb@kernel.org \
    --cc=bp@suse.de \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=jing2.liu@intel.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).