archive mirror
 help / color / mirror / Atom feed
From: Len Brown <>
To: "Chang S. Bae" <>
Cc: Borislav Petkov <>, Andy Lutomirski <>,
	Thomas Gleixner <>,
	Ingo Molnar <>, X86 ML <>,
	"Brown, Len" <>,
	Dave Hansen <>,
	"Liu, Jing2" <>,
	"Ravi V. Shankar" <>,
	Linux Kernel Mailing List <>
Subject: Re: [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE
Date: Tue, 25 May 2021 20:38:16 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

After today's discussion, I believe we are close to consensus on this plan:

1. Kernel sets XCR0.AMX=1 at boot, and leaves it set, always.

2. Kernel arms XFD for all tasks, by default.

3. New prctl() system call allows any task in a process to ask for AMX
permission for all tasks in that process. Permission is granted for
the lifetime of that process, and there is no interface for a process
to "un-request" permission.

4. If a task touches AMX without permission, #NM/signal kills the process

5. If a task touches AMX with permission, #NM handler will
transparently allocate a context switch buffer, and disarm XFD for
that task. (MSR_XFD is context switched per task)

6. If the #NM handler can not allocate the 8KB buffer, the task will
receive a signal at the instruction that took the #NM fault, likely
resulting in program exit.

7. In addition, a 2nd system call to request that buffers be
pre-allocated is available. This is a per task system call. This
synchronous allocate system call will return an error code if it
fails, which will also likely result in program exit.

8. NEW TODAY: Linux will exclude the AMX 8KB region from the XSTATE on
the signal stack for tasks in process that do not have AMX permission.

9. For tasks in processes that have requested and received AMX
permission, Linux will XSAVE/XRESTOR directly to/from the signal
stack, and the stack will always include the 8KB space for AMX. (we do
have a manual optimization to in place to skip writing zeros to the
stack frame if AMX=INIT)

10. Linux reserves the right to plumb the new permission syscall into
cgroup administrative interface in the future.


Legacy software will not see signal stack growth on AMX hardware.

New AMX software will see AMX state on the signal stack.

If new AMX software uses an alternative signal stack, it should be
built using the signal.h ABI in glibc 2.34 or later, so that it can
calculate the appropriate size for the current hardware.  Note that
non-AMX software that is newly built will get the same answer from the
ABI, which would handle the case if it does use AMX.

Today it is possible for an application to calculate the uncompressed
XSTATE size from XCR0 and CPUID, allocate buffers of that size, and
use XSAVE and XRESTOR on those buffers in user-space.  Applications
can also XRESTOR from (and XSAVE back to) the signal stack, if they
choose.  Now, this capability can break for non-AMX programs, because
their XSAVE will be 8KB  larger than the buffer that they XRESTOR.
Andy L questions whether such applications actually exist, and Thomas
states that even if they do, that is a much smaller problem than 8KB
signal stack growth would be for legacy applications.

Unclear if we have consensus on the need for a synchronous allocation
system call (#7 above).  Observe that this system call does not
improve the likelihood of failure or the timing of failure.  An
#NM-based allocation and be done at exactly the same spot by simply
touching a TMM register.  The benefit of this system call is that it
returns an error code to the caller, versus the program being
delivered a SIGSEGV at the offending instruction pointer.  Both will
likely result in the program exiting, and at the same point in

A future mechanism to lazy harvest not-recently-used context switchy
buffers has been discussed.  Eg. the kernel under low memory could
re-arm XFD for all AMX tasks, and if their buffers are clean, free
them.  If that mechanism is implemented, and we also implement the
synchronous allocation system call, that mechanism must respect the
guarantee made by that system call and not harvest
system-call-allocated buffers.

Len Brown, Intel Open Source Technology Center

  parent reply	other threads:[~2021-05-26  0:38 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-23 19:32 [PATCH v5 00/28] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 01/28] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 02/28] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 03/28] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 04/28] x86/fpu/xstate: Modify the context restore helper " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 05/28] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 06/28] x86/fpu/xstate: Add new variables to indicate dynamic xstate buffer size Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 07/28] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 08/28] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 09/28] x86/fpu/xstate: Introduce helpers to manage the xstate buffer dynamically Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 10/28] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 11/28] x86/fpu/xstate: Update the xstate save function to support dynamic states Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 12/28] x86/fpu/xstate: Update the xstate buffer address finder " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 13/28] x86/fpu/xstate: Update the xstate context copy function " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 14/28] x86/fpu/xstate: Prevent unauthorised use of dynamic user state Chang S. Bae
2021-06-16 16:17   ` Dave Hansen
2021-06-16 16:27   ` Dave Hansen
2021-06-16 18:12     ` Andy Lutomirski
2021-06-16 18:47       ` Bae, Chang Seok
2021-06-16 19:01         ` Dave Hansen
2021-06-16 19:23           ` Bae, Chang Seok
2021-06-16 19:28             ` Dave Hansen
2021-06-16 19:37               ` Bae, Chang Seok
2021-06-28 10:11               ` Liu, Jing2
2021-06-29 17:43           ` Bae, Chang Seok
2021-06-29 17:54             ` Dave Hansen
2021-06-29 18:35               ` Bae, Chang Seok
2021-06-29 18:50                 ` Dave Hansen
2021-06-29 19:13                   ` Bae, Chang Seok
2021-06-29 19:26                     ` Dave Hansen
2021-05-23 19:32 ` [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE Chang S. Bae
2021-05-24 23:10   ` Len Brown
2021-05-25 17:27     ` Borislav Petkov
2021-05-25 17:33       ` Dave Hansen
2021-05-26  0:38     ` Len Brown [this message]
2021-05-27 11:14       ` second, sync-alloc syscall Borislav Petkov
2021-05-27 13:59         ` Len Brown
2021-05-27 19:35           ` Andy Lutomirski
2021-05-25 15:46   ` [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE Dave Hansen
2021-05-23 19:32 ` [PATCH v5 16/28] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 17/28] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 18/28] x86/fpu/xstate: Disable xstate support if an inconsistent state is detected Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 19/28] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 20/28] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 21/28] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-05-24  3:09   ` Andy Lutomirski
2021-05-24 17:37     ` Len Brown
2021-05-24 18:13       ` Andy Lutomirski
2021-05-24 18:21         ` Len Brown
2021-05-25  3:44           ` Andy Lutomirski
2021-05-23 19:32 ` [PATCH v5 22/28] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 23/28] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 24/28] x86/fpu/xstate: Use per-task xstate mask for saving xstate in signal frame Chang S. Bae
2021-05-24  3:15   ` Andy Lutomirski
2021-05-24 18:06     ` Len Brown
2021-05-25  4:47       ` Andy Lutomirski
2021-05-25 14:04         ` Len Brown
2021-05-23 19:32 ` [PATCH v5 25/28] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-05-24  3:25   ` Andy Lutomirski
2021-05-24 18:15     ` Len Brown
2021-05-24 18:29       ` Dave Hansen
2021-05-25  4:46       ` Andy Lutomirski
2021-05-23 19:32 ` [PATCH v5 26/28] selftest/x86/amx: Test case for AMX state copy optimization in signal delivery Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 27/28] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 28/28] x86/fpu/amx: Clear the AMX state when appropriate Chang S. Bae
2021-05-24  3:13   ` Andy Lutomirski
2021-05-24 14:10     ` Dave Hansen
2021-05-24 17:32       ` Len Brown
2021-05-24 17:39         ` Dave Hansen
2021-05-24 18:24           ` Len Brown
2021-05-27 11:56             ` Peter Zijlstra
2021-05-27 14:02               ` Len Brown
2021-05-24 14:06   ` Dave Hansen
2021-05-24 17:34     ` Len Brown
2021-05-24 21:11       ` [PATCH v5-fix " Chang S. Bae

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).