All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/28] x86: Support Intel Advanced Matrix Extensions
@ 2021-05-23 19:32 Chang S. Bae
  2021-05-23 19:32 ` [PATCH v5 01/28] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
                   ` (27 more replies)
  0 siblings, 28 replies; 75+ messages in thread
From: Chang S. Bae @ 2021-05-23 19:32 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

Intel Advanced Matrix Extensions (AMX)[1][2] will be shipping on servers
soon.  AMX consists of configurable TMM "TILE" registers plus new
accelerator instructions that operate on them.  TMUL (Tile matrix MULtiply)
is the first accelerator instruction set to use the new registers, and we
anticipate additional instructions in the future.

Neither AMX state nor TMUL instructions depend on AVX.  However, AMX and
AVX do share common challenges.  The TMM registers are 8KB today, and
architecturally as large as 64KB, which merits updates to hardware and
software state management.

Further, both technologies run faster when they are not simultaneously
running on SMT siblings, and both technologies use of power and bandwidth
impact the power and performance available to neighboring cores.  (This
impact has measurably improved in recent hardware.)

If the existing kernel approach for managing XSAVE state was employed to
handle AMX, 8KB space would be added to every task, but possibly rarely
used.  So Linux support is optimized by using a new XSAVE feature: eXtended
Feature Disabling (XFD).  The kernel arms XFD to provide a #NM exception
upon a tasks' first access to TILE state. The kernel exception handler
installs the appropriate XSAVE context switch buffer, and the task behaves
as if the kernel had done that for all tasks.  Using XFD, AMX space is
allocated only when needed, eliminating the memory waste for unused state
components.

This series requires glibc 2.34 or later [8][9], or the new minimum signal
stack size export [3]. The patches are based on the mainline (4.13-rc2).
The series is composed of three parts:
* Patch 01-18: Foundation to support dynamic user state management
* Patch 19-23: AMX enablement, including unit tests
* Patch 24-28: Optimizations for signal frame and for power/performance

Note. The per-task system call with GET/PUT semantics in PATCH15 is
currently under active discussion on LKML [10]. This is its v1 proposal.

Thanks to Len Brown and Dave Hansen for help with the cover letter.

Changes from v4 [7]:
* Changed the buffer expansion policy to the access-request based approach
  from the transparent #NM-based approach. (Andy Lutomirski, Thomas
  Gleixner, and et al)
* Removed the boot parameter patch. (Thomas Gleixner)
* Included code to explicitly initialize AMX state during a context switch.
  (Thomas Gleixner)
* Added a new arch_prctl to pre-allocate a buffer for dynamic state. (Andy
  Lutomirski)
* Updated the fork() path to initialize all the AMX state.
* Improved ptracer's dynamic user state injection path.
* Add optimization to skip tile data in sigframe when an AMX thread
  initialized the state.
* Updated to treat the mismatched state size as an error. (Thomas Gleixner)
* Simplified the xstate feature check routine. (Thomas Gleixner)
* Simplified and updated the selftest.
* Updated some changelog. (Thomas Gleixner)
* Updated a function description. (Borislav Petkov)

Changes from v3 [6]:
* Updated some commit messages and code comments. (Borislav Petkov)
* Added and removed some helpers. (Borislav Petkov)
* Revised the buffer allocation function. (Borislav Petkov)
* Simplified in accessing buffers. (Borislav Petkov)
* Re-organized some code change more reviewable. (PATCH9/10)
* Reverted unnecessary changes. (PATCH4)
* Fixed typo in the documentation. (Randy Dunlap)

Changes from v2 [5]:
* Removed the patch for the tile data inheritance. Also, updated the
  selftest patch. (Andy Lutomirski)
* Changed the kernel tainted when any unknown state is enabled. (Andy
  Lutomirski)
* Changed to use the XFD feature only when the compacted format in use.
* Improved the test code.
* Simplified the cmdline handling.
* Removed 'task->fpu' in changelogs. (Boris Petkov)
* Updated the variable name / comments / changelogs for clarification.

Changes from v1 [4]:
* Added vmalloc() error tracing (Dave Hansen, PeterZ, and Andy Lutomirski)
* Inlined the #NM handling code (Andy Lutomirski)
* Made signal handling optimization revertible
* Revised the new parameter handling code (Andy Lutomirski and Dave Hansen)
* Rebased on the upstream kernel

[1]: Intel Architecture Instruction Set Extension Programming Reference
    February 2021, https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf
[2]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-matrix-extensions-intel-amx-instructions.html
[3]: https://lore.kernel.org/lkml/20210518200320.17239-1-chang.seok.bae@intel.com/
[4]: https://lore.kernel.org/lkml/20201001203913.9125-1-chang.seok.bae@intel.com/
[5]: https://lore.kernel.org/lkml/20201119233257.2939-1-chang.seok.bae@intel.com/
[6]: https://lore.kernel.org/lkml/20201223155717.19556-1-chang.seok.bae@intel.com/
[7]: https://lore.kernel.org/lkml/20210221185637.19281-1-chang.seok.bae@intel.com/
[8]: https://sourceware.org/git/?p=glibc.git;a=commit;h=6c57d320484988e87e446e2e60ce42816bf51d53
[9]: https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;h=aa0f10a891f8f9b4e6f0f6d25b6a307898c07d82;hb=HEAD#l12
[10]: https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ@mail.gmail.com/

Chang S. Bae (28):
  x86/fpu/xstate: Modify the initialization helper to handle both static
    and dynamic buffers
  x86/fpu/xstate: Modify state copy helpers to handle both static and
    dynamic buffers
  x86/fpu/xstate: Modify address finders to handle both static and
    dynamic buffers
  x86/fpu/xstate: Modify the context restore helper to handle both
    static and dynamic buffers
  x86/fpu/xstate: Add a new variable to indicate dynamic user states
  x86/fpu/xstate: Add new variables to indicate dynamic xstate buffer
    size
  x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer
  x86/fpu/xstate: Introduce helpers to manage the xstate buffer
    dynamically
  x86/fpu/xstate: Define the scope of the initial xstate data
  x86/fpu/xstate: Update the xstate save function to support dynamic
    states
  x86/fpu/xstate: Update the xstate buffer address finder to support
    dynamic states
  x86/fpu/xstate: Update the xstate context copy function to support
    dynamic states
  x86/fpu/xstate: Prevent unauthorised use of dynamic user state
  x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE
  x86/fpu/xstate: Support ptracer-induced xstate buffer expansion
  x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in
    state component numbers
  x86/fpu/xstate: Disable xstate support if an inconsistent state is
    detected
  x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature
    bits
  x86/fpu/amx: Define AMX state components and have it used for
    boot-time checks
  x86/fpu/amx: Initialize child's AMX state
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  selftest/x86/amx: Test cases for the AMX state management
  x86/fpu/xstate: Use per-task xstate mask for saving xstate in signal
    frame
  x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user
    states if in INIT-state
  selftest/x86/amx: Test case for AMX state copy optimization in signal
    delivery
  x86/insn/amx: Add TILERELEASE instruction to the opcode map
  x86/fpu/amx: Clear the AMX state when appropriate

 arch/x86/include/asm/cpufeatures.h    |   4 +
 arch/x86/include/asm/fpu/internal.h   |  93 +++-
 arch/x86/include/asm/fpu/types.h      |  69 ++-
 arch/x86/include/asm/fpu/xstate.h     |  42 +-
 arch/x86/include/asm/msr-index.h      |   2 +
 arch/x86/include/asm/pgtable.h        |   2 +-
 arch/x86/include/asm/processor.h      |  10 +-
 arch/x86/include/asm/proto.h          |   2 +-
 arch/x86/include/asm/special_insns.h  |   6 +
 arch/x86/include/asm/trace/fpu.h      |   9 +-
 arch/x86/include/uapi/asm/prctl.h     |   3 +
 arch/x86/kernel/cpu/common.c          |   2 +-
 arch/x86/kernel/cpu/cpuid-deps.c      |   4 +
 arch/x86/kernel/fpu/core.c            | 100 +++-
 arch/x86/kernel/fpu/init.c            |  42 +-
 arch/x86/kernel/fpu/regset.c          |  66 ++-
 arch/x86/kernel/fpu/signal.c          |  61 ++-
 arch/x86/kernel/fpu/xstate.c          | 725 +++++++++++++++++++++----
 arch/x86/kernel/process.c             |  20 +-
 arch/x86/kernel/process_32.c          |   2 +-
 arch/x86/kernel/process_64.c          |   2 +-
 arch/x86/kernel/traps.c               |  12 +
 arch/x86/kvm/x86.c                    |  48 +-
 arch/x86/lib/x86-opcode-map.txt       |   8 +-
 arch/x86/math-emu/fpu_aux.c           |   2 +-
 arch/x86/math-emu/fpu_entry.c         |   4 +-
 arch/x86/math-emu/fpu_system.h        |   2 +-
 arch/x86/mm/pkeys.c                   |   2 +-
 tools/arch/x86/lib/x86-opcode-map.txt |   8 +-
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/amx.c     | 738 ++++++++++++++++++++++++++
 31 files changed, 1825 insertions(+), 267 deletions(-)
 create mode 100644 tools/testing/selftests/x86/amx.c


base-commit: d07f6ca923ea0927a1024dfccafc5b53b61cfecc
--
2.17.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2021-06-29 19:27 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-23 19:32 [PATCH v5 00/28] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 01/28] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 02/28] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 03/28] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 04/28] x86/fpu/xstate: Modify the context restore helper " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 05/28] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 06/28] x86/fpu/xstate: Add new variables to indicate dynamic xstate buffer size Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 07/28] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 08/28] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 09/28] x86/fpu/xstate: Introduce helpers to manage the xstate buffer dynamically Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 10/28] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 11/28] x86/fpu/xstate: Update the xstate save function to support dynamic states Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 12/28] x86/fpu/xstate: Update the xstate buffer address finder " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 13/28] x86/fpu/xstate: Update the xstate context copy function " Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 14/28] x86/fpu/xstate: Prevent unauthorised use of dynamic user state Chang S. Bae
2021-06-16 16:17   ` Dave Hansen
2021-06-16 16:27   ` Dave Hansen
2021-06-16 18:12     ` Andy Lutomirski
2021-06-16 18:47       ` Bae, Chang Seok
2021-06-16 19:01         ` Dave Hansen
2021-06-16 19:23           ` Bae, Chang Seok
2021-06-16 19:28             ` Dave Hansen
2021-06-16 19:37               ` Bae, Chang Seok
2021-06-28 10:11               ` Liu, Jing2
2021-06-29 17:43           ` Bae, Chang Seok
2021-06-29 17:54             ` Dave Hansen
2021-06-29 18:35               ` Bae, Chang Seok
2021-06-29 18:50                 ` Dave Hansen
2021-06-29 19:13                   ` Bae, Chang Seok
2021-06-29 19:26                     ` Dave Hansen
2021-05-23 19:32 ` [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE Chang S. Bae
2021-05-24 23:10   ` Len Brown
2021-05-25 17:27     ` Borislav Petkov
2021-05-25 17:33       ` Dave Hansen
2021-05-26  0:38     ` Len Brown
2021-05-27 11:14       ` second, sync-alloc syscall Borislav Petkov
2021-05-27 13:59         ` Len Brown
2021-05-27 19:35           ` Andy Lutomirski
2021-05-25 15:46   ` [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE Dave Hansen
2021-05-23 19:32 ` [PATCH v5 16/28] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 17/28] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 18/28] x86/fpu/xstate: Disable xstate support if an inconsistent state is detected Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 19/28] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 20/28] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 21/28] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-05-24  3:09   ` Andy Lutomirski
2021-05-24 17:37     ` Len Brown
2021-05-24 18:13       ` Andy Lutomirski
2021-05-24 18:21         ` Len Brown
2021-05-25  3:44           ` Andy Lutomirski
2021-05-23 19:32 ` [PATCH v5 22/28] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 23/28] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 24/28] x86/fpu/xstate: Use per-task xstate mask for saving xstate in signal frame Chang S. Bae
2021-05-24  3:15   ` Andy Lutomirski
2021-05-24 18:06     ` Len Brown
2021-05-25  4:47       ` Andy Lutomirski
2021-05-25 14:04         ` Len Brown
2021-05-23 19:32 ` [PATCH v5 25/28] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-05-24  3:25   ` Andy Lutomirski
2021-05-24 18:15     ` Len Brown
2021-05-24 18:29       ` Dave Hansen
2021-05-25  4:46       ` Andy Lutomirski
2021-05-23 19:32 ` [PATCH v5 26/28] selftest/x86/amx: Test case for AMX state copy optimization in signal delivery Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 27/28] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-05-23 19:32 ` [PATCH v5 28/28] x86/fpu/amx: Clear the AMX state when appropriate Chang S. Bae
2021-05-24  3:13   ` Andy Lutomirski
2021-05-24 14:10     ` Dave Hansen
2021-05-24 17:32       ` Len Brown
2021-05-24 17:39         ` Dave Hansen
2021-05-24 18:24           ` Len Brown
2021-05-27 11:56             ` Peter Zijlstra
2021-05-27 14:02               ` Len Brown
2021-05-24 14:06   ` Dave Hansen
2021-05-24 17:34     ` Len Brown
2021-05-24 21:11       ` [PATCH v5-fix " Chang S. Bae

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.