From: "Chang S. Bae" <chang.seok.bae@intel.com>
To: bp@suse.de, luto@kernel.org, tglx@linutronix.de, mingo@kernel.org, x86@kernel.org
Cc: len.brown@intel.com, dave.hansen@intel.com, jing2.liu@intel.com, ravi.v.shankar@intel.com, linux-kernel@vger.kernel.org, chang.seok.bae@intel.com
Subject: [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions
Date: Wed, 23 Dec 2020 07:56:56 -0800
Message-ID: <20201223155717.19556-1-chang.seok.bae@intel.com>

Intel Advanced Matrix Extensions (AMX)[1][2] will be shipping on servers soon. AMX consists of configurable TMM "TILE" registers plus new accelerator instructions that operate on them. TMUL (Tile matrix MULtiply) is the first accelerator instruction set to use the new registers, and we anticipate additional instructions in the future.

Neither AMX state nor TMUL instructions depend on AVX. However, AMX and AVX do share common challenges. The TMM registers are 8KB today, and architecturally as large as 64KB, which merits updates to hardware and software state management.

Further, both technologies run faster when they are not simultaneously running on SMT siblings, and both technologies' use of power and bandwidth impacts the power and performance available to neighboring cores. (This impact has measurably improved in recent hardware.)

If the existing kernel approach for managing XSAVE state were employed to handle AMX, 8KB of space would be added to every task, even though it might rarely be used. So Linux support is optimized by using a new XSAVE feature: eXtended Feature Disabling (XFD). The kernel arms XFD to raise a #NM exception upon a task's first access to TILE state. The kernel exception handler installs the appropriate XSAVE context switch buffer, and the task behaves as if the kernel had done that for all tasks. Using XFD, AMX space is allocated only when needed, eliminating the memory waste for unused state components.
This series requires the new minimum sigaltstack support [3] and is based on mainline.

The series is composed of three parts:
* Patch 01-14: Foundation to support dynamic user state management
* Patch 15-19: AMX enablement, including unit tests
* Patch 20-21: Signal handling optimization and new boot-parameters

Thanks to Len Brown and Dave Hansen for help with the cover letter.

Changes from v2 [5]:
* Removed the patch for the tile data inheritance. Also, updated the selftest patch. (Andy Lutomirski)
* Changed to taint the kernel when any unknown state is enabled. (Andy Lutomirski)
* Changed to use the XFD feature only when the compacted format is in use.
* Improved the test code.
* Simplified the cmdline handling.
* Removed 'task->fpu' in changelogs. (Boris Petkov)
* Updated the variable names, comments, and changelogs for clarification.

Changes from v1 [4]:
* Added vmalloc() error tracing (Dave Hansen, PeterZ, and Andy Lutomirski)
* Inlined the #NM handling code (Andy Lutomirski)
* Made signal handling optimization revertible
* Revised the new parameter handling code (Andy Lutomirski and Dave Hansen)
* Rebased on the upstream kernel

[1]: Intel Architecture Instruction Set Extension Programming Reference, October 2020, https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
[2]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-matrix-extensions-intel-amx-instructions.html
[3]: https://lore.kernel.org/lkml/20201223015312.4882-1-chang.seok.bae@intel.com/
[4]: https://lore.kernel.org/lkml/20201001203913.9125-1-chang.seok.bae@intel.com/
[5]: https://lore.kernel.org/lkml/20201119233257.2939-1-chang.seok.bae@intel.com/

Chang S. Bae (21):
  x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers
  x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
  x86/fpu/xstate: Modify address finders to handle both static and dynamic buffers
  x86/fpu/xstate: Modify context switch helpers to handle both static and dynamic buffers
  x86/fpu/xstate: Add a new variable to indicate dynamic user states
  x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  x86/fpu/xstate: Define the scope of the initial xstate data
  x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access
  x86/fpu/xstate: Update xstate save function to support dynamic xstate
  x86/fpu/xstate: Update xstate buffer address finder to support dynamic xstate
  x86/fpu/xstate: Update xstate context copy function to support dynamic buffer
  x86/fpu/xstate: Expand dynamic context switch buffer on first use
  x86/fpu/xstate: Support ptracer-induced xstate buffer expansion
  x86/fpu/xstate: Extend the table to map xstate components with features
  x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits
  x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  selftest/x86/amx: Include test cases for the AMX state management
  x86/fpu/xstate: Support dynamic user state in the signal handling path
  x86/fpu/xstate: Introduce boot-parameters to control some state component support

 .../admin-guide/kernel-parameters.txt |  15 +
 arch/x86/include/asm/cpufeatures.h    |   4 +
 arch/x86/include/asm/fpu/internal.h   |  97 ++-
 arch/x86/include/asm/fpu/types.h      |  62 +-
 arch/x86/include/asm/fpu/xstate.h     |  61 +-
 arch/x86/include/asm/msr-index.h      |   2 +
 arch/x86/include/asm/pgtable.h        |   2 +-
 arch/x86/include/asm/processor.h      |  10 +-
 arch/x86/include/asm/trace/fpu.h      |  11 +-
 arch/x86/kernel/cpu/common.c          |   2 +-
 arch/x86/kernel/cpu/cpuid-deps.c      |   4 +
 arch/x86/kernel/fpu/core.c            |  50 +-
 arch/x86/kernel/fpu/init.c            | 103 ++-
 arch/x86/kernel/fpu/regset.c          |  65 +-
 arch/x86/kernel/fpu/signal.c          |  40 +-
 arch/x86/kernel/fpu/xstate.c          | 481 ++++++++++--
 arch/x86/kernel/process.c             |  11 +
 arch/x86/kernel/process_32.c          |   2 +-
 arch/x86/kernel/process_64.c          |   2 +-
 arch/x86/kernel/traps.c               |  40 +
 arch/x86/kvm/x86.c                    |  43 +-
 arch/x86/mm/pkeys.c                   |   2 +-
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/amx.c     | 743 ++++++++++++++++++
 24 files changed, 1631 insertions(+), 223 deletions(-)
 create mode 100644 tools/testing/selftests/x86/amx.c

--
2.17.1