linux-man.vger.kernel.org archive mirror
* [PATCH v4 0/4] Documentation/x86: Improve the AMX documentation
@ 2022-09-09 20:15 Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features Chang S. Bae
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Chang S. Bae @ 2022-09-09 20:15 UTC (permalink / raw)
  To: x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel, chang.seok.bae

Hi all,

Here are the changes from the last version [1]:
* Add the motivation for the dynamic feature enabling.
* Add the description for the guest options back.

Thank you Tony for the review.

=== Cover Letter ===

When the AMX feature was supported in Linux, the dynamic feature enabling
model was introduced. The following documentation changes were considered
to help users to adopt this new enabling model:

(1) The AMX-enabling code example is expected to clarify the steps.
(2) Along with that, a couple of ABI constants may be useful for users.
(3) Also, describing the motivation will provide the context for this.
(4) Lastly, the description of the new guest options is added, as it was missing.

If they are acceptable, the arch_prctl(2) manual page [2] will be followed
up separately with something similar to the kernel documentation.

These patches can also be found in this repository:
  git://github.com/intel/amx-linux.git amx-doc

And the compiled preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel/amx-linux/doc-web/x86/xstate.html

Thanks,
Chang

[1] https://lore.kernel.org/lkml/20220711171347.27309-1-chang.seok.bae@intel.com/
[2] arch_prctl(2): https://man7.org/linux/man-pages/man2/arch_prctl.2.html

Chang S. Bae (4):
  Documentation/x86: Explain the purpose for dynamic features
  x86/arch_prctl: Add AMX feature numbers as ABI constants
  Documentation/x86: Add the AMX enabling example
  Documentation/x86: Explain the state component permission for guests

 Documentation/x86/xstate.rst      | 98 +++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/prctl.h |  3 +
 2 files changed, 101 insertions(+)


base-commit: 132bde89b5234d0ca8909775b354c48b214e1abc
-- 
2.17.1



* [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features
  2022-09-09 20:15 [PATCH v4 0/4] Documentation/x86: Improve the AMX documentation Chang S. Bae
@ 2022-09-09 20:15 ` Chang S. Bae
  2022-09-09 21:36   ` Dave Hansen
  2022-09-09 20:15 ` [PATCH v4 2/4] x86/arch_prctl: Add AMX feature numbers as ABI constants Chang S. Bae
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Chang S. Bae @ 2022-09-09 20:15 UTC (permalink / raw)
  To: x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel, chang.seok.bae

This summary will help to guide the proper use of the enabling model.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
Changes from v3:
* Add as a new patch (Tony Luck).
---
 Documentation/x86/xstate.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
index 5cec7fb558d6..2577b28ad942 100644
--- a/Documentation/x86/xstate.rst
+++ b/Documentation/x86/xstate.rst
@@ -11,6 +11,20 @@ are enabled by XCR0 as well, but the first use of related instruction is
 trapped by the kernel because by default the required large XSTATE buffers
 are not allocated automatically.
 
+The purpose for dynamic features
+--------------------------------
+
+ - Legacy userspace libraries have hard-coded sizes for an alternate signal
+   stack. With the arch_prctl() options, the signal frame beyond AVX-512
+   and PKRU will not be written by old programs as they are prevented from
+   using dynamic features. Then, the small signal stack will be compatible
+   on systems that support dynamic features.
+
+ - Modern server systems are consolidating more applications to share the
+   CPU resource. The risk of applications interfering with each other is
+   growing. The controllability on the resource trends to be more
+   warranted. Thus, this permission mechanism will be useful for that.
+
 Using dynamically enabled XSTATE features in user space applications
 --------------------------------------------------------------------
 
-- 
2.17.1



* [PATCH v4 2/4] x86/arch_prctl: Add AMX feature numbers as ABI constants
  2022-09-09 20:15 [PATCH v4 0/4] Documentation/x86: Improve the AMX documentation Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features Chang S. Bae
@ 2022-09-09 20:15 ` Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 3/4] Documentation/x86: Add the AMX enabling example Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 4/4] Documentation/x86: Explain the state component permission for guests Chang S. Bae
  3 siblings, 0 replies; 7+ messages in thread
From: Chang S. Bae @ 2022-09-09 20:15 UTC (permalink / raw)
  To: x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel, chang.seok.bae

AMX state is dynamically enabled by the architecture-specific prctl().
Expose the state components as ABI constants. They are handy because users
no longer need to look them up in the architecture specification.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Add as a new patch (Tony Luck).
---
 arch/x86/include/uapi/asm/prctl.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..f298c778f856 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -16,6 +16,9 @@
 #define ARCH_GET_XCOMP_GUEST_PERM	0x1024
 #define ARCH_REQ_XCOMP_GUEST_PERM	0x1025
 
+#define ARCH_XCOMP_TILECFG		17
+#define ARCH_XCOMP_TILEDATA		18
+
 #define ARCH_MAP_VDSO_X32		0x2001
 #define ARCH_MAP_VDSO_32		0x2002
 #define ARCH_MAP_VDSO_64		0x2003
-- 
2.17.1



* [PATCH v4 3/4] Documentation/x86: Add the AMX enabling example
  2022-09-09 20:15 [PATCH v4 0/4] Documentation/x86: Improve the AMX documentation Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 2/4] x86/arch_prctl: Add AMX feature numbers as ABI constants Chang S. Bae
@ 2022-09-09 20:15 ` Chang S. Bae
  2022-09-09 20:15 ` [PATCH v4 4/4] Documentation/x86: Explain the state component permission for guests Chang S. Bae
  3 siblings, 0 replies; 7+ messages in thread
From: Chang S. Bae @ 2022-09-09 20:15 UTC (permalink / raw)
  To: x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel, chang.seok.bae

Explain steps to enable the dynamic feature with a code example.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
Changes from v1:
* Update the description without mentioning CPUID & XGETBV (Dave Hansen).

Changes from v2:
* Massage sentences (Bagas Sanjaya).
* Adjust the example with the (future) prctl.h.
---
 Documentation/x86/xstate.rst | 55 ++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
index 2577b28ad942..f7aad2241d32 100644
--- a/Documentation/x86/xstate.rst
+++ b/Documentation/x86/xstate.rst
@@ -78,6 +78,61 @@ the handler allocates a larger xstate buffer for the task so the large
 state can be context switched. In the unlikely cases that the allocation
 fails, the kernel sends SIGSEGV.
 
+AMX TILE_DATA enabling example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Below is an example of how userspace applications enable
+TILE_DATA dynamically:
+
+  1. The application first needs to query the kernel for AMX
+     support::
+
+        #include <asm/prctl.h>
+        #include <sys/syscall.h>
+        #include <stdio.h>
+        #include <unistd.h>
+
+        #ifndef ARCH_GET_XCOMP_SUPP
+        #define ARCH_GET_XCOMP_SUPP  0x1021
+        #endif
+
+        #ifndef ARCH_XCOMP_TILECFG
+        #define ARCH_XCOMP_TILECFG   17
+        #endif
+
+        #ifndef ARCH_XCOMP_TILEDATA
+        #define ARCH_XCOMP_TILEDATA  18
+        #endif
+
+        #define MASK_XCOMP_TILE      ((1 << ARCH_XCOMP_TILECFG) | \
+                                      (1 << ARCH_XCOMP_TILEDATA))
+
+        unsigned long features;
+        long rc;
+
+        ...
+
+        rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features);
+
+        if (!rc && (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE)
+            printf("AMX is available.\n");
+
+  2. After determining support for AMX, an application must
+     explicitly ask permission to use it::
+
+        #ifndef ARCH_REQ_XCOMP_PERM
+        #define ARCH_REQ_XCOMP_PERM  0x1023
+        #endif
+
+        ...
+
+        rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA);
+
+        if (!rc)
+            printf("AMX is ready for use.\n");
+
+Note this example does not include the sigaltstack preparation.
+
 Dynamic features in signal frames
 ---------------------------------
 
-- 
2.17.1
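
For readers who want the two steps from the example in this patch in one
place, here is a minimal sketch that folds the support query and the
permission request into a single helper. The option and feature numbers are
the ones quoted in this thread. The helper names amx_in_mask() and
amx_enable(), the errno values chosen for the failure paths, and the
SYS_arch_prctl guard for non-x86 builds are illustrative assumptions, not
part of the patch. Like the original, it leaves out the sigaltstack
preparation.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Numbers as quoted in this thread; defined here so the sketch also
 * builds against kernel headers that predate them. */
#ifndef ARCH_GET_XCOMP_SUPP
#define ARCH_GET_XCOMP_SUPP  0x1021
#endif
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM  0x1023
#endif
#ifndef ARCH_XCOMP_TILECFG
#define ARCH_XCOMP_TILECFG   17
#endif
#ifndef ARCH_XCOMP_TILEDATA
#define ARCH_XCOMP_TILEDATA  18
#endif

#define MASK_XCOMP_TILE ((1UL << ARCH_XCOMP_TILECFG) | \
                         (1UL << ARCH_XCOMP_TILEDATA))

/* Hypothetical helper: true only if both AMX state components are set. */
int amx_in_mask(unsigned long features)
{
    return (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE;
}

/* Hypothetical helper: returns 0 once permission for TILE_DATA has been
 * granted, or -1 with errno set otherwise. */
int amx_enable(void)
{
#ifdef SYS_arch_prctl
    unsigned long features = 0;

    /* Step 1: ask the kernel which state components it supports. */
    if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features))
        return -1;

    if (!amx_in_mask(features)) {
        errno = ENOTSUP;        /* assumption: our own error choice */
        return -1;
    }

    /* Step 2: explicitly request permission to use TILE_DATA. */
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA))
        return -1;

    return 0;
#else
    errno = ENOSYS;             /* not an x86 build: nothing to query */
    return -1;
#endif
}
```

A caller would invoke amx_enable() once, early in the process, and fall
back to a non-AMX code path when it returns -1.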



* [PATCH v4 4/4] Documentation/x86: Explain the state component permission for guests
  2022-09-09 20:15 [PATCH v4 0/4] Documentation/x86: Improve the AMX documentation Chang S. Bae
                   ` (2 preceding siblings ...)
  2022-09-09 20:15 ` [PATCH v4 3/4] Documentation/x86: Add the AMX enabling example Chang S. Bae
@ 2022-09-09 20:15 ` Chang S. Bae
  3 siblings, 0 replies; 7+ messages in thread
From: Chang S. Bae @ 2022-09-09 20:15 UTC (permalink / raw)
  To: x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel, chang.seok.bae

Commit 980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest
permissions") extends a couple of arch_prctl(2) options for VCPU threads.
Add description for them.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
Changes from v1:
* Add the reason for the guest options (Dave Hansen).
* Add a note to allude to some VMM policy, i.e. KVM_X86_XCOMP_GUEST_SUPP.
* Move it in the separate section.

Note that the corresponding attributes were also proposed for the KVM API,
but that was seen as inessential:
    https://lore.kernel.org/lkml/20220823231402.7839-1-chang.seok.bae@intel.com/
---
 Documentation/x86/xstate.rst | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
index f7aad2241d32..fd7b5333bd70 100644
--- a/Documentation/x86/xstate.rst
+++ b/Documentation/x86/xstate.rst
@@ -141,3 +141,32 @@ entry if the feature is in its initial configuration.  This differs from
 non-dynamic features which are always written regardless of their
 configuration.  Signal handlers can examine the XSAVE buffer's XSTATE_BV
 field to determine if a features was written.
+
+Dynamic features for virtual machines
+-------------------------------------
+
+The permission for the guest state component needs to be managed separately
+from the host, as they are exclusive to each other. A couple of options
+are extended to control the guest permission:
+
+-ARCH_GET_XCOMP_GUEST_PERM
+
+ arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &features);
+
+ ARCH_GET_XCOMP_GUEST_PERM is a variant of ARCH_GET_XCOMP_PERM. So it
+ provides the same semantics and functionality but for the guest
+ components.
+
+-ARCH_REQ_XCOMP_GUEST_PERM
+
+ arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, feature_nr);
+
+ ARCH_REQ_XCOMP_GUEST_PERM is a variant of ARCH_REQ_XCOMP_PERM. It has the
+ same semantics for the guest permission. While providing a similar
+ functionality, this comes with a constraint. Permission is frozen when the
+ first VCPU is created. Any attempt to change permission after that point
+ is going to be rejected. So, the permission has to be requested before the
+ first VCPU creation.
+
+Note that some VMMs may have already established a set of supported state
+components. These options are not presumed to support any particular VMM.
-- 
2.17.1
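
The ordering constraint described in this patch (permission is frozen once
the first VCPU exists) can be sketched from the VMM side roughly as follows.
The option numbers come from the prctl.h hunk quoted earlier in this thread.
The helper names, the EPERM choice, and the SYS_arch_prctl guard are
illustrative assumptions, and no KVM ioctl is shown: the point is only that
the request has to happen before any VCPU is created.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Numbers as quoted in this thread. */
#ifndef ARCH_GET_XCOMP_GUEST_PERM
#define ARCH_GET_XCOMP_GUEST_PERM  0x1024
#endif
#ifndef ARCH_REQ_XCOMP_GUEST_PERM
#define ARCH_REQ_XCOMP_GUEST_PERM  0x1025
#endif
#ifndef ARCH_XCOMP_TILEDATA
#define ARCH_XCOMP_TILEDATA        18
#endif

/* Hypothetical helper: is feature number nr set in the bitmap? */
int guest_perm_granted(unsigned long features, unsigned int nr)
{
    return (int)((features >> nr) & 1UL);
}

/* Hypothetical helper: call this before creating the first VCPU.
 * Returns 0 if the guest permission for TILE_DATA is in place,
 * or -1 with errno set otherwise. */
int request_guest_amx(void)
{
#ifdef SYS_arch_prctl
    unsigned long features = 0;

    /* Request the guest permission while it can still be changed. */
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
                ARCH_XCOMP_TILEDATA))
        return -1;

    /* Read the guest permission bitmap back to confirm the result. */
    if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &features))
        return -1;

    if (!guest_perm_granted(features, ARCH_XCOMP_TILEDATA)) {
        errno = EPERM;          /* assumption: our own error choice */
        return -1;
    }
    return 0;
#else
    errno = ENOSYS;             /* not an x86 build */
    return -1;
#endif
}
```

A VMM following this sketch would call request_guest_amx() during startup
and only then go on to create VCPUs.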



* Re: [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features
  2022-09-09 20:15 ` [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features Chang S. Bae
@ 2022-09-09 21:36   ` Dave Hansen
  2022-09-14  5:25     ` Chang S. Bae
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2022-09-09 21:36 UTC (permalink / raw)
  To: Chang S. Bae, x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel

On 9/9/22 13:15, Chang S. Bae wrote:
> +The purpose for dynamic features
> +--------------------------------
> +
> + - Legacy userspace libraries have hard-coded sizes for an alternate signal
> +   stack. With the arch_prctl() options, the signal frame beyond AVX-512
> +   and PKRU will not be written by old programs as they are prevented from
> +   using dynamic features. Then, the small signal stack will be compatible
> +   on systems that support dynamic features.

This doesn't really ever broach the _problem_ that dynamic features solve.

	Legacy userspace libraries often have hard-coded, static sizes
	for alternate signal stacks, often using MINSIGSTKSZ which is
	typically 2k.  That stack must be able to store at *least*
	the signal frame that the kernel sets up before jumping into
	the signal handler.  That signal frame must include an XSAVE
	buffer defined by the CPU.

	However, that means that the size of signal stacks is dynamic,
	not static, because different CPUs have differently-sized XSAVE
	buffers.  Those old <=2k buffers are now too small for new CPU
	features like AVX-512, which is causing stack overflows at
	signal entry.


> + - Modern server systems are consolidating more applications to share the
> +   CPU resource.

I'm not sure what this means.  Are you saying that CPU time is more
overcommitted?  Or that different users are more likely to be sharing
the same CPU core?  Or, is this trying to allude to the frequency
penalties that cores (and even packages) pay for using features like
AVX-512?

> The risk of applications interfering with each other is
> +   growing. The controllability on the resource trends to be more
> +   warranted. Thus, this permission mechanism will be useful for that.

Should this be something more like:

Historically, a CPU shared very few resources with its neighbors outside
of caches.  A CPU could execute whatever instructions it wanted without
impacting other CPUs.  Also, there were minimal long-lasting temporal
effects; an application that preceded yours running on a CPU would not
impact how your application runs.

That model has been eroding, first with SMT where multiple logical CPUs
share a core's resources.  Then, with features like AVX-512 that have a
frequency and thermal impact which can last even after AVX-512 use
ceases and have an impact wider than a single core.

In other words, it has become easier to be a "noisy neighbor".

Dynamic features allow the kernel to limit applications' ability to become
noisy neighbors in the first place.


* Re: [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features
  2022-09-09 21:36   ` Dave Hansen
@ 2022-09-14  5:25     ` Chang S. Bae
  0 siblings, 0 replies; 7+ messages in thread
From: Chang S. Bae @ 2022-09-14  5:25 UTC (permalink / raw)
  To: Dave Hansen, x86, tglx, mingo, bp, dave.hansen
  Cc: hpa, corbet, bagasdotme, tony.luck, yang.zhong, linux-doc,
	linux-man, linux-kernel

On 9/9/2022 2:36 PM, Dave Hansen wrote:
> On 9/9/22 13:15, Chang S. Bae wrote:
>> +The purpose for dynamic features
>> +--------------------------------
>> +
>> + - Legacy userspace libraries have hard-coded sizes for an alternate signal
>> +   stack. With the arch_prctl() options, the signal frame beyond AVX-512
>> +   and PKRU will not be written by old programs as they are prevented from
>> +   using dynamic features. Then, the small signal stack will be compatible
>> +   on systems that support dynamic features.
> 
> This doesn't really ever broach the _problem_ that dynamic features solve.
> 
> 	Legacy userspace libraries often have hard-coded, static sizes
> 	for alternate signal stacks, often using MINSIGSTKSZ which is
> 	typically 2k.  That stack must be able to store at *least*
> 	the signal frame that the kernel sets up before jumping into
> 	the signal handler.  That signal frame must include an XSAVE
> 	buffer defined by the CPU.
>
> 	However, that means that the size of signal stacks is dynamic,
> 	not static, because different CPUs have differently-sized XSAVE
> 	buffers.  

Yes, it was missing some points like:
* The buffer size is dynamic.
* And it depends on the CPU.

> 	Those old <=2k buffers are now too small for new CPU
> 	features like AVX-512, which is causing stack overflows at
> 	signal entry.

FWIW, some details are worth noting:
* Today's kernel prevents the overflow with commit 2beb4a53fc3f
   ("x86/signal: Detect and prevent an alternate signal stack overflow").
* On sigaltstack(), it also rejects a too-small altstack with
   the CONFIG_STRICT_SIGALTSTACK_SIZE option [2].

But, then I think AVX-512 is kind of unrelated here as it is not a 
dynamic feature. Maybe, something like:

     A compiled-in size of 2KB in existing applications is too small
     for new CPU features like AMX. Instead of universally requiring a
     larger stack, dynamic enabling can selectively require programs
     to have properly-sized altstacks.

>> + - Modern server systems are consolidating more applications to share the
>> +   CPU resource.
> 
> I'm not sure what this means.  Are you saying that CPU time is more
> overcommitted?  Or that different users are more likely to be sharing
> the same CPU core?  Or, is this trying to allude to the frequency
> penalties that cores (and even packages) pay for using features like
> AVX-512?

Sorry, this point was too sketchy. Clarifying the problem may help, but
the problem is hardly related to what this solution addresses.

AVX-512 use has proliferated, especially in userspace libraries. Then
notable side effects like the frequency drop were observed. But it is
unclear how this dynamic enabling can prevent library code from
enabling those features.

>> The risk of applications interfering with each other is
>> +   growing. The controllability on the resource trends to be more
>> +   warranted. Thus, this permission mechanism will be useful for that.
> 
> Should this be something more like:
> 
> Historically, a CPU shared very few resources with its neighbors outside
> of caches.  A CPU could execute whatever instructions it wanted without
> impacting other CPUs.  Also, there were minimal long-lasting temporal
> effects; an application that preceded yours running on a CPU would not
> impact how your application runs.
> 
> That model has been eroding, first with SMT where multiple logical CPUs
> share a core's resources.  Then, with features like AVX-512 that have a
> frequency and thermal impact which can last even after AVX-512 use
> ceases and have an impact wider than a single core.
> 
> In other words, it has become easier to be a "noisy neighbor".
> 
> Dynamic features allow the kernel to limit applications' ability to become
> noisy neighbors in the first place.

Yeah, but it looks to be less relevant than the coscheduling mechanism 
as the solution for this. Maybe I'm missing something here.

I'd step back from this second point until I find a case where it solves
some other problem.

Thanks,
Chang

