* [RFC PATCH 0/7] AMX support in Qemu
@ 2022-01-07  9:31 Yang Zhong
  2022-01-07  9:31 ` [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state Yang Zhong
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

Intel introduces the Advanced Matrix Extensions (AMX) [1] feature, which
consists of configurable two-dimensional "TILE" registers and new
accelerator instructions that operate on them. TMUL (Tile matrix
MULtiply) is the first accelerator instruction set to use the new
registers.

This series is based on the AMX KVM series [2] and exposes the AMX
feature to the guest (the detailed design discussions can be found in [3]).

According to the KVM design, the userspace VMM (e.g. Qemu) is expected
to request guest permission for the dynamically-enabled XSAVE features
only once when the first vCPU is created, and KVM checks guest permission
in KVM_SET_CPUID2.

Intel AMX is XSAVE-supported and XSAVE-enabled. These extended features
have large state, while the current struct kvm_xsave only allows 4KB.
The AMX KVM series extends struct kvm_xsave to meet this requirement and
adds an extra KVM_GET_XSAVE2 ioctl to handle the extended features. In
our testing, AMX live migration works well.
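For illustration, the buffer sizing this implies can be sketched as a
small standalone helper (not QEMU code; the constant name here is
illustrative):

```c
#include <stdint.h>

/* Sketch: KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) returns the required
 * xsave buffer size (0 on kernels without the capability); fall back
 * to the legacy 4KB struct kvm_xsave and round up to a 4KB multiple. */
#define LEGACY_KVM_XSAVE_SIZE 4096u

static inline uint32_t xsave_buf_len(uint32_t cap_xsave2_ret)
{
    uint32_t size = cap_xsave2_ret ? cap_xsave2_ret
                                   : LEGACY_KVM_XSAVE_SIZE;
    return (size + 4095u) & ~4095u;   /* like QEMU_ALIGN_UP(size, 4096) */
}
```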

Notice: this version still includes some definitions in the
linux-headers; once AMX KVM is merged and Qemu syncs those
linux-headers, I will remove these definitions, so please ignore
those changes.

[1] Intel Architecture Instruction Set Extension Programming Reference
    https://software.intel.com/content/dam/develop/external/us/en/documents/\
    architecture-instruction-set-extensions-programming-reference.pdf
[2] https://www.spinics.net/lists/kvm/msg263577.html
[3] https://www.spinics.net/lists/kvm/msg259015.html

Thanks,
Yang
----

Jing Liu (5):
  x86: Fix the 64-byte boundary enumeration for extended state
  x86: Add AMX XTILECFG and XTILEDATA components
  x86: Add XFD faulting bit for state components
  x86: Add AMX CPUIDs enumeration
  x86: Use new XSAVE ioctls handling

Yang Zhong (1):
  x86: Grant AMX permission for guest

Zeng Guang (1):
  x86: Support XFD and AMX xsave data migration

 linux-headers/asm-x86/kvm.h | 14 ++++++++
 linux-headers/linux/kvm.h   |  2 ++
 target/i386/cpu.h           | 40 ++++++++++++++++++++++-
 hw/i386/x86.c               | 28 ++++++++++++++++
 target/i386/cpu.c           | 64 +++++++++++++++++++++++++++++++++++--
 target/i386/kvm/kvm-cpu.c   |  4 +++
 target/i386/kvm/kvm.c       | 37 +++++++++++++++++++--
 target/i386/machine.c       | 42 ++++++++++++++++++++++++
 target/i386/xsave_helper.c  | 35 ++++++++++++++++++++
 9 files changed, 259 insertions(+), 7 deletions(-)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  2022-01-10  8:20   ` Tian, Kevin
  2022-01-07  9:31 ` [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components Yang Zhong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

From: Jing Liu <jing2.liu@intel.com>

The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
are currently all zero, while the spec actually defines that bit 1
should indicate whether the extended state component is located
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Fix the subleaf values according to the host-supported
CPUID. The upcoming AMX feature will be the first one
to use it.
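The bit handling this patch adds can be sketched as follows (a
standalone sketch, not the patch itself; function names are
illustrative):

```c
#include <stdint.h>

/* Sketch: derive the ExtSaveArea need_align flag from the host's
 * CPUID(EAX=0Dh, ECX=n).ECX, and re-encode it into the guest's view
 * of ECX, mirroring the two hunks below. */
static inline uint32_t need_align_from_host(uint32_t host_ecx)
{
    return (host_ecx & (1u << 1)) ? 1 : 0;   /* bit 1: 64-byte alignment */
}

static inline uint32_t guest_subleaf_ecx(uint32_t need_align)
{
    return need_align << 1;                  /* place it back in bit 1 */
}
```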

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 target/i386/cpu.h         | 1 +
 target/i386/cpu.c         | 1 +
 target/i386/kvm/kvm-cpu.c | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 04f2b790c9..7f9700544f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1354,6 +1354,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 typedef struct ExtSaveArea {
     uint32_t feature, bits;
     uint32_t offset, size;
+    uint32_t need_align;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e636800..47bc4d5c1a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5487,6 +5487,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 const ExtSaveArea *esa = &x86_ext_save_areas[count];
                 *eax = esa->size;
                 *ebx = esa->offset;
+                *ecx = esa->need_align << 1;
             }
         }
         break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index d95028018e..6c4c1c6f9d 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -105,6 +105,9 @@ static void kvm_cpu_xsave_init(void)
                 assert(esa->size == sz);
                 esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
             }
+
+            uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
+            esa->need_align = ecx & (1u << 1) ? 1 : 0;
         }
     }
 }



* [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
  2022-01-07  9:31 ` [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  2022-01-10  8:23   ` Tian, Kevin
  2022-01-07  9:31 ` [RFC PATCH 3/7] x86: Grant AMX permission for guest Yang Zhong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

From: Jing Liu <jing2.liu@intel.com>

AMX XTILECFG and XTILEDATA are managed by the XSAVE feature
set. State component 17 is used for the 64-byte TILECFG register
(XTILECFG state) and component 18 is used for the 8192 bytes
of tile data (XTILEDATA state).

Add the AMX feature bits to the x86_ext_save_areas array to set
up the AMX components. Add structs that define the layout of
the AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
struct sizes.
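The two layouts and their size checks can be reproduced standalone
(using plain _Static_assert in place of QEMU_BUILD_BUG_ON):

```c
#include <stdint.h>

/* Sketch of the two new save-area layouts added by this patch. */
typedef struct XSaveXTILE_CFG {
    uint8_t xtilecfg[64];          /* component 17: 64-byte TILECFG */
} XSaveXTILE_CFG;

typedef struct XSaveXTILE_DATA {
    uint8_t xtiledata[8][1024];    /* component 18: 8 tiles x 1KB each */
} XSaveXTILE_DATA;

_Static_assert(sizeof(XSaveXTILE_CFG) == 0x40, "XTILECFG is 64 bytes");
_Static_assert(sizeof(XSaveXTILE_DATA) == 0x2000, "XTILEDATA is 8KB");
```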

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 target/i386/cpu.h | 16 +++++++++++++++-
 target/i386/cpu.c |  8 ++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7f9700544f..768a8218be 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -537,6 +537,8 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT            6
 #define XSTATE_Hi16_ZMM_BIT             7
 #define XSTATE_PKRU_BIT                 9
+#define XSTATE_XTILE_CFG_BIT            17
+#define XSTATE_XTILE_DATA_BIT           18
 
 #define XSTATE_FP_MASK                  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK                 (1ULL << XSTATE_SSE_BIT)
@@ -1343,6 +1345,16 @@ typedef struct XSavePKRU {
     uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 17: AMX XTILECFG state */
+typedef struct XSaveXTILE_CFG {
+    uint8_t xtilecfg[64];
+} XSaveXTILE_CFG;
+
+/* Ext. save area 18: AMX XTILEDATA state */
+typedef struct XSaveXTILE_DATA {
+    uint8_t xtiledata[8][1024];
+} XSaveXTILE_DATA;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1350,6 +1362,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
 QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_CFG) != 0x40);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
 
 typedef struct ExtSaveArea {
     uint32_t feature, bits;
@@ -1357,7 +1371,7 @@ typedef struct ExtSaveArea {
     uint32_t need_align;
 } ExtSaveArea;
 
-#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
+#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
 
 extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 47bc4d5c1a..dd2c919c33 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1401,6 +1401,14 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
     [XSTATE_PKRU_BIT] =
           { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
             .size = sizeof(XSavePKRU) },
+    [XSTATE_XTILE_CFG_BIT] = {
+        .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+        .size = sizeof(XSaveXTILE_CFG),
+    },
+    [XSTATE_XTILE_DATA_BIT] = {
+        .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+        .size = sizeof(XSaveXTILE_DATA),
+    },
 };
 
 static uint32_t xsave_area_size(uint64_t mask)



* [RFC PATCH 3/7] x86: Grant AMX permission for guest
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
  2022-01-07  9:31 ` [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state Yang Zhong
  2022-01-07  9:31 ` [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  2022-01-10  8:36   ` Tian, Kevin
  2022-01-18 12:52   ` Paolo Bonzini
  2022-01-07  9:31 ` [RFC PATCH 4/7] x86: Add XFD faulting bit for state components Yang Zhong
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

The kernel mechanism for dynamically enabled XSAVE features
asks the userspace VMM to request guest permission if it wants
to expose the features. Only with that permission can the kernel
try to enable the features when it detects the intention
from the guest at runtime.

Qemu should request the permission for the guest only once,
before the first vCPU is created. KVM checks the guest
permission when Qemu advertises the features, and the
advertising operation fails without permission.
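The check that x86_xsave_req_perm() performs on the bitmask read back
via ARCH_GET_XCOMP_GUEST_PERM can be sketched as a pure helper (bit
numbers follow the XSTATE_XTILE_* definitions in this patch):

```c
#include <stdint.h>

/* Sketch: after requesting XTILEDATA permission, the returned
 * per-process permission bitmask must cover both AMX components. */
#define XSTATE_XTILE_CFG_BIT   17
#define XSTATE_XTILE_DATA_BIT  18
#define XFEATURE_XTILE_MASK    ((1ULL << XSTATE_XTILE_CFG_BIT) | \
                                (1ULL << XSTATE_XTILE_DATA_BIT))

static inline int amx_permission_granted(uint64_t bitmask)
{
    return (bitmask & XFEATURE_XTILE_MASK) != 0;
}
```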

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
 target/i386/cpu.h |  7 +++++++
 hw/i386/x86.c     | 28 ++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 768a8218be..79023fe723 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -549,6 +549,13 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
+#define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
+#define XFEATURE_XTILE_MASK             (XSTATE_XTILE_CFG_MASK \
+                                         | XSTATE_XTILE_DATA_MASK)
+
+#define ARCH_GET_XCOMP_GUEST_PERM       0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
 
 /* CPUID feature words */
 typedef enum FeatureWord {
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b84840a1bb..0a204c375e 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -41,6 +41,8 @@
 #include "sysemu/cpu-timers.h"
 #include "trace.h"
 
+#include <sys/syscall.h>
+
 #include "hw/i386/x86.h"
 #include "target/i386/cpu.h"
 #include "hw/i386/topology.h"
@@ -117,6 +119,30 @@ out:
     object_unref(cpu);
 }
 
+static void x86_xsave_req_perm(void)
+{
+    unsigned long bitmask;
+
+    long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
+                      XSTATE_XTILE_DATA_BIT);
+    if (rc) {
+        /*
+         * Older kernel versions (< 5.15) don't support
+         * ARCH_REQ_XCOMP_GUEST_PERM, so directly return.
+         */
+        return;
+    }
+
+    rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
+    if (rc) {
+        error_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
+    } else if (!(bitmask & XFEATURE_XTILE_MASK)) {
+        error_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
+                     "and bitmask=0x%lx", bitmask);
+        exit(EXIT_FAILURE);
+    }
+}
+
 void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 {
     int i;
@@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
     MachineState *ms = MACHINE(x86ms);
     MachineClass *mc = MACHINE_GET_CLASS(x86ms);
 
+    /* Request AMX permission for guest */
+    x86_xsave_req_perm();
     x86_cpu_set_default_version(default_cpu_version);
 
     /*



* [RFC PATCH 4/7] x86: Add XFD faulting bit for state components
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
                   ` (2 preceding siblings ...)
  2022-01-07  9:31 ` [RFC PATCH 3/7] x86: Grant AMX permission for guest Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  2022-01-10  8:38   ` Tian, Kevin
  2022-01-18 12:52   ` Paolo Bonzini
  2022-01-07  9:31 ` [RFC PATCH 5/7] x86: Add AMX CPUIDs enumeration Yang Zhong
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

From: Jing Liu <jing2.liu@intel.com>

Intel introduces the XFD faulting mechanism for extended
XSAVE features to dynamically enable the features at
runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
to 1, it indicates support for XFD faulting of this
state component.
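The ECX encoding that cpu_x86_cpuid emits after this patch can be
sketched standalone (bit 1 = 64-byte alignment from patch 1, bit 2 =
XFD faulting support added here; the function name is illustrative):

```c
#include <stdint.h>

/* Sketch: pack the per-component flags back into the guest's view of
 * CPUID(EAX=0Dh, ECX=n).ECX. */
static inline uint32_t xstate_subleaf_ecx(uint32_t need_align,
                                          uint32_t support_xfd)
{
    return (need_align << 1) | (support_xfd << 2);
}
```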

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 target/i386/cpu.h         | 2 +-
 target/i386/cpu.c         | 2 +-
 target/i386/kvm/kvm-cpu.c | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 79023fe723..22f7ff40a6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1375,7 +1375,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
 typedef struct ExtSaveArea {
     uint32_t feature, bits;
     uint32_t offset, size;
-    uint32_t need_align;
+    uint32_t need_align, support_xfd;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index dd2c919c33..1adc3f0f99 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5495,7 +5495,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 const ExtSaveArea *esa = &x86_ext_save_areas[count];
                 *eax = esa->size;
                 *ebx = esa->offset;
-                *ecx = esa->need_align << 1;
+                *ecx = (esa->need_align << 1) | (esa->support_xfd << 2);
             }
         }
         break;
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 6c4c1c6f9d..3b3c203f11 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -108,6 +108,7 @@ static void kvm_cpu_xsave_init(void)
 
             uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
             esa->need_align = ecx & (1u << 1) ? 1 : 0;
+            esa->support_xfd = ecx & (1u << 2) ? 1 : 0;
         }
     }
 }



* [RFC PATCH 5/7] x86: Add AMX CPUIDs enumeration
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
                   ` (3 preceding siblings ...)
  2022-01-07  9:31 ` [RFC PATCH 4/7] x86: Add XFD faulting bit for state components Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  2022-01-07  9:31 ` [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling Yang Zhong
  2022-01-07  9:31 ` [RFC PATCH 7/7] x86: Support XFD and AMX xsave data migration Yang Zhong
  6 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

From: Jing Liu <jing2.liu@intel.com>

Add the AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Meanwhile, add the
AMX TILE and TMUL CPUID leaves and subleaves, which
exist when AMX TILE is present, to provide the maximum
capabilities of TILE and TMUL.
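The register packing for leaf 0x1D subleaf 1 can be sketched from the
INTEL_AMX_* constants this patch introduces (standalone sketch, not
QEMU code):

```c
#include <stdint.h>

/* Sketch: leaf 0x1D subleaf 1 (palette 1) register layout.
 * EAX[15:0] = total tile bytes, EAX[31:16] = bytes per tile;
 * EBX[15:0] = bytes per row,    EBX[31:16] = number of tile names. */
#define INTEL_AMX_TOTAL_TILE_BYTES 0x2000
#define INTEL_AMX_BYTES_PER_TILE   0x400
#define INTEL_AMX_BYTES_PER_ROW    0x40
#define INTEL_AMX_TILE_MAX_NAMES   0x8

static inline uint32_t amx_palette1_eax(void)
{
    return INTEL_AMX_TOTAL_TILE_BYTES | (INTEL_AMX_BYTES_PER_TILE << 16);
}

static inline uint32_t amx_palette1_ebx(void)
{
    return INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
}
```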

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 target/i386/cpu.h     |  2 ++
 target/i386/cpu.c     | 55 ++++++++++++++++++++++++++++++++++++++++---
 target/i386/kvm/kvm.c |  3 ++-
 3 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 22f7ff40a6..245e8b5a1a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -849,6 +849,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16       (1U << 23)
+/* AMX tile (two-dimensional register) */
+#define CPUID_7_0_EDX_AMX_TILE          (1U << 24)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL         (1U << 26)
 /* Single Thread Indirect Branch Predictors */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1adc3f0f99..025e35471f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -574,6 +574,18 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
 
+/* CPUID Leaf 0x1D constants: */
+#define INTEL_AMX_TILE_MAX_SUBLEAF     0x1
+#define INTEL_AMX_TOTAL_TILE_BYTES     0x2000
+#define INTEL_AMX_BYTES_PER_TILE       0x400
+#define INTEL_AMX_BYTES_PER_ROW        0x40
+#define INTEL_AMX_TILE_MAX_NAMES       0x8
+#define INTEL_AMX_TILE_MAX_ROWS        0x10
+
+/* CPUID Leaf 0x1E constants: */
+#define INTEL_AMX_TMUL_MAX_K           0x10
+#define INTEL_AMX_TMUL_MAX_N           0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
                               uint32_t vendor2, uint32_t vendor3)
 {
@@ -843,8 +855,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             "avx512-vp2intersect", NULL, "md-clear", NULL,
             NULL, NULL, "serialize", NULL,
             "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
-            NULL, NULL, NULL, "avx512-fp16",
-            NULL, NULL, "spec-ctrl", "stibp",
+            NULL, NULL, "amx-bf16", "avx512-fp16",
+            "amx-tile", "amx-int8", "spec-ctrl", "stibp",
             NULL, "arch-capabilities", "core-capability", "ssbd",
         },
         .cpuid = {
@@ -909,7 +921,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
             "xsaveopt", "xsavec", "xgetbv1", "xsaves",
-            NULL, NULL, NULL, NULL,
+            "xfd", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
@@ -5584,6 +5596,43 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         }
         break;
     }
+    case 0x1D: {
+        /* AMX TILE */
+        *eax = 0;
+        *ebx = 0;
+        *ecx = 0;
+        *edx = 0;
+        if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+            break;
+        }
+
+        if (count == 0) {
+            /* Highest numbered palette subleaf */
+            *eax = INTEL_AMX_TILE_MAX_SUBLEAF;
+        } else if (count == 1) {
+            *eax = INTEL_AMX_TOTAL_TILE_BYTES |
+                   (INTEL_AMX_BYTES_PER_TILE << 16);
+            *ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
+            *ecx = INTEL_AMX_TILE_MAX_ROWS;
+        }
+        break;
+    }
+    case 0x1E: {
+        /* AMX TMUL */
+        *eax = 0;
+        *ebx = 0;
+        *ecx = 0;
+        *edx = 0;
+        if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+            break;
+        }
+
+        if (count == 0) {
+            /* Maximum TMUL K and N dimensions */
+            *ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
+        }
+        break;
+    }
     case 0x40000000:
         /*
          * CPUID code in kvm_arch_init_vcpu() ignores stuff
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 13f8e30c2a..3fb3ddbe2b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1758,7 +1758,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 c = &cpuid_data.entries[cpuid_i++];
             }
             break;
-        case 0x14: {
+        case 0x14:
+        case 0x1d: {
             uint32_t times;
 
             c->function = i;



* [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
                   ` (4 preceding siblings ...)
  2022-01-07  9:31 ` [RFC PATCH 5/7] x86: Add AMX CPUIDs enumeration Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  2022-01-10  8:40   ` Tian, Kevin
  2022-01-07  9:31 ` [RFC PATCH 7/7] x86: Support XFD and AMX xsave data migration Yang Zhong
  6 siblings, 1 reply; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

From: Jing Liu <jing2.liu@intel.com>

Extended features have large state, while the current
struct kvm_xsave only allows 4KB. Use the new XSAVE ioctls
if the xstate size is larger than struct kvm_xsave.
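The two decisions this patch adds (which ioctl to use, and whether the
XTILEDATA area beyond the legacy 4KB struct kvm_xsave is copied at
all) can be sketched as pure helpers (standalone sketch; names are
illustrative):

```c
#include <stdint.h>

#define LEGACY_KVM_XSAVE_SIZE 4096u   /* sizeof(struct kvm_xsave) */

/* Sketch: KVM_GET_XSAVE2 is only needed when the buffer outgrows the
 * legacy fixed-size structure. */
static inline int use_xsave2(uint32_t xsave_buf_len)
{
    return xsave_buf_len > LEGACY_KVM_XSAVE_SIZE;
}

/* Sketch of the guard around the XTILEDATA memcpy in
 * x86_cpu_xsave_all_areas()/x86_cpu_xrstor_all_areas(). */
static inline int copy_xtiledata(uint32_t buflen, uint32_t off,
                                 uint32_t size)
{
    return buflen > LEGACY_KVM_XSAVE_SIZE && off && size;
}
```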

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 linux-headers/asm-x86/kvm.h | 14 ++++++++++++++
 linux-headers/linux/kvm.h   |  2 ++
 target/i386/cpu.h           |  5 +++++
 target/i386/kvm/kvm.c       | 16 ++++++++++++++--
 target/i386/xsave_helper.c  | 35 +++++++++++++++++++++++++++++++++++
 5 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 5a776a08f7..32f2a921e8 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -376,6 +376,20 @@ struct kvm_debugregs {
 /* for KVM_CAP_XSAVE */
 struct kvm_xsave {
 	__u32 region[1024];
+	/*
+	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
+	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+	 * respectively, when invoked on the vm file descriptor.
+	 *
+	 * The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+	 * will always be at least 4096. Currently, it is only greater
+	 * than 4096 if a dynamic feature has been enabled with
+	 * ``arch_prctl()``, but this may change in the future.
+	 *
+	 * The offsets of the state save areas in struct kvm_xsave follow
+	 * the contents of CPUID leaf 0xD on the host.
+	 */
+	__u32 extra[0];
 };
 
 #define KVM_MAX_XCRS	16
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 02c5e7b7bb..97d5b6d81d 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_XSAVE2  207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_XSAVE */
 #define KVM_GET_XSAVE		  _IOR(KVMIO,  0xa4, struct kvm_xsave)
 #define KVM_SET_XSAVE		  _IOW(KVMIO,  0xa5, struct kvm_xsave)
+#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
 /* Available with KVM_CAP_XCRS */
 #define KVM_GET_XCRS		  _IOR(KVMIO,  0xa6, struct kvm_xcrs)
 #define KVM_SET_XCRS		  _IOW(KVMIO,  0xa7, struct kvm_xcrs)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 245e8b5a1a..6153c4ab1a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1519,6 +1519,11 @@ typedef struct CPUX86State {
     YMMReg zmmh_regs[CPU_NB_REGS];
     ZMMReg hi16_zmm_regs[CPU_NB_REGS];
 
+#ifdef TARGET_X86_64
+    uint8_t xtilecfg[64];
+    uint8_t xtiledata[8192];
+#endif
+
     /* sysenter registers */
     uint32_t sysenter_cs;
     target_ulong sysenter_esp;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3fb3ddbe2b..97520e9dff 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1983,7 +1983,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
     }
 
     if (has_xsave) {
-        env->xsave_buf_len = sizeof(struct kvm_xsave);
+        uint32_t size = kvm_vm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+        if (!size) {
+            size = sizeof(struct kvm_xsave);
+        }
+
+        env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
         env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
         memset(env->xsave_buf, 0, env->xsave_buf_len);
 
@@ -2580,6 +2585,7 @@ static int kvm_put_xsave(X86CPU *cpu)
     if (!has_xsave) {
         return kvm_put_fpu(cpu);
     }
+
     x86_cpu_xsave_all_areas(cpu, xsave, env->xsave_buf_len);
 
     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XSAVE, xsave);
@@ -3247,10 +3253,16 @@ static int kvm_get_xsave(X86CPU *cpu)
         return kvm_get_fpu(cpu);
     }
 
-    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+    if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
+        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+    } else {
+        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);
+    }
+
     if (ret < 0) {
         return ret;
     }
+
     x86_cpu_xrstor_all_areas(cpu, xsave, env->xsave_buf_len);
 
     return 0;
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a96344..090424e820 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -5,6 +5,7 @@
 #include "qemu/osdep.h"
 
 #include "cpu.h"
+#include <asm/kvm.h>
 
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 {
@@ -126,6 +127,23 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 
         memcpy(pkru, &env->pkru, sizeof(env->pkru));
     }
+
+    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+    if (e->size && e->offset) {
+        XSaveXTILE_CFG *tilecfg = buf + e->offset;
+
+        memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
+    }
+
+    if (buflen > sizeof(struct kvm_xsave)) {
+        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
+
+        if (e->size && e->offset) {
+            XSaveXTILE_DATA *tiledata = buf + e->offset;
+
+            memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
+        }
+    }
 #endif
 }
 
@@ -247,5 +265,22 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
         pkru = buf + e->offset;
         memcpy(&env->pkru, pkru, sizeof(env->pkru));
     }
+
+    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+    if (e->size && e->offset) {
+        const XSaveXTILE_CFG *tilecfg = buf + e->offset;
+
+        memcpy(&env->xtilecfg, tilecfg, sizeof(env->xtilecfg));
+    }
+
+    if (buflen > sizeof(struct kvm_xsave)) {
+        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
+
+        if (e->size && e->offset) {
+            const XSaveXTILE_DATA *tiledata = buf + e->offset;
+
+            memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
+        }
+    }
 #endif
 }



* [RFC PATCH 7/7] x86: Support XFD and AMX xsave data migration
  2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
                   ` (5 preceding siblings ...)
  2022-01-07  9:31 ` [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling Yang Zhong
@ 2022-01-07  9:31 ` Yang Zhong
  6 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-07  9:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, wei.w.wang,
	guang.zeng, pbonzini

From: Zeng Guang <guang.zeng@intel.com>

XFD (eXtended Feature Disable) allows enabling a
feature in xsave state while preventing specific
user threads from using the feature.

Support saving and restoring the XFD MSRs if CPUID.D.1.EAX[4]
enumerates them as valid. Likewise, migrate the MSRs and
related xsave state as necessary.
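As background for why both MSRs need to be migrated together, the XFD
faulting behavior can be modeled roughly as follows (a hedged model of
the architectural semantics, not QEMU code):

```c
#include <stdint.h>

/* Rough model: if a state component's bit is set in MSR_IA32_XFD,
 * first use of that feature raises #NM and the faulting component is
 * recorded in MSR_IA32_XFD_ERR, so both MSRs are guest-visible state. */
static inline int xfd_faults(uint64_t msr_xfd, uint64_t component_mask,
                             uint64_t *msr_xfd_err)
{
    if (msr_xfd & component_mask) {
        *msr_xfd_err |= msr_xfd & component_mask;
        return 1;   /* #NM raised; the guest kernel may then enable it */
    }
    return 0;       /* feature already enabled for this thread */
}
```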

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 target/i386/cpu.h     |  9 +++++++++
 target/i386/kvm/kvm.c | 18 ++++++++++++++++++
 target/i386/machine.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6153c4ab1a..1627988790 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -505,6 +505,9 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+#define MSR_IA32_XFD                    0x000001c4
+#define MSR_IA32_XFD_ERR                0x000001c5
+
 #define MSR_IA32_BNDCFGS                0x00000d90
 #define MSR_IA32_XSS                    0x00000da0
 #define MSR_IA32_UMWAIT_CONTROL         0xe1
@@ -866,6 +869,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_1_EAX_AVX_VNNI          (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
+/* XFD (eXtended Feature Disable) */
+#define CPUID_D_1_EAX_XFD               (1U << 4)
 
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP              (1U << 31)
@@ -1608,6 +1613,10 @@ typedef struct CPUX86State {
     uint64_t msr_rtit_cr3_match;
     uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
 
+    /* Per-VCPU XFD MSRs */
+    uint64_t msr_xfd;
+    uint64_t msr_xfd_err;
+
     /* exception/interrupt handling */
     int error_code;
     int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 97520e9dff..02d5cf1063 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3192,6 +3192,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                               env->msr_ia32_sgxlepubkeyhash[3]);
         }
 
+        if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+            kvm_msr_entry_add(cpu, MSR_IA32_XFD,
+                              env->msr_xfd);
+            kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR,
+                              env->msr_xfd_err);
+        }
+
         /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
          *       kvm_put_msr_feature_control. */
     }
@@ -3548,6 +3555,11 @@ static int kvm_get_msrs(X86CPU *cpu)
         kvm_msr_entry_add(cpu, MSR_IA32_SGXLEPUBKEYHASH3, 0);
     }
 
+    if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+        kvm_msr_entry_add(cpu, MSR_IA32_XFD, 0);
+        kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
+    }
+
     ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
     if (ret < 0) {
         return ret;
@@ -3844,6 +3856,12 @@ static int kvm_get_msrs(X86CPU *cpu)
             env->msr_ia32_sgxlepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] =
                            msrs[i].data;
             break;
+        case MSR_IA32_XFD:
+            env->msr_xfd = msrs[i].data;
+            break;
+        case MSR_IA32_XFD_ERR:
+            env->msr_xfd_err = msrs[i].data;
+            break;
         }
     }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 83c2b91529..fdeb5bab50 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1455,6 +1455,46 @@ static const VMStateDescription vmstate_msr_intel_sgx = {
     }
 };
 
+static bool xfd_msrs_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return !!(env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD);
+}
+
+static const VMStateDescription vmstate_msr_xfd = {
+    .name = "cpu/msr_xfd",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = xfd_msrs_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(env.msr_xfd, X86CPU),
+        VMSTATE_UINT64(env.msr_xfd_err, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool amx_xtile_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return !!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE);
+}
+
+static const VMStateDescription vmstate_amx_xtile = {
+    .name = "cpu/intel_amx_xtile",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = amx_xtile_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(env.xtilecfg, X86CPU, 64),
+        VMSTATE_UINT8_ARRAY(env.xtiledata, X86CPU, 8192),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
     .version_id = 12,
@@ -1593,6 +1633,8 @@ const VMStateDescription vmstate_x86_cpu = {
 #endif
         &vmstate_msr_tsx_ctrl,
         &vmstate_msr_intel_sgx,
+        &vmstate_msr_xfd,
+        &vmstate_amx_xtile,
         NULL
     }
 };
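For reference, the two `.needed` hooks in the patch above follow the usual pattern for optional migration subsections: state goes on the wire only when the matching CPUID feature is exposed to the guest. A standalone sketch of just the gating logic (bit positions per the SDM: XFD is CPUID.(EAX=0Dh,ECX=1):EAX[4], AMX-TILE is CPUID.(EAX=07h,ECX=0):EDX[24]; the helper names are illustrative, not QEMU API):

```c
#include <stdint.h>

#define CPUID_D_1_EAX_XFD       (1u << 4)   /* CPUID.(EAX=0Dh,ECX=1):EAX[4] */
#define CPUID_7_0_EDX_AMX_TILE  (1u << 24)  /* CPUID.(EAX=07h,ECX=0):EDX[24] */

/* Mirrors xfd_msrs_needed(): migrate XFD MSRs only when XFD is exposed. */
static int xfd_subsection_needed(uint32_t feat_xsave_eax)
{
    return !!(feat_xsave_eax & CPUID_D_1_EAX_XFD);
}

/* Mirrors amx_xtile_needed(): migrate tile state only with AMX-TILE. */
static int xtile_subsection_needed(uint32_t feat_7_0_edx)
{
    return !!(feat_7_0_edx & CPUID_7_0_EDX_AMX_TILE);
}
```

A stream built this way stays loadable by a destination without the features, since absent subsections are simply not emitted.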


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state
  2022-01-07  9:31 ` [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state Yang Zhong
@ 2022-01-10  8:20   ` Tian, Kevin
  2022-01-11  2:22     ` Yang Zhong
  0 siblings, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2022-01-10  8:20 UTC (permalink / raw)
  To: Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Zeng, Guang, Christopherson, Sean

> From: Zhong, Yang <yang.zhong@intel.com>
> Sent: Friday, January 7, 2022 5:31 PM
> 
> From: Jing Liu <jing2.liu@intel.com>
> 
> The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
> are all zero, while spec actually introduces that bit 01
> should indicate if the extended state component locates
> on the next 64-byte boundary following the preceding state
> component when the compacted format of an XSAVE area is
> used.

Above would read clearer if you revise to:

"The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
indicate whether the extended state component locates
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

But ECX[1] is always cleared in current implementation."

> 
> Fix the subleaves value according to the host supported
> cpuid. The upcoming AMX feature would be the first one
> using it.
> 
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> ---
>  target/i386/cpu.h         | 1 +
>  target/i386/cpu.c         | 1 +
>  target/i386/kvm/kvm-cpu.c | 3 +++
>  3 files changed, 5 insertions(+)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 04f2b790c9..7f9700544f 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1354,6 +1354,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
>  typedef struct ExtSaveArea {
>      uint32_t feature, bits;
>      uint32_t offset, size;
> +    uint32_t need_align;
>  } ExtSaveArea;
> 
>  #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index aa9e636800..47bc4d5c1a 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5487,6 +5487,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
> index, uint32_t count,
>                  const ExtSaveArea *esa = &x86_ext_save_areas[count];
>                  *eax = esa->size;
>                  *ebx = esa->offset;
> +                *ecx = esa->need_align << 1;
>              }
>          }
>          break;
> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> index d95028018e..6c4c1c6f9d 100644
> --- a/target/i386/kvm/kvm-cpu.c
> +++ b/target/i386/kvm/kvm-cpu.c
> @@ -105,6 +105,9 @@ static void kvm_cpu_xsave_init(void)
>                  assert(esa->size == sz);
>                  esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
>              }
> +
> +            uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
> +            esa->need_align = ecx & (1u << 1) ? 1 : 0;
>          }
>      }
>  }
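For context on why the bit matters: in the compacted XSAVE format, a component whose sub-leaf ECX[1] is set starts on the next 64-byte boundary after the preceding component. A minimal sketch of that layout rule (the component sizes and align flags below are made up for illustration, not read from CPUID):

```c
#include <stdint.h>

typedef struct {
    uint32_t size;
    uint32_t need_align;
} Comp;

/* Offset of component 'idx' (idx >= 2) in a compacted XSAVE image,
 * assuming all components up to 'idx' are enabled in XCOMP_BV. */
static uint32_t compacted_offset(const Comp *c, int idx)
{
    uint32_t off = 512 + 64;          /* legacy region + XSAVE header */
    for (int i = 2; ; i++) {          /* components 0/1 live in the legacy region */
        if (c[i].need_align) {
            off = (off + 63) & ~63u;  /* round up to 64-byte boundary */
        }
        if (i == idx) {
            return off;
        }
        off += c[i].size;
    }
}

/* Illustrative table: component 2 is 100 bytes, component 3 needs alignment. */
static const Comp demo[4] = { {0, 0}, {0, 0}, {100, 0}, {8192, 1} };
```

With this table, component 2 sits right after the header at 576, while component 3 is pushed from 676 up to 704, the next 64-byte boundary.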



* RE: [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components
  2022-01-07  9:31 ` [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components Yang Zhong
@ 2022-01-10  8:23   ` Tian, Kevin
  2022-01-11  2:32     ` Yang Zhong
  2022-01-18 12:39     ` Paolo Bonzini
  0 siblings, 2 replies; 31+ messages in thread
From: Tian, Kevin @ 2022-01-10  8:23 UTC (permalink / raw)
  To: Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Zeng, Guang, Christopherson, Sean

> From: Zhong, Yang <yang.zhong@intel.com>
> Sent: Friday, January 7, 2022 5:31 PM
> 
> From: Jing Liu <jing2.liu@intel.com>
> 
> AMX XTILECFG and XTILEDATA are managed by XSAVE feature
> set. State component 17 is used for 64-byte TILECFG register
> (XTILECFG state) and component 18 is used for 8192 bytes
> of tile data (XTILEDATA state).

to be consistent, "tile data" -> "TILEDATA"

> 
> Add AMX feature bits to x86_ext_save_areas array to set
> up AMX components. Add structs that define the layout of
> AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
> structs sizes.
> 
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> ---
>  target/i386/cpu.h | 16 +++++++++++++++-
>  target/i386/cpu.c |  8 ++++++++
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 7f9700544f..768a8218be 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -537,6 +537,8 @@ typedef enum X86Seg {
>  #define XSTATE_ZMM_Hi256_BIT            6
>  #define XSTATE_Hi16_ZMM_BIT             7
>  #define XSTATE_PKRU_BIT                 9
> +#define XSTATE_XTILE_CFG_BIT            17
> +#define XSTATE_XTILE_DATA_BIT           18
> 
>  #define XSTATE_FP_MASK                  (1ULL << XSTATE_FP_BIT)
>  #define XSTATE_SSE_MASK                 (1ULL << XSTATE_SSE_BIT)
> @@ -1343,6 +1345,16 @@ typedef struct XSavePKRU {
>      uint32_t padding;
>  } XSavePKRU;
> 
> +/* Ext. save area 17: AMX XTILECFG state */
> +typedef struct XSaveXTILE_CFG {

remove "_"?

> +    uint8_t xtilecfg[64];
> +} XSaveXTILE_CFG;
> +
> +/* Ext. save area 18: AMX XTILEDATA state */
> +typedef struct XSaveXTILE_DATA {

ditto

> +    uint8_t xtiledata[8][1024];
> +} XSaveXTILE_DATA;
> +
>  QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
>  QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
>  QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
> @@ -1350,6 +1362,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) !=
> 0x40);
>  QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
>  QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
>  QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
> +QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_CFG) != 0x40);
> +QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
> 
>  typedef struct ExtSaveArea {
>      uint32_t feature, bits;
> @@ -1357,7 +1371,7 @@ typedef struct ExtSaveArea {
>      uint32_t need_align;
>  } ExtSaveArea;
> 
> -#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
> +#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
> 
>  extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 47bc4d5c1a..dd2c919c33 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1401,6 +1401,14 @@ ExtSaveArea
> x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
>      [XSTATE_PKRU_BIT] =
>            { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
>              .size = sizeof(XSavePKRU) },
> +    [XSTATE_XTILE_CFG_BIT] = {
> +        .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
> +        .size = sizeof(XSaveXTILE_CFG),
> +    },
> +    [XSTATE_XTILE_DATA_BIT] = {
> +        .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
> +        .size = sizeof(XSaveXTILE_DATA),
> +    },
>  };
> 
>  static uint32_t xsave_area_size(uint64_t mask)



* RE: [RFC PATCH 3/7] x86: Grant AMX permission for guest
  2022-01-07  9:31 ` [RFC PATCH 3/7] x86: Grant AMX permission for guest Yang Zhong
@ 2022-01-10  8:36   ` Tian, Kevin
  2022-01-11  6:46     ` Yang Zhong
  2022-01-18 12:52   ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2022-01-10  8:36 UTC (permalink / raw)
  To: Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Zeng, Guang, Christopherson, Sean

> From: Zhong, Yang <yang.zhong@intel.com>
> Sent: Friday, January 7, 2022 5:32 PM
> 
> Kernel mechanism for dynamically enabled XSAVE features

there is no definition of "dynamically-enabled XSAVE features".

> asks userspace VMM requesting guest permission if it wants
> to expose the features. Only with the permission, kernel
> can try to enable the features when detecting the intention
> from guest in runtime.
> 
> Qemu should request the permission for guest only once
> before the first vCPU is created. KVM checks the guest
> permission when Qemu advertises the features, and the
> advertising operation fails w/o permission.

what about below?

"Kernel allocates 4K xstate buffer by default. For XSAVE features
which require large state component (e.g. AMX), Linux kernel 
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. Those are called dynamically-
enabled XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

Qemu should request the guest permissions for dynamic xfeatures 
which will be exposed to the guest. This only needs to be done
once before the first vcpu is created."

> 
> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> ---
>  target/i386/cpu.h |  7 +++++++
>  hw/i386/x86.c     | 28 ++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 768a8218be..79023fe723 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -549,6 +549,13 @@ typedef enum X86Seg {
>  #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
>  #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
>  #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
> +#define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
> +#define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
> +#define XFEATURE_XTILE_MASK             (XSTATE_XTILE_CFG_MASK \
> +                                         | XSTATE_XTILE_DATA_MASK)
> +
> +#define ARCH_GET_XCOMP_GUEST_PERM       0x1024
> +#define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
> 
>  /* CPUID feature words */
>  typedef enum FeatureWord {
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index b84840a1bb..0a204c375e 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -41,6 +41,8 @@
>  #include "sysemu/cpu-timers.h"
>  #include "trace.h"
> 
> +#include <sys/syscall.h>
> +
>  #include "hw/i386/x86.h"
>  #include "target/i386/cpu.h"
>  #include "hw/i386/topology.h"
> @@ -117,6 +119,30 @@ out:
>      object_unref(cpu);
>  }
> 
> +static void x86_xsave_req_perm(void)
> +{
> +    unsigned long bitmask;
> +
> +    long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> +                      XSTATE_XTILE_DATA_BIT);

Should we do it based on the cpuid for the first vcpu?

> +    if (rc) {
> +        /*
> +         * The older kernel version(<5.15) can't support
> +         * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> +         */
> +        return;
> +    }
> +
> +    rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
> +    if (rc) {
> +        error_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
> +    } else if (!(bitmask & XFEATURE_XTILE_MASK)) {
> +        error_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
> +                     "and bitmask=0x%lx", bitmask);
> +        exit(EXIT_FAILURE);
> +    }
> +}
> +
>  void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>  {
>      int i;
> @@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int
> default_cpu_version)
>      MachineState *ms = MACHINE(x86ms);
>      MachineClass *mc = MACHINE_GET_CLASS(x86ms);
> 
> +    /* Request AMX permission for guest */
> +    x86_xsave_req_perm();
>      x86_cpu_set_default_version(default_cpu_version);
> 
>      /*
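The flow in the patch is two steps: request the dynamic feature with ARCH_REQ_XCOMP_GUEST_PERM, then read back the granted mask with ARCH_GET_XCOMP_GUEST_PERM and verify the AMX bits. The verification step is pure bit logic and can be sketched on its own (the arch_prctl() calls are elided; the helper name is illustrative):

```c
#include <stdint.h>

#define XSTATE_XTILE_CFG_BIT   17
#define XSTATE_XTILE_DATA_BIT  18
#define XFEATURE_XTILE_MASK  ((1ULL << XSTATE_XTILE_CFG_BIT) | \
                              (1ULL << XSTATE_XTILE_DATA_BIT))

/* Check the bitmask returned by arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, ...).
 * Like the patch, treat "any XTILE bit granted" as success; in practice
 * XTILEDATA permission comes together with XTILECFG. */
static int amx_guest_permitted(uint64_t bitmask)
{
    return (bitmask & XFEATURE_XTILE_MASK) != 0;
}
```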



* RE: [RFC PATCH 4/7] x86: Add XFD faulting bit for state components
  2022-01-07  9:31 ` [RFC PATCH 4/7] x86: Add XFD faulting bit for state components Yang Zhong
@ 2022-01-10  8:38   ` Tian, Kevin
  2022-01-11  5:32     ` Yang Zhong
  2022-01-18 12:52   ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2022-01-10  8:38 UTC (permalink / raw)
  To: Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Zeng, Guang, Christopherson, Sean

> From: Zhong, Yang <yang.zhong@intel.com>
> Sent: Friday, January 7, 2022 5:32 PM
> 
> From: Jing Liu <jing2.liu@intel.com>
> 
> Intel introduces XFD faulting mechanism for extended
> XSAVE features to dynamically enable the features in
> runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
> to 1, it indicates support for XFD faulting of this
> state component.
> 
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> ---
>  target/i386/cpu.h         | 2 +-
>  target/i386/cpu.c         | 2 +-
>  target/i386/kvm/kvm-cpu.c | 1 +
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 79023fe723..22f7ff40a6 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1375,7 +1375,7 @@
> QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
>  typedef struct ExtSaveArea {
>      uint32_t feature, bits;
>      uint32_t offset, size;
> -    uint32_t need_align;
> +    uint32_t need_align, support_xfd;

why is each flag a 32-bit field?

also it's more natural to have them in separate lines, though I'm not
sure why existing fields are put this way (possibly due to short names?).

>  } ExtSaveArea;
> 
>  #define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index dd2c919c33..1adc3f0f99 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5495,7 +5495,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
> index, uint32_t count,
>                  const ExtSaveArea *esa = &x86_ext_save_areas[count];
>                  *eax = esa->size;
>                  *ebx = esa->offset;
> -                *ecx = esa->need_align << 1;
> +                *ecx = (esa->need_align << 1) | (esa->support_xfd << 2);
>              }
>          }
>          break;
> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> index 6c4c1c6f9d..3b3c203f11 100644
> --- a/target/i386/kvm/kvm-cpu.c
> +++ b/target/i386/kvm/kvm-cpu.c
> @@ -108,6 +108,7 @@ static void kvm_cpu_xsave_init(void)
> 
>              uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
>              esa->need_align = ecx & (1u << 1) ? 1 : 0;
> +            esa->support_xfd = ecx & (1u << 2) ? 1 : 0;
>          }
>      }
>  }
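Taken together with patch 1, the sub-leaf ECX value is a two-way mapping: kvm-cpu.c decodes host CPUID bits into the ExtSaveArea flags, and cpu.c re-encodes them for the guest. A round-trip sketch (bit 1 = 64-byte alignment, bit 2 = XFD faulting, as in the patches):

```c
#include <stdint.h>

/* Encode as in cpu_x86_cpuid(): guest-visible ECX for sub-leaf n > 1. */
static uint32_t esa_encode_ecx(uint32_t need_align, uint32_t support_xfd)
{
    return (need_align << 1) | (support_xfd << 2);
}

/* Decode as in kvm_cpu_xsave_init(): flags from the host-reported ECX. */
static void esa_decode_ecx(uint32_t ecx, uint32_t *need_align,
                           uint32_t *support_xfd)
{
    *need_align  = ecx & (1u << 1) ? 1 : 0;
    *support_xfd = ecx & (1u << 2) ? 1 : 0;
}
```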



* RE: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-07  9:31 ` [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling Yang Zhong
@ 2022-01-10  8:40   ` Tian, Kevin
  2022-01-10  9:47     ` Zeng Guang
  0 siblings, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2022-01-10  8:40 UTC (permalink / raw)
  To: Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Zeng, Guang, Christopherson, Sean

> From: Zhong, Yang <yang.zhong@intel.com>
> Sent: Friday, January 7, 2022 5:32 PM
> 
> From: Jing Liu <jing2.liu@intel.com>
> 
> Extended features have large state while the current
> kvm_xsave only allows 4KB. Use the new XSAVE ioctls
> if the xstate size is larger than kvm_xsave.

shouldn't we always use the new xsave ioctls as long as
CAP_XSAVE2 is available?

> 
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> ---
>  linux-headers/asm-x86/kvm.h | 14 ++++++++++++++
>  linux-headers/linux/kvm.h   |  2 ++
>  target/i386/cpu.h           |  5 +++++
>  target/i386/kvm/kvm.c       | 16 ++++++++++++++--
>  target/i386/xsave_helper.c  | 35 +++++++++++++++++++++++++++++++++++
>  5 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> index 5a776a08f7..32f2a921e8 100644
> --- a/linux-headers/asm-x86/kvm.h
> +++ b/linux-headers/asm-x86/kvm.h
> @@ -376,6 +376,20 @@ struct kvm_debugregs {
>  /* for KVM_CAP_XSAVE */
>  struct kvm_xsave {
>  	__u32 region[1024];
> +	/*
> +	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many
> bytes
> +	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
> +	 * respectively, when invoked on the vm file descriptor.
> +	 *
> +	 * The size value returned by
> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
> +	 * will always be at least 4096. Currently, it is only greater
> +	 * than 4096 if a dynamic feature has been enabled with
> +	 * ``arch_prctl()``, but this may change in the future.
> +	 *
> +	 * The offsets of the state save areas in struct kvm_xsave follow
> +	 * the contents of CPUID leaf 0xD on the host.
> +	 */
> +	__u32 extra[0];
>  };
> 
>  #define KVM_MAX_XCRS	16
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 02c5e7b7bb..97d5b6d81d 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_BINARY_STATS_FD 203
>  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>  #define KVM_CAP_ARM_MTE 205
> +#define KVM_CAP_XSAVE2  207
> 
>  #ifdef KVM_CAP_IRQ_ROUTING
> 
> @@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
>  /* Available with KVM_CAP_XSAVE */
>  #define KVM_GET_XSAVE		  _IOR(KVMIO,  0xa4, struct
> kvm_xsave)
>  #define KVM_SET_XSAVE		  _IOW(KVMIO,  0xa5, struct
> kvm_xsave)
> +#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct
> kvm_xsave)
>  /* Available with KVM_CAP_XCRS */
>  #define KVM_GET_XCRS		  _IOR(KVMIO,  0xa6, struct kvm_xcrs)
>  #define KVM_SET_XCRS		  _IOW(KVMIO,  0xa7, struct kvm_xcrs)
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 245e8b5a1a..6153c4ab1a 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1519,6 +1519,11 @@ typedef struct CPUX86State {
>      YMMReg zmmh_regs[CPU_NB_REGS];
>      ZMMReg hi16_zmm_regs[CPU_NB_REGS];
> 
> +#ifdef TARGET_X86_64
> +    uint8_t xtilecfg[64];
> +    uint8_t xtiledata[8192];
> +#endif
> +
>      /* sysenter registers */
>      uint32_t sysenter_cs;
>      target_ulong sysenter_esp;
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3fb3ddbe2b..97520e9dff 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1983,7 +1983,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
>      }
> 
>      if (has_xsave) {
> -        env->xsave_buf_len = sizeof(struct kvm_xsave);
> +        uint32_t size = kvm_vm_check_extension(cs->kvm_state,
> KVM_CAP_XSAVE2);
> +        if (!size) {
> +            size = sizeof(struct kvm_xsave);
> +        }
> +
> +        env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
>          env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
>          memset(env->xsave_buf, 0, env->xsave_buf_len);
> 
> @@ -2580,6 +2585,7 @@ static int kvm_put_xsave(X86CPU *cpu)
>      if (!has_xsave) {
>          return kvm_put_fpu(cpu);
>      }
> +
>      x86_cpu_xsave_all_areas(cpu, xsave, env->xsave_buf_len);
> 
>      return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XSAVE, xsave);
> @@ -3247,10 +3253,16 @@ static int kvm_get_xsave(X86CPU *cpu)
>          return kvm_get_fpu(cpu);
>      }
> 
> -    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
> +    if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
> +    } else {
> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);
> +    }
> +
>      if (ret < 0) {
>          return ret;
>      }
> +
>      x86_cpu_xrstor_all_areas(cpu, xsave, env->xsave_buf_len);
> 
>      return 0;
> diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> index ac61a96344..090424e820 100644
> --- a/target/i386/xsave_helper.c
> +++ b/target/i386/xsave_helper.c
> @@ -5,6 +5,7 @@
>  #include "qemu/osdep.h"
> 
>  #include "cpu.h"
> +#include <asm/kvm.h>
> 
>  void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
>  {
> @@ -126,6 +127,23 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void
> *buf, uint32_t buflen)
> 
>          memcpy(pkru, &env->pkru, sizeof(env->pkru));
>      }
> +
> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
> +    if (e->size && e->offset) {
> +        XSaveXTILE_CFG *tilecfg = buf + e->offset;
> +
> +        memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
> +    }
> +
> +    if (buflen > sizeof(struct kvm_xsave)) {
> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
> +
> +        if (e->size && e->offset) {
> +            XSaveXTILE_DATA *tiledata = buf + e->offset;
> +
> +            memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
> +        }
> +    }
>  #endif
>  }
> 
> @@ -247,5 +265,22 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const
> void *buf, uint32_t buflen)
>          pkru = buf + e->offset;
>          memcpy(&env->pkru, pkru, sizeof(env->pkru));
>      }
> +
> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
> +    if (e->size && e->offset) {
> +        const XSaveXTILE_CFG *tilecfg = buf + e->offset;
> +
> +        memcpy(&env->xtilecfg, tilecfg, sizeof(env->xtilecfg));
> +    }
> +
> +    if (buflen > sizeof(struct kvm_xsave)) {
> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
> +
> +        if (e->size && e->offset) {
> +            const XSaveXTILE_DATA *tiledata = buf + e->offset;
> +
> +            memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
> +        }
> +    }
>  #endif
>  }



* Re: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-10  8:40   ` Tian, Kevin
@ 2022-01-10  9:47     ` Zeng Guang
  2022-01-11  2:30       ` Tian, Kevin
  0 siblings, 1 reply; 31+ messages in thread
From: Zeng Guang @ 2022-01-10  9:47 UTC (permalink / raw)
  To: Tian, Kevin, Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Christopherson, Sean

On 1/10/2022 4:40 PM, Tian, Kevin wrote:
>> From: Zhong, Yang <yang.zhong@intel.com>
>> Sent: Friday, January 7, 2022 5:32 PM
>>
>> From: Jing Liu <jing2.liu@intel.com>
>>
>> Extended features have large state while the current
>> kvm_xsave only allows 4KB. Use the new XSAVE ioctls
>> if the xstate size is larger than kvm_xsave.
> shouldn't we always use the new xsave ioctls as long as
> CAP_XSAVE2 is available?


CAP_XSAVE2 may return the legacy xsave size, or 0 when running with an old
kvm version in which it's not available.
QEMU uses the new xsave ioctls only when the return value of
CAP_XSAVE2 is larger than the legacy xsave size.
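That sizing rule is compact enough to state as a helper: take the KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return value if nonzero, otherwise fall back to the legacy 4096-byte struct kvm_xsave, and round up to a page as kvm_arch_init_vcpu() does. A sketch with QEMU_ALIGN_UP written out by hand (helper names are illustrative):

```c
#include <stdint.h>

#define LEGACY_KVM_XSAVE_SIZE 4096u   /* sizeof(struct kvm_xsave) */

/* cap_ret: value of kvm_vm_check_extension(s, KVM_CAP_XSAVE2),
 * 0 on kernels without the capability. */
static uint32_t xsave_buf_len(uint32_t cap_ret)
{
    uint32_t size = cap_ret ? cap_ret : LEGACY_KVM_XSAVE_SIZE;
    return (size + 4095u) & ~4095u;   /* QEMU_ALIGN_UP(size, 4096) */
}

/* KVM_GET_XSAVE2 is used only when the buffer outgrows the legacy size. */
static int use_xsave2(uint32_t buf_len)
{
    return buf_len > LEGACY_KVM_XSAVE_SIZE;
}
```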




* Re: [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state
  2022-01-10  8:20   ` Tian, Kevin
@ 2022-01-11  2:22     ` Yang Zhong
  2022-01-18 12:37       ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Zhong @ 2022-01-11  2:22 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: yang.zhong, Christopherson, Sean, jing2.liu, qemu-devel, Wang, Wei W,
	Zeng, Guang, pbonzini

On Mon, Jan 10, 2022 at 04:20:41PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang <yang.zhong@intel.com>
> > Sent: Friday, January 7, 2022 5:31 PM
> >
> > From: Jing Liu <jing2.liu@intel.com>
> >
> > The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
> > are all zero, while spec actually introduces that bit 01
> > should indicate if the extended state component locates
> > on the next 64-byte boundary following the preceding state
> > component when the compacted format of an XSAVE area is
> > used.
> 
> Above would read clearer if you revise to:
> 
> "The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
> indicate whether the extended state component locates
> on the next 64-byte boundary following the preceding state
> component when the compacted format of an XSAVE area is
> used.
> 
> But ECX[1] is always cleared in current implementation."

  Thanks Kevin, I will update this in next version.

  Yang




* RE: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-10  9:47     ` Zeng Guang
@ 2022-01-11  2:30       ` Tian, Kevin
  2022-01-11  4:29         ` Zeng Guang
  2022-01-12  2:51         ` Zeng Guang
  0 siblings, 2 replies; 31+ messages in thread
From: Tian, Kevin @ 2022-01-11  2:30 UTC (permalink / raw)
  To: Zeng, Guang, Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Christopherson, , Sean

> From: Zeng, Guang <guang.zeng@intel.com>
> Sent: Monday, January 10, 2022 5:47 PM
> 
> On 1/10/2022 4:40 PM, Tian, Kevin wrote:
> >> From: Zhong, Yang <yang.zhong@intel.com>
> >> Sent: Friday, January 7, 2022 5:32 PM
> >>
> >> From: Jing Liu <jing2.liu@intel.com>
> >>
> >> Extended feature has large state while current
> >> kvm_xsave only allows 4KB. Use new XSAVE ioctls
> >> if the xstate size is large than kvm_xsave.
> > shouldn't we always use the new xsave ioctls as long as
> > CAP_XSAVE2 is available?
> 
> 
> CAP_XSAVE2 may return legacy xsave size or 0 working with old kvm
> version in which it's not available.
> QEMU just use the new xsave ioctls only when the return value of
> CAP_XSAVE2 is larger than legacy xsave size.

CAP_XSAVE2 is a superset of CAP_XSAVE. If it is available, it can support
both the legacy 4K size and bigger sizes.

> 
> >> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> >> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> >> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> >> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> >> ---
> >>   linux-headers/asm-x86/kvm.h | 14 ++++++++++++++
> >>   linux-headers/linux/kvm.h   |  2 ++
> >>   target/i386/cpu.h           |  5 +++++
> >>   target/i386/kvm/kvm.c       | 16 ++++++++++++++--
> >>   target/i386/xsave_helper.c  | 35
> +++++++++++++++++++++++++++++++++++
> >>   5 files changed, 70 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> >> index 5a776a08f7..32f2a921e8 100644
> >> --- a/linux-headers/asm-x86/kvm.h
> >> +++ b/linux-headers/asm-x86/kvm.h
> >> @@ -376,6 +376,20 @@ struct kvm_debugregs {
> >>   /* for KVM_CAP_XSAVE */
> >>   struct kvm_xsave {
> >>   	__u32 region[1024];
> >> +	/*
> >> +	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many
> >> bytes
> >> +	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
> >> +	 * respectively, when invoked on the vm file descriptor.
> >> +	 *
> >> +	 * The size value returned by
> >> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
> >> +	 * will always be at least 4096. Currently, it is only greater
> >> +	 * than 4096 if a dynamic feature has been enabled with
> >> +	 * ``arch_prctl()``, but this may change in the future.
> >> +	 *
> >> +	 * The offsets of the state save areas in struct kvm_xsave follow
> >> +	 * the contents of CPUID leaf 0xD on the host.
> >> +	 */
> >> +	__u32 extra[0];
> >>   };
> >>
> >>   #define KVM_MAX_XCRS	16
> >> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> >> index 02c5e7b7bb..97d5b6d81d 100644
> >> --- a/linux-headers/linux/kvm.h
> >> +++ b/linux-headers/linux/kvm.h
> >> @@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
> >>   #define KVM_CAP_BINARY_STATS_FD 203
> >>   #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> >>   #define KVM_CAP_ARM_MTE 205
> >> +#define KVM_CAP_XSAVE2  207
> >>
> >>   #ifdef KVM_CAP_IRQ_ROUTING
> >>
> >> @@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
> >>   /* Available with KVM_CAP_XSAVE */
> >>   #define KVM_GET_XSAVE		  _IOR(KVMIO,  0xa4, struct
> >> kvm_xsave)
> >>   #define KVM_SET_XSAVE		  _IOW(KVMIO,  0xa5, struct
> >> kvm_xsave)
> >> +#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct
> >> kvm_xsave)
> >>   /* Available with KVM_CAP_XCRS */
> >>   #define KVM_GET_XCRS		  _IOR(KVMIO,  0xa6, struct kvm_xcrs)
> >>   #define KVM_SET_XCRS		  _IOW(KVMIO,  0xa7, struct kvm_xcrs)
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 245e8b5a1a..6153c4ab1a 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -1519,6 +1519,11 @@ typedef struct CPUX86State {
> >>       YMMReg zmmh_regs[CPU_NB_REGS];
> >>       ZMMReg hi16_zmm_regs[CPU_NB_REGS];
> >>
> >> +#ifdef TARGET_X86_64
> >> +    uint8_t xtilecfg[64];
> >> +    uint8_t xtiledata[8192];
> >> +#endif
> >> +
> >>       /* sysenter registers */
> >>       uint32_t sysenter_cs;
> >>       target_ulong sysenter_esp;
> >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> >> index 3fb3ddbe2b..97520e9dff 100644
> >> --- a/target/i386/kvm/kvm.c
> >> +++ b/target/i386/kvm/kvm.c
> >> @@ -1983,7 +1983,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >>       }
> >>
> >>       if (has_xsave) {
> >> -        env->xsave_buf_len = sizeof(struct kvm_xsave);
> >> +        uint32_t size = kvm_vm_check_extension(cs->kvm_state,
> >> KVM_CAP_XSAVE2);
> >> +        if (!size) {
> >> +            size = sizeof(struct kvm_xsave);
> >> +        }
> >> +
> >> +        env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
> >>           env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
> >>           memset(env->xsave_buf, 0, env->xsave_buf_len);
> >>
> >> @@ -2580,6 +2585,7 @@ static int kvm_put_xsave(X86CPU *cpu)
> >>       if (!has_xsave) {
> >>           return kvm_put_fpu(cpu);
> >>       }
> >> +
> >>       x86_cpu_xsave_all_areas(cpu, xsave, env->xsave_buf_len);
> >>
> >>       return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XSAVE, xsave);
> >> @@ -3247,10 +3253,16 @@ static int kvm_get_xsave(X86CPU *cpu)
> >>           return kvm_get_fpu(cpu);
> >>       }
> >>
> >> -    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
> >> +    if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
> >> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
> >> +    } else {
> >> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);
> >> +    }
> >> +
> >>       if (ret < 0) {
> >>           return ret;
> >>       }
> >> +
> >>       x86_cpu_xrstor_all_areas(cpu, xsave, env->xsave_buf_len);
> >>
> >>       return 0;
> >> diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> >> index ac61a96344..090424e820 100644
> >> --- a/target/i386/xsave_helper.c
> >> +++ b/target/i386/xsave_helper.c
> >> @@ -5,6 +5,7 @@
> >>   #include "qemu/osdep.h"
> >>
> >>   #include "cpu.h"
> >> +#include <asm/kvm.h>
> >>
> >>   void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
> >>   {
> >> @@ -126,6 +127,23 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu,
> void
> >> *buf, uint32_t buflen)
> >>
> >>           memcpy(pkru, &env->pkru, sizeof(env->pkru));
> >>       }
> >> +
> >> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
> >> +    if (e->size && e->offset) {
> >> +        XSaveXTILE_CFG *tilecfg = buf + e->offset;
> >> +
> >> +        memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
> >> +    }
> >> +
> >> +    if (buflen > sizeof(struct kvm_xsave)) {
> >> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
> >> +
> >> +        if (e->size && e->offset) {
> >> +            XSaveXTILE_DATA *tiledata = buf + e->offset;
> >> +
> >> +            memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
> >> +        }
> >> +    }
> >>   #endif
> >>   }
> >>
> >> @@ -247,5 +265,22 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu,
> const
> >> void *buf, uint32_t buflen)
> >>           pkru = buf + e->offset;
> >>           memcpy(&env->pkru, pkru, sizeof(env->pkru));
> >>       }
> >> +
> >> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
> >> +    if (e->size && e->offset) {
> >> +        const XSaveXTILE_CFG *tilecfg = buf + e->offset;
> >> +
> >> +        memcpy(&env->xtilecfg, tilecfg, sizeof(env->xtilecfg));
> >> +    }
> >> +
> >> +    if (buflen > sizeof(struct kvm_xsave)) {
> >> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
> >> +
> >> +        if (e->size && e->offset) {
> >> +            const XSaveXTILE_DATA *tiledata = buf + e->offset;
> >> +
> >> +            memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
> >> +        }
> >> +    }
> >>   #endif
> >>   }


* Re: [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components
  2022-01-10  8:23   ` Tian, Kevin
@ 2022-01-11  2:32     ` Yang Zhong
  2022-01-18 12:39     ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-11  2:32 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: yang.zhong, Christopherson, ,
	Sean, jing2.liu, qemu-devel, Wang, Wei W, Zeng, Guang, pbonzini

On Mon, Jan 10, 2022 at 04:23:47PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang <yang.zhong@intel.com>
> > Sent: Friday, January 7, 2022 5:31 PM
> >
> > From: Jing Liu <jing2.liu@intel.com>
> >
> > AMX XTILECFG and XTILEDATA are managed by XSAVE feature
> > set. State component 17 is used for 64-byte TILECFG register
> > (XTILECFG state) and component 18 is used for 8192 bytes
> > of tile data (XTILEDATA state).
> 
> to be consistent, "tile data" -> "TILEDATA"
> 
> >
> > Add AMX feature bits to x86_ext_save_areas array to set
> > up AMX components. Add structs that define the layout of
> > AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
> > structs sizes.
> >
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> > ---
> >  target/i386/cpu.h | 16 +++++++++++++++-
> >  target/i386/cpu.c |  8 ++++++++
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 7f9700544f..768a8218be 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -537,6 +537,8 @@ typedef enum X86Seg {
> >  #define XSTATE_ZMM_Hi256_BIT            6
> >  #define XSTATE_Hi16_ZMM_BIT             7
> >  #define XSTATE_PKRU_BIT                 9
> > +#define XSTATE_XTILE_CFG_BIT            17
> > +#define XSTATE_XTILE_DATA_BIT           18
> >
> >  #define XSTATE_FP_MASK                  (1ULL << XSTATE_FP_BIT)
> >  #define XSTATE_SSE_MASK                 (1ULL << XSTATE_SSE_BIT)
> > @@ -1343,6 +1345,16 @@ typedef struct XSavePKRU {
> >      uint32_t padding;
> >  } XSavePKRU;
> >
> > +/* Ext. save area 17: AMX XTILECFG state */
> > +typedef struct XSaveXTILE_CFG {
> 
> remove "_"?
> 
> > +    uint8_t xtilecfg[64];
> > +} XSaveXTILE_CFG;
> > +
> > +/* Ext. save area 18: AMX XTILEDATA state */
> > +typedef struct XSaveXTILE_DATA {
> 
> ditto
>

  Thanks Kevin, I will update this in new version.

  Yang 



* Re: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-11  2:30       ` Tian, Kevin
@ 2022-01-11  4:29         ` Zeng Guang
  2022-01-12  2:51         ` Zeng Guang
  1 sibling, 0 replies; 31+ messages in thread
From: Zeng Guang @ 2022-01-11  4:29 UTC (permalink / raw)
  To: Tian, Kevin, Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Christopherson, , Sean

On 1/11/2022 10:30 AM, Tian, Kevin wrote:
>> From: Zeng, Guang <guang.zeng@intel.com>
>> Sent: Monday, January 10, 2022 5:47 PM
>>
>> On 1/10/2022 4:40 PM, Tian, Kevin wrote:
>>>> From: Zhong, Yang <yang.zhong@intel.com>
>>>> Sent: Friday, January 7, 2022 5:32 PM
>>>>
>>>> From: Jing Liu <jing2.liu@intel.com>
>>>>
>>>> Extended feature has large state while current
>>>> kvm_xsave only allows 4KB. Use new XSAVE ioctls
>>>> if the xstate size is large than kvm_xsave.
>>> shouldn't we always use the new xsave ioctls as long as
>>> CAP_XSAVE2 is available?
>>
>> CAP_XSAVE2 may return legacy xsave size or 0 working with old kvm
>> version in which it's not available.
>> QEMU just use the new xsave ioctls only when the return value of
>> CAP_XSAVE2 is larger than legacy xsave size.
> CAP_XSAVE2  is the superset of CAP_XSAVE. If available it can support
> both legacy 4K size or bigger.

Yes. Based on the return value of CAP_XSAVE2, QEMU further determines
whether it needs to use the new KVM_GET_XSAVE2 ioctl for extended xsave
state. This is the main change needed to support dynamically enabled
features.

>>>> Signed-off-by: Jing Liu <jing2.liu@intel.com>
>>>> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
>>>> ---
>>>>    linux-headers/asm-x86/kvm.h | 14 ++++++++++++++
>>>>    linux-headers/linux/kvm.h   |  2 ++
>>>>    target/i386/cpu.h           |  5 +++++
>>>>    target/i386/kvm/kvm.c       | 16 ++++++++++++++--
>>>>    target/i386/xsave_helper.c  | 35
>> +++++++++++++++++++++++++++++++++++
>>>>    5 files changed, 70 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
>>>> index 5a776a08f7..32f2a921e8 100644
>>>> --- a/linux-headers/asm-x86/kvm.h
>>>> +++ b/linux-headers/asm-x86/kvm.h
>>>> @@ -376,6 +376,20 @@ struct kvm_debugregs {
>>>>    /* for KVM_CAP_XSAVE */
>>>>    struct kvm_xsave {
>>>>    	__u32 region[1024];
>>>> +	/*
>>>> +	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many
>>>> bytes
>>>> +	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
>>>> +	 * respectively, when invoked on the vm file descriptor.
>>>> +	 *
>>>> +	 * The size value returned by
>>>> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
>>>> +	 * will always be at least 4096. Currently, it is only greater
>>>> +	 * than 4096 if a dynamic feature has been enabled with
>>>> +	 * ``arch_prctl()``, but this may change in the future.
>>>> +	 *
>>>> +	 * The offsets of the state save areas in struct kvm_xsave follow
>>>> +	 * the contents of CPUID leaf 0xD on the host.
>>>> +	 */
>>>> +	__u32 extra[0];
>>>>    };
>>>>
>>>>    #define KVM_MAX_XCRS	16
>>>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>>>> index 02c5e7b7bb..97d5b6d81d 100644
>>>> --- a/linux-headers/linux/kvm.h
>>>> +++ b/linux-headers/linux/kvm.h
>>>> @@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
>>>>    #define KVM_CAP_BINARY_STATS_FD 203
>>>>    #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>>>>    #define KVM_CAP_ARM_MTE 205
>>>> +#define KVM_CAP_XSAVE2  207
>>>>
>>>>    #ifdef KVM_CAP_IRQ_ROUTING
>>>>
>>>> @@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
>>>>    /* Available with KVM_CAP_XSAVE */
>>>>    #define KVM_GET_XSAVE		  _IOR(KVMIO,  0xa4, struct
>>>> kvm_xsave)
>>>>    #define KVM_SET_XSAVE		  _IOW(KVMIO,  0xa5, struct
>>>> kvm_xsave)
>>>> +#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct
>>>> kvm_xsave)
>>>>    /* Available with KVM_CAP_XCRS */
>>>>    #define KVM_GET_XCRS		  _IOR(KVMIO,  0xa6, struct kvm_xcrs)
>>>>    #define KVM_SET_XCRS		  _IOW(KVMIO,  0xa7, struct kvm_xcrs)
>>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>>> index 245e8b5a1a..6153c4ab1a 100644
>>>> --- a/target/i386/cpu.h
>>>> +++ b/target/i386/cpu.h
>>>> @@ -1519,6 +1519,11 @@ typedef struct CPUX86State {
>>>>        YMMReg zmmh_regs[CPU_NB_REGS];
>>>>        ZMMReg hi16_zmm_regs[CPU_NB_REGS];
>>>>
>>>> +#ifdef TARGET_X86_64
>>>> +    uint8_t xtilecfg[64];
>>>> +    uint8_t xtiledata[8192];
>>>> +#endif
>>>> +
>>>>        /* sysenter registers */
>>>>        uint32_t sysenter_cs;
>>>>        target_ulong sysenter_esp;
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 3fb3ddbe2b..97520e9dff 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -1983,7 +1983,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>>        }
>>>>
>>>>        if (has_xsave) {
>>>> -        env->xsave_buf_len = sizeof(struct kvm_xsave);
>>>> +        uint32_t size = kvm_vm_check_extension(cs->kvm_state,
>>>> KVM_CAP_XSAVE2);
>>>> +        if (!size) {
>>>> +            size = sizeof(struct kvm_xsave);
>>>> +        }
>>>> +
>>>> +        env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
>>>>            env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
>>>>            memset(env->xsave_buf, 0, env->xsave_buf_len);
>>>>
>>>> @@ -2580,6 +2585,7 @@ static int kvm_put_xsave(X86CPU *cpu)
>>>>        if (!has_xsave) {
>>>>            return kvm_put_fpu(cpu);
>>>>        }
>>>> +
>>>>        x86_cpu_xsave_all_areas(cpu, xsave, env->xsave_buf_len);
>>>>
>>>>        return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XSAVE, xsave);
>>>> @@ -3247,10 +3253,16 @@ static int kvm_get_xsave(X86CPU *cpu)
>>>>            return kvm_get_fpu(cpu);
>>>>        }
>>>>
>>>> -    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
>>>> +    if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
>>>> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
>>>> +    } else {
>>>> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);
>>>> +    }
>>>> +
>>>>        if (ret < 0) {
>>>>            return ret;
>>>>        }
>>>> +
>>>>        x86_cpu_xrstor_all_areas(cpu, xsave, env->xsave_buf_len);
>>>>
>>>>        return 0;
>>>> diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
>>>> index ac61a96344..090424e820 100644
>>>> --- a/target/i386/xsave_helper.c
>>>> +++ b/target/i386/xsave_helper.c
>>>> @@ -5,6 +5,7 @@
>>>>    #include "qemu/osdep.h"
>>>>
>>>>    #include "cpu.h"
>>>> +#include <asm/kvm.h>
>>>>
>>>>    void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
>>>>    {
>>>> @@ -126,6 +127,23 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu,
>> void
>>>> *buf, uint32_t buflen)
>>>>
>>>>            memcpy(pkru, &env->pkru, sizeof(env->pkru));
>>>>        }
>>>> +
>>>> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
>>>> +    if (e->size && e->offset) {
>>>> +        XSaveXTILE_CFG *tilecfg = buf + e->offset;
>>>> +
>>>> +        memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
>>>> +    }
>>>> +
>>>> +    if (buflen > sizeof(struct kvm_xsave)) {
>>>> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
>>>> +
>>>> +        if (e->size && e->offset) {
>>>> +            XSaveXTILE_DATA *tiledata = buf + e->offset;
>>>> +
>>>> +            memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
>>>> +        }
>>>> +    }
>>>>    #endif
>>>>    }
>>>>
>>>> @@ -247,5 +265,22 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu,
>> const
>>>> void *buf, uint32_t buflen)
>>>>            pkru = buf + e->offset;
>>>>            memcpy(&env->pkru, pkru, sizeof(env->pkru));
>>>>        }
>>>> +
>>>> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
>>>> +    if (e->size && e->offset) {
>>>> +        const XSaveXTILE_CFG *tilecfg = buf + e->offset;
>>>> +
>>>> +        memcpy(&env->xtilecfg, tilecfg, sizeof(env->xtilecfg));
>>>> +    }
>>>> +
>>>> +    if (buflen > sizeof(struct kvm_xsave)) {
>>>> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
>>>> +
>>>> +        if (e->size && e->offset) {
>>>> +            const XSaveXTILE_DATA *tiledata = buf + e->offset;
>>>> +
>>>> +            memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
>>>> +        }
>>>> +    }
>>>>    #endif
>>>>    }



* Re: [RFC PATCH 4/7] x86: Add XFD faulting bit for state components
  2022-01-10  8:38   ` Tian, Kevin
@ 2022-01-11  5:32     ` Yang Zhong
  0 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-11  5:32 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: yang.zhong, Christopherson, ,
	Sean, jing2.liu, qemu-devel, Wang, Wei W, Zeng, Guang, pbonzini

On Mon, Jan 10, 2022 at 04:38:18PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang <yang.zhong@intel.com>
> > Sent: Friday, January 7, 2022 5:32 PM
> >
> > From: Jing Liu <jing2.liu@intel.com>
> >
> > Intel introduces XFD faulting mechanism for extended
> > XSAVE features to dynamically enable the features in
> > runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
> > as 1, it indicates support for XFD faulting of this
> > state component.
> >
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> > ---
> >  target/i386/cpu.h         | 2 +-
> >  target/i386/cpu.c         | 2 +-
> >  target/i386/kvm/kvm-cpu.c | 1 +
> >  3 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 79023fe723..22f7ff40a6 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1375,7 +1375,7 @@
> > QEMU_BUILD_BUG_ON(sizeof(XSaveXTILE_DATA) != 0x2000);
> >  typedef struct ExtSaveArea {
> >      uint32_t feature, bits;
> >      uint32_t offset, size;
> > -    uint32_t need_align;
> > +    uint32_t need_align, support_xfd;
> 
> why each flag be a 32-bit field?
>
  
  The flags are defined as uint32_t so that they can be composed directly
  into the ecx value below:
  *ecx = (esa->need_align << 1) | (esa->support_xfd << 2);

 
> also it's more natural to have them in separate lines, though I'm not
> sure why existing fields are put this way (possibly due to short names?).
> 

  Yes, the support_xfd flag will be defined on a separate line, thanks!

  Yang


> >  } ExtSaveArea;
> >
> >  #define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index dd2c919c33..1adc3f0f99 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -5495,7 +5495,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
> > index, uint32_t count,
> >                  const ExtSaveArea *esa = &x86_ext_save_areas[count];
> >                  *eax = esa->size;
> >                  *ebx = esa->offset;
> > -                *ecx = esa->need_align << 1;
> > +                *ecx = (esa->need_align << 1) | (esa->support_xfd << 2);
> >              }
> >          }
> >          break;
> > diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> > index 6c4c1c6f9d..3b3c203f11 100644
> > --- a/target/i386/kvm/kvm-cpu.c
> > +++ b/target/i386/kvm/kvm-cpu.c
> > @@ -108,6 +108,7 @@ static void kvm_cpu_xsave_init(void)
> >
> >              uint32_t ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
> >              esa->need_align = ecx & (1u << 1) ? 1 : 0;
> > +            esa->support_xfd = ecx & (1u << 2) ? 1 : 0;
> >          }
> >      }
> >  }



* Re: [RFC PATCH 3/7] x86: Grant AMX permission for guest
  2022-01-10  8:36   ` Tian, Kevin
@ 2022-01-11  6:46     ` Yang Zhong
  0 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-11  6:46 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: yang.zhong, Christopherson, ,
	Sean, jing2.liu, qemu-devel, Wang, Wei W, Zeng, Guang, pbonzini

On Mon, Jan 10, 2022 at 04:36:13PM +0800, Tian, Kevin wrote:
> > From: Zhong, Yang <yang.zhong@intel.com>
> > Sent: Friday, January 7, 2022 5:32 PM
> >
> > Kernel mechanism for dynamically enabled XSAVE features
> 
> there is no definition of "dynamically-enabled XSAVE features).
> 

  Thanks!


> > asks userspace VMM requesting guest permission if it wants
> > to expose the features. Only with the permission, kernel
> > can try to enable the features when detecting the intention
> > from guest in runtime.
> >
> > Qemu should request the permission for guest only once
> > before the first vCPU is created. KVM checks the guest
> > permission when Qemu advertises the features, and the
> > advertising operation fails w/o permission.
> 
> what about below?
> 
> "Kernel allocates 4K xstate buffer by default. For XSAVE features
> which require large state component (e.g. AMX), Linux kernel
> dynamically expands the xstate buffer only after the process has
> acquired the necessary permissions. Those are called dynamically-
> enabled XSAVE features (or dynamic xfeatures).
> 
> There are separate permissions for native tasks and guests.
> 
> Qemu should request the guest permissions for dynamic xfeatures
> which will be exposed to the guest. This only needs to be done
> once before the first vcpu is created."


  This is clearer. Will update this in new version, thanks!


> 
> >
> > Signed-off-by: Yang Zhong <yang.zhong@intel.com>
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > ---
> >  target/i386/cpu.h |  7 +++++++
> >  hw/i386/x86.c     | 28 ++++++++++++++++++++++++++++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 768a8218be..79023fe723 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -549,6 +549,13 @@ typedef enum X86Seg {
> >  #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
> >  #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
> >  #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
> > +#define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
> > +#define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
> > +#define XFEATURE_XTILE_MASK             (XSTATE_XTILE_CFG_MASK \
> > +                                         | XSTATE_XTILE_DATA_MASK)
> > +
> > +#define ARCH_GET_XCOMP_GUEST_PERM       0x1024
> > +#define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
> >
> >  /* CPUID feature words */
> >  typedef enum FeatureWord {
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index b84840a1bb..0a204c375e 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -41,6 +41,8 @@
> >  #include "sysemu/cpu-timers.h"
> >  #include "trace.h"
> >
> > +#include <sys/syscall.h>
> > +
> >  #include "hw/i386/x86.h"
> >  #include "target/i386/cpu.h"
> >  #include "hw/i386/topology.h"
> > @@ -117,6 +119,30 @@ out:
> >      object_unref(cpu);
> >  }
> >
> > +static void x86_xsave_req_perm(void)
> > +{
> > +    unsigned long bitmask;
> > +
> > +    long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> > +                      XSTATE_XTILE_DATA_BIT);
> 
> Should we do it based on the cpuid for the first vcpu?


  This permission is requested before vcpu init, so it is put in
  x86_cpus_init(). If the host kernel does not include the AMX changes,
  or the latest kernel (including AMX) is installed on a previous
  generation x86 platform, this syscall() will directly return. I once
  put this permission request in the vcpu create function, but it was
  hard to find a good location to handle it there. As for cpuid, do you
  mean I need to check the host cpuid info, to check whether this host
  cpu supports AMX? thanks!

  Yang   
   
> 
> > +    if (rc) {
> > +        /*
> > +         * The older kernel version(<5.15) can't support
> > +         * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> > +         */
> > +        return;
> > +    }
> > +
> > +    rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
> > +    if (rc) {
> > +        error_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
> > +    } else if (!(bitmask & XFEATURE_XTILE_MASK)) {
> > +        error_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
> > +                     "and bitmask=0x%lx", bitmask);
> > +        exit(EXIT_FAILURE);
> > +    }
> > +}
> > +
> >  void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
> >  {
> >      int i;
> > @@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int
> > default_cpu_version)
> >      MachineState *ms = MACHINE(x86ms);
> >      MachineClass *mc = MACHINE_GET_CLASS(x86ms);
> >
> > +    /* Request AMX pemission for guest */
> > +    x86_xsave_req_perm();
> >      x86_cpu_set_default_version(default_cpu_version);
> >
> >      /*



* Re: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-11  2:30       ` Tian, Kevin
  2022-01-11  4:29         ` Zeng Guang
@ 2022-01-12  2:51         ` Zeng Guang
  2022-01-12  4:34           ` Wang, Wei W
  1 sibling, 1 reply; 31+ messages in thread
From: Zeng Guang @ 2022-01-12  2:51 UTC (permalink / raw)
  To: Tian, Kevin, Zhong, Yang, qemu-devel
  Cc: pbonzini, Wang, Wei W, jing2.liu, Christopherson, , Sean

On 1/11/2022 10:30 AM, Tian, Kevin wrote:
>> From: Zeng, Guang <guang.zeng@intel.com>
>> Sent: Monday, January 10, 2022 5:47 PM
>>
>> On 1/10/2022 4:40 PM, Tian, Kevin wrote:
>>>> From: Zhong, Yang <yang.zhong@intel.com>
>>>> Sent: Friday, January 7, 2022 5:32 PM
>>>>
>>>> From: Jing Liu <jing2.liu@intel.com>
>>>>
>>>> Extended feature has large state while current
>>>> kvm_xsave only allows 4KB. Use new XSAVE ioctls
>>>> if the xstate size is large than kvm_xsave.
>>> shouldn't we always use the new xsave ioctls as long as
>>> CAP_XSAVE2 is available?
>>
>> CAP_XSAVE2 may return legacy xsave size or 0 working with old kvm
>> version in which it's not available.
>> QEMU just use the new xsave ioctls only when the return value of
>> CAP_XSAVE2 is larger than legacy xsave size.
> CAP_XSAVE2  is the superset of CAP_XSAVE. If available it can support
> both legacy 4K size or bigger.

Got your point now. We can use the new ioctl once CAP_XSAVE2 is available.
As you suggested, I'd like to change the commit log as follows:

"x86: Use new XSAVE ioctls handling

   Extended feature has large state while current
   kvm_xsave only allows 4KB. Use new XSAVE ioctls
   if check extension of CAP_XSAVE2 is available."

And introduce has_xsave2 to indicate the availability of CAP_XSAVE2,
with the following change:

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 97520e9dff..c8dae88ced 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -116,6 +116,7 @@ static bool has_msr_ucode_rev;
  static bool has_msr_vmx_procbased_ctls2;
  static bool has_msr_perf_capabs;
  static bool has_msr_pkrs;
+static bool has_xsave2;

  static uint32_t has_architectural_pmu_version;
  static uint32_t num_architectural_pmu_gp_counters;
@@ -1986,7 +1987,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
          uint32_t size = kvm_vm_check_extension(cs->kvm_state, 
KVM_CAP_XSAVE2);
          if (!size) {
              size = sizeof(struct kvm_xsave);
-        }
+        } else {
+            has_xsave2 = true;
+        }

          env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
          env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
@@ -3253,7 +3255,7 @@ static int kvm_get_xsave(X86CPU *cpu)
          return kvm_get_fpu(cpu);
      }

-    if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
+    if (!has_xsave2) {
          ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
      } else {
          ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);

>   
>>>> Signed-off-by: Jing Liu <jing2.liu@intel.com>
>>>> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> Signed-off-by: Yang Zhong <yang.zhong@intel.com>
>>>> ---
>>>>    linux-headers/asm-x86/kvm.h | 14 ++++++++++++++
>>>>    linux-headers/linux/kvm.h   |  2 ++
>>>>    target/i386/cpu.h           |  5 +++++
>>>>    target/i386/kvm/kvm.c       | 16 ++++++++++++++--
>>>>    target/i386/xsave_helper.c  | 35
>> +++++++++++++++++++++++++++++++++++
>>>>    5 files changed, 70 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
>>>> index 5a776a08f7..32f2a921e8 100644
>>>> --- a/linux-headers/asm-x86/kvm.h
>>>> +++ b/linux-headers/asm-x86/kvm.h
>>>> @@ -376,6 +376,20 @@ struct kvm_debugregs {
>>>>    /* for KVM_CAP_XSAVE */
>>>>    struct kvm_xsave {
>>>>    	__u32 region[1024];
>>>> +	/*
>>>> +	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many
>>>> bytes
>>>> +	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
>>>> +	 * respectively, when invoked on the vm file descriptor.
>>>> +	 *
>>>> +	 * The size value returned by
>>>> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
>>>> +	 * will always be at least 4096. Currently, it is only greater
>>>> +	 * than 4096 if a dynamic feature has been enabled with
>>>> +	 * ``arch_prctl()``, but this may change in the future.
>>>> +	 *
>>>> +	 * The offsets of the state save areas in struct kvm_xsave follow
>>>> +	 * the contents of CPUID leaf 0xD on the host.
>>>> +	 */
>>>> +	__u32 extra[0];
>>>>    };
>>>>
>>>>    #define KVM_MAX_XCRS	16
>>>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>>>> index 02c5e7b7bb..97d5b6d81d 100644
>>>> --- a/linux-headers/linux/kvm.h
>>>> +++ b/linux-headers/linux/kvm.h
>>>> @@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
>>>>    #define KVM_CAP_BINARY_STATS_FD 203
>>>>    #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>>>>    #define KVM_CAP_ARM_MTE 205
>>>> +#define KVM_CAP_XSAVE2  207
>>>>
>>>>    #ifdef KVM_CAP_IRQ_ROUTING
>>>>
>>>> @@ -1550,6 +1551,7 @@ struct kvm_s390_ucas_mapping {
>>>>    /* Available with KVM_CAP_XSAVE */
>>>>    #define KVM_GET_XSAVE		  _IOR(KVMIO,  0xa4, struct kvm_xsave)
>>>>    #define KVM_SET_XSAVE		  _IOW(KVMIO,  0xa5, struct kvm_xsave)
>>>> +#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
>>>>    /* Available with KVM_CAP_XCRS */
>>>>    #define KVM_GET_XCRS		  _IOR(KVMIO,  0xa6, struct kvm_xcrs)
>>>>    #define KVM_SET_XCRS		  _IOW(KVMIO,  0xa7, struct kvm_xcrs)
>>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>>> index 245e8b5a1a..6153c4ab1a 100644
>>>> --- a/target/i386/cpu.h
>>>> +++ b/target/i386/cpu.h
>>>> @@ -1519,6 +1519,11 @@ typedef struct CPUX86State {
>>>>        YMMReg zmmh_regs[CPU_NB_REGS];
>>>>        ZMMReg hi16_zmm_regs[CPU_NB_REGS];
>>>>
>>>> +#ifdef TARGET_X86_64
>>>> +    uint8_t xtilecfg[64];
>>>> +    uint8_t xtiledata[8192];
>>>> +#endif
>>>> +
>>>>        /* sysenter registers */
>>>>        uint32_t sysenter_cs;
>>>>        target_ulong sysenter_esp;
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 3fb3ddbe2b..97520e9dff 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -1983,7 +1983,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>>        }
>>>>
>>>>        if (has_xsave) {
>>>> -        env->xsave_buf_len = sizeof(struct kvm_xsave);
>>>> +        uint32_t size = kvm_vm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
>>>> +        if (!size) {
>>>> +            size = sizeof(struct kvm_xsave);
>>>> +        }
>>>> +
>>>> +        env->xsave_buf_len = QEMU_ALIGN_UP(size, 4096);
>>>>            env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
>>>>            memset(env->xsave_buf, 0, env->xsave_buf_len);
>>>>
>>>> @@ -2580,6 +2585,7 @@ static int kvm_put_xsave(X86CPU *cpu)
>>>>        if (!has_xsave) {
>>>>            return kvm_put_fpu(cpu);
>>>>        }
>>>> +
>>>>        x86_cpu_xsave_all_areas(cpu, xsave, env->xsave_buf_len);
>>>>
>>>>        return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XSAVE, xsave);
>>>> @@ -3247,10 +3253,16 @@ static int kvm_get_xsave(X86CPU *cpu)
>>>>            return kvm_get_fpu(cpu);
>>>>        }
>>>>
>>>> -    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
>>>> +    if (env->xsave_buf_len <= sizeof(struct kvm_xsave)) {
>>>> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
>>>> +    } else {
>>>> +        ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE2, xsave);
>>>> +    }
>>>> +
>>>>        if (ret < 0) {
>>>>            return ret;
>>>>        }
>>>> +
>>>>        x86_cpu_xrstor_all_areas(cpu, xsave, env->xsave_buf_len);
>>>>
>>>>        return 0;
>>>> diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
>>>> index ac61a96344..090424e820 100644
>>>> --- a/target/i386/xsave_helper.c
>>>> +++ b/target/i386/xsave_helper.c
>>>> @@ -5,6 +5,7 @@
>>>>    #include "qemu/osdep.h"
>>>>
>>>>    #include "cpu.h"
>>>> +#include <asm/kvm.h>
>>>>
>>>>    void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
>>>>    {
>>>> @@ -126,6 +127,23 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
>>>>
>>>>            memcpy(pkru, &env->pkru, sizeof(env->pkru));
>>>>        }
>>>> +
>>>> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
>>>> +    if (e->size && e->offset) {
>>>> +        XSaveXTILE_CFG *tilecfg = buf + e->offset;
>>>> +
>>>> +        memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
>>>> +    }
>>>> +
>>>> +    if (buflen > sizeof(struct kvm_xsave)) {
>>>> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
>>>> +
>>>> +        if (e->size && e->offset) {
>>>> +            XSaveXTILE_DATA *tiledata = buf + e->offset;
>>>> +
>>>> +            memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
>>>> +        }
>>>> +    }
>>>>    #endif
>>>>    }
>>>>
>>>> @@ -247,5 +265,22 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
>>>>            pkru = buf + e->offset;
>>>>            memcpy(&env->pkru, pkru, sizeof(env->pkru));
>>>>        }
>>>> +
>>>> +    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
>>>> +    if (e->size && e->offset) {
>>>> +        const XSaveXTILE_CFG *tilecfg = buf + e->offset;
>>>> +
>>>> +        memcpy(&env->xtilecfg, tilecfg, sizeof(env->xtilecfg));
>>>> +    }
>>>> +
>>>> +    if (buflen > sizeof(struct kvm_xsave)) {
>>>> +        e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
>>>> +
>>>> +        if (e->size && e->offset) {
>>>> +            const XSaveXTILE_DATA *tiledata = buf + e->offset;
>>>> +
>>>> +            memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
>>>> +        }
>>>> +    }
>>>>    #endif
>>>>    }
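
The hunks above copy XTILECFG unconditionally (it fits in the legacy 4 KiB region) but gate XTILEDATA on `buflen > sizeof(struct kvm_xsave)`. A minimal sketch of that bounds check — the helper name and the offsets used below are illustrative, not QEMU symbols; real offsets come from host CPUID leaf 0xD:

```c
#include <stdint.h>

/* Legacy struct kvm_xsave is __u32 region[1024], i.e. 4096 bytes. */
#define LEGACY_XSAVE_LEN 4096u

/*
 * A state component described by x86_ext_save_areas can be copied only
 * when its save area lies entirely inside the allocated buffer. XTILECFG
 * fits in the legacy 4 KiB region; XTILEDATA does not, so it is reachable
 * only with a KVM_CAP_XSAVE2-sized buffer.
 */
static int component_fits(uint32_t offset, uint32_t size, uint32_t buflen)
{
    return offset != 0 && size != 0 && offset + size <= buflen;
}
```

With illustrative host offsets (XTILECFG at 2752, 64 bytes; XTILEDATA at 2816, 8192 bytes), XTILEDATA fails the check against a 4096-byte legacy buffer but passes against a 12288-byte KVM_CAP_XSAVE2 buffer.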


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
  2022-01-12  2:51         ` Zeng Guang
@ 2022-01-12  4:34           ` Wang, Wei W
  0 siblings, 0 replies; 31+ messages in thread
From: Wang, Wei W @ 2022-01-12  4:34 UTC (permalink / raw)
  To: Zeng, Guang, Tian, Kevin, Zhong, Yang, qemu-devel
  Cc: pbonzini, jing2.liu, Christopherson, Sean

On Wednesday, January 12, 2022 10:51 AM, Zeng, Guang wrote:
> To: Tian, Kevin <kevin.tian@intel.com>; Zhong, Yang <yang.zhong@intel.com>;
> qemu-devel@nongnu.org
> Cc: pbonzini@redhat.com; Christopherson, Sean <seanjc@google.com>;
> jing2.liu@linux.intel.com; Wang, Wei W <wei.w.wang@intel.com>
> Subject: Re: [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling
> 
> On 1/11/2022 10:30 AM, Tian, Kevin wrote:
> >> From: Zeng, Guang <guang.zeng@intel.com>
> >> Sent: Monday, January 10, 2022 5:47 PM
> >>
> >> On 1/10/2022 4:40 PM, Tian, Kevin wrote:
> >>>> From: Zhong, Yang <yang.zhong@intel.com>
> >>>> Sent: Friday, January 7, 2022 5:32 PM
> >>>>
> >>>> From: Jing Liu <jing2.liu@intel.com>
> >>>>
> >>>> Extended feature has large state while current kvm_xsave only
> >>>> allows 4KB. Use new XSAVE ioctls if the xstate size is larger than
> >>>> kvm_xsave.
> >>> shouldn't we always use the new xsave ioctls as long as
> >>> CAP_XSAVE2 is available?
> >>
> >> CAP_XSAVE2 may return the legacy xsave size, or 0 when working with an
> >> old kvm version in which it's not available.
> >> QEMU uses the new xsave ioctls only when the return value of
> >> CAP_XSAVE2 is larger than the legacy xsave size.
> > CAP_XSAVE2 is a superset of CAP_XSAVE. If available, it can support
> > both the legacy 4K size and bigger sizes.
> 
> Got your point now. We can use the new ioctl once CAP_XSAVE2 is available.
> As your suggestion, I'd like to change commit log as follows:
> 
> "x86: Use new XSAVE ioctls handling
> 
>    Extended feature has large state while current
>    kvm_xsave only allows 4KB. Use the new XSAVE ioctls
>    if the KVM_CAP_XSAVE2 extension is available."
> 
> And introduce has_xsave2 to indicate the validity of CAP_XSAVE2, with the
> following change:
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index
> 97520e9dff..c8dae88ced 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -116,6 +116,7 @@ static bool has_msr_ucode_rev;
>   static bool has_msr_vmx_procbased_ctls2;
>   static bool has_msr_perf_capabs;
>   static bool has_msr_pkrs;
> +static bool has_xsave2 = false;

It's 0-initialized, so I think there is no need for the "false" assignment.
Probably better to use "int" (like has_xsave); I also improved it a bit:

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3fb3ddbe2b..dee40ad0ad 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -122,6 +122,7 @@ static uint32_t num_architectural_pmu_gp_counters;
 static uint32_t num_architectural_pmu_fixed_counters;

 static int has_xsave;
+static int has_xsave2;
 static int has_xcrs;
 static int has_pit_state2;
 static int has_exception_payload;
@@ -1564,6 +1565,26 @@ static Error *invtsc_mig_blocker;

 #define KVM_MAX_CPUID_ENTRIES  100

+static void kvm_init_xsave(CPUX86State *env)
+{
+    if (has_xsave2) {
+        env->xsave_buf_len = QEMU_ALIGN_UP(has_xsave2, 4096);
+    } else if (has_xsave) {
+        env->xsave_buf_len = sizeof(struct kvm_xsave);
+    } else {
+        return;
+    }
+
+    env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+    /*
+     * The allocated storage must be large enough for all of the
+     * possible XSAVE state components.
+     */
+    assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
+           env->xsave_buf_len);
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -1982,18 +2003,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
         goto fail;
     }

-    if (has_xsave) {
-        env->xsave_buf_len = sizeof(struct kvm_xsave);
-        env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
-        memset(env->xsave_buf, 0, env->xsave_buf_len);
-
-        /*
-         * The allocated storage must be large enough for all of the
-         * possible XSAVE state components.
-         */
-        assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX)
-               <= env->xsave_buf_len);
-    }
+    kvm_init_xsave(env);

     max_nested_state_len = kvm_max_nested_state_length();
     if (max_nested_state_len > 0) {
@@ -2323,6 +2333,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     }

     has_xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
+    has_xsave2 = kvm_check_extension(s, KVM_CAP_XSAVE2);
     has_xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
     has_pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);

@@ -3241,13 +3252,14 @@ static int kvm_get_xsave(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
     void *xsave = env->xsave_buf;
-    int ret;
+    int type, ret;

     if (!has_xsave) {
         return kvm_get_fpu(cpu);
     }

-    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+    type = has_xsave2 ? KVM_GET_XSAVE2 : KVM_GET_XSAVE;
+    ret = kvm_vcpu_ioctl(CPU(cpu), type, xsave);
     if (ret < 0) {
         return ret;
     }
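
The fallback chain in the proposed kvm_init_xsave() can be restated as a pure sizing function. A sketch, assuming has_xsave2 carries the KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return value (0 on kernels without the capability) and using an illustrative ALIGN_UP in place of QEMU_ALIGN_UP:

```c
#include <stdint.h>

#define LEGACY_XSAVE_LEN 4096u                    /* sizeof(struct kvm_xsave) */
#define ALIGN_UP(x, a)   (((x) + (a) - 1) / (a) * (a))

/*
 * Mirror of the kvm_init_xsave() sizing logic: prefer the size reported
 * by KVM_CAP_XSAVE2 (rounded up to a page), fall back to the legacy
 * 4 KiB struct, and return 0 when XSAVE is unavailable entirely.
 */
static uint32_t xsave_buf_len(int has_xsave, int has_xsave2)
{
    if (has_xsave2) {
        return ALIGN_UP((uint32_t)has_xsave2, 4096u);
    }
    return has_xsave ? LEGACY_XSAVE_LEN : 0;
}
```

For example, on a host where KVM reports an 11008-byte state image, this yields a 12288-byte buffer, matching the QEMU_ALIGN_UP(has_xsave2, 4096) line in the patch.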


* Re: [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state
  2022-01-11  2:22     ` Yang Zhong
@ 2022-01-18 12:37       ` Paolo Bonzini
  2022-01-21  7:14         ` Yang Zhong
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2022-01-18 12:37 UTC (permalink / raw)
  To: Yang Zhong, Tian, Kevin
  Cc: Christopherson, Sean, Wang, Wei W, jing2.liu, qemu-devel, Zeng, Guang

On 1/11/22 03:22, Yang Zhong wrote:
>    Thanks Kevin, I will update this in next version.

Also:

     The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
     indicates whether the extended state component is located
     on the next 64-byte boundary following the preceding state
     component when the compacted format of an XSAVE area is
     used.

     Right now, they are all zero because no supported component
     needed the bit to be set, but the upcoming AMX feature will
     use it.  Fix the subleaves value according to KVM's supported
     cpuid.

Paolo



* Re: [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components
  2022-01-10  8:23   ` Tian, Kevin
  2022-01-11  2:32     ` Yang Zhong
@ 2022-01-18 12:39     ` Paolo Bonzini
  2022-01-21  7:15       ` Yang Zhong
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2022-01-18 12:39 UTC (permalink / raw)
  To: Tian, Kevin, Zhong, Yang, qemu-devel
  Cc: Zeng, Guang, Wang, Wei W, jing2.liu, Christopherson, Sean

On 1/10/22 09:23, Tian, Kevin wrote:
>>
>> AMX XTILECFG and XTILEDATA are managed by XSAVE feature
>> set. State component 17 is used for 64-byte TILECFG register
>> (XTILECFG state) and component 18 is used for 8192 bytes
>> of tile data (XTILEDATA state).
> to be consistent, "tile data" -> "TILEDATA"
> 

Previous sentences use "XTILECFG" / "XTILEDATA", not "TILEDATA".

So I would say:

The AMX TILECFG register and the TMMx tile data registers are 
saved/restored via XSAVE, respectively in state component 17 (64 bytes) 
and state component 18 (8192 bytes).

Paolo



* Re: [RFC PATCH 4/7] x86: Add XFD faulting bit for state components
  2022-01-07  9:31 ` [RFC PATCH 4/7] x86: Add XFD faulting bit for state components Yang Zhong
  2022-01-10  8:38   ` Tian, Kevin
@ 2022-01-18 12:52   ` Paolo Bonzini
  2022-01-21  7:18     ` Yang Zhong
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2022-01-18 12:52 UTC (permalink / raw)
  To: Yang Zhong, qemu-devel
  Cc: seanjc, kevin.tian, jing2.liu, wei.w.wang, guang.zeng

On 1/7/22 10:31, Yang Zhong wrote:
> -    uint32_t need_align;
> +    uint32_t need_align, support_xfd;

These can be replaced by a single field "uint32_t ecx".

You can add also macros like

#define ESA_FEATURE_ALIGN64_BIT	(1)
#define ESA_FEATURE_XFD_BIT	(2)

to simplify access.
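
A sketch of how the single ecx field could be consumed — the ExtSaveArea layout and accessor names here are illustrative, not QEMU's exact definitions; the bit numbers follow CPUID.(EAX=0Dh,ECX=n):ECX, where bit 1 is 64-byte alignment and bit 2 is XFD support:

```c
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_BIT  1
#define ESA_FEATURE_XFD_BIT      2
#define ESA_FEATURE_ALIGN64_MASK (1u << ESA_FEATURE_ALIGN64_BIT)
#define ESA_FEATURE_XFD_MASK     (1u << ESA_FEATURE_XFD_BIT)

typedef struct ExtSaveArea {
    uint32_t offset, size;
    uint32_t ecx;    /* replaces the separate need_align/support_xfd fields */
} ExtSaveArea;

static int esa_need_align(const ExtSaveArea *e)
{
    return !!(e->ecx & ESA_FEATURE_ALIGN64_MASK);
}

static int esa_support_xfd(const ExtSaveArea *e)
{
    return !!(e->ecx & ESA_FEATURE_XFD_MASK);
}
```

Storing the raw ECX value keeps the struct in sync with whatever feature bits future CPUID revisions add, at the cost of a mask test at each access.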

Paolo



* Re: [RFC PATCH 3/7] x86: Grant AMX permission for guest
  2022-01-07  9:31 ` [RFC PATCH 3/7] x86: Grant AMX permission for guest Yang Zhong
  2022-01-10  8:36   ` Tian, Kevin
@ 2022-01-18 12:52   ` Paolo Bonzini
  2022-01-18 13:06     ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2022-01-18 12:52 UTC (permalink / raw)
  To: Yang Zhong, qemu-devel
  Cc: seanjc, kevin.tian, jing2.liu, wei.w.wang, guang.zeng

On 1/7/22 10:31, Yang Zhong wrote:
> +static void x86_xsave_req_perm(void)
> +{
> +    unsigned long bitmask;
> +
> +    long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
> +                      XSTATE_XTILE_DATA_BIT);
> +    if (rc) {
> +        /*
> +         * The older kernel version(<5.15) can't support
> +         * ARCH_REQ_XCOMP_GUEST_PERM and directly return.
> +         */
> +        return;
> +    }
> +
> +    rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
> +    if (rc) {
> +        error_report("prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
> +    } else if (!(bitmask & XFEATURE_XTILE_MASK)) {
> +        error_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
> +                     "and bitmask=0x%lx", bitmask);
> +        exit(EXIT_FAILURE);
> +    }
> +}
> +
>   void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>   {
>       int i;
> @@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>       MachineState *ms = MACHINE(x86ms);
>       MachineClass *mc = MACHINE_GET_CLASS(x86ms);
>   
> +    /* Request AMX permission for guest */
> +    x86_xsave_req_perm();
>       x86_cpu_set_default_version(default_cpu_version);
>   

This should be done before creating a CPU with support for state 
component 18.  It happens in kvm_init_vcpu, with the following call stack:

	kvm_init_vcpu
	kvm_vcpu_thread_fn
	kvm_start_vcpu_thread
	qemu_init_vcpu
	x86_cpu_realizefn

The issue however is that this has to be done before 
KVM_GET_SUPPORTED_CPUID and KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).

For the former, you can assume that anything returned by 
ARCH_GET_XCOMP_GUEST_PERM will be returned by KVM_GET_SUPPORTED_CPUID in 
CPUID[0xD].EDX:EAX, so you can:

- add it to kvm_arch_get_supported_cpuid



* Re: [RFC PATCH 3/7] x86: Grant AMX permission for guest
  2022-01-18 12:52   ` Paolo Bonzini
@ 2022-01-18 13:06     ` Paolo Bonzini
  2022-01-21  7:21       ` Yang Zhong
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2022-01-18 13:06 UTC (permalink / raw)
  To: Yang Zhong, qemu-devel
  Cc: seanjc, kevin.tian, jing2.liu, wei.w.wang, guang.zeng

Sorry, hit send on the wrong window.  This is the only patch that will 
require a bit more work.

On 1/18/22 13:52, Paolo Bonzini wrote:
>> @@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms, int 
>> default_cpu_version)
>>       MachineState *ms = MACHINE(x86ms);
>>       MachineClass *mc = MACHINE_GET_CLASS(x86ms);
>> +    /* Request AMX permission for guest */
>> +    x86_xsave_req_perm();
>>       x86_cpu_set_default_version(default_cpu_version);
> 
> This should be done before creating a CPU with support for state 
> component 18.  It happens in kvm_init_vcpu, with the following call stack:
> 
>      kvm_init_vcpu
>      kvm_vcpu_thread_fn
>      kvm_start_vcpu_thread
>      qemu_init_vcpu
>      x86_cpu_realizefn
> 
> The issue however is that this has to be done before 
> KVM_GET_SUPPORTED_CPUID and KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
> 
> For the former, you can assume that anything returned by 
> ARCH_GET_XCOMP_GUEST_PERM will be returned by KVM_GET_SUPPORTED_CPUID in 
> CPUID[0xD].EDX:EAX, so you can:
> 
> - add it to kvm_arch_get_supported_cpuid

... together with the other special cases (otherwise 
x86_cpu_get_supported_feature_word complains that XTILEDATA is not 
available)

- change kvm_cpu_xsave_init to use host_cpuid instead of 
kvm_arch_get_supported_cpuid.

- call ARCH_REQ_XCOMP_GUEST_PERM from x86_cpu_enable_xsave_components, 
with a conditional like

     if (kvm_enabled()) {
         kvm_request_xsave_components(cpu, mask);
     }

KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) is actually not a problem; the ioctl 
is only called from kvm_arch_init_vcpu and therefore after 
x86_cpu_enable_xsave_components.

Thanks,

Paolo
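
The permission check that x86_xsave_req_perm() performs on the ARCH_GET_XCOMP_GUEST_PERM bitmask reduces to a mask test on the two tile components. A sketch using the component bit numbers from the patch under discussion (the helper name is illustrative):

```c
/* XSTATE component numbers for AMX, as defined in the patch. */
#define XSTATE_XTILE_CFG_BIT  17
#define XSTATE_XTILE_DATA_BIT 18
#define XFEATURE_XTILE_MASK \
    ((1ul << XSTATE_XTILE_CFG_BIT) | (1ul << XSTATE_XTILE_DATA_BIT))

/*
 * AMX is usable by a guest only when the kernel has granted permission
 * for both XTILECFG and XTILEDATA, i.e. after a successful
 * arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, XSTATE_XTILE_DATA_BIT).
 */
static int amx_perm_granted(unsigned long xcomp_guest_perm)
{
    return (xcomp_guest_perm & XFEATURE_XTILE_MASK) == XFEATURE_XTILE_MASK;
}
```

Bits 17 and 18 together are 0x60000, so a permission bitmask with only XTILECFG (0x20000) must be rejected, as the error path in the quoted code does.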




* Re: [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state
  2022-01-18 12:37       ` Paolo Bonzini
@ 2022-01-21  7:14         ` Yang Zhong
  0 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-21  7:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: yang.zhong, Tian, Kevin, Christopherson, Sean,
	jing2.liu, qemu-devel, Wang, Wei W, Zeng, Guang

On Tue, Jan 18, 2022 at 01:37:20PM +0100, Paolo Bonzini wrote:
> On 1/11/22 03:22, Yang Zhong wrote:
> >   Thanks Kevin, I will update this in next version.
> 
> Also:
> 
>     The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
>     indicate whether the extended state component locates
>     on the next 64-byte boundary following the preceding state
>     component when the compacted format of an XSAVE area is
>     used.
> 
>     Right now, they are all zero because no supported component
>     needed the bit to be set, but the upcoming AMX feature will
>     use it.  Fix the subleaves value according to KVM's supported
>     cpuid.
>
      Thanks Paolo, I will update this in the next version.

      Yang
      
> Paolo



* Re: [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components
  2022-01-18 12:39     ` Paolo Bonzini
@ 2022-01-21  7:15       ` Yang Zhong
  0 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-21  7:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: yang.zhong, Tian, Kevin, Christopherson, Sean,
	jing2.liu, qemu-devel, Wang, Wei W, Zeng, Guang

On Tue, Jan 18, 2022 at 01:39:59PM +0100, Paolo Bonzini wrote:
> On 1/10/22 09:23, Tian, Kevin wrote:
> >>
> >>AMX XTILECFG and XTILEDATA are managed by XSAVE feature
> >>set. State component 17 is used for 64-byte TILECFG register
> >>(XTILECFG state) and component 18 is used for 8192 bytes
> >>of tile data (XTILEDATA state).
> >to be consistent, "tile data" -> "TILEDATA"
> >
> 
> Previous sentences use "XTILECFG" / "XTILEDATA", not "TILEDATA".
> 
> So I would say:
> 
> The AMX TILECFG register and the TMMx tile data registers are
> saved/restored via XSAVE, respectively in state component 17 (64
> bytes) and state component 18 (8192 bytes).
>

  Thanks Paolo, I will update this in the next version.
  Yang
 
> Paolo



* Re: [RFC PATCH 4/7] x86: Add XFD faulting bit for state components
  2022-01-18 12:52   ` Paolo Bonzini
@ 2022-01-21  7:18     ` Yang Zhong
  0 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-21  7:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, qemu-devel,
	wei.w.wang, guang.zeng

On Tue, Jan 18, 2022 at 01:52:51PM +0100, Paolo Bonzini wrote:
> On 1/7/22 10:31, Yang Zhong wrote:
> >-    uint32_t need_align;
> >+    uint32_t need_align, support_xfd;
> 
> These can be replaced by a single field "uint32_t ecx".
> 
> You can add also macros like
> 
> #define ESA_FEATURE_ALIGN64_BIT	(1)
> #define ESA_FEATURE_XFD_BIT	(2)
> 
> to simplify access.
  
  Thanks Paolo, this is a much simpler solution, thanks!

  Yang
 
> Paolo



* Re: [RFC PATCH 3/7] x86: Grant AMX permission for guest
  2022-01-18 13:06     ` Paolo Bonzini
@ 2022-01-21  7:21       ` Yang Zhong
  0 siblings, 0 replies; 31+ messages in thread
From: Yang Zhong @ 2022-01-21  7:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: yang.zhong, kevin.tian, seanjc, jing2.liu, qemu-devel,
	wei.w.wang, guang.zeng

On Tue, Jan 18, 2022 at 02:06:55PM +0100, Paolo Bonzini wrote:
> Sorry, hit send on the wrong window.  This is the only patch that
> will require a bit more work.
> 
> On 1/18/22 13:52, Paolo Bonzini wrote:
> >>@@ -124,6 +150,8 @@ void x86_cpus_init(X86MachineState *x86ms,
> >>int default_cpu_version)
> >>      MachineState *ms = MACHINE(x86ms);
> >>      MachineClass *mc = MACHINE_GET_CLASS(x86ms);
> >>+    /* Request AMX permission for guest */
> >>+    x86_xsave_req_perm();
> >>      x86_cpu_set_default_version(default_cpu_version);
> >
> >This should be done before creating a CPU with support for state
> >component 18.  It happens in kvm_init_vcpu, with the following
> >call stack:
> >
> >     kvm_init_vcpu
> >     kvm_vcpu_thread_fn
> >     kvm_start_vcpu_thread
> >     qemu_init_vcpu
> >     x86_cpu_realizefn
> >
> >The issue however is that this has to be done before
> >KVM_GET_SUPPORTED_CPUID and KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
> >
> >For the former, you can assume that anything returned by
> >ARCH_GET_XCOMP_GUEST_PERM will be returned by
> >KVM_GET_SUPPORTED_CPUID in CPUID[0xD].EDX:EAX, so you can:
> >
> >- add it to kvm_arch_get_supported_cpuid
> 
> ... together with the other special cases (otherwise
> x86_cpu_get_supported_feature_word complains that XTILEDATA is not
> available)
> 
> - change kvm_cpu_xsave_init to use host_cpuid instead of
> kvm_arch_get_supported_cpuid.
> 
> - call ARCH_REQ_XCOMP_GUEST_PERM from
> x86_cpu_enable_xsave_components, with a conditional like
> 
>     if (kvm_enabled()) {
>         kvm_request_xsave_components(cpu, mask);
>     }
> 
> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) is actually not a problem; the
> ioctl is only called from kvm_arch_init_vcpu and therefore after
> x86_cpu_enable_xsave_components.
>
  
  Paolo, thanks so much for those detailed steps!
  I have completed the new patch according to those steps, and it works well.

  Since this is the only patch requiring big changes, the next version will drop the RFC tag.

  Thanks!
  Yang  
  
 
> Thanks,
> 
> Paolo



end of thread, other threads:[~2022-01-21  9:43 UTC | newest]

Thread overview: 31+ messages
2022-01-07  9:31 [RFC PATCH 0/7] AMX support in Qemu Yang Zhong
2022-01-07  9:31 ` [RFC PATCH 1/7] x86: Fix the 64-byte boundary enumeration for extended state Yang Zhong
2022-01-10  8:20   ` Tian, Kevin
2022-01-11  2:22     ` Yang Zhong
2022-01-18 12:37       ` Paolo Bonzini
2022-01-21  7:14         ` Yang Zhong
2022-01-07  9:31 ` [RFC PATCH 2/7] x86: Add AMX XTILECFG and XTILEDATA components Yang Zhong
2022-01-10  8:23   ` Tian, Kevin
2022-01-11  2:32     ` Yang Zhong
2022-01-18 12:39     ` Paolo Bonzini
2022-01-21  7:15       ` Yang Zhong
2022-01-07  9:31 ` [RFC PATCH 3/7] x86: Grant AMX permission for guest Yang Zhong
2022-01-10  8:36   ` Tian, Kevin
2022-01-11  6:46     ` Yang Zhong
2022-01-18 12:52   ` Paolo Bonzini
2022-01-18 13:06     ` Paolo Bonzini
2022-01-21  7:21       ` Yang Zhong
2022-01-07  9:31 ` [RFC PATCH 4/7] x86: Add XFD faulting bit for state components Yang Zhong
2022-01-10  8:38   ` Tian, Kevin
2022-01-11  5:32     ` Yang Zhong
2022-01-18 12:52   ` Paolo Bonzini
2022-01-21  7:18     ` Yang Zhong
2022-01-07  9:31 ` [RFC PATCH 5/7] x86: Add AMX CPUIDs enumeration Yang Zhong
2022-01-07  9:31 ` [RFC PATCH 6/7] x86: Use new XSAVE ioctls handling Yang Zhong
2022-01-10  8:40   ` Tian, Kevin
2022-01-10  9:47     ` Zeng Guang
2022-01-11  2:30       ` Tian, Kevin
2022-01-11  4:29         ` Zeng Guang
2022-01-12  2:51         ` Zeng Guang
2022-01-12  4:34           ` Wang, Wei W
2022-01-07  9:31 ` [RFC PATCH 7/7] x86: Support XFD and AMX xsave data migration Yang Zhong
