* [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves
@ 2023-02-27 10:13 Paolo Bonzini
  2023-02-27 10:13 ` [PATCH v4 1/4] target/i386: add FSRM to TCG Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Paolo Bonzini @ 2023-02-27 10:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

Sapphire Rapids enablement patches got stuck on the doubts regarding
properties for AMX support.  However, for now there is no need to have
anything but hardcoded values, because all Intel processors with AMX
currently support exactly the same palettes and TMUL limits.  Intel has
also promised that palette formats will remain backwards compatible so
the only worry is for the TMUL leaf, CPUID[1Eh].

However, providing modifiable properties for AMX is premature.  Rather,
the first step should be to _validate_ host CPUID values against the
ones supported by QEMU.  So for now apply the simpler patch that only
adds the new model.

In addition, add the FZRM, FSRS, FSRC bits: first, they are now supported
by Linux (albeit only in the upcoming 6.3 release); second, they are just
markers that do not require any support in the hypervisors.  While at
it, this series also adds these new markers as well as FSRM to TCG's
"-cpu max" model.
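
For readers who want to see what "just markers" means in practice, here is a
minimal sketch of decoding the four fast-string bits from raw CPUID register
values.  The bit positions for FZRM/FSRS/FSRC match the defines added in
patch 2; the FSRM position (leaf 7, subleaf 0, EDX bit 4) is taken from the
Intel SDM.  The struct and function names are hypothetical and are not part
of QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Leaf 7, subleaf 0, EDX: Fast Short REP MOVS (position per the Intel SDM). */
#define CPUID_7_0_EDX_FSRM  (1U << 4)
/* Leaf 7, subleaf 1, EAX: the three markers added by this series. */
#define CPUID_7_1_EAX_FZRM  (1U << 10)
#define CPUID_7_1_EAX_FSRS  (1U << 11)
#define CPUID_7_1_EAX_FSRC  (1U << 12)

/* Hypothetical container for the decoded markers; not a QEMU type. */
struct fast_string_caps {
    bool fsrm;  /* fast short REP MOVS */
    bool fzrm;  /* fast zero-length REP MOVS */
    bool fsrs;  /* fast short REP STOS */
    bool fsrc;  /* fast short REP CMPS/SCAS */
};

/*
 * Decode the markers from CPUID.(EAX=7,ECX=0):EDX and
 * CPUID.(EAX=7,ECX=1):EAX.  No instruction is enabled or disabled by
 * these bits; they only describe performance characteristics.
 */
static struct fast_string_caps decode_fast_string_caps(uint32_t leaf7_0_edx,
                                                       uint32_t leaf7_1_eax)
{
    struct fast_string_caps caps = {
        .fsrm = (leaf7_0_edx & CPUID_7_0_EDX_FSRM) != 0,
        .fzrm = (leaf7_1_eax & CPUID_7_1_EAX_FZRM) != 0,
        .fsrs = (leaf7_1_eax & CPUID_7_1_EAX_FSRS) != 0,
        .fsrc = (leaf7_1_eax & CPUID_7_1_EAX_FSRC) != 0,
    };
    return caps;
}
```

A guest would typically use such flags only to pick between a plain
"rep movsb" and a hand-tuned copy loop for short lengths.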

Supersedes: <20230106083826.5384-1-lei4.wang@intel.com>

Paolo Bonzini (3):
  target/i386: add FSRM to TCG
  target/i386: add FZRM, FSRS, FSRC
  target/i386: KVM: allow fast string operations if host supports them

Wang, Lei (1):
  target/i386: Add new CPU model SapphireRapids

 target/i386/cpu.c     | 142 ++++++++++++++++++++++++++++++++++++++++--
 target/i386/cpu.h     |  11 ++++
 target/i386/kvm/kvm.c |  17 ++++-
 3 files changed, 163 insertions(+), 7 deletions(-)

-- 
2.39.1




* [PATCH v4 1/4] target/i386: add FSRM to TCG
  2023-02-27 10:13 [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves Paolo Bonzini
@ 2023-02-27 10:13 ` Paolo Bonzini
  2023-02-27 19:29   ` Richard Henderson
  2023-02-27 10:13 ` [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2023-02-27 10:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

Fast short REP MOVS can be added to TCG, since a trivial translation
of string operations is a good option for short lengths.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 4d2b8d0444df..34e2cead870e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -661,7 +661,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_7_0_ECX_FEATURES (CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU | \
           /* CPUID_7_0_ECX_OSPKE is dynamic */ \
           CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS | CPUID_7_0_ECX_VAES)
-#define TCG_7_0_EDX_FEATURES 0
+#define TCG_7_0_EDX_FEATURES CPUID_7_0_EDX_FSRM
 #define TCG_7_1_EAX_FEATURES 0
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
-- 
2.39.1




* [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC
  2023-02-27 10:13 [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves Paolo Bonzini
  2023-02-27 10:13 ` [PATCH v4 1/4] target/i386: add FSRM to TCG Paolo Bonzini
@ 2023-02-27 10:13 ` Paolo Bonzini
  2023-02-27 13:39   ` Xiaoyao Li
  2023-02-27 19:31   ` Richard Henderson
  2023-02-27 10:13 ` [PATCH v4 3/4] target/i386: KVM: allow fast string operations if host supports them Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 11+ messages in thread
From: Paolo Bonzini @ 2023-02-27 10:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

These are three more markers for string operation optimizations.
They can all be added to TCG, whose string operations are more or
less as fast as they can be for short lengths.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/cpu.c | 7 ++++---
 target/i386/cpu.h | 7 +++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 34e2cead870e..26ec6e9da754 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -662,7 +662,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
           /* CPUID_7_0_ECX_OSPKE is dynamic */ \
           CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS | CPUID_7_0_ECX_VAES)
 #define TCG_7_0_EDX_FEATURES CPUID_7_0_EDX_FSRM
-#define TCG_7_1_EAX_FEATURES 0
+#define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
+          CPUID_7_1_EAX_FSRC)
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
 #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
@@ -872,8 +873,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .feat_names = {
             NULL, NULL, NULL, NULL,
             "avx-vnni", "avx512-bf16", NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, NULL, "fzrm", "fsrs",
+            "fsrc", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4bc19577a21..e0703feb5ed0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -900,6 +900,13 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EAX_AVX_VNNI          (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
+/* Fast Zero REP MOVS */
+#define CPUID_7_1_EAX_FZRM              (1U << 10)
+/* Fast Short REP STOS */
+#define CPUID_7_1_EAX_FSRS              (1U << 11)
+/* Fast Short REP CMPS/SCAS */
+#define CPUID_7_1_EAX_FSRC              (1U << 12)
+
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
 
-- 
2.39.1
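
As an illustration of how the feat_names change in the patch above lines up
with the new defines, the following sketch treats the array index as the bit
number within CPUID.(EAX=7,ECX=1):EAX.  The table contents are copied from
the diff; the table name and the lookup helper are hypothetical and are not
QEMU code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Names for CPUID.(EAX=7,ECX=1):EAX, indexed by bit position.  Entries
 * not listed explicitly are implicitly NULL (no name assigned).
 */
static const char *const feat_7_1_eax_names[32] = {
    NULL, NULL, NULL, NULL,
    "avx-vnni", "avx512-bf16", NULL, NULL,
    NULL, NULL, "fzrm", "fsrs",
    "fsrc", NULL, NULL, NULL,
};

/*
 * Hypothetical helper: return the bit index for a feature name,
 * or -1 if the name is not present in the table.
 */
static int feat_7_1_eax_bit_for_name(const char *name)
{
    for (int bit = 0; bit < 32; bit++) {
        const char *candidate = feat_7_1_eax_names[bit];

        if (candidate != NULL && strcmp(candidate, name) == 0) {
            return bit;
        }
    }
    return -1;
}
```

With this layout, "fzrm" lands on bit 10, "fsrs" on bit 11 and "fsrc" on
bit 12, which is what the (1U << 10), (1U << 11) and (1U << 12) defines in
cpu.h expect.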




* [PATCH v4 3/4] target/i386: KVM: allow fast string operations if host supports them
  2023-02-27 10:13 [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves Paolo Bonzini
  2023-02-27 10:13 ` [PATCH v4 1/4] target/i386: add FSRM to TCG Paolo Bonzini
  2023-02-27 10:13 ` [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC Paolo Bonzini
@ 2023-02-27 10:13 ` Paolo Bonzini
  2023-02-27 13:35   ` Xiaoyao Li
  2023-02-27 19:32   ` Richard Henderson
       [not found] ` <20230227101332.636203-5-pbonzini@redhat.com>
  2023-02-28  8:46 ` [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves Xiaoyao Li
  4 siblings, 2 replies; 11+ messages in thread
From: Paolo Bonzini @ 2023-02-27 10:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

These are just flags that document the performance characteristics of
instructions; they need no hypervisor support.  So include them even
if KVM does not show them.  In particular, FZRM/FSRS/FSRC have only
been added very recently, but they are available on Sapphire Rapids
processors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/kvm/kvm.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 587030199192..fe66a4953d41 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -352,7 +352,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 {
     struct kvm_cpuid2 *cpuid;
     uint32_t ret = 0;
-    uint32_t cpuid_1_edx;
+    uint32_t cpuid_1_edx, unused;
     uint64_t bitmask;
 
     cpuid = get_supported_cpuid(s);
@@ -399,10 +399,20 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
     } else if (function == 6 && reg == R_EAX) {
         ret |= CPUID_6_EAX_ARAT; /* safe to allow because of emulated APIC */
     } else if (function == 7 && index == 0 && reg == R_EBX) {
+        /* Not new instructions, just an optimization.  */
+        uint32_t ebx;
+        host_cpuid(1, 0, &unused, &ebx, &unused, &unused);
+        ret |= ebx & CPUID_7_0_EBX_ERMS;
+
         if (host_tsx_broken()) {
             ret &= ~(CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_HLE);
         }
     } else if (function == 7 && index == 0 && reg == R_EDX) {
+        /* Not new instructions, just an optimization.  */
+        uint32_t edx;
+        host_cpuid(1, 0, &unused, &unused, &unused, &edx);
+        ret |= edx & CPUID_7_0_EDX_FSRM;
+
         /*
          * Linux v4.17-v4.20 incorrectly return ARCH_CAPABILITIES on SVM hosts.
          * We can detect the bug by checking if MSR_IA32_ARCH_CAPABILITIES is
@@ -411,6 +421,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         if (!has_msr_arch_capabs) {
             ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
         }
+    } else if (function == 7 && index == 1 && reg == R_EAX) {
+        /* Not new instructions, just an optimization.  */
+        uint32_t eax;
+        host_cpuid(1, 0, &eax, &unused, &unused, &unused);
+        ret |= eax & (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC);
     } else if (function == 0xd && index == 0 &&
                (reg == R_EAX || reg == R_EDX)) {
         /*
-- 
2.39.1




* Re: [PATCH v4 3/4] target/i386: KVM: allow fast string operations if host supports them
  2023-02-27 10:13 ` [PATCH v4 3/4] target/i386: KVM: allow fast string operations if host supports them Paolo Bonzini
@ 2023-02-27 13:35   ` Xiaoyao Li
  2023-02-27 19:32   ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-02-27 13:35 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, chenyi.qiang

On 2/27/2023 6:13 PM, Paolo Bonzini wrote:
> These are just flags that document the performance characteristics of
> instructions; they need no hypervisor support.  So include them even
> if KVM does not show them.  In particular, FZRM/FSRS/FSRC have only
> been added very recently, but they are available on Sapphire Rapids
> processors.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/kvm/kvm.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 587030199192..fe66a4953d41 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -352,7 +352,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>   {
>       struct kvm_cpuid2 *cpuid;
>       uint32_t ret = 0;
> -    uint32_t cpuid_1_edx;
> +    uint32_t cpuid_1_edx, unused;
>       uint64_t bitmask;
>   
>       cpuid = get_supported_cpuid(s);
> @@ -399,10 +399,20 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>       } else if (function == 6 && reg == R_EAX) {
>           ret |= CPUID_6_EAX_ARAT; /* safe to allow because of emulated APIC */
>       } else if (function == 7 && index == 0 && reg == R_EBX) {
> +        /* Not new instructions, just an optimization.  */
> +        uint32_t ebx;
> +        host_cpuid(1, 0, &unused, &ebx, &unused, &unused);
                       ^

It should be leaf 7, not 1.

> +        ret |= ebx & CPUID_7_0_EBX_ERMS;
> +
>           if (host_tsx_broken()) {
>               ret &= ~(CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_HLE);
>           }
>       } else if (function == 7 && index == 0 && reg == R_EDX) {
> +        /* Not new instructions, just an optimization.  */
> +        uint32_t edx;
> +        host_cpuid(1, 0, &unused, &unused, &unused, &edx);

Ditto.

> +        ret |= edx & CPUID_7_0_EDX_FSRM;
> +
>           /*
>            * Linux v4.17-v4.20 incorrectly return ARCH_CAPABILITIES on SVM hosts.
>            * We can detect the bug by checking if MSR_IA32_ARCH_CAPABILITIES is
> @@ -411,6 +421,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>           if (!has_msr_arch_capabs) {
>               ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
>           }
> +    } else if (function == 7 && index == 1 && reg == R_EAX) {
> +        /* Not new instructions, just an optimization.  */
> +        uint32_t eax;
> +        host_cpuid(1, 0, &eax, &unused, &unused, &unused);

Ditto.

With these fixed,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> +        ret |= eax & (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC);
>       } else if (function == 0xd && index == 0 &&
>                  (reg == R_EAX || reg == R_EDX)) {
>           /*
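
The point raised in this review (query leaf 7 with the matching subleaf,
then OR in only the marker bits) can be sketched as follows.  This is a
self-contained illustration, not the QEMU code: the register enum, the
callback type and the fake host are hypothetical stand-ins for QEMU's
host_cpuid() and its register constants.

```c
#include <stdint.h>

#define CPUID_7_0_EBX_ERMS  (1U << 9)
#define CPUID_7_0_EDX_FSRM  (1U << 4)
#define CPUID_7_1_EAX_FZRM  (1U << 10)
#define CPUID_7_1_EAX_FSRS  (1U << 11)
#define CPUID_7_1_EAX_FSRC  (1U << 12)

/* Hypothetical register indices; QEMU uses its own R_* constants. */
enum cpuid_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/* Hypothetical stand-in for host_cpuid(): fills regs[] for (function, index). */
typedef void (*host_cpuid_fn)(uint32_t function, uint32_t index,
                              uint32_t regs[4]);

/*
 * Add the host's fast-string marker bits to 'ret'.  Leaf 7 is queried
 * with the same subleaf that is being filled in, and the result is
 * masked so that no other host bit leaks into the supported set.
 */
static uint32_t add_host_fast_string_bits(host_cpuid_fn host_cpuid,
                                          uint32_t function, uint32_t index,
                                          enum cpuid_reg reg, uint32_t ret)
{
    uint32_t regs[4] = { 0, 0, 0, 0 };
    uint32_t mask = 0;

    if (function != 7) {
        return ret;
    }
    if (index == 0 && reg == REG_EBX) {
        mask = CPUID_7_0_EBX_ERMS;
    } else if (index == 0 && reg == REG_EDX) {
        mask = CPUID_7_0_EDX_FSRM;
    } else if (index == 1 && reg == REG_EAX) {
        mask = CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC;
    }
    if (mask != 0) {
        host_cpuid(7, index, regs);
        ret |= regs[reg] & mask;
    }
    return ret;
}

/* Fake host used only for demonstration: has ERMS, FSRM, FZRM and FSRS. */
static void fake_host_cpuid(uint32_t function, uint32_t index,
                            uint32_t regs[4])
{
    regs[REG_EAX] = regs[REG_EBX] = regs[REG_ECX] = regs[REG_EDX] = 0;
    if (function == 7 && index == 0) {
        regs[REG_EBX] = CPUID_7_0_EBX_ERMS | (1U << 5); /* plus an unrelated bit */
        regs[REG_EDX] = CPUID_7_0_EDX_FSRM;
    } else if (function == 7 && index == 1) {
        regs[REG_EAX] = CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS;
    }
}
```

Passing the host query in as a callback is only a device to keep the sketch
testable without running on real hardware.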




* Re: [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC
  2023-02-27 10:13 ` [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC Paolo Bonzini
@ 2023-02-27 13:39   ` Xiaoyao Li
  2023-02-27 19:31   ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-02-27 13:39 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, chenyi.qiang

On 2/27/2023 6:13 PM, Paolo Bonzini wrote:
> These are three more markers for string operation optimizations.
> They can all be added to TCG, whose string operations are more or
> less as fast as they can be for short lengths.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   target/i386/cpu.c | 7 ++++---
>   target/i386/cpu.h | 7 +++++++
>   2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 34e2cead870e..26ec6e9da754 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -662,7 +662,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
>             /* CPUID_7_0_ECX_OSPKE is dynamic */ \
>             CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS | CPUID_7_0_ECX_VAES)
>   #define TCG_7_0_EDX_FEATURES CPUID_7_0_EDX_FSRM
> -#define TCG_7_1_EAX_FEATURES 0
> +#define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
> +          CPUID_7_1_EAX_FSRC)
>   #define TCG_APM_FEATURES 0
>   #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
>   #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
> @@ -872,8 +873,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>           .feat_names = {
>               NULL, NULL, NULL, NULL,
>               "avx-vnni", "avx512-bf16", NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> +            NULL, NULL, "fzrm", "fsrs",
> +            "fsrc", NULL, NULL, NULL,
>               NULL, NULL, NULL, NULL,
>               NULL, NULL, NULL, NULL,
>               NULL, NULL, NULL, NULL,
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index d4bc19577a21..e0703feb5ed0 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -900,6 +900,13 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_7_1_EAX_AVX_VNNI          (1U << 4)
>   /* AVX512 BFloat16 Instruction */
>   #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
> +/* Fast Zero REP MOVS */
> +#define CPUID_7_1_EAX_FZRM              (1U << 10)
> +/* Fast Short REP STOS */
> +#define CPUID_7_1_EAX_FSRS              (1U << 11)
> +/* Fast Short REP CMPS/SCAS */
> +#define CPUID_7_1_EAX_FSRC              (1U << 12)
> +
>   /* XFD Extend Feature Disabled */
>   #define CPUID_D_1_EAX_XFD               (1U << 4)
>   




* Re: [PATCH v4 4/4] target/i386: Add new CPU model SapphireRapids
       [not found] ` <20230227101332.636203-5-pbonzini@redhat.com>
@ 2023-02-27 13:45   ` Xiaoyao Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-02-27 13:45 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, chenyi.qiang

On 2/27/2023 6:13 PM, Paolo Bonzini wrote:
> From: "Wang, Lei" <lei4.wang@intel.com>
> 
> The new CPU model mostly inherits features from Icelake-Server, while
> adding new features:
>   - AMX (Advanced Matrix Extensions)
>   - Bus Lock Debug Exception
> and new instructions:
>   - AVX VNNI (Vector Neural Network Instruction):
>      - VPDPBUSD: Multiply and Add Unsigned and Signed Bytes
>      - VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
>      - VPDPWSSD: Multiply and Add Signed Word Integers
>      - VPDPWSSDS: Multiply and Add Signed Word Integers with Saturation
>   - FP16: Replicates existing AVX512 computational SP (FP32) instructions
>     using FP16 instead of FP32 for ~2X performance gain
>   - SERIALIZE: Provide software with a simple way to force the processor to
>     complete all modifications, faster, allowed in all privilege levels and
>     not causing an unconditional VM exit
>   - TSX Suspend Load Address Tracking: Allows programmers to choose which
>     memory accesses do not need to be tracked in the TSX read set
>   - AVX512_BF16: Vector Neural Network Instructions supporting BFLOAT16
>     inputs and conversion instructions from IEEE single precision
> 
> Features may be added in future versions:
>   - CET (virtualization support hasn't been merged)
> Instructions may be added in future versions:
>   - fast zero-length MOVSB (KVM doesn't support yet)
>   - fast short STOSB (KVM doesn't support yet)
>   - fast short CMPSB, SCASB (KVM doesn't support yet)

Paolo,

The CPUID bits for these three instructions are added in this re-post of
yours, compared to Lei's original. Please remove the description above.

> Signed-off-by: Wang, Lei <lei4.wang@intel.com>
> Reviewed-by: Robert Hoo <robert.hu@linux.intel.com>
> Message-Id: <20220812055751.14553-1-lei4.wang@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

With the above fixed,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   target/i386/cpu.c | 133 +++++++++++++++++++++++++++++++++++++++++++++-
>   target/i386/cpu.h |   4 ++
>   2 files changed, 135 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 26ec6e9da754..4bad3d41d33f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3468,6 +3468,135 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>               { /* end of list */ }
>           }
>       },
> +    {
> +        .name = "SapphireRapids",
> +        .level = 0x20,
> +        .vendor = CPUID_VENDOR_INTEL,
> +        .family = 6,
> +        .model = 143,
> +        .stepping = 4,
> +        /*
> +         * please keep the ascending order so that we can have a clear view of
> +         * bit position of each feature.
> +         */
> +        .features[FEAT_1_EDX] =
> +            CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
> +            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
> +            CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
> +            CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | CPUID_FXSR |
> +            CPUID_SSE | CPUID_SSE2,
> +        .features[FEAT_1_ECX] =
> +            CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
> +            CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
> +            CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
> +            CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_AES |
> +            CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
> +        .features[FEAT_8000_0001_EDX] =
> +            CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
> +            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
> +        .features[FEAT_8000_0001_ECX] =
> +            CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
> +        .features[FEAT_8000_0008_EBX] =
> +            CPUID_8000_0008_EBX_WBNOINVD,
> +        .features[FEAT_7_0_EBX] =
> +            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_HLE |
> +            CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 |
> +            CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_RTM |
> +            CPUID_7_0_EBX_AVX512F | CPUID_7_0_EBX_AVX512DQ |
> +            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP |
> +            CPUID_7_0_EBX_AVX512IFMA | CPUID_7_0_EBX_CLFLUSHOPT |
> +            CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_AVX512CD | CPUID_7_0_EBX_SHA_NI |
> +            CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512VL,
> +        .features[FEAT_7_0_ECX] =
> +            CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
> +            CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
> +            CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
> +            CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
> +            CPUID_7_0_ECX_AVX512_VPOPCNTDQ | CPUID_7_0_ECX_LA57 |
> +            CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_BUS_LOCK_DETECT,
> +        .features[FEAT_7_0_EDX] =
> +            CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_SERIALIZE |
> +            CPUID_7_0_EDX_TSX_LDTRK | CPUID_7_0_EDX_AMX_BF16 |
> +            CPUID_7_0_EDX_AVX512_FP16 | CPUID_7_0_EDX_AMX_TILE |
> +            CPUID_7_0_EDX_AMX_INT8 | CPUID_7_0_EDX_SPEC_CTRL |
> +            CPUID_7_0_EDX_ARCH_CAPABILITIES | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
> +        .features[FEAT_ARCH_CAPABILITIES] =
> +            MSR_ARCH_CAP_RDCL_NO | MSR_ARCH_CAP_IBRS_ALL |
> +            MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO |
> +            MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO,
> +        .features[FEAT_XSAVE] =
> +            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
> +            CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES | CPUID_D_1_EAX_XFD,
> +        .features[FEAT_6_EAX] =
> +            CPUID_6_EAX_ARAT,
> +        .features[FEAT_7_1_EAX] =
> +            CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_AVX512_BF16 |
> +            CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC,
> +        .features[FEAT_VMX_BASIC] =
> +            MSR_VMX_BASIC_INS_OUTS | MSR_VMX_BASIC_TRUE_CTLS,
> +        .features[FEAT_VMX_ENTRY_CTLS] =
> +            VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS | VMX_VM_ENTRY_IA32E_MODE |
> +            VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
> +            VMX_VM_ENTRY_LOAD_IA32_PAT | VMX_VM_ENTRY_LOAD_IA32_EFER,
> +        .features[FEAT_VMX_EPT_VPID_CAPS] =
> +            MSR_VMX_EPT_EXECONLY |
> +            MSR_VMX_EPT_PAGE_WALK_LENGTH_4 | MSR_VMX_EPT_PAGE_WALK_LENGTH_5 |
> +            MSR_VMX_EPT_WB | MSR_VMX_EPT_2MB | MSR_VMX_EPT_1GB |
> +            MSR_VMX_EPT_INVEPT | MSR_VMX_EPT_AD_BITS |
> +            MSR_VMX_EPT_INVEPT_SINGLE_CONTEXT | MSR_VMX_EPT_INVEPT_ALL_CONTEXT |
> +            MSR_VMX_EPT_INVVPID | MSR_VMX_EPT_INVVPID_SINGLE_ADDR |
> +            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT |
> +            MSR_VMX_EPT_INVVPID_ALL_CONTEXT |
> +            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT_NOGLOBALS,
> +        .features[FEAT_VMX_EXIT_CTLS] =
> +            VMX_VM_EXIT_SAVE_DEBUG_CONTROLS |
> +            VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
> +            VMX_VM_EXIT_ACK_INTR_ON_EXIT | VMX_VM_EXIT_SAVE_IA32_PAT |
> +            VMX_VM_EXIT_LOAD_IA32_PAT | VMX_VM_EXIT_SAVE_IA32_EFER |
> +            VMX_VM_EXIT_LOAD_IA32_EFER | VMX_VM_EXIT_SAVE_VMX_PREEMPTION_TIMER,
> +        .features[FEAT_VMX_MISC] =
> +            MSR_VMX_MISC_STORE_LMA | MSR_VMX_MISC_ACTIVITY_HLT |
> +            MSR_VMX_MISC_VMWRITE_VMEXIT,
> +        .features[FEAT_VMX_PINBASED_CTLS] =
> +            VMX_PIN_BASED_EXT_INTR_MASK | VMX_PIN_BASED_NMI_EXITING |
> +            VMX_PIN_BASED_VIRTUAL_NMIS | VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
> +            VMX_PIN_BASED_POSTED_INTR,
> +        .features[FEAT_VMX_PROCBASED_CTLS] =
> +            VMX_CPU_BASED_VIRTUAL_INTR_PENDING |
> +            VMX_CPU_BASED_USE_TSC_OFFSETING | VMX_CPU_BASED_HLT_EXITING |
> +            VMX_CPU_BASED_INVLPG_EXITING | VMX_CPU_BASED_MWAIT_EXITING |
> +            VMX_CPU_BASED_RDPMC_EXITING | VMX_CPU_BASED_RDTSC_EXITING |
> +            VMX_CPU_BASED_CR3_LOAD_EXITING | VMX_CPU_BASED_CR3_STORE_EXITING |
> +            VMX_CPU_BASED_CR8_LOAD_EXITING | VMX_CPU_BASED_CR8_STORE_EXITING |
> +            VMX_CPU_BASED_TPR_SHADOW | VMX_CPU_BASED_VIRTUAL_NMI_PENDING |
> +            VMX_CPU_BASED_MOV_DR_EXITING | VMX_CPU_BASED_UNCOND_IO_EXITING |
> +            VMX_CPU_BASED_USE_IO_BITMAPS | VMX_CPU_BASED_MONITOR_TRAP_FLAG |
> +            VMX_CPU_BASED_USE_MSR_BITMAPS | VMX_CPU_BASED_MONITOR_EXITING |
> +            VMX_CPU_BASED_PAUSE_EXITING |
> +            VMX_CPU_BASED_ACTIVATE_SECONDARY_CONTROLS,
> +        .features[FEAT_VMX_SECONDARY_CTLS] =
> +            VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> +            VMX_SECONDARY_EXEC_ENABLE_EPT | VMX_SECONDARY_EXEC_DESC |
> +            VMX_SECONDARY_EXEC_RDTSCP |
> +            VMX_SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> +            VMX_SECONDARY_EXEC_ENABLE_VPID | VMX_SECONDARY_EXEC_WBINVD_EXITING |
> +            VMX_SECONDARY_EXEC_UNRESTRICTED_GUEST |
> +            VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +            VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> +            VMX_SECONDARY_EXEC_RDRAND_EXITING |
> +            VMX_SECONDARY_EXEC_ENABLE_INVPCID |
> +            VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS |
> +            VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML |
> +            VMX_SECONDARY_EXEC_XSAVES,
> +        .features[FEAT_VMX_VMFUNC] =
> +            MSR_VMX_VMFUNC_EPT_SWITCHING,
> +        .xlevel = 0x80000008,
> +        .model_id = "Intel Xeon Processor (SapphireRapids)",
> +        .versions = (X86CPUVersionDefinition[]) {
> +            { .version = 1 },
> +            { /* end of list */ },
> +        },
> +    },
>       {
>           .name = "Denverton",
>           .level = 21,
> @@ -5623,7 +5752,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>           break;
>       }
>       case 0x1D: {
> -        /* AMX TILE */
> +        /* AMX TILE, for now hardcoded for Sapphire Rapids*/
>           *eax = 0;
>           *ebx = 0;
>           *ecx = 0;
> @@ -5644,7 +5773,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>           break;
>       }
>       case 0x1E: {
> -        /* AMX TMUL */
> +        /* AMX TMUL, for now hardcoded for Sapphire Rapids */
>           *eax = 0;
>           *ebx = 0;
>           *ecx = 0;
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index e0703feb5ed0..41777fb4b029 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -881,10 +881,14 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
>   /* Architectural LBRs */
>   #define CPUID_7_0_EDX_ARCH_LBR          (1U << 19)
> +/* AMX_BF16 instruction */
> +#define CPUID_7_0_EDX_AMX_BF16          (1U << 22)
>   /* AVX512_FP16 instruction */
>   #define CPUID_7_0_EDX_AVX512_FP16       (1U << 23)
>   /* AMX tile (two-dimensional register) */
>   #define CPUID_7_0_EDX_AMX_TILE          (1U << 24)
> +/* AMX_INT8 instruction */
> +#define CPUID_7_0_EDX_AMX_INT8          (1U << 25)
>   /* Speculation Control */
>   #define CPUID_7_0_EDX_SPEC_CTRL         (1U << 26)
>   /* Single Thread Indirect Branch Predictors */




* Re: [PATCH v4 1/4] target/i386: add FSRM to TCG
  2023-02-27 10:13 ` [PATCH v4 1/4] target/i386: add FSRM to TCG Paolo Bonzini
@ 2023-02-27 19:29   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2023-02-27 19:29 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

On 2/27/23 00:13, Paolo Bonzini wrote:
> Fast short REP MOVS can be added to TCG, since a trivial translation
> of string operations is a good option for short lengths.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/cpu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC
  2023-02-27 10:13 ` [PATCH v4 2/4] target/i386: add FZRM, FSRS, FSRC Paolo Bonzini
  2023-02-27 13:39   ` Xiaoyao Li
@ 2023-02-27 19:31   ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2023-02-27 19:31 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

On 2/27/23 00:13, Paolo Bonzini wrote:
> These are three more markers for string operation optimizations.
> They can all be added to TCG, whose string operations are more or
> less as fast as they can be for short lengths.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/cpu.c | 7 ++++---
>   target/i386/cpu.h | 7 +++++++
>   2 files changed, 11 insertions(+), 3 deletions(-)

They could in fact be faster, but good enough.  :-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 3/4] target/i386: KVM: allow fast string operations if host supports them
  2023-02-27 10:13 ` [PATCH v4 3/4] target/i386: KVM: allow fast string operations if host supports them Paolo Bonzini
  2023-02-27 13:35   ` Xiaoyao Li
@ 2023-02-27 19:32   ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2023-02-27 19:32 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, xiaoyao.li, chenyi.qiang

On 2/27/23 00:13, Paolo Bonzini wrote:
> These are just flags that document the performance characteristics of
> instructions; they need no hypervisor support.  So include them even
> if KVM does not show them.  In particular, FZRM/FSRS/FSRC have only
> been added very recently, but they are available on Sapphire Rapids
> processors.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/kvm/kvm.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves
  2023-02-27 10:13 [PATCH v4 0/4] target/i386: Add new CPU model SapphireRapids and new fast string op leaves Paolo Bonzini
                   ` (3 preceding siblings ...)
       [not found] ` <20230227101332.636203-5-pbonzini@redhat.com>
@ 2023-02-28  8:46 ` Xiaoyao Li
  4 siblings, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-02-28  8:46 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: lei4.wang, robert.hu, chenyi.qiang

On 2/27/2023 6:13 PM, Paolo Bonzini wrote:
> Sapphire Rapids enablement patches got stuck on the doubts regarding
> properties for AMX support.  However, for now there is no need to have
> anything but hardcoded values, because all Intel processors with AMX
> currently support exactly the same palettes and TMUL limits.  Intel has
> also promised that palette formats will remain backwards compatible so
> the only worry is for the TMUL leaf, CPUID[1Eh].
> 
> However, providing modifiable properties for AMX is premature.  Rather,
> the first step should be to _validate_ host CPUID values against the
> ones supported by QEMU.

Paolo,

The validation of host CPUID values (kvm supported CPUIDs) against the 
ones supported by QEMU (the hardcoded value) is missing in current QEMU.

As for how to implement the validation, I have two options in mind:

a) a special check in x86_cpu_filter_features(), just like what was done
for Intel PT:

     if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
         kvm_enabled()) {
         KVMState *s = CPU(cpu)->kvm_state;
         uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
         uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
         uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
         uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
         uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);

         if (!eax_0 ||
            ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
            ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
            ((eax_1 & INTEL_PT_MTC_BITMAP) != INTEL_PT_MTC_BITMAP) ||
            ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) <
                                            INTEL_PT_ADDR_RANGES_NUM) ||
            ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
                 (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ||
            ((ecx_0 & CPUID_14_0_ECX_LIP) !=
                 (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP))) {
             /*
              * Processor Trace capabilities aren't configurable, so if the
              * host can't emulate the capabilities we report on
              * cpu_x86_cpuid(), intel-pt can't be enabled on the current host.
              */
             mark_unavailable_features(cpu, FEAT_7_0_EBX,
                                       CPUID_7_0_EBX_INTEL_PT, prefix);
         }
     }

This has flaws for leaf 0x1e, since its value might change on future
products. Intel PT faces exactly this problem: SPR reports fewer PT
capabilities in CPUID(0x14,1):EBX[15:0] than ICX, so Intel PT cannot be
enabled for guests on SPR machines. Likewise, if hardware reports a
different value for leaf 0x1e in the future, QEMU will fail to enable AMX
for guests.

b) at least introduce a FEAT_ word for CPUID leaf 0x1E, so that it is
checked in x86_cpu_filter_features() automatically and "-cpu max/host"
can pass the host's value through to the guest. The additional work is that
we might need the MultiBitFeature framework introduced in
https://lore.kernel.org/qemu-devel/20230106083826.5384-1-lei4.wang@intel.com/T/#t

Do you think it is worth the effort to go for option b, or is option a
enough for now?
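
To make option a concrete for the TMUL leaf, here is a minimal sketch of the
kind of comparison it implies.  It assumes the CPUID[1Eh] subleaf 0 layout
from the Intel SDM (EBX[7:0] = tmul_maxk, EBX[23:8] = tmul_maxn) and assumes
the hardcoded guest values are 16 and 64.  The macro and function names are
hypothetical, and whether ">=" or strict equality is the right test is
exactly the open question discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed hardcoded guest values; hypothetical names, not QEMU's. */
#define GUEST_TMUL_MAX_K  16U
#define GUEST_TMUL_MAX_N  64U

/*
 * Return true if the host's CPUID[1Eh].0:EBX reports TMUL limits that
 * are at least as large as the ones the guest model hardcodes.
 */
static bool host_tmul_covers_guest(uint32_t host_1e_0_ebx)
{
    uint32_t host_max_k = host_1e_0_ebx & 0xffU;           /* EBX[7:0]  */
    uint32_t host_max_n = (host_1e_0_ebx >> 8) & 0xffffU;  /* EBX[23:8] */

    return host_max_k >= GUEST_TMUL_MAX_K &&
           host_max_n >= GUEST_TMUL_MAX_N;
}
```

If such a check failed, option a would mark AMX as unavailable for the
guest, in the same way the Intel PT code above calls
mark_unavailable_features().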

> So for now apply the simpler patch that only
> adds the new model.



