* [PATCH 0/9] x86emul: misc additions
@ 2023-04-04 14:48 Jan Beulich
  2023-04-04 14:49 ` [PATCH 1/9] x86emul: support LKGS Jan Beulich
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

This series adds support for a number of more or less recently announced
ISA extensions, plus a little bit of (loosely related) cleanup. The
series interacts mildly (and only contextually) with the AVX512-FP16 one.
Note that patch 1 was previously posted standalone; the posting here is
unchanged, so it isn't labeled "v2". Note further that while the last
patch is somewhat incomplete (it doesn't enable the feature for guest
use), it could still be applied ahead of the further VMX-specific work
that's needed there (and which depends on the documentation becoming
more complete).

Apart from contextual interaction, the patches should be largely
independent of one another; only patch 5 strictly depends on patch 4.

1: support LKGS
2: support WRMSRNS
3: drop regs field from emulator state structure
4: support CMPccXADD
5: re-use new stub_exn field in state structure
6: support AVX-IFMA insns
7: support AVX-VNNI-INT8
8: support AVX-NE-CONVERT insns
9: support {RD,WR}MSRLIST

Jan



* [PATCH 1/9] x86emul: support LKGS
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
@ 2023-04-04 14:49 ` Jan Beulich
  2023-04-04 21:54   ` Andrew Cooper
  2023-04-04 14:50 ` [PATCH 2/9] x86emul: support WRMSRNS Jan Beulich
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:49 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Provide support for this insn, which is a prerequisite to FRED. On the
CPUID side, introduce both its and FRED's feature bits at this occasion,
which allows the dependency between them to be expressed right away.

While adding a testcase, also add one for SWAPGS. To avoid affecting the
behavior of pre-existing tests, install the write_{segment,msr} hooks
only transiently.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Instead of ->read_segment() we could of course also use ->read_msr() to
fetch the original GS base. I can't see a clear advantage of either
approach; the way it's done here matches how we handle SWAPGS.

For PV, save_segments() would need adjustment, but the insn being
restricted to ring 0 means PV guests can't use it anyway (unless we
wanted to emulate it as another privileged insn).
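The base handling can be condensed into a toy model (all names below are
invented for illustration; the real emulator goes through the
ops->read/write_{segment,msr} hooks and protmode_load_seg()): LKGS loads
and validates the new selector's descriptor like a normal GS load, but
routes the new base to the shadow (kernel) GS base, leaving the currently
visible GS.base untouched.

```c
#include <stdint.h>

/* Toy model of LKGS base handling; struct and function names are
 * invented for illustration only. */
struct toy_vcpu {
    uint64_t gs_base;        /* visible GS.base */
    uint64_t shadow_gs_base; /* MSR_SHADOW_GS_BASE (kernel GS base) */
};

/*
 * lkgs <sel>: the selector and attributes do become visible in GS, but
 * the new base lands in the shadow GS base, to be swapped in by a later
 * SWAPGS. desc_base stands in for the base fetched from the descriptor
 * table entry named by the selector.
 */
static void toy_lkgs(struct toy_vcpu *v, uint64_t desc_base)
{
    v->shadow_gs_base = desc_base; /* what SWAPGS will later make visible */
    /* v->gs_base deliberately left unchanged */
}
```

This mirrors the lkgs testcase added below, where the visible base is
expected to stay put while the shadow base takes the (zero) base of the
loaded descriptor.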

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -235,6 +235,8 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"fzrm",         0x00000007,  1, CPUID_REG_EAX, 10,  1},
         {"fsrs",         0x00000007,  1, CPUID_REG_EAX, 11,  1},
         {"fsrcs",        0x00000007,  1, CPUID_REG_EAX, 12,  1},
+        {"fred",         0x00000007,  1, CPUID_REG_EAX, 17,  1},
+        {"lkgs",         0x00000007,  1, CPUID_REG_EAX, 18,  1},
         {"wrmsrns",      0x00000007,  1, CPUID_REG_EAX, 19,  1},
 
         {"cet-sss",      0x00000007,  1, CPUID_REG_EDX, 18,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -190,7 +190,8 @@ static const char *const str_7a1[32] =
     [10] = "fzrm",          [11] = "fsrs",
     [12] = "fsrcs",
 
-    /* 18 */                [19] = "wrmsrns",
+    /* 16 */                [17] = "fred",
+    [18] = "lkgs",          [19] = "wrmsrns",
 };
 
 static const char *const str_e21a[32] =
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -326,6 +326,7 @@ static const struct {
     { { 0x00, 0x18 }, { 2, 2 }, T, R }, /* ltr */
     { { 0x00, 0x20 }, { 2, 2 }, T, R }, /* verr */
     { { 0x00, 0x28 }, { 2, 2 }, T, R }, /* verw */
+    { { 0x00, 0x30 }, { 0, 2 }, T, R, pfx_f2 }, /* lkgs */
     { { 0x01, 0x00 }, { 2, 2 }, F, W }, /* sgdt */
     { { 0x01, 0x08 }, { 2, 2 }, F, W }, /* sidt */
     { { 0x01, 0x10 }, { 2, 2 }, F, R }, /* lgdt */
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -666,6 +666,10 @@ static int blk(
     return x86_emul_blk((void *)offset, p_data, bytes, eflags, state, ctxt);
 }
 
+#ifdef __x86_64__
+static unsigned long gs_base, gs_base_shadow;
+#endif
+
 static int read_segment(
     enum x86_segment seg,
     struct segment_register *reg,
@@ -675,8 +679,30 @@ static int read_segment(
         return X86EMUL_UNHANDLEABLE;
     memset(reg, 0, sizeof(*reg));
     reg->p = 1;
+
+#ifdef __x86_64__
+    if ( seg == x86_seg_gs )
+        reg->base = gs_base;
+#endif
+
+    return X86EMUL_OKAY;
+}
+
+#ifdef __x86_64__
+static int write_segment(
+    enum x86_segment seg,
+    const struct segment_register *reg,
+    struct x86_emulate_ctxt *ctxt)
+{
+    if ( !is_x86_user_segment(seg) )
+        return X86EMUL_UNHANDLEABLE;
+
+    if ( seg == x86_seg_gs )
+        gs_base = reg->base;
+
     return X86EMUL_OKAY;
 }
+#endif
 
 static int read_msr(
     unsigned int reg,
@@ -689,6 +715,20 @@ static int read_msr(
         *val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
         return X86EMUL_OKAY;
 
+#ifdef __x86_64__
+    case 0xc0000101: /* GS_BASE */
+        if ( ctxt->addr_size < 64 )
+            break;
+        *val = gs_base;
+        return X86EMUL_OKAY;
+
+    case 0xc0000102: /* SHADOW_GS_BASE */
+        if ( ctxt->addr_size < 64 )
+            break;
+        *val = gs_base_shadow;
+        return X86EMUL_OKAY;
+#endif
+
     case 0xc0000103: /* TSC_AUX */
 #define TSC_AUX_VALUE 0xCACACACA
         *val = TSC_AUX_VALUE;
@@ -698,6 +738,31 @@ static int read_msr(
     return X86EMUL_UNHANDLEABLE;
 }
 
+#ifdef __x86_64__
+static int write_msr(
+    unsigned int reg,
+    uint64_t val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    switch ( reg )
+    {
+    case 0xc0000101: /* GS_BASE */
+        if ( ctxt->addr_size < 64 || !is_canonical_address(val) )
+            break;
+        gs_base = val;
+        return X86EMUL_OKAY;
+
+    case 0xc0000102: /* SHADOW_GS_BASE */
+        if ( ctxt->addr_size < 64 || !is_canonical_address(val) )
+            break;
+        gs_base_shadow = val;
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+#endif
+
 #define INVPCID_ADDR 0x12345678
 #define INVPCID_PCID 0x123
 
@@ -1331,6 +1396,41 @@ int main(int argc, char **argv)
         printf("%u bytes read - ", bytes_read);
         goto fail;
     }
+    printf("okay\n");
+
+    emulops.write_segment = write_segment;
+    emulops.write_msr     = write_msr;
+
+    printf("%-40s", "Testing swapgs...");
+    instr[0] = 0x0f; instr[1] = 0x01; instr[2] = 0xf8;
+    regs.eip = (unsigned long)&instr[0];
+    gs_base = 0xffffeeeecccc8888UL;
+    gs_base_shadow = 0x0000111122224444UL;
+    rc = x86_emulate(&ctxt, &emulops);
+    if ( (rc != X86EMUL_OKAY) ||
+         (regs.eip != (unsigned long)&instr[3]) ||
+         (gs_base != 0x0000111122224444UL) ||
+         (gs_base_shadow != 0xffffeeeecccc8888UL) )
+        goto fail;
+    printf("okay\n");
+
+    printf("%-40s", "Testing lkgs 2(%rdx)...");
+    instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x00; instr[3] = 0x72; instr[4] = 0x02;
+    regs.eip = (unsigned long)&instr[0];
+    regs.edx = (unsigned long)res;
+    res[0]   = 0x00004444;
+    res[1]   = 0x8888cccc;
+    i = cp.extd.nscb; cp.extd.nscb = true; /* for AMD */
+    rc = x86_emulate(&ctxt, &emulops);
+    if ( (rc != X86EMUL_OKAY) ||
+         (regs.eip != (unsigned long)&instr[5]) ||
+         (gs_base != 0x0000111122224444UL) ||
+         gs_base_shadow )
+        goto fail;
+
+    cp.extd.nscb = i;
+    emulops.write_segment = NULL;
+    emulops.write_msr     = NULL;
 #endif
     printf("okay\n");
 
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -86,6 +86,7 @@ bool emul_test_init(void)
     cp.feat.adx = true;
     cp.feat.avx512pf = cp.feat.avx512f;
     cp.feat.rdpid = true;
+    cp.feat.lkgs = true;
     cp.extd.clzero = true;
 
     if ( cpu_has_xsave )
--- a/xen/arch/x86/x86_emulate/decode.c
+++ b/xen/arch/x86/x86_emulate/decode.c
@@ -744,8 +744,12 @@ decode_twobyte(struct x86_emulate_state
         case 0:
             s->desc |= DstMem | SrcImplicit | Mov;
             break;
+        case 6:
+            if ( !(s->modrm_reg & 1) && mode_64bit() )
+            {
         case 2: case 4:
-            s->desc |= SrcMem16;
+                s->desc |= SrcMem16;
+            }
             break;
         }
         break;
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -594,6 +594,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_tsxldtrk()    (ctxt->cpuid->feat.tsxldtrk)
 #define vcpu_has_avx_vnni()    (ctxt->cpuid->feat.avx_vnni)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
+#define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
 
 #define vcpu_must_have(feat) \
     generate_exception_if(!vcpu_has_##feat(), X86_EXC_UD)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -2886,8 +2886,31 @@ x86_emulate(
                 break;
             }
             break;
-        default:
-            generate_exception_if(true, EXC_UD);
+        case 6: /* lkgs */
+            generate_exception_if((modrm_reg & 1) || vex.pfx != vex_f2, EXC_UD);
+            generate_exception_if(!mode_64bit() || !mode_ring0(), EXC_UD);
+            vcpu_must_have(lkgs);
+            fail_if(!ops->read_segment || !ops->read_msr ||
+                    !ops->write_segment || !ops->write_msr);
+            if ( (rc = ops->read_msr(MSR_SHADOW_GS_BASE, &msr_val,
+                                     ctxt)) != X86EMUL_OKAY ||
+                 (rc = ops->read_segment(x86_seg_gs, &sreg,
+                                         ctxt)) != X86EMUL_OKAY )
+                goto done;
+            dst.orig_val = sreg.base;
+            if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
+                                         ctxt, ops)) != X86EMUL_OKAY ||
+                 (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
+                                      ctxt)) != X86EMUL_OKAY )
+                goto done;
+            sreg.base = dst.orig_val;
+            if ( (rc = ops->write_segment(x86_seg_gs, &sreg,
+                                          ctxt)) != X86EMUL_OKAY )
+            {
+                /* Best effort unwind (i.e. no error checking). */
+                ops->write_msr(MSR_SHADOW_GS_BASE, msr_val, ctxt);
+                goto done;
+            }
             break;
         }
         break;
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -281,6 +281,8 @@ XEN_CPUFEATURE(AVX512_BF16,  10*32+ 5) /
 XEN_CPUFEATURE(FZRM,         10*32+10) /*A  Fast Zero-length REP MOVSB */
 XEN_CPUFEATURE(FSRS,         10*32+11) /*A  Fast Short REP STOSB */
 XEN_CPUFEATURE(FSRCS,        10*32+12) /*A  Fast Short REP CMPSB/SCASB */
+XEN_CPUFEATURE(FRED,         10*32+17) /*   Flexible Return and Event Delivery */
+XEN_CPUFEATURE(LKGS,         10*32+18) /*S  Load Kernel GS Base */
 XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*   WRMSR Non-Serialising */
 
 /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -295,6 +295,9 @@ def crunch_numbers(state):
 
         # In principle the TSXLDTRK insns could also be considered independent.
         RTM: [TSXLDTRK],
+
+        # FRED builds on the LKGS instruction.
+        LKGS: [FRED],
     }
 
     deep_features = tuple(sorted(deps.keys()))




* [PATCH 2/9] x86emul: support WRMSRNS
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
  2023-04-04 14:49 ` [PATCH 1/9] x86emul: support LKGS Jan Beulich
@ 2023-04-04 14:50 ` Jan Beulich
  2023-04-05 21:29   ` Andrew Cooper
  2023-04-04 14:51 ` [PATCH 3/9] x86emul: drop regs field from emulator state structure Jan Beulich
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

This insn differs from WRMSR solely in its lack of serialization. Hence
the code used there can simply be used here as well, plus a feature
check, of course. As there's no other infrastructure needed beyond
permitting the insn for PV privileged-op emulation (in particular no
separate new VMEXIT), we can expose the insn to guests right away.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
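Since the insn takes its operands exactly as WRMSR does (MSR index in
ECX, value in EDX:EAX), the only emulation-relevant data flow is
assembling the 64-bit MSR value from the GPR pair, as the 0f01.c hunk
below does. A minimal, self-contained illustration (helper name
invented):

```c
#include <stdint.h>

/* Assemble the 64-bit MSR value from the guest's EDX:EAX pair, matching
 * the ((uint64_t)regs->r(dx) << 32) | regs->eax expression used in the
 * emulation below (helper name invented for illustration). */
static uint64_t msr_value_from_gprs(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```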

--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -341,6 +341,7 @@ static const struct {
     /*{ 0x01, 0xc3 }, { 2, 2 }, F, R }, vmresume */
     { { 0x01, 0xc4 }, { 2, 2 }, F, N }, /* vmxoff */
     { { 0x01, 0xc5 }, { 2, 2 }, F, N }, /* pconfig */
+    { { 0x01, 0xc6 }, { 2, 2 }, F, N }, /* wrmsrns */
     { { 0x01, 0xc8 }, { 2, 2 }, F, N }, /* monitor */
     { { 0x01, 0xc9 }, { 2, 2 }, F, N }, /* mwait */
     { { 0x01, 0xca }, { 2, 2 }, F, N }, /* clac */
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -87,6 +87,7 @@ bool emul_test_init(void)
     cp.feat.avx512pf = cp.feat.avx512f;
     cp.feat.rdpid = true;
     cp.feat.lkgs = true;
+    cp.feat.wrmsrns = true;
     cp.extd.clzero = true;
 
     if ( cpu_has_xsave )
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1252,8 +1252,11 @@ static int cf_check validate(
     {
         unsigned int modrm_rm, modrm_reg;
 
-        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
-             (modrm_rm & 7) != 1 )
+        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 )
+            break;
+        if ( (modrm_rm & 7) == 6 && !(modrm_reg & 7) ) /* wrmsrns, {rd,wr}msrlist */
+            return X86EMUL_OKAY;
+        if ( (modrm_rm & 7) != 1 )
             break;
         switch ( modrm_reg & 7 )
         {
--- a/xen/arch/x86/x86_emulate/0f01.c
+++ b/xen/arch/x86/x86_emulate/0f01.c
@@ -43,6 +43,20 @@ int x86emul_0f01(struct x86_emulate_stat
         struct segment_register sreg;
         uint64_t msr_val;
 
+    case 0xc6:
+        switch ( s->vex.pfx )
+        {
+        case vex_none: /* wrmsrns */
+            vcpu_must_have(wrmsrns);
+            generate_exception_if(!mode_ring0(), X86_EXC_GP, 0);
+            fail_if(!ops->write_msr);
+            rc = ops->write_msr(regs->ecx,
+                                ((uint64_t)regs->r(dx) << 32) | regs->eax,
+                                ctxt);
+            goto done;
+        }
+        generate_exception(X86_EXC_UD);
+
     case 0xca: /* clac */
     case 0xcb: /* stac */
         vcpu_must_have(smap);
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -595,6 +595,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_avx_vnni()    (ctxt->cpuid->feat.avx_vnni)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
 #define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
+#define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
 
 #define vcpu_must_have(feat) \
     generate_exception_if(!vcpu_has_##feat(), X86_EXC_UD)
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -283,7 +283,7 @@ XEN_CPUFEATURE(FSRS,         10*32+11) /
 XEN_CPUFEATURE(FSRCS,        10*32+12) /*A  Fast Short REP CMPSB/SCASB */
 XEN_CPUFEATURE(FRED,         10*32+17) /*   Flexible Return and Event Delivery */
 XEN_CPUFEATURE(LKGS,         10*32+18) /*S  Load Kernel GS Base */
-XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*   WRMSR Non-Serialising */
+XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*A  WRMSR Non-Serialising */
 
 /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
 XEN_CPUFEATURE(LFENCE_DISPATCH,    11*32+ 2) /*A  LFENCE always serializing */




* [PATCH 3/9] x86emul: drop regs field from emulator state structure
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
  2023-04-04 14:49 ` [PATCH 1/9] x86emul: support LKGS Jan Beulich
  2023-04-04 14:50 ` [PATCH 2/9] x86emul: support WRMSRNS Jan Beulich
@ 2023-04-04 14:51 ` Jan Beulich
  2023-04-06 15:34   ` Andrew Cooper
  2023-04-04 14:52 ` [PATCH 4/9] x86emul: support CMPccXADD Jan Beulich
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

For an unclear reason, 0552a8cfda43 ("x86emul: track only rIP in emulator
state") converted the original struct cpu_user_regs instance to a
pointer rather than dropping the field altogether: the pointer merely
aliases the one in the context structure.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
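To make the redundancy concrete, a simplified sketch (structure and
function names invented, only the relevant fields shown): the field was
assigned exactly once, from ctxt->regs, and never repointed, so every
s->regs-> access can equally be written as ctxt->regs-> with no change
in behavior.

```c
#include <stddef.h>

/* Simplified stand-ins for the real structures (invented names). */
struct toy_user_regs { unsigned long ip; };
struct toy_emul_ctxt { struct toy_user_regs *regs; };
struct toy_emul_state {
    unsigned long ip;
    struct toy_user_regs *regs; /* pre-patch field; only ever == ctxt->regs */
};

/* Models the start of x86emul_decode(): the one and only assignment the
 * state's regs field ever saw. */
static void toy_decode(struct toy_emul_state *s,
                       const struct toy_emul_ctxt *ctxt)
{
    s->regs = ctxt->regs; /* pure alias, hence droppable */
    s->ip = ctxt->regs->ip;
}
```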

--- a/xen/arch/x86/x86_emulate/decode.c
+++ b/xen/arch/x86/x86_emulate/decode.c
@@ -1013,7 +1013,6 @@ int x86emul_decode(struct x86_emulate_st
     s->ea.type = OP_NONE;
     s->ea.mem.seg = x86_seg_ds;
     s->ea.reg = PTR_POISON;
-    s->regs = ctxt->regs;
     s->ip = ctxt->regs->r(ip);
 
     s->op_bytes = def_op_bytes = ad_bytes = def_ad_bytes =
@@ -1129,7 +1128,7 @@ int x86emul_decode(struct x86_emulate_st
             default:
                 BUG(); /* Shouldn't be possible. */
             case 2:
-                if ( s->regs->eflags & X86_EFLAGS_VM )
+                if ( ctxt->regs->eflags & X86_EFLAGS_VM )
                     break;
                 /* fall through */
             case 4:
@@ -1458,33 +1457,33 @@ int x86emul_decode(struct x86_emulate_st
             switch ( s->modrm_rm )
             {
             case 0:
-                s->ea.mem.off = s->regs->bx + s->regs->si;
+                s->ea.mem.off = ctxt->regs->bx + ctxt->regs->si;
                 break;
             case 1:
-                s->ea.mem.off = s->regs->bx + s->regs->di;
+                s->ea.mem.off = ctxt->regs->bx + ctxt->regs->di;
                 break;
             case 2:
                 s->ea.mem.seg = x86_seg_ss;
-                s->ea.mem.off = s->regs->bp + s->regs->si;
+                s->ea.mem.off = ctxt->regs->bp + ctxt->regs->si;
                 break;
             case 3:
                 s->ea.mem.seg = x86_seg_ss;
-                s->ea.mem.off = s->regs->bp + s->regs->di;
+                s->ea.mem.off = ctxt->regs->bp + ctxt->regs->di;
                 break;
             case 4:
-                s->ea.mem.off = s->regs->si;
+                s->ea.mem.off = ctxt->regs->si;
                 break;
             case 5:
-                s->ea.mem.off = s->regs->di;
+                s->ea.mem.off = ctxt->regs->di;
                 break;
             case 6:
                 if ( s->modrm_mod == 0 )
                     break;
                 s->ea.mem.seg = x86_seg_ss;
-                s->ea.mem.off = s->regs->bp;
+                s->ea.mem.off = ctxt->regs->bp;
                 break;
             case 7:
-                s->ea.mem.off = s->regs->bx;
+                s->ea.mem.off = ctxt->regs->bx;
                 break;
             }
             switch ( s->modrm_mod )
@@ -1517,7 +1516,7 @@ int x86emul_decode(struct x86_emulate_st
                                      !s->evex.RX) << 4;
                 else if ( s->sib_index != 4 )
                 {
-                    s->ea.mem.off = *decode_gpr(s->regs, s->sib_index);
+                    s->ea.mem.off = *decode_gpr(ctxt->regs, s->sib_index);
                     s->ea.mem.off <<= s->sib_scale;
                 }
                 if ( (s->modrm_mod == 0) && ((sib_base & 7) == 5) )
@@ -1525,7 +1524,7 @@ int x86emul_decode(struct x86_emulate_st
                 else if ( sib_base == 4 )
                 {
                     s->ea.mem.seg  = x86_seg_ss;
-                    s->ea.mem.off += s->regs->r(sp);
+                    s->ea.mem.off += ctxt->regs->r(sp);
                     if ( !s->ext && (b == 0x8f) )
                         /* POP <rm> computes its EA post increment. */
                         s->ea.mem.off += ((mode_64bit() && (s->op_bytes == 4))
@@ -1534,16 +1533,16 @@ int x86emul_decode(struct x86_emulate_st
                 else if ( sib_base == 5 )
                 {
                     s->ea.mem.seg  = x86_seg_ss;
-                    s->ea.mem.off += s->regs->r(bp);
+                    s->ea.mem.off += ctxt->regs->r(bp);
                 }
                 else
-                    s->ea.mem.off += *decode_gpr(s->regs, sib_base);
+                    s->ea.mem.off += *decode_gpr(ctxt->regs, sib_base);
             }
             else
             {
                 generate_exception_if(d & vSIB, X86_EXC_UD);
                 s->modrm_rm |= (s->rex_prefix & 1) << 3;
-                s->ea.mem.off = *decode_gpr(s->regs, s->modrm_rm);
+                s->ea.mem.off = *decode_gpr(ctxt->regs, s->modrm_rm);
                 if ( (s->modrm_rm == 5) && (s->modrm_mod != 0) )
                     s->ea.mem.seg = x86_seg_ss;
             }
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -321,7 +321,6 @@ struct x86_emulate_state {
 #define imm2 ea.orig_val
 
     unsigned long ip;
-    struct cpu_user_regs *regs;
 
 #ifndef NDEBUG
     /*




* [PATCH 4/9] x86emul: support CMPccXADD
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
                   ` (2 preceding siblings ...)
  2023-04-04 14:51 ` [PATCH 3/9] x86emul: drop regs field from emulator state structure Jan Beulich
@ 2023-04-04 14:52 ` Jan Beulich
  2023-04-06 19:28   ` Andrew Cooper
  2023-04-04 14:53 ` [PATCH 5/9] x86emul: re-use new stub_exn field in state structure Jan Beulich
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Unconditionally wire this through the ->rmw() hook. Since x86_emul_rmw()
now wants to construct and invoke a stub, make stub_exn available to it
via a new field in the emulator state structure.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
# SDE: -grr or -srf
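For reference, the semantics being emulated can be modeled as follows (a
sketch matching the behavior exercised by the testcase below; all names
invented): the comparand register always receives the original memory
value (the XADD-like part), the flags come from memory minus comparand,
and the addend from the VEX.vvvv register is folded into memory only
when the condition holds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of CMP<cc>XADD mem, reg, reg (names invented): memory is
 * updated only when <cc> holds for the compare of old-memory against
 * the comparand, while the comparand register unconditionally gets the
 * old memory value back. */
static void toy_cmpccxadd(uint64_t *mem, uint64_t *reg_cmp, uint64_t addend,
                          bool (*cc)(uint64_t memval, uint64_t cmpval))
{
    uint64_t tmp = *mem;

    if ( cc(tmp, *reg_cmp) )  /* condition evaluated on tmp - *reg_cmp */
        *mem = tmp + addend;
    *reg_cmp = tmp;           /* XADD-like: old memory value always returned */
}

/* "b" (below): CF as set by the unsigned compare tmp - cmpval. */
static bool cc_below(uint64_t memval, uint64_t cmpval)
{
    return memval < cmpval;
}
```

Running this with the operand values of the cmpbxadd testcase below
reproduces its expected results: the first invocation (condition false)
leaves memory alone, the second (condition true) adds the addend while
the comparand register keeps the previously fetched memory value.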

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -232,6 +232,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 
         {"avx-vnni",     0x00000007,  1, CPUID_REG_EAX,  4,  1},
         {"avx512-bf16",  0x00000007,  1, CPUID_REG_EAX,  5,  1},
+        {"cmpccxadd",    0x00000007,  1, CPUID_REG_EAX,  7,  1},
         {"fzrm",         0x00000007,  1, CPUID_REG_EAX, 10,  1},
         {"fsrs",         0x00000007,  1, CPUID_REG_EAX, 11,  1},
         {"fsrcs",        0x00000007,  1, CPUID_REG_EAX, 12,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -186,6 +186,7 @@ static const char *const str_7d0[32] =
 static const char *const str_7a1[32] =
 {
     [ 4] = "avx-vnni",      [ 5] = "avx512-bf16",
+    /* 6 */                 [ 7] = "cmpccxadd",
 
     [10] = "fzrm",          [11] = "fsrs",
     [12] = "fsrcs",
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -1388,6 +1388,22 @@ static const struct vex {
     { { 0xdd }, 2, T, R, pfx_66, WIG, Ln }, /* vaesenclast */
     { { 0xde }, 2, T, R, pfx_66, WIG, Ln }, /* vaesdec */
     { { 0xdf }, 2, T, R, pfx_66, WIG, Ln }, /* vaesdeclast */
+    { { 0xe0 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpoxadd */
+    { { 0xe1 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpnoxadd */
+    { { 0xe2 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpbxadd */
+    { { 0xe3 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpnbxadd */
+    { { 0xe4 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpexadd */
+    { { 0xe5 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpnexadd */
+    { { 0xe6 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpbexadd */
+    { { 0xe7 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpaxadd */
+    { { 0xe8 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpsxadd */
+    { { 0xe9 }, 2, F, W, pfx_66, Wn, L0 }, /* cmpnsxadd */
+    { { 0xea }, 2, F, W, pfx_66, Wn, L0 }, /* cmppxadd */
+    { { 0xeb }, 2, F, W, pfx_66, Wn, L0 }, /* cmpnpxadd */
+    { { 0xec }, 2, F, W, pfx_66, Wn, L0 }, /* cmplxadd */
+    { { 0xed }, 2, F, W, pfx_66, Wn, L0 }, /* cmpgexadd */
+    { { 0xee }, 2, F, W, pfx_66, Wn, L0 }, /* cmplexadd */
+    { { 0xef }, 2, F, W, pfx_66, Wn, L0 }, /* cmpgxadd */
     { { 0xf2 }, 2, T, R, pfx_no, Wn, L0 }, /* andn */
     { { 0xf3, 0x08 }, 2, T, R, pfx_no, Wn, L0 }, /* blsr */
     { { 0xf3, 0x10 }, 2, T, R, pfx_no, Wn, L0 }, /* blsmsk */
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -1398,6 +1398,78 @@ int main(int argc, char **argv)
     }
     printf("okay\n");
 
+    printf("%-40s", "Testing cmpbxadd %rbx,%r9,(%rdx)...");
+    if ( stack_exec && cpu_has_cmpccxadd )
+    {
+        instr[0] = 0xc4; instr[1] = 0x62; instr[2] = 0xe1; instr[3] = 0xe2; instr[4] = 0x0a;
+        regs.rip = (unsigned long)&instr[0];
+        regs.eflags = EFLAGS_ALWAYS_SET;
+        res[0] = 0x11223344;
+        res[1] = 0x01020304;
+        regs.rdx = (unsigned long)res;
+        regs.r9  = 0x0001020300112233UL;
+        regs.rbx = 0x0101010101010101UL;
+        rc = x86_emulate(&ctxt, &emulops);
+        if ( (rc != X86EMUL_OKAY) ||
+             (regs.eip != (unsigned long)&instr[5]) ||
+             (regs.r9 != 0x0102030411223344UL) ||
+             (regs.rbx != 0x0101010101010101UL) ||
+             ((regs.eflags & EFLAGS_MASK) !=
+              (X86_EFLAGS_PF | EFLAGS_ALWAYS_SET)) ||
+             (res[0] != 0x11223344) ||
+             (res[1] != 0x01020304) )
+            goto fail;
+
+        regs.rip = (unsigned long)&instr[0];
+        regs.r9 <<= 8;
+        rc = x86_emulate(&ctxt, &emulops);
+        if ( (rc != X86EMUL_OKAY) ||
+             (regs.eip != (unsigned long)&instr[5]) ||
+             (regs.r9 != 0x0102030411223344UL) ||
+             (regs.rbx != 0x0101010101010101UL) ||
+             ((regs.eflags & EFLAGS_MASK) !=
+              (X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_SF |
+               EFLAGS_ALWAYS_SET)) ||
+             (res[0] != 0x12233445) ||
+             (res[1] != 0x02030405) )
+            goto fail;
+        printf("okay\n");
+
+        printf("%-40s", "Testing cmpsxadd %r9d,%ebx,4(%r10)...");
+        instr[1] = 0xc2; instr[2] = 0x31; instr[3] = 0xe8; instr[4] = 0x5a; instr[5] = 0x04;
+        regs.rip = (unsigned long)&instr[0];
+        res[2] = res[0] = ~0;
+        regs.r10 = (unsigned long)res;
+        rc = x86_emulate(&ctxt, &emulops);
+        if ( (rc != X86EMUL_OKAY) ||
+             (regs.eip != (unsigned long)&instr[6]) ||
+             (regs.r9 != 0x0102030411223344UL) ||
+             (regs.rbx != 0x02030405) ||
+             ((regs.eflags & EFLAGS_MASK) != EFLAGS_ALWAYS_SET) ||
+             (res[0] + 1) ||
+             (res[1] != 0x02030405) ||
+             (res[2] + 1) )
+            goto fail;
+
+        regs.rip = (unsigned long)&instr[0];
+        regs.rbx <<= 8;
+        rc = x86_emulate(&ctxt, &emulops);
+        if ( (rc != X86EMUL_OKAY) ||
+             (regs.eip != (unsigned long)&instr[6]) ||
+             (regs.r9 != 0x0102030411223344UL) ||
+             (regs.rbx != 0x02030405) ||
+             ((regs.eflags & EFLAGS_MASK) !=
+              (X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_SF |
+               EFLAGS_ALWAYS_SET)) ||
+             (res[0] + 1) ||
+             (res[1] != 0x13253749) ||
+             (res[2] + 1) )
+            goto fail;
+        printf("okay\n");
+    }
+    else
+        printf("skipped\n");
+
     emulops.write_segment = write_segment;
     emulops.write_msr     = write_msr;
 
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -185,6 +185,7 @@ void wrpkru(unsigned int val);
 #define cpu_has_serialize  cp.feat.serialize
 #define cpu_has_avx_vnni   (cp.feat.avx_vnni && xcr0_mask(6))
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
+#define cpu_has_cmpccxadd  cp.feat.cmpccxadd
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
 
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -170,6 +170,7 @@ extern struct cpuinfo_x86 boot_cpu_data;
 /* CPUID level 0x00000007:1.eax */
 #define cpu_has_avx_vnni        boot_cpu_has(X86_FEATURE_AVX_VNNI)
 #define cpu_has_avx512_bf16     boot_cpu_has(X86_FEATURE_AVX512_BF16)
+#define cpu_has_cmpccxadd       boot_cpu_has(X86_FEATURE_CMPCCXADD)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
--- a/xen/arch/x86/x86_emulate/decode.c
+++ b/xen/arch/x86/x86_emulate/decode.c
@@ -443,6 +443,7 @@ static const struct ext0f38_table {
     [0xcf] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0xdb] = { .simd_size = simd_packed_int, .two_op = 1 },
     [0xdc ... 0xdf] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
+    [0xe0 ... 0xef] = { .to_mem = 1 },
     [0xf0] = { .two_op = 1 },
     [0xf1] = { .to_mem = 1, .two_op = 1 },
     [0xf2 ... 0xf3] = {},
@@ -934,6 +935,8 @@ decode_0f38(struct x86_emulate_state *s,
             ctxt->opcode |= MASK_INSR(s->vex.pfx, X86EMUL_OPC_PFX_MASK);
         break;
 
+    case X86EMUL_OPC_VEX_66(0, 0xe0)
+     ... X86EMUL_OPC_VEX_66(0, 0xef): /* cmp<cc>xadd */
     case X86EMUL_OPC_VEX(0, 0xf2):    /* andn */
     case X86EMUL_OPC_VEX(0, 0xf3):    /* Grp 17 */
     case X86EMUL_OPC_VEX(0, 0xf5):    /* bzhi */
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -265,6 +265,7 @@ struct x86_emulate_state {
         rmw_btc,
         rmw_btr,
         rmw_bts,
+        rmw_cmpccxadd,
         rmw_dec,
         rmw_inc,
         rmw_neg,
@@ -322,6 +323,8 @@ struct x86_emulate_state {
 
     unsigned long ip;
 
+    struct stub_exn *stub_exn;
+
 #ifndef NDEBUG
     /*
      * Track caller of x86_decode_insn() to spot missing as well as
@@ -593,6 +596,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_tsxldtrk()    (ctxt->cpuid->feat.tsxldtrk)
 #define vcpu_has_avx_vnni()    (ctxt->cpuid->feat.avx_vnni)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
+#define vcpu_has_cmpccxadd()   (ctxt->cpuid->feat.cmpccxadd)
 #define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
 #define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -6881,6 +6881,15 @@ x86_emulate(
 
 #endif /* !X86EMUL_NO_SIMD */
 
+    case X86EMUL_OPC_VEX_66(0x0f38, 0xe0)
+     ... X86EMUL_OPC_VEX_66(0x0f38, 0xef): /* cmp<cc>xadd r,r,m */
+        generate_exception_if(!mode_64bit() || dst.type != OP_MEM || vex.l,
+                              EXC_UD);
+        host_and_vcpu_must_have(cmpccxadd);
+        fail_if(!ops->rmw);
+        state->rmw = rmw_cmpccxadd;
+        break;
+
     case X86EMUL_OPC(0x0f38, 0xf0): /* movbe m,r */
     case X86EMUL_OPC(0x0f38, 0xf1): /* movbe r,m */
         vcpu_must_have(movbe);
@@ -7942,14 +7951,20 @@ x86_emulate(
     {
         ea.val = src.val;
         op_bytes = dst.bytes;
+        state->stub_exn = &stub_exn;
         rc = ops->rmw(dst.mem.seg, dst.mem.off, dst.bytes, &_regs.eflags,
                       state, ctxt);
+#ifdef __XEN__
+        if ( rc == X86EMUL_stub_failure )
+            goto emulation_stub_failure;
+#endif
         if ( rc != X86EMUL_OKAY )
             goto done;
 
         /* Some operations require a register to be written. */
         switch ( state->rmw )
         {
+        case rmw_cmpccxadd:
         case rmw_xchg:
         case rmw_xadd:
             switch ( dst.bytes )
@@ -8224,6 +8239,7 @@ int x86_emul_rmw(
     uint32_t *eflags,
     struct x86_emulate_state *state,
     struct x86_emulate_ctxt *ctxt)
+#define stub_exn (*state->stub_exn) /* for invoke_stub() */
 {
     unsigned long *dst = ptr;
 
@@ -8289,6 +8305,37 @@ int x86_emul_rmw(
 #undef BINOP
 #undef SHIFT
 
+#ifdef __x86_64__
+    case rmw_cmpccxadd:
+    {
+        struct x86_emulate_stub stub = {};
+        uint8_t *buf = get_stub(stub);
+        typeof(state->vex) *pvex = container_of(buf + 1, typeof(state->vex),
+                                                raw[0]);
+        unsigned long dummy;
+
+        buf[0] = 0xc4;
+        *pvex = state->vex;
+        pvex->b = 1;
+        pvex->r = 1;
+        pvex->reg = 0xf; /* rAX */
+        buf[3] = ctxt->opcode;
+        buf[4] = 0x11; /* reg=rDX r/m=(%RCX) */
+        buf[5] = 0xc3;
+
+        *eflags &= ~EFLAGS_MASK;
+        invoke_stub("",
+                    _POST_EFLAGS("[eflags]", "[mask]", "[tmp]"),
+                    "+m" (*dst), "+d" (state->ea.val),
+                    [tmp] "=&r" (dummy), [eflags] "+g" (*eflags)
+                    : "a" (*decode_vex_gpr(state->vex.reg, ctxt->regs, ctxt)),
+                      "c" (dst), [mask] "i" (EFLAGS_MASK));
+
+        put_stub(stub);
+        break;
+    }
+#endif
+
     case rmw_not:
         switch ( state->op_bytes )
         {
@@ -8384,7 +8431,13 @@ int x86_emul_rmw(
 #undef JCXZ
 
     return X86EMUL_OKAY;
+
+#if defined(__XEN__) && defined(__x86_64__)
+ emulation_stub_failure:
+    return X86EMUL_stub_failure;
+#endif
 }
+#undef stub_exn
 
 static void __init __maybe_unused build_assertions(void)
 {
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -278,6 +278,7 @@ XEN_CPUFEATURE(SSBD,          9*32+31) /
 /* Intel-defined CPU features, CPUID level 0x00000007:1.eax, word 10 */
 XEN_CPUFEATURE(AVX_VNNI,     10*32+ 4) /*A  AVX-VNNI Instructions */
 XEN_CPUFEATURE(AVX512_BF16,  10*32+ 5) /*A  AVX512 BFloat16 Instructions */
+XEN_CPUFEATURE(CMPCCXADD,    10*32+ 7) /*A  CMPccXADD Instructions */
 XEN_CPUFEATURE(FZRM,         10*32+10) /*A  Fast Zero-length REP MOVSB */
 XEN_CPUFEATURE(FSRS,         10*32+11) /*A  Fast Short REP STOSB */
 XEN_CPUFEATURE(FSRCS,        10*32+12) /*A  Fast Short REP CMPSB/SCASB */



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 5/9] x86emul: re-use new stub_exn field in state structure
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
                   ` (3 preceding siblings ...)
  2023-04-04 14:52 ` [PATCH 4/9] x86emul: support CMPccXADD Jan Beulich
@ 2023-04-04 14:53 ` Jan Beulich
  2023-04-06 19:48   ` Andrew Cooper
  2023-04-04 14:53 ` [PATCH 6/9] x86emul: support AVX-IFMA insns Jan Beulich
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

This can now also be used to reduce the number of parameters
x86emul_fpu() needs to take.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
We could of course set the struct field once early in x86_emulate(), but
for now I think we're better off leaving it as NULL where not actually
needed.

--- a/xen/arch/x86/x86_emulate/fpu.c
+++ b/xen/arch/x86/x86_emulate/fpu.c
@@ -90,9 +90,8 @@ int x86emul_fpu(struct x86_emulate_state
                 unsigned int *insn_bytes,
                 enum x86_emulate_fpu_type *fpu_type,
 #define fpu_type (*fpu_type) /* for get_fpu() */
-                struct stub_exn *stub_exn,
-#define stub_exn (*stub_exn) /* for invoke_stub() */
                 mmval_t *mmvalp)
+#define stub_exn (*s->stub_exn) /* for invoke_stub() */
 {
     uint8_t b;
     int rc;
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -764,7 +764,6 @@ int x86emul_fpu(struct x86_emulate_state
                 const struct x86_emulate_ops *ops,
                 unsigned int *insn_bytes,
                 enum x86_emulate_fpu_type *fpu_type,
-                struct stub_exn *stub_exn,
                 mmval_t *mmvalp);
 int x86emul_0f01(struct x86_emulate_state *s,
                  struct cpu_user_regs *regs,
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -2058,8 +2058,9 @@ x86_emulate(
 #ifndef X86EMUL_NO_FPU
     case 0x9b:  /* wait/fwait */
     case 0xd8 ... 0xdf: /* FPU */
+        state->stub_exn = &stub_exn;
         rc = x86emul_fpu(state, &_regs, &dst, &src, ctxt, ops,
-                         &insn_bytes, &fpu_type, &stub_exn, mmvalp);
+                         &insn_bytes, &fpu_type, mmvalp);
         goto dispatch_from_helper;
 #endif
 




* [PATCH 6/9] x86emul: support AVX-IFMA insns
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
                   ` (4 preceding siblings ...)
  2023-04-04 14:53 ` [PATCH 5/9] x86emul: re-use new stub_exn field in state structure Jan Beulich
@ 2023-04-04 14:53 ` Jan Beulich
  2023-04-06 20:51   ` Andrew Cooper
  2023-04-04 14:54 ` [PATCH 7/9] x86emul: support AVX-VNNI-INT8 Jan Beulich
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

As in a few cases before (in particular AVX512-IFMA): since the insns
here, and in particular their memory access patterns, follow the usual
scheme, I didn't think it was necessary to add a contrived test
specifically for them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -239,6 +239,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"fred",         0x00000007,  1, CPUID_REG_EAX, 17,  1},
         {"lkgs",         0x00000007,  1, CPUID_REG_EAX, 18,  1},
         {"wrmsrns",      0x00000007,  1, CPUID_REG_EAX, 19,  1},
+        {"avx-ifma",     0x00000007,  1, CPUID_REG_EAX, 23,  1},
 
         {"cet-sss",      0x00000007,  1, CPUID_REG_EDX, 18,  1},
 
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -193,6 +193,8 @@ static const char *const str_7a1[32] =
 
     /* 16 */                [17] = "fred",
     [18] = "lkgs",          [19] = "wrmsrns",
+
+    /* 22 */                [23] = "avx-ifma",
 };
 
 static const char *const str_e21a[32] =
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -1372,6 +1372,8 @@ static const struct vex {
     { { 0xad }, 2, T, R, pfx_66, Wn, LIG }, /* vnmadd213s{s,d} */
     { { 0xae }, 2, T, R, pfx_66, Wn, Ln }, /* vnmsub213p{s,d} */
     { { 0xaf }, 2, T, R, pfx_66, Wn, LIG }, /* vnmsub213s{s,d} */
+    { { 0xb4 }, 2, T, R, pfx_66, W1, Ln }, /* vpmadd52luq */
+    { { 0xb5 }, 2, T, R, pfx_66, W1, Ln }, /* vpmadd52huq */
     { { 0xb6 }, 2, T, R, pfx_66, Wn, Ln }, /* vmaddsub231p{s,d} */
     { { 0xb7 }, 2, T, R, pfx_66, Wn, Ln }, /* vmsubadd231p{s,d} */
     { { 0xb8 }, 2, T, R, pfx_66, Wn, Ln }, /* vmadd231p{s,d} */
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -186,6 +186,7 @@ void wrpkru(unsigned int val);
 #define cpu_has_avx_vnni   (cp.feat.avx_vnni && xcr0_mask(6))
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
 #define cpu_has_cmpccxadd  cp.feat.cmpccxadd
+#define cpu_has_avx_ifma   (cp.feat.avx_ifma && xcr0_mask(6))
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
 
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -171,6 +171,7 @@ extern struct cpuinfo_x86 boot_cpu_data;
 #define cpu_has_avx_vnni        boot_cpu_has(X86_FEATURE_AVX_VNNI)
 #define cpu_has_avx512_bf16     boot_cpu_has(X86_FEATURE_AVX512_BF16)
 #define cpu_has_cmpccxadd       boot_cpu_has(X86_FEATURE_CMPCCXADD)
+#define cpu_has_avx_ifma        boot_cpu_has(X86_FEATURE_AVX_IFMA)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -599,6 +599,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_cmpccxadd()   (ctxt->cpuid->feat.cmpccxadd)
 #define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
 #define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
+#define vcpu_has_avx_ifma()    (ctxt->cpuid->feat.avx_ifma)
 
 #define vcpu_must_have(feat) \
     generate_exception_if(!vcpu_has_##feat(), X86_EXC_UD)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -6727,6 +6727,12 @@ x86_emulate(
         break;
     }
 
+    case X86EMUL_OPC_VEX_66(0x0f38, 0xb4): /* vpmadd52luq [xy]mm/mem,[xy]mm,[xy]mm */
+    case X86EMUL_OPC_VEX_66(0x0f38, 0xb5): /* vpmadd52huq [xy]mm/mem,[xy]mm,[xy]mm */
+        host_and_vcpu_must_have(avx_ifma);
+        generate_exception_if(!vex.w, EXC_UD);
+        goto simd_0f_ymm;
+
     case X86EMUL_OPC_EVEX_66(0x0f38, 0xb4): /* vpmadd52luq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0xb5): /* vpmadd52huq [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
         host_and_vcpu_must_have(avx512_ifma);
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -285,6 +285,7 @@ XEN_CPUFEATURE(FSRCS,        10*32+12) /
 XEN_CPUFEATURE(FRED,         10*32+17) /*   Flexible Return and Event Delivery */
 XEN_CPUFEATURE(LKGS,         10*32+18) /*S  Load Kernel GS Base */
 XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*A  WRMSR Non-Serialising */
+XEN_CPUFEATURE(AVX_IFMA,     10*32+23) /*A  AVX-IFMA Instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
 XEN_CPUFEATURE(LFENCE_DISPATCH,    11*32+ 2) /*A  LFENCE always serializing */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -254,7 +254,7 @@ def crunch_numbers(state):
         # feature flags.  If want to use AVX512, AVX2 must be supported and
         # enabled.  Certain later extensions, acting on 256-bit vectors of
         # integers, better depend on AVX2 than AVX.
-        AVX2: [AVX512F, VAES, VPCLMULQDQ, AVX_VNNI],
+        AVX2: [AVX512F, VAES, VPCLMULQDQ, AVX_VNNI, AVX_IFMA],
 
         # AVX512F is taken to mean hardware support for 512bit registers
         # (which in practice depends on the EVEX prefix to encode) as well




* [PATCH 7/9] x86emul: support AVX-VNNI-INT8
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
                   ` (5 preceding siblings ...)
  2023-04-04 14:53 ` [PATCH 6/9] x86emul: support AVX-IFMA insns Jan Beulich
@ 2023-04-04 14:54 ` Jan Beulich
  2023-04-06 21:01   ` Andrew Cooper
  2023-04-04 14:54 ` [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns Jan Beulich
  2023-04-04 14:55 ` [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST Jan Beulich
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

These are close relatives of the AVX-VNNI ISA extension. Since the insns
here and in particular their memory access patterns follow the usual
scheme (and especially the byte variants of AVX-VNNI), I didn't think it
was necessary to add a contrived test specifically for them.

While making the addition, also re-wire AVX-VNNI's handling to
simd_0f_ymm: there's no reason to check the AVX feature alongside the
one actually of interest (there are a few features where two checks are
actually necessary, e.g. GFNI+AVX, but this isn't the case here).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -241,6 +241,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"wrmsrns",      0x00000007,  1, CPUID_REG_EAX, 19,  1},
         {"avx-ifma",     0x00000007,  1, CPUID_REG_EAX, 23,  1},
 
+        {"avx-vnni-int8",0x00000007,  1, CPUID_REG_EDX,  4,  1},
         {"cet-sss",      0x00000007,  1, CPUID_REG_EDX, 18,  1},
 
         {"intel-psfd",   0x00000007,  2, CPUID_REG_EDX,  0,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -214,6 +214,8 @@ static const char *const str_7c1[32] =
 
 static const char *const str_7d1[32] =
 {
+    [ 4] = "avx-vnni-int8",
+
     [18] = "cet-sss",
 };
 
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -1337,8 +1337,14 @@ static const struct vex {
     { { 0x45 }, 2, T, R, pfx_66, Wn, Ln }, /* vpsrlv{d,q} */
     { { 0x46 }, 2, T, R, pfx_66, W0, Ln }, /* vpsravd */
     { { 0x47 }, 2, T, R, pfx_66, Wn, Ln }, /* vpsllv{d,q} */
+    { { 0x50 }, 2, T, R, pfx_no, W0, Ln }, /* vpdpbuud */
     { { 0x50 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpbusd */
+    { { 0x50 }, 2, T, R, pfx_f3, W0, Ln }, /* vpdpbsud */
+    { { 0x50 }, 2, T, R, pfx_f2, W0, Ln }, /* vpdpbssd */
+    { { 0x51 }, 2, T, R, pfx_no, W0, Ln }, /* vpdpbuuds */
     { { 0x51 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpbusds */
+    { { 0x51 }, 2, T, R, pfx_f3, W0, Ln }, /* vpdpbsuds */
+    { { 0x51 }, 2, T, R, pfx_f2, W0, Ln }, /* vpdpbssds */
     { { 0x52 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpwssd */
     { { 0x53 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpwssds */
     { { 0x58 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastd */
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -187,6 +187,7 @@ void wrpkru(unsigned int val);
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
 #define cpu_has_cmpccxadd  cp.feat.cmpccxadd
 #define cpu_has_avx_ifma   (cp.feat.avx_ifma && xcr0_mask(6))
+#define cpu_has_avx_vnni_int8 (cp.feat.avx_vnni_int8 && xcr0_mask(6))
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
 
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -173,6 +173,9 @@ extern struct cpuinfo_x86 boot_cpu_data;
 #define cpu_has_cmpccxadd       boot_cpu_has(X86_FEATURE_CMPCCXADD)
 #define cpu_has_avx_ifma        boot_cpu_has(X86_FEATURE_AVX_IFMA)
 
+/* CPUID level 0x00000007:1.edx */
+#define cpu_has_avx_vnni_int8   boot_cpu_has(X86_FEATURE_AVX_VNNI_INT8)
+
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_cpuid_faulting  boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -600,6 +600,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
 #define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
 #define vcpu_has_avx_ifma()    (ctxt->cpuid->feat.avx_ifma)
+#define vcpu_has_avx_vnni_int8() (ctxt->cpuid->feat.avx_vnni_int8)
 
 #define vcpu_must_have(feat) \
     generate_exception_if(!vcpu_has_##feat(), X86_EXC_UD)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -6077,13 +6077,23 @@ x86_emulate(
         generate_exception_if(vex.l, EXC_UD);
         goto simd_0f_avx;
 
+    case X86EMUL_OPC_VEX   (0x0f38, 0x50): /* vpdpbuud [xy]mm/mem,[xy]mm,[xy]mm */
+    case X86EMUL_OPC_VEX_F3(0x0f38, 0x50): /* vpdpbsud [xy]mm/mem,[xy]mm,[xy]mm */
+    case X86EMUL_OPC_VEX_F2(0x0f38, 0x50): /* vpdpbssd [xy]mm/mem,[xy]mm,[xy]mm */
+    case X86EMUL_OPC_VEX   (0x0f38, 0x51): /* vpdpbuuds [xy]mm/mem,[xy]mm,[xy]mm */
+    case X86EMUL_OPC_VEX_F3(0x0f38, 0x51): /* vpdpbsuds [xy]mm/mem,[xy]mm,[xy]mm */
+    case X86EMUL_OPC_VEX_F2(0x0f38, 0x51): /* vpdpbssds [xy]mm/mem,[xy]mm,[xy]mm */
+        host_and_vcpu_must_have(avx_vnni_int8);
+        generate_exception_if(vex.w, EXC_UD);
+        goto simd_0f_ymm;
+
     case X86EMUL_OPC_VEX_66(0x0f38, 0x50): /* vpdpbusd [xy]mm/mem,[xy]mm,[xy]mm */
     case X86EMUL_OPC_VEX_66(0x0f38, 0x51): /* vpdpbusds [xy]mm/mem,[xy]mm,[xy]mm */
     case X86EMUL_OPC_VEX_66(0x0f38, 0x52): /* vpdpwssd [xy]mm/mem,[xy]mm,[xy]mm */
     case X86EMUL_OPC_VEX_66(0x0f38, 0x53): /* vpdpwssds [xy]mm/mem,[xy]mm,[xy]mm */
         host_and_vcpu_must_have(avx_vnni);
         generate_exception_if(vex.w, EXC_UD);
-        goto simd_0f_avx;
+        goto simd_0f_ymm;
 
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x50): /* vpdpbusd [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x51): /* vpdpbusds [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -305,6 +305,7 @@ XEN_CPUFEATURE(MCDT_NO,            13*32
 /* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
+XEN_CPUFEATURE(AVX_VNNI_INT8,      15*32+ 4) /*A  AVX-VNNI-INT8 Instructions */
 XEN_CPUFEATURE(CET_SSS,            15*32+18) /*   CET Supervisor Shadow Stacks safe to use */
 
 #endif /* XEN_CPUFEATURE */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -254,7 +254,7 @@ def crunch_numbers(state):
         # feature flags.  If want to use AVX512, AVX2 must be supported and
         # enabled.  Certain later extensions, acting on 256-bit vectors of
         # integers, better depend on AVX2 than AVX.
-        AVX2: [AVX512F, VAES, VPCLMULQDQ, AVX_VNNI, AVX_IFMA],
+        AVX2: [AVX512F, VAES, VPCLMULQDQ, AVX_VNNI, AVX_IFMA, AVX_VNNI_INT8],
 
         # AVX512F is taken to mean hardware support for 512bit registers
         # (which in practice depends on the EVEX prefix to encode) as well




* [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
                   ` (6 preceding siblings ...)
  2023-04-04 14:54 ` [PATCH 7/9] x86emul: support AVX-VNNI-INT8 Jan Beulich
@ 2023-04-04 14:54 ` Jan Beulich
  2023-04-06 21:22   ` Andrew Cooper
  2023-04-04 14:55 ` [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST Jan Beulich
  8 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Matching what was done earlier, explicit tests are added only for
irregular insn / memory access patterns.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
SDE: -grr or -srf

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -242,6 +242,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"avx-ifma",     0x00000007,  1, CPUID_REG_EAX, 23,  1},
 
         {"avx-vnni-int8",0x00000007,  1, CPUID_REG_EDX,  4,  1},
+        {"avx-ne-convert",0x00000007, 1, CPUID_REG_EDX,  5,  1},
         {"cet-sss",      0x00000007,  1, CPUID_REG_EDX, 18,  1},
 
         {"intel-psfd",   0x00000007,  2, CPUID_REG_EDX,  0,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -214,7 +214,7 @@ static const char *const str_7c1[32] =
 
 static const char *const str_7d1[32] =
 {
-    [ 4] = "avx-vnni-int8",
+    [ 4] = "avx-vnni-int8", [ 5] = "avx-ne-convert",
 
     [18] = "cet-sss",
 };
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -1350,6 +1350,7 @@ static const struct vex {
     { { 0x58 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastd */
     { { 0x59 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastq */
     { { 0x5a }, 2, F, R, pfx_66, W0, L1 }, /* vbroadcasti128 */
+    { { 0x72 }, 2, T, R, pfx_f3, W0, Ln }, /* vcvtneps2bf16 */
     { { 0x78 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastb */
     { { 0x79 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastw */
     { { 0x8c }, 2, F, R, pfx_66, Wn, Ln }, /* vpmaskmov{d,q} */
@@ -1378,6 +1379,12 @@ static const struct vex {
     { { 0xad }, 2, T, R, pfx_66, Wn, LIG }, /* vnmadd213s{s,d} */
     { { 0xae }, 2, T, R, pfx_66, Wn, Ln }, /* vnmsub213p{s,d} */
     { { 0xaf }, 2, T, R, pfx_66, Wn, LIG }, /* vnmsub213s{s,d} */
+    { { 0xb0 }, 2, F, R, pfx_no, W0, Ln }, /* vcvtneoph2ps */
+    { { 0xb0 }, 2, F, R, pfx_66, W0, Ln }, /* vcvtneeph2ps */
+    { { 0xb0 }, 2, F, R, pfx_f3, W0, Ln }, /* vcvtneebf162ps */
+    { { 0xb0 }, 2, F, R, pfx_f2, W0, Ln }, /* vcvtneobf162ps */
+    { { 0xb1 }, 2, F, R, pfx_66, W0, Ln }, /* vbcstnesh2ps */
+    { { 0xb1 }, 2, F, R, pfx_f3, W0, Ln }, /* vbcstnebf162ps */
     { { 0xb4 }, 2, T, R, pfx_66, W1, Ln }, /* vpmadd52luq */
     { { 0xb5 }, 2, T, R, pfx_66, W1, Ln }, /* vpmadd52huq */
     { { 0xb6 }, 2, T, R, pfx_66, Wn, Ln }, /* vmaddsub231p{s,d} */
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -4572,6 +4572,39 @@ int main(int argc, char **argv)
     else
         printf("skipped\n");
 
+    printf("%-40s", "Testing vbcstnebf162ps 2(%ecx),%ymm3...");
+    if ( stack_exec && cpu_has_avx_ne_convert )
+    {
+        decl_insn(vbcstnebf162ps);
+
+        asm volatile ( /* vbcstnebf162ps 2(%0), %%ymm3 */
+                       put_insn(vbcstnebf162ps,
+                                ".byte 0xc4, 0xe2, 0x7e, 0xb1, 0x59, 0x02 ")
+                       :: "c" (NULL) );
+
+        res[0] = 0x43210000;
+        regs.ecx = (unsigned long)res;
+        set_insn(vbcstnebf162ps);
+        bytes_read  = 0;
+        rc = x86_emulate(&ctxt, &emulops);
+        if ( rc != X86EMUL_OKAY || !check_eip(vbcstnebf162ps) ||
+             bytes_read != 2 )
+            goto fail;
+
+        asm volatile ( "vbroadcastss %1, %%ymm2;"
+                       "vsubps %%ymm3, %%ymm2, %%ymm1;"
+                       "vptest %%ymm1, %%ymm1;"
+                       "setc %b0; setz %h0"
+                       : "=&Q" (rc)
+                       : "m" (res[0]) );
+        if ( (rc & 0xffff) != 0x0101 )
+            goto fail;
+
+        printf("okay\n");
+    }
+    else
+        printf("skipped\n");
+
     printf("%-40s", "Testing stmxcsr (%edx)...");
     if ( cpu_has_sse )
     {
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -188,6 +188,7 @@ void wrpkru(unsigned int val);
 #define cpu_has_cmpccxadd  cp.feat.cmpccxadd
 #define cpu_has_avx_ifma   (cp.feat.avx_ifma && xcr0_mask(6))
 #define cpu_has_avx_vnni_int8 (cp.feat.avx_vnni_int8 && xcr0_mask(6))
+#define cpu_has_avx_ne_convert (cp.feat.avx_ne_convert && xcr0_mask(6))
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
 
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -175,6 +175,7 @@ extern struct cpuinfo_x86 boot_cpu_data;
 
 /* CPUID level 0x00000007:1.edx */
 #define cpu_has_avx_vnni_int8   boot_cpu_has(X86_FEATURE_AVX_VNNI_INT8)
+#define cpu_has_avx_ne_convert  boot_cpu_has(X86_FEATURE_AVX_NE_CONVERT)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
--- a/xen/arch/x86/x86_emulate/decode.c
+++ b/xen/arch/x86/x86_emulate/decode.c
@@ -423,6 +423,8 @@ static const struct ext0f38_table {
     [0xad] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
     [0xae] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0xaf] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
+    [0xb0] = { .simd_size = simd_other, .two_op = 1 },
+    [0xb1] = { .simd_size = simd_other, .two_op = 1 },
     [0xb4 ... 0xb5] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0xb6 ... 0xb8] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0xb9] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -601,6 +601,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
 #define vcpu_has_avx_ifma()    (ctxt->cpuid->feat.avx_ifma)
 #define vcpu_has_avx_vnni_int8() (ctxt->cpuid->feat.avx_vnni_int8)
+#define vcpu_has_avx_ne_convert() (ctxt->cpuid->feat.avx_ne_convert)
 
 #define vcpu_must_have(feat) \
     generate_exception_if(!vcpu_has_##feat(), X86_EXC_UD)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -6208,6 +6208,19 @@ x86_emulate(
         host_and_vcpu_must_have(avx512_vbmi2);
         goto avx512f_no_sae;
 
+    case X86EMUL_OPC_VEX   (0x0f38, 0xb0): /* vcvtneoph2ps mem,[xy]mm */
+    case X86EMUL_OPC_VEX_66(0x0f38, 0xb0): /* vcvtneeph2ps mem,[xy]mm */
+    case X86EMUL_OPC_VEX_F3(0x0f38, 0xb0): /* vcvtneebf162ps mem,[xy]mm */
+    case X86EMUL_OPC_VEX_F2(0x0f38, 0xb0): /* vcvtneobf162ps mem,[xy]mm */
+        generate_exception_if(ea.type != OP_MEM, EXC_UD);
+        /* fall through */
+    case X86EMUL_OPC_VEX_F3(0x0f38, 0x72): /* vcvtneps2bf16 [xy]mm/mem,xmm */
+        host_and_vcpu_must_have(avx_ne_convert);
+        generate_exception_if(vex.w, EXC_UD);
+        d |= TwoOp;
+        op_bytes = 16 << vex.l;
+        goto simd_0f_ymm;
+
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x75): /* vpermi2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x7d): /* vpermt2{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x8d): /* vperm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
@@ -6737,6 +6750,13 @@ x86_emulate(
         break;
     }
 
+    case X86EMUL_OPC_VEX_66(0x0f38, 0xb1): /* vbcstnesh2ps mem,[xy]mm */
+    case X86EMUL_OPC_VEX_F3(0x0f38, 0xb1): /* vbcstnebf162ps mem,[xy]mm */
+        host_and_vcpu_must_have(avx_ne_convert);
+        generate_exception_if(vex.w || ea.type != OP_MEM, EXC_UD);
+        op_bytes = 2;
+        goto simd_0f_ymm;
+
     case X86EMUL_OPC_VEX_66(0x0f38, 0xb4): /* vpmadd52luq [xy]mm/mem,[xy]mm,[xy]mm */
     case X86EMUL_OPC_VEX_66(0x0f38, 0xb5): /* vpmadd52huq [xy]mm/mem,[xy]mm,[xy]mm */
         host_and_vcpu_must_have(avx_ifma);
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -306,6 +306,7 @@ XEN_CPUFEATURE(MCDT_NO,            13*32
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
 XEN_CPUFEATURE(AVX_VNNI_INT8,      15*32+ 4) /*A  AVX-VNNI-INT8 Instructions */
+XEN_CPUFEATURE(AVX_NE_CONVERT,     15*32+ 5) /*A  AVX-NE-CONVERT Instructions */
 XEN_CPUFEATURE(CET_SSS,            15*32+18) /*   CET Supervisor Shadow Stacks safe to use */
 
 #endif /* XEN_CPUFEATURE */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -232,7 +232,7 @@ def crunch_numbers(state):
         # for the XOP prefix).  VEX/XOP-encoded GPR instructions, such as
         # those from the BMI{1,2}, TBM and LWP sets function fine in the
         # absence of any enabled xstate.
-        AVX: [FMA, FMA4, F16C, AVX2, XOP],
+        AVX: [FMA, FMA4, F16C, AVX2, XOP, AVX_NE_CONVERT],
 
         # This dependency exists solely for the shadow pagetable code.  If the
         # host doesn't have NX support, the shadow pagetable code can't handle




* [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST
  2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
                   ` (7 preceding siblings ...)
  2023-04-04 14:54 ` [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns Jan Beulich
@ 2023-04-04 14:55 ` Jan Beulich
  2023-04-04 14:57   ` Jan Beulich
  2023-04-06 22:03   ` Andrew Cooper
  8 siblings, 2 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

These are "compound" instructions, issuing a series of RDMSR / WRMSR
operations respectively. In the emulator we can therefore implement them
using the existing msr_{read,write}() hooks. The memory accesses rely on
the HVM ->read() / ->write() hooks already being linear-address
(x86_seg_none) aware (by way of hvmemul_virtual_to_linear() handling
this case).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TODO: Use VMX tertiary execution control (once bit is known; see
      //todo-s) and then further adjust cpufeatureset.h.

RFC: In vmx_vmexit_handler() handling is forwarded to the emulator
     blindly. Alternatively we could consult the exit qualification and
     process just a single MSR at a time (without involving the
     emulator), exiting back to the guest after every iteration. (I
     don't think a mix of both models makes a lot of sense.)

RFC: For PV, priv_op_ops would need to gain proper read/write hooks,
     which doesn't look desirable (albeit there we could refuse to
     handle anything other than x86_seg_none); we may instead want to
     consider not supporting the feature for PV guests, requiring e.g.
     Linux to process the lists in new pvops hooks.

RFC: I wasn't sure whether to add preemption checks to the loops -
     thoughts?

With the VMX side of the spec still unclear (tertiary execution control
bit unspecified in ISE 046) we can't enable the insn yet for (HVM) guest
use. The precise behavior of MSR_BARRIER is also not spelled out, so the
(minimal) implementation is a guess for now.

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -240,6 +240,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"lkgs",         0x00000007,  1, CPUID_REG_EAX, 18,  1},
         {"wrmsrns",      0x00000007,  1, CPUID_REG_EAX, 19,  1},
         {"avx-ifma",     0x00000007,  1, CPUID_REG_EAX, 23,  1},
+        {"msrlist",      0x00000007,  1, CPUID_REG_EAX, 27,  1},
 
         {"avx-vnni-int8",0x00000007,  1, CPUID_REG_EDX,  4,  1},
         {"avx-ne-convert",0x00000007, 1, CPUID_REG_EDX,  5,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -195,6 +195,8 @@ static const char *const str_7a1[32] =
     [18] = "lkgs",          [19] = "wrmsrns",
 
     /* 22 */                [23] = "avx-ifma",
+
+    /* 26 */                [27] = "msrlist",
 };
 
 static const char *const str_e21a[32] =
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -342,6 +342,8 @@ static const struct {
     { { 0x01, 0xc4 }, { 2, 2 }, F, N }, /* vmxoff */
     { { 0x01, 0xc5 }, { 2, 2 }, F, N }, /* pconfig */
     { { 0x01, 0xc6 }, { 2, 2 }, F, N }, /* wrmsrns */
+    { { 0x01, 0xc6 }, { 0, 2 }, F, W, pfx_f2 }, /* rdmsrlist */
+    { { 0x01, 0xc6 }, { 0, 2 }, F, R, pfx_f3 }, /* wrmsrlist */
     { { 0x01, 0xc8 }, { 2, 2 }, F, N }, /* monitor */
     { { 0x01, 0xc9 }, { 2, 2 }, F, N }, /* mwait */
     { { 0x01, 0xca }, { 2, 2 }, F, N }, /* clac */
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -589,6 +589,7 @@ static int read(
     default:
         if ( !is_x86_user_segment(seg) )
             return X86EMUL_UNHANDLEABLE;
+    case x86_seg_none:
         bytes_read += bytes;
         break;
     }
@@ -619,7 +620,7 @@ static int write(
     if ( verbose )
         printf("** %s(%u, %p,, %u,)\n", __func__, seg, (void *)offset, bytes);
 
-    if ( !is_x86_user_segment(seg) )
+    if ( !is_x86_user_segment(seg) && seg != x86_seg_none )
         return X86EMUL_UNHANDLEABLE;
     memcpy((void *)offset, p_data, bytes);
     return X86EMUL_OKAY;
@@ -711,6 +712,10 @@ static int read_msr(
 {
     switch ( reg )
     {
+    case 0x0000002f: /* BARRIER */
+        *val = 0;
+        return X86EMUL_OKAY;
+
     case 0xc0000080: /* EFER */
         *val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
         return X86EMUL_OKAY;
@@ -1499,9 +1504,53 @@ int main(int argc, char **argv)
          (gs_base != 0x0000111122224444UL) ||
          gs_base_shadow )
         goto fail;
+    printf("okay\n");
 
     cp.extd.nscb = i;
     emulops.write_segment = NULL;
+
+    printf("%-40s", "Testing rdmsrlist...");
+    instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x01; instr[3] = 0xc6;
+    regs.rip = (unsigned long)&instr[0];
+    regs.rsi = (unsigned long)(res + 0x80);
+    regs.rdi = (unsigned long)(res + 0x80 + 0x40 * 2);
+    regs.rcx = 0x0002000100008000UL;
+    gs_base_shadow = 0x0000222244446666UL;
+    memset(res + 0x80, ~0, 0x40 * 8 * 2);
+    res[0x80 + 0x0f * 2] = 0xc0000101; /* GS_BASE */
+    res[0x80 + 0x0f * 2 + 1] = 0;
+    res[0x80 + 0x20 * 2] = 0xc0000102; /* SHADOW_GS_BASE */
+    res[0x80 + 0x20 * 2 + 1] = 0;
+    res[0x80 + 0x31 * 2] = 0x2f; /* BARRIER */
+    res[0x80 + 0x31 * 2 + 1] = 0;
+    rc = x86_emulate(&ctxt, &emulops);
+    if ( (rc != X86EMUL_OKAY) ||
+         (regs.rip != (unsigned long)&instr[4]) ||
+         regs.rcx ||
+         (res[0x80 + (0x40 + 0x0f) * 2] != (unsigned int)gs_base) ||
+         (res[0x80 + (0x40 + 0x0f) * 2 + 1] != (gs_base >> (8 * sizeof(int)))) ||
+         (res[0x80 + (0x40 + 0x20) * 2] != (unsigned int)gs_base_shadow) ||
+         (res[0x80 + (0x40 + 0x20) * 2 + 1] != (gs_base_shadow >> (8 * sizeof(int)))) ||
+         res[0x80 + (0x40 + 0x31) * 2] || res[0x80 + (0x40 + 0x31) * 2 + 1] )
+        goto fail;
+    printf("okay\n");
+
+    printf("%-40s", "Testing wrmsrlist...");
+    instr[0] = 0xf3; instr[1] = 0x0f; instr[2] = 0x01; instr[3] = 0xc6;
+    regs.eip = (unsigned long)&instr[0];
+    regs.rsi -= 0x11 * 8;
+    regs.rdi -= 0x11 * 8;
+    regs.rcx = 0x0002000100000000UL;
+    res[0x80 + 0x0f * 2] = 0xc0000102; /* SHADOW_GS_BASE */
+    res[0x80 + 0x20 * 2] = 0xc0000101; /* GS_BASE */
+    rc = x86_emulate(&ctxt, &emulops);
+    if ( (rc != X86EMUL_OKAY) ||
+         (regs.rip != (unsigned long)&instr[4]) ||
+         regs.rcx ||
+         (gs_base != 0x0000222244446666UL) ||
+         (gs_base_shadow != 0x0000111122224444UL) )
+        goto fail;
+
     emulops.write_msr     = NULL;
 #endif
     printf("okay\n");
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -88,6 +88,7 @@ bool emul_test_init(void)
     cp.feat.rdpid = true;
     cp.feat.lkgs = true;
     cp.feat.wrmsrns = true;
+    cp.feat.msrlist = true;
     cp.extd.clzero = true;
 
     if ( cpu_has_xsave )
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -835,6 +835,17 @@ static void cf_check vmx_cpuid_policy_ch
     else
         vmx_set_msr_intercept(v, MSR_PKRS, VMX_MSR_RW);
 
+    if ( cp->feat.msrlist )
+    {
+        vmx_clear_msr_intercept(v, MSR_BARRIER, VMX_MSR_RW);
+        /* TODO: enable the MSRLIST tertiary execution control. */
+    }
+    else
+    {
+        vmx_set_msr_intercept(v, MSR_BARRIER, VMX_MSR_RW);
+        /* TODO: disable the MSRLIST tertiary execution control. */
+    }
+
  out:
     vmx_vmcs_exit(v);
 
@@ -3705,6 +3716,22 @@ gp_fault:
     return X86EMUL_EXCEPTION;
 }
 
+static bool cf_check is_msrlist(
+    const struct x86_emulate_state *state, const struct x86_emulate_ctxt *ctxt)
+{
+
+    if ( ctxt->opcode == X86EMUL_OPC(0x0f, 0x01) )
+    {
+        unsigned int rm, reg;
+        int mode = x86_insn_modrm(state, &rm, &reg);
+
+        /* This also includes WRMSRNS; should be okay. */
+        return mode == 3 && rm == 6 && !reg;
+    }
+
+    return false;
+}
+
 static void vmx_do_extint(struct cpu_user_regs *regs)
 {
     unsigned long vector;
@@ -4513,6 +4540,17 @@ void vmx_vmexit_handler(struct cpu_user_
         }
         break;
 
+    case EXIT_REASON_RDMSRLIST:
+    case EXIT_REASON_WRMSRLIST:
+        if ( vmx_guest_x86_mode(v) != 8 || !currd->arch.cpuid->feat.msrlist )
+        {
+            ASSERT_UNREACHABLE();
+            hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
+        }
+        else if ( !hvm_emulate_one_insn(is_msrlist, "MSR list") )
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        break;
+
     case EXIT_REASON_VMXOFF:
     case EXIT_REASON_VMXON:
     case EXIT_REASON_VMCLEAR:
--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
@@ -211,6 +211,8 @@ static inline void pi_clear_sn(struct pi
 #define EXIT_REASON_XRSTORS             64
 #define EXIT_REASON_BUS_LOCK            74
 #define EXIT_REASON_NOTIFY              75
+#define EXIT_REASON_RDMSRLIST           78
+#define EXIT_REASON_WRMSRLIST           79
 /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
 
 /*
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -24,6 +24,8 @@
 #define  APIC_BASE_ENABLE                   (_AC(1, ULL) << 11)
 #define  APIC_BASE_ADDR_MASK                0x000ffffffffff000ULL
 
+#define MSR_BARRIER                         0x0000002f
+
 #define MSR_TEST_CTRL                       0x00000033
 #define  TEST_CTRL_SPLITLOCK_DETECT         (_AC(1, ULL) << 29)
 #define  TEST_CTRL_SPLITLOCK_DISABLE        (_AC(1, ULL) << 31)
--- a/xen/arch/x86/include/asm/perfc_defn.h
+++ b/xen/arch/x86/include/asm/perfc_defn.h
@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions,
 
 #ifdef CONFIG_HVM
 
-#define VMX_PERF_EXIT_REASON_SIZE 76
+#define VMX_PERF_EXIT_REASON_SIZE 80
 #define VMEXIT_NPF_PERFC 143
 #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
 PERFCOUNTER_ARRAY(vmexits,              "vmexits",
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -223,6 +223,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
     case MSR_AMD_PPIN:
         goto gp_fault;
 
+    case MSR_BARRIER:
+        if ( !cp->feat.msrlist )
+            goto gp_fault;
+        *val = 0;
+        break;
+
     case MSR_IA32_FEATURE_CONTROL:
         /*
          * Architecturally, availability of this MSR is enumerated by the
@@ -493,6 +499,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t
         uint64_t rsvd;
 
         /* Read-only */
+    case MSR_BARRIER:
     case MSR_IA32_PLATFORM_ID:
     case MSR_CORE_CAPABILITIES:
     case MSR_INTEL_CORE_THREAD_COUNT:
--- a/xen/arch/x86/x86_emulate/0f01.c
+++ b/xen/arch/x86/x86_emulate/0f01.c
@@ -40,6 +40,7 @@ int x86emul_0f01(struct x86_emulate_stat
     switch ( s->modrm )
     {
         unsigned long base, limit, cr0, cr0w, cr4;
+        unsigned int n;
         struct segment_register sreg;
         uint64_t msr_val;
 
@@ -54,6 +55,56 @@ int x86emul_0f01(struct x86_emulate_stat
                                 ((uint64_t)regs->r(dx) << 32) | regs->eax,
                                 ctxt);
             goto done;
+
+        case vex_f3: /* wrmsrlist */
+            vcpu_must_have(msrlist);
+            generate_exception_if(!mode_64bit(), X86_EXC_UD);
+            generate_exception_if(!mode_ring0() || (regs->r(si) & 7) ||
+                                  (regs->r(di) & 7),
+                                  X86_EXC_GP, 0);
+            fail_if(!ops->write_msr);
+            while ( regs->r(cx) )
+            {
+                n = __builtin_ffsl(regs->r(cx)) - 1;
+                if ( (rc = ops->read(x86_seg_none, regs->r(si) + n * 8,
+                                     &msr_val, 8, ctxt)) != X86EMUL_OKAY )
+                    break;
+                generate_exception_if(msr_val != (uint32_t)msr_val,
+                                      X86_EXC_GP, 0);
+                base = msr_val;
+                if ( (rc = ops->read(x86_seg_none, regs->r(di) + n * 8,
+                                     &msr_val, 8, ctxt)) != X86EMUL_OKAY ||
+                     (rc = ops->write_msr(base, msr_val, ctxt)) != X86EMUL_OKAY )
+                    break;
+                regs->r(cx) &= ~(1UL << n);
+            }
+            goto done;
+
+        case vex_f2: /* rdmsrlist */
+            vcpu_must_have(msrlist);
+            generate_exception_if(!mode_64bit(), X86_EXC_UD);
+            generate_exception_if(!mode_ring0() || (regs->r(si) & 7) ||
+                                  (regs->r(di) & 7),
+                                  X86_EXC_GP, 0);
+            fail_if(!ops->read_msr || !ops->write);
+            while ( regs->r(cx) )
+            {
+                n = __builtin_ffsl(regs->r(cx)) - 1;
+                if ( (rc = ops->read(x86_seg_none, regs->r(si) + n * 8,
+                                     &msr_val, 8, ctxt)) != X86EMUL_OKAY )
+                    break;
+                generate_exception_if(msr_val != (uint32_t)msr_val,
+                                      X86_EXC_GP, 0);
+                if ( (rc = ops->read_msr(msr_val, &msr_val,
+                                         ctxt)) != X86EMUL_OKAY ||
+                     (rc = ops->write(x86_seg_none, regs->r(di) + n * 8,
+                                      &msr_val, 8, ctxt)) != X86EMUL_OKAY )
+                    break;
+                regs->r(cx) &= ~(1UL << n);
+            }
+            if ( rc != X86EMUL_OKAY )
+                ctxt->regs->r(cx) = regs->r(cx);
+            goto done;
         }
         generate_exception(X86_EXC_UD);
 
--- a/xen/arch/x86/x86_emulate/private.h
+++ b/xen/arch/x86/x86_emulate/private.h
@@ -600,6 +600,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
 #define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
 #define vcpu_has_avx_ifma()    (ctxt->cpuid->feat.avx_ifma)
+#define vcpu_has_msrlist()     (ctxt->cpuid->feat.msrlist)
 #define vcpu_has_avx_vnni_int8() (ctxt->cpuid->feat.avx_vnni_int8)
 #define vcpu_has_avx_ne_convert() (ctxt->cpuid->feat.avx_ne_convert)
 
--- a/xen/arch/x86/x86_emulate/util.c
+++ b/xen/arch/x86/x86_emulate/util.c
@@ -112,6 +112,9 @@ bool cf_check x86_insn_is_mem_access(con
         break;
 
     case X86EMUL_OPC(0x0f, 0x01):
+        /* {RD,WR}MSRLIST */
+        if ( mode_64bit() && s->modrm == 0xc6 )
+            return s->vex.pfx >= vex_f3;
         /* Cover CLZERO. */
         return (s->modrm_rm & 7) == 4 && (s->modrm_reg & 7) == 7;
     }
@@ -172,7 +175,11 @@ bool cf_check x86_insn_is_mem_write(cons
         case 0xff: /* Grp5 */
             break;
 
-        case X86EMUL_OPC(0x0f, 0x01): /* CLZERO is the odd one. */
+        case X86EMUL_OPC(0x0f, 0x01):
+            /* RDMSRLIST */
+            if ( mode_64bit() && s->modrm == 0xc6 )
+                return s->vex.pfx == vex_f2;
+            /* CLZERO is another odd one. */
             return (s->modrm_rm & 7) == 4 && (s->modrm_reg & 7) == 7;
 
         default:
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -286,6 +286,7 @@ XEN_CPUFEATURE(FRED,         10*32+17) /
 XEN_CPUFEATURE(LKGS,         10*32+18) /*S  Load Kernel GS Base */
 XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*A  WRMSR Non-Serialising */
 XEN_CPUFEATURE(AVX_IFMA,     10*32+23) /*A  AVX-IFMA Instructions */
+XEN_CPUFEATURE(MSRLIST,      10*32+27) /*   MSR list instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
 XEN_CPUFEATURE(LFENCE_DISPATCH,    11*32+ 2) /*A  LFENCE always serializing */
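
For readers following the emulation loops above, a minimal, hypothetical C model (not Xen code) of the RDMSRLIST semantics the patch implements: RCX is a bitmap processed lowest-set-bit first, entry n of the RSI-based table supplies the MSR index, entry n of the RDI-based table receives the value, and each completed iteration clears its RCX bit, which is what makes the instruction restartable after a fault. The model_rdmsr() helper here is a stand-in for the real ->read_msr() hook.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the ->read_msr() hook: deterministic value per MSR index. */
static uint64_t model_rdmsr(uint32_t msr)
{
    return (uint64_t)msr * 2 + 1;
}

/*
 * Model of the RDMSRLIST loop: process the RCX bitmap lowest bit first,
 * clearing each bit once its MSR has been read and its value stored, so
 * that on a fault RCX describes exactly the remaining work.
 */
static uint64_t rdmsrlist_model(uint64_t rcx, const uint64_t *rsi_tbl,
                                uint64_t *rdi_tbl)
{
    while ( rcx )
    {
        unsigned int n = __builtin_ctzll(rcx);

        rdi_tbl[n] = model_rdmsr((uint32_t)rsi_tbl[n]);
        rcx &= ~(1ULL << n);
    }

    return rcx; /* zero on full completion */
}
```

The test harness changes above exercise exactly this layout, e.g. with GS_BASE at slot 0x0f and the BARRIER MSR at slot 0x31 of 64-entry tables.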




* Re: [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST
  2023-04-04 14:55 ` [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST Jan Beulich
@ 2023-04-04 14:57   ` Jan Beulich
  2023-04-06 22:03   ` Andrew Cooper
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-04 14:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Kevin Tian, Jun Nakajima

On 04.04.2023 16:55, Jan Beulich wrote:
> These are "compound" instructions to issue a series of RDMSR / WRMSR
> respectively. In the emulator we can therefore implement them by using
> the existing msr_{read,write}() hooks. The memory accesses utilize that
> the HVM ->read() / ->write() hooks are already linear-address
> (x86_seg_none) aware (by way of hvmemul_virtual_to_linear() handling
> this case).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TODO: Use VMX tertiary execution control (once bit is known; see
>       //todo-s) and then further adjust cpufeatureset.h.

Argh, should have Cc-ed Kevin and Jun, even if there weren't this issue.

Jan

> RFC: In vmx_vmexit_handler() handling is forwarded to the emulator
>      blindly. Alternatively we could consult the exit qualification and
>      process just a single MSR at a time (without involving the
>      emulator), exiting back to the guest after every iteration. (I
>      don't think a mix of both models makes a lot of sense.)
> 
> RFC: For PV priv_op_ops would need to gain proper read/write hooks,
>      which doesn't look desirable (albeit there we could refuse to
>      handle anything else than x86_seg_none); we may want to consider to
>      instead not support the feature for PV guests, requiring e.g. Linux
>      to process the lists in new pvops hooks.
> 
> RFC: I wasn't sure whether to add preemption checks to the loops -
>      thoughts?
> 
> With the VMX side of the spec still unclear (tertiary execution control
> bit unspecified in ISE 046) we can't enable the insn yet for (HVM) guest
> use. The precise behavior of MSR_BARRIER is also not spelled out, so the
> (minimal) implementation is a guess for now.
> 
> --- a/tools/libs/light/libxl_cpuid.c
> +++ b/tools/libs/light/libxl_cpuid.c
> @@ -240,6 +240,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
>          {"lkgs",         0x00000007,  1, CPUID_REG_EAX, 18,  1},
>          {"wrmsrns",      0x00000007,  1, CPUID_REG_EAX, 19,  1},
>          {"avx-ifma",     0x00000007,  1, CPUID_REG_EAX, 23,  1},
> +        {"msrlist",      0x00000007,  1, CPUID_REG_EAX, 27,  1},
>  
>          {"avx-vnni-int8",0x00000007,  1, CPUID_REG_EDX,  4,  1},
>          {"avx-ne-convert",0x00000007, 1, CPUID_REG_EDX,  5,  1},
> --- a/tools/misc/xen-cpuid.c
> +++ b/tools/misc/xen-cpuid.c
> @@ -195,6 +195,8 @@ static const char *const str_7a1[32] =
>      [18] = "lkgs",          [19] = "wrmsrns",
>  
>      /* 22 */                [23] = "avx-ifma",
> +
> +    /* 26 */                [27] = "msrlist",
>  };
>  
>  static const char *const str_e21a[32] =
> --- a/tools/tests/x86_emulator/predicates.c
> +++ b/tools/tests/x86_emulator/predicates.c
> @@ -342,6 +342,8 @@ static const struct {
>      { { 0x01, 0xc4 }, { 2, 2 }, F, N }, /* vmxoff */
>      { { 0x01, 0xc5 }, { 2, 2 }, F, N }, /* pconfig */
>      { { 0x01, 0xc6 }, { 2, 2 }, F, N }, /* wrmsrns */
> +    { { 0x01, 0xc6 }, { 0, 2 }, F, W, pfx_f2 }, /* rdmsrlist */
> +    { { 0x01, 0xc6 }, { 0, 2 }, F, R, pfx_f3 }, /* wrmsrlist */
>      { { 0x01, 0xc8 }, { 2, 2 }, F, N }, /* monitor */
>      { { 0x01, 0xc9 }, { 2, 2 }, F, N }, /* mwait */
>      { { 0x01, 0xca }, { 2, 2 }, F, N }, /* clac */
> --- a/tools/tests/x86_emulator/test_x86_emulator.c
> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
> @@ -589,6 +589,7 @@ static int read(
>      default:
>          if ( !is_x86_user_segment(seg) )
>              return X86EMUL_UNHANDLEABLE;
> +    case x86_seg_none:
>          bytes_read += bytes;
>          break;
>      }
> @@ -619,7 +620,7 @@ static int write(
>      if ( verbose )
>          printf("** %s(%u, %p,, %u,)\n", __func__, seg, (void *)offset, bytes);
>  
> -    if ( !is_x86_user_segment(seg) )
> +    if ( !is_x86_user_segment(seg) && seg != x86_seg_none )
>          return X86EMUL_UNHANDLEABLE;
>      memcpy((void *)offset, p_data, bytes);
>      return X86EMUL_OKAY;
> @@ -711,6 +712,10 @@ static int read_msr(
>  {
>      switch ( reg )
>      {
> +    case 0x0000002f: /* BARRIER */
> +        *val = 0;
> +        return X86EMUL_OKAY;
> +
>      case 0xc0000080: /* EFER */
>          *val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
>          return X86EMUL_OKAY;
> @@ -1499,9 +1504,53 @@ int main(int argc, char **argv)
>           (gs_base != 0x0000111122224444UL) ||
>           gs_base_shadow )
>          goto fail;
> +    printf("okay\n");
>  
>      cp.extd.nscb = i;
>      emulops.write_segment = NULL;
> +
> +    printf("%-40s", "Testing rdmsrlist...");
> +    instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x01; instr[3] = 0xc6;
> +    regs.rip = (unsigned long)&instr[0];
> +    regs.rsi = (unsigned long)(res + 0x80);
> +    regs.rdi = (unsigned long)(res + 0x80 + 0x40 * 2);
> +    regs.rcx = 0x0002000100008000UL;
> +    gs_base_shadow = 0x0000222244446666UL;
> +    memset(res + 0x80, ~0, 0x40 * 8 * 2);
> +    res[0x80 + 0x0f * 2] = 0xc0000101; /* GS_BASE */
> +    res[0x80 + 0x0f * 2 + 1] = 0;
> +    res[0x80 + 0x20 * 2] = 0xc0000102; /* SHADOW_GS_BASE */
> +    res[0x80 + 0x20 * 2 + 1] = 0;
> +    res[0x80 + 0x31 * 2] = 0x2f; /* BARRIER */
> +    res[0x80 + 0x31 * 2 + 1] = 0;
> +    rc = x86_emulate(&ctxt, &emulops);
> +    if ( (rc != X86EMUL_OKAY) ||
> +         (regs.rip != (unsigned long)&instr[4]) ||
> +         regs.rcx ||
> +         (res[0x80 + (0x40 + 0x0f) * 2] != (unsigned int)gs_base) ||
> +         (res[0x80 + (0x40 + 0x0f) * 2 + 1] != (gs_base >> (8 * sizeof(int)))) ||
> +         (res[0x80 + (0x40 + 0x20) * 2] != (unsigned int)gs_base_shadow) ||
> +         (res[0x80 + (0x40 + 0x20) * 2 + 1] != (gs_base_shadow >> (8 * sizeof(int)))) ||
> +         res[0x80 + (0x40 + 0x31) * 2] || res[0x80 + (0x40 + 0x31) * 2 + 1] )
> +        goto fail;
> +    printf("okay\n");
> +
> +    printf("%-40s", "Testing wrmsrlist...");
> +    instr[0] = 0xf3; instr[1] = 0x0f; instr[2] = 0x01; instr[3] = 0xc6;
> +    regs.eip = (unsigned long)&instr[0];
> +    regs.rsi -= 0x11 * 8;
> +    regs.rdi -= 0x11 * 8;
> +    regs.rcx = 0x0002000100000000UL;
> +    res[0x80 + 0x0f * 2] = 0xc0000102; /* SHADOW_GS_BASE */
> +    res[0x80 + 0x20 * 2] = 0xc0000101; /* GS_BASE */
> +    rc = x86_emulate(&ctxt, &emulops);
> +    if ( (rc != X86EMUL_OKAY) ||
> +         (regs.rip != (unsigned long)&instr[4]) ||
> +         regs.rcx ||
> +         (gs_base != 0x0000222244446666UL) ||
> +         (gs_base_shadow != 0x0000111122224444UL) )
> +        goto fail;
> +
>      emulops.write_msr     = NULL;
>  #endif
>      printf("okay\n");
> --- a/tools/tests/x86_emulator/x86-emulate.c
> +++ b/tools/tests/x86_emulator/x86-emulate.c
> @@ -88,6 +88,7 @@ bool emul_test_init(void)
>      cp.feat.rdpid = true;
>      cp.feat.lkgs = true;
>      cp.feat.wrmsrns = true;
> +    cp.feat.msrlist = true;
>      cp.extd.clzero = true;
>  
>      if ( cpu_has_xsave )
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -835,6 +835,17 @@ static void cf_check vmx_cpuid_policy_ch
>      else
>          vmx_set_msr_intercept(v, MSR_PKRS, VMX_MSR_RW);
>  
> +    if ( cp->feat.msrlist )
> +    {
> +        vmx_clear_msr_intercept(v, MSR_BARRIER, VMX_MSR_RW);
> +        /* TODO: enable the MSRLIST tertiary execution control. */
> +    }
> +    else
> +    {
> +        vmx_set_msr_intercept(v, MSR_BARRIER, VMX_MSR_RW);
> +        /* TODO: disable the MSRLIST tertiary execution control. */
> +    }
> +
>   out:
>      vmx_vmcs_exit(v);
>  
> @@ -3705,6 +3716,22 @@ gp_fault:
>      return X86EMUL_EXCEPTION;
>  }
>  
> +static bool cf_check is_msrlist(
> +    const struct x86_emulate_state *state, const struct x86_emulate_ctxt *ctxt)
> +{
> +
> +    if ( ctxt->opcode == X86EMUL_OPC(0x0f, 0x01) )
> +    {
> +        unsigned int rm, reg;
> +        int mode = x86_insn_modrm(state, &rm, &reg);
> +
> +        /* This also includes WRMSRNS; should be okay. */
> +        return mode == 3 && rm == 6 && !reg;
> +    }
> +
> +    return false;
> +}
> +
>  static void vmx_do_extint(struct cpu_user_regs *regs)
>  {
>      unsigned long vector;
> @@ -4513,6 +4540,17 @@ void vmx_vmexit_handler(struct cpu_user_
>          }
>          break;
>  
> +    case EXIT_REASON_RDMSRLIST:
> +    case EXIT_REASON_WRMSRLIST:
> +        if ( vmx_guest_x86_mode(v) != 8 || !currd->arch.cpuid->feat.msrlist )
> +        {
> +            ASSERT_UNREACHABLE();
> +            hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
> +        }
> +        else if ( !hvm_emulate_one_insn(is_msrlist, "MSR list") )
> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +        break;
> +
>      case EXIT_REASON_VMXOFF:
>      case EXIT_REASON_VMXON:
>      case EXIT_REASON_VMCLEAR:
> --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> @@ -211,6 +211,8 @@ static inline void pi_clear_sn(struct pi
>  #define EXIT_REASON_XRSTORS             64
>  #define EXIT_REASON_BUS_LOCK            74
>  #define EXIT_REASON_NOTIFY              75
> +#define EXIT_REASON_RDMSRLIST           78
> +#define EXIT_REASON_WRMSRLIST           79
>  /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
>  
>  /*
> --- a/xen/arch/x86/include/asm/msr-index.h
> +++ b/xen/arch/x86/include/asm/msr-index.h
> @@ -24,6 +24,8 @@
>  #define  APIC_BASE_ENABLE                   (_AC(1, ULL) << 11)
>  #define  APIC_BASE_ADDR_MASK                0x000ffffffffff000ULL
>  
> +#define MSR_BARRIER                         0x0000002f
> +
>  #define MSR_TEST_CTRL                       0x00000033
>  #define  TEST_CTRL_SPLITLOCK_DETECT         (_AC(1, ULL) << 29)
>  #define  TEST_CTRL_SPLITLOCK_DISABLE        (_AC(1, ULL) << 31)
> --- a/xen/arch/x86/include/asm/perfc_defn.h
> +++ b/xen/arch/x86/include/asm/perfc_defn.h
> @@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions,
>  
>  #ifdef CONFIG_HVM
>  
> -#define VMX_PERF_EXIT_REASON_SIZE 76
> +#define VMX_PERF_EXIT_REASON_SIZE 80
>  #define VMEXIT_NPF_PERFC 143
>  #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
>  PERFCOUNTER_ARRAY(vmexits,              "vmexits",
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -223,6 +223,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>      case MSR_AMD_PPIN:
>          goto gp_fault;
>  
> +    case MSR_BARRIER:
> +        if ( !cp->feat.msrlist )
> +            goto gp_fault;
> +        *val = 0;
> +        break;
> +
>      case MSR_IA32_FEATURE_CONTROL:
>          /*
>           * Architecturally, availability of this MSR is enumerated by the
> @@ -493,6 +499,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t
>          uint64_t rsvd;
>  
>          /* Read-only */
> +    case MSR_BARRIER:
>      case MSR_IA32_PLATFORM_ID:
>      case MSR_CORE_CAPABILITIES:
>      case MSR_INTEL_CORE_THREAD_COUNT:
> --- a/xen/arch/x86/x86_emulate/0f01.c
> +++ b/xen/arch/x86/x86_emulate/0f01.c
> @@ -40,6 +40,7 @@ int x86emul_0f01(struct x86_emulate_stat
>      switch ( s->modrm )
>      {
>          unsigned long base, limit, cr0, cr0w, cr4;
> +        unsigned int n;
>          struct segment_register sreg;
>          uint64_t msr_val;
>  
> @@ -54,6 +55,56 @@ int x86emul_0f01(struct x86_emulate_stat
>                                  ((uint64_t)regs->r(dx) << 32) | regs->eax,
>                                  ctxt);
>              goto done;
> +
> +        case vex_f3: /* wrmsrlist */
> +            vcpu_must_have(msrlist);
> +            generate_exception_if(!mode_64bit(), X86_EXC_UD);
> +            generate_exception_if(!mode_ring0() || (regs->r(si) & 7) ||
> +                                  (regs->r(di) & 7),
> +                                  X86_EXC_GP, 0);
> +            fail_if(!ops->write_msr);
> +            while ( regs->r(cx) )
> +            {
> +                n = __builtin_ffsl(regs->r(cx)) - 1;
> +                if ( (rc = ops->read(x86_seg_none, regs->r(si) + n * 8,
> +                                     &msr_val, 8, ctxt)) != X86EMUL_OKAY )
> +                    break;
> +                generate_exception_if(msr_val != (uint32_t)msr_val,
> +                                      X86_EXC_GP, 0);
> +                base = msr_val;
> +                if ( (rc = ops->read(x86_seg_none, regs->r(di) + n * 8,
> +                                     &msr_val, 8, ctxt)) != X86EMUL_OKAY ||
> +                     (rc = ops->write_msr(base, msr_val, ctxt)) != X86EMUL_OKAY )
> +                    break;
> +                regs->r(cx) &= ~(1UL << n);
> +            }
> +            goto done;
> +
> +        case vex_f2: /* rdmsrlist */
> +            vcpu_must_have(msrlist);
> +            generate_exception_if(!mode_64bit(), X86_EXC_UD);
> +            generate_exception_if(!mode_ring0() || (regs->r(si) & 7) ||
> +                                  (regs->r(di) & 7),
> +                                  X86_EXC_GP, 0);
> +            fail_if(!ops->read_msr || !ops->write);
> +            while ( regs->r(cx) )
> +            {
> +                n = __builtin_ffsl(regs->r(cx)) - 1;
> +                if ( (rc = ops->read(x86_seg_none, regs->r(si) + n * 8,
> +                                     &msr_val, 8, ctxt)) != X86EMUL_OKAY )
> +                    break;
> +                generate_exception_if(msr_val != (uint32_t)msr_val,
> +                                      X86_EXC_GP, 0);
> +                if ( (rc = ops->read_msr(msr_val, &msr_val,
> +                                         ctxt)) != X86EMUL_OKAY ||
> +                     (rc = ops->write(x86_seg_none, regs->r(di) + n * 8,
> +                                      &msr_val, 8, ctxt)) != X86EMUL_OKAY )
> +                    break;
> +                regs->r(cx) &= ~(1UL << n);
> +            }
> +            if ( rc != X86EMUL_OKAY )
> +                ctxt->regs->r(cx) = regs->r(cx);
> +            goto done;
>          }
>          generate_exception(X86_EXC_UD);
>  
> --- a/xen/arch/x86/x86_emulate/private.h
> +++ b/xen/arch/x86/x86_emulate/private.h
> @@ -600,6 +600,7 @@ amd_like(const struct x86_emulate_ctxt *
>  #define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
>  #define vcpu_has_wrmsrns()     (ctxt->cpuid->feat.wrmsrns)
>  #define vcpu_has_avx_ifma()    (ctxt->cpuid->feat.avx_ifma)
> +#define vcpu_has_msrlist()     (ctxt->cpuid->feat.msrlist)
>  #define vcpu_has_avx_vnni_int8() (ctxt->cpuid->feat.avx_vnni_int8)
>  #define vcpu_has_avx_ne_convert() (ctxt->cpuid->feat.avx_ne_convert)
>  
> --- a/xen/arch/x86/x86_emulate/util.c
> +++ b/xen/arch/x86/x86_emulate/util.c
> @@ -112,6 +112,9 @@ bool cf_check x86_insn_is_mem_access(con
>          break;
>  
>      case X86EMUL_OPC(0x0f, 0x01):
> +        /* {RD,WR}MSRLIST */
> +        if ( mode_64bit() && s->modrm == 0xc6 )
> +            return s->vex.pfx >= vex_f3;
>          /* Cover CLZERO. */
>          return (s->modrm_rm & 7) == 4 && (s->modrm_reg & 7) == 7;
>      }
> @@ -172,7 +175,11 @@ bool cf_check x86_insn_is_mem_write(cons
>          case 0xff: /* Grp5 */
>              break;
>  
> -        case X86EMUL_OPC(0x0f, 0x01): /* CLZERO is the odd one. */
> +        case X86EMUL_OPC(0x0f, 0x01):
> +            /* RDMSRLIST */
> +            if ( mode_64bit() && s->modrm == 0xc6 )
> +                return s->vex.pfx == vex_f2;
> +            /* CLZERO is another odd one. */
>              return (s->modrm_rm & 7) == 4 && (s->modrm_reg & 7) == 7;
>  
>          default:
> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -286,6 +286,7 @@ XEN_CPUFEATURE(FRED,         10*32+17) /
>  XEN_CPUFEATURE(LKGS,         10*32+18) /*S  Load Kernel GS Base */
>  XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*A  WRMSR Non-Serialising */
>  XEN_CPUFEATURE(AVX_IFMA,     10*32+23) /*A  AVX-IFMA Instructions */
> +XEN_CPUFEATURE(MSRLIST,      10*32+27) /*   MSR list instructions */
>  
>  /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
>  XEN_CPUFEATURE(LFENCE_DISPATCH,    11*32+ 2) /*A  LFENCE always serializing */
> 
> 




* Re: [PATCH 1/9] x86emul: support LKGS
  2023-04-04 14:49 ` [PATCH 1/9] x86emul: support LKGS Jan Beulich
@ 2023-04-04 21:54   ` Andrew Cooper
  2023-04-05  7:51     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-04-04 21:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:49 pm, Jan Beulich wrote:
> Provide support for this insn, which is a prereq to FRED. CPUID-wise
> introduce both its and FRED's bit at this occasion, thus allowing to
> also express the dependency right away.
>
> While adding a testcase, also add a SWAPGS one. In order to not affect
> the behavior of pre-existing tests, install write_{segment,msr} hooks
> only transiently.

IMO, the emulator is already complicated enough without us having
fallback logic to cope with callers that don't set up all the hooks.

Nor do I think making these hooks transient in the test harness is a
clever idea.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Instead of ->read_segment() we could of course also use ->read_msr() to
> fetch the original GS base. I don't think I can see a clear advantage of
> either approach; the way it's done it matches how we handle SWAPGS.

read_segment() is a much shorter logic chain under the hood, so will be
marginally faster, but it does incur a couple of unnecessary VMREADs (on
Intel at least).

We could expose the get/set reg paths for cases where we know we're not
going to need sanity checks, but I'm not sure it's worth it in this case.

> For PV save_segments() would need adjustment, but the insn being
> restricted to ring 0 means PV guests can't use it anyway (unless we
> wanted to emulate it as another privileged insn).

I know, it's on the list.

What is rather irritating is that, depending on whether FRED is active,
it's inverted which of the guest user and guest kernel GS bases is in
context. 
Sadly Intel refused my request for a control knob to turn off FRED's
auto-SWAPGS, but I didn't really push them on it because for practically
all other circumstances, it would just be a way for OSes to shoot
themselves in the foot.

For PV guests, our regular ABI is half-way to FRED anyway.  I suspect we
can get most of the interesting rest of the functionality by adding an
ERET bit to the HYPERCALL_iret flags.  I'm not sure yet if we ought to
bother exposing CSL in the pvFRED ABI or not, but doing so would reduce
the divergence from native even further.

> --- a/xen/arch/x86/x86_emulate/private.h
> +++ b/xen/arch/x86/x86_emulate/private.h
> @@ -594,6 +594,7 @@ amd_like(const struct x86_emulate_ctxt *
>  #define vcpu_has_tsxldtrk()    (ctxt->cpuid->feat.tsxldtrk)
>  #define vcpu_has_avx_vnni()    (ctxt->cpuid->feat.avx_vnni)
>  #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
> +#define vcpu_has_lkgs()        (ctxt->cpuid->feat.lkgs)
>  
>  #define vcpu_must_have(feat) \
>      generate_exception_if(!vcpu_has_##feat(), X86_EXC_UD)
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -2886,8 +2886,31 @@ x86_emulate(
>                  break;
>              }
>              break;
> -        default:
> -            generate_exception_if(true, EXC_UD);
> +        case 6: /* lkgs */
> +            generate_exception_if((modrm_reg & 1) || vex.pfx != vex_f2, EXC_UD);
> +            generate_exception_if(!mode_64bit() || !mode_ring0(), EXC_UD);

Can we switch to X86_* please.  Alternatively, I've got such a patch
which I've just rebased over all your emulator changes anyway, if we're
happy to fix this in one fell swoop.

(Sadly, you did move some TRAP_* names into util-xen.c which I fixed up
in my other tree-wide exception constant patch.)

> +            vcpu_must_have(lkgs);
> +            fail_if(!ops->read_segment || !ops->read_msr ||
> +                    !ops->write_segment || !ops->write_msr);
> +            if ( (rc = ops->read_msr(MSR_SHADOW_GS_BASE, &msr_val,
> +                                     ctxt)) != X86EMUL_OKAY ||
> +                 (rc = ops->read_segment(x86_seg_gs, &sreg,
> +                                         ctxt)) != X86EMUL_OKAY )
> +                goto done;
> +            dst.orig_val = sreg.base;
> +            if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
> +                                         ctxt, ops)) != X86EMUL_OKAY ||
> +                 (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
> +                                      ctxt)) != X86EMUL_OKAY )
> +                goto done;
> +            sreg.base = dst.orig_val;

Honestly, I think a comment is needed here, because I'm struggling to
work out if this is correct or not.

There is a 64->32 bit truncation of base with LGKS, just as there is
with MOV GS.

Which I think does happen as a side effect of protmode_load_seg() only
filling in the lower half of sreg.base, but I think it would be nicer to
have:

+            dst.orig_val = sreg.base; /* Preserve full GS Base */
+            if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
+                                         ctxt, ops)) != X86EMUL_OKAY ||
+                 /* Write truncated base into GS_SHADOW */
+                 (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
+                                      ctxt)) != X86EMUL_OKAY )
+                goto done;
+            sreg.base = dst.orig_val; /* Reinstate full GS Base */

Or so, because it's weird not to see a (uint32_t) somewhere in this logic.

> +            if ( (rc = ops->write_segment(x86_seg_gs, &sreg,
> +                                          ctxt)) != X86EMUL_OKAY )
> +            {
> +                /* Best effort unwind (i.e. no error checking). */
> +                ops->write_msr(MSR_SHADOW_GS_BASE, msr_val, ctxt);

write_segment() can't fail.  (The sanity checks are actually deferred
until after emulation is complete, and I'm not sure if that's behaviour
we want...)

However, more importantly, if we actually take this error path (for some
future reason) then we've created a security vulnerability in the guest.

It will be strictly better to crash the domain in this case, than to try
and let it continue in this state.

> +                goto done;
> +            }
>              break;
>          }
>          break;
> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -281,6 +281,8 @@ XEN_CPUFEATURE(AVX512_BF16,  10*32+ 5) /
>  XEN_CPUFEATURE(FZRM,         10*32+10) /*A  Fast Zero-length REP MOVSB */
>  XEN_CPUFEATURE(FSRS,         10*32+11) /*A  Fast Short REP STOSB */
>  XEN_CPUFEATURE(FSRCS,        10*32+12) /*A  Fast Short REP CMPSB/SCASB */
> +XEN_CPUFEATURE(FRED,         10*32+17) /*   Flexible Return and Event Delivery */
> +XEN_CPUFEATURE(LKGS,         10*32+18) /*S  Load Kernel GS Base */
>  XEN_CPUFEATURE(WRMSRNS,      10*32+19) /*   WRMSR Non-Serialising */
>  
>  /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -295,6 +295,9 @@ def crunch_numbers(state):
>  
>          # In principle the TSXLDTRK insns could also be considered independent.
>          RTM: [TSXLDTRK],
> +
> +        # FRED builds on the LKGS instruction.
> +        LKGS: [FRED],

Hmm...  This is the first case (I think) we've got where a dependency
that goes back numerically in terms of feature number.

Obviously we need to support it, but I'm not sure if the deep_deps loop
will cope in its current form.

~Andrew


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/9] x86emul: support LKGS
  2023-04-04 21:54   ` Andrew Cooper
@ 2023-04-05  7:51     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-05  7:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 04.04.2023 23:54, Andrew Cooper wrote:
> On 04/04/2023 3:49 pm, Jan Beulich wrote:
>> Provide support for this insn, which is a prereq to FRED. CPUID-wise
>> introduce both its and FRED's bit at this occasion, thus allowing to
>> also express the dependency right away.
>>
>> While adding a testcase, also add a SWAPGS one. In order to not affect
>> the behavior of pre-existing tests, install write_{segment,msr} hooks
>> only transiently.
> 
> IMO, the emulator is already complicated enough without us having
> fallback logic to cope with callers that don't set up all the hooks.
> 
> Nor do I think making these hooks transient in the test harness is a
> clever idea.

Are you suggesting we start to _require_ all hooks to be set? That'll
mean many useless stubs in particular in PV emulation. Furthermore
absent hooks sometimes cause default behavior to apply rather than
outright failure, so altering what hooks are present can affect
overall behavior. Hence the transient establishing of the two hooks
here.

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Instead of ->read_segment() we could of course also use ->read_msr() to
>> fetch the original GS base. I don't think I can see a clear advantage of
>> either approach; the way it's done it matches how we handle SWAPGS.
> 
> read_segment() is a much shorter logic chain under the hood, so will be
> marginally faster, but it will be a couple of unnecessary VMREADs (on
> Intel at least).

And this is precisely why I think it's not entirely clear. Anyway,
the remark is here just in case you (or Roger) thought doing it the
other way might be better.

> We could expose the get/set reg paths for cases where we know we're not
> going to need sanity checks, but I'm not sure it's worth it in this case.

Probably not.

>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -2886,8 +2886,31 @@ x86_emulate(
>>                  break;
>>              }
>>              break;
>> -        default:
>> -            generate_exception_if(true, EXC_UD);
>> +        case 6: /* lkgs */
>> +            generate_exception_if((modrm_reg & 1) || vex.pfx != vex_f2, EXC_UD);
>> +            generate_exception_if(!mode_64bit() || !mode_ring0(), EXC_UD);
> 
> Can we switch to X86_* please.  Alternatively, I've got such a patch
> which I've just rebased over all your emulator changes anyway, if we're
> happy to fix this in one fell swoop.

Of course, and I've applied this transformation to all the emulator
patches I have pending (i.e. no need to re-request this elsewhere).

> (Sadly, you did move some TRAP_* names into util-xen.c which I fixed up
> in my other tree-wide exception constant patch.)

Hmm, yes, I changed EXC_* -> X86_EXC_* but failed to pay attention to
TRAP_* (because there no build issue arose).

>> +            vcpu_must_have(lkgs);
>> +            fail_if(!ops->read_segment || !ops->read_msr ||
>> +                    !ops->write_segment || !ops->write_msr);
>> +            if ( (rc = ops->read_msr(MSR_SHADOW_GS_BASE, &msr_val,
>> +                                     ctxt)) != X86EMUL_OKAY ||
>> +                 (rc = ops->read_segment(x86_seg_gs, &sreg,
>> +                                         ctxt)) != X86EMUL_OKAY )
>> +                goto done;
>> +            dst.orig_val = sreg.base;
>> +            if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
>> +                                         ctxt, ops)) != X86EMUL_OKAY ||
>> +                 (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
>> +                                      ctxt)) != X86EMUL_OKAY )
>> +                goto done;
>> +            sreg.base = dst.orig_val;
> 
> Honestly, I think a comment is needed here, because I'm struggling to
> work out if this is correct or not.
> 
> There is a 64->32 bit truncation of base with LKGS, just as there is
> with MOV GS.
> 
> Which I think does happen as a side effect of protmode_load_seg() only
> filling in the lower half of sreg.base,

I thought that's obvious, as protmode_load_seg() can't possibly have
any other behavior. But ...

> but I think it would be nicer to
> have:
> 
> +            dst.orig_val = sreg.base; /* Preserve full GS Base */
> +            if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
> +                                         ctxt, ops)) != X86EMUL_OKAY ||
> +                 /* Write truncated base into GS_SHADOW */
> +                 (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
> +                                      ctxt)) != X86EMUL_OKAY )
> +                goto done;
> +            sreg.base = dst.orig_val; /* Reinstate full GS Base */
> 
> Or so, because it's weird not to see a (uint32_t) somewhere in this logic.

... sure, I've added comments. I don't think "truncated" is correct,
as there's no truncation - there's no more than 32 bits we can read
out of the GDT/LDT. I've used "32-bit" instead.

>> +            if ( (rc = ops->write_segment(x86_seg_gs, &sreg,
>> +                                          ctxt)) != X86EMUL_OKAY )
>> +            {
>> +                /* Best effort unwind (i.e. no error checking). */
>> +                ops->write_msr(MSR_SHADOW_GS_BASE, msr_val, ctxt);
> 
> write_segment() can't fail.  (The sanity checks are actually deferred
> until after emulation is complete, and I'm not sure if that's behaviour
> we want...)

Irrespective we shouldn't start omitting error checks. If we truly
mean write_segment() to always be successful, we should make it
return "void". Yet I wouldn't view such as a good move.

> However, more importantly, if we actually take this error path (for some
> future reason) then we've created a security vulnerability in the guest.

You mean in case additionally the write_msr() also fails? In any
event, ...

> It will be strictly better to crash the domain in this case, than to try
> and let it continue in this state.

... we don't return OKAY in this case, so in most cases the guest
will already be crashed, won't it? Otherwise it's not really clear
to me what action you propose to take to "crash the domain": Right
here we better wouldn't call domain_crash() (or else we'd need yet
another #ifdef __XEN__ around it).

Furthermore - what does what you say here mean for the (existing)
similar code path in SWAPGS handling?

>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -295,6 +295,9 @@ def crunch_numbers(state):
>>  
>>          # In principle the TSXLDTRK insns could also be considered independent.
>>          RTM: [TSXLDTRK],
>> +
>> +        # FRED builds on the LKGS instruction.
>> +        LKGS: [FRED],
> 
> Hmm...  This is the first case (I think) we've got where a dependency
> that goes back numerically in terms of feature number.
> 
> Obviously we need to support it, but I'm not sure if the deep_deps loop
> will cope in its current form.

As you know my Python isn't overly good, but looking at the loop
makes me think it deals with this fine; I can't see an ordering
dependency. Looking at the generated INIT_DEEP_DEPS appears to
confirm this - there is an entry for LKGS, and while I haven't
counted elements, the set bit there looks to be in a plausible
position. The only thing I'm not entirely certain about is whether
a further (hypothetical) transitive dependency would also be dealt
with correctly.

If there was an issue here, I'd really like to leave addressing it
to you, as your Python is surely much better than mine.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/9] x86emul: support WRMSRNS
  2023-04-04 14:50 ` [PATCH 2/9] x86emul: support WRMSRNS Jan Beulich
@ 2023-04-05 21:29   ` Andrew Cooper
  2023-04-06  6:47     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-04-05 21:29 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:50 pm, Jan Beulich wrote:
> This insn differs from WRMSR solely in the lack of serialization. Hence
> the code used there can simply be used here as well, plus a feature
> check of course.

I have a feeling this is a bit stale now that 0f01.c has moved into a
separate file?

>  As there's no other infrastructure needed beyond
> permitting the insn for PV privileged-op emulation (in particular no
> separate new VMEXIT) we can expose the insn to guests right away.

There are good reasons not to expose this to PV guests.

The #GP fault to trigger emulation is serialising, so from the guest's
point of view there is literally no difference between WRMSR and WRMSRNS.

All code is going to need to default to WRMSR for compatibility reasons,
so not advertising WRMSRNS will save the guest going through and
self-modifying itself to use a "more efficient" instruction which is not
actually more efficient.


But, in general and as said before, I feel it is irresponsible to be
continuing to add slow guest-fastpaths without introducing a real fastpath.

We desperately need HYPERCALL_pv_priv_op which can take an array of
inputs, and has an sub-op for each of CPUID, RDMSR, WRMSR and whatever
other ops are trapping because of no fastpath.  Perf testing routinely
shows that #GP/#UD faults are only 2 orders of magnitude less rare in PV
guests than #PF, which is an insane wastage when guests are supposed to
be enlightened and use fastpaths in the first place.


The emulation implementation is fine, so Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com> but dependent on the PV guest changes being
dropped.

~Andrew


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/9] x86emul: support WRMSRNS
  2023-04-05 21:29   ` Andrew Cooper
@ 2023-04-06  6:47     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-06  6:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 05.04.2023 23:29, Andrew Cooper wrote:
> On 04/04/2023 3:50 pm, Jan Beulich wrote:
>> This insn differs from WRMSR solely in the lack of serialization. Hence
>> the code used there can simply be used here as well, plus a feature
>> check of course.
> 
> I have a feeling this is a bit stale now that 0f01.c has moved into a
> separate file?

Hmm, no, not really. This code was written long after the split anyway.
"Used here as well" is meant as "copy that code", not "goto ...".

>>  As there's no other infrastructure needed beyond
>> permitting the insn for PV privileged-op emulation (in particular no
>> separate new VMEXIT) we can expose the insn to guests right away.
> 
> There are good reasons not to expose this to PV guests.
> 
> The #GP fault to trigger emulation is serialising, so from the guest's
> point of view there is literally no difference between WRMSR and WRMSRNS.
> 
> All code is going to need to default to WRMSR for compatibility reasons,
> so not advertising WRMSRNS will save the guest going through and
> self-modifying itself to use a "more efficient" instruction which is not
> actually more efficient.

Well, sure, I can drop that again. I don't view "self-modifying itself"
as meaningful wastage. Plus I'd consider it reasonable to expose the
feature by default only to HVM, while permitting (non-default) exposure
to PV (in which case the emul-priv-op.c change would want retaining),
except that we can't express this afaict. Even without exposing at all
I'd consider it kind of reasonable to allow use of the insn, in case a
guest infers its availability (or simply knows from executing raw CPUID).

> The emulation implementation is fine, so Reviewed-by: Andrew Cooper
> <andrew.cooper3@citrix.com> but dependent on the PV guest changes being
> dropped.

Thanks.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/9] x86emul: drop regs field from emulator state structure
  2023-04-04 14:51 ` [PATCH 3/9] x86emul: drop regs field from emulator state structure Jan Beulich
@ 2023-04-06 15:34   ` Andrew Cooper
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 15:34 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:51 pm, Jan Beulich wrote:
> For an unclear reason 0552a8cfda43 ("x86emul: track only rIP in emulator
> state") converted the original struct cpu_user_regs instance to a
> pointer, rather than dropping the field altogether: The pointer merely
> aliases the one in the context structure.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/9] x86emul: support CMPccXADD
  2023-04-04 14:52 ` [PATCH 4/9] x86emul: support CMPccXADD Jan Beulich
@ 2023-04-06 19:28   ` Andrew Cooper
  2023-04-17 10:56     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 19:28 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:52 pm, Jan Beulich wrote:
> Unconditionally wire this through the ->rmw() hook. Since x86_emul_rmw()
> now wants to construct and invoke a stub, make stub_exn available to it
> via a new field in the emulator state structure.

IMO, patch 5 should be re-ordered with this, because it removes one
incidental change that's not really related to CMPccXADD.

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> # SDE: -grr or -srf

The ISE makes a point of noting that CMPccXADD is implicitly locked,
like XCHG.  (Unlike XCHG, there isn't a valid reg/reg encoding.)

Right now, the xchg emulation overrides lock_prefix, but I have a
feeling that's stale now with the rmw() hook in place.  But it is
dubious that we let xchg fall back to a non-atomic exchange if the rmw()
hook is missing.

Either way, I think it would be nice to clean that up so we don't have
differences in the handling of instructions which the ISE at least
claims are similar.

Tangentially, what about the RAO instructions?

> --- a/tools/tests/x86_emulator/x86-emulate.h
> +++ b/tools/tests/x86_emulator/x86-emulate.h
> @@ -934,6 +935,8 @@ decode_0f38(struct x86_emulate_state *s,
>              ctxt->opcode |= MASK_INSR(s->vex.pfx, X86EMUL_OPC_PFX_MASK);
>          break;
>  
> +    case X86EMUL_OPC_VEX_66(0, 0xe0)
> +     ... X86EMUL_OPC_VEX_66(0, 0xef): /* cmp<cc>xadd */

I know the style is a little mixed in the emulator, but

+    case X86EMUL_OPC_VEX_66(0, 0xe0) ...
+         X86EMUL_OPC_VEX_66(0, 0xef): /* cmp<cc>xadd */

is more consistent with Xen style (because it's somewhat of a binary
operator), and more readable IMO.

> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -278,6 +278,7 @@ XEN_CPUFEATURE(SSBD,          9*32+31) /
>  /* Intel-defined CPU features, CPUID level 0x00000007:1.eax, word 10 */
>  XEN_CPUFEATURE(AVX_VNNI,     10*32+ 4) /*A  AVX-VNNI Instructions */
>  XEN_CPUFEATURE(AVX512_BF16,  10*32+ 5) /*A  AVX512 BFloat16 Instructions */
> +XEN_CPUFEATURE(CMPCCXADD,    10*32+ 7) /*A  CMPccXADD Instructions */

Given the non-triviality of this instruction, I'd prefer to keep this
"a" until we've tried it on real hardware.

~Andrew


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9] x86emul: re-use new stub_exn field in state structure
  2023-04-04 14:53 ` [PATCH 5/9] x86emul: re-use new stub_exn field in state structure Jan Beulich
@ 2023-04-06 19:48   ` Andrew Cooper
  2023-04-17 11:26     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 19:48 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:53 pm, Jan Beulich wrote:
> This can now also be used to reduce the number of parameters
> x86emul_fpu() needs to take.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

As said in the previous patch, I think this patch wants reordering
forwards and picking up the addition into state.

"Because we're going to need it in another hook, and it simplifies an
existing one" is a perfectly fine justification in isolation.

With that done, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>,
although...

> ---
> We could of course set the struct field once early in x86_emulate(), but
> for now I think we're better off leaving it as NULL where not actually
> needed.

Do we gain anything useful from not doing it once?  It's certainly more
to remember, and more code overall, to assign when needed.

>
> --- a/xen/arch/x86/x86_emulate/fpu.c
> +++ b/xen/arch/x86/x86_emulate/fpu.c
> @@ -90,9 +90,8 @@ int x86emul_fpu(struct x86_emulate_state
>                  unsigned int *insn_bytes,
>                  enum x86_emulate_fpu_type *fpu_type,
>  #define fpu_type (*fpu_type) /* for get_fpu() */
> -                struct stub_exn *stub_exn,
> -#define stub_exn (*stub_exn) /* for invoke_stub() */
>                  mmval_t *mmvalp)
> +#define stub_exn (*s->stub_exn) /* for invoke_stub() */

... honestly, I'd really like to see these macros purged.

I know the general style was done like this to avoid code churn, but
hiding indirection is a very rude thing to do, and none of these are
usefully shortening the expressions they replace.

Also, putting stub_exn in the K&R type space is still weird to read.

~Andrew


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/9] x86emul: support AVX-IFMA insns
  2023-04-04 14:53 ` [PATCH 6/9] x86emul: support AVX-IFMA insns Jan Beulich
@ 2023-04-06 20:51   ` Andrew Cooper
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 20:51 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:53 pm, Jan Beulich wrote:
> As in a few cases before (in particular: AVX512_IFMA), since the insns
> here and in particular their memory access patterns follow the usual
> scheme, I didn't think it was necessary to add a contrived test
> specifically for them.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/9] x86emul: support AVX-VNNI-INT8
  2023-04-04 14:54 ` [PATCH 7/9] x86emul: support AVX-VNNI-INT8 Jan Beulich
@ 2023-04-06 21:01   ` Andrew Cooper
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 21:01 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:54 pm, Jan Beulich wrote:
> These are close relatives of the AVX-VNNI ISA extension. Since the insns
> here and in particular their memory access patterns follow the usual
> scheme (and especially the byte variants of AVX-VNNI), I didn't think it
> was necessary to add a contrived test specifically for them.
>
> While making the addition also re-wire AVX-VNNI's handling to
> simd_0f_ymm: There's no reason to check the AVX feature alongside the
> one actually of interest (there are a few features where two checks are
> actually necessary, e.g. GFNI+AVX, but this isn't the case here).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns
  2023-04-04 14:54 ` [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns Jan Beulich
@ 2023-04-06 21:22   ` Andrew Cooper
  2023-04-17 12:50     ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 21:22 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 04/04/2023 3:54 pm, Jan Beulich wrote:
> Matching what was done earlier, explicit tests are added only for
> irregular insn / memory access patterns.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>, with two minor
requests.

> --- a/tools/misc/xen-cpuid.c
> +++ b/tools/misc/xen-cpuid.c
> @@ -214,7 +214,7 @@ static const char *const str_7c1[32] =
>  
>  static const char *const str_7d1[32] =
>  {
> -    [ 4] = "avx-vnni-int8",
> +    [ 4] = "avx-vnni-int8", [ 5] = "avx-ne-convert",

I'd leave a bit more horizontal space.  These names are getting rather
long, and we're only 10% into this word.

> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -6208,6 +6208,19 @@ x86_emulate(
>          host_and_vcpu_must_have(avx512_vbmi2);
>          goto avx512f_no_sae;
>  
> +    case X86EMUL_OPC_VEX   (0x0f38, 0xb0): /* vcvtneoph2ps mem,[xy]mm */
> +    case X86EMUL_OPC_VEX_66(0x0f38, 0xb0): /* vcvtneeph2ps mem,[xy]mm */
> +    case X86EMUL_OPC_VEX_F3(0x0f38, 0xb0): /* vcvtneebf162ps mem,[xy]mm */
> +    case X86EMUL_OPC_VEX_F2(0x0f38, 0xb0): /* vcvtneobf162ps mem,[xy]mm */
> +        generate_exception_if(ea.type != OP_MEM, EXC_UD);
> +        /* fall through */

Only just occurred to me, but we should probably be using fallthrough;
in new code, now there's a real attribute to use.

~Andrew


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST
  2023-04-04 14:55 ` [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST Jan Beulich
  2023-04-04 14:57   ` Jan Beulich
@ 2023-04-06 22:03   ` Andrew Cooper
  2023-04-17 13:02     ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-04-06 22:03 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monné, Kevin Tian, Jun Nakajima

On 04/04/2023 3:55 pm, Jan Beulich wrote:
> These are "compound" instructions to issue a series of RDMSR / WRMSR
> respectively. In the emulator we can therefore implement them by using
> the existing msr_{read,write}() hooks. The memory accesses utilize that
> the HVM ->read() / ->write() hooks are already linear-address
> (x86_seg_none) aware (by way of hvmemul_virtual_to_linear() handling
> this case).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TODO: Use VMX tertiary execution control (once bit is known; see
>       //todo-s) and then further adjust cpufeatureset.h.
>
> RFC: In vmx_vmexit_handler() handling is forwarded to the emulator
>      blindly. Alternatively we could consult the exit qualification and
>      process just a single MSR at a time (without involving the
>      emulator), exiting back to the guest after every iteration. (I
>      don't think a mix of both models makes a lot of sense.)

{RD,WR}MSRLIST are supposed to be used for context switch paths.  They
really shouldn't be exiting in the common case.

What matters here is the conditional probability of a second MSR being
intercepted too, given that one already has been.  And I don't have a
clue how to answer this.

I would not expect Introspection to be intercepting a fastpath MSR. 
(And if it does, frankly it can live with the consequences.)

For future scenarios such as reloading PMU/LBR/whatever, these will be
all-or-nothing and we'd expect to have them eagerly in context anyway.

If I were going to guess, I'd say that probably MSR_XSS or MSR_SPEC_CTRL
will be giving us the most grief here, because they're both ones that
are liable to be touched on a context switch path, and have split
host/guest bits.

> RFC: For PV priv_op_ops would need to gain proper read/write hooks,
>      which doesn't look desirable (albeit there we could refuse to
>      handle anything else than x86_seg_none); we may want to consider to
>      instead not support the feature for PV guests, requiring e.g. Linux
>      to process the lists in new pvops hooks.

Ah - funny you should ask.  See patch 2.  These are even better reasons
not to support on PV guests.

> RFC: I wasn't sure whether to add preemption checks to the loops -
>      thoughts?

I'd be tempted to.  Mostly because a guest can exert 64* the longest
MSR's worth of pressure here, along with the associated emulation overhead.

64* "write hypercall page" sounds expensive.  So too does 64* MSR_PAT,
given all the EPT actions behind the scenes.

It's probably one of those

> With the VMX side of the spec still unclear (tertiary execution control
> bit unspecified in ISE 046) we can't enable the insn yet for (HVM) guest
> use. The precise behavior of MSR_BARRIER is also not spelled out, so the
> (minimal) implementation is a guess for now.

MSRs are expensive for several reasons.  The %ecx decode alone isn't
cheap, nor is the reserved bit checking, and that's before starting the
action itself.

What the list form gets you is the ability to pipeline the checks on the
following MSR while performing the action of the current one.  I suspect
there are plans to try and parallelise the actions too if possible.

As I understand it, BARRIER just halts the pipelining of processing the
next MSR.

~Andrew


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/9] x86emul: support CMPccXADD
  2023-04-06 19:28   ` Andrew Cooper
@ 2023-04-17 10:56     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-17 10:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 06.04.2023 21:28, Andrew Cooper wrote:
> On 04/04/2023 3:52 pm, Jan Beulich wrote:
>> Unconditionally wire this through the ->rmw() hook. Since x86_emul_rmw()
>> now wants to construct and invoke a stub, make stub_exn available to it
>> via a new field in the emulator state structure.
> 
> IMO, patch 5 should be re-ordered with this, because it removes one
> incidental change that's not really related to CMPccXADD.

Yeah, I can probably do that. The order here really is simply the order
things were written; I did notice the potential for the subsequent patch
only when already done with the one here.

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> # SDE: -grr or -srf
> 
> The ISE makes a point of noting that CMPccXADD is implicitly locked,
> like XCHG.  (Unlike XCHG, there isn't a valid reg/reg encoding.)
> 
> Right now, the xchg emulation overrides lock_prefix, but I have a
> feeling that's stale now with the rmw() hook in place.  But it is
> dubious that we let xchg fall back to a non-atomic exchange if the rmw()
> hook is missing.
> 
> Either way, I think it would be nice to clean that up so we don't have
> differences in the handling of instructions which the ISE at least
> claims are similar.

We can certainly revisit this (independently).

> Tangentially, what about the RAO instructions?

With the infrastructure we have right now, I don't see how we could get
their exception behavior (non-WB -> #GP) correct. Hence while I have
these on my todo list, I don't have immediate plans to deal with them.

>> --- a/tools/tests/x86_emulator/x86-emulate.h
>> +++ b/tools/tests/x86_emulator/x86-emulate.h
>> @@ -934,6 +935,8 @@ decode_0f38(struct x86_emulate_state *s,
>>              ctxt->opcode |= MASK_INSR(s->vex.pfx, X86EMUL_OPC_PFX_MASK);
>>          break;
>>  
>> +    case X86EMUL_OPC_VEX_66(0, 0xe0)
>> +     ... X86EMUL_OPC_VEX_66(0, 0xef): /* cmp<cc>xadd */
> 
> I know the style is a little mixed in the emulator, but
> 
> +    case X86EMUL_OPC_VEX_66(0, 0xe0) ...
> +         X86EMUL_OPC_VEX_66(0, 0xef): /* cmp<cc>xadd */
> 
> is more consistent with Xen style (because it's somewhat of a binary
> operator), and more readable IMO.

I don't mind; done.

>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -278,6 +278,7 @@ XEN_CPUFEATURE(SSBD,          9*32+31) /
>>  /* Intel-defined CPU features, CPUID level 0x00000007:1.eax, word 10 */
>>  XEN_CPUFEATURE(AVX_VNNI,     10*32+ 4) /*A  AVX-VNNI Instructions */
>>  XEN_CPUFEATURE(AVX512_BF16,  10*32+ 5) /*A  AVX512 BFloat16 Instructions */
>> +XEN_CPUFEATURE(CMPCCXADD,    10*32+ 7) /*A  CMPccXADD Instructions */
> 
> Given the non-triviality of this instruction, I'd prefer to keep this
> "a" until we've tried it on real hardware.

Hmm, yes, probably better.

Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9] x86emul: re-use new stub_exn field in state structure
  2023-04-06 19:48   ` Andrew Cooper
@ 2023-04-17 11:26     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-17 11:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 06.04.2023 21:48, Andrew Cooper wrote:
> On 04/04/2023 3:53 pm, Jan Beulich wrote:
>> This can now also be used to reduce the number of parameters
>> x86emul_fpu() needs to take.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> As said in the previous patch, I think this patch wants reordering
> forwards and picking up the addition into state.
> 
> "Because we're going to need it in another hook, and it simplifies an
> existing one" is a perfectly fine justification in isolation.

As said there, I'll do that.

> With that done, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>,

Thanks.

> although...
>> ---
>> We could of course set the struct field once early in x86_emulate(), but
>> for now I think we're better off leaving it as NULL where not actually
>> needed.
> 
> Do we gain anything useful from not doing it once?  It's certainly more
> to remember, and more code overall, to assign when needed.

Right, that's why I did add the remark. In harness and fuzzer we'll be
able to catch stray uses of the field if we keep it at NULL. Actually
if we were to always set the pointer, perhaps it wouldn't make sense
to have a pointer in struct x86_emulate_state in the first place, but
then we'd better put the struct itself there.

>> --- a/xen/arch/x86/x86_emulate/fpu.c
>> +++ b/xen/arch/x86/x86_emulate/fpu.c
>> @@ -90,9 +90,8 @@ int x86emul_fpu(struct x86_emulate_state
>>                  unsigned int *insn_bytes,
>>                  enum x86_emulate_fpu_type *fpu_type,
>>  #define fpu_type (*fpu_type) /* for get_fpu() */
>> -                struct stub_exn *stub_exn,
>> -#define stub_exn (*stub_exn) /* for invoke_stub() */
>>                  mmval_t *mmvalp)
>> +#define stub_exn (*s->stub_exn) /* for invoke_stub() */
> 
> ... honestly, I'd really like to see these macros purged.
> 
> I know the general style was done like this to avoid code churn, but
> hiding indirection is a very rude thing to do, and none of these are
> usefully shortening the expressions they replace.

Right, getting rid of those is certainly on my radar. But it'll be
further code-churn-ish changes in an area where history tells me
things often take long to get in (and hence there may be measurable
re-basing effort in the meantime).

> Also, putting stub_exn in the K&R type space is still weird to read.

What's K&R-ish there?

Jan



* Re: [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns
  2023-04-06 21:22   ` Andrew Cooper
@ 2023-04-17 12:50     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-17 12:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 06.04.2023 23:22, Andrew Cooper wrote:
> On 04/04/2023 3:54 pm, Jan Beulich wrote:
>> Matching what was done earlier, explicit tests are added only for
>> irregular insn / memory access patterns.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>,

Thanks.

> with two minor requests.
> 
>> --- a/tools/misc/xen-cpuid.c
>> +++ b/tools/misc/xen-cpuid.c
>> @@ -214,7 +214,7 @@ static const char *const str_7c1[32] =
>>  
>>  static const char *const str_7d1[32] =
>>  {
>> -    [ 4] = "avx-vnni-int8",
>> +    [ 4] = "avx-vnni-int8", [ 5] = "avx-ne-convert",
> 
> I'd leave a bit more horizontal space.  These names are getting rather
> long, and we're only 10% into this word.

Sure. I had taken neighboring arrays as reference; I've now switched
to "aligning" with rtm-always-abort.

>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -6208,6 +6208,19 @@ x86_emulate(
>>          host_and_vcpu_must_have(avx512_vbmi2);
>>          goto avx512f_no_sae;
>>  
>> +    case X86EMUL_OPC_VEX   (0x0f38, 0xb0): /* vcvtneoph2ps mem,[xy]mm */
>> +    case X86EMUL_OPC_VEX_66(0x0f38, 0xb0): /* vcvtneeph2ps mem,[xy]mm */
>> +    case X86EMUL_OPC_VEX_F3(0x0f38, 0xb0): /* vcvtneebf162ps mem,[xy]mm */
>> +    case X86EMUL_OPC_VEX_F2(0x0f38, 0xb0): /* vcvtneobf162ps mem,[xy]mm */
>> +        generate_exception_if(ea.type != OP_MEM, EXC_UD);
>> +        /* fall through */
> 
> Only just occurred to me, but we should probably be using fallthrough;
> in new code, now there's a real attribute to use.

I did actually consider doing so (and iirc also already on an earlier
occasion), but that'll be yet another item we also need to cater for
in the harness's x86-emulate.h. For the moment I prefer to stick to
comments, switching over - if necessary for e.g. Misra - all in one
go at some point.

Jan



* Re: [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST
  2023-04-06 22:03   ` Andrew Cooper
@ 2023-04-17 13:02     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2023-04-17 13:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monné, Kevin Tian, Jun Nakajima, xen-devel

On 07.04.2023 00:03, Andrew Cooper wrote:
> On 04/04/2023 3:55 pm, Jan Beulich wrote:
>> ---
>> TODO: Use VMX tertiary execution control (once bit is known; see
>>       //todo-s) and then further adjust cpufeatureset.h.
>>
>> RFC: In vmx_vmexit_handler() handling is forwarded to the emulator
>>      blindly. Alternatively we could consult the exit qualification and
>>      process just a single MSR at a time (without involving the
>>      emulator), exiting back to the guest after every iteration. (I
>>      don't think a mix of both models makes a lot of sense.)
> 
> {RD,WR}MSRLIST are supposed to be used for context switch paths.  They
> really shouldn't be exiting in the common case.
> 
> What matters here is the conditional probability of a second MSR being
> intercepted too, given that one already has been.  And I don't have a
> clue how to answer this.
> 
> I would not expect Introspection to be intercepting a fastpath MSR. 
> (And if it does, frankly it can live with the consequences.)
> 
> For future scenarios such as reloading PMU/LBR/whatever, these will be
> all-or-nothing and we'd expect to have them eagerly in context anyway.
> 
> If I were going to guess, I'd say that probably MSR_XSS or MSR_SPEC_CTRL
> will be giving us the most grief here, because they're both ones that
> are liable to be touched on a context switch path, and have split
> host/guest bits.

I'm not really certain, but I'm tending to interpret your reply as
agreement with the choice made (and hence not as a request to change
anything in this regard); clarification appreciated.

>> RFC: For PV priv_op_ops would need to gain proper read/write hooks,
>>      which doesn't look desirable (albeit there we could refuse to
>>      handle anything else than x86_seg_none); we may want to consider to
>>      instead not support the feature for PV guests, requiring e.g. Linux
>>      to process the lists in new pvops hooks.
> 
> Ah - funny you should ask.  See patch 2.  These are even better reasons
> not to support on PV guests.

Yeah, with PV dropped there it's quite obvious to not consider it here
either. I'll drop the remark.

>> RFC: I wasn't sure whether to add preemption checks to the loops -
>>      thoughts?
> 
> I'd be tempted to.  Mostly because a guest can exert 64* longest MSR
> worth of pressure here, along with associated emulation overhead.
> 
> 64* "write hypercall page" sounds expensive.  So too does 64* MSR_PAT,
> given all the EPT actions behind the scenes.
> 
> It's probably one of those

Which leaves me inclined to add preemption checking to writes, but
keep reads as they are. Thoughts?

Jan



end of thread, other threads:[~2023-04-17 13:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-04 14:48 [PATCH 0/9] x86emul: misc additions Jan Beulich
2023-04-04 14:49 ` [PATCH 1/9] x86emul: support LKGS Jan Beulich
2023-04-04 21:54   ` Andrew Cooper
2023-04-05  7:51     ` Jan Beulich
2023-04-04 14:50 ` [PATCH 2/9] x86emul: support WRMSRNS Jan Beulich
2023-04-05 21:29   ` Andrew Cooper
2023-04-06  6:47     ` Jan Beulich
2023-04-04 14:51 ` [PATCH 3/9] x86emul: drop regs field from emulator state structure Jan Beulich
2023-04-06 15:34   ` Andrew Cooper
2023-04-04 14:52 ` [PATCH 4/9] x86emul: support CMPccXADD Jan Beulich
2023-04-06 19:28   ` Andrew Cooper
2023-04-17 10:56     ` Jan Beulich
2023-04-04 14:53 ` [PATCH 5/9] x86emul: re-use new stub_exn field in state structure Jan Beulich
2023-04-06 19:48   ` Andrew Cooper
2023-04-17 11:26     ` Jan Beulich
2023-04-04 14:53 ` [PATCH 6/9] x86emul: support AVX-IFMA insns Jan Beulich
2023-04-06 20:51   ` Andrew Cooper
2023-04-04 14:54 ` [PATCH 7/9] x86emul: support AVX-VNNI-INT8 Jan Beulich
2023-04-06 21:01   ` Andrew Cooper
2023-04-04 14:54 ` [PATCH 8/9] x86emul: support AVX-NE-CONVERT insns Jan Beulich
2023-04-06 21:22   ` Andrew Cooper
2023-04-17 12:50     ` Jan Beulich
2023-04-04 14:55 ` [PATCH 9/9] x86emul+VMX: support {RD,WR}MSRLIST Jan Beulich
2023-04-04 14:57   ` Jan Beulich
2023-04-06 22:03   ` Andrew Cooper
2023-04-17 13:02     ` Jan Beulich
