All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] x86: Expose consistent topology to guests
@ 2024-01-09 15:38 Alejandro Vallejo
  2024-01-09 15:38 ` [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Anthony PERARD, Juergen Gross

Current topology handling is close to non-existent. As things stand, APIC
IDs are allocated through the apic_id=vcpu_id*2 relation without giving any
hints to the OS on how to parse the x2APIC ID of a given CPU and assuming
the guest will assume 2 threads per core.

This series involves bringing x2APIC IDs into the migration stream, so
older guests keep operating as they used to and enhancing Xen+toolstack so
new guests get topology information consistent with their x2APIC IDs. As a
side effect of this, x2APIC IDs are now packed and don't have (unless under
a pathological case) gaps.

Further work ought to allow combining this topology configurations with
gang-scheduling of guest hyperthreads into affine physical hyperthreads.
For the time being it purposefully keeps the configuration of "1 socket" +
"1 thread per core" + "1 core per vCPU".

Patch 1: Includes x2APIC IDs in the migration stream. This allows Xen to
         reconstruct the right x2APIC IDs on migrated-in guests, and
         future-proofs itself in the face of x2APIC ID derivation changes.
Patch 2: Minor refactor to expose xc_cpu_policy in libxl
Patch 3: Refactors xen/lib/x86 to work on non-Xen freestanding environments
         (e.g: hvmloader)
Patch 4: Remove old assumptions about vcpu_id<->apic_id relationship in hvmloader
Patch 5: Add logic to derive x2APIC IDs given a CPU policy and vCPU IDs
Patch 6: Includes a simple topology generator for toolstack so new guests
         have topologically consistent information in CPUID

Alejandro Vallejo (6):
  xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  tools/xc: Add xc_cpu_policy to the public xenctrl.h header
  xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader
  tools/hvmloader: Use cpu_policy to determine APIC IDs
  xen/x86: Derive topologically correct x2APIC IDs from the policy
  xen/x86: Add topology generator

 tools/firmware/hvmloader/Makefile        |   7 ++
 tools/firmware/hvmloader/config.h        |   5 +-
 tools/firmware/hvmloader/hvmloader.c     |   6 +
 tools/firmware/hvmloader/util.c          |   3 +-
 tools/include/xenguest.h                 |  23 +++-
 tools/libacpi/build.c                    |  27 ++++-
 tools/libacpi/libacpi.h                  |   5 +-
 tools/libs/guest/xg_cpuid_x86.c          | 144 ++++++++++++++---------
 tools/libs/guest/xg_private.h            |  10 --
 tools/libs/light/libxl_x86_acpi.c        |  21 +++-
 tools/tests/cpu-policy/test-cpu-policy.c | 128 ++++++++++++++++++++
 xen/arch/x86/cpu-policy.c                |   6 +-
 xen/arch/x86/cpuid.c                     |  20 +---
 xen/arch/x86/domain.c                    |   3 +
 xen/arch/x86/hvm/vlapic.c                |  27 ++++-
 xen/arch/x86/include/asm/hvm/vlapic.h    |   2 +
 xen/include/public/arch-x86/hvm/save.h   |   2 +
 xen/include/xen/lib/x86/cpu-policy.h     |  16 +++
 xen/lib/x86/cpuid.c                      |  12 +-
 xen/lib/x86/policy.c                     |  74 ++++++++++++
 xen/lib/x86/private.h                    |   8 +-
 21 files changed, 442 insertions(+), 107 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
@ 2024-01-09 15:38 ` Alejandro Vallejo
  2024-03-19 16:20   ` Roger Pau Monné
  2024-03-25 16:45   ` Jan Beulich
  2024-01-09 15:38 ` [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header Alejandro Vallejo
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

This allows the initial x2APIC ID to be sent on the migration stream. The
hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
Given the vlapic data is zero-extended on restore, fix up migrations from
hosts without the field by setting it to the old convention if zero.

x2APIC IDs are calculated from the CPU policy where the guest topology is
defined. For the time being, the function simply returns the old
relationship, but will eventually return results consistent with the
topology.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/x86/cpuid.c                   | 20 ++++---------------
 xen/arch/x86/domain.c                  |  3 +++
 xen/arch/x86/hvm/vlapic.c              | 27 ++++++++++++++++++++++++--
 xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 xen/include/xen/lib/x86/cpu-policy.h   |  9 +++++++++
 xen/lib/x86/policy.c                   | 11 +++++++++++
 7 files changed, 56 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7290a979c6..6e259785d0 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
         const struct cpu_user_regs *regs;
 
     case 0x1:
-        /* TODO: Rework topology logic. */
         res->b &= 0x00ffffffu;
         if ( is_hvm_domain(d) )
-            res->b |= (v->vcpu_id * 2) << 24;
+            res->b |= SET_xAPIC_ID(vlapic_x2apic_id(vcpu_vlapic(v)));
 
         /* TODO: Rework vPMU control in terms of toolstack choices. */
         if ( vpmu_available(v) &&
@@ -311,20 +310,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
         break;
 
     case 0xb:
-        /*
-         * In principle, this leaf is Intel-only.  In practice, it is tightly
-         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
-         * to guests on AMD hardware as well.
-         *
-         * TODO: Rework topology logic.
-         */
-        if ( p->basic.x2apic )
-        {
-            *(uint8_t *)&res->c = subleaf;
-
-            /* Fix the x2APIC identifier. */
-            res->d = v->vcpu_id * 2;
-        }
+        /* ecx != 0 if the subleaf is implemented */
+        if ( res->c && p->basic.x2apic )
+            res->d = vlapic_x2apic_id(vcpu_vlapic(v));
         break;
 
     case XSTATE_CPUID:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8a31d18f69..e0c7ed8d5d 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -288,7 +288,10 @@ void update_guest_memory_policy(struct vcpu *v,
 static void cpu_policy_updated(struct vcpu *v)
 {
     if ( is_hvm_vcpu(v) )
+    {
         hvm_cpuid_policy_changed(v);
+        vlapic_cpu_policy_changed(v);
+    }
 }
 
 void domain_cpu_policy_changed(struct domain *d)
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index cdb69d9742..f500d66543 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
 static void set_x2apic_id(struct vlapic *vlapic)
 {
     const struct vcpu *v = vlapic_vcpu(vlapic);
-    uint32_t apic_id = v->vcpu_id * 2;
+    uint32_t apic_id = vlapic->hw.x2apic_id;
     uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
 
     /*
@@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
     vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
 }
 
+void vlapic_cpu_policy_changed(struct vcpu *v)
+{
+    struct vlapic *vlapic = vcpu_vlapic(v);
+    struct cpu_policy *cp = v->domain->arch.cpu_policy;
+
+    /*
+     * Don't override the initial x2APIC ID if we have migrated it or
+     * if the domain doesn't have vLAPIC at all.
+     */
+    if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
+        return;
+
+    vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
+    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
+}
+
 int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
 {
     const struct cpu_policy *cp = v->domain->arch.cpu_policy;
@@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
     if ( v->vcpu_id == 0 )
         vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
 
-    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
     vlapic_do_init(vlapic);
 }
 
@@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
     const struct vcpu *v = vlapic_vcpu(vlapic);
     uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
 
+    /*
+     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
+     * mappings. Recreate the mapping it used to have in old host.
+     */
+    if ( !vlapic->hw.x2apic_id )
+        vlapic->hw.x2apic_id = v->vcpu_id * 2;
+
     /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
     if ( !vlapic_x2apic_mode(vlapic) ||
          (vlapic->loaded.ldr == good_ldr) )
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 88ef945243..e8d41313ab 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -44,6 +44,7 @@
 #define vlapic_xapic_mode(vlapic)                               \
     (!vlapic_hw_disabled(vlapic) && \
      !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
+#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
 
 /*
  * Generic APIC bitmap vector update & search routines.
@@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
 
 int  vlapic_init(struct vcpu *v);
 void vlapic_destroy(struct vcpu *v);
+void vlapic_cpu_policy_changed(struct vcpu *v);
 
 void vlapic_reset(struct vlapic *vlapic);
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde1..1c2ec669ff 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,8 @@ struct hvm_hw_lapic {
     uint32_t             disabled; /* VLAPIC_xx_DISABLED */
     uint32_t             timer_divisor;
     uint64_t             tdt_msr;
+    uint32_t             x2apic_id;
+    uint32_t             rsvd_zero;
 };
 
 DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index d5e447e9dc..14724cedff 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err);
 
+/**
+ * Calculates the x2APIC ID of a vCPU given a CPU policy
+ *
+ * @param p          CPU policy of the domain.
+ * @param vcpu_id    vCPU ID of the vCPU.
+ * @returns x2APIC ID of the vCPU.
+ */
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */
 
 /*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index f033d22785..a3b24e6879 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,17 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
+{
+    /*
+     * TODO: Derive x2APIC ID from the topology information inside `p`
+     *       rather than from vCPU ID. This bodge is a temporary measure
+     *       until all infra is in place to retrieve or derive the initial
+     *       x2APIC ID from migrated domains.
+     */
+    return vcpu_id * 2;
+}
+
 int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header
  2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
  2024-01-09 15:38 ` [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-01-09 15:38 ` Alejandro Vallejo
  2024-03-19 17:55   ` Roger Pau Monné
  2024-01-09 15:38 ` [PATCH 3/6] xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader Alejandro Vallejo
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Wei Liu, Anthony PERARD, Juergen Gross,
	Jan Beulich, Andrew Cooper, Roger Pau Monné

Move struct xc_cpu_policy data structure out of xg_private.h and into
the public xenguest.h so it can be used by libxl.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 tools/include/xenguest.h             |  8 +++++++-
 tools/libs/guest/xg_private.h        | 10 ----------
 xen/include/xen/lib/x86/cpu-policy.h |  5 +++++
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index e01f494b77..4e9078fdee 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -784,7 +784,13 @@ xen_pfn_t *xc_map_m2p(xc_interface *xch,
                       unsigned long *mfn0);
 
 #if defined(__i386__) || defined(__x86_64__)
-typedef struct xc_cpu_policy xc_cpu_policy_t;
+#include <xen/lib/x86/cpu-policy.h>
+
+typedef struct xc_cpu_policy {
+    struct cpu_policy policy;
+    xen_cpuid_leaf_t leaves[CPUID_MAX_SERIALISED_LEAVES];
+    xen_msr_entry_t msrs[MSR_MAX_SERIALISED_ENTRIES];
+} xc_cpu_policy_t;
 
 /* Create and free a xc_cpu_policy object. */
 xc_cpu_policy_t *xc_cpu_policy_init(void);
diff --git a/tools/libs/guest/xg_private.h b/tools/libs/guest/xg_private.h
index d73947094f..d1940f1ea4 100644
--- a/tools/libs/guest/xg_private.h
+++ b/tools/libs/guest/xg_private.h
@@ -170,14 +170,4 @@ int pin_table(xc_interface *xch, unsigned int type, unsigned long mfn,
 #define M2P_SIZE(_m)    ROUNDUP(((_m) * sizeof(xen_pfn_t)), M2P_SHIFT)
 #define M2P_CHUNKS(_m)  (M2P_SIZE((_m)) >> M2P_SHIFT)
 
-#if defined(__x86_64__) || defined(__i386__)
-#include <xen/lib/x86/cpu-policy.h>
-
-struct xc_cpu_policy {
-    struct cpu_policy policy;
-    xen_cpuid_leaf_t leaves[CPUID_MAX_SERIALISED_LEAVES];
-    xen_msr_entry_t msrs[MSR_MAX_SERIALISED_ENTRIES];
-};
-#endif /* x86 */
-
 #endif /* XG_PRIVATE_H */
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index 14724cedff..65f6335b32 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -85,6 +85,11 @@ unsigned int x86_cpuid_lookup_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx);
  */
 const char *x86_cpuid_vendor_to_str(unsigned int vendor);
 
+#ifndef __XEN__
+/* Needed for MAX() */
+#include <xen-tools/common-macros.h>
+#endif /* __XEN__ */
+
 #define CPUID_GUEST_NR_BASIC      (0xdu + 1)
 #define CPUID_GUEST_NR_CACHE      (5u + 1)
 #define CPUID_GUEST_NR_FEAT       (2u + 1)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/6] xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader
  2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
  2024-01-09 15:38 ` [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
  2024-01-09 15:38 ` [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header Alejandro Vallejo
@ 2024-01-09 15:38 ` Alejandro Vallejo
  2024-03-25 16:55   ` Jan Beulich
  2024-01-09 15:38 ` [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs Alejandro Vallejo
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

A future patch will use these in hvmloader, which is freestanding, but
lacks the Xen code. The following changes fix the compilation errors.

* string.h => Remove memset() usages and bzero through assignments
* inttypes.h => Use stdint.h (it's what it should've been to begin with)
* errno.h => Use xen/errno.h instead

No functional change intended.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/lib/x86/cpuid.c   | 12 ++++++------
 xen/lib/x86/private.h |  8 +++++---
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/xen/lib/x86/cpuid.c b/xen/lib/x86/cpuid.c
index eb7698dc73..4a138c3a11 100644
--- a/xen/lib/x86/cpuid.c
+++ b/xen/lib/x86/cpuid.c
@@ -5,8 +5,8 @@
 static void zero_leaves(struct cpuid_leaf *l,
                         unsigned int first, unsigned int last)
 {
-    if ( first <= last )
-        memset(&l[first], 0, sizeof(*l) * (last - first + 1));
+    for (l = &l[first]; first <= last; first++, l++ )
+        *l = (struct cpuid_leaf){};
 }
 
 unsigned int x86_cpuid_lookup_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx)
@@ -244,7 +244,7 @@ void x86_cpu_policy_clear_out_of_range_leaves(struct cpu_policy *p)
                 ARRAY_SIZE(p->basic.raw) - 1);
 
     if ( p->basic.max_leaf < 4 )
-        memset(p->cache.raw, 0, sizeof(p->cache.raw));
+        p->cache = (typeof(p->cache)){};
     else
     {
         for ( i = 0; (i < ARRAY_SIZE(p->cache.raw) &&
@@ -255,13 +255,13 @@ void x86_cpu_policy_clear_out_of_range_leaves(struct cpu_policy *p)
     }
 
     if ( p->basic.max_leaf < 7 )
-        memset(p->feat.raw, 0, sizeof(p->feat.raw));
+        p->feat = (typeof(p->feat)){};
     else
         zero_leaves(p->feat.raw, p->feat.max_subleaf + 1,
                     ARRAY_SIZE(p->feat.raw) - 1);
 
     if ( p->basic.max_leaf < 0xb )
-        memset(p->topo.raw, 0, sizeof(p->topo.raw));
+        p->topo = (typeof(p->topo)){};
     else
     {
         for ( i = 0; (i < ARRAY_SIZE(p->topo.raw) &&
@@ -272,7 +272,7 @@ void x86_cpu_policy_clear_out_of_range_leaves(struct cpu_policy *p)
     }
 
     if ( p->basic.max_leaf < 0xd || !cpu_policy_xstates(p) )
-        memset(p->xstate.raw, 0, sizeof(p->xstate.raw));
+        p->xstate = (typeof(p->xstate)){};
     else
     {
         /* This logic will probably need adjusting when XCR0[63] gets used. */
diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h
index 60bb82a400..4b8cb97e64 100644
--- a/xen/lib/x86/private.h
+++ b/xen/lib/x86/private.h
@@ -17,12 +17,14 @@
 
 #else
 
-#include <errno.h>
-#include <inttypes.h>
+#include <stdint.h>
 #include <stdbool.h>
 #include <stddef.h>
-#include <string.h>
 
+enum {
+#define XEN_ERRNO(ident, rc) ident = (rc),
+#include <xen/errno.h>
+};
 #include <xen/asm/msr-index.h>
 #include <xen/asm/x86-vendors.h>
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs
  2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
                   ` (2 preceding siblings ...)
  2024-01-09 15:38 ` [PATCH 3/6] xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader Alejandro Vallejo
@ 2024-01-09 15:38 ` Alejandro Vallejo
  2024-03-20  8:52   ` Roger Pau Monné
  2024-01-09 15:38 ` [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
  2024-01-09 15:38 ` [PATCH 6/6] xen/x86: Add topology generator Alejandro Vallejo
  5 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Anthony PERARD, Juergen Gross

As part of topology correction efforts, APIC IDs can no longer be derived
strictly through the vCPU ID alone. Bring in the machinery for policy
retrieval and parsing in order to generate the proper MADT table and wake
the appropriate CPUs.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 tools/firmware/hvmloader/Makefile    |  7 +++++++
 tools/firmware/hvmloader/config.h    |  5 ++++-
 tools/firmware/hvmloader/hvmloader.c |  6 ++++++
 tools/firmware/hvmloader/util.c      |  3 ++-
 tools/libacpi/build.c                | 27 +++++++++++++++++++++------
 tools/libacpi/libacpi.h              |  5 ++++-
 tools/libs/light/libxl_x86_acpi.c    | 21 +++++++++++++++++++--
 7 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index e5de1ade17..503ef2e219 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -37,6 +37,13 @@ ifeq ($(debug),y)
 OBJS += tests.o
 endif
 
+CFLAGS += -I../../../xen/arch/x86/include/
+
+vpath cpuid.c ../../../xen/lib/x86/
+OBJS += cpuid.o
+vpath policy.c ../../../xen/lib/x86/
+OBJS += policy.o
+
 CIRRUSVGA_DEBUG ?= n
 
 ROMBIOS_DIR := ../rombios
diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index c82adf6dc5..3304b63cd8 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -3,10 +3,13 @@
 
 #include <stdint.h>
 #include <stdbool.h>
+#include <xen/lib/x86/cpu-policy.h>
 
 enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
 extern enum virtual_vga virtual_vga;
 
+extern struct cpu_policy cpu_policy;
+
 extern unsigned long igd_opregion_pgbase;
 #define IGD_OPREGION_PAGES 3
 
@@ -50,7 +53,7 @@ extern uint8_t ioapic_version;
 #define IOAPIC_ID           0x01
 
 #define LAPIC_BASE_ADDRESS  0xfee00000
-#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
+#define LAPIC_ID(vcpu_id) x86_x2apic_id_from_vcpu_id(&cpu_policy, vcpu_id)
 
 #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index c58841e5b5..8874165c66 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -28,6 +28,7 @@
 #include <acpi2_0.h>
 #include <xen/version.h>
 #include <xen/hvm/params.h>
+#include <xen/lib/x86/cpu-policy.h>
 #include <xen/arch-x86/hvm/start_info.h>
 
 const struct hvm_start_info *hvm_start_info;
@@ -118,6 +119,9 @@ uint8_t ioapic_version;
 
 bool acpi_enabled;
 
+/* TODO: Remove after HVM ACPI building makes it to libxl */
+struct cpu_policy cpu_policy;
+
 static void init_hypercalls(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -331,6 +335,8 @@ int main(void)
     printf("HVM Loader\n");
     BUG_ON(hvm_start_info->magic != XEN_HVM_START_MAGIC_VALUE);
 
+    x86_cpu_policy_fill_native(&cpu_policy);
+
     init_hypercalls();
 
     memory_map_setup();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index e82047d993..25a0245c5c 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -30,6 +30,7 @@
 #include <xen/sched.h>
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/params.h>
+#include <xen/lib/x86/cpu-policy.h>
 
 /*
  * Check whether there exists overlap in the specified memory range.
@@ -1041,7 +1042,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     ctxt.mem_ops.free = acpi_mem_free;
     ctxt.mem_ops.v2p = acpi_v2p;
 
-    acpi_build_tables(&ctxt, config);
+    acpi_build_tables(&ctxt, config, &cpu_policy);
 
     hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
 }
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 2f29863db1..ab05e54c96 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -17,6 +17,7 @@
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/params.h>
 #include <xen/memory.h>
+#include <xen/lib/x86/cpu-policy.h>
 
 #define ACPI_MAX_SECONDARY_TABLES 16
 
@@ -65,7 +66,8 @@ static void set_checksum(
 
 static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
                                            const struct acpi_config *config,
-                                           struct acpi_info *info)
+                                           struct acpi_info *info,
+                                           const struct cpu_policy *policy)
 {
     struct acpi_20_madt           *madt;
     struct acpi_20_madt_intsrcovr *intsrcovr;
@@ -143,14 +145,25 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
     info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
     for ( i = 0; i < hvminfo->nr_vcpus; i++ )
     {
+        uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(policy, i);
+
         memset(lapic, 0, sizeof(*lapic));
         lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
         lapic->length  = sizeof(*lapic);
         /* Processor ID must match processor-object IDs in the DSDT. */
         lapic->acpi_processor_id = i;
-        lapic->apic_id = config->lapic_id(i);
+        lapic->apic_id = x2apic_id;
         lapic->flags = (test_bit(i, hvminfo->vcpu_online)
                         ? ACPI_LOCAL_APIC_ENABLED : 0);
+
+        /*
+         * Error-out if the x2APIC ID doesn't fit in the entry
+         *
+         * TODO: Use "x2apic" entries if biggest x2apic_id > 254
+         */
+        if ( lapic->apic_id != x2apic_id )
+            return NULL;
+
         lapic++;
     }
 
@@ -335,7 +348,8 @@ static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
 static int construct_secondary_tables(struct acpi_ctxt *ctxt,
                                       unsigned long *table_ptrs,
                                       struct acpi_config *config,
-                                      struct acpi_info *info)
+                                      struct acpi_info *info,
+                                      const struct cpu_policy *policy)
 {
     int nr_tables = 0;
     struct acpi_20_madt *madt;
@@ -349,7 +363,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     /* MADT. */
     if ( (config->hvminfo->nr_vcpus > 1) || config->hvminfo->apic_mode )
     {
-        madt = construct_madt(ctxt, config, info);
+        madt = construct_madt(ctxt, config, info, policy);
         if (!madt) return -1;
         table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, madt);
     }
@@ -539,7 +553,8 @@ static int new_vm_gid(struct acpi_ctxt *ctxt,
     return 1;
 }
 
-int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
+int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config,
+                      struct cpu_policy *policy)
 {
     struct acpi_info *acpi_info;
     struct acpi_20_rsdp *rsdp;
@@ -667,7 +682,7 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
     set_checksum(fadt, offsetof(struct acpi_header, checksum), fadt_size);
 
     nr_secondaries = construct_secondary_tables(ctxt, secondary_tables,
-                 config, acpi_info);
+                 config, acpi_info, policy);
     if ( nr_secondaries < 0 )
         goto oom;
 
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index deda39e5db..24f62aa5d4 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -28,6 +28,8 @@
 #define ACPI_HAS_CMOS_RTC          (1<<14)
 #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
 
+struct cpu_policy;
+
 struct xen_vmemrange;
 struct acpi_numa {
     uint32_t nr_vmemranges;
@@ -91,7 +93,8 @@ struct acpi_config {
     uint8_t ioapic_id;
 };
 
-int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
+int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config,
+                      struct cpu_policy *policy);
 
 #endif /* __LIBACPI_H__ */
 
diff --git a/tools/libs/light/libxl_x86_acpi.c b/tools/libs/light/libxl_x86_acpi.c
index 620f3c700c..a5bd42683e 100644
--- a/tools/libs/light/libxl_x86_acpi.c
+++ b/tools/libs/light/libxl_x86_acpi.c
@@ -158,11 +158,27 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 {
     struct acpi_config config = {0};
     struct libxl_acpi_ctxt libxl_ctxt;
-    int rc = 0, acpi_pages_num;
+    uint16_t domid = dom->guest_domid;
+    int rc = 0, r, acpi_pages_num;
+    xc_cpu_policy_t *policy = NULL;
 
     if (b_info->type != LIBXL_DOMAIN_TYPE_PVH)
         goto out;
 
+    policy = xc_cpu_policy_init();
+    if (!policy) {
+        LOGED(ERROR, domid, "xc_cpu_policy_init failed");
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+
+    r = xc_cpu_policy_get_domain(dom->xch, domid, policy);
+    if (r < 0) {
+        LOGED(ERROR, domid, "get_cpu_policy failed");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     libxl_ctxt.page_size = XC_DOM_PAGE_SIZE(dom);
     libxl_ctxt.page_shift =  XC_DOM_PAGE_SHIFT(dom);
 
@@ -192,7 +208,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
     libxl_ctxt.guest_end += NUM_ACPI_PAGES * libxl_ctxt.page_size;
 
     /* Build the tables. */
-    rc = acpi_build_tables(&libxl_ctxt.c, &config);
+    rc = acpi_build_tables(&libxl_ctxt.c, &config, &policy->policy);
     if (rc) {
         LOG(ERROR, "acpi_build_tables failed with %d", rc);
         goto out;
@@ -226,6 +242,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
         libxl_ctxt.page_size;
 
 out:
+    xc_cpu_policy_destroy(policy);
     return rc;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
                   ` (3 preceding siblings ...)
  2024-01-09 15:38 ` [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs Alejandro Vallejo
@ 2024-01-09 15:38 ` Alejandro Vallejo
  2024-03-20 10:15   ` Roger Pau Monné
  2024-03-26 16:41   ` Jan Beulich
  2024-01-09 15:38 ` [PATCH 6/6] xen/x86: Add topology generator Alejandro Vallejo
  5 siblings, 2 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Anthony PERARD

Implements the helper for mapping vcpu_id to x2apic_id given a valid
topology in a policy. The algo is written with the intention of extending
it to leaves 0x1f and e26 in the future.

Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
so the leaf is not implemented. In that case, the new helper just returns
the legacy mapping.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 tools/tests/cpu-policy/test-cpu-policy.c | 128 +++++++++++++++++++++++
 xen/include/xen/lib/x86/cpu-policy.h     |   2 +
 xen/lib/x86/policy.c                     |  75 +++++++++++--
 3 files changed, 199 insertions(+), 6 deletions(-)

diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 301df2c002..6ff5c1dd3d 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -650,6 +650,132 @@ static void test_is_compatible_failure(void)
     }
 }
 
+static void test_x2apic_id_from_vcpu_id_success(void)
+{
+    static struct test {
+        const char *name;
+        uint32_t vcpu_id;
+        uint32_t x2apic_id;
+        struct cpu_policy policy;
+    } tests[] = {
+        {
+            .name = "3v: 3 t/c, 8 c/s", .vcpu_id = 3, .x2apic_id = 1 << 2,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "6v: 3 t/c, 8 c/s", .vcpu_id = 6, .x2apic_id = 2 << 2,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "24v: 3 t/c, 8 c/s", .vcpu_id = 24, .x2apic_id = 1 << 5,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "35v: 3 t/c, 8 c/s", .vcpu_id = 35,
+            .x2apic_id = (35 % 3) | (((35 / 3) % 8)  << 2) | ((35 / 24) << 5),
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "96v: 7 t/c, 3 c/s", .vcpu_id = 96,
+            .x2apic_id = (96 % 7) | (((96 / 7) % 3)  << 3) | ((96 / 21) << 5),
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+                    [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "3v: 3 t/c, 8 c/s", .vcpu_id = 3, .x2apic_id = 1 << 2,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "6v: 3 t/c, 8 c/s", .vcpu_id = 6, .x2apic_id = 2 << 2,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "24v: 3 t/c, 8 c/s", .vcpu_id = 24, .x2apic_id = 1 << 5,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "35v: 3 t/c, 8 c/s", .vcpu_id = 35,
+            .x2apic_id = (35 % 3) | (((35 / 3) % 8)  << 2) | ((35 / 24) << 5),
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
+                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+        {
+            .name = "96v: 7 t/c, 3 c/s", .vcpu_id = 96,
+            .x2apic_id = (96 % 7) | (((96 / 7) % 3)  << 3) | ((96 / 21) << 5),
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    [0] = { .nr_logical = 7,   .level = 0, .type = 1, .id_shift = 3, },
+                    [1] = { .nr_logical = 21,  .level = 1, .type = 2, .id_shift = 5, },
+                },
+            },
+        },
+    };
+
+    printf("Testing x2apic id from vcpu id success:\n");
+
+    for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+    {
+        struct test *t = &tests[i];
+        uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(&t->policy, t->vcpu_id);
+        if ( x2apic_id != t->x2apic_id )
+            fail("FAIL - '%s'. bad x2apic_id: expected=%u actual=%u\n",
+                 t->name, t->x2apic_id, x2apic_id);
+    }
+}
+
 int main(int argc, char **argv)
 {
     printf("CPU Policy unit tests\n");
@@ -667,6 +793,8 @@ int main(int argc, char **argv)
     test_is_compatible_success();
     test_is_compatible_failure();
 
+    test_x2apic_id_from_vcpu_id_success();
+
     if ( nr_failures )
         printf("Done: %u failures\n", nr_failures);
     else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index 65f6335b32..d81ae2f47c 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -550,6 +550,8 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
 /**
  * Calculates the x2APIC ID of a vCPU given a CPU policy
  *
+ * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2
+ *
  * @param p          CPU policy of the domain.
  * @param vcpu_id    vCPU ID of the vCPU.
  * @returns x2APIC ID of the vCPU.
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index a3b24e6879..2a50bc076a 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,15 +2,78 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
-uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
+static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
 {
     /*
-     * TODO: Derive x2APIC ID from the topology information inside `p`
-     *       rather than from vCPU ID. This bodge is a temporary measure
-     *       until all infra is in place to retrieve or derive the initial
-     *       x2APIC ID from migrated domains.
+     * `nr_logical` reported by Intel is the number of THREADS contained in
+     * the next topological scope. For example, assuming a system with 2
+     * threads/core and 3 cores/module in a fully symmetric topology,
+     * `nr_logical` at the core level will report 6. Because it's reporting
+     * the number of threads in a module.
+     *
+     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
+     * level (cores/complex, etc) so we can return it as-is.
      */
-    return vcpu_id * 2;
+    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
+        return p->topo.subleaf[lvl].nr_logical;
+
+    return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
+}
+
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
+{
+    uint32_t shift = 0, x2apic_id = 0;
+
+    /* In the absence of topology leaves, fallback to traditional mapping */
+    if ( !p->topo.subleaf[0].type )
+        return id * 2;
+
+    /*
+     * `id` means different things at different points of the algo
+     *
+     * At lvl=0: global thread_id (same as vcpu_id)
+     * At lvl=1: global core_id
+     * At lvl=2: global socket_id (actually complex_id in AMD, module_id
+     *                             in Intel, but the name is inconsequential)
+     *
+     *                 +--+
+     *            ____ |#0| ______           <= 1 socket
+     *           /     +--+       \+--+
+     *       __#0__              __|#1|__    <= 2 cores/socket
+     *      /   |  \        +--+/  +-|+  \
+     *    #0   #1   #2      |#3|    #4    #5 <= 3 threads/core
+     *                      +--+
+     *
+     * ... and so on. Global in this context means that it's a unique
+     * identifier for the whole topology, and not relative to the level
+     * it's in. For example, in the diagram shown above, we're looking at
+     * thread #3 in the global sense, though it's #0 within its core.
+     *
+     * Note that dividing a global thread_id by the number of threads per
+     * core returns the global core id that contains it. e.g: 0, 1 or 2
+     * divided by 3 returns core_id=0. 3, 4 or 5 divided by 3 returns core
+     * 1, and so on. An analogous argument holds for higher levels. This is
+     * the property we exploit to derive x2apic_id from vcpu_id.
+     *
+     * NOTE: `topo` is currently derived from leaf 0xb, which is bound to
+     * two levels, but once we track leaves 0x1f (or e26) there will be a
+     * few more. The algorithm is written to cope with that case.
+     */
+    for ( uint32_t i = 0; i < ARRAY_SIZE(p->topo.raw); i++ )
+    {
+        uint32_t nr_parts;
+
+        if ( !p->topo.subleaf[i].type )
+            /* sentinel subleaf */
+            break;
+
+        nr_parts = parts_per_higher_scoped_level(p, i);
+        x2apic_id |= (id % nr_parts) << shift;
+        id /= nr_parts;
+        shift = p->topo.subleaf[i].id_shift;
+    }
+
+    return (id << shift) | x2apic_id;
 }
 
 int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/6] xen/x86: Add topology generator
  2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
                   ` (4 preceding siblings ...)
  2024-01-09 15:38 ` [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
@ 2024-01-09 15:38 ` Alejandro Vallejo
  2024-01-10 14:16   ` Alejandro Vallejo
                     ` (2 more replies)
  5 siblings, 3 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-09 15:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Wei Liu, Anthony PERARD, Juergen Gross,
	Jan Beulich, Andrew Cooper, Roger Pau Monné

This allows toolstack to synthesise sensible topologies for guests. In
particular, this patch causes x2APIC IDs to be packed according to the
topology now exposed to the guests on leaf 0xb.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 tools/include/xenguest.h        |  15 ++++
 tools/libs/guest/xg_cpuid_x86.c | 144 ++++++++++++++++++++------------
 xen/arch/x86/cpu-policy.c       |   6 +-
 3 files changed, 107 insertions(+), 58 deletions(-)

diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index 4e9078fdee..f0043c559b 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -843,5 +843,20 @@ enum xc_static_cpu_featuremask {
     XC_FEATUREMASK_HVM_HAP_DEF,
 };
 const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
+
+/**
+ * Synthesise topology information in `p` given high-level constraints
+ *
+ * Topology is given in various fields accross several leaves, some of
+ * which are vendor-specific. This function uses the policy itself to
+ * derive such leaves from threads/core and cores/package.
+ *
+ * @param p                   CPU policy of the domain.
+ * @param threads_per_core    threads/core. Doesn't need to be a power of 2.
+ * @param cores_per_package   cores/package. Doesn't need to be a power of 2.
+ */
+void xc_topo_from_parts(struct cpu_policy *p,
+                        uint32_t threads_per_core, uint32_t cores_per_pkg);
+
 #endif /* __i386__ || __x86_64__ */
 #endif /* XENGUEST_H */
diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4453178100..7a622721be 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -584,7 +584,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
     bool hvm;
     xc_domaininfo_t di;
     struct xc_cpu_policy *p = xc_cpu_policy_init();
-    unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
+    unsigned int nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
     uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
     uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
     uint32_t len = ARRAY_SIZE(host_featureset);
@@ -727,60 +727,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
     }
     else
     {
-        /*
-         * Topology for HVM guests is entirely controlled by Xen.  For now, we
-         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
-         */
-        p->policy.basic.htt = true;
-        p->policy.extd.cmp_legacy = false;
-
-        /*
-         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
-         * overflow.
-         */
-        if ( !p->policy.basic.lppp )
-            p->policy.basic.lppp = 2;
-        else if ( !(p->policy.basic.lppp & 0x80) )
-            p->policy.basic.lppp *= 2;
-
-        switch ( p->policy.x86_vendor )
-        {
-        case X86_VENDOR_INTEL:
-            for ( i = 0; (p->policy.cache.subleaf[i].type &&
-                          i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
-            {
-                p->policy.cache.subleaf[i].cores_per_package =
-                    (p->policy.cache.subleaf[i].cores_per_package << 1) | 1;
-                p->policy.cache.subleaf[i].threads_per_cache = 0;
-            }
-            break;
-
-        case X86_VENDOR_AMD:
-        case X86_VENDOR_HYGON:
-            /*
-             * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
-             * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
-             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
-             * - overflow,
-             * - going out of sync with leaf 1 EBX[23:16],
-             * - incrementing ApicIdCoreSize when it's zero (which changes the
-             *   meaning of bits 7:0).
-             *
-             * UPDATE: I addition to avoiding overflow, some
-             * proprietary operating systems have trouble with
-             * apic_id_size values greater than 7.  Limit the value to
-             * 7 for now.
-             */
-            if ( p->policy.extd.nc < 0x7f )
-            {
-                if ( p->policy.extd.apic_id_size != 0 && p->policy.extd.apic_id_size < 0x7 )
-                    p->policy.extd.apic_id_size++;
-
-                p->policy.extd.nc = (p->policy.extd.nc << 1) | 1;
-            }
-            break;
-        }
+        /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
+        xc_topo_from_parts(&p->policy, 1, di.max_vcpu_id + 1);
     }
 
     nr_leaves = ARRAY_SIZE(p->leaves);
@@ -1028,3 +976,89 @@ bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
 
     return false;
 }
+
+static uint32_t order(uint32_t n)
+{
+    return 32 - __builtin_clz(n);
+}
+
+void xc_topo_from_parts(struct cpu_policy *p,
+                        uint32_t threads_per_core, uint32_t cores_per_pkg)
+{
+    uint32_t threads_per_pkg = threads_per_core * cores_per_pkg;
+    uint32_t apic_id_size;
+
+    if ( p->basic.max_leaf < 0xb )
+        p->basic.max_leaf = 0xb;
+
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
+    /* thread level */
+    p->topo.subleaf[0].nr_logical = threads_per_core;
+    p->topo.subleaf[0].id_shift = 0;
+    p->topo.subleaf[0].level = 0;
+    p->topo.subleaf[0].type = 1;
+    if ( threads_per_core > 1 )
+        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
+
+    /* core level */
+    p->topo.subleaf[1].nr_logical = cores_per_pkg;
+    if ( p->x86_vendor == X86_VENDOR_INTEL )
+        p->topo.subleaf[1].nr_logical = threads_per_pkg;
+    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
+    p->topo.subleaf[1].level = 1;
+    p->topo.subleaf[1].type = 2;
+    if ( cores_per_pkg > 1 )
+        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
+
+    apic_id_size = p->topo.subleaf[1].id_shift;
+
+    /*
+     * Contrary to what the name might seem to imply. HTT is an enabler for
+     * SMP and there's no harm in setting it even with a single vCPU.
+     */
+    p->basic.htt = true;
+
+    p->basic.lppp = 0xff;
+    if ( threads_per_pkg < 0xff )
+        p->basic.lppp = threads_per_pkg;
+
+    switch ( p->x86_vendor )
+    {
+        case X86_VENDOR_INTEL:
+            struct cpuid_cache_leaf *sl = p->cache.subleaf;
+            for ( size_t i = 0; sl->type &&
+                                i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
+            {
+                sl->cores_per_package = cores_per_pkg - 1;
+                sl->threads_per_cache = threads_per_core - 1;
+                if ( sl->type == 3 /* unified cache */ )
+                    sl->threads_per_cache = threads_per_pkg - 1;
+            }
+            break;
+
+        case X86_VENDOR_AMD:
+        case X86_VENDOR_HYGON:
+            /* Expose p->basic.lppp */
+            p->extd.cmp_legacy = true;
+
+            /* Clip NC to the maximum value it can hold */
+            p->extd.nc = 0xff;
+            if ( threads_per_pkg <= 0xff )
+                p->extd.nc = threads_per_pkg - 1;
+
+            /* TODO: Expose leaf e1E */
+            p->extd.topoext = false;
+
+            /*
+             * Clip APIC ID to 8, as that's what high core-count machines do
+             *
+             * That what AMD EPYC 9654 does with >256 CPUs
+             */
+            p->extd.apic_id_size = 8;
+            if ( apic_id_size < 8 )
+                p->extd.apic_id_size = apic_id_size;
+
+            break;
+    }
+}
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 76efb050ed..679d1fe4fa 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
 
     p->basic.raw[0x8] = EMPTY_LEAF;
 
-    /* TODO: Rework topology logic. */
-    memset(p->topo.raw, 0, sizeof(p->topo.raw));
-
     p->basic.raw[0xc] = EMPTY_LEAF;
 
     p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
@@ -387,6 +384,9 @@ static void __init calculate_host_policy(void)
     recalculate_xstate(p);
     recalculate_misc(p);
 
+    /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
     /* When vPMU is disabled, drop it from the host policy. */
     if ( vpmu_mode == XENPMU_MODE_OFF )
         p->basic.raw[0xa] = EMPTY_LEAF;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xen/x86: Add topology generator
  2024-01-09 15:38 ` [PATCH 6/6] xen/x86: Add topology generator Alejandro Vallejo
@ 2024-01-10 14:16   ` Alejandro Vallejo
  2024-01-10 14:28     ` Jan Beulich
  2024-03-20 10:29   ` Roger Pau Monné
  2024-03-26 17:02   ` Jan Beulich
  2 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-01-10 14:16 UTC (permalink / raw)
  To: Xen-devel
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Jan Beulich,
	Andrew Cooper, Roger Pau Monné

Review-to-self after running in Gitlab:

On 09/01/2024 15:38, Alejandro Vallejo wrote:
> +    p->basic.lppp = 0xff;
> +    if ( threads_per_pkg < 0xff )
> +        p->basic.lppp = threads_per_pkg;
> +
> +    switch ( p->x86_vendor )
> +    {
> +        case X86_VENDOR_INTEL:
> +            struct cpuid_cache_leaf *sl = p->cache.subleaf;
> +            for ( size_t i = 0; sl->type &&
> +                                i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> +            {
> +                sl->cores_per_package = cores_per_pkg - 1;
> +                sl->threads_per_cache = threads_per_core - 1;
> +                if ( sl->type == 3 /* unified cache */ )
> +                    sl->threads_per_cache = threads_per_pkg - 1;
> +            }
> +            break;
> +
> +        case X86_VENDOR_AMD:
> +        case X86_VENDOR_HYGON:

Missing braces around the INTEL block due to the variable declarared
there. I'll include that in v2 after the rest of the review comments
come through.

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xen/x86: Add topology generator
  2024-01-10 14:16   ` Alejandro Vallejo
@ 2024-01-10 14:28     ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-01-10 14:28 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Andrew Cooper,
	Roger Pau Monné,
	Xen-devel

On 10.01.2024 15:16, Alejandro Vallejo wrote:
> Review-to-self after running in Gitlab:
> 
> On 09/01/2024 15:38, Alejandro Vallejo wrote:
>> +    p->basic.lppp = 0xff;
>> +    if ( threads_per_pkg < 0xff )
>> +        p->basic.lppp = threads_per_pkg;
>> +
>> +    switch ( p->x86_vendor )
>> +    {
>> +        case X86_VENDOR_INTEL:
>> +            struct cpuid_cache_leaf *sl = p->cache.subleaf;
>> +            for ( size_t i = 0; sl->type &&
>> +                                i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
>> +            {
>> +                sl->cores_per_package = cores_per_pkg - 1;
>> +                sl->threads_per_cache = threads_per_core - 1;
>> +                if ( sl->type == 3 /* unified cache */ )
>> +                    sl->threads_per_cache = threads_per_pkg - 1;
>> +            }
>> +            break;
>> +
>> +        case X86_VENDOR_AMD:
>> +        case X86_VENDOR_HYGON:
> 
> Missing braces around the INTEL block due to the variable declarared
> there. I'll include that in v2 after the rest of the review comments
> come through.

And (just looking at the fragment above) too deep indentation as well.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-01-09 15:38 ` [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-03-19 16:20   ` Roger Pau Monné
  2024-03-20  9:33     ` Roger Pau Monné
  2024-03-25 15:44     ` Alejandro Vallejo
  2024-03-25 16:45   ` Jan Beulich
  1 sibling, 2 replies; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-19 16:20 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu

On Tue, Jan 09, 2024 at 03:38:29PM +0000, Alejandro Vallejo wrote:
> This allows the initial x2APIC ID to be sent on the migration stream. The
> hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
> Given the vlapic data is zero-extended on restore, fix up migrations from
> hosts without the field by setting it to the old convention if zero.
> 
> x2APIC IDs are calculated from the CPU policy where the guest topology is
> defined. For the time being, the function simply returns the old
> relationship, but will eventually return results consistent with the
> topology.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  xen/arch/x86/cpuid.c                   | 20 ++++---------------
>  xen/arch/x86/domain.c                  |  3 +++
>  xen/arch/x86/hvm/vlapic.c              | 27 ++++++++++++++++++++++++--
>  xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
>  xen/include/public/arch-x86/hvm/save.h |  2 ++
>  xen/include/xen/lib/x86/cpu-policy.h   |  9 +++++++++
>  xen/lib/x86/policy.c                   | 11 +++++++++++
>  7 files changed, 56 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 7290a979c6..6e259785d0 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>          const struct cpu_user_regs *regs;
>  
>      case 0x1:
> -        /* TODO: Rework topology logic. */
>          res->b &= 0x00ffffffu;
>          if ( is_hvm_domain(d) )
> -            res->b |= (v->vcpu_id * 2) << 24;
> +            res->b |= SET_xAPIC_ID(vlapic_x2apic_id(vcpu_vlapic(v)));

SET_xAPIC_ID() was intended to be used with the APIC_ID register,
which also shifts the ID.  Not sure it's logically correct to use
here, even if functionally equivalent (as is shifts left by 24).

>  
>          /* TODO: Rework vPMU control in terms of toolstack choices. */
>          if ( vpmu_available(v) &&
> @@ -311,20 +310,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>          break;
>  
>      case 0xb:
> -        /*
> -         * In principle, this leaf is Intel-only.  In practice, it is tightly
> -         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
> -         * to guests on AMD hardware as well.
> -         *
> -         * TODO: Rework topology logic.
> -         */
> -        if ( p->basic.x2apic )
> -        {
> -            *(uint8_t *)&res->c = subleaf;
> -
> -            /* Fix the x2APIC identifier. */
> -            res->d = v->vcpu_id * 2;
> -        }
> +        /* ecx != 0 if the subleaf is implemented */
> +        if ( res->c && p->basic.x2apic )
> +            res->d = vlapic_x2apic_id(vcpu_vlapic(v));

This needs to be protected so it's reachable by HVM guests only,
otherwise you will dereference v->arch.hvm.vlapic on a PV vCPU if it
happens to have p->basic.x2apic set.

Why not just return the x2apic_id field from the cpu_policy object?
(topo.subleaf[X].x2apic_id)

Also, I'm not sure I get why the setting of res->d is gated on res->c
!= 0, won't res->c be 0 when the guest %ecx is 0, yet %edx must be
valid for all %ecx inputs, the SDM states:

"The EDX output of leaf 0BH is always valid and does not vary with
input value in ECX."

I think you need to keep the previous logic that doesn't gate setting
->d on anything other than p->basic.x2apic.

>          break;
>  
>      case XSTATE_CPUID:
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 8a31d18f69..e0c7ed8d5d 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -288,7 +288,10 @@ void update_guest_memory_policy(struct vcpu *v,
>  static void cpu_policy_updated(struct vcpu *v)
>  {
>      if ( is_hvm_vcpu(v) )
> +    {
>          hvm_cpuid_policy_changed(v);
> +        vlapic_cpu_policy_changed(v);
> +    }
>  }
>  
>  void domain_cpu_policy_changed(struct domain *d)
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index cdb69d9742..f500d66543 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
>  static void set_x2apic_id(struct vlapic *vlapic)
>  {
>      const struct vcpu *v = vlapic_vcpu(vlapic);
> -    uint32_t apic_id = v->vcpu_id * 2;
> +    uint32_t apic_id = vlapic->hw.x2apic_id;
>      uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
>  
>      /*
> @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
>      vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>  }
>  
> +void vlapic_cpu_policy_changed(struct vcpu *v)
> +{
> +    struct vlapic *vlapic = vcpu_vlapic(v);
> +    struct cpu_policy *cp = v->domain->arch.cpu_policy;
> +
> +    /*
> +     * Don't override the initial x2APIC ID if we have migrated it or
> +     * if the domain doesn't have vLAPIC at all.
> +     */
> +    if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
> +        return;
> +
> +    vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
> +    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> +}
> +
>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>  {
>      const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
>      if ( v->vcpu_id == 0 )
>          vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>  
> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
>      vlapic_do_init(vlapic);
>  }
>  
> @@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>      const struct vcpu *v = vlapic_vcpu(vlapic);
>      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>  
> +    /*
> +     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
> +     * mappings. Recreate the mapping it used to have in old host.

Wouldn't it be more appropriate to state "Loading record without
hw.x2apic_id in the save stream, calculate using the vcpu_id * 2
relation" or some such.

Current comment makes it looks like the guest has some kind of
restriction with this relation, but that's just an internal Xen
limitation.

> +     */
> +    if ( !vlapic->hw.x2apic_id )
> +        vlapic->hw.x2apic_id = v->vcpu_id * 2;
> +
>      /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>      if ( !vlapic_x2apic_mode(vlapic) ||
>           (vlapic->loaded.ldr == good_ldr) )
> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
> index 88ef945243..e8d41313ab 100644
> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
> @@ -44,6 +44,7 @@
>  #define vlapic_xapic_mode(vlapic)                               \
>      (!vlapic_hw_disabled(vlapic) && \
>       !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
>  
>  /*
>   * Generic APIC bitmap vector update & search routines.
> @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
>  
>  int  vlapic_init(struct vcpu *v);
>  void vlapic_destroy(struct vcpu *v);
> +void vlapic_cpu_policy_changed(struct vcpu *v);
>  
>  void vlapic_reset(struct vlapic *vlapic);
>  
> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> index 7ecacadde1..1c2ec669ff 100644
> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>      uint32_t             timer_divisor;
>      uint64_t             tdt_msr;
> +    uint32_t             x2apic_id;
> +    uint32_t             rsvd_zero;

Do we really to add a new field, couldn't we get the lapic IDs from
the cpu_policy?

>  };
>  
>  DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
> index d5e447e9dc..14724cedff 100644
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>                                      const struct cpu_policy *guest,
>                                      struct cpu_policy_errors *err);
>  
> +/**
> + * Calculates the x2APIC ID of a vCPU given a CPU policy
> + *
> + * @param p          CPU policy of the domain.
> + * @param vcpu_id    vCPU ID of the vCPU.
> + * @returns x2APIC ID of the vCPU.
> + */
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id);
> +
>  #endif /* !XEN_LIB_X86_POLICIES_H */
>  
>  /*
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index f033d22785..a3b24e6879 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,6 +2,17 @@
>  
>  #include <xen/lib/x86/cpu-policy.h>
>  
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
> +{
> +    /*
> +     * TODO: Derive x2APIC ID from the topology information inside `p`
> +     *       rather than from vCPU ID. This bodge is a temporary measure
> +     *       until all infra is in place to retrieve or derive the initial
> +     *       x2APIC ID from migrated domains.
> +     */
> +    return vcpu_id * 2;

As noted above, won't a suitable initial step would be to populate the
apic_id and x2apic_id fields in struct cpu_policy with this relation
(x{,2}apic_id == vcpu_id * 2), and avoid this extra handler?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header
  2024-01-09 15:38 ` [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header Alejandro Vallejo
@ 2024-03-19 17:55   ` Roger Pau Monné
  2024-03-25 18:19     ` Alejandro Vallejo
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-19 17:55 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Wei Liu, Anthony PERARD, Juergen Gross, Jan Beulich,
	Andrew Cooper

On Tue, Jan 09, 2024 at 03:38:30PM +0000, Alejandro Vallejo wrote:
> Move struct xc_cpu_policy data structure out of xg_private.h and into
> the public xenguest.h so it can be used by libxl.

I will let Andrew comment on this one, IIRC the initial idea was to
not leak cpu_policy into libxl, and to instead have it as an opaque
object that libxl can interact with using helpers.

I haven't looked at followup patches - I assume this is done to
manipulate the cpuid data more easily from libxl, and ultimately drop
xc_xend_cpuid?

> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  tools/include/xenguest.h             |  8 +++++++-
>  tools/libs/guest/xg_private.h        | 10 ----------
>  xen/include/xen/lib/x86/cpu-policy.h |  5 +++++
>  3 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
> index e01f494b77..4e9078fdee 100644
> --- a/tools/include/xenguest.h
> +++ b/tools/include/xenguest.h
> @@ -784,7 +784,13 @@ xen_pfn_t *xc_map_m2p(xc_interface *xch,
>                        unsigned long *mfn0);
>  
>  #if defined(__i386__) || defined(__x86_64__)
> -typedef struct xc_cpu_policy xc_cpu_policy_t;
> +#include <xen/lib/x86/cpu-policy.h>
> +
> +typedef struct xc_cpu_policy {
> +    struct cpu_policy policy;
> +    xen_cpuid_leaf_t leaves[CPUID_MAX_SERIALISED_LEAVES];
> +    xen_msr_entry_t msrs[MSR_MAX_SERIALISED_ENTRIES];
> +} xc_cpu_policy_t;
>  
>  /* Create and free a xc_cpu_policy object. */
>  xc_cpu_policy_t *xc_cpu_policy_init(void);
> diff --git a/tools/libs/guest/xg_private.h b/tools/libs/guest/xg_private.h
> index d73947094f..d1940f1ea4 100644
> --- a/tools/libs/guest/xg_private.h
> +++ b/tools/libs/guest/xg_private.h
> @@ -170,14 +170,4 @@ int pin_table(xc_interface *xch, unsigned int type, unsigned long mfn,
>  #define M2P_SIZE(_m)    ROUNDUP(((_m) * sizeof(xen_pfn_t)), M2P_SHIFT)
>  #define M2P_CHUNKS(_m)  (M2P_SIZE((_m)) >> M2P_SHIFT)
>  
> -#if defined(__x86_64__) || defined(__i386__)
> -#include <xen/lib/x86/cpu-policy.h>
> -
> -struct xc_cpu_policy {
> -    struct cpu_policy policy;
> -    xen_cpuid_leaf_t leaves[CPUID_MAX_SERIALISED_LEAVES];
> -    xen_msr_entry_t msrs[MSR_MAX_SERIALISED_ENTRIES];
> -};
> -#endif /* x86 */
> -
>  #endif /* XG_PRIVATE_H */
> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
> index 14724cedff..65f6335b32 100644
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -85,6 +85,11 @@ unsigned int x86_cpuid_lookup_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx);
>   */
>  const char *x86_cpuid_vendor_to_str(unsigned int vendor);
>  
> +#ifndef __XEN__
> +/* Needed for MAX() */
> +#include <xen-tools/common-macros.h>
> +#endif /* __XEN__ */

I think for this header it is up to the user to provide the required
context, iow: it should be tools/include/xenguest.h to include
xen-tools/common-macros.h ahead of cpu-policy.h.  Otherwise we should
likely do the same for #ifdef __XEN__ branch and include whatever
header that defines MAX().

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs
  2024-01-09 15:38 ` [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs Alejandro Vallejo
@ 2024-03-20  8:52   ` Roger Pau Monné
  2024-03-25 17:00     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-20  8:52 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, Anthony PERARD,
	Juergen Gross

On Tue, Jan 09, 2024 at 03:38:32PM +0000, Alejandro Vallejo wrote:
> As part of topology correction efforts, APIC IDs can no longer be derived
> strictly through the vCPU ID alone. Bring in the machinery for policy
> retrieval and parsing in order to generate the proper MADT table and wake
> the appropriate CPUs.

I'm kind of unsure about this last part of the sentence, as I see no
waking of CPUs in hvmloader.  Is this referring to something else?

I'm kind of unsure about bringing the cpu_policy machinery to
hvmloader.  Won't it be simpler to just pass the array of APIC IDs in
hvm_info_table?  The current size of this struct is 48bytes, and an
array of 128 32bit integers would be an additional 512bytes.

AFAICT there's plenty of room in hvm_info_table, it's
positioned at 0x9f800, and there's unused space up to 0x9fc00, so 1024
bytes of memory we could use.

I know this doesn't give us much room for expansion if we want to bump
past 128 vCPUs, but a more appropriate solution IMO would be to move
ACPI table creation to the toolstack.

It's possible I'm missing some aspects, so if this has been considered
and rejected would be good to note in the commit message.

> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  tools/firmware/hvmloader/Makefile    |  7 +++++++
>  tools/firmware/hvmloader/config.h    |  5 ++++-
>  tools/firmware/hvmloader/hvmloader.c |  6 ++++++
>  tools/firmware/hvmloader/util.c      |  3 ++-
>  tools/libacpi/build.c                | 27 +++++++++++++++++++++------
>  tools/libacpi/libacpi.h              |  5 ++++-
>  tools/libs/light/libxl_x86_acpi.c    | 21 +++++++++++++++++++--
>  7 files changed, 63 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
> index e5de1ade17..503ef2e219 100644
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -37,6 +37,13 @@ ifeq ($(debug),y)
>  OBJS += tests.o
>  endif
>  
> +CFLAGS += -I../../../xen/arch/x86/include/
> +
> +vpath cpuid.c ../../../xen/lib/x86/
> +OBJS += cpuid.o
> +vpath policy.c ../../../xen/lib/x86/
> +OBJS += policy.o
> +
>  CIRRUSVGA_DEBUG ?= n
>  
>  ROMBIOS_DIR := ../rombios
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index c82adf6dc5..3304b63cd8 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -3,10 +3,13 @@
>  
>  #include <stdint.h>
>  #include <stdbool.h>
> +#include <xen/lib/x86/cpu-policy.h>
>  
>  enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
>  extern enum virtual_vga virtual_vga;
>  
> +extern struct cpu_policy cpu_policy;
> +
>  extern unsigned long igd_opregion_pgbase;
>  #define IGD_OPREGION_PAGES 3
>  
> @@ -50,7 +53,7 @@ extern uint8_t ioapic_version;
>  #define IOAPIC_ID           0x01
>  
>  #define LAPIC_BASE_ADDRESS  0xfee00000
> -#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
> +#define LAPIC_ID(vcpu_id) x86_x2apic_id_from_vcpu_id(&cpu_policy, vcpu_id)
>  
>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index c58841e5b5..8874165c66 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -28,6 +28,7 @@
>  #include <acpi2_0.h>
>  #include <xen/version.h>
>  #include <xen/hvm/params.h>
> +#include <xen/lib/x86/cpu-policy.h>
>  #include <xen/arch-x86/hvm/start_info.h>
>  
>  const struct hvm_start_info *hvm_start_info;
> @@ -118,6 +119,9 @@ uint8_t ioapic_version;
>  
>  bool acpi_enabled;
>  
> +/* TODO: Remove after HVM ACPI building makes it to libxl */
> +struct cpu_policy cpu_policy;
> +
>  static void init_hypercalls(void)
>  {
>      uint32_t eax, ebx, ecx, edx;
> @@ -331,6 +335,8 @@ int main(void)
>      printf("HVM Loader\n");
>      BUG_ON(hvm_start_info->magic != XEN_HVM_START_MAGIC_VALUE);
>  
> +    x86_cpu_policy_fill_native(&cpu_policy);
> +
>      init_hypercalls();
>  
>      memory_map_setup();
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index e82047d993..25a0245c5c 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -30,6 +30,7 @@
>  #include <xen/sched.h>
>  #include <xen/hvm/hvm_xs_strings.h>
>  #include <xen/hvm/params.h>
> +#include <xen/lib/x86/cpu-policy.h>
>  
>  /*
>   * Check whether there exists overlap in the specified memory range.
> @@ -1041,7 +1042,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>      ctxt.mem_ops.free = acpi_mem_free;
>      ctxt.mem_ops.v2p = acpi_v2p;
>  
> -    acpi_build_tables(&ctxt, config);
> +    acpi_build_tables(&ctxt, config, &cpu_policy);
>  
>      hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
>  }
> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
> index 2f29863db1..ab05e54c96 100644
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -17,6 +17,7 @@
>  #include <xen/hvm/hvm_xs_strings.h>
>  #include <xen/hvm/params.h>
>  #include <xen/memory.h>
> +#include <xen/lib/x86/cpu-policy.h>
>  
>  #define ACPI_MAX_SECONDARY_TABLES 16
>  
> @@ -65,7 +66,8 @@ static void set_checksum(
>  
>  static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
>                                             const struct acpi_config *config,
> -                                           struct acpi_info *info)
> +                                           struct acpi_info *info,
> +                                           const struct cpu_policy *policy)
>  {
>      struct acpi_20_madt           *madt;
>      struct acpi_20_madt_intsrcovr *intsrcovr;
> @@ -143,14 +145,25 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
>      info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
>      for ( i = 0; i < hvminfo->nr_vcpus; i++ )
>      {
> +        uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(policy, i);
> +
>          memset(lapic, 0, sizeof(*lapic));
>          lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
>          lapic->length  = sizeof(*lapic);
>          /* Processor ID must match processor-object IDs in the DSDT. */
>          lapic->acpi_processor_id = i;
> -        lapic->apic_id = config->lapic_id(i);
> +        lapic->apic_id = x2apic_id;
>          lapic->flags = (test_bit(i, hvminfo->vcpu_online)
>                          ? ACPI_LOCAL_APIC_ENABLED : 0);
> +
> +        /*
> +         * Error-out if the x2APIC ID doesn't fit in the entry
> +         *
> +         * TODO: Use "x2apic" entries if biggest x2apic_id > 254

Nit: while it's true that we could fit bigger IDs by using x2apic
entries, there's a bit more to it.  Most OSes will refuse to use APIC
IDs >= 255 unless they have a way to target those CPUs as interrupt
destinations, either with interrupt remapping or with the extended
destination ID workaround [0].

[0] https://gitlab.com/xen-project/xen/-/issues/173

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-03-19 16:20   ` Roger Pau Monné
@ 2024-03-20  9:33     ` Roger Pau Monné
  2024-03-25 16:56       ` Alejandro Vallejo
  2024-03-25 15:44     ` Alejandro Vallejo
  1 sibling, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-20  9:33 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu

On Tue, Mar 19, 2024 at 05:20:35PM +0100, Roger Pau Monné wrote:
> On Tue, Jan 09, 2024 at 03:38:29PM +0000, Alejandro Vallejo wrote:
> > This allows the initial x2APIC ID to be sent on the migration stream. The
> > hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
> > Given the vlapic data is zero-extended on restore, fix up migrations from
> > hosts without the field by setting it to the old convention if zero.
> > 
> > x2APIC IDs are calculated from the CPU policy where the guest topology is
> > defined. For the time being, the function simply returns the old
> > relationship, but will eventually return results consistent with the
> > topology.
> > 
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> > ---
> >  xen/arch/x86/cpuid.c                   | 20 ++++---------------
> >  xen/arch/x86/domain.c                  |  3 +++
> >  xen/arch/x86/hvm/vlapic.c              | 27 ++++++++++++++++++++++++--
> >  xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
> >  xen/include/public/arch-x86/hvm/save.h |  2 ++
> >  xen/include/xen/lib/x86/cpu-policy.h   |  9 +++++++++
> >  xen/lib/x86/policy.c                   | 11 +++++++++++
> >  7 files changed, 56 insertions(+), 18 deletions(-)
> > 
> > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> > index 7290a979c6..6e259785d0 100644
> > --- a/xen/arch/x86/cpuid.c
> > +++ b/xen/arch/x86/cpuid.c
> > @@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
> >          const struct cpu_user_regs *regs;
> >  
> >      case 0x1:
> > -        /* TODO: Rework topology logic. */
> >          res->b &= 0x00ffffffu;
> >          if ( is_hvm_domain(d) )
> > -            res->b |= (v->vcpu_id * 2) << 24;
> > +            res->b |= SET_xAPIC_ID(vlapic_x2apic_id(vcpu_vlapic(v)));
> 
> SET_xAPIC_ID() was intended to be used with the APIC_ID register,
> which also shifts the ID.  Not sure it's logically correct to use
> here, even if functionally equivalent (as is shifts left by 24).
> 
> >  
> >          /* TODO: Rework vPMU control in terms of toolstack choices. */
> >          if ( vpmu_available(v) &&
> > @@ -311,20 +310,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
> >          break;
> >  
> >      case 0xb:
> > -        /*
> > -         * In principle, this leaf is Intel-only.  In practice, it is tightly
> > -         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
> > -         * to guests on AMD hardware as well.
> > -         *
> > -         * TODO: Rework topology logic.
> > -         */
> > -        if ( p->basic.x2apic )
> > -        {
> > -            *(uint8_t *)&res->c = subleaf;
> > -
> > -            /* Fix the x2APIC identifier. */
> > -            res->d = v->vcpu_id * 2;
> > -        }
> > +        /* ecx != 0 if the subleaf is implemented */
> > +        if ( res->c && p->basic.x2apic )
> > +            res->d = vlapic_x2apic_id(vcpu_vlapic(v));
> 
> This needs to be protected so it's reachable by HVM guests only,
> otherwise you will dereference v->arch.hvm.vlapic on a PV vCPU if it
> happens to have p->basic.x2apic set.
> 
> Why not just return the x2apic_id field from the cpu_policy object?
> (topo.subleaf[X].x2apic_id)

Scratch that, the cpu policy is per-domain, not per-vcpu, and hence
cannot hold the x{,2}apic IDs.

> Also, I'm not sure I get why the setting of res->d is gated on res->c
> != 0, won't res->c be 0 when the guest %ecx is 0, yet %edx must be
> valid for all %ecx inputs, the SDM states:
> 
> "The EDX output of leaf 0BH is always valid and does not vary with
> input value in ECX."
> 
> I think you need to keep the previous logic that doesn't gate setting
> ->d on anything other than p->basic.x2apic.
> 
> >          break;
> >  
> >      case XSTATE_CPUID:
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index 8a31d18f69..e0c7ed8d5d 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -288,7 +288,10 @@ void update_guest_memory_policy(struct vcpu *v,
> >  static void cpu_policy_updated(struct vcpu *v)
> >  {
> >      if ( is_hvm_vcpu(v) )
> > +    {
> >          hvm_cpuid_policy_changed(v);
> > +        vlapic_cpu_policy_changed(v);
> > +    }
> >  }
> >  
> >  void domain_cpu_policy_changed(struct domain *d)
> > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > index cdb69d9742..f500d66543 100644
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
> >  static void set_x2apic_id(struct vlapic *vlapic)
> >  {
> >      const struct vcpu *v = vlapic_vcpu(vlapic);
> > -    uint32_t apic_id = v->vcpu_id * 2;
> > +    uint32_t apic_id = vlapic->hw.x2apic_id;
> >      uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
> >  
> >      /*
> > @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
> >      vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
> >  }
> >  
> > +void vlapic_cpu_policy_changed(struct vcpu *v)
> > +{
> > +    struct vlapic *vlapic = vcpu_vlapic(v);
> > +    struct cpu_policy *cp = v->domain->arch.cpu_policy;
> > +
> > +    /*
> > +     * Don't override the initial x2APIC ID if we have migrated it or
> > +     * if the domain doesn't have vLAPIC at all.
> > +     */
> > +    if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
> > +        return;
> > +
> > +    vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
> > +    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> > +}
> > +
> >  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> >  {
> >      const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> > @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
> >      if ( v->vcpu_id == 0 )
> >          vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
> >  
> > -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> > +    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> >      vlapic_do_init(vlapic);
> >  }
> >  
> > @@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> >      const struct vcpu *v = vlapic_vcpu(vlapic);
> >      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
> >  
> > +    /*
> > +     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
> > +     * mappings. Recreate the mapping it used to have in old host.
> 
> Wouldn't it be more appropriate to state "Loading record without
> hw.x2apic_id in the save stream, calculate using the vcpu_id * 2
> relation" or some such.
> 
> Current comment makes it looks like the guest has some kind of
> restriction with this relation, but that's just an internal Xen
> limitation.
> 
> > +     */
> > +    if ( !vlapic->hw.x2apic_id )
> > +        vlapic->hw.x2apic_id = v->vcpu_id * 2;
> > +
> >      /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
> >      if ( !vlapic_x2apic_mode(vlapic) ||
> >           (vlapic->loaded.ldr == good_ldr) )
> > diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
> > index 88ef945243..e8d41313ab 100644
> > --- a/xen/arch/x86/include/asm/hvm/vlapic.h
> > +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
> > @@ -44,6 +44,7 @@
> >  #define vlapic_xapic_mode(vlapic)                               \
> >      (!vlapic_hw_disabled(vlapic) && \
> >       !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
> > +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
> >  
> >  /*
> >   * Generic APIC bitmap vector update & search routines.
> > @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
> >  
> >  int  vlapic_init(struct vcpu *v);
> >  void vlapic_destroy(struct vcpu *v);
> > +void vlapic_cpu_policy_changed(struct vcpu *v);
> >  
> >  void vlapic_reset(struct vlapic *vlapic);
> >  
> > diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> > index 7ecacadde1..1c2ec669ff 100644
> > --- a/xen/include/public/arch-x86/hvm/save.h
> > +++ b/xen/include/public/arch-x86/hvm/save.h
> > @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
> >      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
> >      uint32_t             timer_divisor;
> >      uint64_t             tdt_msr;
> > +    uint32_t             x2apic_id;
> > +    uint32_t             rsvd_zero;
> 
> Do we really to add a new field, couldn't we get the lapic IDs from
> the cpu_policy?

Since getting from the cpu_policy is not possible, what about getting
it from the registers itself?  It's already present in hvm_hw_lapic_regs.

Regards, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-01-09 15:38 ` [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
@ 2024-03-20 10:15   ` Roger Pau Monné
  2024-05-01 16:05     ` Alejandro Vallejo
  2024-03-26 16:41   ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-20 10:15 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, Anthony PERARD

On Tue, Jan 09, 2024 at 03:38:33PM +0000, Alejandro Vallejo wrote:
> Implements the helper for mapping vcpu_id to x2apic_id given a valid
> topology in a policy. The algo is written with the intention of extending
> it to leaves 0x1f and e26 in the future.
> 
> Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
> so the leaf is not implemented. In that case, the new helper just returns
> the legacy mapping.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  tools/tests/cpu-policy/test-cpu-policy.c | 128 +++++++++++++++++++++++
>  xen/include/xen/lib/x86/cpu-policy.h     |   2 +
>  xen/lib/x86/policy.c                     |  75 +++++++++++--
>  3 files changed, 199 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
> index 301df2c002..6ff5c1dd3d 100644
> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> @@ -650,6 +650,132 @@ static void test_is_compatible_failure(void)
>      }
>  }
>  
> +static void test_x2apic_id_from_vcpu_id_success(void)
> +{
> +    static struct test {

const

> +        const char *name;
> +        uint32_t vcpu_id;
> +        uint32_t x2apic_id;
> +        struct cpu_policy policy;
> +    } tests[] = {
> +        {
> +            .name = "3v: 3 t/c, 8 c/s", .vcpu_id = 3, .x2apic_id = 1 << 2,
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_AMD,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
> +                },

Don't we need a helper that gets passed the number of cores per
socket and threads per core and fills this up?

I would introduce this first, add a test for it, and then do this
testing using the helper.

> +            },
> +        },
> +        {
> +            .name = "6v: 3 t/c, 8 c/s", .vcpu_id = 6, .x2apic_id = 2 << 2,
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_AMD,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "24v: 3 t/c, 8 c/s", .vcpu_id = 24, .x2apic_id = 1 << 5,
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_AMD,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "35v: 3 t/c, 8 c/s", .vcpu_id = 35,
> +            .x2apic_id = (35 % 3) | (((35 / 3) % 8)  << 2) | ((35 / 24) << 5),
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_AMD,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "96v: 7 t/c, 3 c/s", .vcpu_id = 96,
> +            .x2apic_id = (96 % 7) | (((96 / 7) % 3)  << 3) | ((96 / 21) << 5),
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_AMD,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
> +                    [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "3v: 3 t/c, 8 c/s", .vcpu_id = 3, .x2apic_id = 1 << 2,
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_INTEL,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "6v: 3 t/c, 8 c/s", .vcpu_id = 6, .x2apic_id = 2 << 2,
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_INTEL,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "24v: 3 t/c, 8 c/s", .vcpu_id = 24, .x2apic_id = 1 << 5,
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_INTEL,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "35v: 3 t/c, 8 c/s", .vcpu_id = 35,
> +            .x2apic_id = (35 % 3) | (((35 / 3) % 8)  << 2) | ((35 / 24) << 5),
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_INTEL,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 3,  .level = 0, .type = 1, .id_shift = 2, },
> +                    [1] = { .nr_logical = 24, .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +        {
> +            .name = "96v: 7 t/c, 3 c/s", .vcpu_id = 96,
> +            .x2apic_id = (96 % 7) | (((96 / 7) % 3)  << 3) | ((96 / 21) << 5),
> +            .policy = {
> +                .x86_vendor = X86_VENDOR_INTEL,
> +                .topo.subleaf = {
> +                    [0] = { .nr_logical = 7,   .level = 0, .type = 1, .id_shift = 3, },
> +                    [1] = { .nr_logical = 21,  .level = 1, .type = 2, .id_shift = 5, },
> +                },
> +            },
> +        },
> +    };
> +
> +    printf("Testing x2apic id from vcpu id success:\n");
> +
> +    for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
> +    {
> +        struct test *t = &tests[i];

const

> +        uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(&t->policy, t->vcpu_id);

Newline preferably.

> +        if ( x2apic_id != t->x2apic_id )
> +            fail("FAIL - '%s'. bad x2apic_id: expected=%u actual=%u\n",
> +                 t->name, t->x2apic_id, x2apic_id);
> +    }
> +}
> +
>  int main(int argc, char **argv)
>  {
>      printf("CPU Policy unit tests\n");
> @@ -667,6 +793,8 @@ int main(int argc, char **argv)
>      test_is_compatible_success();
>      test_is_compatible_failure();
>  
> +    test_x2apic_id_from_vcpu_id_success();
> +
>      if ( nr_failures )
>          printf("Done: %u failures\n", nr_failures);
>      else
> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
> index 65f6335b32..d81ae2f47c 100644
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -550,6 +550,8 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>  /**
>   * Calculates the x2APIC ID of a vCPU given a CPU policy
>   *
> + * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2
> + *
>   * @param p          CPU policy of the domain.
>   * @param vcpu_id    vCPU ID of the vCPU.
>   * @returns x2APIC ID of the vCPU.
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index a3b24e6879..2a50bc076a 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,15 +2,78 @@
>  
>  #include <xen/lib/x86/cpu-policy.h>
>  
> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>  {
>      /*
> -     * TODO: Derive x2APIC ID from the topology information inside `p`
> -     *       rather than from vCPU ID. This bodge is a temporary measure
> -     *       until all infra is in place to retrieve or derive the initial
> -     *       x2APIC ID from migrated domains.
> +     * `nr_logical` reported by Intel is the number of THREADS contained in
> +     * the next topological scope. For example, assuming a system with 2
> +     * threads/core and 3 cores/module in a fully symmetric topology,
> +     * `nr_logical` at the core level will report 6. Because it's reporting
> +     * the number of threads in a module.
> +     *
> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
> +     * level (cores/complex, etc) so we can return it as-is.
>       */
> -    return vcpu_id * 2;
> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
> +        return p->topo.subleaf[lvl].nr_logical;
> +
> +    return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;

I think this is an optimization because we only have 2 levels,
otherwise you would need a loop like:

unsigned int nr = p->topo.subleaf[lvl].nr_logical
for ( unsigned int i; i < lvl; i++ )
    nr /= p->topo.subleaf[i].nr_logical;

If that's the case I think we need a
BUILD_BUG_ON(ARRAY_SIZE(p->topo.subleaf) > 1);

> +}
> +
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)

Can you keep the previous vcpu_id parameter name?  Or alternatively
adjust the prototype to also be id.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xen/x86: Add topology generator
  2024-01-09 15:38 ` [PATCH 6/6] xen/x86: Add topology generator Alejandro Vallejo
  2024-01-10 14:16   ` Alejandro Vallejo
@ 2024-03-20 10:29   ` Roger Pau Monné
  2024-03-26 17:02   ` Jan Beulich
  2 siblings, 0 replies; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-20 10:29 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Wei Liu, Anthony PERARD, Juergen Gross, Jan Beulich,
	Andrew Cooper

On Tue, Jan 09, 2024 at 03:38:34PM +0000, Alejandro Vallejo wrote:
> This allows toolstack to synthesise sensible topologies for guests. In
> particular, this patch causes x2APIC IDs to be packed according to the
> topology now exposed to the guests on leaf 0xb.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  tools/include/xenguest.h        |  15 ++++
>  tools/libs/guest/xg_cpuid_x86.c | 144 ++++++++++++++++++++------------
>  xen/arch/x86/cpu-policy.c       |   6 +-
>  3 files changed, 107 insertions(+), 58 deletions(-)
> 
> diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
> index 4e9078fdee..f0043c559b 100644
> --- a/tools/include/xenguest.h
> +++ b/tools/include/xenguest.h
> @@ -843,5 +843,20 @@ enum xc_static_cpu_featuremask {
>      XC_FEATUREMASK_HVM_HAP_DEF,
>  };
>  const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
> +
> +/**
> + * Synthesise topology information in `p` given high-level constraints
> + *
> + * Topology is given in various fields accross several leaves, some of
> + * which are vendor-specific. This function uses the policy itself to
> + * derive such leaves from threads/core and cores/package.
> + *
> + * @param p                   CPU policy of the domain.
> + * @param threads_per_core    threads/core. Doesn't need to be a power of 2.
> + * @param cores_per_package   cores/package. Doesn't need to be a power of 2.
> + */
> +void xc_topo_from_parts(struct cpu_policy *p,
> +                        uint32_t threads_per_core, uint32_t cores_per_pkg);

I think you can use plain unsigned int here.

> +
>  #endif /* __i386__ || __x86_64__ */
>  #endif /* XENGUEST_H */
> diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
> index 4453178100..7a622721be 100644
> --- a/tools/libs/guest/xg_cpuid_x86.c
> +++ b/tools/libs/guest/xg_cpuid_x86.c
> @@ -584,7 +584,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>      bool hvm;
>      xc_domaininfo_t di;
>      struct xc_cpu_policy *p = xc_cpu_policy_init();
> -    unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
> +    unsigned int nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
>      uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>      uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
>      uint32_t len = ARRAY_SIZE(host_featureset);
> @@ -727,60 +727,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>      }
>      else
>      {
> -        /*
> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
> -         */
> -        p->policy.basic.htt = true;
> -        p->policy.extd.cmp_legacy = false;
> -
> -        /*
> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
> -         * overflow.
> -         */
> -        if ( !p->policy.basic.lppp )
> -            p->policy.basic.lppp = 2;
> -        else if ( !(p->policy.basic.lppp & 0x80) )
> -            p->policy.basic.lppp *= 2;
> -
> -        switch ( p->policy.x86_vendor )
> -        {
> -        case X86_VENDOR_INTEL:
> -            for ( i = 0; (p->policy.cache.subleaf[i].type &&
> -                          i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
> -            {
> -                p->policy.cache.subleaf[i].cores_per_package =
> -                    (p->policy.cache.subleaf[i].cores_per_package << 1) | 1;
> -                p->policy.cache.subleaf[i].threads_per_cache = 0;
> -            }
> -            break;
> -
> -        case X86_VENDOR_AMD:
> -        case X86_VENDOR_HYGON:
> -            /*
> -             * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
> -             * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
> -             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
> -             * - overflow,
> -             * - going out of sync with leaf 1 EBX[23:16],
> -             * - incrementing ApicIdCoreSize when it's zero (which changes the
> -             *   meaning of bits 7:0).
> -             *
> -             * UPDATE: I addition to avoiding overflow, some
> -             * proprietary operating systems have trouble with
> -             * apic_id_size values greater than 7.  Limit the value to
> -             * 7 for now.
> -             */
> -            if ( p->policy.extd.nc < 0x7f )
> -            {
> -                if ( p->policy.extd.apic_id_size != 0 && p->policy.extd.apic_id_size < 0x7 )
> -                    p->policy.extd.apic_id_size++;
> -
> -                p->policy.extd.nc = (p->policy.extd.nc << 1) | 1;
> -            }
> -            break;
> -        }
> +        /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
> +        xc_topo_from_parts(&p->policy, 1, di.max_vcpu_id + 1);

It would be better if this was split into two commits.  First commit
would move the code as-is into xc_topo_from_parts() without any
changes.  Second patch would do the functional changes.  That was it's
far easier to spot what are the relevant changes vs pure code
movement.

>      }
>  
>      nr_leaves = ARRAY_SIZE(p->leaves);
> @@ -1028,3 +976,89 @@ bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
>  
>      return false;
>  }
> +
> +static uint32_t order(uint32_t n)
> +{
> +    return 32 - __builtin_clz(n);
> +}
> +
> +void xc_topo_from_parts(struct cpu_policy *p,
> +                        uint32_t threads_per_core, uint32_t cores_per_pkg)
> +{
> +    uint32_t threads_per_pkg = threads_per_core * cores_per_pkg;
> +    uint32_t apic_id_size;
> +
> +    if ( p->basic.max_leaf < 0xb )
> +        p->basic.max_leaf = 0xb;
> +
> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
> +
> +    /* thread level */
> +    p->topo.subleaf[0].nr_logical = threads_per_core;
> +    p->topo.subleaf[0].id_shift = 0;
> +    p->topo.subleaf[0].level = 0;
> +    p->topo.subleaf[0].type = 1;
> +    if ( threads_per_core > 1 )
> +        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
> +
> +    /* core level */
> +    p->topo.subleaf[1].nr_logical = cores_per_pkg;
> +    if ( p->x86_vendor == X86_VENDOR_INTEL )
> +        p->topo.subleaf[1].nr_logical = threads_per_pkg;
> +    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
> +    p->topo.subleaf[1].level = 1;
> +    p->topo.subleaf[1].type = 2;
> +    if ( cores_per_pkg > 1 )
> +        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);

I was kind of expecting this to be part of cpu-policy rather than xc,
as we could then also test this like you do test
x86_x2apic_id_from_vcpu_id().

It could also be used to populate the topologies in the tests
themselves.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-03-19 16:20   ` Roger Pau Monné
  2024-03-20  9:33     ` Roger Pau Monné
@ 2024-03-25 15:44     ` Alejandro Vallejo
  1 sibling, 0 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-03-25 15:44 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu

Hi,

I'll answer the policy-related feedback in the next email, as you
corrected yourself for that.

On 19/03/2024 16:20, Roger Pau Monné wrote:
> On Tue, Jan 09, 2024 at 03:38:29PM +0000, Alejandro Vallejo wrote:
>> This allows the initial x2APIC ID to be sent on the migration stream. The
>> hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
>> Given the vlapic data is zero-extended on restore, fix up migrations from
>> hosts without the field by setting it to the old convention if zero.
>>
>> x2APIC IDs are calculated from the CPU policy where the guest topology is
>> defined. For the time being, the function simply returns the old
>> relationship, but will eventually return results consistent with the
>> topology.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>>  xen/arch/x86/cpuid.c                   | 20 ++++---------------
>>  xen/arch/x86/domain.c                  |  3 +++
>>  xen/arch/x86/hvm/vlapic.c              | 27 ++++++++++++++++++++++++--
>>  xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
>>  xen/include/public/arch-x86/hvm/save.h |  2 ++
>>  xen/include/xen/lib/x86/cpu-policy.h   |  9 +++++++++
>>  xen/lib/x86/policy.c                   | 11 +++++++++++
>>  7 files changed, 56 insertions(+), 18 deletions(-)
>>
>> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
>> index 7290a979c6..6e259785d0 100644
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>>          const struct cpu_user_regs *regs;
>>  
>>      case 0x1:
>> -        /* TODO: Rework topology logic. */
>>          res->b &= 0x00ffffffu;
>>          if ( is_hvm_domain(d) )
>> -            res->b |= (v->vcpu_id * 2) << 24;
>> +            res->b |= SET_xAPIC_ID(vlapic_x2apic_id(vcpu_vlapic(v)));
> 
> SET_xAPIC_ID() was intended to be used with the APIC_ID register,
> which also shifts the ID.  Not sure it's logically correct to use
> here, even if functionally equivalent (as is shifts left by 24).
Ack.

> 
>>  
>>          /* TODO: Rework vPMU control in terms of toolstack choices. */
>>          if ( vpmu_available(v) &&
>> @@ -311,20 +310,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>>          break;
>>  
>>      case 0xb:
>> -        /*
>> -         * In principle, this leaf is Intel-only.  In practice, it is tightly
>> -         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
>> -         * to guests on AMD hardware as well.
>> -         *
>> -         * TODO: Rework topology logic.
>> -         */
>> -        if ( p->basic.x2apic )
>> -        {
>> -            *(uint8_t *)&res->c = subleaf;
>> -
>> -            /* Fix the x2APIC identifier. */
>> -            res->d = v->vcpu_id * 2;
>> -        }
>> +        /* ecx != 0 if the subleaf is implemented */
>> +        if ( res->c && p->basic.x2apic )
>> +            res->d = vlapic_x2apic_id(vcpu_vlapic(v));
> 
> This needs to be protected so it's reachable by HVM guests only,
> otherwise you will dereference v->arch.hvm.vlapic on a PV vCPU if it
> happens to have p->basic.x2apic set.
Very true. Ack.

> 
> Why not just return the x2apic_id field from the cpu_policy object?
> (topo.subleaf[X].x2apic_id)
> 
> Also, I'm not sure I get why the setting of res->d is gated on res->c
> != 0, won't res->c be 0 when the guest %ecx is 0, yet %edx must be
> valid for all %ecx inputs, the SDM states:
> 
> "The EDX output of leaf 0BH is always valid and does not vary with
> input value in ECX."
> 
> I think you need to keep the previous logic that doesn't gate setting
> ->d on anything other than p->basic.x2apic.
> 

Ack.

Real HW says you're right. Oddly enough that quote is (AFAICS) for leaf
1FH, but it appears to also hold for 0BH.


>>          break;
>>  
>>      case XSTATE_CPUID:
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 8a31d18f69..e0c7ed8d5d 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -288,7 +288,10 @@ void update_guest_memory_policy(struct vcpu *v,
>>  static void cpu_policy_updated(struct vcpu *v)
>>  {
>>      if ( is_hvm_vcpu(v) )
>> +    {
>>          hvm_cpuid_policy_changed(v);
>> +        vlapic_cpu_policy_changed(v);
>> +    }
>>  }
>>  
>>  void domain_cpu_policy_changed(struct domain *d)
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index cdb69d9742..f500d66543 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
>>  static void set_x2apic_id(struct vlapic *vlapic)
>>  {
>>      const struct vcpu *v = vlapic_vcpu(vlapic);
>> -    uint32_t apic_id = v->vcpu_id * 2;
>> +    uint32_t apic_id = vlapic->hw.x2apic_id;
>>      uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
>>  
>>      /*
>> @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
>>      vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>>  }
>>  
>> +void vlapic_cpu_policy_changed(struct vcpu *v)
>> +{
>> +    struct vlapic *vlapic = vcpu_vlapic(v);
>> +    struct cpu_policy *cp = v->domain->arch.cpu_policy;
>> +
>> +    /*
>> +     * Don't override the initial x2APIC ID if we have migrated it or
>> +     * if the domain doesn't have vLAPIC at all.
>> +     */
>> +    if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
>> +        return;
>> +
>> +    vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
>> +    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
>> +}
>> +
>>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>>  {
>>      const struct cpu_policy *cp = v->domain->arch.cpu_policy;
>> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
>>      if ( v->vcpu_id == 0 )
>>          vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>>  
>> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
>> +    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
>>      vlapic_do_init(vlapic);
>>  }
>>  
>> @@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>>      const struct vcpu *v = vlapic_vcpu(vlapic);
>>      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>  
>> +    /*
>> +     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
>> +     * mappings. Recreate the mapping it used to have in old host.
> 
> Wouldn't it be more appropriate to state "Loading record without
> hw.x2apic_id in the save stream, calculate using the vcpu_id * 2
> relation" or some such.>
> Current comment makes it looks like the guest has some kind of
> restriction with this relation, but that's just an internal Xen
> limitation.

Sure.

> 
>> +     */
>> +    if ( !vlapic->hw.x2apic_id )
>> +        vlapic->hw.x2apic_id = v->vcpu_id * 2;
>> +
>>      /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>>      if ( !vlapic_x2apic_mode(vlapic) ||
>>           (vlapic->loaded.ldr == good_ldr) )
>> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
>> index 88ef945243..e8d41313ab 100644
>> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
>> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
>> @@ -44,6 +44,7 @@
>>  #define vlapic_xapic_mode(vlapic)                               \
>>      (!vlapic_hw_disabled(vlapic) && \
>>       !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
>> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
>>  
>>  /*
>>   * Generic APIC bitmap vector update & search routines.
>> @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
>>  
>>  int  vlapic_init(struct vcpu *v);
>>  void vlapic_destroy(struct vcpu *v);
>> +void vlapic_cpu_policy_changed(struct vcpu *v);
>>  
>>  void vlapic_reset(struct vlapic *vlapic);
>>  
>> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
>> index 7ecacadde1..1c2ec669ff 100644
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>>      uint32_t             timer_divisor;
>>      uint64_t             tdt_msr;
>> +    uint32_t             x2apic_id;
>> +    uint32_t             rsvd_zero;
> 
> Do we really to add a new field, couldn't we get the lapic IDs from
> the cpu_policy>
>>  };
>>  
>>  DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
>> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
>> index d5e447e9dc..14724cedff 100644
>> --- a/xen/include/xen/lib/x86/cpu-policy.h
>> +++ b/xen/include/xen/lib/x86/cpu-policy.h
>> @@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>>                                      const struct cpu_policy *guest,
>>                                      struct cpu_policy_errors *err);
>>  
>> +/**
>> + * Calculates the x2APIC ID of a vCPU given a CPU policy
>> + *
>> + * @param p          CPU policy of the domain.
>> + * @param vcpu_id    vCPU ID of the vCPU.
>> + * @returns x2APIC ID of the vCPU.
>> + */
>> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id);
>> +
>>  #endif /* !XEN_LIB_X86_POLICIES_H */
>>  
>>  /*
>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index f033d22785..a3b24e6879 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,6 +2,17 @@
>>  
>>  #include <xen/lib/x86/cpu-policy.h>
>>  
>> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
>> +{
>> +    /*
>> +     * TODO: Derive x2APIC ID from the topology information inside `p`
>> +     *       rather than from vCPU ID. This bodge is a temporary measure
>> +     *       until all infra is in place to retrieve or derive the initial
>> +     *       x2APIC ID from migrated domains.
>> +     */
>> +    return vcpu_id * 2;
> 
> As noted above, won't a suitable initial step would be to populate the
> apic_id and x2apic_id fields in struct cpu_policy with this relation
> (x{,2}apic_id == vcpu_id * 2), and avoid this extra handler?
> 
> Thanks, Roger.

Cheers,
Alejandro



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-01-09 15:38 ` [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
  2024-03-19 16:20   ` Roger Pau Monné
@ 2024-03-25 16:45   ` Jan Beulich
  2024-03-25 18:00     ` Alejandro Vallejo
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-25 16:45 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel

On 09.01.2024 16:38, Alejandro Vallejo wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -288,7 +288,10 @@ void update_guest_memory_policy(struct vcpu *v,
>  static void cpu_policy_updated(struct vcpu *v)
>  {
>      if ( is_hvm_vcpu(v) )
> +    {
>          hvm_cpuid_policy_changed(v);
> +        vlapic_cpu_policy_changed(v);
> +    }
>  }

This is a layering violation imo; hvm_cpuid_policy_changed() wants
to call vlapic_cpu_policy_changed().

> @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
>      vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>  }
>  
> +void vlapic_cpu_policy_changed(struct vcpu *v)
> +{
> +    struct vlapic *vlapic = vcpu_vlapic(v);
> +    struct cpu_policy *cp = v->domain->arch.cpu_policy;

const please

> @@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>      const struct vcpu *v = vlapic_vcpu(vlapic);
>      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>  
> +    /*
> +     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
> +     * mappings. Recreate the mapping it used to have in old host.
> +     */
> +    if ( !vlapic->hw.x2apic_id )
> +        vlapic->hw.x2apic_id = v->vcpu_id * 2;

This looks to depend upon it only ever being vCPU which may get a (new
style) APIC ID of 0. I think such at least wants mentioning in the
comment.

> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>      uint32_t             timer_divisor;
>      uint64_t             tdt_msr;
> +    uint32_t             x2apic_id;
> +    uint32_t             rsvd_zero;
>  };

I can't spot any checking of this last field indeed being zero.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader
  2024-01-09 15:38 ` [PATCH 3/6] xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader Alejandro Vallejo
@ 2024-03-25 16:55   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-25 16:55 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel

On 09.01.2024 16:38, Alejandro Vallejo wrote:
> A future patch will use these in hvmloader, which is freestanding, but
> lacks the Xen code. The following changes fix the compilation errors.
> 
> * string.h => Remove memset() usages and bzero through assignments

But hvmloader does have memset(). It's just that it doesn't have string.h.
Yet it doesn't have e.g. stdint.h either.

> * inttypes.h => Use stdint.h (it's what it should've been to begin with)
> * errno.h => Use xen/errno.h instead
> 
> No functional change intended.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  xen/lib/x86/cpuid.c   | 12 ++++++------
>  xen/lib/x86/private.h |  8 +++++---
>  2 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/lib/x86/cpuid.c b/xen/lib/x86/cpuid.c
> index eb7698dc73..4a138c3a11 100644
> --- a/xen/lib/x86/cpuid.c
> +++ b/xen/lib/x86/cpuid.c
> @@ -5,8 +5,8 @@
>  static void zero_leaves(struct cpuid_leaf *l,
>                          unsigned int first, unsigned int last)
>  {
> -    if ( first <= last )
> -        memset(&l[first], 0, sizeof(*l) * (last - first + 1));
> +    for (l = &l[first]; first <= last; first++, l++ )
> +        *l = (struct cpuid_leaf){};
>  }

Even if memset() was to be replaced here, we already have two instances
of an EMPTY_LEAF #define. Those will want moving to a single header and
hen using here, I expect. Presumably code further down could then use it,
too.

> --- a/xen/lib/x86/private.h
> +++ b/xen/lib/x86/private.h
> @@ -17,12 +17,14 @@
>  
>  #else
>  
> -#include <errno.h>
> -#include <inttypes.h>
> +#include <stdint.h>
>  #include <stdbool.h>
>  #include <stddef.h>
> -#include <string.h>
>  
> +enum {
> +#define XEN_ERRNO(ident, rc) ident = (rc),
> +#include <xen/errno.h>
> +};
>  #include <xen/asm/msr-index.h>
>  #include <xen/asm/x86-vendors.h>

As to the comment at the top - we're in a block of code conditional upon
!Xen here already. Would there be anything wrong with simply recognizing
hvmloader here, too, and then including its util.h in place of string.h?

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-03-20  9:33     ` Roger Pau Monné
@ 2024-03-25 16:56       ` Alejandro Vallejo
  0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-03-25 16:56 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu

On 20/03/2024 09:33, Roger Pau Monné wrote:
>>
>> Why not just return the x2apic_id field from the cpu_policy object?
>> (topo.subleaf[X].x2apic_id)
> 
> Scratch that, the cpu policy is per-domain, not per-vcpu, and hence
> cannot hold the x{,2}apic IDs.
Yes, that :)

Originally I tried to make the policy per-vCPU, tbf, but that's a very,
very deep and complicated hole that was very hard to deal with and I'm
not tempted to try that again.

>>> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
>>> index 7ecacadde1..1c2ec669ff 100644
>>> --- a/xen/include/public/arch-x86/hvm/save.h
>>> +++ b/xen/include/public/arch-x86/hvm/save.h
>>> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>>>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>>>      uint32_t             timer_divisor;
>>>      uint64_t             tdt_msr;
>>> +    uint32_t             x2apic_id;
>>> +    uint32_t             rsvd_zero;
>>
>> Do we really to add a new field, couldn't we get the lapic IDs from
>> the cpu_policy?
> 
> Since getting from the cpu_policy is not possible, what about getting
> it from the registers itself?  It's already present in hvm_hw_lapic_regs.
> 
> Regards, Roger.

If every APIC is in x2APIC mode, yes. But if any of them is in xAPIC
mode there's problems. The xAPIC ID is overridable by the guest simply
by writing into it, so it's tricky to know whether it's sane or not.
This new field is effectively an immutable "initial APIC ID", which we
can recreate to the old convention when not present in the stream.

If you can think of an alternative that doesn't involve adding a field
or fixing in stone the mapping strategy I'm happy to do that instead,
but I think this is the lesser evil.

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs
  2024-03-20  8:52   ` Roger Pau Monné
@ 2024-03-25 17:00     ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-25 17:00 UTC (permalink / raw)
  To: Roger Pau Monné, Alejandro Vallejo
  Cc: Xen-devel, Andrew Cooper, Wei Liu, Anthony PERARD, Juergen Gross

On 20.03.2024 09:52, Roger Pau Monné wrote:
> On Tue, Jan 09, 2024 at 03:38:32PM +0000, Alejandro Vallejo wrote:
>> As part of topology correction efforts, APIC IDs can no longer be derived
>> strictly through the vCPU ID alone. Bring in the machinery for policy
>> retrieval and parsing in order to generate the proper MADT table and wake
>> the appropriate CPUs.
> 
> I'm kind of unsure about this last part of the sentence, as I see no
> waking of CPUs in hvmloader.  Is this referring to something else?
> 
> I'm kind of unsure about bringing the cpu_policy machinery to
> hvmloader.

I share this concern and ...

>  Won't it be simpler to just pass the array of APIC IDs in
> hvm_info_table?  The current size of this struct is 48bytes, and an
> array of 128 32bit integers would be an additional 512bytes.
> 
> AFAICT there's plenty of room in hvm_info_table, it's
> positioned at 0x9f800, and there's unused space up to 0x9fc00, so 1024
> bytes of memory we could use.
> 
> I know this doesn't give us much room for expansion if we want to bump
> past 128 vCPUs, but a more appropriate solution IMO would be to move
> ACPI table creation to the toolstack.
> 
> It's possible I'm missing some aspects, so if this has been considered
> and rejected would be good to note in the commit message.

... it being at least clarified whether alternatives were considered
and why they cannot / should not be used.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-03-25 16:45   ` Jan Beulich
@ 2024-03-25 18:00     ` Alejandro Vallejo
  2024-03-26  7:16       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-03-25 18:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel

Hi,

On 25/03/2024 16:45, Jan Beulich wrote:
> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -288,7 +288,10 @@ void update_guest_memory_policy(struct vcpu *v,
>>  static void cpu_policy_updated(struct vcpu *v)
>>  {
>>      if ( is_hvm_vcpu(v) )
>> +    {
>>          hvm_cpuid_policy_changed(v);
>> +        vlapic_cpu_policy_changed(v);
>> +    }
>>  }
> 
> This is a layering violation imo; hvm_cpuid_policy_changed() wants
> to call vlapic_cpu_policy_changed().

Sure.

> 
>> @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
>>      vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>>  }
>>  
>> +void vlapic_cpu_policy_changed(struct vcpu *v)
>> +{
>> +    struct vlapic *vlapic = vcpu_vlapic(v);
>> +    struct cpu_policy *cp = v->domain->arch.cpu_policy;
> 
> const please

Ack

> 
>> @@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>>      const struct vcpu *v = vlapic_vcpu(vlapic);
>>      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>  
>> +    /*
>> +     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
>> +     * mappings. Recreate the mapping it used to have in old host.
>> +     */
>> +    if ( !vlapic->hw.x2apic_id )
>> +        vlapic->hw.x2apic_id = v->vcpu_id * 2;
> 
> This looks to depend upon it only ever being vCPU which may get a (new
> style) APIC ID of 0. I think such at least wants mentioning in the
> comment.

I don't quite follow you, I'm afraid. There is an implicit control flow
assumption that I can extract into a comment (I assume you were going
for that angle?). The implicit assumption that "vCPU0 always has
APIC_ID=0", which makes vCPU0 go through that path even when no
corrections are necessary. It's benign because it resolves to APIC_ID 0.

Is that what you meant? If so, I'll add it to v2.

> 
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>>      uint32_t             timer_divisor;
>>      uint64_t             tdt_msr;
>> +    uint32_t             x2apic_id;
>> +    uint32_t             rsvd_zero;
>>  };
> 
> I can't spot any checking of this last field indeed being zero.
> 
> Jan

Huh. I was sure I zeroed that on vlapic_init(), but it must've been on a
previous discarded series. Good catch.

Do we also want a check on migrate so a migration from a future Xen in
which it's not zero fails?

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header
  2024-03-19 17:55   ` Roger Pau Monné
@ 2024-03-25 18:19     ` Alejandro Vallejo
  0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-03-25 18:19 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Xen-devel, Wei Liu, Anthony PERARD, Juergen Gross, Jan Beulich,
	Andrew Cooper

On 19/03/2024 17:55, Roger Pau Monné wrote:
> On Tue, Jan 09, 2024 at 03:38:30PM +0000, Alejandro Vallejo wrote:
>> Move struct xc_cpu_policy data structure out of xg_private.h and into
>> the public xenguest.h so it can be used by libxl.
> 
> I will let Andrew comment on this one, IIRC the initial idea was to
> not leak cpu_policy into libxl, and to instead have it as an opaque
> object that libxl can interact with using helpers.

I wrote this a while ago, and can't quite remember why I left this here.
Looking closely I think it's leftover from something else I eventually
turned into its own series[1]. I don't _think_ it's needed for this
series, so I'll just drop it in v2. If everything blows up I'll tell you
why I needed it :)

[1]
https://lore.kernel.org/xen-devel/20240207173957.19811-1-alejandro.vallejo@cloud.com/

> 
> I haven't looked at followup patches - I assume this is done to
> manipulate the cpuid data more easily from libxl, and ultimately drop
> xc_xend_cpuid?

Yes, we can't drop that altogether because xl exposes an interface
allowing fine-grained manipulation of CPUID/MSR data, but it becomes
minimal. The general idea is to is to stop moving separate buffers
around in tandem and use a single data structure instead whenever possible.

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-03-25 18:00     ` Alejandro Vallejo
@ 2024-03-26  7:16       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-26  7:16 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel

On 25.03.2024 19:00, Alejandro Vallejo wrote:
> On 25/03/2024 16:45, Jan Beulich wrote:
>> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>>> @@ -1514,6 +1530,13 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>>>      const struct vcpu *v = vlapic_vcpu(vlapic);
>>>      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>>  
>>> +    /*
>>> +     * Guest with hardcoded assumptions about x2apic_id <-> vcpu_id
>>> +     * mappings. Recreate the mapping it used to have in old host.
>>> +     */
>>> +    if ( !vlapic->hw.x2apic_id )
>>> +        vlapic->hw.x2apic_id = v->vcpu_id * 2;
>>
>> This looks to depend upon it only ever being vCPU which may get a (new
>> style) APIC ID of 0. I think such at least wants mentioning in the
>> comment.
> 
> I don't quite follow you, I'm afraid. There is an implicit control flow
> assumption that I can extract into a comment (I assume you were going
> for that angle?). The implicit assumption that "vCPU0 always has
> APIC_ID=0", which makes vCPU0 go through that path even when no
> corrections are necessary. It's benign because it resolves to APIC_ID 0.
> 
> Is that what you meant? If so, I'll add it to v2.

Yes, and even in your reply you make the same assumption without further
explanation. It does not go without saying that vCPU 0 necessarily has
APIC ID 0. On bare metal that may be what one can typically observe, but
I'm unaware of any architectural guarantees to this effect.

>>> --- a/xen/include/public/arch-x86/hvm/save.h
>>> +++ b/xen/include/public/arch-x86/hvm/save.h
>>> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>>>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>>>      uint32_t             timer_divisor;
>>>      uint64_t             tdt_msr;
>>> +    uint32_t             x2apic_id;
>>> +    uint32_t             rsvd_zero;
>>>  };
>>
>> I can't spot any checking of this last field indeed being zero.
> 
> Huh. I was sure I zeroed that on vlapic_init(), but it must've been on a
> previous discarded series. Good catch.

No, explicit zeroing isn't needed, simply because all of struct vcpu
starts out zeroed.

> Do we also want a check on migrate so a migration from a future Xen in
> which it's not zero fails?

It's really only this that I meant. Recall we now even have a dedicated
checking hook, running ahead of any state loading.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-01-09 15:38 ` [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
  2024-03-20 10:15   ` Roger Pau Monné
@ 2024-03-26 16:41   ` Jan Beulich
  2024-05-01 16:35     ` Alejandro Vallejo
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-26 16:41 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Anthony PERARD, Xen-devel

On 09.01.2024 16:38, Alejandro Vallejo wrote:
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,15 +2,78 @@
>  
>  #include <xen/lib/x86/cpu-policy.h>
>  
> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>  {
>      /*
> -     * TODO: Derive x2APIC ID from the topology information inside `p`
> -     *       rather than from vCPU ID. This bodge is a temporary measure
> -     *       until all infra is in place to retrieve or derive the initial
> -     *       x2APIC ID from migrated domains.
> +     * `nr_logical` reported by Intel is the number of THREADS contained in
> +     * the next topological scope. For example, assuming a system with 2
> +     * threads/core and 3 cores/module in a fully symmetric topology,
> +     * `nr_logical` at the core level will report 6. Because it's reporting
> +     * the number of threads in a module.
> +     *
> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
> +     * level (cores/complex, etc) so we can return it as-is.
>       */
> -    return vcpu_id * 2;
> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
> +        return p->topo.subleaf[lvl].nr_logical;

Is "!= Intel" really appropriate here? I'd rather see this being "AMD || Hygon".

Jan

> +    return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
> +}



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xen/x86: Add topology generator
  2024-01-09 15:38 ` [PATCH 6/6] xen/x86: Add topology generator Alejandro Vallejo
  2024-01-10 14:16   ` Alejandro Vallejo
  2024-03-20 10:29   ` Roger Pau Monné
@ 2024-03-26 17:02   ` Jan Beulich
  2024-05-01 17:06     ` Alejandro Vallejo
  2 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-26 17:02 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Andrew Cooper,
	Roger Pau Monné,
	Xen-devel

On 09.01.2024 16:38, Alejandro Vallejo wrote:
> --- a/tools/include/xenguest.h
> +++ b/tools/include/xenguest.h
> @@ -843,5 +843,20 @@ enum xc_static_cpu_featuremask {
>      XC_FEATUREMASK_HVM_HAP_DEF,
>  };
>  const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
> +
> +/**
> + * Synthesise topology information in `p` given high-level constraints
> + *
> + * Topology is given in various fields accross several leaves, some of
> + * which are vendor-specific. This function uses the policy itself to
> + * derive such leaves from threads/core and cores/package.
> + *
> + * @param p                   CPU policy of the domain.
> + * @param threads_per_core    threads/core. Doesn't need to be a power of 2.
> + * @param cores_per_package   cores/package. Doesn't need to be a power of 2.
> + */
> +void xc_topo_from_parts(struct cpu_policy *p,
> +                        uint32_t threads_per_core, uint32_t cores_per_pkg);

Do we really want to constrain things to just two (or in fact any fixed number
of) levels? Iirc on AMD there already can be up to 4.

> @@ -1028,3 +976,89 @@ bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
>  
>      return false;
>  }
> +
> +static uint32_t order(uint32_t n)
> +{
> +    return 32 - __builtin_clz(n);
> +}

This isn't really portable. __builtin_clz() takes an unsigned int, which may
in principle be wider than 32 bits. I also can't see a reason for the
function to have a fixed-width return type. Perhaps

static unsigned int order(unsigned int n)
{
    return sizeof(n) * CHAR_BIT - __builtin_clz(n);
}

?

> +void xc_topo_from_parts(struct cpu_policy *p,
> +                        uint32_t threads_per_core, uint32_t cores_per_pkg)
> +{
> +    uint32_t threads_per_pkg = threads_per_core * cores_per_pkg;
> +    uint32_t apic_id_size;
> +
> +    if ( p->basic.max_leaf < 0xb )
> +        p->basic.max_leaf = 0xb;
> +
> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
> +
> +    /* thread level */
> +    p->topo.subleaf[0].nr_logical = threads_per_core;
> +    p->topo.subleaf[0].id_shift = 0;
> +    p->topo.subleaf[0].level = 0;
> +    p->topo.subleaf[0].type = 1;
> +    if ( threads_per_core > 1 )
> +        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
> +
> +    /* core level */
> +    p->topo.subleaf[1].nr_logical = cores_per_pkg;
> +    if ( p->x86_vendor == X86_VENDOR_INTEL )
> +        p->topo.subleaf[1].nr_logical = threads_per_pkg;

Same concern as in the other patch regarding "== Intel".

> +    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
> +    p->topo.subleaf[1].level = 1;
> +    p->topo.subleaf[1].type = 2;
> +    if ( cores_per_pkg > 1 )
> +        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);

Don't you want to return an error when any of the X_per_Y values is 0?

> +    apic_id_size = p->topo.subleaf[1].id_shift;
> +
> +    /*
> +     * Contrary to what the name might seem to imply. HTT is an enabler for
> +     * SMP and there's no harm in setting it even with a single vCPU.
> +     */
> +    p->basic.htt = true;
> +
> +    p->basic.lppp = 0xff;
> +    if ( threads_per_pkg < 0xff )
> +        p->basic.lppp = threads_per_pkg;
> +
> +    switch ( p->x86_vendor )
> +    {
> +        case X86_VENDOR_INTEL:
> +            struct cpuid_cache_leaf *sl = p->cache.subleaf;
> +            for ( size_t i = 0; sl->type &&
> +                                i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> +            {
> +                sl->cores_per_package = cores_per_pkg - 1;
> +                sl->threads_per_cache = threads_per_core - 1;

IOW the names in struct cpuid_cache_leaf aren't quite correct.

> +                if ( sl->type == 3 /* unified cache */ )
> +                    sl->threads_per_cache = threads_per_pkg - 1;
> +            }
> +            break;
> +
> +        case X86_VENDOR_AMD:
> +        case X86_VENDOR_HYGON:
> +            /* Expose p->basic.lppp */
> +            p->extd.cmp_legacy = true;
> +
> +            /* Clip NC to the maximum value it can hold */
> +            p->extd.nc = 0xff;
> +            if ( threads_per_pkg <= 0xff )
> +                p->extd.nc = threads_per_pkg - 1;
> +
> +            /* TODO: Expose leaf e1E */
> +            p->extd.topoext = false;
> +
> +            /*
> +             * Clip APIC ID to 8, as that's what high core-count machines do

Nit: "... to 8 bits, ..."

> +             *
> +             * That what AMD EPYC 9654 does with >256 CPUs
> +             */
> +            p->extd.apic_id_size = 8;
> +            if ( apic_id_size < 8 )
> +                p->extd.apic_id_size = apic_id_size;

Use min(), with apic_id_size's type suitably adjusted?

> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
>  
>      p->basic.raw[0x8] = EMPTY_LEAF;
>  
> -    /* TODO: Rework topology logic. */
> -    memset(p->topo.raw, 0, sizeof(p->topo.raw));
> -
>      p->basic.raw[0xc] = EMPTY_LEAF;
>  
>      p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
> @@ -387,6 +384,9 @@ static void __init calculate_host_policy(void)
>      recalculate_xstate(p);
>      recalculate_misc(p);
>  
> +    /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));

I don't think this should be zapped from the host policy. It wants zapping
from the guest ones instead, imo. The host policy may (will) want using in
Xen itself, and hence it should reflect reality.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-03-20 10:15   ` Roger Pau Monné
@ 2024-05-01 16:05     ` Alejandro Vallejo
  0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-05-01 16:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, Anthony PERARD

Hi,

Ack and sure to everything on types, constness and variable names.

On 20/03/2024 10:15, Roger Pau Monné wrote:
>> +        const char *name;
>> +        uint32_t vcpu_id;
>> +        uint32_t x2apic_id;
>> +        struct cpu_policy policy;
>> +    } tests[] = {
>> +        {
>> +            .name = "3v: 3 t/c, 8 c/s", .vcpu_id = 3, .x2apic_id = 1 << 2,
>> +            .policy = {
>> +                .x86_vendor = X86_VENDOR_AMD,
>> +                .topo.subleaf = {
>> +                    [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
>> +                    [1] = { .nr_logical = 8, .level = 1, .type = 2, .id_shift = 5, },
>> +                },
> 
> Don't we need a helper that gets passed the number of cores per
> socket and threads per core and fills this up?
> 
> I would introduce this first, add a test for it, and then do this
> testing using the helper.

Funnily enough that's how I tested it initially. Alas, it felt silly to
put it where it will be linked against the hypervisor. I'm happy to put
it back there.

>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index a3b24e6879..2a50bc076a 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,15 +2,78 @@
>>  
>>  #include <xen/lib/x86/cpu-policy.h>
>>  
>> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
>> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>>  {
>>      /*
>> -     * TODO: Derive x2APIC ID from the topology information inside `p`
>> -     *       rather than from vCPU ID. This bodge is a temporary measure
>> -     *       until all infra is in place to retrieve or derive the initial
>> -     *       x2APIC ID from migrated domains.
>> +     * `nr_logical` reported by Intel is the number of THREADS contained in
>> +     * the next topological scope. For example, assuming a system with 2
>> +     * threads/core and 3 cores/module in a fully symmetric topology,
>> +     * `nr_logical` at the core level will report 6. Because it's reporting
>> +     * the number of threads in a module.
>> +     *
>> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
>> +     * level (cores/complex, etc) so we can return it as-is.
>>       */
>> -    return vcpu_id * 2;
>> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
>> +        return p->topo.subleaf[lvl].nr_logical;
>> +
>> +    return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
> 
> I think this is an optimization because we only have 2 levels,
> otherwise you would need a loop like:
> 
> unsigned int nr = p->topo.subleaf[lvl].nr_logical
> for ( unsigned int i; i < lvl; i++ )
>     nr /= p->topo.subleaf[i].nr_logical;
> 
> If that's the case I think we need a
> BUILD_BUG_ON(ARRAY_SIZE(p->topo.subleaf) > 1);

It's not meant to be. That division should still hold for leaves 0x1f
and e26 where the level count typically exceeds 2. From the SDM.

```
Bits 15-00: The number of logical processors across all instances of
this domain within the next higherscoped domain. (For example, in a
processor socket/package comprising “M” dies of “N” cores each, where
each core has “L” logical processors, the “die” domain sub-leaf value of
this field would be M*N*L.) This number reflects configuration as
shipped by Intel. Note, software must not use this field to enumerate
processor topology*.
```

So. If level1 has nr_logical=X*Y and level2 has nr_logical=X*Y*Z then
dividing the latter by the former gives you Z, which is the number of
lvl1-parts for a single instance of lvl2 (i.e: cores/package, or whatever).

Unless I'm missing something?

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-03-26 16:41   ` Jan Beulich
@ 2024-05-01 16:35     ` Alejandro Vallejo
  2024-05-02  6:55       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-05-01 16:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Anthony PERARD, Xen-devel

Hi,

On 26/03/2024 16:41, Jan Beulich wrote:
> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,15 +2,78 @@
>>  
>>  #include <xen/lib/x86/cpu-policy.h>
>>  
>> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
>> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>>  {
>>      /*
>> -     * TODO: Derive x2APIC ID from the topology information inside `p`
>> -     *       rather than from vCPU ID. This bodge is a temporary measure
>> -     *       until all infra is in place to retrieve or derive the initial
>> -     *       x2APIC ID from migrated domains.
>> +     * `nr_logical` reported by Intel is the number of THREADS contained in
>> +     * the next topological scope. For example, assuming a system with 2
>> +     * threads/core and 3 cores/module in a fully symmetric topology,
>> +     * `nr_logical` at the core level will report 6. Because it's reporting
>> +     * the number of threads in a module.
>> +     *
>> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
>> +     * level (cores/complex, etc) so we can return it as-is.
>>       */
>> -    return vcpu_id * 2;
>> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
>> +        return p->topo.subleaf[lvl].nr_logical;
> 
> Is "!= Intel" really appropriate here? I'd rather see this being "AMD || Hygon".

Sure, I don't particularly mind, but why? As far as we know only Intel
has this interpretation for the part counts. I definitely haven't seen
any non-Intel CPUID dump in which the part count is the total number of
threads (Centaur/Zhaoxin are not multithreaded, and don't expose leaves
1f or e26, as far as I could see).

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xen/x86: Add topology generator
  2024-03-26 17:02   ` Jan Beulich
@ 2024-05-01 17:06     ` Alejandro Vallejo
  2024-05-02  7:13       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Vallejo @ 2024-05-01 17:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Andrew Cooper,
	Roger Pau Monné,
	Xen-devel

On 26/03/2024 17:02, Jan Beulich wrote:
> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>> --- a/tools/include/xenguest.h
>> +++ b/tools/include/xenguest.h
>> @@ -843,5 +843,20 @@ enum xc_static_cpu_featuremask {
>>      XC_FEATUREMASK_HVM_HAP_DEF,
>>  };
>>  const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
>> +
>> +/**
>> + * Synthesise topology information in `p` given high-level constraints
>> + *
>> + * Topology is given in various fields accross several leaves, some of
>> + * which are vendor-specific. This function uses the policy itself to
>> + * derive such leaves from threads/core and cores/package.
>> + *
>> + * @param p                   CPU policy of the domain.
>> + * @param threads_per_core    threads/core. Doesn't need to be a power of 2.
>> + * @param cores_per_package   cores/package. Doesn't need to be a power of 2.
>> + */
>> +void xc_topo_from_parts(struct cpu_policy *p,
>> +                        uint32_t threads_per_core, uint32_t cores_per_pkg);
> 
> Do we really want to constrain things to just two (or in fact any fixed number
> of) levels? Iirc on AMD there already can be up to 4.

For the time being, I think we should keep it simple(ish).

Leaf 0xb is always 2 levels, and it implicitly defines the offset into
the x2APIC ID for the 3rd level (the package, or the die). This approach
can be used for a long time before we need to expose anything else. It
can already be used for exposing multi-socket configurations, but it's
not properly wired to xl.

I suspect we won't have a need to expose more complicated topologies
until hetero systems are more common, and by that time the generator
will need tweaking anyway.

> 
>> @@ -1028,3 +976,89 @@ bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
>>  
>>      return false;
>>  }
>> +
>> +static uint32_t order(uint32_t n)
>> +{
>> +    return 32 - __builtin_clz(n);
>> +}
> 
> This isn't really portable. __builtin_clz() takes an unsigned int, which may
> in principle be wider than 32 bits. I also can't see a reason for the
> function to have a fixed-width return type. Perhaps

Sure to unsigned int. I'll s/CHAR_BIT/8/ as that's mandated by POSIX
already.

> 
> static unsigned int order(unsigned int n)
> {
>     return sizeof(n) * CHAR_BIT - __builtin_clz(n);
> }
> 
> ?
> 
>> +void xc_topo_from_parts(struct cpu_policy *p,
>> +                        uint32_t threads_per_core, uint32_t cores_per_pkg)
>> +{
>> +    uint32_t threads_per_pkg = threads_per_core * cores_per_pkg;
>> +    uint32_t apic_id_size;
>> +
>> +    if ( p->basic.max_leaf < 0xb )
>> +        p->basic.max_leaf = 0xb;
>> +
>> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
>> +
>> +    /* thread level */
>> +    p->topo.subleaf[0].nr_logical = threads_per_core;
>> +    p->topo.subleaf[0].id_shift = 0;
>> +    p->topo.subleaf[0].level = 0;
>> +    p->topo.subleaf[0].type = 1;
>> +    if ( threads_per_core > 1 )
>> +        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
>> +
>> +    /* core level */
>> +    p->topo.subleaf[1].nr_logical = cores_per_pkg;
>> +    if ( p->x86_vendor == X86_VENDOR_INTEL )
>> +        p->topo.subleaf[1].nr_logical = threads_per_pkg;
> 
> Same concern as in the other patch regarding "== Intel".
> 

Can you please articulate the concern?

>> +    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
>> +    p->topo.subleaf[1].level = 1;
>> +    p->topo.subleaf[1].type = 2;
>> +    if ( cores_per_pkg > 1 )
>> +        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
> 
> Don't you want to return an error when any of the X_per_Y values is 0?

I'd rather not.

The checks on input parameters should be done wherever the inputs are
taken from. Currently the call site passes threads_per_core=1 and
cores_per_pkg=1+max_vcpus, so it's all guaranteed to work out.

Once it comes from xl, libxl should be in charge of the validations.
Furthermore there's validations the function simply cannot do (nor
should it) in its current form, like checking that...

    max_vcpus == threads_per_core * cores_per_pkg * n_pkgs.

> 
>> +    apic_id_size = p->topo.subleaf[1].id_shift;
>> +
>> +    /*
>> +     * Contrary to what the name might seem to imply. HTT is an enabler for
>> +     * SMP and there's no harm in setting it even with a single vCPU.
>> +     */
>> +    p->basic.htt = true;
>> +
>> +    p->basic.lppp = 0xff;
>> +    if ( threads_per_pkg < 0xff )
>> +        p->basic.lppp = threads_per_pkg;
>> +
>> +    switch ( p->x86_vendor )
>> +    {
>> +        case X86_VENDOR_INTEL:
>> +            struct cpuid_cache_leaf *sl = p->cache.subleaf;
>> +            for ( size_t i = 0; sl->type &&
>> +                                i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
>> +            {
>> +                sl->cores_per_package = cores_per_pkg - 1;
>> +                sl->threads_per_cache = threads_per_core - 1;
> 
> IOW the names in struct cpuid_cache_leaf aren't quite correct.

Because of the - 1, you mean? If anything our name is marginally clearer
than the SDM description. It goes on to say "Add one to the return value
to get the result" in a [**] note, so it's not something we made up.

Xen: threads_per_cache => SDM: Maximum number of addressable IDs for
logical processors sharing this cache

Xen: cores_per_package => SDM: Maximum number of addressable IDs for
processor cores in the physical package

> 
>> +                if ( sl->type == 3 /* unified cache */ )
>> +                    sl->threads_per_cache = threads_per_pkg - 1;
>> +            }
>> +            break;
>> +
>> +        case X86_VENDOR_AMD:
>> +        case X86_VENDOR_HYGON:
>> +            /* Expose p->basic.lppp */
>> +            p->extd.cmp_legacy = true;
>> +
>> +            /* Clip NC to the maximum value it can hold */
>> +            p->extd.nc = 0xff;
>> +            if ( threads_per_pkg <= 0xff )
>> +                p->extd.nc = threads_per_pkg - 1;
>> +
>> +            /* TODO: Expose leaf e1E */
>> +            p->extd.topoext = false;
>> +
>> +            /*
>> +             * Clip APIC ID to 8, as that's what high core-count machines do
> 
> Nit: "... to 8 bits, ..."

ack

> 
>> +             *
>> +             * That what AMD EPYC 9654 does with >256 CPUs
>> +             */
>> +            p->extd.apic_id_size = 8;
>> +            if ( apic_id_size < 8 )
>> +                p->extd.apic_id_size = apic_id_size;
> 
> Use min(), with apic_id_size's type suitably adjusted?

ack

> 
>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
>>  
>>      p->basic.raw[0x8] = EMPTY_LEAF;
>>  
>> -    /* TODO: Rework topology logic. */
>> -    memset(p->topo.raw, 0, sizeof(p->topo.raw));
>> -
>>      p->basic.raw[0xc] = EMPTY_LEAF;
>>  
>>      p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
>> @@ -387,6 +384,9 @@ static void __init calculate_host_policy(void)
>>      recalculate_xstate(p);
>>      recalculate_misc(p);
>>  
>> +    /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
>> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
> 
> I don't think this should be zapped from the host policy. It wants zapping
> from the guest ones instead, imo. The host policy may (will) want using in
> Xen itself, and hence it should reflect reality.
> 
> Jan

Shouldn't Xen be checking its own cached state from the raw policy
instead? My understanding was that to a first approximation the host
policy is a template for guest creation. It already has a bunch of
overrides that don't match the real hardware configuration.

Cheers,
Alejandro



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-05-01 16:35     ` Alejandro Vallejo
@ 2024-05-02  6:55       ` Jan Beulich
  2024-05-02  6:57         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-05-02  6:55 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Anthony PERARD, Xen-devel

On 01.05.2024 18:35, Alejandro Vallejo wrote:
> Hi,
> 
> On 26/03/2024 16:41, Jan Beulich wrote:
>> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>>> --- a/xen/lib/x86/policy.c
>>> +++ b/xen/lib/x86/policy.c
>>> @@ -2,15 +2,78 @@
>>>  
>>>  #include <xen/lib/x86/cpu-policy.h>
>>>  
>>> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
>>> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>>>  {
>>>      /*
>>> -     * TODO: Derive x2APIC ID from the topology information inside `p`
>>> -     *       rather than from vCPU ID. This bodge is a temporary measure
>>> -     *       until all infra is in place to retrieve or derive the initial
>>> -     *       x2APIC ID from migrated domains.
>>> +     * `nr_logical` reported by Intel is the number of THREADS contained in
>>> +     * the next topological scope. For example, assuming a system with 2
>>> +     * threads/core and 3 cores/module in a fully symmetric topology,
>>> +     * `nr_logical` at the core level will report 6. Because it's reporting
>>> +     * the number of threads in a module.
>>> +     *
>>> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
>>> +     * level (cores/complex, etc) so we can return it as-is.
>>>       */
>>> -    return vcpu_id * 2;
>>> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
>>> +        return p->topo.subleaf[lvl].nr_logical;
>>
>> Is "!= Intel" really appropriate here? I'd rather see this being "AMD || Hygon".
> 
> Sure, I don't particularly mind, but why? As far as we know only Intel
> has this interpretation for the part counts. I definitely haven't seen
> any non-Intel CPUID dump in which the part count is the total number of
> threads (Centaur/Zhaoxin are not multithreaded, and don't expose leaves
> 1f or e26, as far as I could see).

Because of x86'es origin and perhaps other historical aspects, cloning
Intel behavior is far more likely. The fact that Hygon matches AMD is
simply because they took AMD's design wholesale.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-05-02  6:55       ` Jan Beulich
@ 2024-05-02  6:57         ` Jan Beulich
  2024-05-02 14:44           ` Alejandro Vallejo
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-05-02  6:57 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Anthony PERARD, Xen-devel

On 02.05.2024 08:55, Jan Beulich wrote:
> On 01.05.2024 18:35, Alejandro Vallejo wrote:
>> Hi,
>>
>> On 26/03/2024 16:41, Jan Beulich wrote:
>>> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>>>> --- a/xen/lib/x86/policy.c
>>>> +++ b/xen/lib/x86/policy.c
>>>> @@ -2,15 +2,78 @@
>>>>  
>>>>  #include <xen/lib/x86/cpu-policy.h>
>>>>  
>>>> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
>>>> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>>>>  {
>>>>      /*
>>>> -     * TODO: Derive x2APIC ID from the topology information inside `p`
>>>> -     *       rather than from vCPU ID. This bodge is a temporary measure
>>>> -     *       until all infra is in place to retrieve or derive the initial
>>>> -     *       x2APIC ID from migrated domains.
>>>> +     * `nr_logical` reported by Intel is the number of THREADS contained in
>>>> +     * the next topological scope. For example, assuming a system with 2
>>>> +     * threads/core and 3 cores/module in a fully symmetric topology,
>>>> +     * `nr_logical` at the core level will report 6. Because it's reporting
>>>> +     * the number of threads in a module.
>>>> +     *
>>>> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
>>>> +     * level (cores/complex, etc) so we can return it as-is.
>>>>       */
>>>> -    return vcpu_id * 2;
>>>> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
>>>> +        return p->topo.subleaf[lvl].nr_logical;
>>>
>>> Is "!= Intel" really appropriate here? I'd rather see this being "AMD || Hygon".
>>
>> Sure, I don't particularly mind, but why? As far as we know only Intel
>> has this interpretation for the part counts. I definitely haven't seen
>> any non-Intel CPUID dump in which the part count is the total number of
>> threads (Centaur/Zhaoxin are not multithreaded, and don't expose leaves
>> 1f or e26, as far as I could see).
> 
> Because of x86'es origin and perhaps other historical aspects, cloning
> Intel behavior is far more likely. The fact that Hygon matches AMD is
> simply because they took AMD's design wholesale.

Perhaps: See how many dead ends AMD have created, i.e. stuff they proudly
introduced into the architecture, but then gave up again (presumably for
diverging too far from Intel, and hence lacking long term acceptance):
3DNow!, LWP, and XOP just to name those that come to mind right away.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xen/x86: Add topology generator
  2024-05-01 17:06     ` Alejandro Vallejo
@ 2024-05-02  7:13       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-05-02  7:13 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Andrew Cooper,
	Roger Pau Monné,
	Xen-devel

On 01.05.2024 19:06, Alejandro Vallejo wrote:
> On 26/03/2024 17:02, Jan Beulich wrote:
>> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>>> --- a/tools/include/xenguest.h
>>> +++ b/tools/include/xenguest.h
>>> @@ -843,5 +843,20 @@ enum xc_static_cpu_featuremask {
>>>      XC_FEATUREMASK_HVM_HAP_DEF,
>>>  };
>>>  const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
>>> +
>>> +/**
>>> + * Synthesise topology information in `p` given high-level constraints
>>> + *
>>> + * Topology is given in various fields accross several leaves, some of
>>> + * which are vendor-specific. This function uses the policy itself to
>>> + * derive such leaves from threads/core and cores/package.
>>> + *
>>> + * @param p                   CPU policy of the domain.
>>> + * @param threads_per_core    threads/core. Doesn't need to be a power of 2.
>>> + * @param cores_per_package   cores/package. Doesn't need to be a power of 2.
>>> + */
>>> +void xc_topo_from_parts(struct cpu_policy *p,
>>> +                        uint32_t threads_per_core, uint32_t cores_per_pkg);
>>
>> Do we really want to constrain things to just two (or in fact any fixed number
>> of) levels? Iirc on AMD there already can be up to 4.
> 
> For the time being, I think we should keep it simple(ish).

Perhaps, with (briefly) stating the reason(s) for doing so.

>>> +void xc_topo_from_parts(struct cpu_policy *p,
>>> +                        uint32_t threads_per_core, uint32_t cores_per_pkg)
>>> +{
>>> +    uint32_t threads_per_pkg = threads_per_core * cores_per_pkg;
>>> +    uint32_t apic_id_size;
>>> +
>>> +    if ( p->basic.max_leaf < 0xb )
>>> +        p->basic.max_leaf = 0xb;
>>> +
>>> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
>>> +
>>> +    /* thread level */
>>> +    p->topo.subleaf[0].nr_logical = threads_per_core;
>>> +    p->topo.subleaf[0].id_shift = 0;
>>> +    p->topo.subleaf[0].level = 0;
>>> +    p->topo.subleaf[0].type = 1;
>>> +    if ( threads_per_core > 1 )
>>> +        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
>>> +
>>> +    /* core level */
>>> +    p->topo.subleaf[1].nr_logical = cores_per_pkg;
>>> +    if ( p->x86_vendor == X86_VENDOR_INTEL )
>>> +        p->topo.subleaf[1].nr_logical = threads_per_pkg;
>>
>> Same concern as in the other patch regarding "== Intel".
> 
> Can you please articulate the concern?

See my replies to patch 5. Exactly the same applies here.

>>> +    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
>>> +    p->topo.subleaf[1].level = 1;
>>> +    p->topo.subleaf[1].type = 2;
>>> +    if ( cores_per_pkg > 1 )
>>> +        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
>>
>> Don't you want to return an error when any of the X_per_Y values is 0?
> 
> I'd rather not.
> 
> The checks on input parameters should be done wherever the inputs are
> taken from. Currently the call site passes threads_per_core=1 and
> cores_per_pkg=1+max_vcpus, so it's all guaranteed to work out.

Hmm, and then have this function produce potentially bogus results?

> Once it comes from xl, libxl should be in charge of the validations.
> Furthermore there's validations the function simply cannot do (nor
> should it) in its current form, like checking that...
> 
>     max_vcpus == threads_per_core * cores_per_pkg * n_pkgs.

But that isn't an equality that needs to hold, is it? It would put
constraints on exposing e.g. HT to a guest with an odd number of
vCPU-s. Imo especially with core scheduling HT-ness wants properly
reflecting from underlying hardware, so core-sched-like behavior in
the guest itself can actually achieve its goals.

>>> +    apic_id_size = p->topo.subleaf[1].id_shift;
>>> +
>>> +    /*
>>> +     * Contrary to what the name might seem to imply. HTT is an enabler for
>>> +     * SMP and there's no harm in setting it even with a single vCPU.
>>> +     */
>>> +    p->basic.htt = true;
>>> +
>>> +    p->basic.lppp = 0xff;
>>> +    if ( threads_per_pkg < 0xff )
>>> +        p->basic.lppp = threads_per_pkg;
>>> +
>>> +    switch ( p->x86_vendor )
>>> +    {
>>> +        case X86_VENDOR_INTEL:
>>> +            struct cpuid_cache_leaf *sl = p->cache.subleaf;
>>> +            for ( size_t i = 0; sl->type &&
>>> +                                i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
>>> +            {
>>> +                sl->cores_per_package = cores_per_pkg - 1;
>>> +                sl->threads_per_cache = threads_per_core - 1;
>>
>> IOW the names in struct cpuid_cache_leaf aren't quite correct.
> 
> Because of the - 1, you mean?

Yes. The expressions above simply read as if there was an (obvious)
off-by-1.

> If anything our name is marginally clearer
> than the SDM description. It goes on to say "Add one to the return value
> to get the result" in a [**] note, so it's not something we made up.
> 
> Xen: threads_per_cache => SDM: Maximum number of addressable IDs for
> logical processors sharing this cache
> 
> Xen: cores_per_package => SDM: Maximum number of addressable IDs for
> processor cores in the physical package

I'm afraid I don't follow what you're trying to justify here. Following
SDM naming is generally advisable, yes, but only as far as no confusion
results. Otherwise imo a better name wants picking, with the struct
field declaration then accompanied by a comment clarifying the
difference wrt SDM.

>>> --- a/xen/arch/x86/cpu-policy.c
>>> +++ b/xen/arch/x86/cpu-policy.c
>>> @@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
>>>  
>>>      p->basic.raw[0x8] = EMPTY_LEAF;
>>>  
>>> -    /* TODO: Rework topology logic. */
>>> -    memset(p->topo.raw, 0, sizeof(p->topo.raw));
>>> -
>>>      p->basic.raw[0xc] = EMPTY_LEAF;
>>>  
>>>      p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
>>> @@ -387,6 +384,9 @@ static void __init calculate_host_policy(void)
>>>      recalculate_xstate(p);
>>>      recalculate_misc(p);
>>>  
>>> +    /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
>>> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
>>
>> I don't think this should be zapped from the host policy. It wants zapping
>> from the guest ones instead, imo. The host policy may (will) want using in
>> Xen itself, and hence it should reflect reality.
> 
> Shouldn't Xen be checking its own cached state from the raw policy
> instead? My understanding was that to a first approximation the host
> policy is a template for guest creation. It already has a bunch of
> overrides that don't match the real hardware configuration.

No, raw policy is what comes from hardware, entirely unadjusted for
e.g. command line options of Xen internal restrictions. Any decisions
Xen takes for itself should be based on the host policy (whereas guest
related decisions are to use the respective guest policy).

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-05-02  6:57         ` Jan Beulich
@ 2024-05-02 14:44           ` Alejandro Vallejo
  0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Vallejo @ 2024-05-02 14:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Anthony PERARD, Xen-devel

On 02/05/2024 07:57, Jan Beulich wrote:
> On 02.05.2024 08:55, Jan Beulich wrote:
>> On 01.05.2024 18:35, Alejandro Vallejo wrote:
>>> Hi,
>>>
>>> On 26/03/2024 16:41, Jan Beulich wrote:
>>>> On 09.01.2024 16:38, Alejandro Vallejo wrote:
>>>>> --- a/xen/lib/x86/policy.c
>>>>> +++ b/xen/lib/x86/policy.c
>>>>> @@ -2,15 +2,78 @@
>>>>>  
>>>>>  #include <xen/lib/x86/cpu-policy.h>
>>>>>  
>>>>> -uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
>>>>> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>>>>>  {
>>>>>      /*
>>>>> -     * TODO: Derive x2APIC ID from the topology information inside `p`
>>>>> -     *       rather than from vCPU ID. This bodge is a temporary measure
>>>>> -     *       until all infra is in place to retrieve or derive the initial
>>>>> -     *       x2APIC ID from migrated domains.
>>>>> +     * `nr_logical` reported by Intel is the number of THREADS contained in
>>>>> +     * the next topological scope. For example, assuming a system with 2
>>>>> +     * threads/core and 3 cores/module in a fully symmetric topology,
>>>>> +     * `nr_logical` at the core level will report 6. Because it's reporting
>>>>> +     * the number of threads in a module.
>>>>> +     *
>>>>> +     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
>>>>> +     * level (cores/complex, etc) so we can return it as-is.
>>>>>       */
>>>>> -    return vcpu_id * 2;
>>>>> +    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
>>>>> +        return p->topo.subleaf[lvl].nr_logical;
>>>>
>>>> Is "!= Intel" really appropriate here? I'd rather see this being "AMD || Hygon".
>>>
>>> Sure, I don't particularly mind, but why? As far as we know only Intel
>>> has this interpretation for the part counts. I definitely haven't seen
>>> any non-Intel CPUID dump in which the part count is the total number of
>>> threads (Centaur/Zhaoxin are not multithreaded, and don't expose leaves
>>> 1f or e26, as far as I could see).
>>
>> Because of x86'es origin and perhaps other historical aspects, cloning
>> Intel behavior is far more likely.

That claim doesn't hold very well seeing how...

>> The fact that Hygon matches AMD is
>> simply because they took AMD's design wholesale.

... this statement contradicts it. We can't predict which new vendor (if
any) will be cloned/mimicked next, so that's not a very plausible reason
to prioritise a specific vendor in conditionals.

It remains to be seen what a Zhaoxin actually looks like, because I
couldn't get ahold of a complete cpuid dump.

> 
> Perhaps: See how many dead ends AMD have created, i.e. stuff they proudly
> introduced into the architecture, but then gave up again (presumably for
> diverging too far from Intel, and hence lacking long term acceptance):
> 3DNow!, LWP, and XOP just to name those that come to mind right away.
> 
> Jan

I can't say I agree on the cause; Regardless I'd rather not discuss the
relative merits of vendors with regards to backwards compatibility, as
that's besides the point. The point is whether there's a credible
technical reason to prefer this...

  if ( !(a & (B | C)) )
      foo();

... to this...

  if ( a == A )
      foo();

..., as is the case in patch6.

I argue there's not, and in fact legibility-wise the latter is very
clearly superior.

There's also a compelling reason to keep the check coherent on both
generators to avoid bad surprises down the line.

Cheers,
Alejandro



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-05-02 14:44 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-09 15:38 [PATCH 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
2024-01-09 15:38 ` [PATCH 1/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
2024-03-19 16:20   ` Roger Pau Monné
2024-03-20  9:33     ` Roger Pau Monné
2024-03-25 16:56       ` Alejandro Vallejo
2024-03-25 15:44     ` Alejandro Vallejo
2024-03-25 16:45   ` Jan Beulich
2024-03-25 18:00     ` Alejandro Vallejo
2024-03-26  7:16       ` Jan Beulich
2024-01-09 15:38 ` [PATCH 2/6] tools/xc: Add xc_cpu_policy to the public xenctrl.h header Alejandro Vallejo
2024-03-19 17:55   ` Roger Pau Monné
2024-03-25 18:19     ` Alejandro Vallejo
2024-01-09 15:38 ` [PATCH 3/6] xen/x86: Refactor xen/lib/x86 so it can be linked in hvmloader Alejandro Vallejo
2024-03-25 16:55   ` Jan Beulich
2024-01-09 15:38 ` [PATCH 4/6] tools/hvmloader: Use cpu_policy to determine APIC IDs Alejandro Vallejo
2024-03-20  8:52   ` Roger Pau Monné
2024-03-25 17:00     ` Jan Beulich
2024-01-09 15:38 ` [PATCH 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
2024-03-20 10:15   ` Roger Pau Monné
2024-05-01 16:05     ` Alejandro Vallejo
2024-03-26 16:41   ` Jan Beulich
2024-05-01 16:35     ` Alejandro Vallejo
2024-05-02  6:55       ` Jan Beulich
2024-05-02  6:57         ` Jan Beulich
2024-05-02 14:44           ` Alejandro Vallejo
2024-01-09 15:38 ` [PATCH 6/6] xen/x86: Add topology generator Alejandro Vallejo
2024-01-10 14:16   ` Alejandro Vallejo
2024-01-10 14:28     ` Jan Beulich
2024-03-20 10:29   ` Roger Pau Monné
2024-03-26 17:02   ` Jan Beulich
2024-05-01 17:06     ` Alejandro Vallejo
2024-05-02  7:13       ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.