Xen-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy
@ 2019-09-13 19:27 Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains Andrew Cooper
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Juergen Gross, Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson,
	Daniel De Graaf, Roger Pau Monné

This is the next part of the Xen/Toolstack CPUID/MSR work.  With most of the
pieces in place, implement XEN_DOMCTL_set_cpumsr_policy to obsolete the
problematic XEN_DOMCTL_set_cpuid.

Key improvements:

  1) The API supports configuring static MSR settings for the domain, a
     capbility which Xen has never had before.
  2) The hypercall supports saying no when the toolstack tries to pass
     problematic data.
  3) The domain builder no longer uses native CPUID instructions for
     constructing guest policies, which is and has always been erroneous
     behaviour.
  4) Vastily reduce the number of hypercalls for typicaly guest construction,
     by not issuing a hypercall per CPUID leaf.

Patch 3 has been posted before, but a long time ago and it has changed
substantially, so I've decided to start the version numbering from fresh.

This series can be found in git from from:
  http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-cpuid

Large changes from v2:
 * Split several changes out into earlier patches.  Rebase around.
 * Introduce dom0=no-cpuid-faulting to restore previous behaviour.

See individual patches for changes.

Andrew Cooper (10):
  x86/msr: Offer CPUID Faulting to PVH control domains
  libx86: Proactively initialise error pointers
  libx86: Introduce x86_cpu_policies_are_compatible()
  x86/cpuid: Split update_domain_cpuid_info() in half
  x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy
  tools/libxc: Pre-cleanup for xc_cpuid_{set,apply_policy}()
  tools/libxc: Rework xc_cpuid_set() to use {get,set}_cpu_policy()
  tools/libxc: Rework xc_cpuid_apply_policy() to use {get,set}_cpu_policy()
  x86/domctl: Drop XEN_DOMCTL_set_cpuid
  x86/cpuid: Enable CPUID Faulting for PV control domains by default

 docs/misc/xen-command-line.pandoc        |  19 +-
 tools/flask/policy/modules/dom0.te       |   2 +-
 tools/flask/policy/modules/xen.if        |   2 +-
 tools/libxc/include/xenctrl.h            |   7 +-
 tools/libxc/xc_cpuid_x86.c               | 928 +++++++++++--------------------
 tools/tests/cpu-policy/Makefile          |   2 +-
 tools/tests/cpu-policy/test-cpu-policy.c | 115 +++-
 xen/arch/x86/cpu/common.c                |  29 +-
 xen/arch/x86/dom0_build.c                |   2 +
 xen/arch/x86/domctl.c                    | 258 ++++-----
 xen/arch/x86/msr.c                       |   5 +-
 xen/include/asm-x86/setup.h              |   1 +
 xen/include/public/domctl.h              |  29 +-
 xen/include/xen/lib/x86/cpu-policy.h     |  26 +
 xen/include/xen/lib/x86/cpuid.h          |  17 +-
 xen/include/xen/lib/x86/msr.h            |   4 +-
 xen/lib/x86/Makefile                     |   1 +
 xen/lib/x86/cpuid.c                      |   5 +
 xen/lib/x86/msr.c                        |   3 +
 xen/lib/x86/policy.c                     |  54 ++
 xen/xsm/flask/hooks.c                    |   4 +-
 xen/xsm/flask/policy/access_vectors      |   4 +-
 22 files changed, 692 insertions(+), 825 deletions(-)
 create mode 100644 xen/lib/x86/policy.c

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-16 10:53   ` Jan Beulich
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 02/10] libx86: Proactively initialise error pointers Andrew Cooper
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The control domain exclusion for CPUID Faulting predates dom0 PVH, but the
reason for the exclusion (to allow the domain builder to see host CPUID
values) isn't applicable.

The domain builder *is* broken in PVH control domains, and restricting the use
of CPUID Faulting doesn't make it any less broken.  Tweak the logic to only
exclude PV control domains.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * New
---
 xen/arch/x86/cpu/common.c | 5 +++--
 xen/arch/x86/msr.c        | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 937d8e82a8..4bf852c948 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -169,7 +169,7 @@ void ctxt_switch_levelling(const struct vcpu *next)
 		if (nextd && is_idle_domain(nextd))
 			return;
 		/*
-		 * We *should* be enabling faulting for the control domain.
+		 * We *should* be enabling faulting for PV control domains.
 		 *
 		 * Unfortunately, the domain builder (having only ever been a
 		 * PV guest) expects to be able to see host cpuid state in a
@@ -184,7 +184,8 @@ void ctxt_switch_levelling(const struct vcpu *next)
 		 * generating the maximum full cpuid policy into Xen, at which
 		 * this problem will disappear.
 		 */
-		set_cpuid_faulting(nextd && !is_control_domain(nextd) &&
+		set_cpuid_faulting(nextd && (!is_control_domain(nextd) ||
+					     !is_pv_domain(nextd)) &&
 				   (is_pv_domain(nextd) ||
 				    next->arch.msrs->
 				    misc_features_enables.cpuid_faulting));
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index e65961fccb..a6c8cc7627 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -91,8 +91,8 @@ int init_domain_msr_policy(struct domain *d)
     if ( !mp )
         return -ENOMEM;
 
-    /* See comment in intel_ctxt_switch_levelling() */
-    if ( is_control_domain(d) )
+    /* See comment in ctxt_switch_levelling() */
+    if ( is_control_domain(d) && is_pv_domain(d) )
         mp->platform_info.cpuid_faulting = false;
 
     d->arch.msr = mp;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 02/10] libx86: Proactively initialise error pointers
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
       [not found]   ` <527f33ad-3de1-15c7-eb4b-603eaf65f3c5@suse.com>
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible() Andrew Cooper
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

This results in better behaviour for the caller.

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * New
---
 tools/tests/cpu-policy/test-cpu-policy.c | 4 ++--
 xen/include/xen/lib/x86/cpuid.h          | 6 +++---
 xen/include/xen/lib/x86/msr.h            | 4 ++--
 xen/lib/x86/cpuid.c                      | 5 +++++
 xen/lib/x86/msr.c                        | 3 +++
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index fe00cd4276..201358d210 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -283,7 +283,7 @@ static void test_cpuid_deserialise_failure(void)
     for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
     {
         const struct test *t = &tests[i];
-        uint32_t err_leaf = ~0u, err_subleaf = ~0u;
+        uint32_t err_leaf, err_subleaf;
         int rc;
 
         /* No writes should occur.  Use NULL to catch errors. */
@@ -336,7 +336,7 @@ static void test_msr_deserialise_failure(void)
     for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
     {
         const struct test *t = &tests[i];
-        uint32_t err_msr = ~0u;
+        uint32_t err_msr;
         int rc;
 
         /* No writes should occur.  Use NULL to catch errors. */
diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h
index df5946b6b1..79840f99ce 100644
--- a/xen/include/xen/lib/x86/cpuid.h
+++ b/xen/include/xen/lib/x86/cpuid.h
@@ -376,13 +376,13 @@ int x86_cpuid_copy_to_buffer(const struct cpuid_policy *policy,
  * @param policy      The cpuid_policy to unserialise into.
  * @param leaves      The array of leaves to unserialise from.
  * @param nr_entries  The number of entries in 'leaves'.
- * @param err_leaf    Optional hint filled on error.
- * @param err_subleaf Optional hint filled on error.
+ * @param err_leaf    Optional hint for error diagnostics.
+ * @param err_subleaf Optional hint for error diagnostics.
  * @returns -errno
  *
  * Reads at most CPUID_MAX_SERIALISED_LEAVES.  May return -ERANGE if an
  * incoming leaf is out of range of cpuid_policy, in which case the optional
- * err_* pointers are filled to aid diagnostics.
+ * err_* pointers will identify the out-of-range indicies.
  *
  * No content validation of in-range leaves is performed.  Synthesised data is
  * recalculated.
diff --git a/xen/include/xen/lib/x86/msr.h b/xen/include/xen/lib/x86/msr.h
index e83a8fbb0f..203c713320 100644
--- a/xen/include/xen/lib/x86/msr.h
+++ b/xen/include/xen/lib/x86/msr.h
@@ -54,14 +54,14 @@ int x86_msr_copy_to_buffer(const struct msr_policy *policy,
  * @param policy     The msr_policy object to unserialise into.
  * @param msrs       The array of msrs to unserialise from.
  * @param nr_entries The number of entries in 'msrs'.
- * @param err_msr    Optional hint filled on error.
+ * @param err_msr    Optional hint for error diagnostics.
  * @returns -errno
  *
  * Reads at most MSR_MAX_SERIALISED_ENTRIES.  May fail for a number of reasons
  * based on the content in an individual 'msrs' entry, including the MSR index
  * not being valid in the policy, the flags field being nonzero, or if the
  * value provided would truncate when stored in the policy.  In such cases,
- * the optional err_* pointer is filled in to aid diagnostics.
+ * the optional err_* pointer will identify the problematic MSR.
  *
  * No content validation is performed on the data stored in the policy object.
  */
diff --git a/xen/lib/x86/cpuid.c b/xen/lib/x86/cpuid.c
index 266084e613..76b8511034 100644
--- a/xen/lib/x86/cpuid.c
+++ b/xen/lib/x86/cpuid.c
@@ -381,6 +381,11 @@ int x86_cpuid_copy_from_buffer(struct cpuid_policy *p,
     unsigned int i;
     xen_cpuid_leaf_t data;
 
+    if ( err_leaf )
+        *err_leaf = -1;
+    if ( err_subleaf )
+        *err_subleaf = -1;
+
     /*
      * A well formed caller is expected to pass an array with leaves in order,
      * and without any repetitions.  However, due to per-vendor differences,
diff --git a/xen/lib/x86/msr.c b/xen/lib/x86/msr.c
index 256b5ec632..171abf7008 100644
--- a/xen/lib/x86/msr.c
+++ b/xen/lib/x86/msr.c
@@ -55,6 +55,9 @@ int x86_msr_copy_from_buffer(struct msr_policy *p,
     xen_msr_entry_t data;
     int rc;
 
+    if ( err_msr )
+        *err_msr = -1;
+
     /*
      * A well formed caller is expected to pass an array with entries in
      * order, and without any repetitions.  However, due to per-vendor
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible()
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 02/10] libx86: Proactively initialise error pointers Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-16 10:59   ` Jan Beulich
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 04/10] x86/cpuid: Split update_domain_cpuid_info() in half Andrew Cooper
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

This helper will eventually be the core "can a guest configured like this run
on the CPU?" logic.  For now, it is just enough of a stub to allow us to
replace the hypercall interface while retaining the previous behaviour.

It will be expanded as various other bits of CPUID handling get cleaned up.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Rebase over 'plaform' typo fix
 * Proactively initialise the error pointer
 * Expand the function documentation
---
 tools/tests/cpu-policy/Makefile          |   2 +-
 tools/tests/cpu-policy/test-cpu-policy.c | 111 ++++++++++++++++++++++++++++++-
 xen/include/xen/lib/x86/cpu-policy.h     |  26 ++++++++
 xen/lib/x86/Makefile                     |   1 +
 xen/lib/x86/policy.c                     |  54 +++++++++++++++
 5 files changed, 191 insertions(+), 3 deletions(-)
 create mode 100644 xen/lib/x86/policy.c

diff --git a/tools/tests/cpu-policy/Makefile b/tools/tests/cpu-policy/Makefile
index fb548c9b9a..70ff154da6 100644
--- a/tools/tests/cpu-policy/Makefile
+++ b/tools/tests/cpu-policy/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(APPEND_CFLAGS)
 
 vpath %.c ../../../xen/lib/x86
 
-test-cpu-policy: test-cpu-policy.o msr.o cpuid.o
+test-cpu-policy: test-cpu-policy.o msr.o cpuid.o policy.o
 	$(CC) $(CFLAGS) $^ -o $@
 
 -include $(DEPS_INCLUDE)
diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 201358d210..20ebed923b 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -9,8 +9,7 @@
 
 #include <xen-tools/libs.h>
 #include <xen/asm/x86-vendors.h>
-#include <xen/lib/x86/cpuid.h>
-#include <xen/lib/x86/msr.h>
+#include <xen/lib/x86/cpu-policy.h>
 #include <xen/domctl.h>
 
 static unsigned int nr_failures;
@@ -503,6 +502,111 @@ static void test_cpuid_out_of_range_clearing(void)
     }
 }
 
+static void test_is_compatible_success(void)
+{
+    static struct test {
+        const char *name;
+        struct cpuid_policy host_cpuid;
+        struct cpuid_policy guest_cpuid;
+        struct msr_policy host_msr;
+        struct msr_policy guest_msr;
+    } tests[] = {
+        {
+            .name = "Host CPUID faulting, Guest not",
+            .host_msr = {
+                .platform_info.cpuid_faulting = true,
+            },
+        },
+        {
+            .name = "Host CPUID faulting, Guest wanted",
+            .host_msr = {
+                .platform_info.cpuid_faulting = true,
+            },
+            .guest_msr = {
+                .platform_info.cpuid_faulting = true,
+            },
+        },
+    };
+    struct cpu_policy_errors no_errors = INIT_CPU_POLICY_ERRORS;
+
+    printf("Testing policy compatibility success:\n");
+
+    for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+    {
+        struct test *t = &tests[i];
+        struct cpu_policy sys = {
+            &t->host_cpuid,
+            &t->host_msr,
+        }, new = {
+            &t->guest_cpuid,
+            &t->guest_msr,
+        };
+        struct cpu_policy_errors e;
+        int res = x86_cpu_policies_are_compatible(&sys, &new, &e);
+
+        /* Check the expected error output. */
+        if ( res != 0 || memcmp(&no_errors, &e, sizeof(no_errors)) )
+            fail("  Test '%s' expected no errors\n"
+                 "    got res %d { leaf %08x, subleaf %08x, msr %08x }\n",
+                 t->name, res, e.leaf, e.subleaf, e.msr);
+    }
+}
+
+static void test_is_compatible_failure(void)
+{
+    static struct test {
+        const char *name;
+        struct cpuid_policy host_cpuid;
+        struct cpuid_policy guest_cpuid;
+        struct msr_policy host_msr;
+        struct msr_policy guest_msr;
+        struct cpu_policy_errors e;
+    } tests[] = {
+        {
+            .name = "Host basic.max_leaf out of range",
+            .guest_cpuid.basic.max_leaf = 1,
+            .e = { 0, -1, -1 },
+        },
+        {
+            .name = "Host extd.max_leaf out of range",
+            .guest_cpuid.extd.max_leaf = 1,
+            .e = { 0x80000008, -1, -1 },
+        },
+        {
+            .name = "Host no CPUID faulting, Guest wanted",
+            .guest_msr = {
+                .platform_info.cpuid_faulting = true,
+            },
+            .e = { -1, -1, 0xce },
+        },
+    };
+
+    printf("Testing policy compatibility failure:\n");
+
+    for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+    {
+        struct test *t = &tests[i];
+        struct cpu_policy sys = {
+            &t->host_cpuid,
+            &t->host_msr,
+        }, new = {
+            &t->guest_cpuid,
+            &t->guest_msr,
+        };
+        struct cpu_policy_errors e;
+        int res = x86_cpu_policies_are_compatible(&sys, &new, &e);
+
+        /* Check the expected error output. */
+        if ( res == 0 || memcmp(&t->e, &e, sizeof(t->e)) )
+            fail("  Test '%s' res %d\n"
+                 "    expected { leaf %08x, subleaf %08x, msr %08x }\n"
+                 "    got      { leaf %08x, subleaf %08x, msr %08x }\n",
+                 t->name, res,
+                 t->e.leaf, t->e.subleaf, t->e.msr,
+                 e.leaf, e.subleaf, e.msr);
+    }
+}
+
 int main(int argc, char **argv)
 {
     printf("CPU Policy unit tests\n");
@@ -516,6 +620,9 @@ int main(int argc, char **argv)
     test_msr_serialise_success();
     test_msr_deserialise_failure();
 
+    test_is_compatible_success();
+    test_is_compatible_failure();
+
     if ( nr_failures )
         printf("Done: %u failures\n", nr_failures);
     else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index 6f07c4b493..b7e38732a0 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -11,6 +11,32 @@ struct cpu_policy
     struct msr_policy *msr;
 };
 
+struct cpu_policy_errors
+{
+    uint32_t leaf, subleaf;
+    uint32_t msr;
+};
+
+#define INIT_CPU_POLICY_ERRORS { ~0u, ~0u, ~0u }
+
+/*
+ * Calculate whether two policies are compatible.
+ *
+ * i.e. Can a VM configured with @guest run on a CPU supporting @host.
+ *
+ * @param host     A cpu_policy describing the hardware capabilities.
+ * @param guest    A cpu_policy describing the intended VM configuration.
+ * @param err      Optional hint for error diagnostics.
+ * @returns -errno
+ *
+ * For typical usage, @host should be a system policy.  In the case that an
+ * incompatibility is detected, the optional err pointer may identify the
+ * problematic leaf/subleaf and/or MSR.
+ */
+int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
+                                    const struct cpu_policy *guest,
+                                    struct cpu_policy_errors *err);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */
 
 /*
diff --git a/xen/lib/x86/Makefile b/xen/lib/x86/Makefile
index 2f9691e964..780ea05db1 100644
--- a/xen/lib/x86/Makefile
+++ b/xen/lib/x86/Makefile
@@ -1,2 +1,3 @@
 obj-y += cpuid.o
 obj-y += msr.o
+obj-y += policy.o
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
new file mode 100644
index 0000000000..33a347ff9b
--- /dev/null
+++ b/xen/lib/x86/policy.c
@@ -0,0 +1,54 @@
+#include "private.h"
+
+#include <xen/lib/x86/cpu-policy.h>
+
+int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
+                                    const struct cpu_policy *guest,
+                                    struct cpu_policy_errors *err)
+{
+    struct cpu_policy_errors e = INIT_CPU_POLICY_ERRORS;
+    int ret = -EINVAL;
+
+    if ( err )
+        *err = e;
+
+#define NA XEN_CPUID_NO_SUBLEAF
+#define FAIL_CPUID(l, s) \
+    do { e.leaf = (l); e.subleaf = (s); goto out; } while ( 0 )
+#define FAIL_MSR(m) \
+    do { e.msr = (m); goto out; } while ( 0 )
+
+    if ( guest->cpuid->basic.max_leaf > host->cpuid->basic.max_leaf )
+        FAIL_CPUID(0, NA);
+
+    if ( guest->cpuid->extd.max_leaf > host->cpuid->extd.max_leaf )
+        FAIL_CPUID(0x80000008, NA);
+
+    /* TODO: Audit more CPUID data. */
+
+    if ( ~host->msr->platform_info.raw & guest->msr->platform_info.raw )
+        FAIL_MSR(MSR_INTEL_PLATFORM_INFO);
+
+#undef FAIL_MSR
+#undef FAIL_CPUID
+#undef NA
+
+    /* Success. */
+    ret = 0;
+
+ out:
+    if ( ret && err )
+        *err = e;
+
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 04/10] x86/cpuid: Split update_domain_cpuid_info() in half
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (2 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible() Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

update_domain_cpuid_info() currently serves two purposes.  First to merge new
CPUID data from the toolstack, and second, to perform any necessary updating
of derived domain/vcpu settings.

The first part of this is going to be superseded by a new and substantially
more efficient hypercall.

Carve the second part out into a new domain_cpu_policy_changed() helper, and
call this from the remains of update_domain_cpuid_info().

This does drop the call_policy_changed, but with the new hypercall in place,
the common case will be a single call per domain.  Dropping the optimisation
here allows for a cleaner set of following changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Drop duplicate 'hypercall' in commit message
 * Fix for_each_vcpu () style
---
 xen/arch/x86/domctl.c | 245 ++++++++++++++++++++------------------------------
 1 file changed, 99 insertions(+), 146 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index a744696c6b..48fccf2f7b 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -46,101 +46,14 @@ static int gdbsx_guest_mem_io(domid_t domid, struct xen_domctl_gdbsx_memio *iop)
     return iop->remain ? -EFAULT : 0;
 }
 
-static int update_domain_cpuid_info(struct domain *d,
-                                    const struct xen_domctl_cpuid *ctl)
+static void domain_cpu_policy_changed(struct domain *d)
 {
-    struct cpuid_policy *p = d->arch.cpuid;
-    const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, ctl->edx };
-    int old_vendor = p->x86_vendor;
-    unsigned int old_7d0 = p->feat.raw[0].d, old_e8b = p->extd.raw[8].b;
-    bool call_policy_changed = false; /* Avoid for_each_vcpu() unnecessarily */
-
-    /*
-     * Skip update for leaves we don't care about, to avoid the overhead of
-     * recalculate_cpuid_policy().
-     */
-    switch ( ctl->input[0] )
-    {
-    case 0x00000000 ... ARRAY_SIZE(p->basic.raw) - 1:
-        if ( ctl->input[0] == 4 &&
-             ctl->input[1] >= ARRAY_SIZE(p->cache.raw) )
-            return 0;
-
-        if ( ctl->input[0] == 7 &&
-             ctl->input[1] >= ARRAY_SIZE(p->feat.raw) )
-            return 0;
-
-        if ( ctl->input[0] == 0xb &&
-             ctl->input[1] >= ARRAY_SIZE(p->topo.raw) )
-            return 0;
-
-        BUILD_BUG_ON(ARRAY_SIZE(p->xstate.raw) < 2);
-        if ( ctl->input[0] == XSTATE_CPUID &&
-             ctl->input[1] != 1 ) /* Everything else automatically calculated. */
-            return 0;
-        break;
-
-    case 0x40000000: case 0x40000100:
-        /* Only care about the max_leaf limit. */
-
-    case 0x80000000 ... 0x80000000 + ARRAY_SIZE(p->extd.raw) - 1:
-        break;
-
-    default:
-        return 0;
-    }
-
-    /* Insert ctl data into cpuid_policy. */
-    switch ( ctl->input[0] )
-    {
-    case 0x00000000 ... ARRAY_SIZE(p->basic.raw) - 1:
-        switch ( ctl->input[0] )
-        {
-        case 4:
-            p->cache.raw[ctl->input[1]] = leaf;
-            break;
-
-        case 7:
-            p->feat.raw[ctl->input[1]] = leaf;
-            break;
-
-        case 0xb:
-            p->topo.raw[ctl->input[1]] = leaf;
-            break;
-
-        case XSTATE_CPUID:
-            p->xstate.raw[ctl->input[1]] = leaf;
-            break;
-
-        default:
-            p->basic.raw[ctl->input[0]] = leaf;
-            break;
-        }
-        break;
-
-    case 0x40000000:
-        p->hv_limit = ctl->eax;
-        break;
+    const struct cpuid_policy *p = d->arch.cpuid;
+    struct vcpu *v;
 
-    case 0x40000100:
-        p->hv2_limit = ctl->eax;
-        break;
-
-    case 0x80000000 ... 0x80000000 + ARRAY_SIZE(p->extd.raw) - 1:
-        p->extd.raw[ctl->input[0] - 0x80000000] = leaf;
-        break;
-    }
-
-    recalculate_cpuid_policy(d);
-
-    switch ( ctl->input[0] )
+    if ( is_pv_domain(d) )
     {
-    case 0:
-        call_policy_changed = (p->x86_vendor != old_vendor);
-        break;
-
-    case 1:
-        if ( is_pv_domain(d) && ((levelling_caps & LCAP_1cd) == LCAP_1cd) )
+        if ( ((levelling_caps & LCAP_1cd) == LCAP_1cd) )
         {
             uint64_t mask = cpuidmask_defaults._1cd;
             uint32_t ecx = p->basic._1c;
@@ -197,25 +110,18 @@ static int update_domain_cpuid_info(struct domain *d,
 
             d->arch.pv.cpuidmasks->_1cd = mask;
         }
-        break;
 
-    case 6:
-        if ( is_pv_domain(d) && ((levelling_caps & LCAP_6c) == LCAP_6c) )
+        if ( ((levelling_caps & LCAP_6c) == LCAP_6c) )
         {
             uint64_t mask = cpuidmask_defaults._6c;
 
             if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-                mask &= (~0ULL << 32) | ctl->ecx;
+                mask &= (~0ULL << 32) | p->basic.raw[6].c;
 
             d->arch.pv.cpuidmasks->_6c = mask;
         }
-        break;
-
-    case 7:
-        if ( ctl->input[1] != 0 )
-            break;
 
-        if ( is_pv_domain(d) && ((levelling_caps & LCAP_7ab0) == LCAP_7ab0) )
+        if ( ((levelling_caps & LCAP_7ab0) == LCAP_7ab0) )
         {
             uint64_t mask = cpuidmask_defaults._7ab0;
 
@@ -232,35 +138,7 @@ static int update_domain_cpuid_info(struct domain *d,
             d->arch.pv.cpuidmasks->_7ab0 = mask;
         }
 
-        /*
-         * If the IBRS/IBPB policy has changed, we need to recalculate the MSR
-         * interception bitmaps.
-         */
-        call_policy_changed = (is_hvm_domain(d) &&
-                               ((old_7d0 ^ p->feat.raw[0].d) &
-                                (cpufeat_mask(X86_FEATURE_IBRSB) |
-                                 cpufeat_mask(X86_FEATURE_L1D_FLUSH))));
-        break;
-
-    case 0xa:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-
-        /* If PMU version is zero then the guest doesn't have VPMU */
-        if ( p->basic.pmu_version == 0 )
-        {
-            struct vcpu *v;
-
-            for_each_vcpu ( d, v )
-                vpmu_destroy(v);
-        }
-        break;
-
-    case 0xd:
-        if ( ctl->input[1] != 1 )
-            break;
-
-        if ( is_pv_domain(d) && ((levelling_caps & LCAP_Da1) == LCAP_Da1) )
+        if ( ((levelling_caps & LCAP_Da1) == LCAP_Da1) )
         {
             uint64_t mask = cpuidmask_defaults.Da1;
             uint32_t eax = p->xstate.Da1;
@@ -270,10 +148,8 @@ static int update_domain_cpuid_info(struct domain *d,
 
             d->arch.pv.cpuidmasks->Da1 = mask;
         }
-        break;
 
-    case 0x80000001:
-        if ( is_pv_domain(d) && ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) )
+        if ( ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) )
         {
             uint64_t mask = cpuidmask_defaults.e1cd;
             uint32_t ecx = p->extd.e1c;
@@ -317,27 +193,104 @@ static int update_domain_cpuid_info(struct domain *d,
 
             d->arch.pv.cpuidmasks->e1cd = mask;
         }
+    }
+
+    for_each_vcpu ( d, v )
+    {
+        cpuid_policy_updated(v);
+
+        /* If PMU version is zero then the guest doesn't have VPMU */
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+             p->basic.pmu_version == 0 )
+            vpmu_destroy(v);
+    }
+}
+
+static int update_domain_cpuid_info(struct domain *d,
+                                    const struct xen_domctl_cpuid *ctl)
+{
+    struct cpuid_policy *p = d->arch.cpuid;
+    const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, ctl->edx };
+
+    /*
+     * Skip update for leaves we don't care about, to avoid the overhead of
+     * recalculate_cpuid_policy().
+     */
+    switch ( ctl->input[0] )
+    {
+    case 0x00000000 ... ARRAY_SIZE(p->basic.raw) - 1:
+        if ( ctl->input[0] == 4 &&
+             ctl->input[1] >= ARRAY_SIZE(p->cache.raw) )
+            return 0;
+
+        if ( ctl->input[0] == 7 &&
+             ctl->input[1] >= ARRAY_SIZE(p->feat.raw) )
+            return 0;
+
+        if ( ctl->input[0] == 0xb &&
+             ctl->input[1] >= ARRAY_SIZE(p->topo.raw) )
+            return 0;
+
+        BUILD_BUG_ON(ARRAY_SIZE(p->xstate.raw) < 2);
+        if ( ctl->input[0] == XSTATE_CPUID &&
+             ctl->input[1] != 1 ) /* Everything else automatically calculated. */
+            return 0;
         break;
 
-    case 0x80000008:
-        /*
-         * If the IBPB policy has changed, we need to recalculate the MSR
-         * interception bitmaps.
-         */
-        call_policy_changed = (is_hvm_domain(d) &&
-                               ((old_e8b ^ p->extd.raw[8].b) &
-                                cpufeat_mask(X86_FEATURE_IBPB)));
+    case 0x40000000: case 0x40000100:
+        /* Only care about the max_leaf limit. */
+
+    case 0x80000000 ... 0x80000000 + ARRAY_SIZE(p->extd.raw) - 1:
         break;
+
+    default:
+        return 0;
     }
 
-    if ( call_policy_changed )
+    /* Insert ctl data into cpuid_policy. */
+    switch ( ctl->input[0] )
     {
-        struct vcpu *v;
+    case 0x00000000 ... ARRAY_SIZE(p->basic.raw) - 1:
+        switch ( ctl->input[0] )
+        {
+        case 4:
+            p->cache.raw[ctl->input[1]] = leaf;
+            break;
+
+        case 7:
+            p->feat.raw[ctl->input[1]] = leaf;
+            break;
+
+        case 0xb:
+            p->topo.raw[ctl->input[1]] = leaf;
+            break;
 
-        for_each_vcpu( d, v )
-            cpuid_policy_updated(v);
+        case XSTATE_CPUID:
+            p->xstate.raw[ctl->input[1]] = leaf;
+            break;
+
+        default:
+            p->basic.raw[ctl->input[0]] = leaf;
+            break;
+        }
+        break;
+
+    case 0x40000000:
+        p->hv_limit = ctl->eax;
+        break;
+
+    case 0x40000100:
+        p->hv2_limit = ctl->eax;
+        break;
+
+    case 0x80000000 ... 0x80000000 + ARRAY_SIZE(p->extd.raw) - 1:
+        p->extd.raw[ctl->input[0] - 0x80000000] = leaf;
+        break;
     }
 
+    recalculate_cpuid_policy(d);
+    domain_cpu_policy_changed(d);
+
     return 0;
 }
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (3 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 04/10] x86/cpuid: Split update_domain_cpuid_info() in half Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-16 11:04   ` Jan Beulich
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}() Andrew Cooper
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Sergey Dyasli, Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich,
	Daniel De Graaf, Roger Pau Monné

This hypercall allows the toolstack to present one combined CPUID and MSR
policy for a domain, which can be audited in one go by Xen, which is necessary
for correctness of the auditing.

Reuse the existing set_cpuid XSM access vector, as this is logically the same
operation.

As x86_cpu_policies_are_compatible() is still only a stub, retain the call to
recalculate_cpuid_policy() to discard unsafe toolstack settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Daniel De Graaf <dgdegra@tycho.nsa.gov>

v2:
 * Bump the DOMCTL interface version
 * Proactively set the error pointers in xc_set_domain_cpu_policy()
 * Adjust domctl API documentation to reflect that not all DOMCTL failures
   will write the error pointers.
---
 tools/libxc/include/xenctrl.h       |  5 +++
 tools/libxc/xc_cpuid_x86.c          | 46 +++++++++++++++++++++
 xen/arch/x86/domctl.c               | 80 +++++++++++++++++++++++++++++++++++++
 xen/include/public/domctl.h         | 18 ++++++---
 xen/xsm/flask/hooks.c               |  1 +
 xen/xsm/flask/policy/access_vectors |  1 +
 6 files changed, 146 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 7559e1bc69..0da437318e 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2530,6 +2530,11 @@ int xc_get_system_cpu_policy(xc_interface *xch, uint32_t index,
 int xc_get_domain_cpu_policy(xc_interface *xch, uint32_t domid,
                              uint32_t *nr_leaves, xen_cpuid_leaf_t *leaves,
                              uint32_t *nr_msrs, xen_msr_entry_t *msrs);
+int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid,
+                             uint32_t nr_leaves, xen_cpuid_leaf_t *leaves,
+                             uint32_t nr_msrs, xen_msr_entry_t *msrs,
+                             uint32_t *err_leaf_p, uint32_t *err_subleaf_p,
+                             uint32_t *err_msr_p);
 
 uint32_t xc_get_cpu_featureset_size(void);
 
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index b829336082..0f07317b54 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -229,6 +229,52 @@ int xc_get_domain_cpu_policy(xc_interface *xch, uint32_t domid,
     return ret;
 }
 
+int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid,
+                             uint32_t nr_leaves, xen_cpuid_leaf_t *leaves,
+                             uint32_t nr_msrs, xen_msr_entry_t *msrs,
+                             uint32_t *err_leaf_p, uint32_t *err_subleaf_p,
+                             uint32_t *err_msr_p)
+{
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(leaves,
+                             nr_leaves * sizeof(*leaves),
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BOUNCE(msrs,
+                             nr_msrs * sizeof(*msrs),
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    int ret;
+
+    if ( xc_hypercall_bounce_pre(xch, leaves) )
+        return -1;
+
+    if ( xc_hypercall_bounce_pre(xch, msrs) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_set_cpu_policy;
+    domctl.domain = domid;
+    domctl.u.cpu_policy.nr_leaves = nr_leaves;
+    set_xen_guest_handle(domctl.u.cpu_policy.cpuid_policy, leaves);
+    domctl.u.cpu_policy.nr_msrs = nr_msrs;
+    set_xen_guest_handle(domctl.u.cpu_policy.msr_policy, msrs);
+    domctl.u.cpu_policy.err_leaf = ~0;
+    domctl.u.cpu_policy.err_subleaf = ~0;
+    domctl.u.cpu_policy.err_msr = ~0;
+
+    ret = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, leaves);
+    xc_hypercall_bounce_post(xch, msrs);
+
+    if ( err_leaf_p )
+        *err_leaf_p = domctl.u.cpu_policy.err_leaf;
+    if ( err_subleaf_p )
+        *err_subleaf_p = domctl.u.cpu_policy.err_subleaf;
+    if ( err_msr_p )
+        *err_msr_p = domctl.u.cpu_policy.err_msr;
+
+    return ret;
+}
+
 struct cpuid_domain_info
 {
     unsigned int vendor; /* X86_VENDOR_* */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 48fccf2f7b..97ced32c21 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -294,6 +294,65 @@ static int update_domain_cpuid_info(struct domain *d,
     return 0;
 }
 
+static int update_domain_cpu_policy(struct domain *d,
+                                    xen_domctl_cpu_policy_t *xdpc)
+{
+    struct cpu_policy new = {};
+    const struct cpu_policy *sys = is_pv_domain(d)
+        ? &system_policies[XEN_SYSCTL_cpu_policy_pv_max]
+        : &system_policies[XEN_SYSCTL_cpu_policy_hvm_max];
+    struct cpu_policy_errors err = INIT_CPU_POLICY_ERRORS;
+    int ret = -ENOMEM;
+
+    /* Start by copying the domain's existing policies. */
+    if ( !(new.cpuid = xmemdup(d->arch.cpuid)) ||
+         !(new.msr   = xmemdup(d->arch.msr)) )
+        goto out;
+
+    /* Merge the toolstack provided data. */
+    if ( (ret = x86_cpuid_copy_from_buffer(
+              new.cpuid, xdpc->cpuid_policy, xdpc->nr_leaves,
+              &err.leaf, &err.subleaf)) ||
+         (ret = x86_msr_copy_from_buffer(
+              new.msr, xdpc->msr_policy, xdpc->nr_msrs, &err.msr)) )
+        goto out;
+
+    /* Trim any newly-stale out-of-range leaves. */
+    x86_cpuid_policy_clear_out_of_range_leaves(new.cpuid);
+
+    /* Audit the combined dataset. */
+    ret = x86_cpu_policies_are_compatible(sys, &new, &err);
+    if ( ret )
+        goto out;
+
+    /*
+     * Audit was successful.  Replace existing policies, leaving the old
+     * policies to be freed.
+     */
+    SWAP(new.cpuid, d->arch.cpuid);
+    SWAP(new.msr,   d->arch.msr);
+
+    /* TODO: Drop when x86_cpu_policies_are_compatible() is completed. */
+    recalculate_cpuid_policy(d);
+
+    /* Recalculate relevant dom/vcpu state now the policy has changed. */
+    domain_cpu_policy_changed(d);
+
+ out:
+    /* Free whichever cpuid/msr structs are not installed in struct domain. */
+    xfree(new.cpuid);
+    xfree(new.msr);
+
+    if ( ret )
+    {
+        xdpc->err_leaf    = err.leaf;
+        xdpc->err_subleaf = err.subleaf;
+        xdpc->err_msr     = err.msr;
+    }
+
+    return ret;
+}
+
 static int vcpu_set_vmce(struct vcpu *v,
                          const struct xen_domctl_ext_vcpucontext *evc)
 {
@@ -1476,6 +1535,27 @@ long arch_do_domctl(
         copyback = true;
         break;
 
+    case XEN_DOMCTL_set_cpu_policy:
+        if ( d == currd ) /* No domain_pause() */
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        domain_pause(d);
+
+        if ( d->creation_finished )
+            ret = -EEXIST; /* No changing once the domain is running. */
+        else
+        {
+            ret = update_domain_cpu_policy(d, &domctl->u.cpu_policy);
+            if ( ret ) /* Copy domctl->u.cpu_policy.err_* to guest. */
+                copyback = true;
+        }
+
+        domain_unpause(d);
+        break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 77f546cbb8..bd7d26545d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -38,7 +38,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000011
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -658,17 +658,24 @@ struct xen_domctl_cpuid {
 };
 
 /*
- * XEN_DOMCTL_get_cpu_policy (x86 specific)
+ * XEN_DOMCTL_{get,set}_cpu_policy (x86 specific)
  *
- * Query the CPUID and MSR policies for a specific domain.
+ * Query or set the CPUID and MSR policies for a specific domain.
  */
 struct xen_domctl_cpu_policy {
     uint32_t nr_leaves; /* IN/OUT: Number of leaves in/written to
                          * 'cpuid_policy'. */
     uint32_t nr_msrs;   /* IN/OUT: Number of MSRs in/written to
                          * 'msr_domain_policy' */
-    XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* OUT */
-    XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* OUT */
+    XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* IN/OUT */
+    XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* IN/OUT */
+
+    /*
+     * OUT, set_policy only.  Written in some (but not all) error cases to
+     * identify problem the CPUID leaf/subleaf and/or MSR which auditing
+     * objects to.
+     */
+    uint32_t err_leaf, err_subleaf, err_msr;
 };
 typedef struct xen_domctl_cpu_policy xen_domctl_cpu_policy_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_cpu_policy_t);
@@ -1193,6 +1200,7 @@ struct xen_domctl {
 /* #define XEN_DOMCTL_set_gnttab_limits          80 - Moved into XEN_DOMCTL_createdomain */
 #define XEN_DOMCTL_vuart_op                      81
 #define XEN_DOMCTL_get_cpu_policy                82
+#define XEN_DOMCTL_set_cpu_policy                83
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 6800f2d9a0..b23772786a 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -715,6 +715,7 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_set_virq_handler:
         return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_VIRQ_HANDLER);
 
+    case XEN_DOMCTL_set_cpu_policy:
     case XEN_DOMCTL_set_cpuid:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_CPUID);
 
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 76f3d60ddd..6f3f9493f8 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -207,6 +207,7 @@ class domain2
 #  source = the domain making the hypercall
 #  target = the new target domain
     set_as_target
+# XEN_DOMCTL_set_cpu_policy
 # XEN_DOMCTL_set_cpuid
     set_cpuid
 # XEN_DOMCTL_gettscinfo
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}()
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (4 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-16 11:09   ` Jan Beulich
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 07/10] tools/libxc: Rework xc_cpuid_set() to use {get, set}_cpu_policy() Andrew Cooper
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich, Roger Pau Monné

This patch is broken out just to simplify the following two.

For xc_cpuid_set(), document how the 's' and 'k' options works because it is
quite subtle.  Replace a memset() with a for loop of 4 explicit NULL
assigments.  This mirrors the free()'s in the fail path.

For xc_cpuid_apply_policy(), const-ify the featureset pointer.  It isn't
written to, and was never intended to be mutable.

Drop three pieces of trailing whitespace.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Jackson <Ian.Jackson@citrix.com>

v2:
 * Adjust the comments to include 's' along with 'k'
---
 tools/libxc/include/xenctrl.h |  2 +-
 tools/libxc/xc_cpuid_x86.c    | 21 ++++++++++++++-------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0da437318e..f4431687b3 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1800,7 +1800,7 @@ int xc_cpuid_set(xc_interface *xch,
                  char **config_transformed);
 int xc_cpuid_apply_policy(xc_interface *xch,
                           uint32_t domid,
-                          uint32_t *featureset,
+                          const uint32_t *featureset,
                           unsigned int nr_features);
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
 int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 0f07317b54..8785cae329 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -1,5 +1,5 @@
 /******************************************************************************
- * xc_cpuid_x86.c 
+ * xc_cpuid_x86.c
  *
  * Compute cpuid of a domain.
  *
@@ -329,7 +329,7 @@ static void cpuid(const unsigned int *input, unsigned int *regs)
 
 static int get_cpuid_domain_info(xc_interface *xch, uint32_t domid,
                                  struct cpuid_domain_info *info,
-                                 uint32_t *featureset,
+                                 const uint32_t *featureset,
                                  unsigned int nr_features)
 {
     struct xen_domctl domctl = {};
@@ -804,8 +804,7 @@ static void sanitise_featureset(struct cpuid_domain_info *info)
 }
 
 int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
-                          uint32_t *featureset,
-                          unsigned int nr_features)
+                          const uint32_t *featureset, unsigned int nr_features)
 {
     struct cpuid_domain_info info = {};
     unsigned int input[2] = { 0, 0 }, regs[4];
@@ -895,7 +894,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
  *   'k' -> pass through host value
  *   's' -> pass through the first time and then keep the same value
  *          across save/restore and migration.
- * 
+ *
  * For 's' and 'x' the configuration is overwritten with the value applied.
  */
 int xc_cpuid_set(
@@ -906,7 +905,8 @@ int xc_cpuid_set(
     unsigned int i, j, regs[4], polregs[4];
     struct cpuid_domain_info info = {};
 
-    memset(config_transformed, 0, 4 * sizeof(*config_transformed));
+    for ( i = 0; i < 4; ++i )
+        config_transformed[i] = NULL;
 
     rc = get_cpuid_domain_info(xch, domid, &info, NULL, 0);
     if ( rc )
@@ -924,7 +924,7 @@ int xc_cpuid_set(
             regs[i] = polregs[i];
             continue;
         }
-        
+
         config_transformed[i] = calloc(33, 1); /* 32 bits, NUL terminator. */
         if ( config_transformed[i] == NULL )
         {
@@ -932,6 +932,13 @@ int xc_cpuid_set(
             goto fail;
         }
 
+        /*
+         * Notes for following this algorithm:
+         *
+         * While it will accept any leaf data, it only makes sense to use on
+         * feature leaves.  regs[] initially contains the host values.  This,
+         * with the fall-through chain is how the 's' and 'k' options works.
+         */
         for ( j = 0; j < 32; j++ )
         {
             unsigned char val = !!((regs[i] & (1U << (31 - j))));
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 07/10] tools/libxc: Rework xc_cpuid_set() to use {get, set}_cpu_policy()
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (5 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}() Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() " Andrew Cooper
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich, Roger Pau Monné

The purpose of this change is to stop using xc_cpuid_do_domctl(), and to stop
basing decisions on a local CPUID instruction.  This is not an appropriate way
to construct policy information for other domains.

Obtain the host and domain-max policies from Xen, and mix the results as
before.  Provide rather more error logging than before.

No semantics changes to xc_cpuid_set().  There are conceptual problems with
how the function works, which will be addressed in future toolstack work.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Jackson <Ian.Jackson@citrix.com>
---
 tools/libxc/xc_cpuid_x86.c | 95 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 8785cae329..77f96a4ea6 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -902,20 +902,80 @@ int xc_cpuid_set(
     const char **config, char **config_transformed)
 {
     int rc;
-    unsigned int i, j, regs[4], polregs[4];
-    struct cpuid_domain_info info = {};
+    unsigned int i, j, regs[4] = {}, polregs[4] = {};
+    xc_dominfo_t di;
+    xen_cpuid_leaf_t *leaves = NULL;
+    unsigned int nr_leaves, policy_leaves, nr_msrs;
+    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
 
     for ( i = 0; i < 4; ++i )
         config_transformed[i] = NULL;
 
-    rc = get_cpuid_domain_info(xch, domid, &info, NULL, 0);
+    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
+         di.domid != domid )
+    {
+        ERROR("Failed to obtain d%d info", domid);
+        rc = -ESRCH;
+        goto fail;
+    }
+
+    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
     if ( rc )
-        goto out;
+    {
+        PERROR("Failed to obtain policy info size");
+        rc = -errno;
+        goto fail;
+    }
 
-    cpuid(input, regs);
+    rc = -ENOMEM;
+    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL )
+    {
+        ERROR("Unable to allocate memory for %u CPUID leaves", nr_leaves);
+        goto fail;
+    }
 
-    memcpy(polregs, regs, sizeof(regs));
-    xc_cpuid_policy(&info, input, polregs);
+    /* Get the domain's max policy. */
+    nr_msrs = 0;
+    policy_leaves = nr_leaves;
+    rc = xc_get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_max
+                                              : XEN_SYSCTL_cpu_policy_pv_max,
+                                  &policy_leaves, leaves, &nr_msrs, NULL);
+    if ( rc )
+    {
+        PERROR("Failed to obtain %s max policy", di.hvm ? "hvm" : "pv");
+        rc = -errno;
+        goto fail;
+    }
+    for ( i = 0; i < policy_leaves; ++i )
+        if ( leaves[i].leaf == input[0] && leaves[i].subleaf == input[1] )
+        {
+            polregs[0] = leaves[i].a;
+            polregs[1] = leaves[i].b;
+            polregs[2] = leaves[i].c;
+            polregs[3] = leaves[i].d;
+            break;
+        }
+
+    /* Get the host policy. */
+    nr_msrs = 0;
+    policy_leaves = nr_leaves;
+    rc = xc_get_system_cpu_policy(xch, XEN_SYSCTL_cpu_policy_host,
+                                  &policy_leaves, leaves, &nr_msrs, NULL);
+    if ( rc )
+    {
+        PERROR("Failed to obtain host policy");
+        rc = -errno;
+        goto fail;
+    }
+    for ( i = 0; i < policy_leaves; ++i )
+        if ( leaves[i].leaf == input[0] && leaves[i].subleaf == input[1] )
+        {
+            regs[0] = leaves[i].a;
+            regs[1] = leaves[i].b;
+            regs[2] = leaves[i].c;
+            regs[3] = leaves[i].d;
+            break;
+        }
 
     for ( i = 0; i < 4; i++ )
     {
@@ -966,9 +1026,21 @@ int xc_cpuid_set(
         }
     }
 
-    rc = xc_cpuid_do_domctl(xch, domid, input, regs);
-    if ( rc == 0 )
-        goto out;
+    /* Feed the transformed leaf back up to Xen. */
+    leaves[0] = (xen_cpuid_leaf_t){ input[0], input[1],
+                                    regs[0], regs[1], regs[2], regs[3] };
+    rc = xc_set_domain_cpu_policy(xch, domid, 1, leaves, 0, NULL,
+                                  &err_leaf, &err_subleaf, &err_msr);
+    if ( rc )
+    {
+        PERROR("Failed to set d%d's policy (err leaf %#x, subleaf %#x, msr %#x)",
+               domid, err_leaf, err_subleaf, err_msr);
+        rc = -errno;
+        goto fail;
+    }
+
+    /* Success! */
+    goto out;
 
  fail:
     for ( i = 0; i < 4; i++ )
@@ -978,6 +1050,7 @@ int xc_cpuid_set(
     }
 
  out:
-    free_cpuid_domain_info(&info);
+    free(leaves);
+
     return rc;
 }
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (6 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 07/10] tools/libxc: Rework xc_cpuid_set() to use {get, set}_cpu_policy() Andrew Cooper
@ 2019-09-13 19:27 ` " Andrew Cooper
  2019-09-16 11:17   ` Jan Beulich
                     ` (2 more replies)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 09/10] x86/domctl: Drop XEN_DOMCTL_set_cpuid Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default Andrew Cooper
  9 siblings, 3 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich, Roger Pau Monné

The purpose of this change is to stop using xc_cpuid_do_domctl(), and to stop
basing decisions on a local CPUID instruction.  This is not a correct or
appropriate way to construct policy information for other domains.

The overwhelming majority of this logic is redundant with the policy logic in
Xen, but has a habit of becoming stale (e.g. c/s 97e4ebdcd76 resulting in the
CPUID.7[1].eax not being offered to guests even when Xen is happy with the
content).

There are a few subtle side effects which need to remain in place.  A
successful call to xc_cpuid_apply_policy() must result in a call to
xc_set_domain_cpu_policy() because that is currently the only way the
ITSC/VMX/SVM bits become reflected in the guests CPUID view.  Future cleanup
will remove this side effect.

The topology tweaks are local to libxc.  Extend struct cpuid_policy with
enough named fields to express the logic, but keep it identical to before.
Fixing topology representation is another future area of work.

No (expected) change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Jackson <Ian.Jackson@citrix.com>

The repositioning of xc_cpuid_apply_policy() relative to xc_cpuid_set() is
simply to make the diff readable.  It is completely illegible otherwise.

v2:
 * Reword the commit message to drop AVX512_BF16
 * Initialise host_featureset[] just in case.
---
 tools/libxc/xc_cpuid_x86.c      | 798 ++++++++++------------------------------
 xen/include/xen/lib/x86/cpuid.h |  11 +-
 2 files changed, 197 insertions(+), 612 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 77f96a4ea6..8e93a60978 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -34,18 +34,13 @@ enum {
 
 #include <xen/asm/x86-vendors.h>
 
-#include <xen/lib/x86/cpuid.h>
-#include <xen/lib/x86/msr.h>
+#include <xen/lib/x86/cpu-policy.h>
 
 #define bitmaskof(idx)      (1u << ((idx) & 31))
 #define featureword_of(idx) ((idx) >> 5)
 #define clear_feature(idx, dst) ((dst) &= ~bitmaskof(idx))
 #define set_feature(idx, dst)   ((dst) |=  bitmaskof(idx))
 
-#define DEF_MAX_BASE 0x0000000du
-#define DEF_MAX_INTELEXT  0x80000008u
-#define DEF_MAX_AMDEXT    0x8000001cu
-
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps)
 {
     DECLARE_SYSCTL;
@@ -275,609 +270,6 @@ int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid,
     return ret;
 }
 
-struct cpuid_domain_info
-{
-    unsigned int vendor; /* X86_VENDOR_* */
-
-    bool hvm;
-    uint64_t xfeature_mask;
-
-    /*
-     * Careful with featureset lengths.
-     *
-     * Code in this file requires featureset to have at least
-     * xc_get_cpu_featureset_size() entries.  This is a libxc compiletime
-     * constant.
-     *
-     * The featureset length used by the hypervisor may be different.  If the
-     * hypervisor version is longer, XEN_SYSCTL_get_cpu_featureset will fail
-     * with -ENOBUFS, and libxc really does need rebuilding.  If the
-     * hypervisor version is shorter, it is safe to zero-extend.
-     */
-    uint32_t *featureset;
-    unsigned int nr_features;
-
-    /* PV-only information. */
-    bool pv64;
-
-    /* HVM-only information. */
-    bool pae;
-    bool nestedhvm;
-};
-
-static void cpuid(const unsigned int *input, unsigned int *regs)
-{
-    unsigned int count = (input[1] == XEN_CPUID_INPUT_UNUSED) ? 0 : input[1];
-#ifdef __i386__
-    /* Use the stack to avoid reg constraint failures with some gcc flags */
-    asm (
-        "push %%ebx; push %%edx\n\t"
-        "cpuid\n\t"
-        "mov %%ebx,4(%4)\n\t"
-        "mov %%edx,12(%4)\n\t"
-        "pop %%edx; pop %%ebx\n\t"
-        : "=a" (regs[0]), "=c" (regs[2])
-        : "0" (input[0]), "1" (count), "S" (regs)
-        : "memory" );
-#else
-    asm (
-        "cpuid"
-        : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
-        : "0" (input[0]), "2" (count) );
-#endif
-}
-
-static int get_cpuid_domain_info(xc_interface *xch, uint32_t domid,
-                                 struct cpuid_domain_info *info,
-                                 const uint32_t *featureset,
-                                 unsigned int nr_features)
-{
-    struct xen_domctl domctl = {};
-    xc_dominfo_t di;
-    unsigned int in[2] = { 0, ~0U }, regs[4];
-    unsigned int i, host_nr_features = xc_get_cpu_featureset_size();
-    int rc;
-
-    cpuid(in, regs);
-    info->vendor = x86_cpuid_lookup_vendor(regs[1], regs[2], regs[3]);
-
-    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
-         di.domid != domid )
-        return -ESRCH;
-
-    info->hvm = di.hvm;
-
-    info->featureset = calloc(host_nr_features, sizeof(*info->featureset));
-    if ( !info->featureset )
-        return -ENOMEM;
-
-    info->nr_features = host_nr_features;
-
-    if ( featureset )
-    {
-        /*
-         * The user supplied featureset may be shorter or longer than
-         * host_nr_features.  Shorter is fine, and we will zero-extend.
-         * Longer is fine, so long as it only padded with zeros.
-         */
-        unsigned int fslen = min(host_nr_features, nr_features);
-
-        memcpy(info->featureset, featureset,
-               fslen * sizeof(*info->featureset));
-
-        /* Check for truncated set bits. */
-        for ( i = fslen; i < nr_features; ++i )
-            if ( featureset[i] != 0 )
-                return -EOPNOTSUPP;
-    }
-    else
-    {
-        rc = xc_get_cpu_featureset(xch, (info->hvm
-                                         ? XEN_SYSCTL_cpu_featureset_hvm
-                                         : XEN_SYSCTL_cpu_featureset_pv),
-                                   &host_nr_features, info->featureset);
-        if ( rc )
-            return -errno;
-    }
-
-    /* Get xstate information. */
-    domctl.cmd = XEN_DOMCTL_getvcpuextstate;
-    domctl.domain = domid;
-    rc = do_domctl(xch, &domctl);
-    if ( rc )
-        return -errno;
-
-    info->xfeature_mask = domctl.u.vcpuextstate.xfeature_mask;
-
-    if ( di.hvm )
-    {
-        uint64_t val;
-
-        rc = xc_hvm_param_get(xch, domid, HVM_PARAM_PAE_ENABLED, &val);
-        if ( rc )
-            return -errno;
-
-        info->pae = !!val;
-
-        rc = xc_hvm_param_get(xch, domid, HVM_PARAM_NESTEDHVM, &val);
-        if ( rc )
-            return -errno;
-
-        info->nestedhvm = !!val;
-    }
-    else
-    {
-        unsigned int width;
-
-        rc = xc_domain_get_guest_width(xch, domid, &width);
-        if ( rc )
-            return -errno;
-
-        info->pv64 = (width == 8);
-    }
-
-    return 0;
-}
-
-static void free_cpuid_domain_info(struct cpuid_domain_info *info)
-{
-    free(info->featureset);
-}
-
-static void amd_xc_cpuid_policy(const struct cpuid_domain_info *info,
-                                const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000002:
-    case 0x00000004:
-        regs[0] = regs[1] = regs[2] = 0;
-        break;
-
-    case 0x80000000:
-        if ( regs[0] > DEF_MAX_AMDEXT )
-            regs[0] = DEF_MAX_AMDEXT;
-        break;
-
-    case 0x80000008:
-        /*
-         * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus one).
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
-         */
-        regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) |
-                  ((regs[2] & 0xffu) << 1) | 1u;
-        break;
-
-    case 0x8000000a: {
-        if ( !info->nestedhvm )
-        {
-            regs[0] = regs[1] = regs[2] = regs[3] = 0;
-            break;
-        }
-
-#define SVM_FEATURE_NPT            0x00000001 /* Nested page table support */
-#define SVM_FEATURE_LBRV           0x00000002 /* LBR virtualization support */
-#define SVM_FEATURE_SVML           0x00000004 /* SVM locking MSR support */
-#define SVM_FEATURE_NRIPS          0x00000008 /* Next RIP save on VMEXIT */
-#define SVM_FEATURE_TSCRATEMSR     0x00000010 /* TSC ratio MSR support */
-#define SVM_FEATURE_VMCBCLEAN      0x00000020 /* VMCB clean bits support */
-#define SVM_FEATURE_FLUSHBYASID    0x00000040 /* TLB flush by ASID support */
-#define SVM_FEATURE_DECODEASSISTS  0x00000080 /* Decode assists support */
-#define SVM_FEATURE_PAUSEFILTER    0x00000400 /* Pause intercept filter */
-
-        /* Pass 1: Only passthrough SVM features which are
-         * available in hw and which are implemented
-         */
-        regs[3] &= (SVM_FEATURE_NPT | SVM_FEATURE_LBRV | \
-            SVM_FEATURE_NRIPS | SVM_FEATURE_PAUSEFILTER | \
-            SVM_FEATURE_DECODEASSISTS);
-
-        /* Pass 2: Always enable SVM features which are emulated */
-        regs[3] |= SVM_FEATURE_VMCBCLEAN | SVM_FEATURE_TSCRATEMSR;
-        break;
-    }
-
-    }
-}
-
-static void intel_xc_cpuid_policy(const struct cpuid_domain_info *info,
-                                  const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000004:
-        /*
-         * EAX[31:26] is Maximum Cores Per Package (minus one).
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
-         */
-        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
-                   (regs[0] & 0x3ffu));
-        regs[3] &= 0x3ffu;
-        break;
-
-    case 0x80000000:
-        if ( regs[0] > DEF_MAX_INTELEXT )
-            regs[0] = DEF_MAX_INTELEXT;
-        break;
-
-    case 0x80000005:
-        regs[0] = regs[1] = regs[2] = 0;
-        break;
-
-    case 0x80000008:
-        /* Mask AMD Number of Cores information. */
-        regs[2] = 0;
-        break;
-    }
-}
-
-static void xc_cpuid_hvm_policy(const struct cpuid_domain_info *info,
-                                const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000000:
-        if ( regs[0] > DEF_MAX_BASE )
-            regs[0] = DEF_MAX_BASE;
-        break;
-
-    case 0x00000001:
-        /*
-         * EBX[23:16] is Maximum Logical Processors Per Package.
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
-         */
-        regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
-
-        regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
-        regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] |
-                   bitmaskof(X86_FEATURE_HTT));
-        break;
-
-    case 0x00000007: /* Intel-defined CPU features */
-        if ( input[1] == 0 )
-        {
-            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
-            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
-            regs[3] = info->featureset[featureword_of(X86_FEATURE_AVX512_4VNNIW)];
-        }
-        else
-        {
-            regs[1] = 0;
-            regs[2] = 0;
-            regs[3] = 0;
-        }
-        regs[0] = 0;
-        break;
-
-    case 0x0000000d: /* Xen automatically calculates almost everything. */
-        if ( input[1] == 1 )
-            regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
-        else
-            regs[0] = 0;
-        regs[1] = regs[2] = regs[3] = 0;
-        break;
-
-    case 0x80000000:
-        /* Passthrough to cpu vendor specific functions */
-        break;
-
-    case 0x80000001:
-        regs[2] = (info->featureset[featureword_of(X86_FEATURE_LAHF_LM)] &
-                   ~bitmaskof(X86_FEATURE_CMP_LEGACY));
-        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
-        break;
-
-    case 0x80000007:
-        /*
-         * Keep only TSCInvariant. This may be cleared by the hypervisor
-         * depending on guest TSC and migration settings.
-         */
-        regs[0] = regs[1] = regs[2] = 0;
-        regs[3] &= 1u<<8;
-        break;
-
-    case 0x80000008:
-        regs[0] &= 0x0000ffffu;
-        regs[1] = info->featureset[featureword_of(X86_FEATURE_CLZERO)];
-        /* regs[2] handled in the per-vendor logic. */
-        regs[3] = 0;
-        break;
-
-    case 0x00000002: /* Intel cache info (dumped by AMD policy) */
-    case 0x00000004: /* Intel cache info (dumped by AMD policy) */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
-    case 0x80000002: /* Processor name string */
-    case 0x80000003: /* ... continued         */
-    case 0x80000004: /* ... continued         */
-    case 0x80000005: /* AMD L1 cache/TLB info (dumped by Intel policy) */
-    case 0x80000006: /* AMD L2/3 cache/TLB info ; Intel L2 cache features */
-    case 0x8000000a: /* AMD SVM feature bits */
-    case 0x80000019: /* AMD 1G TLB */
-    case 0x8000001a: /* AMD perf hints */
-    case 0x8000001c: /* AMD lightweight profiling */
-        break;
-
-    default:
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        break;
-    }
-
-    if ( info->vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
-        amd_xc_cpuid_policy(info, input, regs);
-    else
-        intel_xc_cpuid_policy(info, input, regs);
-}
-
-static void xc_cpuid_pv_policy(const struct cpuid_domain_info *info,
-                               const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000000:
-        if ( regs[0] > DEF_MAX_BASE )
-            regs[0] = DEF_MAX_BASE;
-        break;
-
-    case 0x00000001:
-    {
-        /* Host topology exposed to PV guest.  Provide host value. */
-        bool host_htt = regs[3] & bitmaskof(X86_FEATURE_HTT);
-
-        /*
-         * Don't pick host's Initial APIC ID which can change from run
-         * to run.
-         */
-        regs[1] &= 0x00ffffffu;
-
-        regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
-        regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] &
-                   ~bitmaskof(X86_FEATURE_HTT));
-
-        if ( host_htt )
-            regs[3] |= bitmaskof(X86_FEATURE_HTT);
-        break;
-    }
-
-    case 0x00000007:
-        if ( input[1] == 0 )
-        {
-            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
-            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
-            regs[3] = info->featureset[featureword_of(X86_FEATURE_AVX512_4VNNIW)];
-        }
-        else
-        {
-            regs[1] = 0;
-            regs[2] = 0;
-            regs[3] = 0;
-        }
-        regs[0] = 0;
-        break;
-
-    case 0x0000000d: /* Xen automatically calculates almost everything. */
-        if ( input[1] == 1 )
-            regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
-        else
-            regs[0] = 0;
-        regs[1] = regs[2] = regs[3] = 0;
-        break;
-
-    case 0x80000000:
-    {
-        unsigned int max = (info->vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))
-            ? DEF_MAX_AMDEXT : DEF_MAX_INTELEXT;
-
-        if ( regs[0] > max )
-            regs[0] = max;
-        break;
-    }
-
-    case 0x80000001:
-    {
-        /* Host topology exposed to PV guest.  Provide host CMP_LEGACY value. */
-        bool host_cmp_legacy = regs[2] & bitmaskof(X86_FEATURE_CMP_LEGACY);
-
-        regs[2] = (info->featureset[featureword_of(X86_FEATURE_LAHF_LM)] &
-                   ~bitmaskof(X86_FEATURE_CMP_LEGACY));
-        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
-
-        if ( host_cmp_legacy )
-            regs[2] |= bitmaskof(X86_FEATURE_CMP_LEGACY);
-
-        break;
-    }
-
-    case 0x80000008:
-        regs[0] &= 0x0000ffffu;
-        regs[1] = info->featureset[featureword_of(X86_FEATURE_CLZERO)];
-        regs[2] = regs[3] = 0;
-        break;
-
-    case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000b: /* Extended Topology Enumeration */
-    case 0x8000000a: /* SVM revision and features */
-    case 0x8000001b: /* Instruction Based Sampling */
-    case 0x8000001c: /* Light Weight Profiling */
-    case 0x8000001e: /* Extended topology reporting */
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        break;
-    }
-}
-
-static void xc_cpuid_policy(const struct cpuid_domain_info *info,
-                            const unsigned int *input, unsigned int *regs)
-{
-    /*
-     * For hypervisor leaves (0x4000XXXX) only 0x4000xx00.EAX[7:0] bits (max
-     * number of leaves) can be set by user. Hypervisor will enforce this so
-     * all other bits are don't-care and we can set them to zero.
-     */
-    if ( (input[0] & 0xffff0000) == 0x40000000 )
-    {
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        return;
-    }
-
-    if ( info->hvm )
-        xc_cpuid_hvm_policy(info, input, regs);
-    else
-        xc_cpuid_pv_policy(info, input, regs);
-}
-
-static int xc_cpuid_do_domctl(
-    xc_interface *xch, uint32_t domid,
-    const unsigned int *input, const unsigned int *regs)
-{
-    DECLARE_DOMCTL;
-
-    memset(&domctl, 0, sizeof (domctl));
-    domctl.domain = domid;
-    domctl.cmd = XEN_DOMCTL_set_cpuid;
-    domctl.u.cpuid.input[0] = input[0];
-    domctl.u.cpuid.input[1] = input[1];
-    domctl.u.cpuid.eax = regs[0];
-    domctl.u.cpuid.ebx = regs[1];
-    domctl.u.cpuid.ecx = regs[2];
-    domctl.u.cpuid.edx = regs[3];
-
-    return do_domctl(xch, &domctl);
-}
-
-static void sanitise_featureset(struct cpuid_domain_info *info)
-{
-    const uint32_t fs_size = xc_get_cpu_featureset_size();
-    uint32_t disabled_features[fs_size];
-    static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
-    unsigned int i, b;
-
-    if ( info->hvm )
-    {
-        /* HVM or PVH Guest */
-
-        if ( !info->pae )
-            clear_bit(X86_FEATURE_PAE, info->featureset);
-
-        if ( !info->nestedhvm )
-        {
-            clear_bit(X86_FEATURE_SVM, info->featureset);
-            clear_bit(X86_FEATURE_VMX, info->featureset);
-        }
-    }
-    else
-    {
-        /* PV Guest */
-
-        if ( !info->pv64 )
-        {
-            clear_bit(X86_FEATURE_LM, info->featureset);
-            if ( !(info->vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
-                clear_bit(X86_FEATURE_SYSCALL, info->featureset);
-        }
-
-        clear_bit(X86_FEATURE_PSE, info->featureset);
-        clear_bit(X86_FEATURE_PSE36, info->featureset);
-        clear_bit(X86_FEATURE_PGE, info->featureset);
-        clear_bit(X86_FEATURE_PAGE1GB, info->featureset);
-    }
-
-    if ( info->xfeature_mask == 0 )
-        clear_bit(X86_FEATURE_XSAVE, info->featureset);
-
-    /* Disable deep dependencies of disabled features. */
-    for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
-        disabled_features[i] = ~info->featureset[i] & deep_features[i];
-
-    for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
-    {
-        const uint32_t *dfs;
-
-        if ( !test_bit(b, disabled_features) ||
-             !(dfs = x86_cpuid_lookup_deep_deps(b)) )
-             continue;
-
-        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
-        {
-            info->featureset[i] &= ~dfs[i];
-            disabled_features[i] &= ~dfs[i];
-        }
-    }
-}
-
-int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
-                          const uint32_t *featureset, unsigned int nr_features)
-{
-    struct cpuid_domain_info info = {};
-    unsigned int input[2] = { 0, 0 }, regs[4];
-    unsigned int base_max, ext_max;
-    int rc;
-
-    rc = get_cpuid_domain_info(xch, domid, &info, featureset, nr_features);
-    if ( rc )
-        goto out;
-
-    cpuid(input, regs);
-    base_max = (regs[0] <= DEF_MAX_BASE) ? regs[0] : DEF_MAX_BASE;
-    input[0] = 0x80000000;
-    cpuid(input, regs);
-
-    if ( info.vendor == X86_VENDOR_AMD || info.vendor == X86_VENDOR_HYGON )
-        ext_max = (regs[0] <= DEF_MAX_AMDEXT) ? regs[0] : DEF_MAX_AMDEXT;
-    else
-        ext_max = (regs[0] <= DEF_MAX_INTELEXT) ? regs[0] : DEF_MAX_INTELEXT;
-
-    sanitise_featureset(&info);
-
-    input[0] = 0;
-    input[1] = XEN_CPUID_INPUT_UNUSED;
-    for ( ; ; )
-    {
-        cpuid(input, regs);
-        xc_cpuid_policy(&info, input, regs);
-
-        if ( regs[0] || regs[1] || regs[2] || regs[3] )
-        {
-            rc = xc_cpuid_do_domctl(xch, domid, input, regs);
-            if ( rc )
-                goto out;
-        }
-
-        /* Intel cache descriptor leaves. */
-        if ( input[0] == 4 )
-        {
-            input[1]++;
-            /* More to do? Then loop keeping %%eax==0x00000004. */
-            if ( (regs[0] & 0x1f) != 0 )
-                continue;
-        }
-        /* Extended Topology leaves. */
-        else if ( input[0] == 0xb )
-        {
-            uint8_t level_type = regs[2] >> 8;
-
-            input[1]++;
-            if ( level_type >= 1 && level_type <= 2 )
-                continue;
-        }
-
-        input[0]++;
-        if ( !(input[0] & 0x80000000u) && (input[0] > base_max ) )
-            input[0] = 0x80000000u;
-
-        input[1] = XEN_CPUID_INPUT_UNUSED;
-        if ( (input[0] == 4) || (input[0] == 7) || (input[0] == 0xb) )
-            input[1] = 0;
-        else if ( input[0] == 0xd )
-            input[1] = 1; /* Xen automatically calculates almost everything. */
-
-        if ( (input[0] & 0x80000000u) && (input[0] > ext_max) )
-            break;
-    }
-
- out:
-    free_cpuid_domain_info(&info);
-    return rc;
-}
-
 /*
  * Configure a single input with the informatiom from config.
  *
@@ -1054,3 +446,191 @@ int xc_cpuid_set(
 
     return rc;
 }
+
+int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
+                          const uint32_t *featureset, unsigned int nr_features)
+{
+    int rc;
+    xc_dominfo_t di;
+    unsigned int i, nr_leaves, nr_msrs;
+    xen_cpuid_leaf_t *leaves = NULL;
+    struct cpuid_policy *p = NULL;
+    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
+
+    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
+         di.domid != domid )
+    {
+        ERROR("Failed to obtain d%d info", domid);
+        rc = -ESRCH;
+        goto out;
+    }
+
+    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
+    if ( rc )
+    {
+        PERROR("Failed to obtain policy info size");
+        rc = -errno;
+        goto out;
+    }
+
+    rc = -ENOMEM;
+    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL ||
+         (p = calloc(1, sizeof(*p))) == NULL )
+        goto out;
+
+    nr_msrs = 0;
+    rc = xc_get_domain_cpu_policy(xch, domid, &nr_leaves, leaves,
+                                  &nr_msrs, NULL);
+    if ( rc )
+    {
+        PERROR("Failed to obtain d%d's policy", domid);
+        rc = -errno;
+        goto out;
+    }
+
+    rc = x86_cpuid_copy_from_buffer(p, leaves, nr_leaves,
+                                    &err_leaf, &err_subleaf);
+    if ( rc )
+    {
+        ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
+              err_leaf, err_subleaf, -rc, strerror(-rc));
+        goto out;
+    }
+
+    if ( featureset )
+    {
+        uint32_t disabled_features[FEATURESET_NR_ENTRIES],
+            feat[FEATURESET_NR_ENTRIES] = {};
+        static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
+        unsigned int i, b;
+
+        /*
+         * The user supplied featureset may be shorter or longer than
+         * FEATURESET_NR_ENTRIES.  Shorter is fine, and we will zero-extend.
+         * Longer is fine, so long as it only padded with zeros.
+         */
+        unsigned int user_len = min(FEATURESET_NR_ENTRIES + 0u, nr_features);
+
+        /* Check for truncated set bits. */
+        rc = -EOPNOTSUPP;
+        for ( i = user_len; i < nr_features; ++i )
+            if ( featureset[i] != 0 )
+                goto out;
+
+        memcpy(feat, featureset, sizeof(*featureset) * user_len);
+
+        /* Disable deep dependencies of disabled features. */
+        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+            disabled_features[i] = ~feat[i] & deep_features[i];
+
+        for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
+        {
+            const uint32_t *dfs;
+
+            if ( !test_bit(b, disabled_features) ||
+                 !(dfs = x86_cpuid_lookup_deep_deps(b)) )
+                continue;
+
+            for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+            {
+                feat[i] &= ~dfs[i];
+                disabled_features[i] &= ~dfs[i];
+            }
+        }
+
+        cpuid_featureset_to_policy(feat, p);
+    }
+
+    if ( !di.hvm )
+    {
+        uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
+        uint32_t len = ARRAY_SIZE(host_featureset);
+
+        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
+                                   &len, host_featureset);
+        if ( rc )
+        {
+            /* Tolerate "buffer too small", as we've got the bits we need. */
+            if ( errno == ENOBUFS )
+                rc = 0;
+            else
+            {
+                PERROR("Failed to obtain host featureset");
+                rc = -errno;
+                goto out;
+            }
+        }
+
+        /*
+         * On hardware without CPUID Faulting, PV guests see real topology.
+         * As a consequence, they also need to see the host htt/cmp fields.
+         */
+        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
+        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
+    }
+    else
+    {
+        /*
+         * Topology for HVM guests is entirely controlled by Xen.  For now, we
+         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
+         */
+        p->basic.htt = true;
+        p->extd.cmp_legacy = false;
+
+        p->basic.lppp *= 2;
+
+        switch ( p->x86_vendor )
+        {
+        case X86_VENDOR_INTEL:
+            for ( i = 0; (p->cache.subleaf[i].type &&
+                          i < ARRAY_SIZE(p->cache.raw)); ++i )
+            {
+                p->cache.subleaf[i].cores_per_package =
+                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
+                p->cache.subleaf[i].threads_per_cache = 0;
+            }
+            break;
+
+        case X86_VENDOR_AMD:
+        case X86_VENDOR_HYGON:
+            p->extd.nc = (p->extd.nc << 1) | 1;
+            p->extd.apic_id_size++;
+            break;
+        }
+
+        /*
+         * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM /
+         * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
+         * CPUID.  Xen will discard these bits if configuration hasn't been
+         * set for the domain.
+         */
+        p->extd.itsc = true;
+        p->basic.vmx = true;
+        p->extd.svm = true;
+    }
+
+    rc = x86_cpuid_copy_to_buffer(p, leaves, &nr_leaves);
+    if ( rc )
+    {
+        ERROR("Failed to serialise CPUID (%d = %s)", -rc, strerror(-rc));
+        goto out;
+    }
+
+    rc = xc_set_domain_cpu_policy(xch, domid, nr_leaves, leaves, 0, NULL,
+                                  &err_leaf, &err_subleaf, &err_msr);
+    if ( rc )
+    {
+        PERROR("Failed to set d%d's policy (err leaf %#x, subleaf %#x, msr %#x)",
+               domid, err_leaf, err_subleaf, err_msr);
+        rc = -errno;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    free(p);
+    free(leaves);
+
+    return rc;
+}
diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h
index 79840f99ce..331ef4f4f0 100644
--- a/xen/include/xen/lib/x86/cpuid.h
+++ b/xen/include/xen/lib/x86/cpuid.h
@@ -154,8 +154,12 @@ struct cpuid_policy
     union {
         struct cpuid_leaf raw[CPUID_GUEST_NR_CACHE];
         struct cpuid_cache_leaf {
-            uint32_t type:5,
-                :27, :32, :32, :32;
+            uint32_t /* a */ type:5, level:3;
+            bool self_init:1, fully_assoc:1;
+            uint32_t :4, threads_per_cache:12, cores_per_package:6;
+            uint32_t /* b */ line_size:12, partitions:10, ways:10;
+            uint32_t /* c */ sets;
+            bool /* d */ wbinvd:1, inclusive:1, complex:1;
         } subleaf[CPUID_GUEST_NR_CACHE];
     } cache;
 
@@ -259,7 +263,8 @@ struct cpuid_policy
                 uint32_t e8b;
                 struct { DECL_BITFIELD(e8b); };
             };
-            uint32_t /* c */:32, /* d */:32;
+            uint32_t nc:8, :4, apic_id_size:4, :16;
+            uint32_t /* d */:32;
         };
     } extd;
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 09/10] x86/domctl: Drop XEN_DOMCTL_set_cpuid
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (7 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() " Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default Andrew Cooper
  9 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Daniel De Graaf, Wei Liu, Jan Beulich,
	Roger Pau Monné

With the final users moved over to using XEN_DOMCTL_set_cpumsr_policy, drop
this domctl and associated infrastructure.

Rename the preexisting set_cpuid XSM vector to set_cpu_policy, now that it is
back to having a single user.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 tools/flask/policy/modules/dom0.te  |   2 +-
 tools/flask/policy/modules/xen.if   |   2 +-
 xen/arch/x86/domctl.c               | 101 ------------------------------------
 xen/include/public/domctl.h         |  11 +---
 xen/xsm/flask/hooks.c               |   3 +-
 xen/xsm/flask/policy/access_vectors |   3 +-
 6 files changed, 5 insertions(+), 117 deletions(-)

diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index 9970f9dc08..272f6a4f75 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -38,7 +38,7 @@ allow dom0_t dom0_t:domain {
 	getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-	set_cpuid gettsc settsc setscheduler set_vnumainfo
+	set_cpu_policy gettsc settsc setscheduler set_vnumainfo
 	get_vnumainfo psr_cmt_op psr_alloc get_cpu_policy
 };
 allow dom0_t dom0_t:resource { add remove };
diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index de5fb331bf..8eb2293a52 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -50,7 +50,7 @@ define(`create_domain_common', `
 			getdomaininfo hypercall setvcpucontext getscheduler
 			getvcpuinfo getaddrsize getaffinity setaffinity
 			settime setdomainhandle getvcpucontext set_misc_info };
-	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
+	allow $1 $2:domain2 { set_cpu_policy settsc setscheduler setclaim
 			set_vnumainfo get_vnumainfo cacheflush
 			psr_cmt_op psr_alloc soft_reset
 			resource_map get_cpu_policy };
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 97ced32c21..f31edf923b 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -206,94 +206,6 @@ static void domain_cpu_policy_changed(struct domain *d)
     }
 }
 
-static int update_domain_cpuid_info(struct domain *d,
-                                    const struct xen_domctl_cpuid *ctl)
-{
-    struct cpuid_policy *p = d->arch.cpuid;
-    const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, ctl->edx };
-
-    /*
-     * Skip update for leaves we don't care about, to avoid the overhead of
-     * recalculate_cpuid_policy().
-     */
-    switch ( ctl->input[0] )
-    {
-    case 0x00000000 ... ARRAY_SIZE(p->basic.raw) - 1:
-        if ( ctl->input[0] == 4 &&
-             ctl->input[1] >= ARRAY_SIZE(p->cache.raw) )
-            return 0;
-
-        if ( ctl->input[0] == 7 &&
-             ctl->input[1] >= ARRAY_SIZE(p->feat.raw) )
-            return 0;
-
-        if ( ctl->input[0] == 0xb &&
-             ctl->input[1] >= ARRAY_SIZE(p->topo.raw) )
-            return 0;
-
-        BUILD_BUG_ON(ARRAY_SIZE(p->xstate.raw) < 2);
-        if ( ctl->input[0] == XSTATE_CPUID &&
-             ctl->input[1] != 1 ) /* Everything else automatically calculated. */
-            return 0;
-        break;
-
-    case 0x40000000: case 0x40000100:
-        /* Only care about the max_leaf limit. */
-
-    case 0x80000000 ... 0x80000000 + ARRAY_SIZE(p->extd.raw) - 1:
-        break;
-
-    default:
-        return 0;
-    }
-
-    /* Insert ctl data into cpuid_policy. */
-    switch ( ctl->input[0] )
-    {
-    case 0x00000000 ... ARRAY_SIZE(p->basic.raw) - 1:
-        switch ( ctl->input[0] )
-        {
-        case 4:
-            p->cache.raw[ctl->input[1]] = leaf;
-            break;
-
-        case 7:
-            p->feat.raw[ctl->input[1]] = leaf;
-            break;
-
-        case 0xb:
-            p->topo.raw[ctl->input[1]] = leaf;
-            break;
-
-        case XSTATE_CPUID:
-            p->xstate.raw[ctl->input[1]] = leaf;
-            break;
-
-        default:
-            p->basic.raw[ctl->input[0]] = leaf;
-            break;
-        }
-        break;
-
-    case 0x40000000:
-        p->hv_limit = ctl->eax;
-        break;
-
-    case 0x40000100:
-        p->hv2_limit = ctl->eax;
-        break;
-
-    case 0x80000000 ... 0x80000000 + ARRAY_SIZE(p->extd.raw) - 1:
-        p->extd.raw[ctl->input[0] - 0x80000000] = leaf;
-        break;
-    }
-
-    recalculate_cpuid_policy(d);
-    domain_cpu_policy_changed(d);
-
-    return 0;
-}
-
 static int update_domain_cpu_policy(struct domain *d,
                                     xen_domctl_cpu_policy_t *xdpc)
 {
@@ -951,19 +863,6 @@ long arch_do_domctl(
         break;
     }
 
-    case XEN_DOMCTL_set_cpuid:
-        if ( d == currd ) /* no domain_pause() */
-            ret = -EINVAL;
-        else if ( d->creation_finished )
-            ret = -EEXIST; /* No changing once the domain is running. */
-        else
-        {
-            domain_pause(d);
-            ret = update_domain_cpuid_info(d, &domctl->u.cpuid);
-            domain_unpause(d);
-        }
-        break;
-
     case XEN_DOMCTL_gettscinfo:
         if ( d == currd ) /* no domain_pause() */
             ret = -EINVAL;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bd7d26545d..605207a3d4 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -648,14 +648,6 @@ struct xen_domctl_set_target {
 
 #if defined(__i386__) || defined(__x86_64__)
 # define XEN_CPUID_INPUT_UNUSED  0xFFFFFFFF
-/* XEN_DOMCTL_set_cpuid */
-struct xen_domctl_cpuid {
-  uint32_t input[2];
-  uint32_t eax;
-  uint32_t ebx;
-  uint32_t ecx;
-  uint32_t edx;
-};
 
 /*
  * XEN_DOMCTL_{get,set}_cpu_policy (x86 specific)
@@ -1167,7 +1159,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_set_target                    46
 #define XEN_DOMCTL_deassign_device               47
 #define XEN_DOMCTL_unbind_pt_irq                 48
-#define XEN_DOMCTL_set_cpuid                     49
+/* #define XEN_DOMCTL_set_cpuid                  49 - Obsolete - use set_cpu_policy */
 #define XEN_DOMCTL_get_device_group              50
 /* #define XEN_DOMCTL_set_machine_address_size   51 - Obsolete */
 /* #define XEN_DOMCTL_get_machine_address_size   52 - Obsolete */
@@ -1244,7 +1236,6 @@ struct xen_domctl {
         struct xen_domctl_vm_event_op       vm_event_op;
         struct xen_domctl_mem_sharing_op    mem_sharing_op;
 #if defined(__i386__) || defined(__x86_64__)
-        struct xen_domctl_cpuid             cpuid;
         struct xen_domctl_cpu_policy        cpu_policy;
         struct xen_domctl_vcpuextstate      vcpuextstate;
         struct xen_domctl_vcpu_msrs         vcpu_msrs;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index b23772786a..fd8d23c185 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -716,8 +716,7 @@ static int flask_domctl(struct domain *d, int cmd)
         return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SET_VIRQ_HANDLER);
 
     case XEN_DOMCTL_set_cpu_policy:
-    case XEN_DOMCTL_set_cpuid:
-        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_CPUID);
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_CPU_POLICY);
 
     case XEN_DOMCTL_gettscinfo:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GETTSC);
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 6f3f9493f8..c055c14c26 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -208,8 +208,7 @@ class domain2
 #  target = the new target domain
     set_as_target
 # XEN_DOMCTL_set_cpu_policy
-# XEN_DOMCTL_set_cpuid
-    set_cpuid
+    set_cpu_policy
 # XEN_DOMCTL_gettscinfo
     gettsc
 # XEN_DOMCTL_settscinfo
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default
  2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
                   ` (8 preceding siblings ...)
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 09/10] x86/domctl: Drop XEN_DOMCTL_set_cpuid Andrew Cooper
@ 2019-09-13 19:27 ` Andrew Cooper
  2019-09-16 11:22   ` Jan Beulich
  9 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-13 19:27 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The domain builder no longer uses local CPUID instructions for policy
decisions.  This resolves a key issue for PVH dom0's.  However, as PV dom0's
have never had faulting enforced, leave a command line option to restore the
old behaviour.

Advertise virtualised faulting support to control domains unless the opt-out
has been used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Introduce a command line option to retain old behaviour.
 * Advertise virtualised faulting support to dom0 when it is used.
v2.1:
 * Split the PVH adjustment out.  Rebase.
 * Recover the docs/ hunk which was accidentally missing.
---
 docs/misc/xen-command-line.pandoc | 19 ++++++++++++++++++-
 xen/arch/x86/cpu/common.c         | 26 ++++++++++++++------------
 xen/arch/x86/dom0_build.c         |  2 ++
 xen/arch/x86/msr.c                |  3 ++-
 xen/include/asm-x86/setup.h       |  1 +
 5 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 832797e2e2..fc64429064 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -658,7 +658,8 @@ The debug trace feature is only enabled in debugging builds of Xen.
 Specify the bit width of the DMA heap.
 
 ### dom0
-    = List of [ pv | pvh, shadow=<bool>, verbose=<bool> ]
+    = List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
+                cpuid-faulting=<bool> ]
 
     Applicability: x86
 
@@ -691,6 +692,22 @@ Controls for how dom0 is constructed on x86 systems.
     information during the dom0 build.  It defaults to the compile time choice
     of `CONFIG_VERBOSE_DEBUG`.
 
+*   The `cpuid-faulting` boolean is an interim option, is only applicable to
+    PV dom0, and defaults to true.
+
+    Before Xen 4.13, the domain builder logic for guest construction depended
+    on seeing host CPUID values to function correctly.  As a result, CPUID
+    Faulting was never activated for PV dom0's, even on capable hardware.
+
+    In Xen 4.13, the domain builder logic has been fixed, and no longer has
+    this dependency.  As a consequence, CPUID Faulting is activated by default
+    even for PV dom0's.
+
+    However, as PV dom0's have always seen host CPUID data in the past, there
+    is a chance that further dependencies exist.  This boolean can be used to
+    restore the pre-4.13 behaviour.  If specifying `no-cpuid-faulting` fixes
+    an issue in dom0, please report a bug.
+
 ### dom0-iommu
     = List of [ passthrough=<bool>, strict=<bool>, map-inclusive=<bool>,
                 map-reserved=<bool>, none ]
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4bf852c948..6c6bd63301 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -10,12 +10,15 @@
 #include <asm/io.h>
 #include <asm/mpspec.h>
 #include <asm/apic.h>
+#include <asm/setup.h>
 #include <mach_apic.h>
 #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
 
 #include "cpu.h"
 #include "mcheck/x86_mca.h"
 
+bool __read_mostly opt_dom0_cpuid_faulting = true;
+
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
 
@@ -171,20 +174,19 @@ void ctxt_switch_levelling(const struct vcpu *next)
 		/*
 		 * We *should* be enabling faulting for PV control domains.
 		 *
-		 * Unfortunately, the domain builder (having only ever been a
-		 * PV guest) expects to be able to see host cpuid state in a
-		 * native CPUID instruction, to correctly build a CPUID policy
-		 * for HVM guests (notably the xstate leaves).
-		 *
-		 * This logic is fundimentally broken for HVM toolstack
-		 * domains, and faulting causes PV guests to behave like HVM
-		 * guests from their point of view.
+		 * The domain builder has now been updated to not depend on
+		 * seeing host CPUID values.  This makes it compatible with
+		 * PVH toolstack domains, and lets us enable faulting by
+		 * default for all PV domains.
 		 *
-		 * Future development plans will move responsibility for
-		 * generating the maximum full cpuid policy into Xen, at which
-		 * this problem will disappear.
+		 * However, as PV control domains have never had faulting
+		 * enforced on them before, there might plausibly be other
+		 * dependenices on host CPUID data.  Therefore, we have left
+		 * an interim escape hatch in the form of
+		 * `dom0=no-cpuid-faulting` to restore the older behaviour.
 		 */
-		set_cpuid_faulting(nextd && (!is_control_domain(nextd) ||
+		set_cpuid_faulting(nextd && (opt_dom0_cpuid_faulting ||
+					     !is_control_domain(nextd) ||
 					     !is_pv_domain(nextd)) &&
 				   (is_pv_domain(nextd) ||
 				    next->arch.msrs->
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index c69570920c..4b75166db3 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -305,6 +305,8 @@ static int __init parse_dom0_param(const char *s)
 #endif
         else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
             opt_dom0_verbose = val;
+        else if ( (val = parse_boolean("cpuid-faulting", s, ss)) >= 0 )
+            opt_dom0_cpuid_faulting = val;
         else
             rc = -EINVAL;
 
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index a6c8cc7627..4698d2bba1 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -26,6 +26,7 @@
 
 #include <asm/debugreg.h>
 #include <asm/msr.h>
+#include <asm/setup.h>
 
 DEFINE_PER_CPU(uint32_t, tsc_aux);
 
@@ -92,7 +93,7 @@ int init_domain_msr_policy(struct domain *d)
         return -ENOMEM;
 
     /* See comment in ctxt_switch_levelling() */
-    if ( is_control_domain(d) && is_pv_domain(d) )
+    if ( !opt_dom0_cpuid_faulting && is_control_domain(d) && is_pv_domain(d) )
         mp->platform_info.cpuid_faulting = false;
 
     d->arch.msr = mp;
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index 15d6363022..861d46d6ac 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -66,6 +66,7 @@ extern bool opt_dom0_shadow;
 #endif
 extern bool opt_dom0_pvh;
 extern bool opt_dom0_verbose;
+extern bool opt_dom0_cpuid_faulting;
 
 #define max_init_domid (0)
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains Andrew Cooper
@ 2019-09-16 10:53   ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 10:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 13.09.2019 21:27, Andrew Cooper wrote:
> The control domain exclusion for CPUID Faulting predates dom0 PVH, but the
> reason for the exclusion (to allow the domain builder to see host CPUID
> values) isn't applicable.
> 
> The domain builder *is* broken in PVH control domains, and restricting the use
> of CPUID Faulting doesn't make it any less broken.  Tweak the logic to only
> exclude PV control domains.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible()
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible() Andrew Cooper
@ 2019-09-16 10:59   ` Jan Beulich
  2019-09-16 15:31     ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 10:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 13.09.2019 21:27, Andrew Cooper wrote:
> --- /dev/null
> +++ b/xen/lib/x86/policy.c
> @@ -0,0 +1,54 @@
> +#include "private.h"
> +
> +#include <xen/lib/x86/cpu-policy.h>
> +
> +int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
> +                                    const struct cpu_policy *guest,
> +                                    struct cpu_policy_errors *err)
> +{
> +    struct cpu_policy_errors e = INIT_CPU_POLICY_ERRORS;
> +    int ret = -EINVAL;
> +
> +    if ( err )
> +        *err = e;

You don't really need this, do you? All paths lead ...

> +#define NA XEN_CPUID_NO_SUBLEAF
> +#define FAIL_CPUID(l, s) \
> +    do { e.leaf = (l); e.subleaf = (s); goto out; } while ( 0 )
> +#define FAIL_MSR(m) \
> +    do { e.msr = (m); goto out; } while ( 0 )
> +
> +    if ( guest->cpuid->basic.max_leaf > host->cpuid->basic.max_leaf )
> +        FAIL_CPUID(0, NA);
> +
> +    if ( guest->cpuid->extd.max_leaf > host->cpuid->extd.max_leaf )
> +        FAIL_CPUID(0x80000008, NA);
> +
> +    /* TODO: Audit more CPUID data. */
> +
> +    if ( ~host->msr->platform_info.raw & guest->msr->platform_info.raw )
> +        FAIL_MSR(MSR_INTEL_PLATFORM_INFO);
> +
> +#undef FAIL_MSR
> +#undef FAIL_CPUID
> +#undef NA
> +
> +    /* Success. */
> +    ret = 0;
> +
> + out:
> +    if ( ret && err )
> +        *err = e;

... here, and hence you could simply drop the "ret &&" part of the
condition.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
@ 2019-09-16 11:04   ` Jan Beulich
  2019-09-16 15:40     ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 11:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Sergey Dyasli, Wei Liu, Ian Jackson, Xen-devel, Daniel De Graaf,
	Roger Pau Monné

On 13.09.2019 21:27, Andrew Cooper wrote:
> v2:
>  * Bump the DOMCTL interface version
>  * Proactively set the error pointers in xc_set_domain_cpu_policy()

From this I would have expected ...

> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -229,6 +229,52 @@ int xc_get_domain_cpu_policy(xc_interface *xch, uint32_t domid,
>      return ret;
>  }
>  
> +int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid,
> +                             uint32_t nr_leaves, xen_cpuid_leaf_t *leaves,
> +                             uint32_t nr_msrs, xen_msr_entry_t *msrs,
> +                             uint32_t *err_leaf_p, uint32_t *err_subleaf_p,
> +                             uint32_t *err_msr_p)
> +{
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BOUNCE(leaves,
> +                             nr_leaves * sizeof(*leaves),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
> +    DECLARE_HYPERCALL_BOUNCE(msrs,
> +                             nr_msrs * sizeof(*msrs),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
> +    int ret;
> +
> +    if ( xc_hypercall_bounce_pre(xch, leaves) )
> +        return -1;
> +
> +    if ( xc_hypercall_bounce_pre(xch, msrs) )
> +        return -1;

... setting of *err_..._p ahead of these.

> @@ -658,17 +658,24 @@ struct xen_domctl_cpuid {
>  };
>  
>  /*
> - * XEN_DOMCTL_get_cpu_policy (x86 specific)
> + * XEN_DOMCTL_{get,set}_cpu_policy (x86 specific)
>   *
> - * Query the CPUID and MSR policies for a specific domain.
> + * Query or set the CPUID and MSR policies for a specific domain.
>   */
>  struct xen_domctl_cpu_policy {
>      uint32_t nr_leaves; /* IN/OUT: Number of leaves in/written to
>                           * 'cpuid_policy'. */
>      uint32_t nr_msrs;   /* IN/OUT: Number of MSRs in/written to
>                           * 'msr_domain_policy' */
> -    XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* OUT */
> -    XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* OUT */
> +    XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* IN/OUT */
> +    XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* IN/OUT */
> +
> +    /*
> +     * OUT, set_policy only.  Written in some (but not all) error cases to
> +     * identify problem the CPUID leaf/subleaf and/or MSR which auditing
> +     * objects to.
> +     */

Stray "problem", or missing further word(s)?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}()
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}() Andrew Cooper
@ 2019-09-16 11:09   ` Jan Beulich
  2019-09-16 15:42     ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 11:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 13.09.2019 21:27, Andrew Cooper wrote:
> @@ -932,6 +932,13 @@ int xc_cpuid_set(
>              goto fail;
>          }
>  
> +        /*
> +         * Notes for following this algorithm:
> +         *
> +         * While it will accept any leaf data, it only makes sense to use on
> +         * feature leaves.  regs[] initially contains the host values.  This,
> +         * with the fall-through chain is how the 's' and 'k' options works.
> +         */

Nit: Stray "s" at the very end. And doesn't there want to be a 2nd
comma after "chain" (I admit this is purely from a German language
perspective).

Jan 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() " Andrew Cooper
@ 2019-09-16 11:17   ` Jan Beulich
  2019-09-16 13:41     ` Wei Liu
  2019-09-16 15:49     ` Andrew Cooper
  2019-09-18 16:09   ` Jan Beulich
  2019-09-25 18:11   ` [Xen-devel] [PATCH v3 " Andrew Cooper
  2 siblings, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 11:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 13.09.2019 21:27, Andrew Cooper wrote:
> -static void intel_xc_cpuid_policy(const struct cpuid_domain_info *info,
> -                                  const unsigned int *input, unsigned int *regs)
> -{
> -    switch ( input[0] )
> -    {
> -    case 0x00000004:
> -        /*
> -         * EAX[31:26] is Maximum Cores Per Package (minus one).
> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
> -         */
> -        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
> -                   (regs[0] & 0x3ffu));
> -        regs[3] &= 0x3ffu;

I think you want to mention that the removal of this masking is
intentional, for it looking bogus. With an appropriate addition to
the description
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default Andrew Cooper
@ 2019-09-16 11:22   ` Jan Beulich
  2019-09-16 15:52     ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 11:22 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 13.09.2019 21:27, Andrew Cooper wrote:
> The domain builder no longer uses local CPUID instructions for policy
> decisions.  This resolves a key issue for PVH dom0's.  However, as PV dom0's
> have never had faulting enforced, leave a command line option to restore the
> old behaviour.
> 
> Advertise virtualised faulting support to control domains unless the opt-out
> has been used.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
perhaps with ...

> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -305,6 +305,8 @@ static int __init parse_dom0_param(const char *s)
>  #endif
>          else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
>              opt_dom0_verbose = val;
> +        else if ( (val = parse_boolean("cpuid-faulting", s, ss)) >= 0 )
> +            opt_dom0_cpuid_faulting = val;

... this wrapped in "#ifdef CONFIG_PV" or IS_ENABLED(CONFIG_PV)
added to the condition?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-16 11:17   ` Jan Beulich
@ 2019-09-16 13:41     ` Wei Liu
  2019-09-16 15:49     ` Andrew Cooper
  1 sibling, 0 replies; 30+ messages in thread
From: Wei Liu @ 2019-09-16 13:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Ian Jackson, Roger Pau Monné, Wei Liu, Xen-devel

On Mon, 16 Sep 2019 at 12:17, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 13.09.2019 21:27, Andrew Cooper wrote:
> > -static void intel_xc_cpuid_policy(const struct cpuid_domain_info *info,
> > -                                  const unsigned int *input, unsigned int *regs)
> > -{
> > -    switch ( input[0] )
> > -    {
> > -    case 0x00000004:
> > -        /*
> > -         * EAX[31:26] is Maximum Cores Per Package (minus one).
> > -         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
> > -         */
> > -        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
> > -                   (regs[0] & 0x3ffu));
> > -        regs[3] &= 0x3ffu;
>
> I think you want to mention that the removal of this masking is
> intentional, for it looking bogus. With an appropriate addition to
> the description
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>

Acked-by: Wei Liu <wl@xen.org>

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible()
  2019-09-16 10:59   ` Jan Beulich
@ 2019-09-16 15:31     ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-16 15:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 16/09/2019 11:59, Jan Beulich wrote:
> On 13.09.2019 21:27, Andrew Cooper wrote:
>> --- /dev/null
>> +++ b/xen/lib/x86/policy.c
>> @@ -0,0 +1,54 @@
>> +#include "private.h"
>> +
>> +#include <xen/lib/x86/cpu-policy.h>
>> +
>> +int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>> +                                    const struct cpu_policy *guest,
>> +                                    struct cpu_policy_errors *err)
>> +{
>> +    struct cpu_policy_errors e = INIT_CPU_POLICY_ERRORS;
>> +    int ret = -EINVAL;
>> +
>> +    if ( err )
>> +        *err = e;
> You don't really need this, do you? All paths lead ...
>
>> +#define NA XEN_CPUID_NO_SUBLEAF
>> +#define FAIL_CPUID(l, s) \
>> +    do { e.leaf = (l); e.subleaf = (s); goto out; } while ( 0 )
>> +#define FAIL_MSR(m) \
>> +    do { e.msr = (m); goto out; } while ( 0 )
>> +
>> +    if ( guest->cpuid->basic.max_leaf > host->cpuid->basic.max_leaf )
>> +        FAIL_CPUID(0, NA);
>> +
>> +    if ( guest->cpuid->extd.max_leaf > host->cpuid->extd.max_leaf )
>> +        FAIL_CPUID(0x80000008, NA);
>> +
>> +    /* TODO: Audit more CPUID data. */
>> +
>> +    if ( ~host->msr->platform_info.raw & guest->msr->platform_info.raw )
>> +        FAIL_MSR(MSR_INTEL_PLATFORM_INFO);
>> +
>> +#undef FAIL_MSR
>> +#undef FAIL_CPUID
>> +#undef NA
>> +
>> +    /* Success. */
>> +    ret = 0;
>> +
>> + out:
>> +    if ( ret && err )
>> +        *err = e;
> ... here, and hence you could simply drop the "ret &&" part of the
> condition.

Hmm - so they do.  I'll adjust as you suggest.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy
  2019-09-16 11:04   ` Jan Beulich
@ 2019-09-16 15:40     ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-16 15:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Sergey Dyasli, Wei Liu, Ian Jackson, Xen-devel, Daniel De Graaf,
	Roger Pau Monné

On 16/09/2019 12:04, Jan Beulich wrote:
> On 13.09.2019 21:27, Andrew Cooper wrote:
>> v2:
>>  * Bump the DOMCTL interface version
>>  * Proactively set the error pointers in xc_set_domain_cpu_policy()
> From this I would have expected ...
>
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -229,6 +229,52 @@ int xc_get_domain_cpu_policy(xc_interface *xch, uint32_t domid,
>>      return ret;
>>  }
>>  
>> +int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid,
>> +                             uint32_t nr_leaves, xen_cpuid_leaf_t *leaves,
>> +                             uint32_t nr_msrs, xen_msr_entry_t *msrs,
>> +                             uint32_t *err_leaf_p, uint32_t *err_subleaf_p,
>> +                             uint32_t *err_msr_p)
>> +{
>> +    DECLARE_DOMCTL;
>> +    DECLARE_HYPERCALL_BOUNCE(leaves,
>> +                             nr_leaves * sizeof(*leaves),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
>> +    DECLARE_HYPERCALL_BOUNCE(msrs,
>> +                             nr_msrs * sizeof(*msrs),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
>> +    int ret;
>> +
>> +    if ( xc_hypercall_bounce_pre(xch, leaves) )
>> +        return -1;
>> +
>> +    if ( xc_hypercall_bounce_pre(xch, msrs) )
>> +        return -1;
> ... setting of *err_..._p ahead of these.

Hmm - I suppose so.  Done.

>
>> @@ -658,17 +658,24 @@ struct xen_domctl_cpuid {
>>  };
>>  
>>  /*
>> - * XEN_DOMCTL_get_cpu_policy (x86 specific)
>> + * XEN_DOMCTL_{get,set}_cpu_policy (x86 specific)
>>   *
>> - * Query the CPUID and MSR policies for a specific domain.
>> + * Query or set the CPUID and MSR policies for a specific domain.
>>   */
>>  struct xen_domctl_cpu_policy {
>>      uint32_t nr_leaves; /* IN/OUT: Number of leaves in/written to
>>                           * 'cpuid_policy'. */
>>      uint32_t nr_msrs;   /* IN/OUT: Number of MSRs in/written to
>>                           * 'msr_domain_policy' */
>> -    XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* OUT */
>> -    XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* OUT */
>> +    XEN_GUEST_HANDLE_64(xen_cpuid_leaf_t) cpuid_policy; /* IN/OUT */
>> +    XEN_GUEST_HANDLE_64(xen_msr_entry_t) msr_policy;    /* IN/OUT */
>> +
>> +    /*
>> +     * OUT, set_policy only.  Written in some (but not all) error cases to
>> +     * identify problem the CPUID leaf/subleaf and/or MSR which auditing
>> +     * objects to.
>> +     */
> Stray "problem", or missing further word(s)?

Stray problem.  Dropped.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}()
  2019-09-16 11:09   ` Jan Beulich
@ 2019-09-16 15:42     ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-16 15:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 16/09/2019 12:09, Jan Beulich wrote:
> On 13.09.2019 21:27, Andrew Cooper wrote:
>> @@ -932,6 +932,13 @@ int xc_cpuid_set(
>>              goto fail;
>>          }
>>  
>> +        /*
>> +         * Notes for following this algorithm:
>> +         *
>> +         * While it will accept any leaf data, it only makes sense to use on
>> +         * feature leaves.  regs[] initially contains the host values.  This,
>> +         * with the fall-through chain is how the 's' and 'k' options works.
>> +         */
> Nit: Stray "s" at the very end. And doesn't there want to be a 2nd
> comma after "chain" (I admit this is purely from a German language
> perspective).

Both correct observations.  I was a bit too lazy when adding 's' into
the mix.

Fixed.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 02/10] libx86: Proactively initialise error pointers
       [not found]     ` <65f18521-15c5-72a9-29f6-cd5d621e1283@citrix.com>
@ 2019-09-16 15:46       ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 15:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monné

On 16.09.2019 17:26, Andrew Cooper wrote:
> On 16/09/2019 11:56, Jan Beulich wrote:
>> On 13.09.2019 21:27, Andrew Cooper wrote:
>>> --- a/tools/tests/cpu-policy/test-cpu-policy.c
>>> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
>>> @@ -283,7 +283,7 @@ static void test_cpuid_deserialise_failure(void)
>>>      for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
>>>      {
>>>          const struct test *t = &tests[i];
>>> -        uint32_t err_leaf = ~0u, err_subleaf = ~0u;
>>> +        uint32_t err_leaf, err_subleaf;
>>>          int rc;
>>>  
>>>          /* No writes should occur.  Use NULL to catch errors. */
>>> @@ -336,7 +336,7 @@ static void test_msr_deserialise_failure(void)
>>>      for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
>>>      {
>>>          const struct test *t = &tests[i];
>>> -        uint32_t err_msr = ~0u;
>>> +        uint32_t err_msr;
>> I continue to be curious about the ~0u => ...
>>
>>> --- a/xen/lib/x86/cpuid.c
>>> +++ b/xen/lib/x86/cpuid.c
>>> @@ -381,6 +381,11 @@ int x86_cpuid_copy_from_buffer(struct cpuid_policy *p,
>>>      unsigned int i;
>>>      xen_cpuid_leaf_t data;
>>>  
>>> +    if ( err_leaf )
>>> +        *err_leaf = -1;
>>> +    if ( err_subleaf )
>>> +        *err_subleaf = -1;
>>> +
>>>      /*
>>>       * A well formed caller is expected to pass an array with leaves in order,
>>>       * and without any repetitions.  However, due to per-vendor differences,
>>> --- a/xen/lib/x86/msr.c
>>> +++ b/xen/lib/x86/msr.c
>>> @@ -55,6 +55,9 @@ int x86_msr_copy_from_buffer(struct msr_policy *p,
>>>      xen_msr_entry_t data;
>>>      int rc;
>>>  
>>> +    if ( err_msr )
>>> +        *err_msr = -1;
>> ... => -1 switch.
> 
> Its shorter to write, and less buggy when the type changes.
> 
> Any reason why this email is in private?

None at all - I have no idea how xen-devel managed to disappear
from the recipients list (I've re-added it now).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-16 11:17   ` Jan Beulich
  2019-09-16 13:41     ` Wei Liu
@ 2019-09-16 15:49     ` Andrew Cooper
  2019-09-16 16:05       ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-16 15:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 16/09/2019 12:17, Jan Beulich wrote:
> On 13.09.2019 21:27, Andrew Cooper wrote:
>> -static void intel_xc_cpuid_policy(const struct cpuid_domain_info *info,
>> -                                  const unsigned int *input, unsigned int *regs)
>> -{
>> -    switch ( input[0] )
>> -    {
>> -    case 0x00000004:
>> -        /*
>> -         * EAX[31:26] is Maximum Cores Per Package (minus one).
>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
>> -         */
>> -        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
>> -                   (regs[0] & 0x3ffu));
>> -        regs[3] &= 0x3ffu;
> I think you want to mention that the removal of this masking is
> intentional, for it looking bogus. With an appropriate addition to
> the description
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

As I said before, I fail to see how that isn't covered by the blanket
"almost all of this is redundant" statement.

There are other masks which are dropped, and calling this one out in
isolation seems wrong.  Observe that the comment discussing topology
only talks about eax, and not edx.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default
  2019-09-16 11:22   ` Jan Beulich
@ 2019-09-16 15:52     ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-16 15:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 16/09/2019 12:22, Jan Beulich wrote:
> On 13.09.2019 21:27, Andrew Cooper wrote:
>> The domain builder no longer uses local CPUID instructions for policy
>> decisions.  This resolves a key issue for PVH dom0's.  However, as PV dom0's
>> have never had faulting enforced, leave a command line option to restore the
>> old behaviour.
>>
>> Advertise virtualised faulting support to control domains unless the opt-out
>> has been used.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks,

> perhaps with ...
>
>> --- a/xen/arch/x86/dom0_build.c
>> +++ b/xen/arch/x86/dom0_build.c
>> @@ -305,6 +305,8 @@ static int __init parse_dom0_param(const char *s)
>>  #endif
>>          else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
>>              opt_dom0_verbose = val;
>> +        else if ( (val = parse_boolean("cpuid-faulting", s, ss)) >= 0 )
>> +            opt_dom0_cpuid_faulting = val;
> ... this wrapped in "#ifdef CONFIG_PV" or IS_ENABLED(CONFIG_PV)
> added to the condition?

Done.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-16 15:49     ` Andrew Cooper
@ 2019-09-16 16:05       ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2019-09-16 16:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Xen-devel, Wei Liu, Roger Pau Monné

On 16.09.2019 17:49, Andrew Cooper wrote:
> On 16/09/2019 12:17, Jan Beulich wrote:
>> On 13.09.2019 21:27, Andrew Cooper wrote:
>>> -static void intel_xc_cpuid_policy(const struct cpuid_domain_info *info,
>>> -                                  const unsigned int *input, unsigned int *regs)
>>> -{
>>> -    switch ( input[0] )
>>> -    {
>>> -    case 0x00000004:
>>> -        /*
>>> -         * EAX[31:26] is Maximum Cores Per Package (minus one).
>>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
>>> -         */
>>> -        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
>>> -                   (regs[0] & 0x3ffu));
>>> -        regs[3] &= 0x3ffu;
>> I think you want to mention that the removal of this masking is
>> intentional, for it looking bogus. With an appropriate addition to
>> the description
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> As I said before, I fail to see how that isn't covered by the blanket
> "almost all of this is redundant" statement.

Hmm, yes, fair enough - recalculate_cpuid_policy() indeed has been
zapping this to the low three bits already.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() " Andrew Cooper
  2019-09-16 11:17   ` Jan Beulich
@ 2019-09-18 16:09   ` Jan Beulich
  2019-09-19  8:48     ` Andrew Cooper
  2019-09-25 18:11   ` [Xen-devel] [PATCH v3 " Andrew Cooper
  2 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2019-09-18 16:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 13.09.2019 21:27, Andrew Cooper wrote:
> @@ -1054,3 +446,191 @@ int xc_cpuid_set(
>  
>      return rc;
>  }
> +
> +int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
> +                          const uint32_t *featureset, unsigned int nr_features)
> +{
> +    int rc;
> +    xc_dominfo_t di;
> +    unsigned int i, nr_leaves, nr_msrs;
> +    xen_cpuid_leaf_t *leaves = NULL;
> +    struct cpuid_policy *p = NULL;
> +    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
> +
> +    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
> +         di.domid != domid )
> +    {
> +        ERROR("Failed to obtain d%d info", domid);
> +        rc = -ESRCH;
> +        goto out;
> +    }
> +
> +    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
> +    if ( rc )
> +    {
> +        PERROR("Failed to obtain policy info size");
> +        rc = -errno;
> +        goto out;
> +    }
> +
> +    rc = -ENOMEM;
> +    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL ||
> +         (p = calloc(1, sizeof(*p))) == NULL )
> +        goto out;
> +
> +    nr_msrs = 0;
> +    rc = xc_get_domain_cpu_policy(xch, domid, &nr_leaves, leaves,
> +                                  &nr_msrs, NULL);
> +    if ( rc )
> +    {
> +        PERROR("Failed to obtain d%d's policy", domid);
> +        rc = -errno;
> +        goto out;
> +    }
> +
> +    rc = x86_cpuid_copy_from_buffer(p, leaves, nr_leaves,
> +                                    &err_leaf, &err_subleaf);
> +    if ( rc )
> +    {
> +        ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
> +              err_leaf, err_subleaf, -rc, strerror(-rc));
> +        goto out;
> +    }
> +
> +    if ( featureset )
> +    {
> +        uint32_t disabled_features[FEATURESET_NR_ENTRIES],
> +            feat[FEATURESET_NR_ENTRIES] = {};
> +        static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
> +        unsigned int i, b;
> +
> +        /*
> +         * The user supplied featureset may be shorter or longer than
> +         * FEATURESET_NR_ENTRIES.  Shorter is fine, and we will zero-extend.
> +         * Longer is fine, so long as it only padded with zeros.
> +         */
> +        unsigned int user_len = min(FEATURESET_NR_ENTRIES + 0u, nr_features);
> +
> +        /* Check for truncated set bits. */
> +        rc = -EOPNOTSUPP;
> +        for ( i = user_len; i < nr_features; ++i )
> +            if ( featureset[i] != 0 )
> +                goto out;
> +
> +        memcpy(feat, featureset, sizeof(*featureset) * user_len);
> +
> +        /* Disable deep dependencies of disabled features. */
> +        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> +            disabled_features[i] = ~feat[i] & deep_features[i];
> +
> +        for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
> +        {
> +            const uint32_t *dfs;
> +
> +            if ( !test_bit(b, disabled_features) ||
> +                 !(dfs = x86_cpuid_lookup_deep_deps(b)) )
> +                continue;
> +
> +            for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> +            {
> +                feat[i] &= ~dfs[i];
> +                disabled_features[i] &= ~dfs[i];
> +            }
> +        }
> +
> +        cpuid_featureset_to_policy(feat, p);
> +    }
> +
> +    if ( !di.hvm )
> +    {
> +        uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
> +        uint32_t len = ARRAY_SIZE(host_featureset);
> +
> +        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
> +                                   &len, host_featureset);
> +        if ( rc )
> +        {
> +            /* Tolerate "buffer too small", as we've got the bits we need. */
> +            if ( errno == ENOBUFS )
> +                rc = 0;
> +            else
> +            {
> +                PERROR("Failed to obtain host featureset");
> +                rc = -errno;
> +                goto out;
> +            }
> +        }
> +
> +        /*
> +         * On hardware without CPUID Faulting, PV guests see real topology.
> +         * As a consequence, they also need to see the host htt/cmp fields.
> +         */
> +        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
> +        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
> +    }
> +    else
> +    {
> +        /*
> +         * Topology for HVM guests is entirely controlled by Xen.  For now, we
> +         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
> +         */
> +        p->basic.htt = true;
> +        p->extd.cmp_legacy = false;
> +
> +        p->basic.lppp *= 2;

So as I've just learned from investigating the multi-vCPU guest boot
issue on Rome, this ...

> +        switch ( p->x86_vendor )
> +        {
> +        case X86_VENDOR_INTEL:
> +            for ( i = 0; (p->cache.subleaf[i].type &&
> +                          i < ARRAY_SIZE(p->cache.raw)); ++i )
> +            {
> +                p->cache.subleaf[i].cores_per_package =
> +                    (p->cache.subleaf[i].cores_per_package << 1) | 1;

..., this, and ...

> +                p->cache.subleaf[i].threads_per_cache = 0;
> +            }
> +            break;
> +
> +        case X86_VENDOR_AMD:
> +        case X86_VENDOR_HYGON:
> +            p->extd.nc = (p->extd.nc << 1) | 1;

... this can overflow (in the first case in particular leading to
a value of zero when the initial value was 128). I think it
wouldn't be bad if all of these were made saturating operations
in this new code, despite this not being an exact equivalent of
the old code. I haven't tried out yet whether correcting this in
the old code (thus applicable to 4.12 and older) will be enough
to fix the issue, but it is certainly part of what's needed.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-18 16:09   ` Jan Beulich
@ 2019-09-19  8:48     ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-19  8:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 18/09/2019 17:09, Jan Beulich wrote:
> On 13.09.2019 21:27, Andrew Cooper wrote:
>> +    else
>> +    {
>> +        /*
>> +         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>> +         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>> +         */
>> +        p->basic.htt = true;
>> +        p->extd.cmp_legacy = false;
>> +
>> +        p->basic.lppp *= 2;
> So as I've just learned from investigating the multi-vCPU guest boot
> issue on Rome, this ...
>
>> +        switch ( p->x86_vendor )
>> +        {
>> +        case X86_VENDOR_INTEL:
>> +            for ( i = 0; (p->cache.subleaf[i].type &&
>> +                          i < ARRAY_SIZE(p->cache.raw)); ++i )
>> +            {
>> +                p->cache.subleaf[i].cores_per_package =
>> +                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
> ..., this, and ...
>
>> +                p->cache.subleaf[i].threads_per_cache = 0;
>> +            }
>> +            break;
>> +
>> +        case X86_VENDOR_AMD:
>> +        case X86_VENDOR_HYGON:
>> +            p->extd.nc = (p->extd.nc << 1) | 1;
> ... this can overflow (in the first case in particular leading to
> a value of zero when the initial value was 128). I think it
> wouldn't be bad if all of these were made saturating operations
> in this new code, despite this not being an exact equivalent of
> the old code. I haven't tried out yet whether correcting this in
> the old code (thus applicable to 4.12 and older) will be enough
> to fix the issue, but it is certainly part of what's needed.

Given that we obviously want to backport, and I'm still waiting on acks,
it might be best for me to rebase this patch over whatever changes are
necessary to fix Rome.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Xen-devel] [PATCH v3 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-13 19:27 ` [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() " Andrew Cooper
  2019-09-16 11:17   ` Jan Beulich
  2019-09-18 16:09   ` Jan Beulich
@ 2019-09-25 18:11   ` " Andrew Cooper
  2019-09-26  8:04     ` Jan Beulich
  2 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2019-09-25 18:11 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich, Roger Pau Monné

The purpose of this change is to stop using xc_cpuid_do_domctl(), and to stop
basing decisions on a local CPUID instruction.  This is not a correct or
appropriate way to construct policy information for other domains.

The overwhelming majority of this logic is redundant with the policy logic in
Xen, but has a habit of becoming stale (e.g. c/s 97e4ebdcd76 resulting in the
CPUID.7[1].eax not being offered to guests even when Xen is happy with the
content).

There are a few subtle side effects which need to remain in place.  A
successful call to xc_cpuid_apply_policy() must result in a call to
xc_set_domain_cpu_policy() because that is currently the only way the
ITSC/VMX/SVM bits become reflected in the guests CPUID view.  Future cleanup
will remove this side effect.

The topology tweaks are local to libxc.  Extend struct cpuid_policy with
enough named fields to express the logic, but keep it identical to before.
Fixing topology representation is another future area of work.

No (expected) change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Jackson <Ian.Jackson@citrix.com>

The repositioning of xc_cpuid_apply_policy() relative to xc_cpuid_set() is
simply to make the diff readable.  It is completely illegible otherwise.

v2:
 * Reword the commit message to drop AVX512_BF16
 * Initialise host_featureset[] just in case.
v3:
 * Use domain default policy to restore the previous behaviour.
 * Rebase over AMD Rome CPUID changes.
---
 tools/libxc/xc_cpuid_x86.c      | 833 ++++++++++------------------------------
 xen/include/xen/lib/x86/cpuid.h |  11 +-
 2 files changed, 219 insertions(+), 625 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 43bda10d96..4ae4e6899c 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -34,18 +34,13 @@ enum {
 
 #include <xen/asm/x86-vendors.h>
 
-#include <xen/lib/x86/cpuid.h>
-#include <xen/lib/x86/msr.h>
+#include <xen/lib/x86/cpu-policy.h>
 
 #define bitmaskof(idx)      (1u << ((idx) & 31))
 #define featureword_of(idx) ((idx) >> 5)
 #define clear_feature(idx, dst) ((dst) &= ~bitmaskof(idx))
 #define set_feature(idx, dst)   ((dst) |=  bitmaskof(idx))
 
-#define DEF_MAX_BASE 0x0000000du
-#define DEF_MAX_INTELEXT  0x80000008u
-#define DEF_MAX_AMDEXT    0x8000001cu
-
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps)
 {
     DECLARE_SYSCTL;
@@ -282,622 +277,6 @@ int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid,
     return ret;
 }
 
-struct cpuid_domain_info
-{
-    unsigned int vendor; /* X86_VENDOR_* */
-
-    bool hvm;
-    uint64_t xfeature_mask;
-
-    /*
-     * Careful with featureset lengths.
-     *
-     * Code in this file requires featureset to have at least
-     * xc_get_cpu_featureset_size() entries.  This is a libxc compiletime
-     * constant.
-     *
-     * The featureset length used by the hypervisor may be different.  If the
-     * hypervisor version is longer, XEN_SYSCTL_get_cpu_featureset will fail
-     * with -ENOBUFS, and libxc really does need rebuilding.  If the
-     * hypervisor version is shorter, it is safe to zero-extend.
-     */
-    uint32_t *featureset;
-    unsigned int nr_features;
-
-    /* PV-only information. */
-    bool pv64;
-
-    /* HVM-only information. */
-    bool pae;
-    bool nestedhvm;
-};
-
-static void cpuid(const unsigned int *input, unsigned int *regs)
-{
-    unsigned int count = (input[1] == XEN_CPUID_INPUT_UNUSED) ? 0 : input[1];
-#ifdef __i386__
-    /* Use the stack to avoid reg constraint failures with some gcc flags */
-    asm (
-        "push %%ebx; push %%edx\n\t"
-        "cpuid\n\t"
-        "mov %%ebx,4(%4)\n\t"
-        "mov %%edx,12(%4)\n\t"
-        "pop %%edx; pop %%ebx\n\t"
-        : "=a" (regs[0]), "=c" (regs[2])
-        : "0" (input[0]), "1" (count), "S" (regs)
-        : "memory" );
-#else
-    asm (
-        "cpuid"
-        : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
-        : "0" (input[0]), "2" (count) );
-#endif
-}
-
-static int get_cpuid_domain_info(xc_interface *xch, uint32_t domid,
-                                 struct cpuid_domain_info *info,
-                                 const uint32_t *featureset,
-                                 unsigned int nr_features)
-{
-    struct xen_domctl domctl = {};
-    xc_dominfo_t di;
-    unsigned int in[2] = { 0, ~0U }, regs[4];
-    unsigned int i, host_nr_features = xc_get_cpu_featureset_size();
-    int rc;
-
-    cpuid(in, regs);
-    info->vendor = x86_cpuid_lookup_vendor(regs[1], regs[2], regs[3]);
-
-    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
-         di.domid != domid )
-        return -ESRCH;
-
-    info->hvm = di.hvm;
-
-    info->featureset = calloc(host_nr_features, sizeof(*info->featureset));
-    if ( !info->featureset )
-        return -ENOMEM;
-
-    info->nr_features = host_nr_features;
-
-    if ( featureset )
-    {
-        /*
-         * The user supplied featureset may be shorter or longer than
-         * host_nr_features.  Shorter is fine, and we will zero-extend.
-         * Longer is fine, so long as it only padded with zeros.
-         */
-        unsigned int fslen = min(host_nr_features, nr_features);
-
-        memcpy(info->featureset, featureset,
-               fslen * sizeof(*info->featureset));
-
-        /* Check for truncated set bits. */
-        for ( i = fslen; i < nr_features; ++i )
-            if ( featureset[i] != 0 )
-                return -EOPNOTSUPP;
-    }
-    else
-    {
-        rc = xc_get_cpu_featureset(xch, (info->hvm
-                                         ? XEN_SYSCTL_cpu_featureset_hvm
-                                         : XEN_SYSCTL_cpu_featureset_pv),
-                                   &host_nr_features, info->featureset);
-        if ( rc )
-            return -errno;
-    }
-
-    /* Get xstate information. */
-    domctl.cmd = XEN_DOMCTL_getvcpuextstate;
-    domctl.domain = domid;
-    rc = do_domctl(xch, &domctl);
-    if ( rc )
-        return -errno;
-
-    info->xfeature_mask = domctl.u.vcpuextstate.xfeature_mask;
-
-    if ( di.hvm )
-    {
-        uint64_t val;
-
-        rc = xc_hvm_param_get(xch, domid, HVM_PARAM_PAE_ENABLED, &val);
-        if ( rc )
-            return -errno;
-
-        info->pae = !!val;
-
-        rc = xc_hvm_param_get(xch, domid, HVM_PARAM_NESTEDHVM, &val);
-        if ( rc )
-            return -errno;
-
-        info->nestedhvm = !!val;
-    }
-    else
-    {
-        unsigned int width;
-
-        rc = xc_domain_get_guest_width(xch, domid, &width);
-        if ( rc )
-            return -errno;
-
-        info->pv64 = (width == 8);
-    }
-
-    return 0;
-}
-
-static void free_cpuid_domain_info(struct cpuid_domain_info *info)
-{
-    free(info->featureset);
-}
-
-static void amd_xc_cpuid_policy(const struct cpuid_domain_info *info,
-                                const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000002:
-    case 0x00000004:
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        break;
-
-    case 0x80000000:
-        if ( regs[0] > DEF_MAX_AMDEXT )
-            regs[0] = DEF_MAX_AMDEXT;
-        break;
-
-    case 0x80000008:
-        /*
-         * ECX[15:12] is ApicIdCoreSize.
-         * ECX[7:0] is NumberOfCores (minus one).
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But make sure to avoid
-         * - overflow,
-         * - going out of sync with leaf 1 EBX[23:16],
-         * - incrementing ApicIdCoreSize when it's zero (which changes the
-         *   meaning of bits 7:0).
-         */
-        if ( (regs[2] & 0xffu) < 0x7fu )
-        {
-            if ( (regs[2] & 0xf000u) && (regs[2] & 0xf000u) != 0xf000u )
-                regs[2] = ((regs[2] + 0x1000u) & 0xf000u) | (regs[2] & 0xffu);
-            regs[2] = (regs[2] & 0xf000u) | ((regs[2] & 0x7fu) << 1) | 1u;
-        }
-        break;
-
-    case 0x8000000a: {
-        if ( !info->nestedhvm )
-        {
-            regs[0] = regs[1] = regs[2] = regs[3] = 0;
-            break;
-        }
-
-#define SVM_FEATURE_NPT            0x00000001 /* Nested page table support */
-#define SVM_FEATURE_LBRV           0x00000002 /* LBR virtualization support */
-#define SVM_FEATURE_SVML           0x00000004 /* SVM locking MSR support */
-#define SVM_FEATURE_NRIPS          0x00000008 /* Next RIP save on VMEXIT */
-#define SVM_FEATURE_TSCRATEMSR     0x00000010 /* TSC ratio MSR support */
-#define SVM_FEATURE_VMCBCLEAN      0x00000020 /* VMCB clean bits support */
-#define SVM_FEATURE_FLUSHBYASID    0x00000040 /* TLB flush by ASID support */
-#define SVM_FEATURE_DECODEASSISTS  0x00000080 /* Decode assists support */
-#define SVM_FEATURE_PAUSEFILTER    0x00000400 /* Pause intercept filter */
-
-        /* Pass 1: Only passthrough SVM features which are
-         * available in hw and which are implemented
-         */
-        regs[3] &= (SVM_FEATURE_NPT | SVM_FEATURE_LBRV | \
-            SVM_FEATURE_NRIPS | SVM_FEATURE_PAUSEFILTER | \
-            SVM_FEATURE_DECODEASSISTS);
-
-        /* Pass 2: Always enable SVM features which are emulated */
-        regs[3] |= SVM_FEATURE_VMCBCLEAN | SVM_FEATURE_TSCRATEMSR;
-        break;
-    }
-
-    }
-}
-
-static void intel_xc_cpuid_policy(const struct cpuid_domain_info *info,
-                                  const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000004:
-        /*
-         * EAX[31:26] is Maximum Cores Per Package (minus one).
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
-         */
-        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
-                   (regs[0] & 0x3ffu));
-        regs[3] &= 0x3ffu;
-        break;
-
-    case 0x80000000:
-        if ( regs[0] > DEF_MAX_INTELEXT )
-            regs[0] = DEF_MAX_INTELEXT;
-        break;
-
-    case 0x80000005:
-        regs[0] = regs[1] = regs[2] = 0;
-        break;
-
-    case 0x80000008:
-        /* Mask AMD Number of Cores information. */
-        regs[2] = 0;
-        break;
-    }
-}
-
-static void xc_cpuid_hvm_policy(const struct cpuid_domain_info *info,
-                                const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000000:
-        if ( regs[0] > DEF_MAX_BASE )
-            regs[0] = DEF_MAX_BASE;
-        break;
-
-    case 0x00000001:
-        /*
-         * EBX[23:16] is Maximum Logical Processors Per Package.
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
-         * overflow.
-         */
-        if ( !(regs[1] & 0x00800000u) )
-            regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
-        else
-            regs[1] &= 0x00ffffffu;
-
-        regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
-        regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] |
-                   bitmaskof(X86_FEATURE_HTT));
-        break;
-
-    case 0x00000007: /* Intel-defined CPU features */
-        if ( input[1] == 0 )
-        {
-            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
-            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
-            regs[3] = info->featureset[featureword_of(X86_FEATURE_AVX512_4VNNIW)];
-        }
-        else
-        {
-            regs[1] = 0;
-            regs[2] = 0;
-            regs[3] = 0;
-        }
-        regs[0] = 0;
-        break;
-
-    case 0x0000000d: /* Xen automatically calculates almost everything. */
-        if ( input[1] == 1 )
-            regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
-        else
-            regs[0] = 0;
-        regs[1] = regs[2] = regs[3] = 0;
-        break;
-
-    case 0x80000000:
-        /* Passthrough to cpu vendor specific functions */
-        break;
-
-    case 0x80000001:
-        regs[2] = (info->featureset[featureword_of(X86_FEATURE_LAHF_LM)] &
-                   ~bitmaskof(X86_FEATURE_CMP_LEGACY));
-        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
-        break;
-
-    case 0x80000007:
-        /*
-         * Keep only TSCInvariant. This may be cleared by the hypervisor
-         * depending on guest TSC and migration settings.
-         */
-        regs[0] = regs[1] = regs[2] = 0;
-        regs[3] &= 1u<<8;
-        break;
-
-    case 0x80000008:
-        regs[0] &= 0x0000ffffu;
-        regs[1] = info->featureset[featureword_of(X86_FEATURE_CLZERO)];
-        /* regs[2] handled in the per-vendor logic. */
-        regs[3] = 0;
-        break;
-
-    case 0x00000002: /* Intel cache info (dumped by AMD policy) */
-    case 0x00000004: /* Intel cache info (dumped by AMD policy) */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
-    case 0x80000002: /* Processor name string */
-    case 0x80000003: /* ... continued         */
-    case 0x80000004: /* ... continued         */
-    case 0x80000005: /* AMD L1 cache/TLB info (dumped by Intel policy) */
-    case 0x80000006: /* AMD L2/3 cache/TLB info ; Intel L2 cache features */
-    case 0x8000000a: /* AMD SVM feature bits */
-    case 0x80000019: /* AMD 1G TLB */
-    case 0x8000001a: /* AMD perf hints */
-    case 0x8000001c: /* AMD lightweight profiling */
-        break;
-
-    default:
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        break;
-    }
-
-    if ( info->vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
-        amd_xc_cpuid_policy(info, input, regs);
-    else
-        intel_xc_cpuid_policy(info, input, regs);
-}
-
-static void xc_cpuid_pv_policy(const struct cpuid_domain_info *info,
-                               const unsigned int *input, unsigned int *regs)
-{
-    switch ( input[0] )
-    {
-    case 0x00000000:
-        if ( regs[0] > DEF_MAX_BASE )
-            regs[0] = DEF_MAX_BASE;
-        break;
-
-    case 0x00000001:
-    {
-        /* Host topology exposed to PV guest.  Provide host value. */
-        bool host_htt = regs[3] & bitmaskof(X86_FEATURE_HTT);
-
-        /*
-         * Don't pick host's Initial APIC ID which can change from run
-         * to run.
-         */
-        regs[1] &= 0x00ffffffu;
-
-        regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
-        regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] &
-                   ~bitmaskof(X86_FEATURE_HTT));
-
-        if ( host_htt )
-            regs[3] |= bitmaskof(X86_FEATURE_HTT);
-        break;
-    }
-
-    case 0x00000007:
-        if ( input[1] == 0 )
-        {
-            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
-            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
-            regs[3] = info->featureset[featureword_of(X86_FEATURE_AVX512_4VNNIW)];
-        }
-        else
-        {
-            regs[1] = 0;
-            regs[2] = 0;
-            regs[3] = 0;
-        }
-        regs[0] = 0;
-        break;
-
-    case 0x0000000d: /* Xen automatically calculates almost everything. */
-        if ( input[1] == 1 )
-            regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
-        else
-            regs[0] = 0;
-        regs[1] = regs[2] = regs[3] = 0;
-        break;
-
-    case 0x80000000:
-    {
-        unsigned int max = (info->vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))
-            ? DEF_MAX_AMDEXT : DEF_MAX_INTELEXT;
-
-        if ( regs[0] > max )
-            regs[0] = max;
-        break;
-    }
-
-    case 0x80000001:
-    {
-        /* Host topology exposed to PV guest.  Provide host CMP_LEGACY value. */
-        bool host_cmp_legacy = regs[2] & bitmaskof(X86_FEATURE_CMP_LEGACY);
-
-        regs[2] = (info->featureset[featureword_of(X86_FEATURE_LAHF_LM)] &
-                   ~bitmaskof(X86_FEATURE_CMP_LEGACY));
-        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
-
-        if ( host_cmp_legacy )
-            regs[2] |= bitmaskof(X86_FEATURE_CMP_LEGACY);
-
-        break;
-    }
-
-    case 0x80000008:
-        regs[0] &= 0x0000ffffu;
-        regs[1] = info->featureset[featureword_of(X86_FEATURE_CLZERO)];
-        regs[2] = regs[3] = 0;
-        break;
-
-    case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000b: /* Extended Topology Enumeration */
-    case 0x8000000a: /* SVM revision and features */
-    case 0x8000001b: /* Instruction Based Sampling */
-    case 0x8000001c: /* Light Weight Profiling */
-    case 0x8000001e: /* Extended topology reporting */
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        break;
-    }
-}
-
-static void xc_cpuid_policy(const struct cpuid_domain_info *info,
-                            const unsigned int *input, unsigned int *regs)
-{
-    /*
-     * For hypervisor leaves (0x4000XXXX) only 0x4000xx00.EAX[7:0] bits (max
-     * number of leaves) can be set by user. Hypervisor will enforce this so
-     * all other bits are don't-care and we can set them to zero.
-     */
-    if ( (input[0] & 0xffff0000) == 0x40000000 )
-    {
-        regs[0] = regs[1] = regs[2] = regs[3] = 0;
-        return;
-    }
-
-    if ( info->hvm )
-        xc_cpuid_hvm_policy(info, input, regs);
-    else
-        xc_cpuid_pv_policy(info, input, regs);
-}
-
-static int xc_cpuid_do_domctl(
-    xc_interface *xch, uint32_t domid,
-    const unsigned int *input, const unsigned int *regs)
-{
-    DECLARE_DOMCTL;
-
-    memset(&domctl, 0, sizeof (domctl));
-    domctl.domain = domid;
-    domctl.cmd = XEN_DOMCTL_set_cpuid;
-    domctl.u.cpuid.input[0] = input[0];
-    domctl.u.cpuid.input[1] = input[1];
-    domctl.u.cpuid.eax = regs[0];
-    domctl.u.cpuid.ebx = regs[1];
-    domctl.u.cpuid.ecx = regs[2];
-    domctl.u.cpuid.edx = regs[3];
-
-    return do_domctl(xch, &domctl);
-}
-
-static void sanitise_featureset(struct cpuid_domain_info *info)
-{
-    const uint32_t fs_size = xc_get_cpu_featureset_size();
-    uint32_t disabled_features[fs_size];
-    static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
-    unsigned int i, b;
-
-    if ( info->hvm )
-    {
-        /* HVM or PVH Guest */
-
-        if ( !info->pae )
-            clear_bit(X86_FEATURE_PAE, info->featureset);
-
-        if ( !info->nestedhvm )
-        {
-            clear_bit(X86_FEATURE_SVM, info->featureset);
-            clear_bit(X86_FEATURE_VMX, info->featureset);
-        }
-    }
-    else
-    {
-        /* PV Guest */
-
-        if ( !info->pv64 )
-        {
-            clear_bit(X86_FEATURE_LM, info->featureset);
-            if ( !(info->vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
-                clear_bit(X86_FEATURE_SYSCALL, info->featureset);
-        }
-
-        clear_bit(X86_FEATURE_PSE, info->featureset);
-        clear_bit(X86_FEATURE_PSE36, info->featureset);
-        clear_bit(X86_FEATURE_PGE, info->featureset);
-        clear_bit(X86_FEATURE_PAGE1GB, info->featureset);
-    }
-
-    if ( info->xfeature_mask == 0 )
-        clear_bit(X86_FEATURE_XSAVE, info->featureset);
-
-    /* Disable deep dependencies of disabled features. */
-    for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
-        disabled_features[i] = ~info->featureset[i] & deep_features[i];
-
-    for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
-    {
-        const uint32_t *dfs;
-
-        if ( !test_bit(b, disabled_features) ||
-             !(dfs = x86_cpuid_lookup_deep_deps(b)) )
-             continue;
-
-        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
-        {
-            info->featureset[i] &= ~dfs[i];
-            disabled_features[i] &= ~dfs[i];
-        }
-    }
-}
-
-int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
-                          const uint32_t *featureset, unsigned int nr_features)
-{
-    struct cpuid_domain_info info = {};
-    unsigned int input[2] = { 0, 0 }, regs[4];
-    unsigned int base_max, ext_max;
-    int rc;
-
-    rc = get_cpuid_domain_info(xch, domid, &info, featureset, nr_features);
-    if ( rc )
-        goto out;
-
-    cpuid(input, regs);
-    base_max = (regs[0] <= DEF_MAX_BASE) ? regs[0] : DEF_MAX_BASE;
-    input[0] = 0x80000000;
-    cpuid(input, regs);
-
-    if ( info.vendor == X86_VENDOR_AMD || info.vendor == X86_VENDOR_HYGON )
-        ext_max = (regs[0] <= DEF_MAX_AMDEXT) ? regs[0] : DEF_MAX_AMDEXT;
-    else
-        ext_max = (regs[0] <= DEF_MAX_INTELEXT) ? regs[0] : DEF_MAX_INTELEXT;
-
-    sanitise_featureset(&info);
-
-    input[0] = 0;
-    input[1] = XEN_CPUID_INPUT_UNUSED;
-    for ( ; ; )
-    {
-        cpuid(input, regs);
-        xc_cpuid_policy(&info, input, regs);
-
-        if ( regs[0] || regs[1] || regs[2] || regs[3] )
-        {
-            rc = xc_cpuid_do_domctl(xch, domid, input, regs);
-            if ( rc )
-                goto out;
-        }
-
-        /* Intel cache descriptor leaves. */
-        if ( input[0] == 4 )
-        {
-            input[1]++;
-            /* More to do? Then loop keeping %%eax==0x00000004. */
-            if ( (regs[0] & 0x1f) != 0 )
-                continue;
-        }
-        /* Extended Topology leaves. */
-        else if ( input[0] == 0xb )
-        {
-            uint8_t level_type = regs[2] >> 8;
-
-            input[1]++;
-            if ( level_type >= 1 && level_type <= 2 )
-                continue;
-        }
-
-        input[0]++;
-        if ( !(input[0] & 0x80000000u) && (input[0] > base_max ) )
-            input[0] = 0x80000000u;
-
-        input[1] = XEN_CPUID_INPUT_UNUSED;
-        if ( (input[0] == 4) || (input[0] == 7) || (input[0] == 0xb) )
-            input[1] = 0;
-        else if ( input[0] == 0xd )
-            input[1] = 1; /* Xen automatically calculates almost everything. */
-
-        if ( (input[0] & 0x80000000u) && (input[0] > ext_max) )
-            break;
-    }
-
- out:
-    free_cpuid_domain_info(&info);
-    return rc;
-}
-
 /*
  * Configure a single input with the informatiom from config.
  *
@@ -1074,3 +453,213 @@ int xc_cpuid_set(
 
     return rc;
 }
+
+int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
+                          const uint32_t *featureset, unsigned int nr_features)
+{
+    int rc;
+    xc_dominfo_t di;
+    unsigned int i, nr_leaves, nr_msrs;
+    xen_cpuid_leaf_t *leaves = NULL;
+    struct cpuid_policy *p = NULL;
+    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
+
+    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
+         di.domid != domid )
+    {
+        ERROR("Failed to obtain d%d info", domid);
+        rc = -ESRCH;
+        goto out;
+    }
+
+    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
+    if ( rc )
+    {
+        PERROR("Failed to obtain policy info size");
+        rc = -errno;
+        goto out;
+    }
+
+    rc = -ENOMEM;
+    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL ||
+         (p = calloc(1, sizeof(*p))) == NULL )
+        goto out;
+
+    /* Get the domain's default policy. */
+    nr_msrs = 0;
+    rc = xc_get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_default
+                                              : XEN_SYSCTL_cpu_policy_pv_default,
+                                  &nr_leaves, leaves, &nr_msrs, NULL);
+    if ( rc )
+    {
+        PERROR("Failed to obtain %s default policy", di.hvm ? "hvm" : "pv");
+        rc = -errno;
+        goto out;
+    }
+
+    rc = x86_cpuid_copy_from_buffer(p, leaves, nr_leaves,
+                                    &err_leaf, &err_subleaf);
+    if ( rc )
+    {
+        ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
+              err_leaf, err_subleaf, -rc, strerror(-rc));
+        goto out;
+    }
+
+    if ( featureset )
+    {
+        uint32_t disabled_features[FEATURESET_NR_ENTRIES],
+            feat[FEATURESET_NR_ENTRIES] = {};
+        static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
+        unsigned int i, b;
+
+        /*
+         * The user supplied featureset may be shorter or longer than
+         * FEATURESET_NR_ENTRIES.  Shorter is fine, and we will zero-extend.
+         * Longer is fine, so long as it only padded with zeros.
+         */
+        unsigned int user_len = min(FEATURESET_NR_ENTRIES + 0u, nr_features);
+
+        /* Check for truncated set bits. */
+        rc = -EOPNOTSUPP;
+        for ( i = user_len; i < nr_features; ++i )
+            if ( featureset[i] != 0 )
+                goto out;
+
+        memcpy(feat, featureset, sizeof(*featureset) * user_len);
+
+        /* Disable deep dependencies of disabled features. */
+        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+            disabled_features[i] = ~feat[i] & deep_features[i];
+
+        for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
+        {
+            const uint32_t *dfs;
+
+            if ( !test_bit(b, disabled_features) ||
+                 !(dfs = x86_cpuid_lookup_deep_deps(b)) )
+                continue;
+
+            for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+            {
+                feat[i] &= ~dfs[i];
+                disabled_features[i] &= ~dfs[i];
+            }
+        }
+
+        cpuid_featureset_to_policy(feat, p);
+    }
+
+    if ( !di.hvm )
+    {
+        uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
+        uint32_t len = ARRAY_SIZE(host_featureset);
+
+        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
+                                   &len, host_featureset);
+        if ( rc )
+        {
+            /* Tolerate "buffer too small", as we've got the bits we need. */
+            if ( errno == ENOBUFS )
+                rc = 0;
+            else
+            {
+                PERROR("Failed to obtain host featureset");
+                rc = -errno;
+                goto out;
+            }
+        }
+
+        /*
+         * On hardware without CPUID Faulting, PV guests see real topology.
+         * As a consequence, they also need to see the host htt/cmp fields.
+         */
+        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
+        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
+    }
+    else
+    {
+        /*
+         * Topology for HVM guests is entirely controlled by Xen.  For now, we
+         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
+         */
+        p->basic.htt = true;
+        p->extd.cmp_legacy = false;
+
+        /*
+         * EBX[23:16] is Maximum Logical Processors Per Package.
+         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
+         * overflow.
+         */
+        if ( !(p->basic.lppp & 0x80) )
+            p->basic.lppp *= 2;
+
+        switch ( p->x86_vendor )
+        {
+        case X86_VENDOR_INTEL:
+            for ( i = 0; (p->cache.subleaf[i].type &&
+                          i < ARRAY_SIZE(p->cache.raw)); ++i )
+            {
+                p->cache.subleaf[i].cores_per_package =
+                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
+                p->cache.subleaf[i].threads_per_cache = 0;
+            }
+            break;
+
+        case X86_VENDOR_AMD:
+        case X86_VENDOR_HYGON:
+            /*
+             * ECX[15:12] is ApicIdCoreSize.
+             * ECX[7:0] is NumberOfCores (minus one).
+             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
+             * - overflow,
+             * - going out of sync with leaf 1 EBX[23:16],
+             * - incrementing ApicIdCoreSize when it's zero (which changes the
+             *   meaning of bits 7:0).
+             */
+            if ( !(p->extd.nc & 0x80) )
+            {
+                if ( p->extd.apic_id_size != 0 && p->extd.apic_id_size != 0xf )
+                    p->extd.apic_id_size++;
+
+                p->extd.nc = (p->extd.nc << 1) | 1;
+            }
+            break;
+        }
+
+        /*
+         * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM /
+         * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
+         * CPUID.  Xen will discard these bits if configuration hasn't been
+         * set for the domain.
+         */
+        p->extd.itsc = true;
+        p->basic.vmx = true;
+        p->extd.svm = true;
+    }
+
+    rc = x86_cpuid_copy_to_buffer(p, leaves, &nr_leaves);
+    if ( rc )
+    {
+        ERROR("Failed to serialise CPUID (%d = %s)", -rc, strerror(-rc));
+        goto out;
+    }
+
+    rc = xc_set_domain_cpu_policy(xch, domid, nr_leaves, leaves, 0, NULL,
+                                  &err_leaf, &err_subleaf, &err_msr);
+    if ( rc )
+    {
+        PERROR("Failed to set d%d's policy (err leaf %#x, subleaf %#x, msr %#x)",
+               domid, err_leaf, err_subleaf, err_msr);
+        rc = -errno;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    free(p);
+    free(leaves);
+
+    return rc;
+}
diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h
index 79840f99ce..331ef4f4f0 100644
--- a/xen/include/xen/lib/x86/cpuid.h
+++ b/xen/include/xen/lib/x86/cpuid.h
@@ -154,8 +154,12 @@ struct cpuid_policy
     union {
         struct cpuid_leaf raw[CPUID_GUEST_NR_CACHE];
         struct cpuid_cache_leaf {
-            uint32_t type:5,
-                :27, :32, :32, :32;
+            uint32_t /* a */ type:5, level:3;
+            bool self_init:1, fully_assoc:1;
+            uint32_t :4, threads_per_cache:12, cores_per_package:6;
+            uint32_t /* b */ line_size:12, partitions:10, ways:10;
+            uint32_t /* c */ sets;
+            bool /* d */ wbinvd:1, inclusive:1, complex:1;
         } subleaf[CPUID_GUEST_NR_CACHE];
     } cache;
 
@@ -259,7 +263,8 @@ struct cpuid_policy
                 uint32_t e8b;
                 struct { DECL_BITFIELD(e8b); };
             };
-            uint32_t /* c */:32, /* d */:32;
+            uint32_t nc:8, :4, apic_id_size:4, :16;
+            uint32_t /* d */:32;
         };
     } extd;
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v3 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-25 18:11   ` [Xen-devel] [PATCH v3 " Andrew Cooper
@ 2019-09-26  8:04     ` Jan Beulich
  2019-09-26 12:25       ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2019-09-26  8:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 25.09.2019 20:11, Andrew Cooper wrote:
> +int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
> +                          const uint32_t *featureset, unsigned int nr_features)
> +{
> +    int rc;
> +    xc_dominfo_t di;
> +    unsigned int i, nr_leaves, nr_msrs;
> +    xen_cpuid_leaf_t *leaves = NULL;
> +    struct cpuid_policy *p = NULL;
> +    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
> +
> +    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
> +         di.domid != domid )
> +    {
> +        ERROR("Failed to obtain d%d info", domid);
> +        rc = -ESRCH;
> +        goto out;
> +    }
> +
> +    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
> +    if ( rc )
> +    {
> +        PERROR("Failed to obtain policy info size");
> +        rc = -errno;
> +        goto out;
> +    }
> +
> +    rc = -ENOMEM;
> +    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL ||
> +         (p = calloc(1, sizeof(*p))) == NULL )
> +        goto out;
> +
> +    /* Get the domain's default policy. */
> +    nr_msrs = 0;
> +    rc = xc_get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_default
> +                                              : XEN_SYSCTL_cpu_policy_pv_default,
> +                                  &nr_leaves, leaves, &nr_msrs, NULL);
> +    if ( rc )
> +    {
> +        PERROR("Failed to obtain %s default policy", di.hvm ? "hvm" : "pv");
> +        rc = -errno;
> +        goto out;
> +    }
> +
> +    rc = x86_cpuid_copy_from_buffer(p, leaves, nr_leaves,
> +                                    &err_leaf, &err_subleaf);
> +    if ( rc )
> +    {
> +        ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
> +              err_leaf, err_subleaf, -rc, strerror(-rc));
> +        goto out;
> +    }
> +
> +    if ( featureset )
> +    {
> +        uint32_t disabled_features[FEATURESET_NR_ENTRIES],
> +            feat[FEATURESET_NR_ENTRIES] = {};
> +        static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
> +        unsigned int i, b;
> +
> +        /*
> +         * The user supplied featureset may be shorter or longer than
> +         * FEATURESET_NR_ENTRIES.  Shorter is fine, and we will zero-extend.
> +         * Longer is fine, so long as it only padded with zeros.
> +         */
> +        unsigned int user_len = min(FEATURESET_NR_ENTRIES + 0u, nr_features);
> +
> +        /* Check for truncated set bits. */
> +        rc = -EOPNOTSUPP;
> +        for ( i = user_len; i < nr_features; ++i )
> +            if ( featureset[i] != 0 )
> +                goto out;
> +
> +        memcpy(feat, featureset, sizeof(*featureset) * user_len);
> +
> +        /* Disable deep dependencies of disabled features. */
> +        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> +            disabled_features[i] = ~feat[i] & deep_features[i];
> +
> +        for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
> +        {
> +            const uint32_t *dfs;
> +
> +            if ( !test_bit(b, disabled_features) ||
> +                 !(dfs = x86_cpuid_lookup_deep_deps(b)) )
> +                continue;
> +
> +            for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> +            {
> +                feat[i] &= ~dfs[i];
> +                disabled_features[i] &= ~dfs[i];
> +            }
> +        }
> +
> +        cpuid_featureset_to_policy(feat, p);
> +    }
> +
> +    if ( !di.hvm )
> +    {
> +        uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
> +        uint32_t len = ARRAY_SIZE(host_featureset);
> +
> +        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
> +                                   &len, host_featureset);
> +        if ( rc )
> +        {
> +            /* Tolerate "buffer too small", as we've got the bits we need. */
> +            if ( errno == ENOBUFS )
> +                rc = 0;
> +            else
> +            {
> +                PERROR("Failed to obtain host featureset");
> +                rc = -errno;
> +                goto out;
> +            }
> +        }
> +
> +        /*
> +         * On hardware without CPUID Faulting, PV guests see real topology.
> +         * As a consequence, they also need to see the host htt/cmp fields.
> +         */
> +        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
> +        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
> +    }
> +    else
> +    {
> +        /*
> +         * Topology for HVM guests is entirely controlled by Xen.  For now, we
> +         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
> +         */
> +        p->basic.htt = true;
> +        p->extd.cmp_legacy = false;
> +
> +        /*
> +         * EBX[23:16] is Maximum Logical Processors Per Package.
> +         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
> +         * overflow.
> +         */
> +        if ( !(p->basic.lppp & 0x80) )
> +            p->basic.lppp *= 2;

I think you want to start the comment with "Leaf 1 EBX[23:16] ...",
as p->basic covers all basic leaves.

Additionally, while using masking instead of a relational operator
is correct here, ...

> +        switch ( p->x86_vendor )
> +        {
> +        case X86_VENDOR_INTEL:
> +            for ( i = 0; (p->cache.subleaf[i].type &&
> +                          i < ARRAY_SIZE(p->cache.raw)); ++i )
> +            {
> +                p->cache.subleaf[i].cores_per_package =
> +                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
> +                p->cache.subleaf[i].threads_per_cache = 0;
> +            }
> +            break;
> +
> +        case X86_VENDOR_AMD:
> +        case X86_VENDOR_HYGON:
> +            /*
> +             * ECX[15:12] is ApicIdCoreSize.
> +             * ECX[7:0] is NumberOfCores (minus one).
> +             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
> +             * - overflow,
> +             * - going out of sync with leaf 1 EBX[23:16],
> +             * - incrementing ApicIdCoreSize when it's zero (which changes the
> +             *   meaning of bits 7:0).
> +             */
> +            if ( !(p->extd.nc & 0x80) )

... it isn't here, i.e. this isn't a correct transformation of the
recent change for Rome): If the value is 0x7f here, the value in
leaf 1 would be 0x80. An adjustment, however, needs to be done
either to both leaves, or to none of them, to keep the values in
sufficient sync (and I think you'd break Rome again otherwise, as
p->extd.nc _is_ 0x7f there). Hence the "(regs[2] & 0xffu) < 0x7fu"
check in my recent patch.

Like above I think you want to name the (extended) leaf in the
comment, as p->extd similarly covers all extended leaves.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Xen-devel] [PATCH v3 08/10] tools/libxc: Rework xc_cpuid_apply_policy() to use {get, set}_cpu_policy()
  2019-09-26  8:04     ` Jan Beulich
@ 2019-09-26 12:25       ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2019-09-26 12:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Roger Pau Monné, Wei Liu, Ian Jackson

On 26/09/2019 09:04, Jan Beulich wrote:
> On 25.09.2019 20:11, Andrew Cooper wrote:
>> +int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>> +                          const uint32_t *featureset, unsigned int nr_features)
>> +{
>> +    int rc;
>> +    xc_dominfo_t di;
>> +    unsigned int i, nr_leaves, nr_msrs;
>> +    xen_cpuid_leaf_t *leaves = NULL;
>> +    struct cpuid_policy *p = NULL;
>> +    uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>> +
>> +    if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
>> +         di.domid != domid )
>> +    {
>> +        ERROR("Failed to obtain d%d info", domid);
>> +        rc = -ESRCH;
>> +        goto out;
>> +    }
>> +
>> +    rc = xc_get_cpu_policy_size(xch, &nr_leaves, &nr_msrs);
>> +    if ( rc )
>> +    {
>> +        PERROR("Failed to obtain policy info size");
>> +        rc = -errno;
>> +        goto out;
>> +    }
>> +
>> +    rc = -ENOMEM;
>> +    if ( (leaves = calloc(nr_leaves, sizeof(*leaves))) == NULL ||
>> +         (p = calloc(1, sizeof(*p))) == NULL )
>> +        goto out;
>> +
>> +    /* Get the domain's default policy. */
>> +    nr_msrs = 0;
>> +    rc = xc_get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_default
>> +                                              : XEN_SYSCTL_cpu_policy_pv_default,
>> +                                  &nr_leaves, leaves, &nr_msrs, NULL);
>> +    if ( rc )
>> +    {
>> +        PERROR("Failed to obtain %s default policy", di.hvm ? "hvm" : "pv");
>> +        rc = -errno;
>> +        goto out;
>> +    }
>> +
>> +    rc = x86_cpuid_copy_from_buffer(p, leaves, nr_leaves,
>> +                                    &err_leaf, &err_subleaf);
>> +    if ( rc )
>> +    {
>> +        ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
>> +              err_leaf, err_subleaf, -rc, strerror(-rc));
>> +        goto out;
>> +    }
>> +
>> +    if ( featureset )
>> +    {
>> +        uint32_t disabled_features[FEATURESET_NR_ENTRIES],
>> +            feat[FEATURESET_NR_ENTRIES] = {};
>> +        static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
>> +        unsigned int i, b;
>> +
>> +        /*
>> +         * The user supplied featureset may be shorter or longer than
>> +         * FEATURESET_NR_ENTRIES.  Shorter is fine, and we will zero-extend.
>> +         * Longer is fine, so long as it only padded with zeros.
>> +         */
>> +        unsigned int user_len = min(FEATURESET_NR_ENTRIES + 0u, nr_features);
>> +
>> +        /* Check for truncated set bits. */
>> +        rc = -EOPNOTSUPP;
>> +        for ( i = user_len; i < nr_features; ++i )
>> +            if ( featureset[i] != 0 )
>> +                goto out;
>> +
>> +        memcpy(feat, featureset, sizeof(*featureset) * user_len);
>> +
>> +        /* Disable deep dependencies of disabled features. */
>> +        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
>> +            disabled_features[i] = ~feat[i] & deep_features[i];
>> +
>> +        for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
>> +        {
>> +            const uint32_t *dfs;
>> +
>> +            if ( !test_bit(b, disabled_features) ||
>> +                 !(dfs = x86_cpuid_lookup_deep_deps(b)) )
>> +                continue;
>> +
>> +            for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
>> +            {
>> +                feat[i] &= ~dfs[i];
>> +                disabled_features[i] &= ~dfs[i];
>> +            }
>> +        }
>> +
>> +        cpuid_featureset_to_policy(feat, p);
>> +    }
>> +
>> +    if ( !di.hvm )
>> +    {
>> +        uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
>> +        uint32_t len = ARRAY_SIZE(host_featureset);
>> +
>> +        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
>> +                                   &len, host_featureset);
>> +        if ( rc )
>> +        {
>> +            /* Tolerate "buffer too small", as we've got the bits we need. */
>> +            if ( errno == ENOBUFS )
>> +                rc = 0;
>> +            else
>> +            {
>> +                PERROR("Failed to obtain host featureset");
>> +                rc = -errno;
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        /*
>> +         * On hardware without CPUID Faulting, PV guests see real topology.
>> +         * As a consequence, they also need to see the host htt/cmp fields.
>> +         */
>> +        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
>> +        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
>> +    }
>> +    else
>> +    {
>> +        /*
>> +         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>> +         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>> +         */
>> +        p->basic.htt = true;
>> +        p->extd.cmp_legacy = false;
>> +
>> +        /*
>> +         * EBX[23:16] is Maximum Logical Processors Per Package.
>> +         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>> +         * overflow.
>> +         */
>> +        if ( !(p->basic.lppp & 0x80) )
>> +            p->basic.lppp *= 2;
> I think you want to start the comment with "Leaf 1 EBX[23:16] ...",
> as p->basic covers all basic leaves.
>
> Additionally, while using masking instead of a relational operator
> is correct here, ...
>
>> +        switch ( p->x86_vendor )
>> +        {
>> +        case X86_VENDOR_INTEL:
>> +            for ( i = 0; (p->cache.subleaf[i].type &&
>> +                          i < ARRAY_SIZE(p->cache.raw)); ++i )
>> +            {
>> +                p->cache.subleaf[i].cores_per_package =
>> +                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
>> +                p->cache.subleaf[i].threads_per_cache = 0;
>> +            }
>> +            break;
>> +
>> +        case X86_VENDOR_AMD:
>> +        case X86_VENDOR_HYGON:
>> +            /*
>> +             * ECX[15:12] is ApicIdCoreSize.
>> +             * ECX[7:0] is NumberOfCores (minus one).
>> +             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
>> +             * - overflow,
>> +             * - going out of sync with leaf 1 EBX[23:16],
>> +             * - incrementing ApicIdCoreSize when it's zero (which changes the
>> +             *   meaning of bits 7:0).
>> +             */
>> +            if ( !(p->extd.nc & 0x80) )
> ... it isn't here, i.e. this isn't a correct transformation of the
> recent change for Rome): If the value is 0x7f here, the value in
> leaf 1 would be 0x80. An adjustment, however, needs to be done
> either to both leaves, or to none of them, to keep the values in
> sufficient sync (and I think you'd break Rome again otherwise, as
> p->extd.nc _is_ 0x7f there). Hence the "(regs[2] & 0xffu) < 0x7fu"
> check in my recent patch.

Urgh yes - I questioned this when doing the rebase a second time.  I'll
revert to the logic you had.

>
> Like above I think you want to name the (extended) leaf in the
> comment, as p->extd similarly covers all extended leaves.

Done.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, back to index

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-13 19:27 [Xen-devel] [PATCH v2 00/10] x86/cpuid: Switch to using XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 01/10] x86/msr: Offer CPUID Faulting to PVH control domains Andrew Cooper
2019-09-16 10:53   ` Jan Beulich
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 02/10] libx86: Proactively initialise error pointers Andrew Cooper
     [not found]   ` <527f33ad-3de1-15c7-eb4b-603eaf65f3c5@suse.com>
     [not found]     ` <65f18521-15c5-72a9-29f6-cd5d621e1283@citrix.com>
2019-09-16 15:46       ` Jan Beulich
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 03/10] libx86: Introduce x86_cpu_policies_are_compatible() Andrew Cooper
2019-09-16 10:59   ` Jan Beulich
2019-09-16 15:31     ` Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 04/10] x86/cpuid: Split update_domain_cpuid_info() in half Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 05/10] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy Andrew Cooper
2019-09-16 11:04   ` Jan Beulich
2019-09-16 15:40     ` Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 06/10] tools/libxc: Pre-cleanup for xc_cpuid_{set, apply_policy}() Andrew Cooper
2019-09-16 11:09   ` Jan Beulich
2019-09-16 15:42     ` Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 07/10] tools/libxc: Rework xc_cpuid_set() to use {get, set}_cpu_policy() Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 08/10] tools/libxc: Rework xc_cpuid_apply_policy() " Andrew Cooper
2019-09-16 11:17   ` Jan Beulich
2019-09-16 13:41     ` Wei Liu
2019-09-16 15:49     ` Andrew Cooper
2019-09-16 16:05       ` Jan Beulich
2019-09-18 16:09   ` Jan Beulich
2019-09-19  8:48     ` Andrew Cooper
2019-09-25 18:11   ` [Xen-devel] [PATCH v3 " Andrew Cooper
2019-09-26  8:04     ` Jan Beulich
2019-09-26 12:25       ` Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 09/10] x86/domctl: Drop XEN_DOMCTL_set_cpuid Andrew Cooper
2019-09-13 19:27 ` [Xen-devel] [PATCH v2 10/10] x86/cpuid: Enable CPUID Faulting for PV control domains by default Andrew Cooper
2019-09-16 11:22   ` Jan Beulich
2019-09-16 15:52     ` Andrew Cooper

Xen-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/xen-devel/0 xen-devel/git/0.git
	git clone --mirror https://lore.kernel.org/xen-devel/1 xen-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 xen-devel xen-devel/ https://lore.kernel.org/xen-devel \
		xen-devel@lists.xenproject.org xen-devel@lists.xen.org xen-devel@archiver.kernel.org
	public-inbox-index xen-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.xenproject.lists.xen-devel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox