* [PATCH v2 0/3] AMD Nested Virt Preparation
@ 2024-03-13 12:24 George Dunlap
  2024-03-13 12:24 ` [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy George Dunlap
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: George Dunlap @ 2024-03-13 12:24 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

This series lays the groundwork for a revamp of the AMD nested virt
functionality.  The first two patches are clean-ups or reorganizations
of existing code.  The final patch is the first major step towards making
the feature supportable: allowing Xen to refuse nested virt support if certain
hardware features are not present.

George Dunlap (3):
  x86: Move SVM features exposed to guest into hvm_max_cpu_policy
  nestedsvm: Disable TscRateMSR
  svm/nestedsvm: Introduce nested capabilities bit

 docs/designs/nested-svm-cpu-features.md      | 111 +++++++++++++++++++
 xen/arch/x86/cpu-policy.c                    |  29 ++---
 xen/arch/x86/domain.c                        |   6 +
 xen/arch/x86/hvm/nestedhvm.c                 |  10 ++
 xen/arch/x86/hvm/svm/nestedsvm.c             |  16 ++-
 xen/arch/x86/hvm/svm/svm.c                   |  57 ----------
 xen/arch/x86/hvm/vmx/vvmx.c                  |   8 ++
 xen/arch/x86/include/asm/hvm/hvm.h           |  16 ++-
 xen/arch/x86/include/asm/hvm/nestedhvm.h     |   4 +
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |   5 -
 10 files changed, 184 insertions(+), 78 deletions(-)
 create mode 100644 docs/designs/nested-svm-cpu-features.md

-- 
2.25.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy
  2024-03-13 12:24 [PATCH v2 0/3] AMD Nested Virt Preparation George Dunlap
@ 2024-03-13 12:24 ` George Dunlap
  2024-03-18 13:54   ` Jan Beulich
  2024-03-13 12:24 ` [PATCH v2 2/3] nestedsvm: Disable TscRateMSR George Dunlap
  2024-03-13 12:24 ` [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit George Dunlap
  2 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2024-03-13 12:24 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

Currently (nested) SVM features we're willing to expose to the guest
are defined in calculate_host_policy, and stored in host_cpu_policy.
This is the wrong place for this; move it into
calculate_hvm_max_policy(), and store it in hvm_max_cpu_policy.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
v2:
- New
---
 xen/arch/x86/cpu-policy.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 2acc27632f..bd047456eb 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -398,19 +398,6 @@ static void __init calculate_host_policy(void)
     if ( vpmu_mode == XENPMU_MODE_OFF )
         p->basic.raw[0xa] = EMPTY_LEAF;
 
-    if ( p->extd.svm )
-    {
-        /* Clamp to implemented features which require hardware support. */
-        p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) |
-                               (1u << SVM_FEATURE_LBRV) |
-                               (1u << SVM_FEATURE_NRIPS) |
-                               (1u << SVM_FEATURE_PAUSEFILTER) |
-                               (1u << SVM_FEATURE_DECODEASSISTS));
-        /* Enable features which are always emulated. */
-        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
-                               (1u << SVM_FEATURE_TSCRATEMSR));
-    }
-
     /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
     /* probe_cpuid_faulting() sanity checks presence of MISC_FEATURES_ENABLES */
     p->platform_info.cpuid_faulting = cpu_has_cpuid_faulting;
@@ -741,6 +728,23 @@ static void __init calculate_hvm_max_policy(void)
     if ( !cpu_has_vmx )
         __clear_bit(X86_FEATURE_PKS, fs);
 
+    /* 
+     * Make adjustments to possible (nested) virtualization features exposed
+     * to the guest
+     */
+    if ( p->extd.svm )
+    {
+        /* Clamp to implemented features which require hardware support. */
+        p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) |
+                               (1u << SVM_FEATURE_LBRV) |
+                               (1u << SVM_FEATURE_NRIPS) |
+                               (1u << SVM_FEATURE_PAUSEFILTER) |
+                               (1u << SVM_FEATURE_DECODEASSISTS));
+        /* Enable features which are always emulated. */
+        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
+                               (1u << SVM_FEATURE_TSCRATEMSR));
+    }
+    
     guest_common_max_feature_adjustments(fs);
     guest_common_feature_adjustments(fs);
 
-- 
2.25.1




* [PATCH v2 2/3] nestedsvm: Disable TscRateMSR
  2024-03-13 12:24 [PATCH v2 0/3] AMD Nested Virt Preparation George Dunlap
  2024-03-13 12:24 ` [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy George Dunlap
@ 2024-03-13 12:24 ` George Dunlap
  2024-03-18 13:56   ` Jan Beulich
  2024-03-13 12:24 ` [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit George Dunlap
  2 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2024-03-13 12:24 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu

The primary purpose of TSC scaling, from our perspective, is to
maintain the fiction of an "invariant TSC" across migrates between
platforms with different clock speeds.

On AMD, the TscRateMSR CPUID bit is unconditionally enabled in the
"host cpuid", even if the hardware doesn't actually support it.
According to c/s fd14a1943c4 ("nestedsvm: Support TSC Rate MSR"),
testing showed that emulating TSC scaling in an L1 was more expensive
than emulating TSC scaling on an L0 (due to extra sets of vmexit /
vmenter).

However, the current implementation seems to be broken.

First of all, the final L2 scaling ratio should be a composition of
the L0 scaling ratio and the L1 scaling ratio; there's no indication
this is being done anywhere.

Secondly, it's not clear that the L1 tsc scaling ratio actually
affects the L0 tsc scaling ratio.  The stored value (ns_tscratio) is
used to affect the tsc *offset*, but doesn't seem to actually be
factored into d->hvm.tsc_scaling_ratio.  (Which shouldn't be
per-domain anyway, but per-vcpu.)  Having the *offset* scaled
according to the nested scaling without the actual RDTSC itself also
being scaled has got to produce inconsistent results.
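(For concreteness: MSR_AMD64_TSC_RATIO holds an 8.32 fixed-point value, so
composing the L0 and L1 ratios would amount to something like the sketch
below.  compose_tsc_ratio is a hypothetical helper for illustration, not
code in this series.)

```c
#include <stdint.h>

/* MSR_AMD64_TSC_RATIO is 8.32 fixed point: 1.0 is represented as 1 << 32. */
#define TSC_RATIO_ONE (1ULL << 32)

/*
 * Composing two 8.32 ratios: (a / 2^32) * (b / 2^32), re-expressed in
 * 8.32 format, is (a * b) >> 32.  A 128-bit intermediate avoids overflow.
 */
static uint64_t compose_tsc_ratio(uint64_t l0_ratio, uint64_t l1_ratio)
{
    return (uint64_t)(((unsigned __int128)l0_ratio * l1_ratio) >> 32);
}
```

E.g. composing a ratio of 2.0 with one of 0.5 yields the identity ratio.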

For now, just disable the functionality entirely until we can
implement it properly:

- Don't set TSCRATEMSR in the host CPUID policy

- Remove MSR_AMD64_TSC_RATIO emulation handling, so that the guest
  gets a #GP if it tries to access it (as it should when
  TSCRATEMSR is clear)

- Remove ns_tscratio from struct nestedhvm, and all code that touches
  it

Unfortunately this means ripping out the scaling calculation stuff as
well, since it's only used in the nested case; it's there in the git
tree if we need it for reference when we re-introduce it.
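(For reference when re-introducing it: the 32x32-bit decomposition that
scale_tsc() used can be sanity-checked against a straight 128-bit multiply,
as in this stand-alone sketch.)

```c
#include <stdint.h>

/* The 32x32-bit decomposition the removed scale_tsc() relied on. */
static uint64_t scale_tsc_decomposed(uint64_t host_tsc, uint64_t ratio)
{
    uint64_t mult = ratio >> 32;
    uint64_t frac = ratio & 0xffffffffULL;
    uint64_t r = host_tsc * mult;

    r += (host_tsc >> 32) * frac;
    r += ((host_tsc & 0xffffffffULL) * frac) >> 32;
    return r;
}

/* Reference: full 128-bit product, truncated the same way. */
static uint64_t scale_tsc_ref(uint64_t host_tsc, uint64_t ratio)
{
    return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> 32);
}
```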

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
v2:
- Port over move to hvm_max_cpu_policy

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/cpu-policy.c                    |  3 +-
 xen/arch/x86/hvm/svm/nestedsvm.c             |  2 -
 xen/arch/x86/hvm/svm/svm.c                   | 57 --------------------
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |  5 --
 4 files changed, 1 insertion(+), 66 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index bd047456eb..5952ff20e6 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -741,8 +741,7 @@ static void __init calculate_hvm_max_policy(void)
                                (1u << SVM_FEATURE_PAUSEFILTER) |
                                (1u << SVM_FEATURE_DECODEASSISTS));
         /* Enable features which are always emulated. */
-        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
-                               (1u << SVM_FEATURE_TSCRATEMSR));
+        p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
     }
     
     guest_common_max_feature_adjustments(fs);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index e4e01add8c..a5319ab729 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -146,8 +146,6 @@ int cf_check nsvm_vcpu_reset(struct vcpu *v)
     svm->ns_msr_hsavepa = INVALID_PADDR;
     svm->ns_ovvmcb_pa = INVALID_PADDR;
 
-    svm->ns_tscratio = DEFAULT_TSC_RATIO;
-
     svm->ns_cr_intercepts = 0;
     svm->ns_dr_intercepts = 0;
     svm->ns_exception_intercepts = 0;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b551eac807..34b9f603bc 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -777,43 +777,6 @@ static int cf_check svm_get_guest_pat(struct vcpu *v, u64 *gpat)
     return 1;
 }
 
-static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
-{
-    uint64_t mult, frac, scaled_host_tsc;
-
-    if ( ratio == DEFAULT_TSC_RATIO )
-        return host_tsc;
-
-    /*
-     * Suppose the most significant 32 bits of host_tsc and ratio are
-     * tsc_h and mult, and the least 32 bits of them are tsc_l and frac,
-     * then
-     *     host_tsc * ratio * 2^-32
-     *     = host_tsc * (mult * 2^32 + frac) * 2^-32
-     *     = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
-     *     = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
-     *
-     * Multiplications in the last two terms are between 32-bit integers,
-     * so both of them can fit in 64-bit integers.
-     *
-     * Because mult is usually less than 10 in practice, it's very rare
-     * that host_tsc * mult can overflow a 64-bit integer.
-     */
-    mult = ratio >> 32;
-    frac = ratio & ((1ULL << 32) - 1);
-    scaled_host_tsc  = host_tsc * mult;
-    scaled_host_tsc += (host_tsc >> 32) * frac;
-    scaled_host_tsc += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
-
-    return scaled_host_tsc;
-}
-
-static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
-    uint64_t ratio)
-{
-    return guest_tsc - scale_tsc(host_tsc, ratio);
-}
-
 static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
@@ -832,18 +795,8 @@ static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 
     if ( nestedhvm_vcpu_in_guestmode(v) )
     {
-        struct nestedsvm *svm = &vcpu_nestedsvm(v);
-
         n2_tsc_offset = vmcb_get_tsc_offset(n2vmcb) -
                         vmcb_get_tsc_offset(n1vmcb);
-        if ( svm->ns_tscratio != DEFAULT_TSC_RATIO )
-        {
-            uint64_t guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
-
-            n2_tsc_offset = svm_get_tsc_offset(guest_tsc,
-                                               guest_tsc + n2_tsc_offset,
-                                               svm->ns_tscratio);
-        }
         vmcb_set_tsc_offset(n1vmcb, offset);
     }
 
@@ -1921,10 +1874,6 @@ static int cf_check svm_msr_read_intercept(
         *msr_content = nsvm->ns_msr_hsavepa;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        *msr_content = nsvm->ns_tscratio;
-        break;
-
     case MSR_AMD_OSVW_ID_LENGTH:
     case MSR_AMD_OSVW_STATUS:
         if ( !d->arch.cpuid->extd.osvw )
@@ -2103,12 +2052,6 @@ static int cf_check svm_msr_write_intercept(
             goto gpf;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        if ( msr_content & TSC_RATIO_RSVD_BITS )
-            goto gpf;
-        nsvm->ns_tscratio = msr_content;
-        break;
-
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
     case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
         /*
diff --git a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
index 406fc082b1..45d658ad01 100644
--- a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
+++ b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
@@ -18,11 +18,6 @@ struct nestedsvm {
      */
     uint64_t ns_ovvmcb_pa;
 
-    /* virtual tscratio holding the value l1 guest writes to the
-     * MSR_AMD64_TSC_RATIO MSR.
-     */
-    uint64_t ns_tscratio;
-
     /* Cached real intercepts of the l2 guest */
     uint32_t ns_cr_intercepts;
     uint32_t ns_dr_intercepts;
-- 
2.25.1




* [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-13 12:24 [PATCH v2 0/3] AMD Nested Virt Preparation George Dunlap
  2024-03-13 12:24 ` [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy George Dunlap
  2024-03-13 12:24 ` [PATCH v2 2/3] nestedsvm: Disable TscRateMSR George Dunlap
@ 2024-03-13 12:24 ` George Dunlap
  2024-03-18 14:17   ` Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2024-03-13 12:24 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian

In order to make implementation and testing tractable, we will require
specific host functionality.  Add a nested_virt bit to hvm_funcs.caps,
and return an error if a domain is created with nested virt and this
bit isn't set.  Create VMX and SVM callbacks to be executed from
start_nested_svm(), which is guaranteed to execute after all
command-line options have been processed.

For VMX, start with always enabling it if HAP is present; this
shouldn't change current behavior.

For SVM, require some basic functionality, adding a document
explaining the rationale.

NB that only SVM CPUID bits 0-7 have been considered.  Bits 10-16 may
be considered in a follow-up patch.
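For reference, those bits live in CPUID leaf 0x8000000A:%edx; a hypothetical
user-space mirror of the subset the SVM side will require (names and helper
are illustrative only, not Xen's):

```c
#include <stdbool.h>
#include <stdint.h>

/* SVM feature flags as they appear in CPUID leaf 0x8000000A, %edx (bits 0-7). */
#define SVM_FEAT_NPT           (1u << 0)
#define SVM_FEAT_LBRV          (1u << 1)
#define SVM_FEAT_SVML          (1u << 2)
#define SVM_FEAT_NRIPS         (1u << 3)
#define SVM_FEAT_TSCRATEMSR    (1u << 4)
#define SVM_FEAT_VMCBCLEAN     (1u << 5)
#define SVM_FEAT_FLUSHBYASID   (1u << 6)
#define SVM_FEAT_DECODEASSISTS (1u << 7)

/*
 * Mirror of the host requirements imposed for nested SVM (minus HAP):
 * LbrVirt, NRIPS, FlushByAsid and DecodeAssists must all be present.
 */
static bool nested_svm_feats_ok(uint32_t edx)
{
    const uint32_t required = SVM_FEAT_LBRV | SVM_FEAT_NRIPS |
                              SVM_FEAT_FLUSHBYASID | SVM_FEAT_DECODEASSISTS;

    return (edx & required) == required;
}
```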

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
v2:
 - Fixed typo in title
 - Added hvm_nested_virt_supported() def for !CONFIG_HVM
 - Rebased over previous changes
 - Tweak some wording in document
 - Require npt rather than nrips twice
 - Remove stray __init from header
 - Set caps.nested_virt from callback from nestedhvm_setup()

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Julien Grall <julien@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
---
 docs/designs/nested-svm-cpu-features.md  | 111 +++++++++++++++++++++++
 xen/arch/x86/domain.c                    |   6 ++
 xen/arch/x86/hvm/nestedhvm.c             |  10 ++
 xen/arch/x86/hvm/svm/nestedsvm.c         |  14 +++
 xen/arch/x86/hvm/vmx/vvmx.c              |   8 ++
 xen/arch/x86/include/asm/hvm/hvm.h       |  16 +++-
 xen/arch/x86/include/asm/hvm/nestedhvm.h |   4 +
 7 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/docs/designs/nested-svm-cpu-features.md b/docs/designs/nested-svm-cpu-features.md
new file mode 100644
index 0000000000..837a96df05
--- /dev/null
+++ b/docs/designs/nested-svm-cpu-features.md
@@ -0,0 +1,111 @@
+# Nested SVM (AMD) CPUID requirements
+
+The first step in making nested SVM production-ready is to make sure
+that all features are implemented and well-tested.  To make this
+tractable, we will initially be limiting the "supported" range of
+nested virt to a specific subset of host and guest features.  This
+document describes the criteria for deciding on features, and the
+rationale behind each feature.
+
+For AMD, all virtualization-related features can be found in CPUID
+leaf 8000000A:edx.
+
+# Criteria
+
+- Processor support: At a minimum we want to support processors from
+  the last 5 years.  All things being equal, we'd prefer to cover
+  older processors than not.  Bits 0:7 were available in the very
+  earliest processors; and even through bit 15 we should be pretty
+  good support-wise.
+
+- Faithfulness to hardware: We need the behavior of the "virtual cpu"
+  from the L1 hypervisor's perspective to be as close as possible to
+  the original hardware.  In particular, the behavior of the hardware
+  on error paths 1) is not easy to understand or test, 2) can be the
+  source of surprising vulnerabilities.  (See XSA-7 for an example of a
+  case where subtle error-handling differences can open up a privilege
+  escalation.)  We should avoid emulating any bit of the hardware with
+  complex error paths if we can at all help it.
+
+- Cost of implementation: We want to minimize the cost of
+  implementation (where this includes bringing an existing sub-par
+  implementation up to speed).  All things being equal, we'll favor a
+  configuration which does not require any new implementation.
+
+- Performance: All things being equal, we'd prefer to choose a set of
+  L0 / L1 CPUID bits that is faster rather than slower.
+
+
+# Bits
+
+- 0 `NP` *Nested Paging*: Required both for L0 and L1.
+
+  Based primarily on faithfulness and performance, as well as
+  potential cost of implementation.  Available on earliest hardware,
+  so no compatibility issues.
+
+- 1 `LbrVirt` *LBR / debugging virtualization*: Require for L0 and L1.
+
+  For L0 this is required for performance: There's no way to tell the
+  guest not to use the LBR-related registers; and if the guest does,
+  then you have to save and restore all LBR-related registers on
+  context switch, which is prohibitive.  Furthermore, the additional
+  emulation risks introducing a security-relevant difference.
+
+  Providing it to L1 when we have it in L0 is basically free, and
+  already implemented.
+
+  Just require it and provide it.
+
+- 2 `SVML` *SVM Lock*: Not required for L0, not provided to L1
+
+  Seems to be about enabling an operating system to prevent "blue
+  pill" attacks against itself.
+
+  Xen doesn't use it, nor provide it; so it would need to be
+  implemented.  The best way to protect a guest OS is to leave nested
+  virt disabled in the tools.
+
+- 3 `NRIPS` *NRIP Save*: Require for both L0 and L1
+
+  If NRIPS is not present, the software interrupt injection
+  functionality can't be used; and Xen has to emulate it.  That's
+  another source of potential security issues.  If hardware supports
+  it, then providing it to the guest is basically free.
+
+- 4 `TscRateMsr`: Not required by L0, not provided to L1
+
+  The main putative use for this would be trying to maintain an
+  invariant TSC across cores with different clock speeds, or after a
+  migrate.  Unlike others, this doesn't have an error path to worry
+  about compatibility-wise; and according to tests done when nestedSVM
+  was first implemented, it's actually faster to emulate TscRateMSR in
+  the L0 hypervisor than for L1 to attempt to emulate it itself.
+
+  However, using this properly in L0 will take some implementation
+  effort; and composing it properly with L1 will take even more
+  effort.  Just leave it off for now.
+
+- 5 `VmcbClean` *VMCB Clean Bits*: Not required by L0, provide to L1
+
+  This is a pure optimization, both on the side of the L0 and L1.  The
+  implementation for L1 is entirely Xen-side, so can be provided even
+  on hardware that doesn't provide it.  And it's purely an
+  optimization, so could be "implemented" by ignoring the bits
+  entirely.
+
+  As such, we don't need to require it for L0; and as it's already
+  implemented, no reason not to provide it to L1.  Before this feature
+  was available those bits were marked SBZ ("should be zero"); setting
+  them was already advertised to cause unpredictable behavior.
+
+- 6 `FlushByAsid`: Require for L0, provide to L1
+
+  This is cheap and easy to use for L0 and to provide to the L1;
+  there's no reason not to just pass it through.
+
+- 7 `DecodeAssists`: Require for L0, provide to L1
+
+  Using it in L0 reduces the chance that we'll make some sort of error
+  in the decode path.  And if hardware supports it, it's easy enough
+  to provide to the L1.
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bda853e3c9..a25f498265 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
          */
         config->flags |= XEN_DOMCTL_CDF_oos_off;
 
+    if ( nested_virt && !hvm_nested_virt_supported() )
+    {
+        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
+        return -EINVAL;        
+    }
+
     if ( nested_virt && !hap )
     {
         dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index 12bf7172b8..451c4da6d4 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
     __clear_bit(0x80, shadow_io_bitmap[0]);
     __clear_bit(0xed, shadow_io_bitmap[1]);
 
+    /* 
+     * NB this must be called after all command-line processing has been
+     * done, so that if (for example) HAP is disabled, nested virt is
+     * disabled as well.
+     */
+    if ( cpu_has_vmx )
+        start_nested_vmx(&hvm_funcs);
+    else if ( cpu_has_svm )
+        start_nested_svm(&hvm_funcs);
+
     return 0;
 }
 __initcall(nestedhvm_setup);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index a5319ab729..ad2e9f5c35 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1666,3 +1666,17 @@ void svm_nested_features_on_efer_update(struct vcpu *v)
         }
     }
 }
+
+void __init start_nested_svm(struct hvm_function_table *hvm_function_table)
+{
+    /* 
+     * Required host functionality to support nested virt.  See
+     * docs/designs/nested-svm-cpu-features.md for rationale.
+     */
+    hvm_function_table->caps.nested_virt =
+        hvm_function_table->caps.hap && 
+        cpu_has_svm_lbrv &&
+        cpu_has_svm_nrips &&
+        cpu_has_svm_flushbyasid &&
+        cpu_has_svm_decode;
+}
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ece0aa243a..ed058d9d2b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2816,6 +2816,14 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     __vmwrite(read_shadow_field, v->arch.hvm.nvcpu.guest_cr[cr]);
 }
 
+void __init start_nested_vmx(struct hvm_function_table *hvm_function_table)
+{
+    /* TODO: Require hardware support before enabling nested virt */
+    hvm_function_table->caps.nested_virt =
+        hvm_function_table->caps.hap;
+}
+
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 87a6935d97..e6f937fed7 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -97,7 +97,10 @@ struct hvm_function_table {
              singlestep:1,
             
              /* Hardware virtual interrupt delivery enable? */
-             virtual_intr_delivery:1;
+             virtual_intr_delivery:1,
+
+             /* Nested virt capabilities */
+             nested_virt:1;
     } caps;
 
     /*
@@ -654,6 +657,12 @@ static inline bool hvm_altp2m_supported(void)
     return hvm_funcs.caps.altp2m;
 }
 
+/* Returns true if we have the minimum hardware requirements for nested virt */
+static inline bool hvm_nested_virt_supported(void)
+{
+    return hvm_funcs.caps.nested_virt;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
@@ -797,6 +806,11 @@ static inline bool hvm_hap_supported(void)
     return false;
 }
 
+static inline bool hvm_nested_virt_supported(void)
+{
+    return false;
+}
+
 static inline bool nhvm_vmcx_hap_enabled(const struct vcpu *v)
 {
     ASSERT_UNREACHABLE();
diff --git a/xen/arch/x86/include/asm/hvm/nestedhvm.h b/xen/arch/x86/include/asm/hvm/nestedhvm.h
index 56a2019e1b..0568acb25f 100644
--- a/xen/arch/x86/include/asm/hvm/nestedhvm.h
+++ b/xen/arch/x86/include/asm/hvm/nestedhvm.h
@@ -82,4 +82,8 @@ static inline bool vvmcx_valid(const struct vcpu *v)
     return vcpu_nestedhvm(v).nv_vvmcxaddr != INVALID_PADDR;
 }
 
+
+void start_nested_svm(struct hvm_function_table *);
+void start_nested_vmx(struct hvm_function_table *);
+
 #endif /* _HVM_NESTEDHVM_H */
-- 
2.25.1




* Re: [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy
  2024-03-13 12:24 ` [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy George Dunlap
@ 2024-03-18 13:54   ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-03-18 13:54 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

On 13.03.2024 13:24, George Dunlap wrote:
> Currently (nested) SVM features we're willing to expose to the guest
> are defined in calculate_host_policy, and stored in host_cpu_policy.
> This is the wrong place for this; move it into
> calculate_hvm_max_policy(), and store it in hvm_max_cpu_policy.
> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>





* Re: [PATCH v2 2/3] nestedsvm: Disable TscRateMSR
  2024-03-13 12:24 ` [PATCH v2 2/3] nestedsvm: Disable TscRateMSR George Dunlap
@ 2024-03-18 13:56   ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-03-18 13:56 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 13.03.2024 13:24, George Dunlap wrote:
> The primary purpose of TSC scaling, from our perspective, is to
> maintain the fiction of an "invariant TSC" across migrates between
> platforms with different clock speeds.
> 
> On AMD, the TscRateMSR CPUID bit is unconditionally enabled in the
> "host cpuid", even if the hardware doesn't actually support it.
> According to c/s fd14a1943c4 ("nestedsvm: Support TSC Rate MSR"),
> testing showed that emulating TSC scaling in an L1 was more expensive
> than emulating TSC scaling on an L0 (due to extra sets of vmexit /
> vmenter).
> 
> However, the current implementation seems to be broken.
> 
> First of all, the final L2 scaling ratio should be a composition of
> the L0 scaling ratio and the L1 scaling ratio; there's no indication
> this is being done anywhere.
> 
> Secondly, it's not clear that the L1 tsc scaling ratio actually
> affects the L0 tsc scaling ratio.  The stored value (ns_tscratio) is
> used to affect the tsc *offset*, but doesn't seem to actually be
> factored into d->hvm.tsc_scaling_ratio.  (Which shouldn't be
> per-domain anyway, but per-vcpu.)  Having the *offset* scaled
> according to the nested scaling without the actual RDTSC itself also
> being scaled has got to produce inconsistent results.
> 
> For now, just disable the functionality entirely until we can
> implement it properly:
> 
> - Don't set TSCRATEMSR in the host CPUID policy

"host" is stale here; it's "HVM max" now.

> - Remove MSR_AMD64_TSC_RATIO emulation handling, so that the guest
>   gets a #GP if it tries to access it (as it should when
>   TSCRATEMSR is clear)
> 
> - Remove ns_tscratio from struct nestedhvm, and all code that touches
>   it
> 
> Unfortunately this means ripping out the scaling calculation stuff as
> well, since it's only used in the nested case; it's there in the git
> tree if we need it for reference when we re-introduce it.
> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-13 12:24 ` [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit George Dunlap
@ 2024-03-18 14:17   ` Jan Beulich
  2024-03-27 17:01     ` George Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-03-18 14:17 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

On 13.03.2024 13:24, George Dunlap wrote:
> In order to make implementation and testing tractable, we will require
> specific host functionality.  Add a nested_virt bit to hvm_funcs.caps,
> and return an error if a domain is created with nested virt and this
> bit isn't set.  Create VMX and SVM callbacks to be executed from
> start_nested_svm(), which is guaranteed to execute after all

Nit: nestedhvm_setup() (or, with different wording, start_nested_{svm,vmx}()).

> command-line options have been processed.
> 
> For VMX, start with always enabling it if HAP is present; this
> shouldn't change current behavior.
> 
> For SVM, require some basic functionality, adding a document
> explaining the rationale.
> 
> NB that only SVM CPUID bits 0-7 have been considered.  Bits 10-16 may
> be considered in a follow-up patch.
> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>
>[...]
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>           */
>          config->flags |= XEN_DOMCTL_CDF_oos_off;
>  
> +    if ( nested_virt && !hvm_nested_virt_supported() )
> +    {
> +        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
> +        return -EINVAL;        
> +    }
> +
>      if ( nested_virt && !hap )
>      {
>          dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");

As mentioned in reply to v1, I think what both start_nested_{svm,vmx}() check
is redundant with this latter check. I think that check isn't necessary there
(yet unlike suggested in reply to v1 I don't think anymore that the check here
can alternatively be dropped). And even if it was to be kept for some reason,
I'm having some difficulty seeing why vendor code needs to do that check, when
nestedhvm_setup() could do it for both of them.

> --- a/xen/arch/x86/hvm/nestedhvm.c
> +++ b/xen/arch/x86/hvm/nestedhvm.c
> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>      __clear_bit(0x80, shadow_io_bitmap[0]);
>      __clear_bit(0xed, shadow_io_bitmap[1]);
>  
> +    /* 
> +     * NB this must be called after all command-line processing has been
> +     * done, so that if (for example) HAP is disabled, nested virt is
> +     * disabled as well.
> +     */
> +    if ( cpu_has_vmx )
> +        start_nested_vmx(&hvm_funcs);
> +    else if ( cpu_has_svm )
> +        start_nested_svm(&hvm_funcs);

Instead of passing the pointer, couldn't you have the functions return
bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
pointer looks somewhat odd to me.

Is there a reason to use direct calls here rather than a true hook
(seeing that hvm_funcs must have been populated by the time we make it
here)? I understand we're (remotely) considering to switch away from
using hooks at some point, but right now this feels somewhat
inconsistent if not further justified.
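Something like the following, i.e. bool-returning helpers with the
capability set centrally (stand-in predicates replacing the real cpu_has_*
checks, purely to illustrate the shape):

```c
#include <stdbool.h>

/* Stand-ins for the real Xen predicates (assumed values, for illustration). */
static const bool cpu_has_vmx = false, cpu_has_svm = true;
static const bool caps_hap = true;
static const bool cpu_has_svm_lbrv = true, cpu_has_svm_nrips = true,
                  cpu_has_svm_flushbyasid = true, cpu_has_svm_decode = true;

static bool start_nested_svm(void)
{
    return caps_hap && cpu_has_svm_lbrv && cpu_has_svm_nrips &&
           cpu_has_svm_flushbyasid && cpu_has_svm_decode;
}

static bool start_nested_vmx(void)
{
    return caps_hap;
}

/* In nestedhvm_setup(), caps.nested_virt would then be set in one place: */
static bool compute_nested_virt(void)
{
    if ( cpu_has_vmx )
        return start_nested_vmx();
    if ( cpu_has_svm )
        return start_nested_svm();
    return false;
}
```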

> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -2816,6 +2816,14 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
>      __vmwrite(read_shadow_field, v->arch.hvm.nvcpu.guest_cr[cr]);
>  }
>  
> +void __init start_nested_vmx(struct hvm_function_table *hvm_function_table)
> +{
> +    /* TODO: Require hardware support before enabling nested virt */
> +    hvm_function_table->caps.nested_virt =
> +        hvm_function_table->caps.hap;
> +}
> +
> +

Nit: No double blank lines please.

Jan



* Re: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-18 14:17   ` Jan Beulich
@ 2024-03-27 17:01     ` George Dunlap
  2024-03-28  6:44       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2024-03-27 17:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

On Mon, Mar 18, 2024 at 2:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 13.03.2024 13:24, George Dunlap wrote:
> > In order to make implementation and testing tractable, we will require
> > specific host functionality.  Add a nested_virt bit to hvm_funcs.caps,
> > and return an error if a domain is created with nested virt and this
> > bit isn't set.  Create VMX and SVM callbacks to be executed from
> > start_nested_svm(), which is guaranteed to execute after all
>
> Nit: nestedhvm_setup() (or, with different wording, start_nested_{svm,vmx}()).

Oops, fixed.

> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >           */
> >          config->flags |= XEN_DOMCTL_CDF_oos_off;
> >
> > +    if ( nested_virt && !hvm_nested_virt_supported() )
> > +    {
> > +        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
> > +        return -EINVAL;
> > +    }
> > +
> >      if ( nested_virt && !hap )
> >      {
> >          dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
>
> As mentioned in reply to v1, I think what both start_nested_{svm,vmx}() check
> is redundant with this latter check. I think that check isn't necessary there
> (yet unlike suggested in reply to v1 I don't think anymore that the check here
> can alternatively be dropped). And even if it was to be kept for some reason,
> I'm having some difficulty seeing why vendor code needs to do that check, when
> nestedhvm_setup() could do it for both of them.

I'm having a bit of trouble resolving the antecedents in this
paragraph (what "this" and "there" are referring to).

Are you saying that we should set hvm_funcs.caps.nested_virt to 'true'
even if the hardware doesn't support HAP, because we check that here?

That seems like a very strange way to go about things; hvm_funcs.caps
should reflect the actual capabilities of the hardware.  Suppose at
some point we want to expose nested virt capability to the toolstack
-- wouldn't it make more sense to be able to just read
`hvm_funcs.caps.nested_virt`, rather than having to remember to && it
with `hvm_funcs.caps.hap`?

And as you say, you can't get rid of the HAP check here, because we
also want to deny nested virt if the admin deliberately created a
guest in shadow mode on a system that has HAP available.  So it's not
redundant: one is checking the capabilities of the system, the other
is checking the requested guest configuration.

As to why to have each vendor independent code check for HAP -- there
are in fact two implementations of the code; it's nice to be able to
look in one place for each implementation to determine the
requirements.  Additionally, it would be possible, in the future, for
one of the nested virt implementations to enable shadow mode, while
the other one didn't.  The current arrangement makes that easy.

> > --- a/xen/arch/x86/hvm/nestedhvm.c
> > +++ b/xen/arch/x86/hvm/nestedhvm.c
> > @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
> >      __clear_bit(0x80, shadow_io_bitmap[0]);
> >      __clear_bit(0xed, shadow_io_bitmap[1]);
> >
> > +    /*
> > +     * NB this must be called after all command-line processing has been
> > +     * done, so that if (for example) HAP is disabled, nested virt is
> > +     * disabled as well.
> > +     */
> > +    if ( cpu_has_vmx )
> > +        start_nested_vmx(&hvm_funcs);
> > +    else if ( cpu_has_svm )
> > +        start_nested_svm(&hvm_funcs);
>
> Instead of passing the pointer, couldn't you have the functions return
> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
> pointer looks somewhat odd to me.

For one, it makes start_nested_XXX symmetric to start_XXX, which also
has access to the full hvm_funcs structure (albeit in the former case
because it's creating the structure).  For two, both of them need to
read caps.hap.  For three, it's just more flexible -- there may be
future things that we want to modify in the start_nested_*()
functions.  If we did as you suggest, and then added (say)
caps.nested_virt_needs_hap, then we'd either need to add a second
function, or refactor it to look like this.

> Is there a reason to use direct calls here rather than a true hook
> (seeing that hvm_funcs must have been populated by the time we make it
> here)? I understand we're (remotely) considering to switch away from
> using hooks at some point, but right now this feels somewhat
> inconsistent if not further justified.

Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
look at adding a function pointer to see if it works.

 -George



* Re: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-27 17:01     ` George Dunlap
@ 2024-03-28  6:44       ` Jan Beulich
  2024-03-28 10:57         ` George Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-03-28  6:44 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper
  Cc: Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

On 27.03.2024 18:01, George Dunlap wrote:
> On Mon, Mar 18, 2024 at 2:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 13.03.2024 13:24, George Dunlap wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>           */
>>>          config->flags |= XEN_DOMCTL_CDF_oos_off;
>>>
>>> +    if ( nested_virt && !hvm_nested_virt_supported() )
>>> +    {
>>> +        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>>      if ( nested_virt && !hap )
>>>      {
>>>          dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
>>
>> As mentioned in reply to v1, I think what both start_nested_{svm,vmx}() check
>> is redundant with this latter check. I think that check isn't necessary there
>> (yet unlike suggested in reply to v1 I don't think anymore that the check here
>> can alternatively be dropped). And even if it was to be kept for some reason,
>> I'm having some difficulty seeing why vendor code needs to do that check, when
>> nestedhvm_setup() could do it for both of them.
> 
> I'm having a bit of trouble resolving the antecedents in this
> paragraph (what "this" and "there" are referring to).
> 
> Are you saying that we should set hvm_funcs.caps.nested_virt to 'true'
> even if the hardware doesn't support HAP, because we check that here?
> 
> That seems like a very strange way to go about things; hvm_funcs.caps
> should reflect the actual capabilities of the hardware.  Suppose at
> some point we want to expose nested virt capability to the toolstack
> -- wouldn't it make more sense to be able to just read
> `hvm_funcs.caps.nested_virt`, rather than having to remember to && it
> with `hvm_funcs.caps.hap`?
> 
> And as you say, you can't get rid of the HAP check here, because we
> also want to deny nested virt if the admin deliberately created a
> guest in shadow mode on a system that has HAP available.  So it's not
> redundant: one is checking the capabilities of the system, the other
> is checking the requested guest configuration.

Hmm, yes, you're right.

> As to why to have each vendor independent code check for HAP -- there
> are in fact two implementations of the code; it's nice to be able to
> look in one place for each implementation to determine the
> requirements.  Additionally, it would be possible, in the future, for
> one of the nested virt implementations to enable shadow mode, while
> the other one didn't.  The current arrangement makes that easy.

Well, first - how likely is this, when instead we've been considering
whether we could get rid of shadow mode? And then this is balancing
between ease of future changes vs avoiding redundancy where it can be
avoided. I'm not going to insist on centralizing the HAP check, but I
certainly wanted the option to be considered.

>>> --- a/xen/arch/x86/hvm/nestedhvm.c
>>> +++ b/xen/arch/x86/hvm/nestedhvm.c
>>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>>>      __clear_bit(0x80, shadow_io_bitmap[0]);
>>>      __clear_bit(0xed, shadow_io_bitmap[1]);
>>>
>>> +    /*
>>> +     * NB this must be called after all command-line processing has been
>>> +     * done, so that if (for example) HAP is disabled, nested virt is
>>> +     * disabled as well.
>>> +     */
>>> +    if ( cpu_has_vmx )
>>> +        start_nested_vmx(&hvm_funcs);
>>> +    else if ( cpu_has_svm )
>>> +        start_nested_svm(&hvm_funcs);
>>
>> Instead of passing the pointer, couldn't you have the functions return
>> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
>> pointer looks somewhat odd to me.
> 
> For one, it makes start_nested_XXX symmetric to start_XXX, which also
> has access to the full hvm_funcs structure (albeit in the former case
> because it's creating the structure).

This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
special because of this. Everywhere else we have accessor helpers when
hvm_funcs needs consulting, e.g. ...

>  For two, both of them need to read caps.hap.

... caps _reads_ would want using (an) accessor(s), too.

>  For three, it's just more flexible -- there may be
> future things that we want to modify in the start_nested_*()
> functions.  If we did as you suggest, and then added (say)
> caps.nested_virt_needs_hap, then we'd either need to add a second
> function, or refactor it to look like this.

Right, I can see this as an argument when taking, as you do, speculation
on future needs into account. Albeit I'm then inclined to say that even
such modifications may better be done through accessor helpers.

>> Is there a reason to use direct calls here rather than a true hook
>> (seeing that hvm_funcs must have been populated by the time we make it
>> here)? I understand we're (remotely) considering to switch away from
>> using hooks at some point, but right now this feels somewhat
>> inconsistent if not further justified.
> 
> Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
> look at adding a function pointer to see if it works.

Andrew - do you have any input here towards those somewhat vague plans
of getting rid of the hook functions? I guess they're more relevant in
places where we can't easily use the altcall machinery (i.e. not here).

Jan



* Re: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-28  6:44       ` Jan Beulich
@ 2024-03-28 10:57         ` George Dunlap
  2024-03-28 11:32           ` Jan Beulich
  2024-04-03  5:52           ` Jan Beulich
  0 siblings, 2 replies; 12+ messages in thread
From: George Dunlap @ 2024-03-28 10:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

On Thu, Mar 28, 2024 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
> > As to why to have each vendor independent code check for HAP -- there
> > are in fact two implementations of the code; it's nice to be able to
> > look in one place for each implementation to determine the
> > requirements.  Additionally, it would be possible, in the future, for
> > one of the nested virt implementations to enable shadow mode, while
> > the other one didn't.  The current arrangement makes that easy.
>
> Well, first - how likely is this, when instead we've been considering
> whether we could get rid of shadow mode? And then this is balancing
> between ease of future changes vs avoiding redundancy where it can be
> avoided. I'm not going to insist on centralizing the HAP check, but I
> certainly wanted the option to be considered.

I considered it before replying to you; so consider it considered. :-)

> >>> --- a/xen/arch/x86/hvm/nestedhvm.c
> >>> +++ b/xen/arch/x86/hvm/nestedhvm.c
> >>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
> >>>      __clear_bit(0x80, shadow_io_bitmap[0]);
> >>>      __clear_bit(0xed, shadow_io_bitmap[1]);
> >>>
> >>> +    /*
> >>> +     * NB this must be called after all command-line processing has been
> >>> +     * done, so that if (for example) HAP is disabled, nested virt is
> >>> +     * disabled as well.
> >>> +     */
> >>> +    if ( cpu_has_vmx )
> >>> +        start_nested_vmx(&hvm_funcs);
> >>> +    else if ( cpu_has_svm )
> >>> +        start_nested_svm(&hvm_funcs);
> >>
> >> Instead of passing the pointer, couldn't you have the functions return
> >> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
> >> pointer looks somewhat odd to me.
> >
> > For one, it makes start_nested_XXX symmetric to start_XXX, which also
> > has access to the full hvm_funcs structure (albeit in the former case
> > because it's creating the structure).
>
> This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
> special because of this. Everywhere else we have accessor helpers when
> hvm_funcs needs consulting, e.g. ...
>
> >  For two, both of them need to read caps.hap.
>
> ... caps _reads_ would want using (an) accessor(s), too.
>
> >  For three, it's just more flexible -- there may be
> > future things that we want to modify in the start_nested_*()
> > functions.  If we did as you suggest, and then added (say)
> > caps.nested_virt_needs_hap, then we'd either need to add a second
> > function, or refactor it to look like this.
>
> Right, I can see this as an argument when taking, as you do, speculation
> on future needs into account. Albeit I'm then inclined to say that even
> such modifications may better be done through accessor helpers.

Why access it through accessor helpers?

I consider that it's the SVM and VMX "construction/setup" code
respectively which "own" hvm_funcs (as evidenced by the fact that they
create the structures and then return them); and I consider the
start_nested_{vmx/svm} to be a part of the same code; so I consider it
valid for them to have direct access to the structure.

> >> Is there a reason to use direct calls here rather than a true hook
> >> (seeing that hvm_funcs must have been populated by the time we make it
> >> here)? I understand we're (remotely) considering to switch away from
> >> using hooks at some point, but right now this feels somewhat
> >> inconsistent if not further justified.
> >
> > Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
> > look at adding a function pointer to see if it works.
>
> Andrew - do you have any input here towards those somewhat vague plans
> of getting rid of the hook functions? I guess they're more relevant in
> places where we can't easily use the altcall machinery (i.e. not here).

Rather than do all that work collecting information, why don't we just
check it in as it is?  Obviously if we're thinking about getting rid
of hook functions at some point anyway, it can't be all that bad.
There is an aesthetic justification for the lack of hook, in that it's
similar to the start_vmx/svm() functions.

So far I really don't think the remaining "open" changes we're
discussing are worth either of our efforts.  I don't plan to make any
of these changes unless another x86 maintainer seconds your request
(or you can convince me they're worth making).

(The other two changes -- correcting the function name in the commit
message, and removing the extra blank line -- I've already changed
locally, so could check in with your ack.)

 -George



* Re: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-28 10:57         ` George Dunlap
@ 2024-03-28 11:32           ` Jan Beulich
  2024-04-03  5:52           ` Jan Beulich
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-03-28 11:32 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

On 28.03.2024 11:57, George Dunlap wrote:
> On Thu, Mar 28, 2024 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/nestedhvm.c
>>>>> +++ b/xen/arch/x86/hvm/nestedhvm.c
>>>>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>>>>>      __clear_bit(0x80, shadow_io_bitmap[0]);
>>>>>      __clear_bit(0xed, shadow_io_bitmap[1]);
>>>>>
>>>>> +    /*
>>>>> +     * NB this must be called after all command-line processing has been
>>>>> +     * done, so that if (for example) HAP is disabled, nested virt is
>>>>> +     * disabled as well.
>>>>> +     */
>>>>> +    if ( cpu_has_vmx )
>>>>> +        start_nested_vmx(&hvm_funcs);
>>>>> +    else if ( cpu_has_svm )
>>>>> +        start_nested_svm(&hvm_funcs);
>>>>
>>>> Instead of passing the pointer, couldn't you have the functions return
>>>> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
>>>> pointer looks somewhat odd to me.
>>>
>>> For one, it makes start_nested_XXX symmetric to start_XXX, which also
>>> has access to the full hvm_funcs structure (albeit in the former case
>>> because it's creating the structure).
>>
>> This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
>> special because of this. Everywhere else we have accessor helpers when
>> hvm_funcs needs consulting, e.g. ...
>>
>>>  For two, both of them need to read caps.hap.
>>
>> ... caps _reads_ would want using (an) accessor(s), too.
>>
>>>  For three, it's just more flexible -- there may be
>>> future things that we want to modify in the start_nested_*()
>>> functions.  If we did as you suggest, and then added (say)
>>> caps.nested_virt_needs_hap, then we'd either need to add a second
>>> function, or refactor it to look like this.
>>
>> Right, I can see this as an argument when taking, as you do, speculation
>> on future needs into account. Albeit I'm then inclined to say that even
>> such modifications may better be done through accessor helpers.
> 
> Why access it through accessor helpers?
> 
> I consider that it's the SVM and VMX "construction/setup" code
> respectively which "own" hvm_funcs (as evidenced by the fact that they
> create the structures and then return them); and I consider the
> start_nested_{vmx/svm} to be a part of the same code; so I consider it
> valid for them to have direct access to the structure.

That's one way of looking at it, yes. However, to me neither SVM nor VMX
code directly fiddle with hvm_funcs in the way you describe it. Instead
they return a pointer to their respective instances of struct
hvm_function_table. If the nested startup code would similarly alter
private struct instances of some sort, all would be fine.

May I suggest that you grep for hvm_funcs in what is there right now. You'll
find solely vmcs.c having a few uses of .caps, which likely are wrong (as in:
layering violations), too.

That said I can accept that the situation is slightly different here, when
generic code passes a pointer to the nested startup functions. To me at
least it still feels layering-violation-ish.

Jan



* Re: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
  2024-03-28 10:57         ` George Dunlap
  2024-03-28 11:32           ` Jan Beulich
@ 2024-04-03  5:52           ` Jan Beulich
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-04-03  5:52 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

On 28.03.2024 11:57, George Dunlap wrote:
> On Thu, Mar 28, 2024 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
>>> As to why to have each vendor independent code check for HAP -- there
>>> are in fact two implementations of the code; it's nice to be able to
>>> look in one place for each implementation to determine the
>>> requirements.  Additionally, it would be possible, in the future, for
>>> one of the nested virt implementations to enable shadow mode, while
>>> the other one didn't.  The current arrangement makes that easy.
>>
>> Well, first - how likely is this, when instead we've been considering
>> whether we could get rid of shadow mode? And then this is balancing
>> between ease of future changes vs avoiding redundancy where it can be
>> avoided. I'm not going to insist on centralizing the HAP check, but I
>> certainly wanted the option to be considered.
> 
> I considered it before replying to you; so consider it considered. :-)
> 
>>>>> --- a/xen/arch/x86/hvm/nestedhvm.c
>>>>> +++ b/xen/arch/x86/hvm/nestedhvm.c
>>>>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>>>>>      __clear_bit(0x80, shadow_io_bitmap[0]);
>>>>>      __clear_bit(0xed, shadow_io_bitmap[1]);
>>>>>
>>>>> +    /*
>>>>> +     * NB this must be called after all command-line processing has been
>>>>> +     * done, so that if (for example) HAP is disabled, nested virt is
>>>>> +     * disabled as well.
>>>>> +     */
>>>>> +    if ( cpu_has_vmx )
>>>>> +        start_nested_vmx(&hvm_funcs);
>>>>> +    else if ( cpu_has_svm )
>>>>> +        start_nested_svm(&hvm_funcs);
>>>>
>>>> Instead of passing the pointer, couldn't you have the functions return
>>>> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
>>>> pointer looks somewhat odd to me.
>>>
>>> For one, it makes start_nested_XXX symmetric to start_XXX, which also
>>> has access to the full hvm_funcs structure (albeit in the former case
>>> because it's creating the structure).
>>
>> This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
>> special because of this. Everywhere else we have accessor helpers when
>> hvm_funcs needs consulting, e.g. ...
>>
>>>  For two, both of them need to read caps.hap.
>>
>> ... caps _reads_ would want using (an) accessor(s), too.
>>
>>>  For three, it's just more flexible -- there may be
>>> future things that we want to modify in the start_nested_*()
>>> functions.  If we did as you suggest, and then added (say)
>>> caps.nested_virt_needs_hap, then we'd either need to add a second
>>> function, or refactor it to look like this.
>>
>> Right, I can see this as an argument when taking, as you do, speculation
>> on future needs into account. Albeit I'm then inclined to say that even
>> such modifications may better be done through accessor helpers.
> 
> Why access it through accessor helpers?
> 
> I consider that it's the SVM and VMX "construction/setup" code
> respectively which "own" hvm_funcs (as evidenced by the fact that they
> create the structures and then return them); and I consider the
> start_nested_{vmx/svm} to be a part of the same code; so I consider it
> valid for them to have direct access to the structure.
> 
>>>> Is there a reason to use direct calls here rather than a true hook
>>>> (seeing that hvm_funcs must have been populated by the time we make it
>>>> here)? I understand we're (remotely) considering to switch away from
>>>> using hooks at some point, but right now this feels somewhat
>>>> inconsistent if not further justified.
>>>
>>> Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
>>> look at adding a function pointer to see if it works.
>>
>> Andrew - do you have any input here towards those somewhat vague plans
>> of getting rid of the hook functions? I guess they're more relevant in
>> places where we can't easily use the altcall machinery (i.e. not here).
> 
> Rather than do all that work collecting information, why don't we just
> check it in as it is?  Obviously if we're thinking about getting rid
> of hook functions at some point anyway, it can't be all that bad.
> There is an aesthetic justification for the lack of hook, in that it's
> similar to the start_vmx/svm() functions.
> 
> So far I really don't think the remaining "open" changes we're
> discussing are worth either of our efforts.  I don't plan to make any
> of these changes unless another x86 maintainer seconds your request
> (or you can convince me they're worth making).
> 
> (The other two changes -- correcting the function name in the commit
> message, and removing the extra blank line -- I've already changed
> locally, so could check in with your ack.)

After some mumbling to myself over the holidays
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



end of thread, other threads:[~2024-04-03  5:52 UTC | newest]
