All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] x86/vmx: implement Bus Lock and VM Notify
@ 2022-05-26 11:11 Roger Pau Monne
  2022-05-26 11:11 ` [PATCH v2 1/3] x86/vmx: implement Bus Lock detection Roger Pau Monne
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Roger Pau Monne @ 2022-05-26 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jun Nakajima, Kevin Tian, Jan Beulich,
	Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini

Hello,

Following series implements support for bus lock and notify VM exit.

Patches are not really dependent, but I've developed them together by
virtue of both features being in Intel Instructions Set Extensions PR
Chapter 9.

Thanks, Roger.

Roger Pau Monne (3):
  x86/vmx: implement Bus Lock detection
  x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
  x86/vmx: implement Notify VM Exit

 docs/misc/xen-command-line.pandoc       | 11 ++++
 xen/arch/x86/hvm/vmx/vmcs.c             | 23 +++++++-
 xen/arch/x86/hvm/vmx/vmx.c              | 78 ++++++++++++++++++++++---
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  7 +++
 xen/arch/x86/include/asm/hvm/vmx/vmx.h  | 11 ++++
 xen/arch/x86/include/asm/perfc_defn.h   |  5 +-
 6 files changed, 124 insertions(+), 11 deletions(-)

-- 
2.36.0



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-05-26 11:11 [PATCH v2 0/3] x86/vmx: implement Bus Lock and VM Notify Roger Pau Monne
@ 2022-05-26 11:11 ` Roger Pau Monne
  2022-06-03 12:16   ` Jan Beulich
  2022-06-06 13:27   ` Andrew Cooper
  2022-05-26 11:11 ` [PATCH v2 2/3] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI Roger Pau Monne
  2022-05-26 11:11 ` [PATCH v2 3/3] x86/vmx: implement Notify VM Exit Roger Pau Monne
  2 siblings, 2 replies; 21+ messages in thread
From: Roger Pau Monne @ 2022-05-26 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jun Nakajima, Kevin Tian, Jan Beulich,
	Andrew Cooper, Wei Liu

Add support for enabling Bus Lock Detection on Intel systems.  Such
detection works by triggering a vmexit, which ought to be enough of a
pause to prevent a guest from abusing of the Bus Lock.

Add an extra Xen perf counter to track the number of Bus Locks detected.
This is done because Bus Locks can also be reported by setting the bit
26 in the exit reason field, so also account for those.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Adjust commit message.
---
 xen/arch/x86/hvm/vmx/vmcs.c             |  4 +++-
 xen/arch/x86/hvm/vmx/vmx.c              | 18 ++++++++++++++++++
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  3 +++
 xen/arch/x86/include/asm/hvm/vmx/vmx.h  |  2 ++
 xen/arch/x86/include/asm/perfc_defn.h   |  4 +++-
 5 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 56fed2db03..d388e6729c 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -209,6 +209,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
     P(cpu_has_vmx_pml, "Page Modification Logging");
     P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
+    P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
 #undef P
 
     if ( !printed )
@@ -318,7 +319,8 @@ static int vmx_init_vmcs_config(bool bsp)
                SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
                SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
                SECONDARY_EXEC_XSAVES |
-               SECONDARY_EXEC_TSC_SCALING);
+               SECONDARY_EXEC_TSC_SCALING |
+               SECONDARY_EXEC_BUS_LOCK_DETECTION);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
         if ( opt_vpid_enabled )
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f08a00dcbb..476ab72463 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4065,6 +4065,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
     if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
         return vmx_failed_vmentry(exit_reason, regs);
+    if ( unlikely(exit_reason & VMX_EXIT_REASONS_BUS_LOCK) )
+    {
+        /*
+         * Delivery of Bus Lock VM exit was pre-empted by a higher priority VM
+         * exit.
+         */
+        exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
+        if ( exit_reason != EXIT_REASON_BUS_LOCK )
+            perfc_incr(buslock);
+    }
 
     if ( v->arch.hvm.vmx.vmx_realmode )
     {
@@ -4561,6 +4571,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_handle_descriptor_access(exit_reason);
         break;
 
+    case EXIT_REASON_BUS_LOCK:
+        perfc_incr(buslock);
+        /*
+         * Nothing to do: just taking a vmexit should be enough of a pause to
+         * prevent a VM from crippling the host with bus locks.
+         */
+        break;
+
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
     case EXIT_REASON_INVPCID:
     /* fall through */
diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
index 9119aa8536..5d3edc1642 100644
--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
@@ -266,6 +266,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
 #define SECONDARY_EXEC_XSAVES                   0x00100000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
+#define SECONDARY_EXEC_BUS_LOCK_DETECTION       0x40000000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
@@ -345,6 +346,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
 #define cpu_has_vmx_tsc_scaling \
     (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
+#define cpu_has_vmx_bus_lock_detection \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
index 8eedf59155..03995701a1 100644
--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
@@ -159,6 +159,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
  * Exit Reasons
  */
 #define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000
+#define VMX_EXIT_REASONS_BUS_LOCK       (1u << 26)
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
@@ -219,6 +220,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
+#define EXIT_REASON_BUS_LOCK            74
 /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
 
 /*
diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h
index b07063b7d8..d6eb661940 100644
--- a/xen/arch/x86/include/asm/perfc_defn.h
+++ b/xen/arch/x86/include/asm/perfc_defn.h
@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions,           "exceptions", 32)
 
 #ifdef CONFIG_HVM
 
-#define VMX_PERF_EXIT_REASON_SIZE 65
+#define VMX_PERF_EXIT_REASON_SIZE 75
 #define VMEXIT_NPF_PERFC 143
 #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
 PERFCOUNTER_ARRAY(vmexits,              "vmexits",
@@ -125,4 +125,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from realmode")
 
 PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
 
+PERFCOUNTER(buslock, "Bus Locks Detected")
+
 /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
-- 
2.36.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/3] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
  2022-05-26 11:11 [PATCH v2 0/3] x86/vmx: implement Bus Lock and VM Notify Roger Pau Monne
  2022-05-26 11:11 ` [PATCH v2 1/3] x86/vmx: implement Bus Lock detection Roger Pau Monne
@ 2022-05-26 11:11 ` Roger Pau Monne
  2022-06-03 12:20   ` Jan Beulich
  2022-05-26 11:11 ` [PATCH v2 3/3] x86/vmx: implement Notify VM Exit Roger Pau Monne
  2 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monne @ 2022-05-26 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jun Nakajima, Kevin Tian, Jan Beulich,
	Andrew Cooper, Wei Liu

Introduce a small helper to OR VMX_INTR_SHADOW_NMI in
GUEST_INTERRUPTIBILITY_INFO in order to help dealing with the NMI
unblocked by IRET case.  Replace the existing usage in handling
EXIT_REASON_EXCEPTION_NMI and also add such handling to EPT violations
and page-modification log-full events.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vmx/vmx.c             | 28 +++++++++++++++++++-------
 xen/arch/x86/include/asm/hvm/vmx/vmx.h |  3 +++
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 476ab72463..69980c8e31 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3935,6 +3935,15 @@ static int vmx_handle_apic_write(void)
     return vlapic_apicv_write(current, exit_qualification & 0xfff);
 }
 
+static void undo_nmis_unblocked_by_iret(void)
+{
+    unsigned long guest_info;
+
+    __vmread(GUEST_INTERRUPTIBILITY_INFO, &guest_info);
+    __vmwrite(GUEST_INTERRUPTIBILITY_INFO,
+              guest_info | VMX_INTR_SHADOW_NMI);
+}
+
 void vmx_vmexit_handler(struct cpu_user_regs *regs)
 {
     unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
@@ -4139,13 +4148,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         if ( unlikely(intr_info & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
              !(idtv_info & INTR_INFO_VALID_MASK) &&
              (vector != TRAP_double_fault) )
-        {
-            unsigned long guest_info;
-
-            __vmread(GUEST_INTERRUPTIBILITY_INFO, &guest_info);
-            __vmwrite(GUEST_INTERRUPTIBILITY_INFO,
-                      guest_info | VMX_INTR_SHADOW_NMI);
-        }
+            undo_nmis_unblocked_by_iret();
 
         perfc_incra(cause_vector, vector);
 
@@ -4511,6 +4514,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
         __vmread(GUEST_PHYSICAL_ADDRESS, &gpa);
         __vmread(EXIT_QUALIFICATION, &exit_qualification);
+
+        if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
+             !(idtv_info & INTR_INFO_VALID_MASK) )
+            undo_nmis_unblocked_by_iret();
+
         ept_handle_violation(exit_qualification, gpa);
         break;
     }
@@ -4555,6 +4563,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         break;
 
     case EXIT_REASON_PML_FULL:
+        __vmread(EXIT_QUALIFICATION, &exit_qualification);
+
+        if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
+             !(idtv_info & INTR_INFO_VALID_MASK) )
+            undo_nmis_unblocked_by_iret();
+
         vmx_vcpu_flush_pml_buffer(v);
         break;
 
diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
index 03995701a1..bc0caad6fb 100644
--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
@@ -225,6 +225,9 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
 
 /*
  * Interruption-information format
+ *
+ * Note INTR_INFO_NMI_UNBLOCKED_BY_IRET is also used with Exit Qualification
+ * field under some circumstances.
  */
 #define INTR_INFO_VECTOR_MASK           0xff            /* 7:0 */
 #define INTR_INFO_INTR_TYPE_MASK        0x700           /* 10:8 */
-- 
2.36.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-05-26 11:11 [PATCH v2 0/3] x86/vmx: implement Bus Lock and VM Notify Roger Pau Monne
  2022-05-26 11:11 ` [PATCH v2 1/3] x86/vmx: implement Bus Lock detection Roger Pau Monne
  2022-05-26 11:11 ` [PATCH v2 2/3] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI Roger Pau Monne
@ 2022-05-26 11:11 ` Roger Pau Monne
  2022-06-03 12:49   ` Jan Beulich
  2022-06-09  7:04   ` Tian, Kevin
  2 siblings, 2 replies; 21+ messages in thread
From: Roger Pau Monne @ 2022-05-26 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Jun Nakajima,
	Kevin Tian

Under certain conditions guests can get the CPU stuck in an unbounded
loop without the possibility of an interrupt window to occur on
instruction boundary.  This was the case with the scenarios described
in XSA-156.

Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
if no interrupt window occurs for a specified amount of time.  Note
that using the Notify VM Exit avoids having to trap #AC and #DB
exceptions, as Xen is guaranteed to get a VM Exit even if the guest
puts the CPU in a loop without an interrupt window, as such disable
the intercepts if the feature is available and enabled.

Setting the notify VM exit window to 0 is safe because there's a
threshold added by the hardware in order to have a sane window value.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Properly update debug state when using notify VM exit.
 - Reword commit message.
---
This change enables the notify VM exit by default, KVM however doesn't
seem to enable it by default, and there's the following note in the
commit message:

"- There's a possibility, however small, that a notify VM exit happens
   with VM_CONTEXT_INVALID set in exit qualification. In this case, the
   vcpu can no longer run. To avoid killing a well-behaved guest, set
   notify window as -1 to disable this feature by default."

It's not obviously clear to me whether the comment was meant to be:
"There's a possibility, however small, that a notify VM exit _wrongly_
happens with VM_CONTEXT_INVALID".

It's also not clear whether such wrong hardware behavior only affects
a specific set of hardware, in a way that we could avoid enabling
notify VM exit there.

There's a discussion in one of the Linux patches that 128K might be
the safer value in order to prevent false positives, but I have no
formal confirmation about this.  Maybe our Intel maintainers can
provide some more feedback on a suitable notify VM exit window
value.

I've tested with 0 (the proposed default in the patch) and I don't
seem to be able to trigger notify VM exits under normal guest
operation.  Note that even in that case the guest won't be destroyed
unless the context is corrupt.
---
 docs/misc/xen-command-line.pandoc       | 11 +++++++++
 xen/arch/x86/hvm/vmx/vmcs.c             | 19 +++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c              | 32 +++++++++++++++++++++++--
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  4 ++++
 xen/arch/x86/include/asm/hvm/vmx/vmx.h  |  6 +++++
 xen/arch/x86/include/asm/perfc_defn.h   |  3 ++-
 6 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 1dc7e1ca07..ccf8bf5806 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2544,6 +2544,17 @@ guest will notify Xen that it has failed to acquire a spinlock.
 <major>, <minor> and <build> must be integers. The values will be
 encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
 
+### vm-notify-window (Intel)
+> `= <integer>`
+
+> Default: `0`
+
+Specify the value of the VM Notify window used to detect locked VMs. Set to -1
+to disable the feature.  Value is in units of crystal clock cycles.
+
+Note the hardware might add a threshold to the provided value in order to make
+it safe, and hence using 0 is fine.
+
 ### vpid (Intel)
 > `= <boolean>`
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index d388e6729c..6cb2c6c6b7 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
 static unsigned int __read_mostly ple_window = 4096;
 integer_param("ple_window", ple_window);
 
+static unsigned int __ro_after_init vm_notify_window;
+integer_param("vm-notify-window", vm_notify_window);
+
 static bool __read_mostly opt_ept_pml = true;
 static s8 __read_mostly opt_ept_ad = -1;
 int8_t __read_mostly opt_ept_exec_sp = -1;
@@ -210,6 +213,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_pml, "Page Modification Logging");
     P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
     P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
+    P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit");
 #undef P
 
     if ( !printed )
@@ -329,6 +333,8 @@ static int vmx_init_vmcs_config(bool bsp)
             opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
         if ( opt_ept_pml )
             opt |= SECONDARY_EXEC_ENABLE_PML;
+        if ( vm_notify_window != ~0u )
+            opt |= SECONDARY_EXEC_NOTIFY_VM_EXITING;
 
         /*
          * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
@@ -1333,6 +1339,19 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
                          VMX_MSR_GUEST_LOADONLY);
 
+    if ( cpu_has_vmx_notify_vm_exiting )
+    {
+        __vmwrite(NOTIFY_WINDOW, vm_notify_window);
+        /*
+         * Disable #AC and #DB interception: by using VM Notify Xen is
+         * guaranteed to get a VM exit even if the guest manages to lock the
+         * CPU.
+         */
+        v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
+                                              (1U << TRAP_alignment_check));
+        vmx_update_exception_bitmap(v);
+    }
+
  out:
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 69980c8e31..d3c1597b3e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v)
 
 void vmx_update_debug_state(struct vcpu *v)
 {
+    unsigned int mask = 1u << TRAP_int3;
+
+    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
+        /*
+         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
+         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
+         */
+        mask |= 1u << TRAP_debug;
+
     if ( v->arch.hvm.debug_state_latch )
-        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
+        v->arch.hvm.vmx.exception_bitmap |= mask;
     else
-        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
+        v->arch.hvm.vmx.exception_bitmap &= ~mask;
 
     vmx_vmcs_enter(v);
     vmx_update_exception_bitmap(v);
@@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         switch ( vector )
         {
         case TRAP_debug:
+            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
+                goto exit_and_crash;
+
             /*
              * Updates DR6 where debugger can peek (See 3B 23.2.1,
              * Table 23-1, "Exit Qualification for Debug Exceptions").
@@ -4593,6 +4605,22 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
          */
         break;
 
+    case EXIT_REASON_NOTIFY:
+        __vmread(EXIT_QUALIFICATION, &exit_qualification);
+
+        if ( exit_qualification & NOTIFY_VM_CONTEXT_INVALID )
+        {
+            perfc_incr(vmnotify_crash);
+            gprintk(XENLOG_ERR, "invalid VM context after notify vmexit\n");
+            domain_crash(v->domain);
+            break;
+        }
+
+        if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) )
+            undo_nmis_unblocked_by_iret();
+
+        break;
+
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
     case EXIT_REASON_INVPCID:
     /* fall through */
diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
index 5d3edc1642..0961eabf3f 100644
--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
@@ -267,6 +267,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_XSAVES                   0x00100000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
 #define SECONDARY_EXEC_BUS_LOCK_DETECTION       0x40000000
+#define SECONDARY_EXEC_NOTIFY_VM_EXITING        0x80000000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
@@ -348,6 +349,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
 #define cpu_has_vmx_bus_lock_detection \
     (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION)
+#define cpu_has_vmx_notify_vm_exiting \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_NOTIFY_VM_EXITING)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -455,6 +458,7 @@ enum vmcs_field {
     SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
     PLE_GAP                         = 0x00004020,
     PLE_WINDOW                      = 0x00004022,
+    NOTIFY_WINDOW                   = 0x00004024,
     VM_INSTRUCTION_ERROR            = 0x00004400,
     VM_EXIT_REASON                  = 0x00004402,
     VM_EXIT_INTR_INFO               = 0x00004404,
diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
index bc0caad6fb..e429de8541 100644
--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
@@ -221,6 +221,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
 #define EXIT_REASON_BUS_LOCK            74
+#define EXIT_REASON_NOTIFY              75
 /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
 
 /*
@@ -236,6 +237,11 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
 #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
 #define INTR_INFO_RESVD_BITS_MASK       0x7ffff000
 
+/*
+ * Exit Qualifications for NOTIFY VM EXIT
+ */
+#define NOTIFY_VM_CONTEXT_INVALID       1u
+
 /*
  * Exit Qualifications for MOV for Control Register Access
  */
diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h
index d6eb661940..c6b601b729 100644
--- a/xen/arch/x86/include/asm/perfc_defn.h
+++ b/xen/arch/x86/include/asm/perfc_defn.h
@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions,           "exceptions", 32)
 
 #ifdef CONFIG_HVM
 
-#define VMX_PERF_EXIT_REASON_SIZE 75
+#define VMX_PERF_EXIT_REASON_SIZE 76
 #define VMEXIT_NPF_PERFC 143
 #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
 PERFCOUNTER_ARRAY(vmexits,              "vmexits",
@@ -126,5 +126,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from realmode")
 PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
 
 PERFCOUNTER(buslock, "Bus Locks Detected")
+PERFCOUNTER(vmnotify_crash, "domains crashed by Notify VM Exit")
 
 /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
-- 
2.36.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-05-26 11:11 ` [PATCH v2 1/3] x86/vmx: implement Bus Lock detection Roger Pau Monne
@ 2022-06-03 12:16   ` Jan Beulich
  2022-06-03 14:29     ` Roger Pau Monné
  2022-06-06 13:27   ` Andrew Cooper
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-06-03 12:16 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Jun Nakajima, Kevin Tian, Andrew Cooper, Wei Liu, xen-devel

On 26.05.2022 13:11, Roger Pau Monne wrote:
> Add support for enabling Bus Lock Detection on Intel systems.  Such
> detection works by triggering a vmexit, which ought to be enough of a
> pause to prevent a guest from abusing of the Bus Lock.
> 
> Add an extra Xen perf counter to track the number of Bus Locks detected.
> This is done because Bus Locks can also be reported by setting the bit
> 26 in the exit reason field, so also account for those.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

This implements just the VMexit part of the feature - maybe the
title wants to reflect that? The vmx: tag could also mean there
is exposure to guests included for the #DB part of the feature.

Jan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
  2022-05-26 11:11 ` [PATCH v2 2/3] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI Roger Pau Monne
@ 2022-06-03 12:20   ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2022-06-03 12:20 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Jun Nakajima, Kevin Tian, Andrew Cooper, Wei Liu, xen-devel

On 26.05.2022 13:11, Roger Pau Monne wrote:
> Introduce a small helper to OR VMX_INTR_SHADOW_NMI in
> GUEST_INTERRUPTIBILITY_INFO in order to help dealing with the NMI
> unblocked by IRET case.  Replace the existing usage in handling
> EXIT_REASON_EXCEPTION_NMI and also add such handling to EPT violations
> and page-modification log-full events.
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-05-26 11:11 ` [PATCH v2 3/3] x86/vmx: implement Notify VM Exit Roger Pau Monne
@ 2022-06-03 12:49   ` Jan Beulich
  2022-06-03 14:46     ` Roger Pau Monné
  2022-06-09  7:04   ` Tian, Kevin
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-06-03 12:49 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Jun Nakajima, Kevin Tian, xen-devel

On 26.05.2022 13:11, Roger Pau Monne wrote:
> Under certain conditions guests can get the CPU stuck in an unbounded
> loop without the possibility of an interrupt window to occur on
> instruction boundary.  This was the case with the scenarios described
> in XSA-156.
> 
> Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> if no interrupt window occurs for a specified amount of time.  Note
> that using the Notify VM Exit avoids having to trap #AC and #DB
> exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> puts the CPU in a loop without an interrupt window, as such disable
> the intercepts if the feature is available and enabled.
> 
> Setting the notify VM exit window to 0 is safe because there's a
> threshold added by the hardware in order to have a sane window value.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Properly update debug state when using notify VM exit.
>  - Reword commit message.
> ---
> This change enables the notify VM exit by default, KVM however doesn't
> seem to enable it by default, and there's the following note in the
> commit message:
> 
> "- There's a possibility, however small, that a notify VM exit happens
>    with VM_CONTEXT_INVALID set in exit qualification. In this case, the
>    vcpu can no longer run. To avoid killing a well-behaved guest, set
>    notify window as -1 to disable this feature by default."
> 
> It's not obviously clear to me whether the comment was meant to be:
> "There's a possibility, however small, that a notify VM exit _wrongly_
> happens with VM_CONTEXT_INVALID".
> 
> It's also not clear whether such wrong hardware behavior only affects
> a specific set of hardware, in a way that we could avoid enabling
> notify VM exit there.
> 
> There's a discussion in one of the Linux patches that 128K might be
> the safer value in order to prevent false positives, but I have no
> formal confirmation about this.  Maybe our Intel maintainers can
> provide some more feedback on a suitable notify VM exit window
> value.

This certainly wants sorting one way or another before I, for one,
would consider giving an R-b here.

> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
>  static unsigned int __read_mostly ple_window = 4096;
>  integer_param("ple_window", ple_window);
>  
> +static unsigned int __ro_after_init vm_notify_window;
> +integer_param("vm-notify-window", vm_notify_window);

Could even be a runtime param, I guess. Albeit I realize this would
complicate things further down.

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v)
>  
>  void vmx_update_debug_state(struct vcpu *v)
>  {
> +    unsigned int mask = 1u << TRAP_int3;
> +
> +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )

I'm puzzled by the lack of symmetry between this and ...

> +        /*
> +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> +         */
> +        mask |= 1u << TRAP_debug;
> +
>      if ( v->arch.hvm.debug_state_latch )
> -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> +        v->arch.hvm.vmx.exception_bitmap |= mask;
>      else
> -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
>  
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>          switch ( vector )
>          {
>          case TRAP_debug:
> +            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> +                goto exit_and_crash;

... this condition. Shouldn't one be the inverse of the other (and
then it's the one down here which wants adjusting)?

> @@ -126,5 +126,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from realmode")
>  PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
>  
>  PERFCOUNTER(buslock, "Bus Locks Detected")
> +PERFCOUNTER(vmnotify_crash, "domains crashed by Notify VM Exit")

I think the text is not entirely correct and would want to be
"domain crashes by ...". Multiple vCPU-s of a single domain can
experience this in parallel (granted this would require "good"
timing).

Jan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-06-03 12:16   ` Jan Beulich
@ 2022-06-03 14:29     ` Roger Pau Monné
  2022-06-07  7:20       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2022-06-03 14:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jun Nakajima, Kevin Tian, Andrew Cooper, Wei Liu, xen-devel

On Fri, Jun 03, 2022 at 02:16:47PM +0200, Jan Beulich wrote:
> On 26.05.2022 13:11, Roger Pau Monne wrote:
> > Add support for enabling Bus Lock Detection on Intel systems.  Such
> > detection works by triggering a vmexit, which ought to be enough of a
> > pause to prevent a guest from abusing of the Bus Lock.
> > 
> > Add an extra Xen perf counter to track the number of Bus Locks detected.
> > This is done because Bus Locks can also be reported by setting the bit
> > 26 in the exit reason field, so also account for those.
> > 
> > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> This implements just the VMexit part of the feature - maybe the
> title wants to reflect that? The vmx: tag could also mean there
> is exposure to guests included for the #DB part of the feature.

Maybe:

"x86/vmx: add Bus Lock detection to the hypervisor"

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-03 12:49   ` Jan Beulich
@ 2022-06-03 14:46     ` Roger Pau Monné
  2022-06-07  7:43       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2022-06-03 14:46 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian, Jun Nakajima
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On Fri, Jun 03, 2022 at 02:49:54PM +0200, Jan Beulich wrote:
> On 26.05.2022 13:11, Roger Pau Monne wrote:
> > Under certain conditions guests can get the CPU stuck in an unbounded
> > loop without the possibility of an interrupt window to occur on
> > instruction boundary.  This was the case with the scenarios described
> > in XSA-156.
> > 
> > Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> > if no interrupt window occurs for a specified amount of time.  Note
> > that using the Notify VM Exit avoids having to trap #AC and #DB
> > exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> > puts the CPU in a loop without an interrupt window, as such disable
> > the intercepts if the feature is available and enabled.
> > 
> > Setting the notify VM exit window to 0 is safe because there's a
> > threshold added by the hardware in order to have a sane window value.
> > 
> > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v1:
> >  - Properly update debug state when using notify VM exit.
> >  - Reword commit message.
> > ---
> > This change enables the notify VM exit by default, KVM however doesn't
> > seem to enable it by default, and there's the following note in the
> > commit message:
> > 
> > "- There's a possibility, however small, that a notify VM exit happens
> >    with VM_CONTEXT_INVALID set in exit qualification. In this case, the
> >    vcpu can no longer run. To avoid killing a well-behaved guest, set
> >    notify window as -1 to disable this feature by default."
> > 
> > It's not obviously clear to me whether the comment was meant to be:
> > "There's a possibility, however small, that a notify VM exit _wrongly_
> > happens with VM_CONTEXT_INVALID".
> > 
> > It's also not clear whether such wrong hardware behavior only affects
> > a specific set of hardware, in a way that we could avoid enabling
> > notify VM exit there.
> > 
> > There's a discussion in one of the Linux patches that 128K might be
> > the safer value in order to prevent false positives, but I have no
> > formal confirmation about this.  Maybe our Intel maintainers can
> > provide some more feedback on a suitable notify VM exit window
> > value.
> 
> This certainly wants sorting one way or another before I, for one,
> would consider giving an R-b here.

I was hoping for either Kevin or Jun (now moved from Cc to To) to
provide some guidance here on what a suitable default value would be.

> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
> >  static unsigned int __read_mostly ple_window = 4096;
> >  integer_param("ple_window", ple_window);
> >  
> > +static unsigned int __ro_after_init vm_notify_window;
> > +integer_param("vm-notify-window", vm_notify_window);
> 
> Could even be a runtime param, I guess. Albeit I realize this would
> complicate things further down.
> 
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v)
> >  
> >  void vmx_update_debug_state(struct vcpu *v)
> >  {
> > +    unsigned int mask = 1u << TRAP_int3;
> > +
> > +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> 
> I'm puzzled by the lack of symmetry between this and ...
> 
> > +        /*
> > +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> > +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> > +         */
> > +        mask |= 1u << TRAP_debug;
> > +
> >      if ( v->arch.hvm.debug_state_latch )
> > -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> > +        v->arch.hvm.vmx.exception_bitmap |= mask;
> >      else
> > -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> > +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
> >  
> >      vmx_vmcs_enter(v);
> >      vmx_update_exception_bitmap(v);
> > @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >          switch ( vector )
> >          {
> >          case TRAP_debug:
> > +            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> > +                goto exit_and_crash;
> 
> ... this condition. Shouldn't one be the inverse of the other (and
> then it's the one down here which wants adjusting)?

The condition in vmx_update_debug_state() sets the mask so that
TRAP_debug will only be added or removed from the bitmap if
!cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting (note that
otherwise TRAP_debug is unconditionally set if
!cpu_has_vmx_notify_vm_exiting).

Hence it's impossible to get a VMExit TRAP_debug with
cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting because
TRAP_debug will never be set by vmx_update_debug_state() in that
case.

> 
> > @@ -126,5 +126,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from realmode")
> >  PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
> >  
> >  PERFCOUNTER(buslock, "Bus Locks Detected")
> > +PERFCOUNTER(vmnotify_crash, "domains crashed by Notify VM Exit")
> 
> I think the text is not entirely correct and would want to be
> "domain crashes by ...". Multiple vCPU-s of a single domain can
> experience this in parallel (granted this would require "good"
> timing).

Sure, thanks for the review.

Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-05-26 11:11 ` [PATCH v2 1/3] x86/vmx: implement Bus Lock detection Roger Pau Monne
  2022-06-03 12:16   ` Jan Beulich
@ 2022-06-06 13:27   ` Andrew Cooper
  2022-06-07  6:54     ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2022-06-06 13:27 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jun Nakajima, Kevin Tian, Jan Beulich, Wei Liu

On 26/05/2022 12:11, Roger Pau Monne wrote:
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index f08a00dcbb..476ab72463 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -4065,6 +4065,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  
>      if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
>          return vmx_failed_vmentry(exit_reason, regs);
> +    if ( unlikely(exit_reason & VMX_EXIT_REASONS_BUS_LOCK) )
> +    {
> +        /*
> +         * Delivery of Bus Lock VM exit was pre-empted by a higher priority VM
> +         * exit.
> +         */
> +        exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
> +        if ( exit_reason != EXIT_REASON_BUS_LOCK )
> +            perfc_incr(buslock);
> +    }

I know this post-dates you posting v2, but given the latest update from
Intel, VMX_EXIT_REASONS_BUS_LOCK will be set on all exits.

So the code logic would be simpler as:

if ( exit_reason & VMX_EXIT_REASONS_BUS_LOCK )
{
    perfc_incr(buslock);
    exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
}

and ...

>  
>      if ( v->arch.hvm.vmx.vmx_realmode )
>      {
> @@ -4561,6 +4571,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>          vmx_handle_descriptor_access(exit_reason);
>          break;
>  
> +    case EXIT_REASON_BUS_LOCK:
> +        perfc_incr(buslock);

... dropping this perf counter.

With something along these lines, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-06-06 13:27   ` Andrew Cooper
@ 2022-06-07  6:54     ` Jan Beulich
  2022-06-07 10:03       ` Roger Pau Monné
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-06-07  6:54 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jun Nakajima, Kevin Tian, Wei Liu, Roger Pau Monne, xen-devel

On 06.06.2022 15:27, Andrew Cooper wrote:
> On 26/05/2022 12:11, Roger Pau Monne wrote:
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index f08a00dcbb..476ab72463 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -4065,6 +4065,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>  
>>      if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
>>          return vmx_failed_vmentry(exit_reason, regs);
>> +    if ( unlikely(exit_reason & VMX_EXIT_REASONS_BUS_LOCK) )
>> +    {
>> +        /*
>> +         * Delivery of Bus Lock VM exit was pre-empted by a higher priority VM
>> +         * exit.
>> +         */
>> +        exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
>> +        if ( exit_reason != EXIT_REASON_BUS_LOCK )
>> +            perfc_incr(buslock);
>> +    }
> 
> I know this post-dates you posting v2, but given the latest update from
> Intel, VMX_EXIT_REASONS_BUS_LOCK will be set on all exits.

Mind me asking what "latest update" you're referring to? Neither SDM nor
ISE have seen a recent update, afaict.

Jan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-06-03 14:29     ` Roger Pau Monné
@ 2022-06-07  7:20       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2022-06-07  7:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jun Nakajima, Kevin Tian, Andrew Cooper, Wei Liu, xen-devel

On 03.06.2022 16:29, Roger Pau Monné wrote:
> On Fri, Jun 03, 2022 at 02:16:47PM +0200, Jan Beulich wrote:
>> On 26.05.2022 13:11, Roger Pau Monne wrote:
>>> Add support for enabling Bus Lock Detection on Intel systems.  Such
>>> detection works by triggering a vmexit, which ought to be enough of a
>>> pause to prevent a guest from abusing of the Bus Lock.
>>>
>>> Add an extra Xen perf counter to track the number of Bus Locks detected.
>>> This is done because Bus Locks can also be reported by setting the bit
>>> 26 in the exit reason field, so also account for those.
>>>
>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> This implements just the VMexit part of the feature - maybe the
>> title wants to reflect that? The vmx: tag could also mean there
>> is exposure to guests included for the #DB part of the feature.
> 
> Maybe:
> 
> "x86/vmx: add Bus Lock detection to the hypervisor"

Fine with me.

Jan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-03 14:46     ` Roger Pau Monné
@ 2022-06-07  7:43       ` Jan Beulich
  2022-06-07 10:05         ` Roger Pau Monné
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-06-07  7:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Kevin Tian, Jun Nakajima, xen-devel

On 03.06.2022 16:46, Roger Pau Monné wrote:
> On Fri, Jun 03, 2022 at 02:49:54PM +0200, Jan Beulich wrote:
>> On 26.05.2022 13:11, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v)
>>>  
>>>  void vmx_update_debug_state(struct vcpu *v)
>>>  {
>>> +    unsigned int mask = 1u << TRAP_int3;
>>> +
>>> +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
>>
>> I'm puzzled by the lack of symmetry between this and ...
>>
>>> +        /*
>>> +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
>>> +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
>>> +         */
>>> +        mask |= 1u << TRAP_debug;
>>> +
>>>      if ( v->arch.hvm.debug_state_latch )
>>> -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
>>> +        v->arch.hvm.vmx.exception_bitmap |= mask;
>>>      else
>>> -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
>>> +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
>>>  
>>>      vmx_vmcs_enter(v);
>>>      vmx_update_exception_bitmap(v);
>>> @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>>          switch ( vector )
>>>          {
>>>          case TRAP_debug:
>>> +            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
>>> +                goto exit_and_crash;
>>
>> ... this condition. Shouldn't one be the inverse of the other (and
>> then it's the one down here which wants adjusting)?
> 
> The condition in vmx_update_debug_state() sets the mask so that
> TRAP_debug will only be added or removed from the bitmap if
> !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting (note that
> otherwise TRAP_debug is unconditionally set if
> !cpu_has_vmx_notify_vm_exiting).
> 
> Hence it's impossible to get a VMExit TRAP_debug with
> cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting because
> TRAP_debug will never be set by vmx_update_debug_state() in that
> case.

Hmm, yes, I've been misguided by you not altering the existing setting
of v->arch.hvm.vmx.exception_bitmap in construct_vmcs(). Instead you
add an entirely new block of code near the bottom of the function. Is
there any chance you could move up that adjustment, perhaps along the
lines of

    v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK
              | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
              | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device));
    if ( cpu_has_vmx_notify_vm_exiting )
    {
        __vmwrite(NOTIFY_WINDOW, vm_notify_window);
        /*
         * Disable #AC and #DB interception: by using VM Notify Xen is
         * guaranteed to get a VM exit even if the guest manages to lock the
         * CPU.
         */
        v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
                                              (1U << TRAP_alignment_check));
    }
    vmx_update_exception_bitmap(v);

Jan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] x86/vmx: implement Bus Lock detection
  2022-06-07  6:54     ` Jan Beulich
@ 2022-06-07 10:03       ` Roger Pau Monné
  0 siblings, 0 replies; 21+ messages in thread
From: Roger Pau Monné @ 2022-06-07 10:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Jun Nakajima, Kevin Tian, Wei Liu, xen-devel

On Tue, Jun 07, 2022 at 08:54:15AM +0200, Jan Beulich wrote:
> On 06.06.2022 15:27, Andrew Cooper wrote:
> > On 26/05/2022 12:11, Roger Pau Monne wrote:
> >> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >> index f08a00dcbb..476ab72463 100644
> >> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> @@ -4065,6 +4065,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >>  
> >>      if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
> >>          return vmx_failed_vmentry(exit_reason, regs);
> >> +    if ( unlikely(exit_reason & VMX_EXIT_REASONS_BUS_LOCK) )
> >> +    {
> >> +        /*
> >> +         * Delivery of Bus Lock VM exit was pre-empted by a higher priority VM
> >> +         * exit.
> >> +         */
> >> +        exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
> >> +        if ( exit_reason != EXIT_REASON_BUS_LOCK )
> >> +            perfc_incr(buslock);
> >> +    }
> > 
> > I know this post-dates you posting v2, but given the latest update from
> > Intel, VMX_EXIT_REASONS_BUS_LOCK will be set on all exits.
> 
> Mind me asking what "latest update" you're referring to? Neither SDM nor
> ISE have seen a recent update, afaict.

After Andrew's request we got in touch with Intel regarding whether
VMX_EXIT_REASONS_BUS_LOCK is set when exit_reason ==
EXIT_REASON_BUS_LOCK, and they will update the ISE to contain:

"the bit 26 in the exit-reason field will always be set on VM exits
due to bus locks."

So I will apply the changes requested by Andrew to match this
behavior.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-07  7:43       ` Jan Beulich
@ 2022-06-07 10:05         ` Roger Pau Monné
  2022-06-08  3:51           ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2022-06-07 10:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Kevin Tian, Jun Nakajima, xen-devel

On Tue, Jun 07, 2022 at 09:43:25AM +0200, Jan Beulich wrote:
> On 03.06.2022 16:46, Roger Pau Monné wrote:
> > On Fri, Jun 03, 2022 at 02:49:54PM +0200, Jan Beulich wrote:
> >> On 26.05.2022 13:11, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v)
> >>>  
> >>>  void vmx_update_debug_state(struct vcpu *v)
> >>>  {
> >>> +    unsigned int mask = 1u << TRAP_int3;
> >>> +
> >>> +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> >>
> >> I'm puzzled by the lack of symmetry between this and ...
> >>
> >>> +        /*
> >>> +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> >>> +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> >>> +         */
> >>> +        mask |= 1u << TRAP_debug;
> >>> +
> >>>      if ( v->arch.hvm.debug_state_latch )
> >>> -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> >>> +        v->arch.hvm.vmx.exception_bitmap |= mask;
> >>>      else
> >>> -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> >>> +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
> >>>  
> >>>      vmx_vmcs_enter(v);
> >>>      vmx_update_exception_bitmap(v);
> >>> @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >>>          switch ( vector )
> >>>          {
> >>>          case TRAP_debug:
> >>> +            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> >>> +                goto exit_and_crash;
> >>
> >> ... this condition. Shouldn't one be the inverse of the other (and
> >> then it's the one down here which wants adjusting)?
> > 
> > The condition in vmx_update_debug_state() sets the mask so that
> > TRAP_debug will only be added or removed from the bitmap if
> > !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting (note that
> > otherwise TRAP_debug is unconditionally set if
> > !cpu_has_vmx_notify_vm_exiting).
> > 
> > Hence it's impossible to get a VMExit TRAP_debug with
> > cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting because
> > TRAP_debug will never be set by vmx_update_debug_state() in that
> > case.
> 
> Hmm, yes, I've been misguided by you not altering the existing setting
> of v->arch.hvm.vmx.exception_bitmap in construct_vmcs(). Instead you
> add an entirely new block of code near the bottom of the function. Is
> there any chance you could move up that adjustment, perhaps along the
> lines of
> 
>     v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK
>               | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
>               | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device));
>     if ( cpu_has_vmx_notify_vm_exiting )
>     {
>         __vmwrite(NOTIFY_WINDOW, vm_notify_window);
>         /*
>          * Disable #AC and #DB interception: by using VM Notify Xen is
>          * guaranteed to get a VM exit even if the guest manages to lock the
>          * CPU.
>          */
>         v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
>                                               (1U << TRAP_alignment_check));
>     }
>     vmx_update_exception_bitmap(v);

Sure, will move up when posting a new version then.  I will wait for
feedback from Jun or Kevin regarding the default window size before
doing so.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-07 10:05         ` Roger Pau Monné
@ 2022-06-08  3:51           ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2022-06-08  3:51 UTC (permalink / raw)
  To: Pau Monné, Roger, Beulich, Jan
  Cc: Cooper, Andrew, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Nakajima, Jun, xen-devel

> From: Roger Pau Monné
> Sent: Tuesday, June 7, 2022 6:06 PM
> 
> On Tue, Jun 07, 2022 at 09:43:25AM +0200, Jan Beulich wrote:
> > On 03.06.2022 16:46, Roger Pau Monné wrote:
> > > On Fri, Jun 03, 2022 at 02:49:54PM +0200, Jan Beulich wrote:
> > >> On 26.05.2022 13:11, Roger Pau Monne wrote:
> > >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> > >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > >>> @@ -1419,10 +1419,19 @@ static void cf_check
> vmx_update_host_cr3(struct vcpu *v)
> > >>>
> > >>>  void vmx_update_debug_state(struct vcpu *v)
> > >>>  {
> > >>> +    unsigned int mask = 1u << TRAP_int3;
> > >>> +
> > >>> +    if ( !cpu_has_monitor_trap_flag &&
> cpu_has_vmx_notify_vm_exiting )
> > >>
> > >> I'm puzzled by the lack of symmetry between this and ...
> > >>
> > >>> +        /*
> > >>> +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> > >>> +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> > >>> +         */
> > >>> +        mask |= 1u << TRAP_debug;
> > >>> +
> > >>>      if ( v->arch.hvm.debug_state_latch )
> > >>> -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> > >>> +        v->arch.hvm.vmx.exception_bitmap |= mask;
> > >>>      else
> > >>> -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> > >>> +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
> > >>>
> > >>>      vmx_vmcs_enter(v);
> > >>>      vmx_update_exception_bitmap(v);
> > >>> @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct
> cpu_user_regs *regs)
> > >>>          switch ( vector )
> > >>>          {
> > >>>          case TRAP_debug:
> > >>> +            if ( cpu_has_monitor_trap_flag &&
> cpu_has_vmx_notify_vm_exiting )
> > >>> +                goto exit_and_crash;
> > >>
> > >> ... this condition. Shouldn't one be the inverse of the other (and
> > >> then it's the one down here which wants adjusting)?
> > >
> > > The condition in vmx_update_debug_state() sets the mask so that
> > > TRAP_debug will only be added or removed from the bitmap if
> > > !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting (note
> that
> > > otherwise TRAP_debug is unconditionally set if
> > > !cpu_has_vmx_notify_vm_exiting).
> > >
> > > Hence it's impossible to get a VMExit TRAP_debug with
> > > cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting because
> > > TRAP_debug will never be set by vmx_update_debug_state() in that
> > > case.
> >
> > Hmm, yes, I've been misguided by you not altering the existing setting
> > of v->arch.hvm.vmx.exception_bitmap in construct_vmcs(). Instead you
> > add an entirely new block of code near the bottom of the function. Is
> > there any chance you could move up that adjustment, perhaps along the
> > lines of
> >
> >     v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK
> >               | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
> >               | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device));
> >     if ( cpu_has_vmx_notify_vm_exiting )
> >     {
> >         __vmwrite(NOTIFY_WINDOW, vm_notify_window);
> >         /*
> >          * Disable #AC and #DB interception: by using VM Notify Xen is
> >          * guaranteed to get a VM exit even if the guest manages to lock the
> >          * CPU.
> >          */
> >         v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
> >                                               (1U << TRAP_alignment_check));
> >     }
> >     vmx_update_exception_bitmap(v);
> 
> Sure, will move up when posting a new version then.  I will wait for
> feedback from Jun or Kevin regarding the default window size before
> doing so.
> 

let me check internally.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-05-26 11:11 ` [PATCH v2 3/3] x86/vmx: implement Notify VM Exit Roger Pau Monne
  2022-06-03 12:49   ` Jan Beulich
@ 2022-06-09  7:04   ` Tian, Kevin
  2022-06-09  7:39     ` Xiaoyao Li
  1 sibling, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2022-06-09  7:04 UTC (permalink / raw)
  To: Pau Monné, Roger, xen-devel
  Cc: Pau Monné,
	Roger, Cooper, Andrew, George Dunlap, Beulich, Jan, Julien Grall,
	Stefano Stabellini, Wei Liu, Nakajima, Jun, Qiang, Chenyi, Li,
	Xiaoyao

+Chenyi/Xiaoyao who worked on the KVM support. Presumably
similar opens have been discussed in KVM hence they have the
right background to comment here.

> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Thursday, May 26, 2022 7:12 PM
> 
> Under certain conditions guests can get the CPU stuck in an unbounded
> loop without the possibility of an interrupt window to occur on
> instruction boundary.  This was the case with the scenarios described
> in XSA-156.
> 
> Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> if no interrupt window occurs for a specified amount of time.  Note
> that using the Notify VM Exit avoids having to trap #AC and #DB
> exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> puts the CPU in a loop without an interrupt window, as such disable
> the intercepts if the feature is available and enabled.
> 
> Setting the notify VM exit window to 0 is safe because there's a
> threshold added by the hardware in order to have a sane window value.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Properly update debug state when using notify VM exit.
>  - Reword commit message.
> ---
> This change enables the notify VM exit by default, KVM however doesn't
> seem to enable it by default, and there's the following note in the
> commit message:
> 
> "- There's a possibility, however small, that a notify VM exit happens
>    with VM_CONTEXT_INVALID set in exit qualification. In this case, the
>    vcpu can no longer run. To avoid killing a well-behaved guest, set
>    notify window as -1 to disable this feature by default."
> 
> It's not obviously clear to me whether the comment was meant to be:
> "There's a possibility, however small, that a notify VM exit _wrongly_
> happens with VM_CONTEXT_INVALID".
> 
> It's also not clear whether such wrong hardware behavior only affects
> a specific set of hardware, in a way that we could avoid enabling
> notify VM exit there.
> 
> There's a discussion in one of the Linux patches that 128K might be
> the safer value in order to prevent false positives, but I have no
> formal confirmation about this.  Maybe our Intel maintainers can
> provide some more feedback on a suitable notify VM exit window
> value.
> 
> I've tested with 0 (the proposed default in the patch) and I don't
> seem to be able to trigger notify VM exits under normal guest
> operation.  Note that even in that case the guest won't be destroyed
> unless the context is corrupt.
> ---
>  docs/misc/xen-command-line.pandoc       | 11 +++++++++
>  xen/arch/x86/hvm/vmx/vmcs.c             | 19 +++++++++++++++
>  xen/arch/x86/hvm/vmx/vmx.c              | 32 +++++++++++++++++++++++--
>  xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  4 ++++
>  xen/arch/x86/include/asm/hvm/vmx/vmx.h  |  6 +++++
>  xen/arch/x86/include/asm/perfc_defn.h   |  3 ++-
>  6 files changed, 72 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-
> command-line.pandoc
> index 1dc7e1ca07..ccf8bf5806 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2544,6 +2544,17 @@ guest will notify Xen that it has failed to acquire a
> spinlock.
>  <major>, <minor> and <build> must be integers. The values will be
>  encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
> 
> +### vm-notify-window (Intel)
> +> `= <integer>`
> +
> +> Default: `0`
> +
> +Specify the value of the VM Notify window used to detect locked VMs. Set
> to -1
> +to disable the feature.  Value is in units of crystal clock cycles.
> +
> +Note the hardware might add a threshold to the provided value in order to
> make
> +it safe, and hence using 0 is fine.
> +
>  ### vpid (Intel)
>  > `= <boolean>`
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index d388e6729c..6cb2c6c6b7 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
>  static unsigned int __read_mostly ple_window = 4096;
>  integer_param("ple_window", ple_window);
> 
> +static unsigned int __ro_after_init vm_notify_window;
> +integer_param("vm-notify-window", vm_notify_window);
> +
>  static bool __read_mostly opt_ept_pml = true;
>  static s8 __read_mostly opt_ept_ad = -1;
>  int8_t __read_mostly opt_ept_exec_sp = -1;
> @@ -210,6 +213,7 @@ static void __init vmx_display_features(void)
>      P(cpu_has_vmx_pml, "Page Modification Logging");
>      P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
>      P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
> +    P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit");
>  #undef P
> 
>      if ( !printed )
> @@ -329,6 +333,8 @@ static int vmx_init_vmcs_config(bool bsp)
>              opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
>          if ( opt_ept_pml )
>              opt |= SECONDARY_EXEC_ENABLE_PML;
> +        if ( vm_notify_window != ~0u )
> +            opt |= SECONDARY_EXEC_NOTIFY_VM_EXITING;
> 
>          /*
>           * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
> @@ -1333,6 +1339,19 @@ static int construct_vmcs(struct vcpu *v)
>          rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
>                           VMX_MSR_GUEST_LOADONLY);
> 
> +    if ( cpu_has_vmx_notify_vm_exiting )
> +    {
> +        __vmwrite(NOTIFY_WINDOW, vm_notify_window);
> +        /*
> +         * Disable #AC and #DB interception: by using VM Notify Xen is
> +         * guaranteed to get a VM exit even if the guest manages to lock the
> +         * CPU.
> +         */
> +        v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
> +                                              (1U << TRAP_alignment_check));
> +        vmx_update_exception_bitmap(v);
> +    }
> +
>   out:
>      vmx_vmcs_exit(v);
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 69980c8e31..d3c1597b3e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1419,10 +1419,19 @@ static void cf_check vmx_update_host_cr3(struct
> vcpu *v)
> 
>  void vmx_update_debug_state(struct vcpu *v)
>  {
> +    unsigned int mask = 1u << TRAP_int3;
> +
> +    if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> +        /*
> +         * Only allow toggling TRAP_debug if notify VM exit is enabled, as
> +         * unconditionally setting TRAP_debug is part of the XSA-156 fix.
> +         */
> +        mask |= 1u << TRAP_debug;
> +
>      if ( v->arch.hvm.debug_state_latch )
> -        v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3;
> +        v->arch.hvm.vmx.exception_bitmap |= mask;
>      else
> -        v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3);
> +        v->arch.hvm.vmx.exception_bitmap &= ~mask;
> 
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> @@ -4155,6 +4164,9 @@ void vmx_vmexit_handler(struct cpu_user_regs
> *regs)
>          switch ( vector )
>          {
>          case TRAP_debug:
> +            if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting )
> +                goto exit_and_crash;
> +
>              /*
>               * Updates DR6 where debugger can peek (See 3B 23.2.1,
>               * Table 23-1, "Exit Qualification for Debug Exceptions").
> @@ -4593,6 +4605,22 @@ void vmx_vmexit_handler(struct cpu_user_regs
> *regs)
>           */
>          break;
> 
> +    case EXIT_REASON_NOTIFY:
> +        __vmread(EXIT_QUALIFICATION, &exit_qualification);
> +
> +        if ( exit_qualification & NOTIFY_VM_CONTEXT_INVALID )
> +        {
> +            perfc_incr(vmnotify_crash);
> +            gprintk(XENLOG_ERR, "invalid VM context after notify vmexit\n");
> +            domain_crash(v->domain);
> +            break;
> +        }
> +
> +        if ( unlikely(exit_qualification &
> INTR_INFO_NMI_UNBLOCKED_BY_IRET) )
> +            undo_nmis_unblocked_by_iret();
> +
> +        break;
> +
>      case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
>      case EXIT_REASON_INVPCID:
>      /* fall through */
> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> index 5d3edc1642..0961eabf3f 100644
> --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> @@ -267,6 +267,7 @@ extern u32 vmx_vmentry_control;
>  #define SECONDARY_EXEC_XSAVES                   0x00100000
>  #define SECONDARY_EXEC_TSC_SCALING              0x02000000
>  #define SECONDARY_EXEC_BUS_LOCK_DETECTION       0x40000000
> +#define SECONDARY_EXEC_NOTIFY_VM_EXITING        0x80000000
>  extern u32 vmx_secondary_exec_control;
> 
>  #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
> @@ -348,6 +349,8 @@ extern u64 vmx_ept_vpid_cap;
>      (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
>  #define cpu_has_vmx_bus_lock_detection \
>      (vmx_secondary_exec_control &
> SECONDARY_EXEC_BUS_LOCK_DETECTION)
> +#define cpu_has_vmx_notify_vm_exiting \
> +    (vmx_secondary_exec_control &
> SECONDARY_EXEC_NOTIFY_VM_EXITING)
> 
>  #define VMCS_RID_TYPE_MASK              0x80000000
> 
> @@ -455,6 +458,7 @@ enum vmcs_field {
>      SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
>      PLE_GAP                         = 0x00004020,
>      PLE_WINDOW                      = 0x00004022,
> +    NOTIFY_WINDOW                   = 0x00004024,
>      VM_INSTRUCTION_ERROR            = 0x00004400,
>      VM_EXIT_REASON                  = 0x00004402,
>      VM_EXIT_INTR_INFO               = 0x00004404,
> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> index bc0caad6fb..e429de8541 100644
> --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> @@ -221,6 +221,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
>  #define EXIT_REASON_XSAVES              63
>  #define EXIT_REASON_XRSTORS             64
>  #define EXIT_REASON_BUS_LOCK            74
> +#define EXIT_REASON_NOTIFY              75
>  /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
> 
>  /*
> @@ -236,6 +237,11 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
>  #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
>  #define INTR_INFO_RESVD_BITS_MASK       0x7ffff000
> 
> +/*
> + * Exit Qualifications for NOTIFY VM EXIT
> + */
> +#define NOTIFY_VM_CONTEXT_INVALID       1u
> +
>  /*
>   * Exit Qualifications for MOV for Control Register Access
>   */
> diff --git a/xen/arch/x86/include/asm/perfc_defn.h
> b/xen/arch/x86/include/asm/perfc_defn.h
> index d6eb661940..c6b601b729 100644
> --- a/xen/arch/x86/include/asm/perfc_defn.h
> +++ b/xen/arch/x86/include/asm/perfc_defn.h
> @@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions,           "exceptions", 32)
> 
>  #ifdef CONFIG_HVM
> 
> -#define VMX_PERF_EXIT_REASON_SIZE 75
> +#define VMX_PERF_EXIT_REASON_SIZE 76
>  #define VMEXIT_NPF_PERFC 143
>  #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
>  PERFCOUNTER_ARRAY(vmexits,              "vmexits",
> @@ -126,5 +126,6 @@ PERFCOUNTER(realmode_exits,      "vmexits from
> realmode")
>  PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
> 
>  PERFCOUNTER(buslock, "Bus Locks Detected")
> +PERFCOUNTER(vmnotify_crash, "domains crashed by Notify VM Exit")
> 
>  /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
> --
> 2.36.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-09  7:04   ` Tian, Kevin
@ 2022-06-09  7:39     ` Xiaoyao Li
  2022-06-09 10:09       ` Roger Pau Monné
  0 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2022-06-09  7:39 UTC (permalink / raw)
  To: Tian, Kevin, Pau Monné, Roger, xen-devel
  Cc: Cooper, Andrew, George Dunlap, Beulich, Jan, Julien Grall,
	Stefano Stabellini, Wei Liu, Nakajima, Jun, Qiang, Chenyi

On 6/9/2022 3:04 PM, Tian, Kevin wrote:
> +Chenyi/Xiaoyao who worked on the KVM support. Presumably
> similar opens have been discussed in KVM hence they have the
> right background to comment here.
> 
>> From: Roger Pau Monne <roger.pau@citrix.com>
>> Sent: Thursday, May 26, 2022 7:12 PM
>>
>> Under certain conditions guests can get the CPU stuck in an unbounded
>> loop without the possibility of an interrupt window to occur on
>> instruction boundary.  This was the case with the scenarios described
>> in XSA-156.
>>
>> Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
>> if no interrupt window occurs for a specified amount of time.  Note
>> that using the Notify VM Exit avoids having to trap #AC and #DB
>> exceptions, as Xen is guaranteed to get a VM Exit even if the guest
>> puts the CPU in a loop without an interrupt window, as such disable
>> the intercepts if the feature is available and enabled.
>>
>> Setting the notify VM exit window to 0 is safe because there's a
>> threshold added by the hardware in order to have a sane window value.
>>
>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> Changes since v1:
>>   - Properly update debug state when using notify VM exit.
>>   - Reword commit message.
>> ---
>> This change enables the notify VM exit by default, KVM however doesn't
>> seem to enable it by default, and there's the following note in the
>> commit message:
>>
>> "- There's a possibility, however small, that a notify VM exit happens
>>     with VM_CONTEXT_INVALID set in exit qualification. In this case, the
>>     vcpu can no longer run. To avoid killing a well-behaved guest, set
>>     notify window as -1 to disable this feature by default."
>>
>> It's not obviously clear to me whether the comment was meant to be:
>> "There's a possibility, however small, that a notify VM exit _wrongly_
>> happens with VM_CONTEXT_INVALID".
>>
>> It's also not clear whether such wrong hardware behavior only affects
>> a specific set of hardware, 

I'm not sure what you mean for a specific set of hardware.

We make it default off in KVM just in case that future silicon wrongly 
sets VM_CONTEXT_INVALID bit. Becuase we make the policy that VM cannot 
continue running in that case.

For the worst case, if some future silicon happens to have this kind 
silly bug, then the existing product kernel all suffer the possibility 
that their VM being killed due to the feature is default on.

>> in a way that we could avoid enabling
>> notify VM exit there.
>>
>> There's a discussion in one of the Linux patches that 128K might be
>> the safer value in order to prevent false positives, but I have no
>> formal confirmation about this.  Maybe our Intel maintainers can
>> provide some more feedback on a suitable notify VM exit window
>> value.

The 128k is the internal threshold for SPR silicon. The internal 
threshold is tuned by Intel for each silicon, to make sure it's big 
enough to avoid false positive even when user set vmcs.notify_window to 0.

However, it varies for different processor generations.

What is the suitable value is hard to say, it depends on how soon does 
VMM want to intercept the VM. Anyway, Intel ensures that even value 0 is 
safe.

>>
>> I've tested with 0 (the proposed default in the patch) and I don't
>> seem to be able to trigger notify VM exits under normal guest
>> operation.  Note that even in that case the guest won't be destroyed
>> unless the context is corrupt.
>> ---




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-09  7:39     ` Xiaoyao Li
@ 2022-06-09 10:09       ` Roger Pau Monné
  2022-06-16 11:17         ` Roger Pau Monné
  2022-06-16 11:57         ` Xiaoyao Li
  0 siblings, 2 replies; 21+ messages in thread
From: Roger Pau Monné @ 2022-06-09 10:09 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Tian, Kevin, xen-devel, Cooper, Andrew, George Dunlap, Beulich,
	Jan, Julien Grall, Stefano Stabellini, Wei Liu, Nakajima, Jun,
	Qiang, Chenyi

On Thu, Jun 09, 2022 at 03:39:33PM +0800, Xiaoyao Li wrote:
> On 6/9/2022 3:04 PM, Tian, Kevin wrote:
> > +Chenyi/Xiaoyao who worked on the KVM support. Presumably
> > similar opens have been discussed in KVM hence they have the
> > right background to comment here.
> > 
> > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > Sent: Thursday, May 26, 2022 7:12 PM
> > > 
> > > Under certain conditions guests can get the CPU stuck in an unbounded
> > > loop without the possibility of an interrupt window to occur on
> > > instruction boundary.  This was the case with the scenarios described
> > > in XSA-156.
> > > 
> > > Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> > > if no interrupt window occurs for a specified amount of time.  Note
> > > that using the Notify VM Exit avoids having to trap #AC and #DB
> > > exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> > > puts the CPU in a loop without an interrupt window, as such disable
> > > the intercepts if the feature is available and enabled.
> > > 
> > > Setting the notify VM exit window to 0 is safe because there's a
> > > threshold added by the hardware in order to have a sane window value.
> > > 
> > > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > > Changes since v1:
> > >   - Properly update debug state when using notify VM exit.
> > >   - Reword commit message.
> > > ---
> > > This change enables the notify VM exit by default, KVM however doesn't
> > > seem to enable it by default, and there's the following note in the
> > > commit message:
> > > 
> > > "- There's a possibility, however small, that a notify VM exit happens
> > >     with VM_CONTEXT_INVALID set in exit qualification. In this case, the
> > >     vcpu can no longer run. To avoid killing a well-behaved guest, set
> > >     notify window as -1 to disable this feature by default."
> > > 
> > > It's not obviously clear to me whether the comment was meant to be:
> > > "There's a possibility, however small, that a notify VM exit _wrongly_
> > > happens with VM_CONTEXT_INVALID".
> > > 
> > > It's also not clear whether such wrong hardware behavior only affects
> > > a specific set of hardware,
> 
> I'm not sure what you mean for a specific set of hardware.
> 
> We make it default off in KVM just in case that future silicon wrongly sets
> VM_CONTEXT_INVALID bit. Becuase we make the policy that VM cannot continue
> running in that case.
> 
> For the worst case, if some future silicon happens to have this kind silly
> bug, then the existing product kernel all suffer the possibility that their
> VM being killed due to the feature is default on.

That's IMO a weird policy.  If there's such behavior in any hardware
platform I would assume Intel would issue an errata, and then we would
just avoid using the feature on affected hardware (like we do with
other hardware features when they have erratas).

If we applied the same logic to all new Intel features we won't use
any of them.  At least in Xen there are already combinations of vmexit
conditions that will lead to the guest being killed.

> > > in a way that we could avoid enabling
> > > notify VM exit there.
> > > 
> > > There's a discussion in one of the Linux patches that 128K might be
> > > the safer value in order to prevent false positives, but I have no
> > > formal confirmation about this.  Maybe our Intel maintainers can
> > > provide some more feedback on a suitable notify VM exit window
> > > value.
> 
> The 128k is the internal threshold for SPR silicon. The internal threshold
> is tuned by Intel for each silicon, to make sure it's big enough to avoid
> false positive even when user set vmcs.notify_window to 0.
> 
> However, it varies for different processor generations.
> 
> What is the suitable value is hard to say, it depends on how soon does VMM
> want to intercept the VM. Anyway, Intel ensures that even value 0 is safe.

Ideally we need a fixed default value that's guaranteed to work on all
possible hardware that supports the feature, or alternatively a way to
calculate a sane default window based on the hardware platform.

Could we get some wording added to the ISE regarding 0 being a
suitable default value to use because hardware will add a threshold
internally to make the value safe?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-09 10:09       ` Roger Pau Monné
@ 2022-06-16 11:17         ` Roger Pau Monné
  2022-06-16 11:57         ` Xiaoyao Li
  1 sibling, 0 replies; 21+ messages in thread
From: Roger Pau Monné @ 2022-06-16 11:17 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Tian, Kevin, xen-devel, Cooper, Andrew, George Dunlap, Beulich,
	Jan, Julien Grall, Stefano Stabellini, Wei Liu, Nakajima, Jun,
	Qiang, Chenyi

Ping?

On Thu, Jun 09, 2022 at 12:09:18PM +0200, Roger Pau Monné wrote:
> On Thu, Jun 09, 2022 at 03:39:33PM +0800, Xiaoyao Li wrote:
> > On 6/9/2022 3:04 PM, Tian, Kevin wrote:
> > > +Chenyi/Xiaoyao who worked on the KVM support. Presumably
> > > similar opens have been discussed in KVM hence they have the
> > > right background to comment here.
> > > 
> > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > Sent: Thursday, May 26, 2022 7:12 PM
> > > > 
> > > > Under certain conditions guests can get the CPU stuck in an unbounded
> > > > loop without the possibility of an interrupt window to occur on
> > > > instruction boundary.  This was the case with the scenarios described
> > > > in XSA-156.
> > > > 
> > > > Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
> > > > if no interrupt window occurs for a specified amount of time.  Note
> > > > that using the Notify VM Exit avoids having to trap #AC and #DB
> > > > exceptions, as Xen is guaranteed to get a VM Exit even if the guest
> > > > puts the CPU in a loop without an interrupt window, as such disable
> > > > the intercepts if the feature is available and enabled.
> > > > 
> > > > Setting the notify VM exit window to 0 is safe because there's a
> > > > threshold added by the hardware in order to have a sane window value.
> > > > 
> > > > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > ---
> > > > Changes since v1:
> > > >   - Properly update debug state when using notify VM exit.
> > > >   - Reword commit message.
> > > > ---
> > > > This change enables the notify VM exit by default, KVM however doesn't
> > > > seem to enable it by default, and there's the following note in the
> > > > commit message:
> > > > 
> > > > "- There's a possibility, however small, that a notify VM exit happens
> > > >     with VM_CONTEXT_INVALID set in exit qualification. In this case, the
> > > >     vcpu can no longer run. To avoid killing a well-behaved guest, set
> > > >     notify window as -1 to disable this feature by default."
> > > > 
> > > > It's not obviously clear to me whether the comment was meant to be:
> > > > "There's a possibility, however small, that a notify VM exit _wrongly_
> > > > happens with VM_CONTEXT_INVALID".
> > > > 
> > > > It's also not clear whether such wrong hardware behavior only affects
> > > > a specific set of hardware,
> > 
> > I'm not sure what you mean for a specific set of hardware.
> > 
> > We make it default off in KVM just in case that future silicon wrongly sets
> > VM_CONTEXT_INVALID bit. Becuase we make the policy that VM cannot continue
> > running in that case.
> > 
> > For the worst case, if some future silicon happens to have this kind silly
> > bug, then the existing product kernel all suffer the possibility that their
> > VM being killed due to the feature is default on.
> 
> That's IMO a weird policy.  If there's such behavior in any hardware
> platform I would assume Intel would issue an errata, and then we would
> just avoid using the feature on affected hardware (like we do with
> other hardware features when they have erratas).
> 
> If we applied the same logic to all new Intel features we won't use
> any of them.  At least in Xen there are already combinations of vmexit
> conditions that will lead to the guest being killed.
> 
> > > > in a way that we could avoid enabling
> > > > notify VM exit there.
> > > > 
> > > > There's a discussion in one of the Linux patches that 128K might be
> > > > the safer value in order to prevent false positives, but I have no
> > > > formal confirmation about this.  Maybe our Intel maintainers can
> > > > provide some more feedback on a suitable notify VM exit window
> > > > value.
> > 
> > The 128k is the internal threshold for SPR silicon. The internal threshold
> > is tuned by Intel for each silicon, to make sure it's big enough to avoid
> > false positive even when user set vmcs.notify_window to 0.
> > 
> > However, it varies for different processor generations.
> > 
> > What is the suitable value is hard to say, it depends on how soon does VMM
> > want to intercept the VM. Anyway, Intel ensures that even value 0 is safe.
> 
> Ideally we need a fixed default value that's guaranteed to work on all
> possible hardware that supports the feature, or alternatively a way to
> calculate a sane default window based on the hardware platform.
> 
> Could we get some wording added to the ISE regarding 0 being a
> suitable default value to use because hardware will add a threshold
> internally to make the value safe?
> 
> Thanks, Roger.
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/vmx: implement Notify VM Exit
  2022-06-09 10:09       ` Roger Pau Monné
  2022-06-16 11:17         ` Roger Pau Monné
@ 2022-06-16 11:57         ` Xiaoyao Li
  1 sibling, 0 replies; 21+ messages in thread
From: Xiaoyao Li @ 2022-06-16 11:57 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tian, Kevin, xen-devel, Cooper, Andrew, George Dunlap, Beulich,
	Jan, Julien Grall, Stefano Stabellini, Wei Liu, Nakajima, Jun,
	Qiang, Chenyi

On 6/9/2022 6:09 PM, Roger Pau Monné wrote:
> On Thu, Jun 09, 2022 at 03:39:33PM +0800, Xiaoyao Li wrote:
>> On 6/9/2022 3:04 PM, Tian, Kevin wrote:
>>> +Chenyi/Xiaoyao who worked on the KVM support. Presumably
>>> similar opens have been discussed in KVM hence they have the
>>> right background to comment here.
>>>
>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>> Sent: Thursday, May 26, 2022 7:12 PM
>>>>
>>>> Under certain conditions guests can get the CPU stuck in an unbounded
>>>> loop without the possibility of an interrupt window to occur on
>>>> instruction boundary.  This was the case with the scenarios described
>>>> in XSA-156.
>>>>
>>>> Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
>>>> if no interrupt window occurs for a specified amount of time.  Note
>>>> that using the Notify VM Exit avoids having to trap #AC and #DB
>>>> exceptions, as Xen is guaranteed to get a VM Exit even if the guest
>>>> puts the CPU in a loop without an interrupt window, as such disable
>>>> the intercepts if the feature is available and enabled.
>>>>
>>>> Setting the notify VM exit window to 0 is safe because there's a
>>>> threshold added by the hardware in order to have a sane window value.
>>>>
>>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> ---
>>>> Changes since v1:
>>>>    - Properly update debug state when using notify VM exit.
>>>>    - Reword commit message.
>>>> ---
>>>> This change enables the notify VM exit by default, KVM however doesn't
>>>> seem to enable it by default, and there's the following note in the
>>>> commit message:
>>>>
>>>> "- There's a possibility, however small, that a notify VM exit happens
>>>>      with VM_CONTEXT_INVALID set in exit qualification. In this case, the
>>>>      vcpu can no longer run. To avoid killing a well-behaved guest, set
>>>>      notify window as -1 to disable this feature by default."
>>>>
>>>> It's not obviously clear to me whether the comment was meant to be:
>>>> "There's a possibility, however small, that a notify VM exit _wrongly_
>>>> happens with VM_CONTEXT_INVALID".
>>>>
>>>> It's also not clear whether such wrong hardware behavior only affects
>>>> a specific set of hardware,
>>
>> I'm not sure what you mean for a specific set of hardware.
>>
>> We make it default off in KVM just in case that future silicon wrongly sets
>> VM_CONTEXT_INVALID bit. Becuase we make the policy that VM cannot continue
>> running in that case.
>>
>> For the worst case, if some future silicon happens to have this kind silly
>> bug, then the existing product kernel all suffer the possibility that their
>> VM being killed due to the feature is default on.
> 
> That's IMO a weird policy.  If there's such behavior in any hardware
> platform I would assume Intel would issue an errata, and then we would
> just avoid using the feature on affected hardware (like we do with
> other hardware features when they have erratas).
> 
> If we applied the same logic to all new Intel features we won't use
> any of them.  At least in Xen there are already combinations of vmexit
> conditions that will lead to the guest being killed.

The reason is that, currently no case will set the VM_CONTEXT_INVALID 
bit, people in KVM community are cautious on the uncertainty. No one in 
what case the VM_CONTEXT_INVALID will be in the future.

Anyway, that's only the worry from KVM reviewers.

>>>> in a way that we could avoid enabling
>>>> notify VM exit there.
>>>>
>>>> There's a discussion in one of the Linux patches that 128K might be
>>>> the safer value in order to prevent false positives, but I have no
>>>> formal confirmation about this.  Maybe our Intel maintainers can
>>>> provide some more feedback on a suitable notify VM exit window
>>>> value.
>>
>> The 128k is the internal threshold for SPR silicon. The internal threshold
>> is tuned by Intel for each silicon, to make sure it's big enough to avoid
>> false positive even when user set vmcs.notify_window to 0.
>>
>> However, it varies for different processor generations.
>>
>> What is the suitable value is hard to say, it depends on how soon does VMM
>> want to intercept the VM. Anyway, Intel ensures that even value 0 is safe.
> 
> Ideally we need a fixed default value that's guaranteed to work on all
> possible hardware that supports the feature, or alternatively a way to
> calculate a sane default window based on the hardware platform.
> 
> Could we get some wording added to the ISE regarding 0 being a
> suitable default value to use because hardware will add a threshold
> internally to make the value safe?

We will work with internal architects on this.

> Thanks, Roger.



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-06-16 11:58 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-26 11:11 [PATCH v2 0/3] x86/vmx: implement Bus Lock and VM Notify Roger Pau Monne
2022-05-26 11:11 ` [PATCH v2 1/3] x86/vmx: implement Bus Lock detection Roger Pau Monne
2022-06-03 12:16   ` Jan Beulich
2022-06-03 14:29     ` Roger Pau Monné
2022-06-07  7:20       ` Jan Beulich
2022-06-06 13:27   ` Andrew Cooper
2022-06-07  6:54     ` Jan Beulich
2022-06-07 10:03       ` Roger Pau Monné
2022-05-26 11:11 ` [PATCH v2 2/3] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI Roger Pau Monne
2022-06-03 12:20   ` Jan Beulich
2022-05-26 11:11 ` [PATCH v2 3/3] x86/vmx: implement Notify VM Exit Roger Pau Monne
2022-06-03 12:49   ` Jan Beulich
2022-06-03 14:46     ` Roger Pau Monné
2022-06-07  7:43       ` Jan Beulich
2022-06-07 10:05         ` Roger Pau Monné
2022-06-08  3:51           ` Tian, Kevin
2022-06-09  7:04   ` Tian, Kevin
2022-06-09  7:39     ` Xiaoyao Li
2022-06-09 10:09       ` Roger Pau Monné
2022-06-16 11:17         ` Roger Pau Monné
2022-06-16 11:57         ` Xiaoyao Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.