kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/2] Enable notify VM exit
@ 2022-09-15  9:28 Chenyi Qiang
  2022-09-15  9:28 ` [PATCH v6 1/2] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
  2022-09-15  9:28 ` [PATCH v6 2/2] i386: Add notify VM exit support Chenyi Qiang
  0 siblings, 2 replies; 10+ messages in thread
From: Chenyi Qiang @ 2022-09-15  9:28 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

Notify VM exit is introduced to mitigate the potential DOS attach from
malicious VM. This series is the userspace part to enable this feature
through a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT. The detailed
info can be seen in Patch 3.

The corresponding KVM support can be found in linux 6.0-rc1:
(2f4073e08f4c KVM: VMX: Enable Notify VM exit)

This patch set depends on some definition which can be updated from
scripts/update-linux-headers.sh. A separate patch set is sent out at
https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg02102.html

---
Change logs:
v5 -> v6
- Add some info related to the valid range of notify_window in patch 2. (Peter Xu)
- Add the doc in qemu-options.hx. (Peter Xu)
- v5: https://lore.kernel.org/qemu-devel/20220817020845.21855-1-chenyi.qiang@intel.com/

v4 -> v5
- Remove the assert check to avoid the nop in NDEBUG case. (Yuan)
- v4: https://lore.kernel.org/qemu-devel/20220524140302.23272-1-chenyi.qiang@intel.com/

v3 -> v4
- Add a new KVM cap KVM_CAP_TRIPLE_FAULT_EVENT to guard the extension of triple fault
  event save&restore.
- v3: https://lore.kernel.org/qemu-devel/20220421074028.18196-1-chenyi.qiang@intel.com/

v2 -> v3
- Extend the argument to include both the notify window and some flags
  when enabling KVM_CAP_X86_BUS_LOCK_EXIT CAP.
- Change to use KVM_VCPUEVENTS_VALID_TRIPLE_FAULT in flags field and add
  pending_triple_fault field in struct kvm_vcpu_events.
- v2: https://lore.kernel.org/qemu-devel/20220318082934.25030-1-chenyi.qiang@intel.com/

v1 -> v2
- Add some commit message to explain why we disable Notify VM exit by default.
- Rename KVM_VCPUEVENT_SHUTDOWN to KVM_VCPUEVENT_TRIPLE_FAULT.
- Do the corresponding change to use the KVM_VCPUEVENTS_TRIPLE_FAULT
  to save/restore the triple fault event to avoid lose some synthesized
  triple fault from KVM.
- v1: https://lore.kernel.org/qemu-devel/20220310090205.10645-1-chenyi.qiang@intel.com/

---

Chenyi Qiang (2):
  i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple
    fault
  i386: Add notify VM exit support

 hw/i386/x86.c         | 45 ++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/x86.h |  5 +++++
 qemu-options.hx       | 10 ++++++++-
 target/i386/cpu.c     |  1 +
 target/i386/cpu.h     |  1 +
 target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 109 insertions(+), 1 deletion(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 1/2] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault
  2022-09-15  9:28 [PATCH v6 0/2] Enable notify VM exit Chenyi Qiang
@ 2022-09-15  9:28 ` Chenyi Qiang
  2022-09-15  9:28 ` [PATCH v6 2/2] i386: Add notify VM exit support Chenyi Qiang
  1 sibling, 0 replies; 10+ messages in thread
From: Chenyi Qiang @ 2022-09-15  9:28 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

For the direct triple faults, i.e. hardware detected and KVM morphed
to VM-Exit, KVM will never lose them. But for triple faults sythesized
by KVM, e.g. the RSM path, if KVM exits to userspace before the request
is serviced, userspace could migrate the VM and lose the triple fault.

A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the event.triple_fault_pending field contains a valid state if the
KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 target/i386/cpu.c     |  1 +
 target/i386/cpu.h     |  1 +
 target/i386/kvm/kvm.c | 20 ++++++++++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1db1278a59..6e107466b3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6017,6 +6017,7 @@ static void x86_cpu_reset(DeviceState *dev)
     env->exception_has_payload = false;
     env->exception_payload = 0;
     env->nmi_injected = false;
+    env->triple_fault_pending = false;
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     apic_designate_bsp(cpu->apic_state, s->cpu_index == 0);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b9..b97d182e28 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1739,6 +1739,7 @@ typedef struct CPUArchState {
     uint8_t has_error_code;
     uint8_t exception_has_payload;
     uint64_t exception_payload;
+    bool triple_fault_pending;
     uint32_t ins_len;
     uint32_t sipi_vector;
     bool tsc_valid;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a1fd1f5379..3838827134 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -132,6 +132,7 @@ static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
 static int has_exception_payload;
+static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
@@ -2483,6 +2484,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_triple_fault_event = kvm_check_extension(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT);
+    if (has_triple_fault_event) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT, 0, true);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable triple fault event cap: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
     ret = kvm_get_supported_msrs(s);
     if (ret < 0) {
         return ret;
@@ -4299,6 +4310,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
         }
     }
 
+    if (has_triple_fault_event) {
+        events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+        events.triple_fault.pending = env->triple_fault_pending;
+    }
+
     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 }
 
@@ -4368,6 +4384,10 @@ static int kvm_get_vcpu_events(X86CPU *cpu)
         }
     }
 
+    if (events.flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT) {
+        env->triple_fault_pending = events.triple_fault.pending;
+    }
+
     env->sipi_vector = events.sipi_vector;
 
     return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-15  9:28 [PATCH v6 0/2] Enable notify VM exit Chenyi Qiang
  2022-09-15  9:28 ` [PATCH v6 1/2] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
@ 2022-09-15  9:28 ` Chenyi Qiang
  2022-09-16 21:57   ` Peter Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Chenyi Qiang @ 2022-09-15  9:28 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

There are cases that malicious virtual machine can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs. Notify VM exit is introduced to mitigate such kind of
attacks, which will generate a VM exit if no event window occurs in VM
non-root mode for a specified amount of time (notify window).

A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs. The format of the argument when enabling this
capability is as follows:
  Bit 63:32 - notify window specified in qemu command
  Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
              enable the feature.)

Because there are some concerns, e.g. a notify VM exit may happen with
VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
that would set this bit), which means VM context is corrupted. To avoid
the false positive and a well-behaved guest gets killed, make this
feature disabled by default. Users can enable the feature by a new
machine property:
    qemu -machine notify_vmexit=on,notify_window=0 ...

Note that notify_window is only valid when notify_vmexit is on. The valid
range of notify_window is non-negative. It is even safe to set it to zero
since there's an internal hardware threshold to be added to ensure no false
positive.

A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
inform the fatal case. Then user space can inject a SHUTDOWN event to
the target vcpu. This is implemented by injecting a sythesized triple
fault event.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 hw/i386/x86.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/x86.h |  5 +++++
 qemu-options.hx       | 10 +++++++++-
 target/i386/kvm/kvm.c | 28 +++++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..1eccbd3deb 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    return x86ms->notify_vmexit;
+}
+
+static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    x86ms->notify_vmexit = value;
+}
+
+static void x86_machine_get_notify_window(Object *obj, Visitor *v,
+                                const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+    uint32_t notify_window = x86ms->notify_window;
+
+    visit_type_uint32(v, name, &notify_window, errp);
+}
+
+static void x86_machine_set_notify_window(Object *obj, Visitor *v,
+                               const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    visit_type_uint32(v, name, &x86ms->notify_window, errp);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
     x86ms->above_4g_mem_start = 4 * GiB;
+    x86ms->notify_vmexit = false;
+    x86ms->notify_window = 0;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "sgx-epc",
         "SGX EPC device");
+
+    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
+                              x86_machine_get_notify_window,
+                              x86_machine_set_notify_window, NULL, NULL);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
+            "Set the notify window required by notify VM exit");
+
+    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
+                                   x86_machine_get_notify_vmexit,
+                                   x86_machine_set_notify_vmexit);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
+            "Enable notify VM exit");
 }
 
 static const TypeInfo x86_machine_info = {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 62fa5774f8..5707329fa7 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -85,6 +85,9 @@ struct X86MachineState {
      * which means no limitation on the guest's bus locks.
      */
     uint64_t bus_lock_ratelimit;
+
+    bool notify_vmexit;
+    uint32_t notify_window;
 };
 
 #define X86_MACHINE_SMM              "smm"
@@ -94,6 +97,8 @@ struct X86MachineState {
 #define X86_MACHINE_OEM_ID           "x-oem-id"
 #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
 #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
+#define X86_MACHINE_NOTIFY_VMEXIT     "notify-vmexit"
+#define X86_MACHINE_NOTIFY_WINDOW     "notify-window"
 
 #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
diff --git a/qemu-options.hx b/qemu-options.hx
index 31c04f7eea..3cdeeac8f3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
     "                hmat=on|off controls ACPI HMAT support (default=off)\n"
     "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
-    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
+    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
+    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
     QEMU_ARCH_ALL)
 SRST
 ``-machine [type=]name[,prop=value[,...]]``
@@ -157,6 +158,13 @@ SRST
         ::
 
             -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
+
+    ``notify_vmexit=on|off,notify_window=n``
+        Enables or disables Notify VM exit support on x86 host and specify
+        the corresponding notify window to trigger the VM exit if enabled.
+        This feature can mitigate the CPU stuck issue due to event windows
+        don't open up for a specified of time (notify window).
+        The default is off.
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3838827134..ae7fb2c495 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2597,6 +2597,20 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
             ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
                                 x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
         }
+
+        if (x86ms->notify_vmexit &&
+            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+            uint64_t notify_window_flags = ((uint64_t)x86ms->notify_window << 32) |
+                                           KVM_X86_NOTIFY_VMEXIT_ENABLED |
+                                           KVM_X86_NOTIFY_VMEXIT_USER;
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
+                                    notify_window_flags);
+            if (ret < 0) {
+                error_report("kvm: Failed to enable notify vmexit cap: %s",
+                             strerror(-ret));
+                return ret;
+            }
+        }
     }
 
     return 0;
@@ -5141,6 +5155,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
     X86CPU *cpu = X86_CPU(cs);
     uint64_t code;
     int ret;
+    struct kvm_vcpu_events events = {};
 
     switch (run->exit_reason) {
     case KVM_EXIT_HLT:
@@ -5196,6 +5211,19 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_NOTIFY:
+        ret = 0;
+        if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
+            warn_report("KVM: invalid context due to notify vmexit");
+            if (has_triple_fault_event) {
+                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+                events.triple_fault.pending = true;
+                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+            } else {
+                ret = -1;
+            }
+        }
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-15  9:28 ` [PATCH v6 2/2] i386: Add notify VM exit support Chenyi Qiang
@ 2022-09-16 21:57   ` Peter Xu
  2022-09-19  5:46     ` Chenyi Qiang
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-09-16 21:57 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Xiaoyao Li, qemu-devel, kvm

On Thu, Sep 15, 2022 at 05:28:39PM +0800, Chenyi Qiang wrote:
> There are cases that malicious virtual machine can cause CPU stuck (due
> to event windows don't open up), e.g., infinite loop in microcode when
> nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
> IRQ) can be delivered. It leads the CPU to be unavailable to host or
> other VMs. Notify VM exit is introduced to mitigate such kind of
> attacks, which will generate a VM exit if no event window occurs in VM
> non-root mode for a specified amount of time (notify window).
> 
> A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
> so that the user can query the capability and set the expected notify
> window when creating VMs. The format of the argument when enabling this
> capability is as follows:
>   Bit 63:32 - notify window specified in qemu command
>   Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
>               enable the feature.)
> 
> Because there are some concerns, e.g. a notify VM exit may happen with
> VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
> that would set this bit), which means VM context is corrupted. To avoid
> the false positive and a well-behaved guest gets killed, make this
> feature disabled by default. Users can enable the feature by a new
> machine property:
>     qemu -machine notify_vmexit=on,notify_window=0 ...
> 
> Note that notify_window is only valid when notify_vmexit is on. The valid
> range of notify_window is non-negative. It is even safe to set it to zero
> since there's an internal hardware threshold to be added to ensure no false
> positive.
> 
> A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
> it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
> inform the fatal case. Then user space can inject a SHUTDOWN event to
> the target vcpu. This is implemented by injecting a sythesized triple
> fault event.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
>  hw/i386/x86.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
>  include/hw/i386/x86.h |  5 +++++
>  qemu-options.hx       | 10 +++++++++-
>  target/i386/kvm/kvm.c | 28 +++++++++++++++++++++++++++
>  4 files changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 050eedc0c8..1eccbd3deb 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>      qapi_free_SgxEPCList(list);
>  }
>  
> +static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    return x86ms->notify_vmexit;
> +}
> +
> +static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    x86ms->notify_vmexit = value;
> +}
> +
> +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
> +                                const char *name, void *opaque, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +    uint32_t notify_window = x86ms->notify_window;
> +
> +    visit_type_uint32(v, name, &notify_window, errp);
> +}
> +
> +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
> +                               const char *name, void *opaque, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    visit_type_uint32(v, name, &x86ms->notify_window, errp);
> +}
> +
>  static void x86_machine_initfn(Object *obj)
>  {
>      X86MachineState *x86ms = X86_MACHINE(obj);
> @@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
>      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>      x86ms->bus_lock_ratelimit = 0;
>      x86ms->above_4g_mem_start = 4 * GiB;
> +    x86ms->notify_vmexit = false;
> +    x86ms->notify_window = 0;
>  }
>  
>  static void x86_machine_class_init(ObjectClass *oc, void *data)
> @@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
>          NULL, NULL);
>      object_class_property_set_description(oc, "sgx-epc",
>          "SGX EPC device");
> +
> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
> +                              x86_machine_get_notify_window,
> +                              x86_machine_set_notify_window, NULL, NULL);
> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
> +            "Set the notify window required by notify VM exit");
> +
> +    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
> +                                   x86_machine_get_notify_vmexit,
> +                                   x86_machine_set_notify_vmexit);
> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
> +            "Enable notify VM exit");
>  }
>  
>  static const TypeInfo x86_machine_info = {
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index 62fa5774f8..5707329fa7 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -85,6 +85,9 @@ struct X86MachineState {
>       * which means no limitation on the guest's bus locks.
>       */
>      uint64_t bus_lock_ratelimit;
> +
> +    bool notify_vmexit;
> +    uint32_t notify_window;
>  };
>  
>  #define X86_MACHINE_SMM              "smm"
> @@ -94,6 +97,8 @@ struct X86MachineState {
>  #define X86_MACHINE_OEM_ID           "x-oem-id"
>  #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
>  #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
> +#define X86_MACHINE_NOTIFY_VMEXIT     "notify-vmexit"
> +#define X86_MACHINE_NOTIFY_WINDOW     "notify-window"
>  
>  #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
>  OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 31c04f7eea..3cdeeac8f3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
>      "                hmat=on|off controls ACPI HMAT support (default=off)\n"
>      "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
> -    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
> +    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
> +    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
>      QEMU_ARCH_ALL)
>  SRST
>  ``-machine [type=]name[,prop=value[,...]]``
> @@ -157,6 +158,13 @@ SRST
>          ::
>  
>              -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
> +
> +    ``notify_vmexit=on|off,notify_window=n``
> +        Enables or disables Notify VM exit support on x86 host and specify
> +        the corresponding notify window to trigger the VM exit if enabled.
> +        This feature can mitigate the CPU stuck issue due to event windows
> +        don't open up for a specified of time (notify window).
> +        The default is off.
>  ERST
>  
>  DEF("M", HAS_ARG, QEMU_OPTION_M,
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3838827134..ae7fb2c495 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2597,6 +2597,20 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>              ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
>                                  x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
>          }
> +
> +        if (x86ms->notify_vmexit &&
> +            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
> +            uint64_t notify_window_flags = ((uint64_t)x86ms->notify_window << 32) |
> +                                           KVM_X86_NOTIFY_VMEXIT_ENABLED |
> +                                           KVM_X86_NOTIFY_VMEXIT_USER;

It'll always request a user exit here as long as enabled, then...

> +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
> +                                    notify_window_flags);
> +            if (ret < 0) {
> +                error_report("kvm: Failed to enable notify vmexit cap: %s",
> +                             strerror(-ret));
> +                return ret;
> +            }
> +        }
>      }
>  
>      return 0;
> @@ -5141,6 +5155,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>      X86CPU *cpu = X86_CPU(cs);
>      uint64_t code;
>      int ret;
> +    struct kvm_vcpu_events events = {};
>  
>      switch (run->exit_reason) {
>      case KVM_EXIT_HLT:
> @@ -5196,6 +5211,19 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          /* already handled in kvm_arch_post_run */
>          ret = 0;
>          break;
> +    case KVM_EXIT_NOTIFY:
> +        ret = 0;
> +        if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
> +            warn_report("KVM: invalid context due to notify vmexit");
> +            if (has_triple_fault_event) {
> +                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
> +                events.triple_fault.pending = true;
> +                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
> +            } else {
> +                ret = -1;
> +            }
> +        }

... should we do something even if the context is valid?  Or I'm a bit
confused why KVM_X86_NOTIFY_VMEXIT_USER was set (IIUC we can just enable it
without setting VMEXIT_USER then).

Not sure some warning would be also useful here, but I really don't know
the whole context so I can't tell whether there can easily be false
positives to pollute qemu log.

> +        break;
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> -- 
> 2.17.1
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-16 21:57   ` Peter Xu
@ 2022-09-19  5:46     ` Chenyi Qiang
  2022-09-19  6:11       ` Xiaoyao Li
  2022-09-19 15:53       ` Peter Xu
  0 siblings, 2 replies; 10+ messages in thread
From: Chenyi Qiang @ 2022-09-19  5:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Xiaoyao Li, qemu-devel, kvm



On 9/17/2022 5:57 AM, Peter Xu wrote:
> On Thu, Sep 15, 2022 at 05:28:39PM +0800, Chenyi Qiang wrote:
>> There are cases that malicious virtual machine can cause CPU stuck (due
>> to event windows don't open up), e.g., infinite loop in microcode when
>> nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
>> IRQ) can be delivered. It leads the CPU to be unavailable to host or
>> other VMs. Notify VM exit is introduced to mitigate such kind of
>> attacks, which will generate a VM exit if no event window occurs in VM
>> non-root mode for a specified amount of time (notify window).
>>
>> A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
>> so that the user can query the capability and set the expected notify
>> window when creating VMs. The format of the argument when enabling this
>> capability is as follows:
>>    Bit 63:32 - notify window specified in qemu command
>>    Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
>>                enable the feature.)
>>
>> Because there are some concerns, e.g. a notify VM exit may happen with
>> VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
>> that would set this bit), which means VM context is corrupted. To avoid
>> the false positive and a well-behaved guest gets killed, make this
>> feature disabled by default. Users can enable the feature by a new
>> machine property:
>>      qemu -machine notify_vmexit=on,notify_window=0 ...
>>
>> Note that notify_window is only valid when notify_vmexit is on. The valid
>> range of notify_window is non-negative. It is even safe to set it to zero
>> since there's an internal hardware threshold to be added to ensure no false
>> positive.
>>
>> A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
>> it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
>> inform the fatal case. Then user space can inject a SHUTDOWN event to
>> the target vcpu. This is implemented by injecting a sythesized triple
>> fault event.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>>   hw/i386/x86.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/i386/x86.h |  5 +++++
>>   qemu-options.hx       | 10 +++++++++-
>>   target/i386/kvm/kvm.c | 28 +++++++++++++++++++++++++++
>>   4 files changed, 87 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index 050eedc0c8..1eccbd3deb 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>>       qapi_free_SgxEPCList(list);
>>   }
>>   
>> +static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +
>> +    return x86ms->notify_vmexit;
>> +}
>> +
>> +static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +
>> +    x86ms->notify_vmexit = value;
>> +}
>> +
>> +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
>> +                                const char *name, void *opaque, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +    uint32_t notify_window = x86ms->notify_window;
>> +
>> +    visit_type_uint32(v, name, &notify_window, errp);
>> +}
>> +
>> +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
>> +                               const char *name, void *opaque, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +
>> +    visit_type_uint32(v, name, &x86ms->notify_window, errp);
>> +}
>> +
>>   static void x86_machine_initfn(Object *obj)
>>   {
>>       X86MachineState *x86ms = X86_MACHINE(obj);
>> @@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
>>       x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>>       x86ms->bus_lock_ratelimit = 0;
>>       x86ms->above_4g_mem_start = 4 * GiB;
>> +    x86ms->notify_vmexit = false;
>> +    x86ms->notify_window = 0;
>>   }
>>   
>>   static void x86_machine_class_init(ObjectClass *oc, void *data)
>> @@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
>>           NULL, NULL);
>>       object_class_property_set_description(oc, "sgx-epc",
>>           "SGX EPC device");
>> +
>> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
>> +                              x86_machine_get_notify_window,
>> +                              x86_machine_set_notify_window, NULL, NULL);
>> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
>> +            "Set the notify window required by notify VM exit");
>> +
>> +    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
>> +                                   x86_machine_get_notify_vmexit,
>> +                                   x86_machine_set_notify_vmexit);
>> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
>> +            "Enable notify VM exit");
>>   }
>>   
>>   static const TypeInfo x86_machine_info = {
>> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
>> index 62fa5774f8..5707329fa7 100644
>> --- a/include/hw/i386/x86.h
>> +++ b/include/hw/i386/x86.h
>> @@ -85,6 +85,9 @@ struct X86MachineState {
>>        * which means no limitation on the guest's bus locks.
>>        */
>>       uint64_t bus_lock_ratelimit;
>> +
>> +    bool notify_vmexit;
>> +    uint32_t notify_window;
>>   };
>>   
>>   #define X86_MACHINE_SMM              "smm"
>> @@ -94,6 +97,8 @@ struct X86MachineState {
>>   #define X86_MACHINE_OEM_ID           "x-oem-id"
>>   #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
>>   #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
>> +#define X86_MACHINE_NOTIFY_VMEXIT     "notify-vmexit"
>> +#define X86_MACHINE_NOTIFY_WINDOW     "notify-window"
>>   
>>   #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
>>   OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 31c04f7eea..3cdeeac8f3 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>       "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
>>       "                hmat=on|off controls ACPI HMAT support (default=off)\n"
>>       "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
>> -    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
>> +    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
>> +    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
>>       QEMU_ARCH_ALL)
>>   SRST
>>   ``-machine [type=]name[,prop=value[,...]]``
>> @@ -157,6 +158,13 @@ SRST
>>           ::
>>   
>>               -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
>> +
>> +    ``notify_vmexit=on|off,notify_window=n``
>> +        Enables or disables Notify VM exit support on x86 host and specify
>> +        the corresponding notify window to trigger the VM exit if enabled.
>> +        This feature can mitigate the CPU stuck issue due to event windows
>> +        don't open up for a specified of time (notify window).
>> +        The default is off.
>>   ERST
>>   
>>   DEF("M", HAS_ARG, QEMU_OPTION_M,
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 3838827134..ae7fb2c495 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -2597,6 +2597,20 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>               ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
>>                                   x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
>>           }
>> +
>> +        if (x86ms->notify_vmexit &&
>> +            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
>> +            uint64_t notify_window_flags = ((uint64_t)x86ms->notify_window << 32) |
>> +                                           KVM_X86_NOTIFY_VMEXIT_ENABLED |
>> +                                           KVM_X86_NOTIFY_VMEXIT_USER;
> 
> It'll always request a user exit here as long as enabled, then...
> 
>> +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
>> +                                    notify_window_flags);
>> +            if (ret < 0) {
>> +                error_report("kvm: Failed to enable notify vmexit cap: %s",
>> +                             strerror(-ret));
>> +                return ret;
>> +            }
>> +        }
>>       }
>>   
>>       return 0;
>> @@ -5141,6 +5155,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>       X86CPU *cpu = X86_CPU(cs);
>>       uint64_t code;
>>       int ret;
>> +    struct kvm_vcpu_events events = {};
>>   
>>       switch (run->exit_reason) {
>>       case KVM_EXIT_HLT:
>> @@ -5196,6 +5211,19 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>           /* already handled in kvm_arch_post_run */
>>           ret = 0;
>>           break;
>> +    case KVM_EXIT_NOTIFY:
>> +        ret = 0;
>> +        if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
>> +            warn_report("KVM: invalid context due to notify vmexit");
>> +            if (has_triple_fault_event) {
>> +                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
>> +                events.triple_fault.pending = true;
>> +                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
>> +            } else {
>> +                ret = -1;
>> +            }
>> +        }
> 
> ... should we do something even if the context is valid?  Or I'm a bit


Yes, make sense. A warning log is necessary if the context is valid.

> confused why KVM_X86_NOTIFY_VMEXIT_USER was set (IIUC we can just enable it
> without setting VMEXIT_USER then).
> 

VMEXIT_USR was set because KVM community prefers userspace can get 
notified and help to do some analysis or mitigation if notify window was 
exceeded.

> Not sure some warning would be also useful here, but I really don't know
> the whole context so I can't tell whether there can easily be false
> positives to pollute qemu log.
> 

The false positive case is not easy to happen unless some potential 
issues in silicon. But in case of it, to avoid polluting qemu log, how 
about:

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ae7fb2c495..8f97133cbf 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5213,6 +5213,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct 
kvm_run *run)
          break;
      case KVM_EXIT_NOTIFY:
          ret = 0;
+        warn_report_once("KVM: notify window was exceeded in guest");
          if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
              warn_report("KVM: invalid context due to notify vmexit");
              if (has_triple_fault_event) {

>> +        break;
>>       default:
>>           fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>           ret = -1;
>> -- 
>> 2.17.1
>>
> 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-19  5:46     ` Chenyi Qiang
@ 2022-09-19  6:11       ` Xiaoyao Li
  2022-09-19 15:53       ` Peter Xu
  1 sibling, 0 replies; 10+ messages in thread
From: Xiaoyao Li @ 2022-09-19  6:11 UTC (permalink / raw)
  To: Chenyi Qiang, Peter Xu
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, qemu-devel, kvm

On 9/19/2022 1:46 PM, Chenyi Qiang wrote:
>> Not sure some warning would be also useful here, but I really don't know
>> the whole context so I can't tell whether there can easily be false
>> positives to pollute qemu log.
>>
> 
> The false positive case is not easy to happen unless some potential 
> issues in silicon. But in case of it, to avoid polluting qemu log, how 
> about:
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index ae7fb2c495..8f97133cbf 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5213,6 +5213,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct 
> kvm_run *run)
>           break;
>       case KVM_EXIT_NOTIFY:
>           ret = 0;
> +        warn_report_once("KVM: notify window was exceeded in guest");
>           if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
>               warn_report("KVM: invalid context due to notify vmexit");
>               if (has_triple_fault_event) {

how about this

     case KVM_EXIT_NOTIFY:
         bool ctx_invalid = run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID;
         ret = 0;
         warn_report_once("KVM: Encounter notify exit with %svalid context",
                          ctx_invalid ? "in" : "");

         if (ctx_invalid) {
             ...
         }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-19  5:46     ` Chenyi Qiang
  2022-09-19  6:11       ` Xiaoyao Li
@ 2022-09-19 15:53       ` Peter Xu
  2022-09-20  5:55         ` Chenyi Qiang
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-09-19 15:53 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Xiaoyao Li, qemu-devel, kvm

On Mon, Sep 19, 2022 at 01:46:38PM +0800, Chenyi Qiang wrote:
> 
> 
> On 9/17/2022 5:57 AM, Peter Xu wrote:
> > On Thu, Sep 15, 2022 at 05:28:39PM +0800, Chenyi Qiang wrote:
> > > There are cases that malicious virtual machine can cause CPU stuck (due
> > > to event windows don't open up), e.g., infinite loop in microcode when
> > > nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
> > > IRQ) can be delivered. It leads the CPU to be unavailable to host or
> > > other VMs. Notify VM exit is introduced to mitigate such kind of
> > > attacks, which will generate a VM exit if no event window occurs in VM
> > > non-root mode for a specified amount of time (notify window).
> > > 
> > > A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
> > > so that the user can query the capability and set the expected notify
> > > window when creating VMs. The format of the argument when enabling this
> > > capability is as follows:
> > >    Bit 63:32 - notify window specified in qemu command
> > >    Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
> > >                enable the feature.)
> > > 
> > > Because there are some concerns, e.g. a notify VM exit may happen with
> > > VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
> > > that would set this bit), which means VM context is corrupted. To avoid
> > > the false positive and a well-behaved guest gets killed, make this
> > > feature disabled by default. Users can enable the feature by a new
> > > machine property:
> > >      qemu -machine notify_vmexit=on,notify_window=0 ...
> > > 
> > > Note that notify_window is only valid when notify_vmexit is on. The valid
> > > range of notify_window is non-negative. It is even safe to set it to zero
> > > since there's an internal hardware threshold to be added to ensure no false
> > > positive.
> > > 
> > > A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
> > > it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
> > > inform the fatal case. Then user space can inject a SHUTDOWN event to
> > > the target vcpu. This is implemented by injecting a sythesized triple
> > > fault event.
> > > 
> > > Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> > > ---
> > >   hw/i386/x86.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
> > >   include/hw/i386/x86.h |  5 +++++
> > >   qemu-options.hx       | 10 +++++++++-
> > >   target/i386/kvm/kvm.c | 28 +++++++++++++++++++++++++++
> > >   4 files changed, 87 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > > index 050eedc0c8..1eccbd3deb 100644
> > > --- a/hw/i386/x86.c
> > > +++ b/hw/i386/x86.c
> > > @@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
> > >       qapi_free_SgxEPCList(list);
> > >   }
> > > +static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
> > > +{
> > > +    X86MachineState *x86ms = X86_MACHINE(obj);
> > > +
> > > +    return x86ms->notify_vmexit;
> > > +}
> > > +
> > > +static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
> > > +{
> > > +    X86MachineState *x86ms = X86_MACHINE(obj);
> > > +
> > > +    x86ms->notify_vmexit = value;
> > > +}
> > > +
> > > +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
> > > +                                const char *name, void *opaque, Error **errp)
> > > +{
> > > +    X86MachineState *x86ms = X86_MACHINE(obj);
> > > +    uint32_t notify_window = x86ms->notify_window;
> > > +
> > > +    visit_type_uint32(v, name, &notify_window, errp);
> > > +}
> > > +
> > > +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
> > > +                               const char *name, void *opaque, Error **errp)
> > > +{
> > > +    X86MachineState *x86ms = X86_MACHINE(obj);
> > > +
> > > +    visit_type_uint32(v, name, &x86ms->notify_window, errp);
> > > +}
> > > +
> > >   static void x86_machine_initfn(Object *obj)
> > >   {
> > >       X86MachineState *x86ms = X86_MACHINE(obj);
> > > @@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
> > >       x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
> > >       x86ms->bus_lock_ratelimit = 0;
> > >       x86ms->above_4g_mem_start = 4 * GiB;
> > > +    x86ms->notify_vmexit = false;
> > > +    x86ms->notify_window = 0;
> > >   }
> > >   static void x86_machine_class_init(ObjectClass *oc, void *data)
> > > @@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
> > >           NULL, NULL);
> > >       object_class_property_set_description(oc, "sgx-epc",
> > >           "SGX EPC device");
> > > +
> > > +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
> > > +                              x86_machine_get_notify_window,
> > > +                              x86_machine_set_notify_window, NULL, NULL);
> > > +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
> > > +            "Set the notify window required by notify VM exit");
> > > +
> > > +    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
> > > +                                   x86_machine_get_notify_vmexit,
> > > +                                   x86_machine_set_notify_vmexit);
> > > +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
> > > +            "Enable notify VM exit");
> > >   }
> > >   static const TypeInfo x86_machine_info = {
> > > diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> > > index 62fa5774f8..5707329fa7 100644
> > > --- a/include/hw/i386/x86.h
> > > +++ b/include/hw/i386/x86.h
> > > @@ -85,6 +85,9 @@ struct X86MachineState {
> > >        * which means no limitation on the guest's bus locks.
> > >        */
> > >       uint64_t bus_lock_ratelimit;
> > > +
> > > +    bool notify_vmexit;
> > > +    uint32_t notify_window;
> > >   };
> > >   #define X86_MACHINE_SMM              "smm"
> > > @@ -94,6 +97,8 @@ struct X86MachineState {
> > >   #define X86_MACHINE_OEM_ID           "x-oem-id"
> > >   #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
> > >   #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
> > > +#define X86_MACHINE_NOTIFY_VMEXIT     "notify-vmexit"
> > > +#define X86_MACHINE_NOTIFY_WINDOW     "notify-window"
> > >   #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
> > >   OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 31c04f7eea..3cdeeac8f3 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> > >       "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
> > >       "                hmat=on|off controls ACPI HMAT support (default=off)\n"
> > >       "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
> > > -    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
> > > +    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
> > > +    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
> > >       QEMU_ARCH_ALL)
> > >   SRST
> > >   ``-machine [type=]name[,prop=value[,...]]``
> > > @@ -157,6 +158,13 @@ SRST
> > >           ::
> > >               -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
> > > +
> > > +    ``notify_vmexit=on|off,notify_window=n``
> > > +        Enables or disables Notify VM exit support on x86 host and specify
> > > +        the corresponding notify window to trigger the VM exit if enabled.
> > > +        This feature can mitigate the CPU stuck issue due to event windows
> > > +        don't open up for a specified of time (notify window).
> > > +        The default is off.
> > >   ERST
> > >   DEF("M", HAS_ARG, QEMU_OPTION_M,
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 3838827134..ae7fb2c495 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -2597,6 +2597,20 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> > >               ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
> > >                                   x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
> > >           }
> > > +
> > > +        if (x86ms->notify_vmexit &&
> > > +            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
> > > +            uint64_t notify_window_flags = ((uint64_t)x86ms->notify_window << 32) |
> > > +                                           KVM_X86_NOTIFY_VMEXIT_ENABLED |
> > > +                                           KVM_X86_NOTIFY_VMEXIT_USER;
> > 
> > It'll always request a user exit here as long as enabled, then...
> > 
> > > +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
> > > +                                    notify_window_flags);
> > > +            if (ret < 0) {
> > > +                error_report("kvm: Failed to enable notify vmexit cap: %s",
> > > +                             strerror(-ret));
> > > +                return ret;
> > > +            }
> > > +        }
> > >       }
> > >       return 0;
> > > @@ -5141,6 +5155,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> > >       X86CPU *cpu = X86_CPU(cs);
> > >       uint64_t code;
> > >       int ret;
> > > +    struct kvm_vcpu_events events = {};
> > >       switch (run->exit_reason) {
> > >       case KVM_EXIT_HLT:
> > > @@ -5196,6 +5211,19 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> > >           /* already handled in kvm_arch_post_run */
> > >           ret = 0;
> > >           break;
> > > +    case KVM_EXIT_NOTIFY:
> > > +        ret = 0;
> > > +        if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
> > > +            warn_report("KVM: invalid context due to notify vmexit");
> > > +            if (has_triple_fault_event) {
> > > +                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
> > > +                events.triple_fault.pending = true;
> > > +                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
> > > +            } else {
> > > +                ret = -1;
> > > +            }
> > > +        }
> > 
> > ... should we do something even if the context is valid?  Or I'm a bit
> 
> 
> Yes, make sense. A warning log is necessary if the context is valid.
> 
> > confused why KVM_X86_NOTIFY_VMEXIT_USER was set (IIUC we can just enable it
> > without setting VMEXIT_USER then).
> > 
> 
> VMEXIT_USR was set because KVM community prefers userspace can get notified
> and help to do some analysis or mitigation if notify window was exceeded.
> 
> > Not sure some warning would be also useful here, but I really don't know
> > the whole context so I can't tell whether there can easily be false
> > positives to pollute qemu log.
> > 
> 
> The false positive case is not easy to happen unless some potential issues
> in silicon. But in case of it, to avoid polluting qemu log, how about:
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index ae7fb2c495..8f97133cbf 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5213,6 +5213,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run
> *run)
>          break;
>      case KVM_EXIT_NOTIFY:
>          ret = 0;
> +        warn_report_once("KVM: notify window was exceeded in guest");

Is there more informative way to dump this?  If it's 99% that the guest was
doing something weird and needs attention, maybe worthwhile to point that
out directly to the admin?

>          if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
>              warn_report("KVM: invalid context due to notify vmexit");
>              if (has_triple_fault_event) {

Adding a warning looks good to me, with that (or in any better form of
wording):

Acked-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-19 15:53       ` Peter Xu
@ 2022-09-20  5:55         ` Chenyi Qiang
  2022-09-20 13:59           ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Chenyi Qiang @ 2022-09-20  5:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Xiaoyao Li, qemu-devel, kvm



On 9/19/2022 11:53 PM, Peter Xu wrote:
> On Mon, Sep 19, 2022 at 01:46:38PM +0800, Chenyi Qiang wrote:
>>
>>
>> On 9/17/2022 5:57 AM, Peter Xu wrote:
>>> On Thu, Sep 15, 2022 at 05:28:39PM +0800, Chenyi Qiang wrote:
>>>> There are cases that malicious virtual machine can cause CPU stuck (due
>>>> to event windows don't open up), e.g., infinite loop in microcode when
>>>> nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
>>>> IRQ) can be delivered. It leads the CPU to be unavailable to host or
>>>> other VMs. Notify VM exit is introduced to mitigate such kind of
>>>> attacks, which will generate a VM exit if no event window occurs in VM
>>>> non-root mode for a specified amount of time (notify window).
>>>>
>>>> A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
>>>> so that the user can query the capability and set the expected notify
>>>> window when creating VMs. The format of the argument when enabling this
>>>> capability is as follows:
>>>>     Bit 63:32 - notify window specified in qemu command
>>>>     Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
>>>>                 enable the feature.)
>>>>
>>>> Because there are some concerns, e.g. a notify VM exit may happen with
>>>> VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
>>>> that would set this bit), which means VM context is corrupted. To avoid
>>>> the false positive and a well-behaved guest gets killed, make this
>>>> feature disabled by default. Users can enable the feature by a new
>>>> machine property:
>>>>       qemu -machine notify_vmexit=on,notify_window=0 ...
>>>>
>>>> Note that notify_window is only valid when notify_vmexit is on. The valid
>>>> range of notify_window is non-negative. It is even safe to set it to zero
>>>> since there's an internal hardware threshold to be added to ensure no false
>>>> positive.
>>>>
>>>> A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
>>>> it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
>>>> inform the fatal case. Then user space can inject a SHUTDOWN event to
>>>> the target vcpu. This is implemented by injecting a sythesized triple
>>>> fault event.
>>>>
>>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>>> ---
>>>>    hw/i386/x86.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
>>>>    include/hw/i386/x86.h |  5 +++++
>>>>    qemu-options.hx       | 10 +++++++++-
>>>>    target/i386/kvm/kvm.c | 28 +++++++++++++++++++++++++++
>>>>    4 files changed, 87 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>>>> index 050eedc0c8..1eccbd3deb 100644
>>>> --- a/hw/i386/x86.c
>>>> +++ b/hw/i386/x86.c
>>>> @@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>>>>        qapi_free_SgxEPCList(list);
>>>>    }
>>>> +static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
>>>> +{
>>>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>>>> +
>>>> +    return x86ms->notify_vmexit;
>>>> +}
>>>> +
>>>> +static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
>>>> +{
>>>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>>>> +
>>>> +    x86ms->notify_vmexit = value;
>>>> +}
>>>> +
>>>> +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
>>>> +                                const char *name, void *opaque, Error **errp)
>>>> +{
>>>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>>>> +    uint32_t notify_window = x86ms->notify_window;
>>>> +
>>>> +    visit_type_uint32(v, name, &notify_window, errp);
>>>> +}
>>>> +
>>>> +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
>>>> +                               const char *name, void *opaque, Error **errp)
>>>> +{
>>>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>>>> +
>>>> +    visit_type_uint32(v, name, &x86ms->notify_window, errp);
>>>> +}
>>>> +
>>>>    static void x86_machine_initfn(Object *obj)
>>>>    {
>>>>        X86MachineState *x86ms = X86_MACHINE(obj);
>>>> @@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
>>>>        x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>>>>        x86ms->bus_lock_ratelimit = 0;
>>>>        x86ms->above_4g_mem_start = 4 * GiB;
>>>> +    x86ms->notify_vmexit = false;
>>>> +    x86ms->notify_window = 0;
>>>>    }
>>>>    static void x86_machine_class_init(ObjectClass *oc, void *data)
>>>> @@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
>>>>            NULL, NULL);
>>>>        object_class_property_set_description(oc, "sgx-epc",
>>>>            "SGX EPC device");
>>>> +
>>>> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
>>>> +                              x86_machine_get_notify_window,
>>>> +                              x86_machine_set_notify_window, NULL, NULL);
>>>> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
>>>> +            "Set the notify window required by notify VM exit");
>>>> +
>>>> +    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
>>>> +                                   x86_machine_get_notify_vmexit,
>>>> +                                   x86_machine_set_notify_vmexit);
>>>> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
>>>> +            "Enable notify VM exit");
>>>>    }
>>>>    static const TypeInfo x86_machine_info = {
>>>> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
>>>> index 62fa5774f8..5707329fa7 100644
>>>> --- a/include/hw/i386/x86.h
>>>> +++ b/include/hw/i386/x86.h
>>>> @@ -85,6 +85,9 @@ struct X86MachineState {
>>>>         * which means no limitation on the guest's bus locks.
>>>>         */
>>>>        uint64_t bus_lock_ratelimit;
>>>> +
>>>> +    bool notify_vmexit;
>>>> +    uint32_t notify_window;
>>>>    };
>>>>    #define X86_MACHINE_SMM              "smm"
>>>> @@ -94,6 +97,8 @@ struct X86MachineState {
>>>>    #define X86_MACHINE_OEM_ID           "x-oem-id"
>>>>    #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
>>>>    #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
>>>> +#define X86_MACHINE_NOTIFY_VMEXIT     "notify-vmexit"
>>>> +#define X86_MACHINE_NOTIFY_WINDOW     "notify-window"
>>>>    #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
>>>>    OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
>>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>>> index 31c04f7eea..3cdeeac8f3 100644
>>>> --- a/qemu-options.hx
>>>> +++ b/qemu-options.hx
>>>> @@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>>>        "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
>>>>        "                hmat=on|off controls ACPI HMAT support (default=off)\n"
>>>>        "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
>>>> -    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
>>>> +    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
>>>> +    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
>>>>        QEMU_ARCH_ALL)
>>>>    SRST
>>>>    ``-machine [type=]name[,prop=value[,...]]``
>>>> @@ -157,6 +158,13 @@ SRST
>>>>            ::
>>>>                -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
>>>> +
>>>> +    ``notify_vmexit=on|off,notify_window=n``
>>>> +        Enables or disables Notify VM exit support on x86 host and specify
>>>> +        the corresponding notify window to trigger the VM exit if enabled.
>>>> +        This feature can mitigate the CPU stuck issue due to event windows
>>>> +        don't open up for a specified of time (notify window).
>>>> +        The default is off.
>>>>    ERST
>>>>    DEF("M", HAS_ARG, QEMU_OPTION_M,
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 3838827134..ae7fb2c495 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -2597,6 +2597,20 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>                ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
>>>>                                    x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
>>>>            }
>>>> +
>>>> +        if (x86ms->notify_vmexit &&
>>>> +            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
>>>> +            uint64_t notify_window_flags = ((uint64_t)x86ms->notify_window << 32) |
>>>> +                                           KVM_X86_NOTIFY_VMEXIT_ENABLED |
>>>> +                                           KVM_X86_NOTIFY_VMEXIT_USER;
>>>
>>> It'll always request a user exit here as long as enabled, then...
>>>
>>>> +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
>>>> +                                    notify_window_flags);
>>>> +            if (ret < 0) {
>>>> +                error_report("kvm: Failed to enable notify vmexit cap: %s",
>>>> +                             strerror(-ret));
>>>> +                return ret;
>>>> +            }
>>>> +        }
>>>>        }
>>>>        return 0;
>>>> @@ -5141,6 +5155,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>>>        X86CPU *cpu = X86_CPU(cs);
>>>>        uint64_t code;
>>>>        int ret;
>>>> +    struct kvm_vcpu_events events = {};
>>>>        switch (run->exit_reason) {
>>>>        case KVM_EXIT_HLT:
>>>> @@ -5196,6 +5211,19 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>>>            /* already handled in kvm_arch_post_run */
>>>>            ret = 0;
>>>>            break;
>>>> +    case KVM_EXIT_NOTIFY:
>>>> +        ret = 0;
>>>> +        if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
>>>> +            warn_report("KVM: invalid context due to notify vmexit");
>>>> +            if (has_triple_fault_event) {
>>>> +                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
>>>> +                events.triple_fault.pending = true;
>>>> +                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
>>>> +            } else {
>>>> +                ret = -1;
>>>> +            }
>>>> +        }
>>>
>>> ... should we do something even if the context is valid?  Or I'm a bit
>>
>>
>> Yes, make sense. A warning log is necessary if the context is valid.
>>
>>> confused why KVM_X86_NOTIFY_VMEXIT_USER was set (IIUC we can just enable it
>>> without setting VMEXIT_USER then).
>>>
>>
>> VMEXIT_USR was set because KVM community prefers userspace can get notified
>> and help to do some analysis or mitigation if notify window was exceeded.
>>
>>> Not sure some warning would be also useful here, but I really don't know
>>> the whole context so I can't tell whether there can easily be false
>>> positives to pollute qemu log.
>>>
>>
>> The false positive case is not easy to happen unless some potential issues
>> in silicon. But in case of it, to avoid polluting qemu log, how about:
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index ae7fb2c495..8f97133cbf 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -5213,6 +5213,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run
>> *run)
>>           break;
>>       case KVM_EXIT_NOTIFY:
>>           ret = 0;
>> +        warn_report_once("KVM: notify window was exceeded in guest");
> 
> Is there more informative way to dump this?  If it's 99% that the guest was
> doing something weird and needs attention, maybe worthwhile to point that
> out directly to the admin?
> 

Do you mean to use other method to dump the info? i.e. printing a 
message is not so clear. Or the output message ("KVM: notify window was 
exceeded in guest") is not obvious and we need other wording.

>>           if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
>>               warn_report("KVM: invalid context due to notify vmexit");
>>               if (has_triple_fault_event) {
> 
> Adding a warning looks good to me, with that (or in any better form of
> wording):
> 
If no objection, I'll follow Xiaoyao's suggestion to form the wording like:

case KVM_EXIT_NOTIFY:
     ctx_invalid = !!(run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID);
     ret = 0;
     warn_report_once("KVM: Encounter notify exit with %svalid context 
in guest", ctx_invalid ? "in" : "");
     if (ctx_invalid) {
         if (has_triple_fault_event) {}
     }

> Acked-by: Peter Xu <peterx@redhat.com>
> 
> Thanks,
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-20  5:55         ` Chenyi Qiang
@ 2022-09-20 13:59           ` Peter Xu
  2022-09-21  3:07             ` Chenyi Qiang
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-09-20 13:59 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Xiaoyao Li, qemu-devel, kvm

On Tue, Sep 20, 2022 at 01:55:20PM +0800, Chenyi Qiang wrote:
> > > @@ -5213,6 +5213,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run
> > > *run)
> > >           break;
> > >       case KVM_EXIT_NOTIFY:
> > >           ret = 0;
> > > +        warn_report_once("KVM: notify window was exceeded in guest");
> > 
> > Is there more informative way to dump this?  If it's 99% that the guest was
> > doing something weird and needs attention, maybe worthwhile to point that
> > out directly to the admin?
> > 
> 
> Do you mean to use other method to dump the info? i.e. printing a message is
> not so clear. Or the output message ("KVM: notify window was exceeded in
> guest") is not obvious and we need other wording.

I meant something like:

  KVM received notify exit.  It means there can be possible misbehaves in
  the guest, please have a look.

Or something similar.  What I'm worried is the admin may not understand
what's "notify window" and that message got simply ignored.

Though I am not even sure whether that's accurate in the wordings.

> 
> > >           if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
> > >               warn_report("KVM: invalid context due to notify vmexit");
> > >               if (has_triple_fault_event) {
> > 
> > Adding a warning looks good to me, with that (or in any better form of
> > wording):
> > 
> If no objection, I'll follow Xiaoyao's suggestion to form the wording like:

No objection here.  Thanks.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] i386: Add notify VM exit support
  2022-09-20 13:59           ` Peter Xu
@ 2022-09-21  3:07             ` Chenyi Qiang
  0 siblings, 0 replies; 10+ messages in thread
From: Chenyi Qiang @ 2022-09-21  3:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Xiaoyao Li, qemu-devel, kvm



On 9/20/2022 9:59 PM, Peter Xu wrote:
> On Tue, Sep 20, 2022 at 01:55:20PM +0800, Chenyi Qiang wrote:
>>>> @@ -5213,6 +5213,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run
>>>> *run)
>>>>            break;
>>>>        case KVM_EXIT_NOTIFY:
>>>>            ret = 0;
>>>> +        warn_report_once("KVM: notify window was exceeded in guest");
>>>
>>> Is there more informative way to dump this?  If it's 99% that the guest was
>>> doing something weird and needs attention, maybe worthwhile to point that
>>> out directly to the admin?
>>>
>>
>> Do you mean to use other method to dump the info? i.e. printing a message is
>> not so clear. Or the output message ("KVM: notify window was exceeded in
>> guest") is not obvious and we need other wording.
> 
> I meant something like:
> 
>    KVM received notify exit.  It means there can be possible misbehaves in
>    the guest, please have a look.

Get your point. Then I can print this message behind as well.

Thanks.

> 
> Or something similar.  What I'm worried is the admin may not understand
> what's "notify window" and that message got simply ignored.
> 
> Though I am not even sure whether that's accurate in the wordings.
> 
>>
>>>>            if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
>>>>                warn_report("KVM: invalid context due to notify vmexit");
>>>>                if (has_triple_fault_event) {
>>>
>>> Adding a warning looks good to me, with that (or in any better form of
>>> wording):
>>>
>> If no objection, I'll follow Xiaoyao's suggestion to form the wording like:
> 
> No objection here.  Thanks.
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-09-21  3:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15  9:28 [PATCH v6 0/2] Enable notify VM exit Chenyi Qiang
2022-09-15  9:28 ` [PATCH v6 1/2] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
2022-09-15  9:28 ` [PATCH v6 2/2] i386: Add notify VM exit support Chenyi Qiang
2022-09-16 21:57   ` Peter Xu
2022-09-19  5:46     ` Chenyi Qiang
2022-09-19  6:11       ` Xiaoyao Li
2022-09-19 15:53       ` Peter Xu
2022-09-20  5:55         ` Chenyi Qiang
2022-09-20 13:59           ` Peter Xu
2022-09-21  3:07             ` Chenyi Qiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).