xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events
@ 2020-05-21  2:31 Tamas K Lengyel
  2020-05-21  2:31 ` [PATCH v2 for-4.14 1/3] xen/monitor: Control register values Tamas K Lengyel
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-05-21  2:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Stefano Stabellini,
	Jan Beulich, Alexandru Isaila, Volodymyr Babchuk,
	Roger Pau Monné

For the last couple years we have received numerous reports from users of
monitor vm_events of spurious guest crashes when using events. In particular,
it has observed that the problem occurs when vm_events are being disabled. The
nature of the guest crash varied widely and has only occured occasionally. This
made debugging the issue particularly hard. We had discussions about this issue
even here on the xen-devel mailinglist with no luck figuring it out.

The bug has now been identified as a race-condition between register event
handling and disabling the vm_event interface.

Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register writes
if refused by  vm_event reply" is the patch that introduced the error. In this
patch emulation of register write events can be postponed until the
corresponding vm_event handler decides whether to allow such write to take
place. Unfortunately this can only be implemented by performing the deny/allow
step when the vCPU gets scheduled. Due to that postponed emulation of the event
if the user decides to pause the VM in the vm_event handler and then disable
events, the entire emulation step is skipped the next time the vCPU is resumed.
Even if the user doesn't pause during the vm_event handling but exits
immediately and disables vm_event, the situation becomes racey as disabling
vm_event may succeed before the guest's vCPUs get scheduled with the pending
emulation task. This has been particularly the case with VMS  that have several
vCPUs as after the VM is unpaused it may actually take a long time before all
vCPUs get scheduled.

The only solution currently is to poll each vCPU before vm_events are disabled
to verify they had been scheduled before it is safe to disable vm_events. The
following patches resolve this issue in a much nicer way.

Patch 1 adds an option to the monitor_op domctl that needs to be specified if
    the user wants to actually use the postponed register-write handling
    mechanism. If that option is not specified then handling is performed the
    same way as before patch 96760e2fba100d694300a81baddb5740e0f8c0ee.
    
Patch 2 performs sanity checking when disabling vm_events to determine whether
    its safe to free all vm_event structures. The vCPUs still get unpaused to
    allow them to get scheduled and perform any of their pending operations,
    but otherwise an -EAGAIN error is returned signaling to the user that they
    need to wait and try again disabling the interface.
    
Patch 3 adds a vm_event specifically to signal to the user when it is safe to
    continue disabling the interface.
    
Shout out to our friends at CERT.pl for stumbling upon a crucial piece of
information that lead to finally squashing this nasty bug.

v2: minor adjustments based on Jan's comments

Tamas K Lengyel (3):
  xen/monitor: Control register values
  xen/vm_event: add vm_event_check_pending_op
  xen/vm_event: Add safe to disable vm_event

 xen/arch/x86/hvm/hvm.c            | 63 +++++++++++++++++++++++--------
 xen/arch/x86/hvm/monitor.c        | 14 +++++++
 xen/arch/x86/monitor.c            | 23 ++++++++++-
 xen/arch/x86/vm_event.c           | 23 +++++++++++
 xen/common/vm_event.c             | 17 +++++++--
 xen/include/asm-arm/vm_event.h    |  7 ++++
 xen/include/asm-x86/domain.h      |  2 +
 xen/include/asm-x86/hvm/monitor.h |  1 +
 xen/include/asm-x86/vm_event.h    |  2 +
 xen/include/public/domctl.h       |  3 ++
 xen/include/public/vm_event.h     |  8 ++++
 11 files changed, 144 insertions(+), 19 deletions(-)

-- 
2.26.1



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-05-21  2:31 [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
@ 2020-05-21  2:31 ` Tamas K Lengyel
  2020-06-02 11:08   ` Roger Pau Monné
  2020-05-21  2:31 ` [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op Tamas K Lengyel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Tamas K Lengyel @ 2020-05-21  2:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Stefano Stabellini,
	Jan Beulich, Alexandru Isaila, Roger Pau Monné

Extend the monitor_op domctl to include option that enables
controlling what values certain registers are permitted to hold
by a monitor subscriber.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/arch/x86/hvm/hvm.c       | 25 ++++++++++++++++---------
 xen/arch/x86/monitor.c       | 10 +++++++++-
 xen/include/asm-x86/domain.h |  1 +
 xen/include/public/domctl.h  |  1 +
 4 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 09ee299bc7..e6780c685b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2263,7 +2263,8 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
     {
         ASSERT(v->arch.vm_event);
 
-        if ( hvm_monitor_crX(CR0, value, old_value) )
+        if ( hvm_monitor_crX(CR0, value, old_value) &&
+             v->domain->arch.monitor.control_register_values )
         {
             /* The actual write will occur in hvm_do_resume(), if permitted. */
             v->arch.vm_event->write_data.do_write.cr0 = 1;
@@ -2362,7 +2363,8 @@ int hvm_set_cr3(unsigned long value, bool may_defer)
     {
         ASSERT(v->arch.vm_event);
 
-        if ( hvm_monitor_crX(CR3, value, old) )
+        if ( hvm_monitor_crX(CR3, value, old) &&
+             v->domain->arch.monitor.control_register_values )
         {
             /* The actual write will occur in hvm_do_resume(), if permitted. */
             v->arch.vm_event->write_data.do_write.cr3 = 1;
@@ -2443,7 +2445,8 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
     {
         ASSERT(v->arch.vm_event);
 
-        if ( hvm_monitor_crX(CR4, value, old_cr) )
+        if ( hvm_monitor_crX(CR4, value, old_cr) &&
+             v->domain->arch.monitor.control_register_values )
         {
             /* The actual write will occur in hvm_do_resume(), if permitted. */
             v->arch.vm_event->write_data.do_write.cr4 = 1;
@@ -3587,13 +3590,17 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
 
         ASSERT(v->arch.vm_event);
 
-        /* The actual write will occur in hvm_do_resume() (if permitted). */
-        v->arch.vm_event->write_data.do_write.msr = 1;
-        v->arch.vm_event->write_data.msr = msr;
-        v->arch.vm_event->write_data.value = msr_content;
-
         hvm_monitor_msr(msr, msr_content, msr_old_content);
-        return X86EMUL_OKAY;
+
+        if ( v->domain->arch.monitor.control_register_values )
+        {
+            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            v->arch.vm_event->write_data.do_write.msr = 1;
+            v->arch.vm_event->write_data.msr = msr;
+            v->arch.vm_event->write_data.value = msr_content;
+
+            return X86EMUL_OKAY;
+        }
     }
 
     if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index bbcb7536c7..1517a97f50 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -144,7 +144,15 @@ int arch_monitor_domctl_event(struct domain *d,
                               struct xen_domctl_monitor_op *mop)
 {
     struct arch_domain *ad = &d->arch;
-    bool requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
+    bool requested_status;
+
+    if ( XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS == mop->op )
+    {
+        ad->monitor.control_register_values = true;
+        return 0;
+    }
+
+    requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
 
     switch ( mop->event )
     {
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 5b6d909266..d890ab7a22 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -416,6 +416,7 @@ struct arch_domain
          * This is used to filter out pagefaults.
          */
         unsigned int inguest_pagefault_disabled                            : 1;
+        unsigned int control_register_values                               : 1;
         struct monitor_msr_bitmap *msr_bitmap;
         uint64_t write_ctrlreg_mask[4];
     } monitor;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 1ad34c35eb..cbcd25f12c 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1025,6 +1025,7 @@ struct xen_domctl_psr_cmt_op {
 #define XEN_DOMCTL_MONITOR_OP_DISABLE           1
 #define XEN_DOMCTL_MONITOR_OP_GET_CAPABILITIES  2
 #define XEN_DOMCTL_MONITOR_OP_EMULATE_EACH_REP  3
+#define XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS 4
 
 #define XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG         0
 #define XEN_DOMCTL_MONITOR_EVENT_MOV_TO_MSR            1
-- 
2.26.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op
  2020-05-21  2:31 [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
  2020-05-21  2:31 ` [PATCH v2 for-4.14 1/3] xen/monitor: Control register values Tamas K Lengyel
@ 2020-05-21  2:31 ` Tamas K Lengyel
  2020-06-02 11:47   ` Roger Pau Monné
  2020-05-21  2:31 ` [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event Tamas K Lengyel
  2020-06-01 18:58 ` [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
  3 siblings, 1 reply; 21+ messages in thread
From: Tamas K Lengyel @ 2020-05-21  2:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Andrew Cooper, Stefano Stabellini, Jan Beulich, Alexandru Isaila,
	Volodymyr Babchuk, Roger Pau Monné

Perform sanity checking when shutting vm_event down to determine whether
it is safe to do so. Error out with -EAGAIN in case pending operations
have been found for the domain.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/arch/x86/vm_event.c        | 23 +++++++++++++++++++++++
 xen/common/vm_event.c          | 17 ++++++++++++++---
 xen/include/asm-arm/vm_event.h |  7 +++++++
 xen/include/asm-x86/vm_event.h |  2 ++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 848d69c1b0..a23aadc112 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -297,6 +297,29 @@ void vm_event_emulate_check(struct vcpu *v, vm_event_response_t *rsp)
     };
 }
 
+bool vm_event_check_pending_op(const struct vcpu *v)
+{
+    struct monitor_write_data *w = &v->arch.vm_event->write_data;
+
+    if ( !v->arch.vm_event->sync_event )
+        return false;
+
+    if ( w->do_write.cr0 )
+        return true;
+    if ( w->do_write.cr3 )
+        return true;
+    if ( w->do_write.cr4 )
+        return true;
+    if ( w->do_write.msr )
+        return true;
+    if ( v->arch.vm_event->set_gprs )
+        return true;
+    if ( v->arch.vm_event->emulate_flags )
+        return true;
+
+    return false;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 127f2d58f1..2df327a42c 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -183,6 +183,7 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
     if ( vm_event_check_ring(ved) )
     {
         struct vcpu *v;
+        bool pending_op = false;
 
         spin_lock(&ved->lock);
 
@@ -192,9 +193,6 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
             return -EBUSY;
         }
 
-        /* Free domU's event channel and leave the other one unbound */
-        free_xen_event_channel(d, ved->xen_port);
-
         /* Unblock all vCPUs */
         for_each_vcpu ( d, v )
         {
@@ -203,8 +201,21 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
                 vcpu_unpause(v);
                 ved->blocked--;
             }
+
+            if ( vm_event_check_pending_op(v) )
+                pending_op = true;
         }
 
+        /* vm_event ops are still pending until vCPUs get scheduled */
+        if ( pending_op )
+        {
+            spin_unlock(&ved->lock);
+            return -EAGAIN;
+        }
+
+        /* Free domU's event channel and leave the other one unbound */
+        free_xen_event_channel(d, ved->xen_port);
+
         destroy_ring_for_helper(&ved->ring_page, ved->ring_pg_struct);
 
         vm_event_cleanup_domain(d);
diff --git a/xen/include/asm-arm/vm_event.h b/xen/include/asm-arm/vm_event.h
index 14d1d341cc..978b224dc3 100644
--- a/xen/include/asm-arm/vm_event.h
+++ b/xen/include/asm-arm/vm_event.h
@@ -58,4 +58,11 @@ void vm_event_sync_event(struct vcpu *v, bool value)
     /* Not supported on ARM. */
 }
 
+static inline
+bool vm_event_check_pending_op(const struct vcpu *v)
+{
+    /* Not supported on ARM. */
+    return false;
+}
+
 #endif /* __ASM_ARM_VM_EVENT_H__ */
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index 785e741fba..97860d0d99 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -54,4 +54,6 @@ void vm_event_emulate_check(struct vcpu *v, vm_event_response_t *rsp);
 
 void vm_event_sync_event(struct vcpu *v, bool value);
 
+bool vm_event_check_pending_op(const struct vcpu *v);
+
 #endif /* __ASM_X86_VM_EVENT_H__ */
-- 
2.26.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event
  2020-05-21  2:31 [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
  2020-05-21  2:31 ` [PATCH v2 for-4.14 1/3] xen/monitor: Control register values Tamas K Lengyel
  2020-05-21  2:31 ` [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op Tamas K Lengyel
@ 2020-05-21  2:31 ` Tamas K Lengyel
  2020-06-02 12:54   ` Roger Pau Monné
  2020-06-01 18:58 ` [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
  3 siblings, 1 reply; 21+ messages in thread
From: Tamas K Lengyel @ 2020-05-21  2:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Stefano Stabellini,
	Jan Beulich, Alexandru Isaila, Roger Pau Monné

Instead of having to repeatedly try to disable vm_events, request a specific
vm_event to be sent when the domain is safe to continue with shutting down
the vm_event interface.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/arch/x86/hvm/hvm.c            | 38 ++++++++++++++++++++++++++-----
 xen/arch/x86/hvm/monitor.c        | 14 ++++++++++++
 xen/arch/x86/monitor.c            | 13 +++++++++++
 xen/include/asm-x86/domain.h      |  1 +
 xen/include/asm-x86/hvm/monitor.h |  1 +
 xen/include/public/domctl.h       |  2 ++
 xen/include/public/vm_event.h     |  8 +++++++
 7 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e6780c685b..fc7e1e2b22 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -563,15 +563,41 @@ void hvm_do_resume(struct vcpu *v)
         v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
     }
 
-    if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
+    if ( unlikely(v->arch.vm_event) )
     {
-        struct x86_event info;
+        struct domain *d = v->domain;
+
+        if ( v->arch.monitor.next_interrupt_enabled )
+        {
+            struct x86_event info;
+
+            if ( hvm_get_pending_event(v, &info) )
+            {
+                hvm_monitor_interrupt(info.vector, info.type, info.error_code,
+                                      info.cr2);
+                v->arch.monitor.next_interrupt_enabled = false;
+            }
+        }
 
-        if ( hvm_get_pending_event(v, &info) )
+        if ( d->arch.monitor.safe_to_disable )
         {
-            hvm_monitor_interrupt(info.vector, info.type, info.error_code,
-                                  info.cr2);
-            v->arch.monitor.next_interrupt_enabled = false;
+            const struct vcpu *check_vcpu;
+            bool pending_op = false;
+
+            for_each_vcpu ( d, check_vcpu )
+            {
+                if ( vm_event_check_pending_op(check_vcpu) )
+                {
+                    pending_op = true;
+                    break;
+                }
+            }
+
+            if ( !pending_op )
+            {
+                hvm_monitor_safe_to_disable();
+                d->arch.monitor.safe_to_disable = false;
+            }
         }
     }
 }
diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index f5d89e71d1..75fd1a4b68 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -300,6 +300,20 @@ bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
     return monitor_traps(curr, true, &req) >= 0;
 }
 
+void hvm_monitor_safe_to_disable(void)
+{
+    struct vcpu *curr = current;
+    struct arch_domain *ad = &curr->domain->arch;
+    vm_event_request_t req = {};
+
+    if ( !ad->monitor.safe_to_disable )
+        return;
+
+    req.reason = VM_EVENT_REASON_SAFE_TO_DISABLE;
+
+    monitor_traps(curr, 0, &req);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 1517a97f50..86e0ba2fbc 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -339,6 +339,19 @@ int arch_monitor_domctl_event(struct domain *d,
         break;
     }
 
+    case XEN_DOMCTL_MONITOR_EVENT_SAFE_TO_DISABLE:
+    {
+        bool old_status = ad->monitor.safe_to_disable;
+
+        if ( unlikely(old_status == requested_status) )
+            return -EEXIST;
+
+        domain_pause(d);
+        ad->monitor.safe_to_disable = requested_status;
+        domain_unpause(d);
+        break;
+    }
+
     default:
         /*
          * Should not be reached unless arch_monitor_get_capabilities() is
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d890ab7a22..948b750c71 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -417,6 +417,7 @@ struct arch_domain
          */
         unsigned int inguest_pagefault_disabled                            : 1;
         unsigned int control_register_values                               : 1;
+        unsigned int safe_to_disable                                       : 1;
         struct monitor_msr_bitmap *msr_bitmap;
         uint64_t write_ctrlreg_mask[4];
     } monitor;
diff --git a/xen/include/asm-x86/hvm/monitor.h b/xen/include/asm-x86/hvm/monitor.h
index 66de24cb75..dbc113a635 100644
--- a/xen/include/asm-x86/hvm/monitor.h
+++ b/xen/include/asm-x86/hvm/monitor.h
@@ -52,6 +52,7 @@ bool hvm_monitor_emul_unimplemented(void);
 
 bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
                            uint16_t kind);
+void hvm_monitor_safe_to_disable(void);
 
 #endif /* __ASM_X86_HVM_MONITOR_H__ */
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index cbcd25f12c..247e809a6c 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1040,6 +1040,8 @@ struct xen_domctl_psr_cmt_op {
 #define XEN_DOMCTL_MONITOR_EVENT_EMUL_UNIMPLEMENTED    10
 /* Enabled by default */
 #define XEN_DOMCTL_MONITOR_EVENT_INGUEST_PAGEFAULT     11
+/* Always async, disables automaticaly on first event */
+#define XEN_DOMCTL_MONITOR_EVENT_SAFE_TO_DISABLE       12
 
 struct xen_domctl_monitor_op {
     uint32_t op; /* XEN_DOMCTL_MONITOR_OP_* */
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index fdd3ad8a30..b66d2a8634 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -159,6 +159,14 @@
 #define VM_EVENT_REASON_DESCRIPTOR_ACCESS       13
 /* Current instruction is not implemented by the emulator */
 #define VM_EVENT_REASON_EMUL_UNIMPLEMENTED      14
+/*
+ * When shutting down vm_event it may not be immediately safe to complete the
+ * process as some vCPUs may be pending synchronization. This async event
+ * type can be used to receive a notification when its safe to finish disabling
+ * the vm_event interface. All other event types need to be disabled before
+ * registering to this one.
+ */
+#define VM_EVENT_REASON_SAFE_TO_DISABLE         15
 
 /* Supported values for the vm_event_write_ctrlreg index. */
 #define VM_EVENT_X86_CR0    0
-- 
2.26.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events
  2020-05-21  2:31 [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
                   ` (2 preceding siblings ...)
  2020-05-21  2:31 ` [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event Tamas K Lengyel
@ 2020-06-01 18:58 ` Tamas K Lengyel
  3 siblings, 0 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-01 18:58 UTC (permalink / raw)
  To: Xen-devel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, Volodymyr Babchuk, Roger Pau Monné

On Wed, May 20, 2020 at 8:31 PM Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> For the last couple years we have received numerous reports from users of
> monitor vm_events of spurious guest crashes when using events. In particular,
> it has observed that the problem occurs when vm_events are being disabled. The
> nature of the guest crash varied widely and has only occured occasionally. This
> made debugging the issue particularly hard. We had discussions about this issue
> even here on the xen-devel mailinglist with no luck figuring it out.
>
> The bug has now been identified as a race-condition between register event
> handling and disabling the vm_event interface.
>
> Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register writes
> if refused by  vm_event reply" is the patch that introduced the error. In this
> patch emulation of register write events can be postponed until the
> corresponding vm_event handler decides whether to allow such write to take
> place. Unfortunately this can only be implemented by performing the deny/allow
> step when the vCPU gets scheduled. Due to that postponed emulation of the event
> if the user decides to pause the VM in the vm_event handler and then disable
> events, the entire emulation step is skipped the next time the vCPU is resumed.
> Even if the user doesn't pause during the vm_event handling but exits
> immediately and disables vm_event, the situation becomes racey as disabling
> vm_event may succeed before the guest's vCPUs get scheduled with the pending
> emulation task. This has been particularly the case with VMS  that have several
> vCPUs as after the VM is unpaused it may actually take a long time before all
> vCPUs get scheduled.
>
> The only solution currently is to poll each vCPU before vm_events are disabled
> to verify they had been scheduled before it is safe to disable vm_events. The
> following patches resolve this issue in a much nicer way.
>
> Patch 1 adds an option to the monitor_op domctl that needs to be specified if
>     the user wants to actually use the postponed register-write handling
>     mechanism. If that option is not specified then handling is performed the
>     same way as before patch 96760e2fba100d694300a81baddb5740e0f8c0ee.
>
> Patch 2 performs sanity checking when disabling vm_events to determine whether
>     its safe to free all vm_event structures. The vCPUs still get unpaused to
>     allow them to get scheduled and perform any of their pending operations,
>     but otherwise an -EAGAIN error is returned signaling to the user that they
>     need to wait and try again disabling the interface.
>
> Patch 3 adds a vm_event specifically to signal to the user when it is safe to
>     continue disabling the interface.
>
> Shout out to our friends at CERT.pl for stumbling upon a crucial piece of
> information that lead to finally squashing this nasty bug.
>
> v2: minor adjustments based on Jan's comments

Patch ping.

Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-05-21  2:31 ` [PATCH v2 for-4.14 1/3] xen/monitor: Control register values Tamas K Lengyel
@ 2020-06-02 11:08   ` Roger Pau Monné
  2020-06-02 12:40     ` Tamas K Lengyel
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2020-06-02 11:08 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, xen-devel

On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> Extend the monitor_op domctl to include option that enables
> controlling what values certain registers are permitted to hold
> by a monitor subscriber.

I think the change could benefit for some more detail commit message
here. Why is this useful?

There already seems to be some support for gating MSR writes, which
seems to be expanded by this commit?

Is it solving some kind of bug reported?

> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> ---
>  xen/arch/x86/hvm/hvm.c       | 25 ++++++++++++++++---------
>  xen/arch/x86/monitor.c       | 10 +++++++++-
>  xen/include/asm-x86/domain.h |  1 +
>  xen/include/public/domctl.h  |  1 +
>  4 files changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 09ee299bc7..e6780c685b 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2263,7 +2263,8 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
>      {
>          ASSERT(v->arch.vm_event);
>  
> -        if ( hvm_monitor_crX(CR0, value, old_value) )
> +        if ( hvm_monitor_crX(CR0, value, old_value) &&
> +             v->domain->arch.monitor.control_register_values )
>          {
>              /* The actual write will occur in hvm_do_resume(), if permitted. */
>              v->arch.vm_event->write_data.do_write.cr0 = 1;
> @@ -2362,7 +2363,8 @@ int hvm_set_cr3(unsigned long value, bool may_defer)
>      {
>          ASSERT(v->arch.vm_event);
>  
> -        if ( hvm_monitor_crX(CR3, value, old) )
> +        if ( hvm_monitor_crX(CR3, value, old) &&
> +             v->domain->arch.monitor.control_register_values )
>          {
>              /* The actual write will occur in hvm_do_resume(), if permitted. */
>              v->arch.vm_event->write_data.do_write.cr3 = 1;
> @@ -2443,7 +2445,8 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
>      {
>          ASSERT(v->arch.vm_event);
>  
> -        if ( hvm_monitor_crX(CR4, value, old_cr) )
> +        if ( hvm_monitor_crX(CR4, value, old_cr) &&
> +             v->domain->arch.monitor.control_register_values )

I think you could return control_register_values in hvm_monitor_crX
instead of having to add the check to each caller?

>          {
>              /* The actual write will occur in hvm_do_resume(), if permitted. */
>              v->arch.vm_event->write_data.do_write.cr4 = 1;
> @@ -3587,13 +3590,17 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>  
>          ASSERT(v->arch.vm_event);
>  
> -        /* The actual write will occur in hvm_do_resume() (if permitted). */
> -        v->arch.vm_event->write_data.do_write.msr = 1;
> -        v->arch.vm_event->write_data.msr = msr;
> -        v->arch.vm_event->write_data.value = msr_content;
> -
>          hvm_monitor_msr(msr, msr_content, msr_old_content);
> -        return X86EMUL_OKAY;
> +
> +        if ( v->domain->arch.monitor.control_register_values )

Is there any value in limiting control_register_values to MSR that
represent control registers, like EFER and XSS?

> +        {
> +            /* The actual write will occur in hvm_do_resume(), if permitted. */
> +            v->arch.vm_event->write_data.do_write.msr = 1;
> +            v->arch.vm_event->write_data.msr = msr;
> +            v->arch.vm_event->write_data.value = msr_content;
> +
> +            return X86EMUL_OKAY;
> +        }

You seem to change the previous flow of the function here, that would
just call hvm_monitor_msr and return previously.

Don't you need to move the return from outside the added if condition
in order to keep previous behavior? Or else the write is committed
straight away.

>      }
>  
>      if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
> diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> index bbcb7536c7..1517a97f50 100644
> --- a/xen/arch/x86/monitor.c
> +++ b/xen/arch/x86/monitor.c
> @@ -144,7 +144,15 @@ int arch_monitor_domctl_event(struct domain *d,
>                                struct xen_domctl_monitor_op *mop)
>  {
>      struct arch_domain *ad = &d->arch;
> -    bool requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
> +    bool requested_status;
> +
> +    if ( XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS == mop->op )
> +    {
> +        ad->monitor.control_register_values = true;

I think strictly speaking you need to use 1 here, since this variable
is not a boolean.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op
  2020-05-21  2:31 ` [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op Tamas K Lengyel
@ 2020-06-02 11:47   ` Roger Pau Monné
  2020-06-02 11:50     ` Jan Beulich
  2020-06-02 12:43     ` Tamas K Lengyel
  0 siblings, 2 replies; 21+ messages in thread
From: Roger Pau Monné @ 2020-06-02 11:47 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Jan Beulich, Alexandru Isaila, xen-devel,
	Volodymyr Babchuk

On Wed, May 20, 2020 at 08:31:53PM -0600, Tamas K Lengyel wrote:
> Perform sanity checking when shutting vm_event down to determine whether
> it is safe to do so. Error out with -EAGAIN in case pending operations
> have been found for the domain.
> 
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> ---
>  xen/arch/x86/vm_event.c        | 23 +++++++++++++++++++++++
>  xen/common/vm_event.c          | 17 ++++++++++++++---
>  xen/include/asm-arm/vm_event.h |  7 +++++++
>  xen/include/asm-x86/vm_event.h |  2 ++
>  4 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
> index 848d69c1b0..a23aadc112 100644
> --- a/xen/arch/x86/vm_event.c
> +++ b/xen/arch/x86/vm_event.c
> @@ -297,6 +297,29 @@ void vm_event_emulate_check(struct vcpu *v, vm_event_response_t *rsp)
>      };
>  }
>  
> +bool vm_event_check_pending_op(const struct vcpu *v)
> +{
> +    struct monitor_write_data *w = &v->arch.vm_event->write_data;

const

> +
> +    if ( !v->arch.vm_event->sync_event )
> +        return false;
> +
> +    if ( w->do_write.cr0 )
> +        return true;
> +    if ( w->do_write.cr3 )
> +        return true;
> +    if ( w->do_write.cr4 )
> +        return true;
> +    if ( w->do_write.msr )
> +        return true;
> +    if ( v->arch.vm_event->set_gprs )
> +        return true;
> +    if ( v->arch.vm_event->emulate_flags )
> +        return true;

Can you please group this into a single if, ie:

if ( w->do_write.cr0 || w->do_write.cr3 || ... )
    return true;

> +
> +    return false;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index 127f2d58f1..2df327a42c 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -183,6 +183,7 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>      if ( vm_event_check_ring(ved) )
>      {
>          struct vcpu *v;
> +        bool pending_op = false;
>  
>          spin_lock(&ved->lock);
>  
> @@ -192,9 +193,6 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>              return -EBUSY;
>          }
>  
> -        /* Free domU's event channel and leave the other one unbound */
> -        free_xen_event_channel(d, ved->xen_port);
> -
>          /* Unblock all vCPUs */
>          for_each_vcpu ( d, v )
>          {
> @@ -203,8 +201,21 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>                  vcpu_unpause(v);
>                  ved->blocked--;
>              }
> +
> +            if ( vm_event_check_pending_op(v) )
> +                pending_op = true;

You could just do:

pending_op |= vm_event_check_pending_op(v);

and avoid the initialization of pending_op above. Or alternatively:

if ( !pending_op && vm_event_check_pending_op(v) )
    pending_op = true;

Which avoid repeated calls to vm_event_check_pending_op when at least
one vCPU is known to be busy.

>          }
>  
> +        /* vm_event ops are still pending until vCPUs get scheduled */
> +        if ( pending_op )
> +        {
> +            spin_unlock(&ved->lock);
> +            return -EAGAIN;

What happens when this gets called from vm_event_cleanup?

AFAICT the result there is ignored, and could leak the vm_event
allocated memory?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op
  2020-06-02 11:47   ` Roger Pau Monné
@ 2020-06-02 11:50     ` Jan Beulich
  2020-06-02 12:43     ` Tamas K Lengyel
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2020-06-02 11:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Andrew Cooper, Stefano Stabellini, Alexandru Isaila, xen-devel,
	Volodymyr Babchuk

On 02.06.2020 13:47, Roger Pau Monné wrote:
> On Wed, May 20, 2020 at 08:31:53PM -0600, Tamas K Lengyel wrote:
>> Perform sanity checking when shutting vm_event down to determine whether
>> it is safe to do so. Error out with -EAGAIN in case pending operations
>> have been found for the domain.
>>
>> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
>> ---
>>  xen/arch/x86/vm_event.c        | 23 +++++++++++++++++++++++
>>  xen/common/vm_event.c          | 17 ++++++++++++++---
>>  xen/include/asm-arm/vm_event.h |  7 +++++++
>>  xen/include/asm-x86/vm_event.h |  2 ++
>>  4 files changed, 46 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
>> index 848d69c1b0..a23aadc112 100644
>> --- a/xen/arch/x86/vm_event.c
>> +++ b/xen/arch/x86/vm_event.c
>> @@ -297,6 +297,29 @@ void vm_event_emulate_check(struct vcpu *v, vm_event_response_t *rsp)
>>      };
>>  }
>>  
>> +bool vm_event_check_pending_op(const struct vcpu *v)
>> +{
>> +    struct monitor_write_data *w = &v->arch.vm_event->write_data;
> 
> const
> 
>> +
>> +    if ( !v->arch.vm_event->sync_event )
>> +        return false;
>> +
>> +    if ( w->do_write.cr0 )
>> +        return true;
>> +    if ( w->do_write.cr3 )
>> +        return true;
>> +    if ( w->do_write.cr4 )
>> +        return true;
>> +    if ( w->do_write.msr )
>> +        return true;
>> +    if ( v->arch.vm_event->set_gprs )
>> +        return true;
>> +    if ( v->arch.vm_event->emulate_flags )
>> +        return true;
> 
> Can you please group this into a single if, ie:
> 
> if ( w->do_write.cr0 || w->do_write.cr3 || ... )
>     return true;
> 
>> +
>> +    return false;
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
>> index 127f2d58f1..2df327a42c 100644
>> --- a/xen/common/vm_event.c
>> +++ b/xen/common/vm_event.c
>> @@ -183,6 +183,7 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>>      if ( vm_event_check_ring(ved) )
>>      {
>>          struct vcpu *v;
>> +        bool pending_op = false;
>>  
>>          spin_lock(&ved->lock);
>>  
>> @@ -192,9 +193,6 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>>              return -EBUSY;
>>          }
>>  
>> -        /* Free domU's event channel and leave the other one unbound */
>> -        free_xen_event_channel(d, ved->xen_port);
>> -
>>          /* Unblock all vCPUs */
>>          for_each_vcpu ( d, v )
>>          {
>> @@ -203,8 +201,21 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>>                  vcpu_unpause(v);
>>                  ved->blocked--;
>>              }
>> +
>> +            if ( vm_event_check_pending_op(v) )
>> +                pending_op = true;
> 
> You could just do:
> 
> pending_op |= vm_event_check_pending_op(v);
> 
> and avoid the initialization of pending_op above.

The initialization has to stay, or it couldn't be |= afaict.

> Or alternatively:
> 
> if ( !pending_op && vm_event_check_pending_op(v) )
>     pending_op = true;
> 
> Which avoid repeated calls to vm_event_check_pending_op when at least
> one vCPU is known to be busy.

    if ( !pending_op )
        pending_op = vm_event_check_pending_op(v);

?

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 11:08   ` Roger Pau Monné
@ 2020-06-02 12:40     ` Tamas K Lengyel
  2020-06-02 12:47       ` Jan Beulich
  2020-06-02 13:01       ` Roger Pau Monné
  0 siblings, 2 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 12:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> > Extend the monitor_op domctl to include option that enables
> > controlling what values certain registers are permitted to hold
> > by a monitor subscriber.
>
> I think the change could benefit for some more detail commit message
> here. Why is this useful?

You would have to ask the Bitdefender folks who made the feature. I
don't use it. Here we are just making it optional as it is buggy so it
is disabled by default.

>
> There already seems to be some support for gating MSR writes, which
> seems to be expanded by this commit?

We don't expand on any existing features, we make an existing feature optional.

>
> Is it solving some kind of bug reported?

It does, please take a look at the cover letter.

>
> > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> > ---
> >  xen/arch/x86/hvm/hvm.c       | 25 ++++++++++++++++---------
> >  xen/arch/x86/monitor.c       | 10 +++++++++-
> >  xen/include/asm-x86/domain.h |  1 +
> >  xen/include/public/domctl.h  |  1 +
> >  4 files changed, 27 insertions(+), 10 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 09ee299bc7..e6780c685b 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2263,7 +2263,8 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
> >      {
> >          ASSERT(v->arch.vm_event);
> >
> > -        if ( hvm_monitor_crX(CR0, value, old_value) )
> > +        if ( hvm_monitor_crX(CR0, value, old_value) &&
> > +             v->domain->arch.monitor.control_register_values )
> >          {
> >              /* The actual write will occur in hvm_do_resume(), if permitted. */
> >              v->arch.vm_event->write_data.do_write.cr0 = 1;
> > @@ -2362,7 +2363,8 @@ int hvm_set_cr3(unsigned long value, bool may_defer)
> >      {
> >          ASSERT(v->arch.vm_event);
> >
> > -        if ( hvm_monitor_crX(CR3, value, old) )
> > +        if ( hvm_monitor_crX(CR3, value, old) &&
> > +             v->domain->arch.monitor.control_register_values )
> >          {
> >              /* The actual write will occur in hvm_do_resume(), if permitted. */
> >              v->arch.vm_event->write_data.do_write.cr3 = 1;
> > @@ -2443,7 +2445,8 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
> >      {
> >          ASSERT(v->arch.vm_event);
> >
> > -        if ( hvm_monitor_crX(CR4, value, old_cr) )
> > +        if ( hvm_monitor_crX(CR4, value, old_cr) &&
> > +             v->domain->arch.monitor.control_register_values )
>
> I think you could return control_register_values in hvm_monitor_crX
> instead of having to add the check to each caller?

We could, but this way the code is more consistent.

>
> >          {
> >              /* The actual write will occur in hvm_do_resume(), if permitted. */
> >              v->arch.vm_event->write_data.do_write.cr4 = 1;
> > @@ -3587,13 +3590,17 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> >
> >          ASSERT(v->arch.vm_event);
> >
> > -        /* The actual write will occur in hvm_do_resume() (if permitted). */
> > -        v->arch.vm_event->write_data.do_write.msr = 1;
> > -        v->arch.vm_event->write_data.msr = msr;
> > -        v->arch.vm_event->write_data.value = msr_content;
> > -
> >          hvm_monitor_msr(msr, msr_content, msr_old_content);
> > -        return X86EMUL_OKAY;
> > +
> > +        if ( v->domain->arch.monitor.control_register_values )
>
> Is there any value in limiting control_register_values to MSR that
> represent control registers, like EFER and XSS?

I don't know, you would have to ask Bitdefender about it who made this feature.

>
> > +        {
> > +            /* The actual write will occur in hvm_do_resume(), if permitted. */
> > +            v->arch.vm_event->write_data.do_write.msr = 1;
> > +            v->arch.vm_event->write_data.msr = msr;
> > +            v->arch.vm_event->write_data.value = msr_content;
> > +
> > +            return X86EMUL_OKAY;
> > +        }
>
> You seem to change the previous flow of the function here, that would
> just call hvm_monitor_msr and return previously.
>
> Don't you need to move the return from outside the added if condition
> in order to keep previous behavior? Or else the write is committed
> straight away.

That's exactly what we want to achieve. Postponing the write is buggy.
We want to make that feature optional. Before Bitdefender contributed
that feature writes were always commited straight away, so with this
patch we are actually reverting default behavior to what it was like
to start with.

>
> >      }
> >
> >      if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
> > diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> > index bbcb7536c7..1517a97f50 100644
> > --- a/xen/arch/x86/monitor.c
> > +++ b/xen/arch/x86/monitor.c
> > @@ -144,7 +144,15 @@ int arch_monitor_domctl_event(struct domain *d,
> >                                struct xen_domctl_monitor_op *mop)
> >  {
> >      struct arch_domain *ad = &d->arch;
> > -    bool requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
> > +    bool requested_status;
> > +
> > +    if ( XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS == mop->op )
> > +    {
> > +        ad->monitor.control_register_values = true;
>
> I think strictly speaking you need to use 1 here, since this variable
> is not a boolean.

Sure.

Thanks,
Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op
  2020-06-02 11:47   ` Roger Pau Monné
  2020-06-02 11:50     ` Jan Beulich
@ 2020-06-02 12:43     ` Tamas K Lengyel
  1 sibling, 0 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 12:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Jan Beulich, Alexandru Isaila, Xen-devel,
	Volodymyr Babchuk

On Tue, Jun 2, 2020 at 5:47 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, May 20, 2020 at 08:31:53PM -0600, Tamas K Lengyel wrote:
> > Perform sanity checking when shutting vm_event down to determine whether
> > it is safe to do so. Error out with -EAGAIN in case pending operations
> > have been found for the domain.
> >
> > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> > ---
> >  xen/arch/x86/vm_event.c        | 23 +++++++++++++++++++++++
> >  xen/common/vm_event.c          | 17 ++++++++++++++---
> >  xen/include/asm-arm/vm_event.h |  7 +++++++
> >  xen/include/asm-x86/vm_event.h |  2 ++
> >  4 files changed, 46 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
> > index 848d69c1b0..a23aadc112 100644
> > --- a/xen/arch/x86/vm_event.c
> > +++ b/xen/arch/x86/vm_event.c
> > @@ -297,6 +297,29 @@ void vm_event_emulate_check(struct vcpu *v, vm_event_response_t *rsp)
> >      };
> >  }
> >
> > +bool vm_event_check_pending_op(const struct vcpu *v)
> > +{
> > +    struct monitor_write_data *w = &v->arch.vm_event->write_data;
>
> const
>
> > +
> > +    if ( !v->arch.vm_event->sync_event )
> > +        return false;
> > +
> > +    if ( w->do_write.cr0 )
> > +        return true;
> > +    if ( w->do_write.cr3 )
> > +        return true;
> > +    if ( w->do_write.cr4 )
> > +        return true;
> > +    if ( w->do_write.msr )
> > +        return true;
> > +    if ( v->arch.vm_event->set_gprs )
> > +        return true;
> > +    if ( v->arch.vm_event->emulate_flags )
> > +        return true;
>
> Can you please group this into a single if, ie:
>
> if ( w->do_write.cr0 || w->do_write.cr3 || ... )
>     return true;
>
> > +
> > +    return false;
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> > index 127f2d58f1..2df327a42c 100644
> > --- a/xen/common/vm_event.c
> > +++ b/xen/common/vm_event.c
> > @@ -183,6 +183,7 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
> >      if ( vm_event_check_ring(ved) )
> >      {
> >          struct vcpu *v;
> > +        bool pending_op = false;
> >
> >          spin_lock(&ved->lock);
> >
> > @@ -192,9 +193,6 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
> >              return -EBUSY;
> >          }
> >
> > -        /* Free domU's event channel and leave the other one unbound */
> > -        free_xen_event_channel(d, ved->xen_port);
> > -
> >          /* Unblock all vCPUs */
> >          for_each_vcpu ( d, v )
> >          {
> > @@ -203,8 +201,21 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
> >                  vcpu_unpause(v);
> >                  ved->blocked--;
> >              }
> > +
> > +            if ( vm_event_check_pending_op(v) )
> > +                pending_op = true;
>
> You could just do:
>
> pending_op |= vm_event_check_pending_op(v);
>
> and avoid the initialization of pending_op above. Or alternatively:
>
> if ( !pending_op && vm_event_check_pending_op(v) )
>     pending_op = true;
>
> Which avoid repeated calls to vm_event_check_pending_op when at least
> one vCPU is known to be busy.
>
> >          }
> >
> > +        /* vm_event ops are still pending until vCPUs get scheduled */
> > +        if ( pending_op )
> > +        {
> > +            spin_unlock(&ved->lock);
> > +            return -EAGAIN;
>
> What happens when this gets called from vm_event_cleanup?
>
> AFAICT the result there is ignored, and could leak the vm_event
> allocated memory?

Thanks for the feedback. I'm going to drop this patch at let
Bitdefender pick it up if they feel like fixing their buggy feature.
As things stand for my use-case I only need patch 1 from this series.

Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 12:40     ` Tamas K Lengyel
@ 2020-06-02 12:47       ` Jan Beulich
  2020-06-02 12:51         ` Tamas K Lengyel
  2020-06-02 13:01       ` Roger Pau Monné
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2020-06-02 12:47 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Alexandru Isaila,
	Xen-devel, Roger Pau Monné

On 02.06.2020 14:40, Tamas K Lengyel wrote:
> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>
>> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
>>> Extend the monitor_op domctl to include option that enables
>>> controlling what values certain registers are permitted to hold
>>> by a monitor subscriber.
>>
>> I think the change could benefit for some more detail commit message
>> here. Why is this useful?
> 
> You would have to ask the Bitdefender folks who made the feature. I
> don't use it. Here we are just making it optional as it is buggy so it
> is disabled by default.

Now that's exactly the opposite of what I had derived from the
description here so far. Perhaps an at least weak indication
that you want to reword this. For example, from your reply to
Roger I understand it's rather that the new flag allows to
"suppress" the controlling (since presumably you don't change
default behavior), rather then "enabling" it.

>> There already seems to be some support for gating MSR writes, which
>> seems to be expanded by this commit?
> 
> We don't expand on any existing features, we make an existing feature optional.
> 
>>
>> Is it solving some kind of bug reported?
> 
> It does, please take a look at the cover letter.

Please can such important aspects also go in the commit message,
so they're available later without needing to hunt down mail
threads (besides this way being more readily available to
reviewers)?

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 12:47       ` Jan Beulich
@ 2020-06-02 12:51         ` Tamas K Lengyel
  2020-06-02 13:00           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 12:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Alexandru Isaila,
	Xen-devel, Roger Pau Monné

On Tue, Jun 2, 2020 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.06.2020 14:40, Tamas K Lengyel wrote:
> > On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>
> >> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> >>> Extend the monitor_op domctl to include option that enables
> >>> controlling what values certain registers are permitted to hold
> >>> by a monitor subscriber.
> >>
> >> I think the change could benefit for some more detail commit message
> >> here. Why is this useful?
> >
> > You would have to ask the Bitdefender folks who made the feature. I
> > don't use it. Here we are just making it optional as it is buggy so it
> > is disabled by default.
>
> Now that's exactly the opposite of what I had derived from the
> description here so far. Perhaps an at least weak indication
> that you want to reword this. For example, from your reply to
> Roger I understand it's rather that the new flag allows to
> "suppress" the controlling (since presumably you don't change
> default behavior), rather then "enabling" it.

What we are adding is a domctl you need to call that enables this
feature. It's not an option to suppress it. It shouldn't have been
enabled by default to begin with. That was a mistake when the feature
was contributed and it is buggy.

>
> >> There already seems to be some support for gating MSR writes, which
> >> seems to be expanded by this commit?
> >
> > We don't expand on any existing features, we make an existing feature optional.
> >
> >>
> >> Is it solving some kind of bug reported?
> >
> > It does, please take a look at the cover letter.
>
> Please can such important aspects also go in the commit message,
> so they're available later without needing to hunt down mail
> threads (besides this way being more readily available to
> reviewers)?

Noted.

Thanks,
Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event
  2020-05-21  2:31 ` [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event Tamas K Lengyel
@ 2020-06-02 12:54   ` Roger Pau Monné
  2020-06-02 13:06     ` Tamas K Lengyel
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2020-06-02 12:54 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, xen-devel

On Wed, May 20, 2020 at 08:31:54PM -0600, Tamas K Lengyel wrote:
> Instead of having to repeatedly try to disable vm_events,

Why not use a hypercall continuation instead so that this is all
hidden from the caller?

I take that the current interface requires the user to repeatedly
issue hypercalls in order to disable vm_events until one of those
succeeds?

> request a specific
> vm_event to be sent when the domain is safe to continue with shutting down
> the vm_event interface.
> 
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> ---
>  xen/arch/x86/hvm/hvm.c            | 38 ++++++++++++++++++++++++++-----
>  xen/arch/x86/hvm/monitor.c        | 14 ++++++++++++
>  xen/arch/x86/monitor.c            | 13 +++++++++++
>  xen/include/asm-x86/domain.h      |  1 +
>  xen/include/asm-x86/hvm/monitor.h |  1 +
>  xen/include/public/domctl.h       |  2 ++
>  xen/include/public/vm_event.h     |  8 +++++++
>  7 files changed, 71 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index e6780c685b..fc7e1e2b22 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -563,15 +563,41 @@ void hvm_do_resume(struct vcpu *v)
>          v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
>      }
>  
> -    if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
> +    if ( unlikely(v->arch.vm_event) )
>      {
> -        struct x86_event info;
> +        struct domain *d = v->domain;
> +
> +        if ( v->arch.monitor.next_interrupt_enabled )
> +        {
> +            struct x86_event info;
> +
> +            if ( hvm_get_pending_event(v, &info) )
> +            {
> +                hvm_monitor_interrupt(info.vector, info.type, info.error_code,
> +                                      info.cr2);
> +                v->arch.monitor.next_interrupt_enabled = false;
> +            }
> +        }
>  
> -        if ( hvm_get_pending_event(v, &info) )
> +        if ( d->arch.monitor.safe_to_disable )
>          {
> -            hvm_monitor_interrupt(info.vector, info.type, info.error_code,
> -                                  info.cr2);
> -            v->arch.monitor.next_interrupt_enabled = false;
> +            const struct vcpu *check_vcpu;
> +            bool pending_op = false;
> +
> +            for_each_vcpu ( d, check_vcpu )
> +            {
> +                if ( vm_event_check_pending_op(check_vcpu) )

Don't you need some kind of lock here, since you are poking at another
vCPU which could be modifying any of those bits?

> +                {
> +                    pending_op = true;
> +                    break;
> +                }
> +            }
> +
> +            if ( !pending_op )
> +            {
> +                hvm_monitor_safe_to_disable();
> +                d->arch.monitor.safe_to_disable = false;
> +            }
>          }
>      }
>  }
> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> index f5d89e71d1..75fd1a4b68 100644
> --- a/xen/arch/x86/hvm/monitor.c
> +++ b/xen/arch/x86/hvm/monitor.c
> @@ -300,6 +300,20 @@ bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
>      return monitor_traps(curr, true, &req) >= 0;
>  }
>  
> +void hvm_monitor_safe_to_disable(void)
> +{
> +    struct vcpu *curr = current;
> +    struct arch_domain *ad = &curr->domain->arch;

const

> +    vm_event_request_t req = {};
> +
> +    if ( !ad->monitor.safe_to_disable )
> +        return;

Should this rather be an ASSERT? I don't think you are supposed to
call hvm_monitor_safe_to_disable when the bit is not set?

> +
> +    req.reason = VM_EVENT_REASON_SAFE_TO_DISABLE;

I think you cat set the field at definition time.

> +
> +    monitor_traps(curr, 0, &req);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> index 1517a97f50..86e0ba2fbc 100644
> --- a/xen/arch/x86/monitor.c
> +++ b/xen/arch/x86/monitor.c
> @@ -339,6 +339,19 @@ int arch_monitor_domctl_event(struct domain *d,
>          break;
>      }
>  
> +    case XEN_DOMCTL_MONITOR_EVENT_SAFE_TO_DISABLE:
> +    {
> +        bool old_status = ad->monitor.safe_to_disable;
> +
> +        if ( unlikely(old_status == requested_status) )
> +            return -EEXIST;
> +
> +        domain_pause(d);
> +        ad->monitor.safe_to_disable = requested_status;

Maybe I'm missing something, but I don't see any check that others
events are disabled before safe_to_disable is set?

In the same way, you should prevent setting any events when
safe_to_disable is set IMO, likely returning -EBUSY in both cases.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 12:51         ` Tamas K Lengyel
@ 2020-06-02 13:00           ` Jan Beulich
  2020-06-02 13:10             ` Tamas K Lengyel
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2020-06-02 13:00 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Alexandru Isaila,
	Xen-devel, Roger Pau Monné

On 02.06.2020 14:51, Tamas K Lengyel wrote:
> On Tue, Jun 2, 2020 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 02.06.2020 14:40, Tamas K Lengyel wrote:
>>> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>
>>>> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
>>>>> Extend the monitor_op domctl to include option that enables
>>>>> controlling what values certain registers are permitted to hold
>>>>> by a monitor subscriber.
>>>>
>>>> I think the change could benefit for some more detail commit message
>>>> here. Why is this useful?
>>>
>>> You would have to ask the Bitdefender folks who made the feature. I
>>> don't use it. Here we are just making it optional as it is buggy so it
>>> is disabled by default.
>>
>> Now that's exactly the opposite of what I had derived from the
>> description here so far. Perhaps an at least weak indication
>> that you want to reword this. For example, from your reply to
>> Roger I understand it's rather that the new flag allows to
>> "suppress" the controlling (since presumably you don't change
>> default behavior), rather then "enabling" it.
> 
> What we are adding is a domctl you need to call that enables this
> feature. It's not an option to suppress it. It shouldn't have been
> enabled by default to begin with. That was a mistake when the feature
> was contributed and it is buggy.

Okay, in this case it's important to point out that you alter
default behavior. The BitDefender folks may not like this, yet
they've been surprisingly silent so far.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 12:40     ` Tamas K Lengyel
  2020-06-02 12:47       ` Jan Beulich
@ 2020-06-02 13:01       ` Roger Pau Monné
  2020-06-02 13:04         ` Jan Beulich
  2020-06-02 13:09         ` Tamas K Lengyel
  1 sibling, 2 replies; 21+ messages in thread
From: Roger Pau Monné @ 2020-06-02 13:01 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Tue, Jun 02, 2020 at 06:40:12AM -0600, Tamas K Lengyel wrote:
> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> > > Extend the monitor_op domctl to include option that enables
> > > controlling what values certain registers are permitted to hold
> > > by a monitor subscriber.
> >
> > I think the change could benefit for some more detail commit message
> > here. Why is this useful?
> 
> You would have to ask the Bitdefender folks who made the feature. I
> don't use it. Here we are just making it optional as it is buggy so it
> is disabled by default.
> 
> >
> > There already seems to be some support for gating MSR writes, which
> > seems to be expanded by this commit?
> 
> We don't expand on any existing features, we make an existing feature optional.
> 
> >
> > Is it solving some kind of bug reported?
> 
> It does, please take a look at the cover letter.

Please copy the relevant bits here for reference.

> >
> > > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> > > ---
> > >  xen/arch/x86/hvm/hvm.c       | 25 ++++++++++++++++---------
> > >  xen/arch/x86/monitor.c       | 10 +++++++++-
> > >  xen/include/asm-x86/domain.h |  1 +
> > >  xen/include/public/domctl.h  |  1 +
> > >  4 files changed, 27 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > > index 09ee299bc7..e6780c685b 100644
> > > --- a/xen/arch/x86/hvm/hvm.c
> > > +++ b/xen/arch/x86/hvm/hvm.c
> > > @@ -2263,7 +2263,8 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
> > >      {
> > >          ASSERT(v->arch.vm_event);
> > >
> > > -        if ( hvm_monitor_crX(CR0, value, old_value) )
> > > +        if ( hvm_monitor_crX(CR0, value, old_value) &&
> > > +             v->domain->arch.monitor.control_register_values )
> > >          {
> > >              /* The actual write will occur in hvm_do_resume(), if permitted. */
> > >              v->arch.vm_event->write_data.do_write.cr0 = 1;
> > > @@ -2362,7 +2363,8 @@ int hvm_set_cr3(unsigned long value, bool may_defer)
> > >      {
> > >          ASSERT(v->arch.vm_event);
> > >
> > > -        if ( hvm_monitor_crX(CR3, value, old) )
> > > +        if ( hvm_monitor_crX(CR3, value, old) &&
> > > +             v->domain->arch.monitor.control_register_values )
> > >          {
> > >              /* The actual write will occur in hvm_do_resume(), if permitted. */
> > >              v->arch.vm_event->write_data.do_write.cr3 = 1;
> > > @@ -2443,7 +2445,8 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
> > >      {
> > >          ASSERT(v->arch.vm_event);
> > >
> > > -        if ( hvm_monitor_crX(CR4, value, old_cr) )
> > > +        if ( hvm_monitor_crX(CR4, value, old_cr) &&
> > > +             v->domain->arch.monitor.control_register_values )
> >
> > I think you could return control_register_values in hvm_monitor_crX
> > instead of having to add the check to each caller?
> 
> We could, but this way the code is more consistent.

OK, I guess it's a matter of taste. I would rather prefer those checks
to be confined to hvm_monitor_crX because then the generic code is not
polluted with monitor checks, but that's likely just my taste.

> >
> > >          {
> > >              /* The actual write will occur in hvm_do_resume(), if permitted. */
> > >              v->arch.vm_event->write_data.do_write.cr4 = 1;
> > > @@ -3587,13 +3590,17 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> > >
> > >          ASSERT(v->arch.vm_event);
> > >
> > > -        /* The actual write will occur in hvm_do_resume() (if permitted). */
> > > -        v->arch.vm_event->write_data.do_write.msr = 1;
> > > -        v->arch.vm_event->write_data.msr = msr;
> > > -        v->arch.vm_event->write_data.value = msr_content;
> > > -
> > >          hvm_monitor_msr(msr, msr_content, msr_old_content);
> > > -        return X86EMUL_OKAY;
> > > +
> > > +        if ( v->domain->arch.monitor.control_register_values )
> >
> > Is there any value in limiting control_register_values to MSR that
> > represent control registers, like EFER and XSS?
> 
> I don't know, you would have to ask Bitdefender about it who made this feature.
> 
> >
> > > +        {
> > > +            /* The actual write will occur in hvm_do_resume(), if permitted. */
> > > +            v->arch.vm_event->write_data.do_write.msr = 1;
> > > +            v->arch.vm_event->write_data.msr = msr;
> > > +            v->arch.vm_event->write_data.value = msr_content;
> > > +
> > > +            return X86EMUL_OKAY;
> > > +        }
> >
> > You seem to change the previous flow of the function here, that would
> > just call hvm_monitor_msr and return previously.
> >
> > Don't you need to move the return from outside the added if condition
> > in order to keep previous behavior? Or else the write is committed
> > straight away.
> 
> That's exactly what we want to achieve. Postponing the write is buggy.
> We want to make that feature optional. Before Bitdefender contributed
> that feature writes were always commited straight away, so with this
> patch we are actually reverting default behavior to what it was like
> to start with.

Oh, could this be made clear on the commit message then?

When I first saw the code I assumed this was wrong (I'm likely not
familiar enough with the code anyway).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 13:01       ` Roger Pau Monné
@ 2020-06-02 13:04         ` Jan Beulich
  2020-06-02 13:07           ` Tamas K Lengyel
  2020-06-02 13:09         ` Tamas K Lengyel
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2020-06-02 13:04 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Alexandru Isaila,
	Xen-devel, Roger Pau Monné

On 02.06.2020 15:01, Roger Pau Monné wrote:
> On Tue, Jun 02, 2020 at 06:40:12AM -0600, Tamas K Lengyel wrote:
>> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -2263,7 +2263,8 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
>>>>      {
>>>>          ASSERT(v->arch.vm_event);
>>>>
>>>> -        if ( hvm_monitor_crX(CR0, value, old_value) )
>>>> +        if ( hvm_monitor_crX(CR0, value, old_value) &&
>>>> +             v->domain->arch.monitor.control_register_values )
>>>>          {
>>>>              /* The actual write will occur in hvm_do_resume(), if permitted. */
>>>>              v->arch.vm_event->write_data.do_write.cr0 = 1;
>>>> @@ -2362,7 +2363,8 @@ int hvm_set_cr3(unsigned long value, bool may_defer)
>>>>      {
>>>>          ASSERT(v->arch.vm_event);
>>>>
>>>> -        if ( hvm_monitor_crX(CR3, value, old) )
>>>> +        if ( hvm_monitor_crX(CR3, value, old) &&
>>>> +             v->domain->arch.monitor.control_register_values )
>>>>          {
>>>>              /* The actual write will occur in hvm_do_resume(), if permitted. */
>>>>              v->arch.vm_event->write_data.do_write.cr3 = 1;
>>>> @@ -2443,7 +2445,8 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
>>>>      {
>>>>          ASSERT(v->arch.vm_event);
>>>>
>>>> -        if ( hvm_monitor_crX(CR4, value, old_cr) )
>>>> +        if ( hvm_monitor_crX(CR4, value, old_cr) &&
>>>> +             v->domain->arch.monitor.control_register_values )
>>>
>>> I think you could return control_register_values in hvm_monitor_crX
>>> instead of having to add the check to each caller?
>>
>> We could, but this way the code is more consistent.
> 
> OK, I guess it's a matter of taste. I would rather prefer those checks
> to be confined to hvm_monitor_crX because then the generic code is not
> polluted with monitor checks, but that's likely just my taste.

+1

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event
  2020-06-02 12:54   ` Roger Pau Monné
@ 2020-06-02 13:06     ` Tamas K Lengyel
  0 siblings, 0 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 13:06 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Tue, Jun 2, 2020 at 6:54 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, May 20, 2020 at 08:31:54PM -0600, Tamas K Lengyel wrote:
> > Instead of having to repeatedly try to disable vm_events,
>
> Why not use a hypercall continuation instead so that this is all
> hidden from the caller?
>
> I take that the current interface requires the user to repeatedly
> issue hypercalls in order to disable vm_events until one of those
> succeeds?

No, it succeeds right away. And then the guest crashes in unique and
unpredictable ways.

>
> > request a specific
> > vm_event to be sent when the domain is safe to continue with shutting down
> > the vm_event interface.
> >
> > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> > ---
> >  xen/arch/x86/hvm/hvm.c            | 38 ++++++++++++++++++++++++++-----
> >  xen/arch/x86/hvm/monitor.c        | 14 ++++++++++++
> >  xen/arch/x86/monitor.c            | 13 +++++++++++
> >  xen/include/asm-x86/domain.h      |  1 +
> >  xen/include/asm-x86/hvm/monitor.h |  1 +
> >  xen/include/public/domctl.h       |  2 ++
> >  xen/include/public/vm_event.h     |  8 +++++++
> >  7 files changed, 71 insertions(+), 6 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index e6780c685b..fc7e1e2b22 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -563,15 +563,41 @@ void hvm_do_resume(struct vcpu *v)
> >          v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
> >      }
> >
> > -    if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
> > +    if ( unlikely(v->arch.vm_event) )
> >      {
> > -        struct x86_event info;
> > +        struct domain *d = v->domain;
> > +
> > +        if ( v->arch.monitor.next_interrupt_enabled )
> > +        {
> > +            struct x86_event info;
> > +
> > +            if ( hvm_get_pending_event(v, &info) )
> > +            {
> > +                hvm_monitor_interrupt(info.vector, info.type, info.error_code,
> > +                                      info.cr2);
> > +                v->arch.monitor.next_interrupt_enabled = false;
> > +            }
> > +        }
> >
> > -        if ( hvm_get_pending_event(v, &info) )
> > +        if ( d->arch.monitor.safe_to_disable )
> >          {
> > -            hvm_monitor_interrupt(info.vector, info.type, info.error_code,
> > -                                  info.cr2);
> > -            v->arch.monitor.next_interrupt_enabled = false;
> > +            const struct vcpu *check_vcpu;
> > +            bool pending_op = false;
> > +
> > +            for_each_vcpu ( d, check_vcpu )
> > +            {
> > +                if ( vm_event_check_pending_op(check_vcpu) )
>
> Don't you need some kind of lock here, since you are poking at another
> vCPU which could be modifying any of those bits?
>
> > +                {
> > +                    pending_op = true;
> > +                    break;
> > +                }
> > +            }
> > +
> > +            if ( !pending_op )
> > +            {
> > +                hvm_monitor_safe_to_disable();
> > +                d->arch.monitor.safe_to_disable = false;
> > +            }
> >          }
> >      }
> >  }
> > diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> > index f5d89e71d1..75fd1a4b68 100644
> > --- a/xen/arch/x86/hvm/monitor.c
> > +++ b/xen/arch/x86/hvm/monitor.c
> > @@ -300,6 +300,20 @@ bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
> >      return monitor_traps(curr, true, &req) >= 0;
> >  }
> >
> > +void hvm_monitor_safe_to_disable(void)
> > +{
> > +    struct vcpu *curr = current;
> > +    struct arch_domain *ad = &curr->domain->arch;
>
> const
>
> > +    vm_event_request_t req = {};
> > +
> > +    if ( !ad->monitor.safe_to_disable )
> > +        return;
>
> Should this rather be an ASSERT? I don't think you are supposed to
> call hvm_monitor_safe_to_disable when the bit is not set?
>
> > +
> > +    req.reason = VM_EVENT_REASON_SAFE_TO_DISABLE;
>
> I think you cat set the field at definition time.
>
> > +
> > +    monitor_traps(curr, 0, &req);
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> > index 1517a97f50..86e0ba2fbc 100644
> > --- a/xen/arch/x86/monitor.c
> > +++ b/xen/arch/x86/monitor.c
> > @@ -339,6 +339,19 @@ int arch_monitor_domctl_event(struct domain *d,
> >          break;
> >      }
> >
> > +    case XEN_DOMCTL_MONITOR_EVENT_SAFE_TO_DISABLE:
> > +    {
> > +        bool old_status = ad->monitor.safe_to_disable;
> > +
> > +        if ( unlikely(old_status == requested_status) )
> > +            return -EEXIST;
> > +
> > +        domain_pause(d);
> > +        ad->monitor.safe_to_disable = requested_status;
>
> Maybe I'm missing something, but I don't see any check that others
> events are disabled before safe_to_disable is set?
>
> In the same way, you should prevent setting any events when
> safe_to_disable is set IMO, likely returning -EBUSY in both cases.
>
> Thanks, Roger.

Thanks for the feedback again. I won't have the bandwidth to address
these so I'm dropping this patch. If Bitdefender is so inclined to
pick-up later they are welcome to do so. This is only needed if their
buggy feature is enabled.

Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 13:04         ` Jan Beulich
@ 2020-06-02 13:07           ` Tamas K Lengyel
  0 siblings, 0 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 13:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Alexandru Isaila,
	Xen-devel, Roger Pau Monné

On Tue, Jun 2, 2020 at 7:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.06.2020 15:01, Roger Pau Monné wrote:
> > On Tue, Jun 02, 2020 at 06:40:12AM -0600, Tamas K Lengyel wrote:
> >> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> >>>> --- a/xen/arch/x86/hvm/hvm.c
> >>>> +++ b/xen/arch/x86/hvm/hvm.c
> >>>> @@ -2263,7 +2263,8 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
> >>>>      {
> >>>>          ASSERT(v->arch.vm_event);
> >>>>
> >>>> -        if ( hvm_monitor_crX(CR0, value, old_value) )
> >>>> +        if ( hvm_monitor_crX(CR0, value, old_value) &&
> >>>> +             v->domain->arch.monitor.control_register_values )
> >>>>          {
> >>>>              /* The actual write will occur in hvm_do_resume(), if permitted. */
> >>>>              v->arch.vm_event->write_data.do_write.cr0 = 1;
> >>>> @@ -2362,7 +2363,8 @@ int hvm_set_cr3(unsigned long value, bool may_defer)
> >>>>      {
> >>>>          ASSERT(v->arch.vm_event);
> >>>>
> >>>> -        if ( hvm_monitor_crX(CR3, value, old) )
> >>>> +        if ( hvm_monitor_crX(CR3, value, old) &&
> >>>> +             v->domain->arch.monitor.control_register_values )
> >>>>          {
> >>>>              /* The actual write will occur in hvm_do_resume(), if permitted. */
> >>>>              v->arch.vm_event->write_data.do_write.cr3 = 1;
> >>>> @@ -2443,7 +2445,8 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
> >>>>      {
> >>>>          ASSERT(v->arch.vm_event);
> >>>>
> >>>> -        if ( hvm_monitor_crX(CR4, value, old_cr) )
> >>>> +        if ( hvm_monitor_crX(CR4, value, old_cr) &&
> >>>> +             v->domain->arch.monitor.control_register_values )
> >>>
> >>> I think you could return control_register_values in hvm_monitor_crX
> >>> instead of having to add the check to each caller?
> >>
> >> We could, but this way the code is more consistent.
> >
> > OK, I guess it's a matter of taste. I would rather prefer those checks
> > to be confined to hvm_monitor_crX because then the generic code is not
> > polluted with monitor checks, but that's likely just my taste.
>
> +1


OK.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 13:01       ` Roger Pau Monné
  2020-06-02 13:04         ` Jan Beulich
@ 2020-06-02 13:09         ` Tamas K Lengyel
  1 sibling, 0 replies; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 13:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Tue, Jun 2, 2020 at 7:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Jun 02, 2020 at 06:40:12AM -0600, Tamas K Lengyel wrote:
> > On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> > > > Extend the monitor_op domctl to include option that enables
> > > > controlling what values certain registers are permitted to hold
> > > > by a monitor subscriber.
> > >
> > > I think the change could benefit for some more detail commit message
> > > here. Why is this useful?
> >
> > You would have to ask the Bitdefender folks who made the feature. I
> > don't use it. Here we are just making it optional as it is buggy so it
> > is disabled by default.
> >
> > >
> > > There already seems to be some support for gating MSR writes, which
> > > seems to be expanded by this commit?
> >
> > We don't expand on any existing features, we make an existing feature optional.
> >
> > >
> > > Is it solving some kind of bug reported?
> >
> > It does, please take a look at the cover letter.
>
> Please copy the relevant bits here for reference.

Sure. As things stand I'm dropping patch 2 & 3 anyway so I'll just use
the cover letter as the commit message.

Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 13:00           ` Jan Beulich
@ 2020-06-02 13:10             ` Tamas K Lengyel
  2020-06-03  8:04               ` Roger Pau Monné
  0 siblings, 1 reply; 21+ messages in thread
From: Tamas K Lengyel @ 2020-06-02 13:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Petre Pircalabu, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Alexandru Isaila,
	Xen-devel, Roger Pau Monné

On Tue, Jun 2, 2020 at 7:00 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.06.2020 14:51, Tamas K Lengyel wrote:
> > On Tue, Jun 2, 2020 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 02.06.2020 14:40, Tamas K Lengyel wrote:
> >>> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>>
> >>>> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> >>>>> Extend the monitor_op domctl to include option that enables
> >>>>> controlling what values certain registers are permitted to hold
> >>>>> by a monitor subscriber.
> >>>>
> >>>> I think the change could benefit for some more detail commit message
> >>>> here. Why is this useful?
> >>>
> >>> You would have to ask the Bitdefender folks who made the feature. I
> >>> don't use it. Here we are just making it optional as it is buggy so it
> >>> is disabled by default.
> >>
> >> Now that's exactly the opposite of what I had derived from the
> >> description here so far. Perhaps an at least weak indication
> >> that you want to reword this. For example, from your reply to
> >> Roger I understand it's rather that the new flag allows to
> >> "suppress" the controlling (since presumably you don't change
> >> default behavior), rather then "enabling" it.
> >
> > What we are adding is a domctl you need to call that enables this
> > feature. It's not an option to suppress it. It shouldn't have been
> > enabled by default to begin with. That was a mistake when the feature
> > was contributed and it is buggy.
>
> Okay, in this case it's important to point out that you alter
> default behavior. The BitDefender folks may not like this, yet
> they've been surprisingly silent so far.

Well, it was Bitdefender who altered the default behavior. We are
reverting their mistake and making it optional. But I can certainly
make that more clear.

Tamas


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 for-4.14 1/3] xen/monitor: Control register values
  2020-06-02 13:10             ` Tamas K Lengyel
@ 2020-06-03  8:04               ` Roger Pau Monné
  0 siblings, 0 replies; 21+ messages in thread
From: Roger Pau Monné @ 2020-06-03  8:04 UTC (permalink / raw)
  To: Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel

On Tue, Jun 02, 2020 at 07:10:07AM -0600, Tamas K Lengyel wrote:
> On Tue, Jun 2, 2020 at 7:00 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 02.06.2020 14:51, Tamas K Lengyel wrote:
> > > On Tue, Jun 2, 2020 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> > >>
> > >> On 02.06.2020 14:40, Tamas K Lengyel wrote:
> > >>> On Tue, Jun 2, 2020 at 5:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >>>>
> > >>>> On Wed, May 20, 2020 at 08:31:52PM -0600, Tamas K Lengyel wrote:
> > >>>>> Extend the monitor_op domctl to include option that enables
> > >>>>> controlling what values certain registers are permitted to hold
> > >>>>> by a monitor subscriber.
> > >>>>
> > >>>> I think the change could benefit for some more detail commit message
> > >>>> here. Why is this useful?
> > >>>
> > >>> You would have to ask the Bitdefender folks who made the feature. I
> > >>> don't use it. Here we are just making it optional as it is buggy so it
> > >>> is disabled by default.
> > >>
> > >> Now that's exactly the opposite of what I had derived from the
> > >> description here so far. Perhaps an at least weak indication
> > >> that you want to reword this. For example, from your reply to
> > >> Roger I understand it's rather that the new flag allows to
> > >> "suppress" the controlling (since presumably you don't change
> > >> default behavior), rather then "enabling" it.
> > >
> > > What we are adding is a domctl you need to call that enables this
> > > feature. It's not an option to suppress it. It shouldn't have been
> > > enabled by default to begin with. That was a mistake when the feature
> > > was contributed and it is buggy.
> >
> > Okay, in this case it's important to point out that you alter
> > default behavior. The BitDefender folks may not like this, yet
> > they've been surprisingly silent so far.
> 
> Well, it was Bitdefender who altered the default behavior. We are
> reverting their mistake and making it optional. But I can certainly
> make that more clear.

I would like some input from the Bitdefender guys. Is there a bugfix
planned for this feature?

It would be nice to get this in before 4.14 releases.

Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-06-03  8:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-21  2:31 [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel
2020-05-21  2:31 ` [PATCH v2 for-4.14 1/3] xen/monitor: Control register values Tamas K Lengyel
2020-06-02 11:08   ` Roger Pau Monné
2020-06-02 12:40     ` Tamas K Lengyel
2020-06-02 12:47       ` Jan Beulich
2020-06-02 12:51         ` Tamas K Lengyel
2020-06-02 13:00           ` Jan Beulich
2020-06-02 13:10             ` Tamas K Lengyel
2020-06-03  8:04               ` Roger Pau Monné
2020-06-02 13:01       ` Roger Pau Monné
2020-06-02 13:04         ` Jan Beulich
2020-06-02 13:07           ` Tamas K Lengyel
2020-06-02 13:09         ` Tamas K Lengyel
2020-05-21  2:31 ` [PATCH v2 for-4.14 2/3] xen/vm_event: add vm_event_check_pending_op Tamas K Lengyel
2020-06-02 11:47   ` Roger Pau Monné
2020-06-02 11:50     ` Jan Beulich
2020-06-02 12:43     ` Tamas K Lengyel
2020-05-21  2:31 ` [PATCH v2 for-4.14 3/3] xen/vm_event: Add safe to disable vm_event Tamas K Lengyel
2020-06-02 12:54   ` Roger Pau Monné
2020-06-02 13:06     ` Tamas K Lengyel
2020-06-01 18:58 ` [PATCH v2 for-4.14 0/3] vm_event: fix race-condition when disabling monitor events Tamas K Lengyel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).