All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] x86/vm-event: Adjustments & fixes
@ 2016-06-30 18:40 Corneliu ZUZU
  2016-06-30 18:41 ` [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume Corneliu ZUZU
                   ` (7 more replies)
  0 siblings, 8 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Tamas K Lengyel, Jan Beulich, Razvan Cojocaru,
	George Dunlap, Andrew Cooper, Julien Grall, Paul Durrant,
	Stefano Stabellini, Jun Nakajima

This patch-series makes some adjustments and fixes to the X86 vm-events code.
Summary:
    1. proper checks for vCPU pause @ vm_event_resume
    2. minor optimization
    3. relocate some code into added vm-event functions
    4. turn monitor_write_data.do_write into enum
    5. fix monitor_write_data behavior on domain cleanup
    6. surround VM_EVENT_REASON_MOV_TO_MSR w/ #ifdef CONFIG_X86
    7+8. minor fixes (cleanup stuff)

Corneliu ZUZU (8):
  x86/vm-event: proper vCPU-paused checks at resume
  x86/vmx_update_guest_cr: minor optimization
  x86/vm-event/monitor: relocate code-motion more appropriately
  x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  x86/vm-event/monitor: don't compromise monitor_write_data on domain
    cleanup
  x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86
  minor fixes (formatting, comments, unused includes etc.)
  minor #include change

 xen/arch/arm/domain.c             |   1 -
 xen/arch/arm/traps.c              |   1 -
 xen/arch/x86/domain.c             |   5 +-
 xen/arch/x86/hvm/emulate.c        |   8 +--
 xen/arch/x86/hvm/hvm.c            | 105 ++++++++++++++++----------------------
 xen/arch/x86/hvm/monitor.c        |   1 +
 xen/arch/x86/hvm/vmx/vmx.c        |  12 ++---
 xen/arch/x86/mm/p2m.c             |   4 +-
 xen/arch/x86/monitor.c            |  87 ++++++++++++++++++++++++++++---
 xen/arch/x86/vm_event.c           |  50 ++++++------------
 xen/common/monitor.c              |   1 -
 xen/common/vm_event.c             |  14 +++--
 xen/include/asm-arm/vm_event.h    |   9 ++--
 xen/include/asm-x86/domain.h      |  42 +++++++++------
 xen/include/asm-x86/hvm/monitor.h |   2 -
 xen/include/asm-x86/monitor.h     |   8 +--
 xen/include/asm-x86/vm_event.h    |  11 ----
 xen/include/public/vm_event.h     |  37 +++++++-------
 xen/include/xen/vm_event.h        |   1 -
 19 files changed, 216 insertions(+), 183 deletions(-)

-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
@ 2016-06-30 18:41 ` Corneliu ZUZU
  2016-07-01  7:14   ` Razvan Cojocaru
  2016-07-01 16:51   ` Tamas K Lengyel
  2016-06-30 18:42 ` [PATCH 2/8] x86/vmx_update_guest_cr: minor optimization Corneliu ZUZU
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Tamas K Lengyel, Razvan Cojocaru, Jan Beulich

A VM_EVENT_FLAG_VCPU_PAUSED flag in a vm-event response should only be treated
as informative that the toolstack user wants the vm-event subsystem to unpause
the target vCPU, but not be relied upon to decide if the target vCPU is actually
paused.

That being said, this patch does the following:

* Fixes (replaces) the old behavior in vm_event_resume, which relied on
  VM_EVENT_FLAG_VCPU_PAUSED to determine if the target vCPU is paused, by
  actually checking the vCPU vm-event pause-count.

* ASSERTs that the vCPU is paused in vm_event_set_registers and
  vm_event_toggle_singlestep.

* Ignores VM_EVENT_FLAG_DENY @ vm_event_register_write_resume if the target vCPU
  is not paused. Also adjusts comment in public/vm_event.h to reflect that.

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/vm_event.c       | 10 +++++++++-
 xen/common/vm_event.c         |  6 ++++--
 xen/include/public/vm_event.h |  1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index a9d3861..80f84d6 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -61,9 +61,11 @@ void vm_event_cleanup_domain(struct domain *d)
 
 void vm_event_toggle_singlestep(struct domain *d, struct vcpu *v)
 {
-    if ( !is_hvm_domain(d) || !atomic_read(&v->vm_event_pause_count) )
+    if ( !is_hvm_domain(d) )
         return;
 
+    ASSERT(atomic_read(&v->vm_event_pause_count));
+
     hvm_toggle_singlestep(v);
 }
 
@@ -75,6 +77,10 @@ void vm_event_register_write_resume(struct vcpu *v, vm_event_response_t *rsp)
 
         ASSERT(w);
 
+        /* deny flag requires the vCPU to be paused */
+        if ( !atomic_read(&v->vm_event_pause_count) )
+            return;
+
         switch ( rsp->reason )
         {
         case VM_EVENT_REASON_MOV_TO_MSR:
@@ -100,6 +106,8 @@ void vm_event_register_write_resume(struct vcpu *v, vm_event_response_t *rsp)
 
 void vm_event_set_registers(struct vcpu *v, vm_event_response_t *rsp)
 {
+    ASSERT(atomic_read(&v->vm_event_pause_count));
+
     v->arch.user_regs.eax = rsp->data.regs.x86.rax;
     v->arch.user_regs.ebx = rsp->data.regs.x86.rbx;
     v->arch.user_regs.ecx = rsp->data.regs.x86.rcx;
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index b303180..17d2716 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -417,7 +417,8 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
         if ( rsp.flags & VM_EVENT_FLAG_ALTERNATE_P2M )
             p2m_altp2m_check(v, rsp.altp2m_idx);
 
-        if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
+        /* Check flags which apply only when the vCPU is paused */
+        if ( atomic_read(&v->vm_event_pause_count) )
         {
             if ( rsp.flags & VM_EVENT_FLAG_SET_REGISTERS )
                 vm_event_set_registers(v, &rsp);
@@ -425,7 +426,8 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
             if ( rsp.flags & VM_EVENT_FLAG_TOGGLE_SINGLESTEP )
                 vm_event_toggle_singlestep(d, v);
 
-            vm_event_vcpu_unpause(v);
+            if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
+                vm_event_vcpu_unpause(v);
         }
     }
 }
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 9270d52..f888fdd 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -77,6 +77,7 @@
  /*
   * Deny completion of the operation that triggered the event.
   * Currently only useful for MSR, CR0, CR3 and CR4 write events.
+  * Requires the vCPU to be paused already (synchronous events only).
   */
 #define VM_EVENT_FLAG_DENY               (1 << 6)
 /*
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 2/8] x86/vmx_update_guest_cr: minor optimization
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
  2016-06-30 18:41 ` [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume Corneliu ZUZU
@ 2016-06-30 18:42 ` Corneliu ZUZU
  2016-06-30 18:43 ` [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately Corneliu ZUZU
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Kevin Tian, Jan Beulich, Jun Nakajima

Minor optimization @ vmx_update_guest_cr: checks if v->arch.hvm_vmx.exec_control
was modified before actually calling vmx_update_cpu_exec_control(v).

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a0f5793..15c84c2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1434,8 +1434,10 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         if ( paging_mode_hap(v->domain) )
         {
             /* Manage GUEST_CR3 when CR0.PE=0. */
+            uint32_t old_ctls = v->arch.hvm_vmx.exec_control;
             uint32_t cr3_ctls = (CPU_BASED_CR3_LOAD_EXITING |
                                  CPU_BASED_CR3_STORE_EXITING);
+
             v->arch.hvm_vmx.exec_control &= ~cr3_ctls;
             if ( !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) )
                 v->arch.hvm_vmx.exec_control |= cr3_ctls;
@@ -1445,7 +1447,8 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
 
-            vmx_update_cpu_exec_control(v);
+            if ( old_ctls != v->arch.hvm_vmx.exec_control )
+                vmx_update_cpu_exec_control(v);
         }
 
         if ( !nestedhvm_vcpu_in_guestmode(v) )
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
  2016-06-30 18:41 ` [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume Corneliu ZUZU
  2016-06-30 18:42 ` [PATCH 2/8] x86/vmx_update_guest_cr: minor optimization Corneliu ZUZU
@ 2016-06-30 18:43 ` Corneliu ZUZU
  2016-07-01  7:54   ` Razvan Cojocaru
  2016-07-04 10:22   ` Jan Beulich
  2016-06-30 18:44 ` [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum Corneliu ZUZU
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Tamas K Lengyel, Jan Beulich, Razvan Cojocaru,
	Andrew Cooper, Jun Nakajima

For readability:

* Add function arch_monitor_write_data (in x86/monitor.c) and separate handling
of monitor_write_data there (previously done directly in hvm_do_resume).

* Add function write_ctrlreg_adjust_traps (in x86/monitor.c) and relocate
enabling/disabling of CPU_BASED_CR3_LOAD_EXITING for CR3 monitor vm-events there
(previously done through CR0 node @ vmx_update_guest_cr(v, 0)).

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/hvm/hvm.c        | 48 +++++++++-------------
 xen/arch/x86/hvm/vmx/vmx.c    |  5 ---
 xen/arch/x86/monitor.c        | 94 +++++++++++++++++++++++++++++++++++++++----
 xen/include/asm-x86/monitor.h |  2 +
 4 files changed, 107 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c89ab6e..5481a6e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -475,8 +475,6 @@ void hvm_do_resume(struct vcpu *v)
 
     if ( unlikely(v->arch.vm_event) )
     {
-        struct monitor_write_data *w = &v->arch.vm_event->write_data;
-
         if ( v->arch.vm_event->emulate_flags )
         {
             enum emul_kind kind = EMUL_KIND_NORMAL;
@@ -493,32 +491,10 @@ void hvm_do_resume(struct vcpu *v)
 
             v->arch.vm_event->emulate_flags = 0;
         }
-
-        if ( w->do_write.msr )
-        {
-            hvm_msr_write_intercept(w->msr, w->value, 0);
-            w->do_write.msr = 0;
-        }
-
-        if ( w->do_write.cr0 )
-        {
-            hvm_set_cr0(w->cr0, 0);
-            w->do_write.cr0 = 0;
-        }
-
-        if ( w->do_write.cr4 )
-        {
-            hvm_set_cr4(w->cr4, 0);
-            w->do_write.cr4 = 0;
-        }
-
-        if ( w->do_write.cr3 )
-        {
-            hvm_set_cr3(w->cr3, 0);
-            w->do_write.cr3 = 0;
-        }
     }
 
+    arch_monitor_write_data(v);
+
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
     {
@@ -2206,7 +2182,10 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
 
         if ( hvm_monitor_crX(CR0, value, old_value) )
         {
-            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            /*
+             * The actual write will occur in arch_monitor_write_data(), if
+             * permitted.
+             */
             v->arch.vm_event->write_data.do_write.cr0 = 1;
             v->arch.vm_event->write_data.cr0 = value;
 
@@ -2308,7 +2287,10 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
 
         if ( hvm_monitor_crX(CR3, value, old) )
         {
-            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            /*
+             * The actual write will occur in arch_monitor_write_data(), if
+             * permitted.
+             */
             v->arch.vm_event->write_data.do_write.cr3 = 1;
             v->arch.vm_event->write_data.cr3 = value;
 
@@ -2388,7 +2370,10 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
 
         if ( hvm_monitor_crX(CR4, value, old_cr) )
         {
-            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            /*
+             * The actual write will occur in arch_monitor_write_data(), if
+             * permitted.
+             */
             v->arch.vm_event->write_data.do_write.cr4 = 1;
             v->arch.vm_event->write_data.cr4 = value;
 
@@ -3767,7 +3752,10 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
     {
         ASSERT(v->arch.vm_event);
 
-        /* The actual write will occur in hvm_do_resume() (if permitted). */
+        /*
+         * The actual write will occur in arch_monitor_write_data(), if
+         * permitted.
+         */
         v->arch.vm_event->write_data.do_write.msr = 1;
         v->arch.vm_event->write_data.msr = msr;
         v->arch.vm_event->write_data.value = msr_content;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 15c84c2..de04e6c 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1442,11 +1442,6 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
             if ( !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) )
                 v->arch.hvm_vmx.exec_control |= cr3_ctls;
 
-            /* Trap CR3 updates if CR3 memory events are enabled. */
-            if ( v->domain->arch.monitor.write_ctrlreg_enabled &
-                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
-                v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
-
             if ( old_ctls != v->arch.hvm_vmx.exec_control )
                 vmx_update_cpu_exec_control(v);
         }
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index a271161..90e4856 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -20,6 +20,9 @@
  */
 
 #include <asm/monitor.h>
+#include <asm/paging.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/vm_event.h>
 #include <public/vm_event.h>
 
 int arch_monitor_init_domain(struct domain *d)
@@ -41,6 +44,40 @@ void arch_monitor_cleanup_domain(struct domain *d)
     memset(&d->monitor, 0, sizeof(d->monitor));
 }
 
+void arch_monitor_write_data(struct vcpu *v)
+{
+    struct monitor_write_data *w;
+
+    if ( likely(!v->arch.vm_event) )
+        return;
+
+    w = &v->arch.vm_event->write_data;
+
+    if ( w->do_write.msr )
+    {
+        hvm_msr_write_intercept(w->msr, w->value, 0);
+        w->do_write.msr = 0;
+    }
+
+    if ( w->do_write.cr0 )
+    {
+        hvm_set_cr0(w->cr0, 0);
+        w->do_write.cr0 = 0;
+    }
+
+    if ( w->do_write.cr4 )
+    {
+        hvm_set_cr4(w->cr4, 0);
+        w->do_write.cr4 = 0;
+    }
+
+    if ( w->do_write.cr3 )
+    {
+        hvm_set_cr3(w->cr3, 0);
+        w->do_write.cr3 = 0;
+    }
+}
+
 static unsigned long *monitor_bitmap_for_msr(const struct domain *d, u32 *msr)
 {
     ASSERT(d->arch.monitor.msr_bitmap && msr);
@@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
     return test_bit(msr, bitmap);
 }
 
+static void write_ctrlreg_adjust_traps(struct domain *d)
+{
+    struct vcpu *v;
+    struct arch_vmx_struct *avmx;
+    unsigned int cr3_bitmask;
+    bool_t cr3_vmevent, cr3_ldexit;
+
+    /* Adjust CR3 load-exiting. */
+
+    /* vmx only */
+    ASSERT(cpu_has_vmx);
+
+    /* non-hap domains trap CR3 writes unconditionally */
+    if ( !paging_mode_hap(d) )
+    {
+        for_each_vcpu ( d, v )
+            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
+        return;
+    }
+
+    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
+    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
+
+    for_each_vcpu ( d, v )
+    {
+        avmx = &v->arch.hvm_vmx;
+        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
+
+        if ( cr3_vmevent == cr3_ldexit )
+            continue;
+
+        /*
+         * If CR0.PE=0, CR3 load exiting must remain enabled.
+         * See vmx_update_guest_cr code motion for cr = 0.
+         */
+        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) )
+            continue;
+
+        if ( cr3_vmevent )
+            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
+        else
+            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
+
+        vmx_vmcs_enter(v);
+        vmx_update_cpu_exec_control(v);
+        vmx_vmcs_exit(v);
+    }
+}
+
 int arch_monitor_domctl_event(struct domain *d,
                               struct xen_domctl_monitor_op *mop)
 {
@@ -159,13 +245,7 @@ int arch_monitor_domctl_event(struct domain *d,
         else
             ad->monitor.write_ctrlreg_enabled &= ~ctrlreg_bitmask;
 
-        if ( VM_EVENT_X86_CR3 == mop->u.mov_to_cr.index )
-        {
-            struct vcpu *v;
-            /* Latches new CR3 mask through CR0 code. */
-            for_each_vcpu ( d, v )
-                hvm_update_guest_cr(v, 0);
-        }
+        write_ctrlreg_adjust_traps(d);
 
         domain_unpause(d);
 
diff --git a/xen/include/asm-x86/monitor.h b/xen/include/asm-x86/monitor.h
index 94b6768..1068376 100644
--- a/xen/include/asm-x86/monitor.h
+++ b/xen/include/asm-x86/monitor.h
@@ -93,6 +93,8 @@ int arch_monitor_init_domain(struct domain *d);
 
 void arch_monitor_cleanup_domain(struct domain *d);
 
+void arch_monitor_write_data(struct vcpu *v);
+
 bool_t monitored_msr(const struct domain *d, u32 msr);
 
 #endif /* __ASM_X86_MONITOR_H__ */
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
                   ` (2 preceding siblings ...)
  2016-06-30 18:43 ` [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately Corneliu ZUZU
@ 2016-06-30 18:44 ` Corneliu ZUZU
  2016-07-01  6:53   ` Razvan Cojocaru
  2016-07-04 12:37   ` Jan Beulich
  2016-06-30 18:45 ` [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup Corneliu ZUZU
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich

After trapping a control-register write vm-event and -until- deciding if that
write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual write,
there cannot and should not be another trapped control-register write event.
That is, currently -only one- of the fields of monitor_write_data.do_write can
be true at any given moment and therefore it would be more appropriate to
replace those fields with an enum value.

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/hvm/hvm.c       | 18 +++++++++++-------
 xen/arch/x86/monitor.c       | 37 ++++++++++++++++++-------------------
 xen/arch/x86/vm_event.c      | 25 ++-----------------------
 xen/include/asm-x86/domain.h | 20 ++++++++++----------
 4 files changed, 41 insertions(+), 59 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5481a6e..884ae40 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2186,8 +2186,9 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
              * The actual write will occur in arch_monitor_write_data(), if
              * permitted.
              */
-            v->arch.vm_event->write_data.do_write.cr0 = 1;
-            v->arch.vm_event->write_data.cr0 = value;
+            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
+            v->arch.vm_event->write_data.status = MWS_CR0;
+            v->arch.vm_event->write_data.value = value;
 
             return X86EMUL_OKAY;
         }
@@ -2291,8 +2292,9 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
              * The actual write will occur in arch_monitor_write_data(), if
              * permitted.
              */
-            v->arch.vm_event->write_data.do_write.cr3 = 1;
-            v->arch.vm_event->write_data.cr3 = value;
+            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
+            v->arch.vm_event->write_data.status = MWS_CR3;
+            v->arch.vm_event->write_data.value = value;
 
             return X86EMUL_OKAY;
         }
@@ -2374,8 +2376,9 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
              * The actual write will occur in arch_monitor_write_data(), if
              * permitted.
              */
-            v->arch.vm_event->write_data.do_write.cr4 = 1;
-            v->arch.vm_event->write_data.cr4 = value;
+            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
+            v->arch.vm_event->write_data.status = MWS_CR4;
+            v->arch.vm_event->write_data.value = value;
 
             return X86EMUL_OKAY;
         }
@@ -3756,7 +3759,8 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
          * The actual write will occur in arch_monitor_write_data(), if
          * permitted.
          */
-        v->arch.vm_event->write_data.do_write.msr = 1;
+        ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
+        v->arch.vm_event->write_data.status = MWS_MSR;
         v->arch.vm_event->write_data.msr = msr;
         v->arch.vm_event->write_data.value = msr_content;
 
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 90e4856..5c8d4da 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -53,29 +53,28 @@ void arch_monitor_write_data(struct vcpu *v)
 
     w = &v->arch.vm_event->write_data;
 
-    if ( w->do_write.msr )
-    {
-        hvm_msr_write_intercept(w->msr, w->value, 0);
-        w->do_write.msr = 0;
-    }
-
-    if ( w->do_write.cr0 )
-    {
-        hvm_set_cr0(w->cr0, 0);
-        w->do_write.cr0 = 0;
-    }
+    if ( likely(MWS_NOWRITE == w->status) )
+        return;
 
-    if ( w->do_write.cr4 )
+    switch ( w->status )
     {
-        hvm_set_cr4(w->cr4, 0);
-        w->do_write.cr4 = 0;
+    case MWS_MSR:
+        hvm_msr_write_intercept(w->msr, w->value, 0);
+        break;
+    case MWS_CR0:
+        hvm_set_cr0(w->value, 0);
+        break;
+    case MWS_CR3:
+        hvm_set_cr3(w->value, 0);
+        break;
+    case MWS_CR4:
+        hvm_set_cr4(w->value, 0);
+        break;
+    default:
+        break;
     }
 
-    if ( w->do_write.cr3 )
-    {
-        hvm_set_cr3(w->cr3, 0);
-        w->do_write.cr3 = 0;
-    }
+    w->status = MWS_NOWRITE;
 }
 
 static unsigned long *monitor_bitmap_for_msr(const struct domain *d, u32 *msr)
diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 80f84d6..825da48 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -73,34 +73,13 @@ void vm_event_register_write_resume(struct vcpu *v, vm_event_response_t *rsp)
 {
     if ( rsp->flags & VM_EVENT_FLAG_DENY )
     {
-        struct monitor_write_data *w = &v->arch.vm_event->write_data;
-
-        ASSERT(w);
+        ASSERT(v->arch.vm_event);
 
         /* deny flag requires the vCPU to be paused */
         if ( !atomic_read(&v->vm_event_pause_count) )
             return;
 
-        switch ( rsp->reason )
-        {
-        case VM_EVENT_REASON_MOV_TO_MSR:
-            w->do_write.msr = 0;
-            break;
-        case VM_EVENT_REASON_WRITE_CTRLREG:
-            switch ( rsp->u.write_ctrlreg.index )
-            {
-            case VM_EVENT_X86_CR0:
-                w->do_write.cr0 = 0;
-                break;
-            case VM_EVENT_X86_CR3:
-                w->do_write.cr3 = 0;
-                break;
-            case VM_EVENT_X86_CR4:
-                w->do_write.cr4 = 0;
-                break;
-            }
-            break;
-        }
+        v->arch.vm_event->write_data.status = MWS_NOWRITE;
     }
 }
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 7c27f9e..a22ee6b 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -259,19 +259,19 @@ struct pv_domain
     struct cpuidmasks *cpuidmasks;
 };
 
-struct monitor_write_data {
-    struct {
-        unsigned int msr : 1;
-        unsigned int cr0 : 1;
-        unsigned int cr3 : 1;
-        unsigned int cr4 : 1;
-    } do_write;
+enum monitor_write_status
+{
+    MWS_NOWRITE = 0,
+    MWS_MSR,
+    MWS_CR0,
+    MWS_CR3,
+    MWS_CR4,
+};
 
+struct monitor_write_data {
+    enum monitor_write_status status;
     uint32_t msr;
     uint64_t value;
-    uint64_t cr0;
-    uint64_t cr3;
-    uint64_t cr4;
 };
 
 struct arch_domain
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
                   ` (3 preceding siblings ...)
  2016-06-30 18:44 ` [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum Corneliu ZUZU
@ 2016-06-30 18:45 ` Corneliu ZUZU
  2016-07-01  6:47   ` Razvan Cojocaru
  2016-07-04 12:47   ` Jan Beulich
  2016-06-30 18:46 ` [PATCH 6/8] x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86 Corneliu ZUZU
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	Paul Durrant, Jan Beulich

The arch_vm_event structure is dynamically allocated and freed @
vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
disables domain monitoring (xc_monitor_disable), which in turn effectively
discards any information that was in arch_vm_event.write_data.

But this can yield unexpected behavior since if a CR-write was awaiting to be
committed on the scheduling tail (hvm_do_resume->arch_monitor_write_data)
before xc_monitor_disable is called, then the domain CR write is wrongfully
ignored, which of course, in these cases, can easily render a domain crash.

To fix the issue, this patch makes only arch_vm_event.emul_read_data dynamically
allocated instead of the whole arch_vm_event structure. With this we can avoid
invalidation of an awaiting arch_vm_event.write_data by selectively cleaning up
only emul_read_data and emulate_flags @ vm_event_cleanup_domain.

Small note: arch_vm_event structure definition needed to be moved from
asm-x86/vm_event.h to asm-x86/domain.h in the process.

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/domain.c          |  5 ++--
 xen/arch/x86/hvm/emulate.c     |  8 +++---
 xen/arch/x86/hvm/hvm.c         | 62 ++++++++++++++++++------------------------
 xen/arch/x86/mm/p2m.c          |  4 +--
 xen/arch/x86/monitor.c         |  7 +----
 xen/arch/x86/vm_event.c        | 16 +++++------
 xen/include/asm-x86/domain.h   | 42 +++++++++++++++++-----------
 xen/include/asm-x86/monitor.h  |  3 +-
 xen/include/asm-x86/vm_event.h | 10 -------
 9 files changed, 73 insertions(+), 84 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bb59247..06e68ae 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -492,8 +492,9 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
-    xfree(v->arch.vm_event);
-    v->arch.vm_event = NULL;
+    v->arch.vm_event.emulate_flags = 0;
+    xfree(v->arch.vm_event.emul_read_data);
+    v->arch.vm_event.emul_read_data = NULL;
 
     if ( is_pv_32bit_vcpu(v) )
     {
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 855af4d..68f5515 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -73,12 +73,12 @@ static int set_context_data(void *buffer, unsigned int size)
 {
     struct vcpu *curr = current;
 
-    if ( curr->arch.vm_event )
+    if ( curr->arch.vm_event.emul_read_data )
     {
         unsigned int safe_size =
-            min(size, curr->arch.vm_event->emul_read_data.size);
+            min(size, curr->arch.vm_event.emul_read_data->size);
 
-        memcpy(buffer, curr->arch.vm_event->emul_read_data.data, safe_size);
+        memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
         memset(buffer + safe_size, 0, size - safe_size);
         return X86EMUL_OKAY;
     }
@@ -523,7 +523,7 @@ static int hvmemul_virtual_to_linear(
      * vm_event being triggered for repeated writes to a whole page.
      */
     if ( unlikely(current->domain->arch.mem_access_emulate_each_rep) &&
-         current->arch.vm_event->emulate_flags != 0 )
+         current->arch.vm_event.emulate_flags != 0 )
        max_reps = 1;
 
     /*
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 884ae40..03dffb8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
     if ( !handle_hvm_io_completion(v) )
         return;
 
-    if ( unlikely(v->arch.vm_event) )
+    if ( unlikely(v->arch.vm_event.emulate_flags) )
     {
-        if ( v->arch.vm_event->emulate_flags )
-        {
-            enum emul_kind kind = EMUL_KIND_NORMAL;
+        enum emul_kind kind;
 
-            if ( v->arch.vm_event->emulate_flags &
-                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
-                kind = EMUL_KIND_SET_CONTEXT;
-            else if ( v->arch.vm_event->emulate_flags &
-                      VM_EVENT_FLAG_EMULATE_NOWRITE )
-                kind = EMUL_KIND_NOWRITE;
+        ASSERT(v->arch.vm_event.emul_read_data);
 
-            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
-                                       HVM_DELIVER_NO_ERROR_CODE);
+        kind = EMUL_KIND_NORMAL;
 
-            v->arch.vm_event->emulate_flags = 0;
-        }
+        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )
+            kind = EMUL_KIND_SET_CONTEXT;
+        else if ( v->arch.vm_event.emulate_flags &
+                  VM_EVENT_FLAG_EMULATE_NOWRITE )
+            kind = EMUL_KIND_NOWRITE;
+
+        hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
+                                   HVM_DELIVER_NO_ERROR_CODE);
+
+        v->arch.vm_event.emulate_flags = 0;
     }
 
     arch_monitor_write_data(v);
@@ -2178,17 +2178,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
     if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
     {
-        ASSERT(v->arch.vm_event);
-
         if ( hvm_monitor_crX(CR0, value, old_value) )
         {
             /*
              * The actual write will occur in arch_monitor_write_data(), if
              * permitted.
              */
-            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
-            v->arch.vm_event->write_data.status = MWS_CR0;
-            v->arch.vm_event->write_data.value = value;
+            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
+            v->arch.vm_event.write_data.status = MWS_CR0;
+            v->arch.vm_event.write_data.value = value;
 
             return X86EMUL_OKAY;
         }
@@ -2284,17 +2282,15 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
     if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
     {
-        ASSERT(v->arch.vm_event);
-
         if ( hvm_monitor_crX(CR3, value, old) )
         {
             /*
              * The actual write will occur in arch_monitor_write_data(), if
              * permitted.
              */
-            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
-            v->arch.vm_event->write_data.status = MWS_CR3;
-            v->arch.vm_event->write_data.value = value;
+            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
+            v->arch.vm_event.write_data.status = MWS_CR3;
+            v->arch.vm_event.write_data.value = value;
 
             return X86EMUL_OKAY;
         }
@@ -2368,17 +2364,15 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
     if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
     {
-        ASSERT(v->arch.vm_event);
-
         if ( hvm_monitor_crX(CR4, value, old_cr) )
         {
             /*
              * The actual write will occur in arch_monitor_write_data(), if
              * permitted.
              */
-            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
-            v->arch.vm_event->write_data.status = MWS_CR4;
-            v->arch.vm_event->write_data.value = value;
+            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
+            v->arch.vm_event.write_data.status = MWS_CR4;
+            v->arch.vm_event.write_data.value = value;
 
             return X86EMUL_OKAY;
         }
@@ -3753,16 +3747,14 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
 
     if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
     {
-        ASSERT(v->arch.vm_event);
-
         /*
          * The actual write will occur in arch_monitor_write_data(), if
          * permitted.
          */
-        ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
-        v->arch.vm_event->write_data.status = MWS_MSR;
-        v->arch.vm_event->write_data.msr = msr;
-        v->arch.vm_event->write_data.value = msr_content;
+        ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
+        v->arch.vm_event.write_data.status = MWS_MSR;
+        v->arch.vm_event.write_data.msr = msr;
+        v->arch.vm_event.write_data.value = msr_content;
 
         hvm_monitor_msr(msr, msr_content);
         return X86EMUL_OKAY;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 16733a4..9bcaa8a 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1639,10 +1639,10 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
             }
         }
 
-        v->arch.vm_event->emulate_flags = violation ? rsp->flags : 0;
+        v->arch.vm_event.emulate_flags = violation ? rsp->flags : 0;
 
         if ( (rsp->flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA) )
-            v->arch.vm_event->emul_read_data = rsp->data.emul_read_data;
+            *v->arch.vm_event.emul_read_data = rsp->data.emul_read_data;
     }
 }
 
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 5c8d4da..88d14ae 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -46,12 +46,7 @@ void arch_monitor_cleanup_domain(struct domain *d)
 
 void arch_monitor_write_data(struct vcpu *v)
 {
-    struct monitor_write_data *w;
-
-    if ( likely(!v->arch.vm_event) )
-        return;
-
-    w = &v->arch.vm_event->write_data;
+    struct monitor_write_data *w = &v->arch.vm_event.write_data;
 
     if ( likely(MWS_NOWRITE == w->status) )
         return;
diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 825da48..f21ff10 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -30,12 +30,13 @@ int vm_event_init_domain(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        if ( v->arch.vm_event )
+        if ( v->arch.vm_event.emul_read_data )
             continue;
 
-        v->arch.vm_event = xzalloc(struct arch_vm_event);
+        v->arch.vm_event.emul_read_data =
+                xzalloc(struct vm_event_emul_read_data);
 
-        if ( !v->arch.vm_event )
+        if ( !v->arch.vm_event.emul_read_data )
             return -ENOMEM;
     }
 
@@ -52,8 +53,9 @@ void vm_event_cleanup_domain(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        xfree(v->arch.vm_event);
-        v->arch.vm_event = NULL;
+        v->arch.vm_event.emulate_flags = 0;
+        xfree(v->arch.vm_event.emul_read_data);
+        v->arch.vm_event.emul_read_data = NULL;
     }
 
     d->arch.mem_access_emulate_each_rep = 0;
@@ -73,13 +75,11 @@ void vm_event_register_write_resume(struct vcpu *v, vm_event_response_t *rsp)
 {
     if ( rsp->flags & VM_EVENT_FLAG_DENY )
     {
-        ASSERT(v->arch.vm_event);
-
         /* deny flag requires the vCPU to be paused */
         if ( !atomic_read(&v->vm_event_pause_count) )
             return;
 
-        v->arch.vm_event->write_data.status = MWS_NOWRITE;
+        v->arch.vm_event.write_data.status = MWS_NOWRITE;
     }
 }
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a22ee6b..7ea5c8f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -259,21 +259,6 @@ struct pv_domain
     struct cpuidmasks *cpuidmasks;
 };
 
-enum monitor_write_status
-{
-    MWS_NOWRITE = 0,
-    MWS_MSR,
-    MWS_CR0,
-    MWS_CR3,
-    MWS_CR4,
-};
-
-struct monitor_write_data {
-    enum monitor_write_status status;
-    uint32_t msr;
-    uint64_t value;
-};
-
 struct arch_domain
 {
     struct page_info *perdomain_l3_pg;
@@ -496,6 +481,31 @@ typedef enum __packed {
     SMAP_CHECK_DISABLED,        /* disable the check */
 } smap_check_policy_t;
 
+enum monitor_write_status
+{
+    MWS_NOWRITE = 0,
+    MWS_MSR,
+    MWS_CR0,
+    MWS_CR3,
+    MWS_CR4,
+};
+
+struct monitor_write_data {
+    enum monitor_write_status status;
+    uint32_t msr;
+    uint64_t value;
+};
+
+/*
+ * Should we emulate the next matching instruction on VCPU resume
+ * after a vm_event?
+ */
+struct arch_vm_event {
+    uint32_t emulate_flags;
+    struct vm_event_emul_read_data *emul_read_data;
+    struct monitor_write_data write_data;
+};
+
 struct arch_vcpu
 {
     /*
@@ -569,7 +579,7 @@ struct arch_vcpu
     /* A secondary copy of the vcpu time info. */
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
 
-    struct arch_vm_event *vm_event;
+    struct arch_vm_event vm_event;
 };
 
 smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/asm-x86/monitor.h b/xen/include/asm-x86/monitor.h
index 1068376..984ac4c 100644
--- a/xen/include/asm-x86/monitor.h
+++ b/xen/include/asm-x86/monitor.h
@@ -48,7 +48,8 @@ int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
          * Enabling mem_access_emulate_each_rep without a vm_event subscriber
          * is meaningless.
          */
-        if ( d->max_vcpus && d->vcpu[0] && d->vcpu[0]->arch.vm_event )
+        if ( d->max_vcpus && d->vcpu[0] &&
+             d->vcpu[0]->arch.vm_event.emul_read_data )
             d->arch.mem_access_emulate_each_rep = !!mop->event;
         else
             rc = -EINVAL;
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index 026f42e..c83583d 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -22,16 +22,6 @@
 #include <xen/sched.h>
 #include <xen/vm_event.h>
 
-/*
- * Should we emulate the next matching instruction on VCPU resume
- * after a vm_event?
- */
-struct arch_vm_event {
-    uint32_t emulate_flags;
-    struct vm_event_emul_read_data emul_read_data;
-    struct monitor_write_data write_data;
-};
-
 int vm_event_init_domain(struct domain *d);
 
 void vm_event_cleanup_domain(struct domain *d);
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 6/8] x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
                   ` (4 preceding siblings ...)
  2016-06-30 18:45 ` [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup Corneliu ZUZU
@ 2016-06-30 18:46 ` Corneliu ZUZU
  2016-07-01  7:02   ` Razvan Cojocaru
  2016-06-30 18:47 ` [PATCH 7/8] minor fixes (formatting, comments, unused includes etc.) Corneliu ZUZU
  2016-06-30 18:47 ` [PATCH 8/8] minor #include change Corneliu ZUZU
  7 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Tamas K Lengyel, Razvan Cojocaru

VM_EVENT_REASON_MOV_TO_MSR is X86-specific, surround w/ #ifdef accordingly.

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/common/vm_event.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 17d2716..89a25d1 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -394,7 +394,9 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
          */
         switch ( rsp.reason )
         {
+#ifdef CONFIG_X86
         case VM_EVENT_REASON_MOV_TO_MSR:
+#endif
         case VM_EVENT_REASON_WRITE_CTRLREG:
             vm_event_register_write_resume(v, &rsp);
             break;
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 7/8] minor fixes (formatting, comments, unused includes etc.)
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
                   ` (5 preceding siblings ...)
  2016-06-30 18:46 ` [PATCH 6/8] x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86 Corneliu ZUZU
@ 2016-06-30 18:47 ` Corneliu ZUZU
  2016-06-30 18:47 ` [PATCH 8/8] minor #include change Corneliu ZUZU
  7 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jan Beulich, Razvan Cojocaru,
	Andrew Cooper, Julien Grall, Tamas K Lengyel, Jun Nakajima

Minor fixes:
 - remove some empty lines
 - remove some unused includes
 - multi-line comment fixes
 - 80-columns formatting fixes

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/arm/domain.c             |  1 -
 xen/arch/arm/traps.c              |  1 -
 xen/arch/x86/hvm/hvm.c            |  3 ---
 xen/arch/x86/hvm/vmx/vmx.c        |  2 --
 xen/arch/x86/monitor.c            |  1 -
 xen/arch/x86/vm_event.c           |  3 ---
 xen/common/monitor.c              |  1 -
 xen/common/vm_event.c             |  6 ++++--
 xen/include/asm-arm/vm_event.h    |  9 +++------
 xen/include/asm-x86/hvm/monitor.h |  1 -
 xen/include/asm-x86/monitor.h     |  3 ---
 xen/include/asm-x86/vm_event.h    |  1 -
 xen/include/public/vm_event.h     | 38 +++++++++++++++++++-------------------
 xen/include/xen/vm_event.h        |  1 -
 14 files changed, 26 insertions(+), 45 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6ce4645..61fc08e 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -294,7 +294,6 @@ static void continue_new_vcpu(struct vcpu *prev)
     else
         /* check_wakeup_from_wait(); */
         reset_stack_and_jump(return_to_new_vcpu64);
-
 }
 
 void context_switch(struct vcpu *prev, struct vcpu *next)
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 44926ca..fb01703 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -439,7 +439,6 @@ static void inject_abt32_exception(struct cpu_user_regs *regs,
         far |= addr << 32;
         WRITE_SYSREG(far, FAR_EL1);
         WRITE_SYSREG(fsr, IFSR32_EL2);
-
 #endif
     }
     else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 03dffb8..bdfd5ff 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -31,7 +31,6 @@
 #include <xen/hypercall.h>
 #include <xen/guest_access.h>
 #include <xen/event.h>
-#include <xen/paging.h>
 #include <xen/cpu.h>
 #include <xen/wait.h>
 #include <xen/mem_access.h>
@@ -63,11 +62,9 @@
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/monitor.h>
 #include <asm/hvm/ioreq.h>
-#include <asm/hvm/vmx/vmx.h>
 #include <asm/altp2m.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
-#include <asm/vm_event.h>
 #include <public/sched.h>
 #include <public/hvm/ioreq.h>
 #include <public/version.h>
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index de04e6c..effe534 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -34,7 +34,6 @@
 #include <asm/guest_access.h>
 #include <asm/debugreg.h>
 #include <asm/msr.h>
-#include <asm/paging.h>
 #include <asm/p2m.h>
 #include <asm/mem_sharing.h>
 #include <asm/hvm/emulate.h>
@@ -57,7 +56,6 @@
 #include <asm/hvm/nestedhvm.h>
 #include <asm/altp2m.h>
 #include <asm/event.h>
-#include <asm/monitor.h>
 #include <public/arch-x86/cpuid.h>
 
 static bool_t __initdata opt_force_ept;
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 88d14ae..c2ea037 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -22,7 +22,6 @@
 #include <asm/monitor.h>
 #include <asm/paging.h>
 #include <asm/hvm/vmx/vmx.h>
-#include <asm/vm_event.h>
 #include <public/vm_event.h>
 
 int arch_monitor_init_domain(struct domain *d)
diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index f21ff10..e7f352d 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -18,9 +18,6 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-#include <xen/sched.h>
-#include <asm/hvm/hvm.h>
-#include <asm/monitor.h>
 #include <asm/vm_event.h>
 
 /* Implicitly serialized by the domctl lock. */
diff --git a/xen/common/monitor.c b/xen/common/monitor.c
index 436214a..08d1d98 100644
--- a/xen/common/monitor.c
+++ b/xen/common/monitor.c
@@ -23,7 +23,6 @@
 #include <xen/sched.h>
 #include <xen/vm_event.h>
 #include <xsm/xsm.h>
-#include <public/domctl.h>
 #include <asm/monitor.h>
 
 int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop)
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 89a25d1..9a89bf0 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -770,8 +770,10 @@ void vm_event_vcpu_unpause(struct vcpu *v)
 {
     int old, new, prev = v->vm_event_pause_count.counter;
 
-    /* All unpause requests as a result of toolstack responses.  Prevent
-     * underflow of the vcpu pause count. */
+    /*
+     * All unpause requests as a result of toolstack responses.
+     * Prevent underflow of the vcpu pause count.
+     */
     do
     {
         old = prev;
diff --git a/xen/include/asm-arm/vm_event.h b/xen/include/asm-arm/vm_event.h
index a3fc4ce..ccc4b60 100644
--- a/xen/include/asm-arm/vm_event.h
+++ b/xen/include/asm-arm/vm_event.h
@@ -23,21 +23,18 @@
 #include <xen/vm_event.h>
 #include <public/domctl.h>
 
-static inline
-int vm_event_init_domain(struct domain *d)
+static inline int vm_event_init_domain(struct domain *d)
 {
     /* Nothing to do. */
     return 0;
 }
 
-static inline
-void vm_event_cleanup_domain(struct domain *d)
+static inline void vm_event_cleanup_domain(struct domain *d)
 {
     memset(&d->monitor, 0, sizeof(d->monitor));
 }
 
-static inline
-void vm_event_toggle_singlestep(struct domain *d, struct vcpu *v)
+static inline void vm_event_toggle_singlestep(struct domain *d, struct vcpu *v)
 {
     /* Not supported on ARM. */
 }
diff --git a/xen/include/asm-x86/hvm/monitor.h b/xen/include/asm-x86/hvm/monitor.h
index 55d435e..ab433ed 100644
--- a/xen/include/asm-x86/hvm/monitor.h
+++ b/xen/include/asm-x86/hvm/monitor.h
@@ -19,7 +19,6 @@
 #ifndef __ASM_X86_HVM_MONITOR_H__
 #define __ASM_X86_HVM_MONITOR_H__
 
-#include <xen/sched.h>
 #include <xen/paging.h>
 #include <public/vm_event.h>
 
diff --git a/xen/include/asm-x86/monitor.h b/xen/include/asm-x86/monitor.h
index 984ac4c..a499b23 100644
--- a/xen/include/asm-x86/monitor.h
+++ b/xen/include/asm-x86/monitor.h
@@ -23,9 +23,6 @@
 #define __ASM_X86_MONITOR_H__
 
 #include <xen/sched.h>
-#include <public/domctl.h>
-#include <asm/cpufeature.h>
-#include <asm/hvm/hvm.h>
 
 #define monitor_ctrlreg_bitmask(ctrlreg_index) (1U << (ctrlreg_index))
 
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index c83583d..561b45e 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -20,7 +20,6 @@
 #define __ASM_X86_VM_EVENT_H__
 
 #include <xen/sched.h>
-#include <xen/vm_event.h>
 
 int vm_event_init_domain(struct domain *d);
 
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index f888fdd..8f94e20 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -74,20 +74,20 @@
  * VM_EVENT_FLAG_SET_EMUL_READ_DATA are set, only the latter will be honored).
  */
 #define VM_EVENT_FLAG_SET_EMUL_READ_DATA (1 << 5)
- /*
-  * Deny completion of the operation that triggered the event.
-  * Currently only useful for MSR, CR0, CR3 and CR4 write events.
-  * Requires the vCPU to be paused already (synchronous events only).
-  */
+/*
+ * Deny completion of the operation that triggered the event.
+ * Currently only useful for MSR and control-register write events.
+ * Requires the vCPU to be paused already (synchronous events only).
+ */
 #define VM_EVENT_FLAG_DENY               (1 << 6)
 /*
  * This flag can be set in a request or a response
  *
- * On a request, indicates that the event occurred in the alternate p2m specified by
- * the altp2m_idx request field.
+ * On a request, indicates that the event occurred in the alternate p2m
+ * specified by the altp2m_idx request field.
  *
- * On a response, indicates that the VCPU should resume in the alternate p2m specified
- * by the altp2m_idx response field if possible.
+ * On a response, indicates that the VCPU should resume in the alternate p2m
+ * specified by the altp2m_idx response field if possible.
  */
 #define VM_EVENT_FLAG_ALTERNATE_P2M      (1 << 7)
 /*
@@ -178,16 +178,16 @@ struct vm_event_regs_x86 {
  * FAULT_WITH_GLA: If the violation was triggered by accessing gla
  * FAULT_IN_GPT: If the violation was triggered during translating gla
  */
-#define MEM_ACCESS_R                    (1 << 0)
-#define MEM_ACCESS_W                    (1 << 1)
-#define MEM_ACCESS_X                    (1 << 2)
-#define MEM_ACCESS_RWX                  (MEM_ACCESS_R | MEM_ACCESS_W | MEM_ACCESS_X)
-#define MEM_ACCESS_RW                   (MEM_ACCESS_R | MEM_ACCESS_W)
-#define MEM_ACCESS_RX                   (MEM_ACCESS_R | MEM_ACCESS_X)
-#define MEM_ACCESS_WX                   (MEM_ACCESS_W | MEM_ACCESS_X)
-#define MEM_ACCESS_GLA_VALID            (1 << 3)
-#define MEM_ACCESS_FAULT_WITH_GLA       (1 << 4)
-#define MEM_ACCESS_FAULT_IN_GPT         (1 << 5)
+#define MEM_ACCESS_R                (1 << 0)
+#define MEM_ACCESS_W                (1 << 1)
+#define MEM_ACCESS_X                (1 << 2)
+#define MEM_ACCESS_RWX              (MEM_ACCESS_R | MEM_ACCESS_W | MEM_ACCESS_X)
+#define MEM_ACCESS_RW               (MEM_ACCESS_R | MEM_ACCESS_W)
+#define MEM_ACCESS_RX               (MEM_ACCESS_R | MEM_ACCESS_X)
+#define MEM_ACCESS_WX               (MEM_ACCESS_W | MEM_ACCESS_X)
+#define MEM_ACCESS_GLA_VALID        (1 << 3)
+#define MEM_ACCESS_FAULT_WITH_GLA   (1 << 4)
+#define MEM_ACCESS_FAULT_IN_GPT     (1 << 5)
 
 struct vm_event_mem_access {
     uint64_t gfn;
diff --git a/xen/include/xen/vm_event.h b/xen/include/xen/vm_event.h
index 89e6243..2074090 100644
--- a/xen/include/xen/vm_event.h
+++ b/xen/include/xen/vm_event.h
@@ -83,7 +83,6 @@ int vm_event_monitor_traps(struct vcpu *v, uint8_t sync,
 
 #endif /* __VM_EVENT_H__ */
 
-
 /*
  * Local variables:
  * mode: C
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 8/8] minor #include change
  2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
                   ` (6 preceding siblings ...)
  2016-06-30 18:47 ` [PATCH 7/8] minor fixes (formatting, comments, unused includes etc.) Corneliu ZUZU
@ 2016-06-30 18:47 ` Corneliu ZUZU
  2016-07-01  6:56   ` Razvan Cojocaru
  7 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-06-30 18:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich

Move xen/paging.h #include from hvm/monitor.h to hvm/monitor.c (include strictly
where needed) and also change to asm/paging.h (include strictly what's needed).

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/hvm/monitor.c        | 1 +
 xen/include/asm-x86/hvm/monitor.h | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index 472926c..d81b11b 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -26,6 +26,7 @@
 #include <asm/hvm/monitor.h>
 #include <asm/monitor.h>
 #include <asm/vm_event.h>
+#include <asm/paging.h>
 #include <public/vm_event.h>
 
 bool_t hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
diff --git a/xen/include/asm-x86/hvm/monitor.h b/xen/include/asm-x86/hvm/monitor.h
index ab433ed..6c6ea5c 100644
--- a/xen/include/asm-x86/hvm/monitor.h
+++ b/xen/include/asm-x86/hvm/monitor.h
@@ -19,7 +19,6 @@
 #ifndef __ASM_X86_HVM_MONITOR_H__
 #define __ASM_X86_HVM_MONITOR_H__
 
-#include <xen/paging.h>
 #include <public/vm_event.h>
 
 enum hvm_monitor_breakpoint_type
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-06-30 18:45 ` [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup Corneliu ZUZU
@ 2016-07-01  6:47   ` Razvan Cojocaru
  2016-07-01  6:56     ` Corneliu ZUZU
  2016-07-04 12:47   ` Jan Beulich
  1 sibling, 1 reply; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  6:47 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, Jan Beulich

On 06/30/16 21:45, Corneliu ZUZU wrote:
> The arch_vm_event structure is dynamically allocated and freed @
> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
> disables domain monitoring (xc_monitor_disable), which in turn effectively
> discards any information that was in arch_vm_event.write_data.
> 
> But this can yield unexpected behavior since if a CR-write was awaiting to be
> committed on the scheduling tail (hvm_do_resume->arch_monitor_write_data)
> before xc_monitor_disable is called, then the domain CR write is wrongfully
> ignored, which of course, in these cases, can easily render a domain crash.
> 
> To fix the issue, this patch makes only arch_vm_event.emul_read_data dynamically
> allocated instead of the whole arch_vm_event structure. With this we can avoid
> invalidation of an awaiting arch_vm_event.write_data by selectively cleaning up
> only emul_read_data and emulate_flags @ vm_event_cleanup_domain.
> 
> Small note: arch_vm_event structure definition needed to be moved from
> asm-x86/vm_event.h to asm-x86/domain.h in the process.
> 
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/domain.c          |  5 ++--
>  xen/arch/x86/hvm/emulate.c     |  8 +++---
>  xen/arch/x86/hvm/hvm.c         | 62 ++++++++++++++++++------------------------
>  xen/arch/x86/mm/p2m.c          |  4 +--
>  xen/arch/x86/monitor.c         |  7 +----
>  xen/arch/x86/vm_event.c        | 16 +++++------
>  xen/include/asm-x86/domain.h   | 42 +++++++++++++++++-----------
>  xen/include/asm-x86/monitor.h  |  3 +-
>  xen/include/asm-x86/vm_event.h | 10 -------
>  9 files changed, 73 insertions(+), 84 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index bb59247..06e68ae 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -492,8 +492,9 @@ int vcpu_initialise(struct vcpu *v)
>  
>  void vcpu_destroy(struct vcpu *v)
>  {
> -    xfree(v->arch.vm_event);
> -    v->arch.vm_event = NULL;
> +    v->arch.vm_event.emulate_flags = 0;
> +    xfree(v->arch.vm_event.emul_read_data);
> +    v->arch.vm_event.emul_read_data = NULL;
>  
>      if ( is_pv_32bit_vcpu(v) )
>      {
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 855af4d..68f5515 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -73,12 +73,12 @@ static int set_context_data(void *buffer, unsigned int size)
>  {
>      struct vcpu *curr = current;
>  
> -    if ( curr->arch.vm_event )
> +    if ( curr->arch.vm_event.emul_read_data )
>      {
>          unsigned int safe_size =
> -            min(size, curr->arch.vm_event->emul_read_data.size);
> +            min(size, curr->arch.vm_event.emul_read_data->size);
>  
> -        memcpy(buffer, curr->arch.vm_event->emul_read_data.data, safe_size);
> +        memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
>          memset(buffer + safe_size, 0, size - safe_size);
>          return X86EMUL_OKAY;
>      }
> @@ -523,7 +523,7 @@ static int hvmemul_virtual_to_linear(
>       * vm_event being triggered for repeated writes to a whole page.
>       */
>      if ( unlikely(current->domain->arch.mem_access_emulate_each_rep) &&
> -         current->arch.vm_event->emulate_flags != 0 )
> +         current->arch.vm_event.emulate_flags != 0 )
>         max_reps = 1;
>  
>      /*
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 884ae40..03dffb8 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>      if ( !handle_hvm_io_completion(v) )
>          return;
>  
> -    if ( unlikely(v->arch.vm_event) )
> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>      {
> -        if ( v->arch.vm_event->emulate_flags )
> -        {
> -            enum emul_kind kind = EMUL_KIND_NORMAL;
> +        enum emul_kind kind;
>  
> -            if ( v->arch.vm_event->emulate_flags &
> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
> -                kind = EMUL_KIND_SET_CONTEXT;
> -            else if ( v->arch.vm_event->emulate_flags &
> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
> -                kind = EMUL_KIND_NOWRITE;
> +        ASSERT(v->arch.vm_event.emul_read_data);
>  
> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
> -                                       HVM_DELIVER_NO_ERROR_CODE);
> +        kind = EMUL_KIND_NORMAL;

Why do the "kind = EMUL_KIND_NORMAL;" assignment separately after the
ASSERT()? Could it not be left the same way as before ("enum emul_kind
kind = EMUL_KIND_NORMAL;") above the ASSERT()?

It's not a big change and I won't hold the patch over it, but small
changes add up in the review process so unnecessary changes are best
either avoided, or done in a standalone cleanup patch.

> -            v->arch.vm_event->emulate_flags = 0;
> -        }
> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )
> +            kind = EMUL_KIND_SET_CONTEXT;
> +        else if ( v->arch.vm_event.emulate_flags &
> +                  VM_EVENT_FLAG_EMULATE_NOWRITE )
> +            kind = EMUL_KIND_NOWRITE;
> +
> +        hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
> +                                   HVM_DELIVER_NO_ERROR_CODE);
> +
> +        v->arch.vm_event.emulate_flags = 0;
>      }
>  
>      arch_monitor_write_data(v);
> @@ -2178,17 +2178,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>      if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>                                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>      {
> -        ASSERT(v->arch.vm_event);
> -
>          if ( hvm_monitor_crX(CR0, value, old_value) )
>          {
>              /*
>               * The actual write will occur in arch_monitor_write_data(), if
>               * permitted.
>               */
> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
> -            v->arch.vm_event->write_data.status = MWS_CR0;
> -            v->arch.vm_event->write_data.value = value;
> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
> +            v->arch.vm_event.write_data.status = MWS_CR0;
> +            v->arch.vm_event.write_data.value = value;
>  
>              return X86EMUL_OKAY;
>          }
> @@ -2284,17 +2282,15 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
>      if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>                                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>      {
> -        ASSERT(v->arch.vm_event);
> -
>          if ( hvm_monitor_crX(CR3, value, old) )
>          {
>              /*
>               * The actual write will occur in arch_monitor_write_data(), if
>               * permitted.
>               */
> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
> -            v->arch.vm_event->write_data.status = MWS_CR3;
> -            v->arch.vm_event->write_data.value = value;
> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
> +            v->arch.vm_event.write_data.status = MWS_CR3;
> +            v->arch.vm_event.write_data.value = value;
>  
>              return X86EMUL_OKAY;
>          }
> @@ -2368,17 +2364,15 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>      if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>                                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>      {
> -        ASSERT(v->arch.vm_event);
> -
>          if ( hvm_monitor_crX(CR4, value, old_cr) )
>          {
>              /*
>               * The actual write will occur in arch_monitor_write_data(), if
>               * permitted.
>               */
> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
> -            v->arch.vm_event->write_data.status = MWS_CR4;
> -            v->arch.vm_event->write_data.value = value;
> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
> +            v->arch.vm_event.write_data.status = MWS_CR4;
> +            v->arch.vm_event.write_data.value = value;
>  
>              return X86EMUL_OKAY;
>          }
> @@ -3753,16 +3747,14 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>  
>      if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
>      {
> -        ASSERT(v->arch.vm_event);
> -
>          /*
>           * The actual write will occur in arch_monitor_write_data(), if
>           * permitted.
>           */
> -        ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
> -        v->arch.vm_event->write_data.status = MWS_MSR;
> -        v->arch.vm_event->write_data.msr = msr;
> -        v->arch.vm_event->write_data.value = msr_content;
> +        ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
> +        v->arch.vm_event.write_data.status = MWS_MSR;
> +        v->arch.vm_event.write_data.msr = msr;
> +        v->arch.vm_event.write_data.value = msr_content;
>  
>          hvm_monitor_msr(msr, msr_content);
>          return X86EMUL_OKAY;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 16733a4..9bcaa8a 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1639,10 +1639,10 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
>              }
>          }
>  
> -        v->arch.vm_event->emulate_flags = violation ? rsp->flags : 0;
> +        v->arch.vm_event.emulate_flags = violation ? rsp->flags : 0;
>  
>          if ( (rsp->flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA) )
> -            v->arch.vm_event->emul_read_data = rsp->data.emul_read_data;
> +            *v->arch.vm_event.emul_read_data = rsp->data.emul_read_data;
>      }
>  }
>  
> diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> index 5c8d4da..88d14ae 100644
> --- a/xen/arch/x86/monitor.c
> +++ b/xen/arch/x86/monitor.c
> @@ -46,12 +46,7 @@ void arch_monitor_cleanup_domain(struct domain *d)
>  
>  void arch_monitor_write_data(struct vcpu *v)
>  {
> -    struct monitor_write_data *w;
> -
> -    if ( likely(!v->arch.vm_event) )
> -        return;
> -
> -    w = &v->arch.vm_event->write_data;
> +    struct monitor_write_data *w = &v->arch.vm_event.write_data;
>  
>      if ( likely(MWS_NOWRITE == w->status) )
>          return;
> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
> index 825da48..f21ff10 100644
> --- a/xen/arch/x86/vm_event.c
> +++ b/xen/arch/x86/vm_event.c
> @@ -30,12 +30,13 @@ int vm_event_init_domain(struct domain *d)
>  
>      for_each_vcpu ( d, v )
>      {
> -        if ( v->arch.vm_event )
> +        if ( v->arch.vm_event.emul_read_data )
>              continue;
>  
> -        v->arch.vm_event = xzalloc(struct arch_vm_event);
> +        v->arch.vm_event.emul_read_data =
> +                xzalloc(struct vm_event_emul_read_data);
>  
> -        if ( !v->arch.vm_event )
> +        if ( !v->arch.vm_event.emul_read_data )
>              return -ENOMEM;
>      }
>  
> @@ -52,8 +53,9 @@ void vm_event_cleanup_domain(struct domain *d)
>  
>      for_each_vcpu ( d, v )
>      {
> -        xfree(v->arch.vm_event);
> -        v->arch.vm_event = NULL;
> +        v->arch.vm_event.emulate_flags = 0;
> +        xfree(v->arch.vm_event.emul_read_data);
> +        v->arch.vm_event.emul_read_data = NULL;
>      }
>  
>      d->arch.mem_access_emulate_each_rep = 0;
> @@ -73,13 +75,11 @@ void vm_event_register_write_resume(struct vcpu *v, vm_event_response_t *rsp)
>  {
>      if ( rsp->flags & VM_EVENT_FLAG_DENY )
>      {
> -        ASSERT(v->arch.vm_event);
> -
>          /* deny flag requires the vCPU to be paused */
>          if ( !atomic_read(&v->vm_event_pause_count) )
>              return;
>  
> -        v->arch.vm_event->write_data.status = MWS_NOWRITE;
> +        v->arch.vm_event.write_data.status = MWS_NOWRITE;
>      }
>  }
>  
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index a22ee6b..7ea5c8f 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -259,21 +259,6 @@ struct pv_domain
>      struct cpuidmasks *cpuidmasks;
>  };
>  
> -enum monitor_write_status
> -{
> -    MWS_NOWRITE = 0,
> -    MWS_MSR,
> -    MWS_CR0,
> -    MWS_CR3,
> -    MWS_CR4,
> -};
> -
> -struct monitor_write_data {
> -    enum monitor_write_status status;
> -    uint32_t msr;
> -    uint64_t value;
> -};
> -
>  struct arch_domain
>  {
>      struct page_info *perdomain_l3_pg;
> @@ -496,6 +481,31 @@ typedef enum __packed {
>      SMAP_CHECK_DISABLED,        /* disable the check */
>  } smap_check_policy_t;
>  
> +enum monitor_write_status
> +{
> +    MWS_NOWRITE = 0,
> +    MWS_MSR,
> +    MWS_CR0,
> +    MWS_CR3,
> +    MWS_CR4,
> +};
> +
> +struct monitor_write_data {
> +    enum monitor_write_status status;
> +    uint32_t msr;
> +    uint64_t value;
> +};
> +
> +/*
> + * Should we emulate the next matching instruction on VCPU resume
> + * after a vm_event?
> + */
> +struct arch_vm_event {
> +    uint32_t emulate_flags;
> +    struct vm_event_emul_read_data *emul_read_data;
> +    struct monitor_write_data write_data;
> +};
> +
>  struct arch_vcpu
>  {
>      /*
> @@ -569,7 +579,7 @@ struct arch_vcpu
>      /* A secondary copy of the vcpu time info. */
>      XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
>  
> -    struct arch_vm_event *vm_event;
> +    struct arch_vm_event vm_event;
>  };
>  
>  smap_check_policy_t smap_policy_change(struct vcpu *v,
> diff --git a/xen/include/asm-x86/monitor.h b/xen/include/asm-x86/monitor.h
> index 1068376..984ac4c 100644
> --- a/xen/include/asm-x86/monitor.h
> +++ b/xen/include/asm-x86/monitor.h
> @@ -48,7 +48,8 @@ int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
>           * Enabling mem_access_emulate_each_rep without a vm_event subscriber
>           * is meaningless.
>           */
> -        if ( d->max_vcpus && d->vcpu[0] && d->vcpu[0]->arch.vm_event )
> +        if ( d->max_vcpus && d->vcpu[0] &&
> +             d->vcpu[0]->arch.vm_event.emul_read_data )

Again, I won't hold the patch over this, but if there are additional
reviews that require changes and cause another version of it, please add
a small line to the comment above the if, stating that emul_read_data
only gets allocated when vm_event gets enabled, otherwise (especially
for newcomers) that check might look confusing.

Otherwise:

Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-06-30 18:44 ` [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum Corneliu ZUZU
@ 2016-07-01  6:53   ` Razvan Cojocaru
  2016-07-01  6:59     ` Corneliu ZUZU
  2016-07-04 12:37   ` Jan Beulich
  1 sibling, 1 reply; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  6:53 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 06/30/16 21:44, Corneliu ZUZU wrote:
> After trapping a control-register write vm-event and -until- deciding if that
> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual write,
> there cannot and should not be another trapped control-register write event.
> That is, currently -only one- of the fields of monitor_write_data.do_write can
> be true at any given moment and therefore it would be more appropriate to
> replace those fields with an enum value.
> 
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/hvm/hvm.c       | 18 +++++++++++-------
>  xen/arch/x86/monitor.c       | 37 ++++++++++++++++++-------------------
>  xen/arch/x86/vm_event.c      | 25 ++-----------------------
>  xen/include/asm-x86/domain.h | 20 ++++++++++----------
>  4 files changed, 41 insertions(+), 59 deletions(-)

I'm not sure why neither Tamas' or my email address are in the CC list
(I would have assumed either monitor.c or vm_event.c would trigger a
response from get_maintainters.pl). In any case:

Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-01  6:47   ` Razvan Cojocaru
@ 2016-07-01  6:56     ` Corneliu ZUZU
  2016-07-01  6:59       ` Razvan Cojocaru
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-01  6:56 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, Jan Beulich

On 7/1/2016 9:47 AM, Razvan Cojocaru wrote:
> On 06/30/16 21:45, Corneliu ZUZU wrote:
>> The arch_vm_event structure is dynamically allocated and freed @
>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>> discards any information that was in arch_vm_event.write_data.
>>
>> But this can yield unexpected behavior since if a CR-write was awaiting to be
>> committed on the scheduling tail (hvm_do_resume->arch_monitor_write_data)
>> before xc_monitor_disable is called, then the domain CR write is wrongfully
>> ignored, which of course, in these cases, can easily render a domain crash.
>>
>> To fix the issue, this patch makes only arch_vm_event.emul_read_data dynamically
>> allocated instead of the whole arch_vm_event structure. With this we can avoid
>> invalidation of an awaiting arch_vm_event.write_data by selectively cleaning up
>> only emul_read_data and emulate_flags @ vm_event_cleanup_domain.
>>
>> Small note: arch_vm_event structure definition needed to be moved from
>> asm-x86/vm_event.h to asm-x86/domain.h in the process.
>>
>> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
>> ---
>>   xen/arch/x86/domain.c          |  5 ++--
>>   xen/arch/x86/hvm/emulate.c     |  8 +++---
>>   xen/arch/x86/hvm/hvm.c         | 62 ++++++++++++++++++------------------------
>>   xen/arch/x86/mm/p2m.c          |  4 +--
>>   xen/arch/x86/monitor.c         |  7 +----
>>   xen/arch/x86/vm_event.c        | 16 +++++------
>>   xen/include/asm-x86/domain.h   | 42 +++++++++++++++++-----------
>>   xen/include/asm-x86/monitor.h  |  3 +-
>>   xen/include/asm-x86/vm_event.h | 10 -------
>>   9 files changed, 73 insertions(+), 84 deletions(-)
>>
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index bb59247..06e68ae 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -492,8 +492,9 @@ int vcpu_initialise(struct vcpu *v)
>>   
>>   void vcpu_destroy(struct vcpu *v)
>>   {
>> -    xfree(v->arch.vm_event);
>> -    v->arch.vm_event = NULL;
>> +    v->arch.vm_event.emulate_flags = 0;
>> +    xfree(v->arch.vm_event.emul_read_data);
>> +    v->arch.vm_event.emul_read_data = NULL;
>>   
>>       if ( is_pv_32bit_vcpu(v) )
>>       {
>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>> index 855af4d..68f5515 100644
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -73,12 +73,12 @@ static int set_context_data(void *buffer, unsigned int size)
>>   {
>>       struct vcpu *curr = current;
>>   
>> -    if ( curr->arch.vm_event )
>> +    if ( curr->arch.vm_event.emul_read_data )
>>       {
>>           unsigned int safe_size =
>> -            min(size, curr->arch.vm_event->emul_read_data.size);
>> +            min(size, curr->arch.vm_event.emul_read_data->size);
>>   
>> -        memcpy(buffer, curr->arch.vm_event->emul_read_data.data, safe_size);
>> +        memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
>>           memset(buffer + safe_size, 0, size - safe_size);
>>           return X86EMUL_OKAY;
>>       }
>> @@ -523,7 +523,7 @@ static int hvmemul_virtual_to_linear(
>>        * vm_event being triggered for repeated writes to a whole page.
>>        */
>>       if ( unlikely(current->domain->arch.mem_access_emulate_each_rep) &&
>> -         current->arch.vm_event->emulate_flags != 0 )
>> +         current->arch.vm_event.emulate_flags != 0 )
>>          max_reps = 1;
>>   
>>       /*
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 884ae40..03dffb8 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>       if ( !handle_hvm_io_completion(v) )
>>           return;
>>   
>> -    if ( unlikely(v->arch.vm_event) )
>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>       {
>> -        if ( v->arch.vm_event->emulate_flags )
>> -        {
>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>> +        enum emul_kind kind;
>>   
>> -            if ( v->arch.vm_event->emulate_flags &
>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>> -                kind = EMUL_KIND_SET_CONTEXT;
>> -            else if ( v->arch.vm_event->emulate_flags &
>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>> -                kind = EMUL_KIND_NOWRITE;
>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>   
>> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>> +        kind = EMUL_KIND_NORMAL;
> Why do the "kind = EMUL_KIND_NORMAL;" assignment separately after the
> ASSERT()? Could it not be left the same way as before ("enum emul_kind
> kind = EMUL_KIND_NORMAL;") above the ASSERT()?
>
> It's not a big change and I won't hold the patch over it, but small
> changes add up in the review process so unnecessary changes are best
> either avoided, or done in a standalone cleanup patch.
>
>> -            v->arch.vm_event->emulate_flags = 0;
>> -        }
>> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>> +            kind = EMUL_KIND_SET_CONTEXT;
>> +        else if ( v->arch.vm_event.emulate_flags &
>> +                  VM_EVENT_FLAG_EMULATE_NOWRITE )
>> +            kind = EMUL_KIND_NOWRITE;
>> +
>> +        hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>> +                                   HVM_DELIVER_NO_ERROR_CODE);
>> +
>> +        v->arch.vm_event.emulate_flags = 0;
>>       }
>>   
>>       arch_monitor_write_data(v);
>> @@ -2178,17 +2178,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>>       if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>>                                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>>       {
>> -        ASSERT(v->arch.vm_event);
>> -
>>           if ( hvm_monitor_crX(CR0, value, old_value) )
>>           {
>>               /*
>>                * The actual write will occur in arch_monitor_write_data(), if
>>                * permitted.
>>                */
>> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>> -            v->arch.vm_event->write_data.status = MWS_CR0;
>> -            v->arch.vm_event->write_data.value = value;
>> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>> +            v->arch.vm_event.write_data.status = MWS_CR0;
>> +            v->arch.vm_event.write_data.value = value;
>>   
>>               return X86EMUL_OKAY;
>>           }
>> @@ -2284,17 +2282,15 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
>>       if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>>                                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>>       {
>> -        ASSERT(v->arch.vm_event);
>> -
>>           if ( hvm_monitor_crX(CR3, value, old) )
>>           {
>>               /*
>>                * The actual write will occur in arch_monitor_write_data(), if
>>                * permitted.
>>                */
>> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>> -            v->arch.vm_event->write_data.status = MWS_CR3;
>> -            v->arch.vm_event->write_data.value = value;
>> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>> +            v->arch.vm_event.write_data.status = MWS_CR3;
>> +            v->arch.vm_event.write_data.value = value;
>>   
>>               return X86EMUL_OKAY;
>>           }
>> @@ -2368,17 +2364,15 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>>       if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>>                                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>>       {
>> -        ASSERT(v->arch.vm_event);
>> -
>>           if ( hvm_monitor_crX(CR4, value, old_cr) )
>>           {
>>               /*
>>                * The actual write will occur in arch_monitor_write_data(), if
>>                * permitted.
>>                */
>> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>> -            v->arch.vm_event->write_data.status = MWS_CR4;
>> -            v->arch.vm_event->write_data.value = value;
>> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>> +            v->arch.vm_event.write_data.status = MWS_CR4;
>> +            v->arch.vm_event.write_data.value = value;
>>   
>>               return X86EMUL_OKAY;
>>           }
>> @@ -3753,16 +3747,14 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>>   
>>       if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
>>       {
>> -        ASSERT(v->arch.vm_event);
>> -
>>           /*
>>            * The actual write will occur in arch_monitor_write_data(), if
>>            * permitted.
>>            */
>> -        ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>> -        v->arch.vm_event->write_data.status = MWS_MSR;
>> -        v->arch.vm_event->write_data.msr = msr;
>> -        v->arch.vm_event->write_data.value = msr_content;
>> +        ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>> +        v->arch.vm_event.write_data.status = MWS_MSR;
>> +        v->arch.vm_event.write_data.msr = msr;
>> +        v->arch.vm_event.write_data.value = msr_content;
>>   
>>           hvm_monitor_msr(msr, msr_content);
>>           return X86EMUL_OKAY;
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>> index 16733a4..9bcaa8a 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1639,10 +1639,10 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
>>               }
>>           }
>>   
>> -        v->arch.vm_event->emulate_flags = violation ? rsp->flags : 0;
>> +        v->arch.vm_event.emulate_flags = violation ? rsp->flags : 0;
>>   
>>           if ( (rsp->flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA) )
>> -            v->arch.vm_event->emul_read_data = rsp->data.emul_read_data;
>> +            *v->arch.vm_event.emul_read_data = rsp->data.emul_read_data;
>>       }
>>   }
>>   
>> diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
>> index 5c8d4da..88d14ae 100644
>> --- a/xen/arch/x86/monitor.c
>> +++ b/xen/arch/x86/monitor.c
>> @@ -46,12 +46,7 @@ void arch_monitor_cleanup_domain(struct domain *d)
>>   
>>   void arch_monitor_write_data(struct vcpu *v)
>>   {
>> -    struct monitor_write_data *w;
>> -
>> -    if ( likely(!v->arch.vm_event) )
>> -        return;
>> -
>> -    w = &v->arch.vm_event->write_data;
>> +    struct monitor_write_data *w = &v->arch.vm_event.write_data;
>>   
>>       if ( likely(MWS_NOWRITE == w->status) )
>>           return;
>> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
>> index 825da48..f21ff10 100644
>> --- a/xen/arch/x86/vm_event.c
>> +++ b/xen/arch/x86/vm_event.c
>> @@ -30,12 +30,13 @@ int vm_event_init_domain(struct domain *d)
>>   
>>       for_each_vcpu ( d, v )
>>       {
>> -        if ( v->arch.vm_event )
>> +        if ( v->arch.vm_event.emul_read_data )
>>               continue;
>>   
>> -        v->arch.vm_event = xzalloc(struct arch_vm_event);
>> +        v->arch.vm_event.emul_read_data =
>> +                xzalloc(struct vm_event_emul_read_data);
>>   
>> -        if ( !v->arch.vm_event )
>> +        if ( !v->arch.vm_event.emul_read_data )
>>               return -ENOMEM;
>>       }
>>   
>> @@ -52,8 +53,9 @@ void vm_event_cleanup_domain(struct domain *d)
>>   
>>       for_each_vcpu ( d, v )
>>       {
>> -        xfree(v->arch.vm_event);
>> -        v->arch.vm_event = NULL;
>> +        v->arch.vm_event.emulate_flags = 0;
>> +        xfree(v->arch.vm_event.emul_read_data);
>> +        v->arch.vm_event.emul_read_data = NULL;
>>       }
>>   
>>       d->arch.mem_access_emulate_each_rep = 0;
>> @@ -73,13 +75,11 @@ void vm_event_register_write_resume(struct vcpu *v, vm_event_response_t *rsp)
>>   {
>>       if ( rsp->flags & VM_EVENT_FLAG_DENY )
>>       {
>> -        ASSERT(v->arch.vm_event);
>> -
>>           /* deny flag requires the vCPU to be paused */
>>           if ( !atomic_read(&v->vm_event_pause_count) )
>>               return;
>>   
>> -        v->arch.vm_event->write_data.status = MWS_NOWRITE;
>> +        v->arch.vm_event.write_data.status = MWS_NOWRITE;
>>       }
>>   }
>>   
>> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
>> index a22ee6b..7ea5c8f 100644
>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -259,21 +259,6 @@ struct pv_domain
>>       struct cpuidmasks *cpuidmasks;
>>   };
>>   
>> -enum monitor_write_status
>> -{
>> -    MWS_NOWRITE = 0,
>> -    MWS_MSR,
>> -    MWS_CR0,
>> -    MWS_CR3,
>> -    MWS_CR4,
>> -};
>> -
>> -struct monitor_write_data {
>> -    enum monitor_write_status status;
>> -    uint32_t msr;
>> -    uint64_t value;
>> -};
>> -
>>   struct arch_domain
>>   {
>>       struct page_info *perdomain_l3_pg;
>> @@ -496,6 +481,31 @@ typedef enum __packed {
>>       SMAP_CHECK_DISABLED,        /* disable the check */
>>   } smap_check_policy_t;
>>   
>> +enum monitor_write_status
>> +{
>> +    MWS_NOWRITE = 0,
>> +    MWS_MSR,
>> +    MWS_CR0,
>> +    MWS_CR3,
>> +    MWS_CR4,
>> +};
>> +
>> +struct monitor_write_data {
>> +    enum monitor_write_status status;
>> +    uint32_t msr;
>> +    uint64_t value;
>> +};
>> +
>> +/*
>> + * Should we emulate the next matching instruction on VCPU resume
>> + * after a vm_event?
>> + */
>> +struct arch_vm_event {
>> +    uint32_t emulate_flags;
>> +    struct vm_event_emul_read_data *emul_read_data;
>> +    struct monitor_write_data write_data;
>> +};
>> +
>>   struct arch_vcpu
>>   {
>>       /*
>> @@ -569,7 +579,7 @@ struct arch_vcpu
>>       /* A secondary copy of the vcpu time info. */
>>       XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
>>   
>> -    struct arch_vm_event *vm_event;
>> +    struct arch_vm_event vm_event;
>>   };
>>   
>>   smap_check_policy_t smap_policy_change(struct vcpu *v,
>> diff --git a/xen/include/asm-x86/monitor.h b/xen/include/asm-x86/monitor.h
>> index 1068376..984ac4c 100644
>> --- a/xen/include/asm-x86/monitor.h
>> +++ b/xen/include/asm-x86/monitor.h
>> @@ -48,7 +48,8 @@ int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
>>            * Enabling mem_access_emulate_each_rep without a vm_event subscriber
>>            * is meaningless.
>>            */
>> -        if ( d->max_vcpus && d->vcpu[0] && d->vcpu[0]->arch.vm_event )
>> +        if ( d->max_vcpus && d->vcpu[0] &&
>> +             d->vcpu[0]->arch.vm_event.emul_read_data )
> Again, I won't hold the patch over this, but if there are additional
> reviews that require changes and cause another version of it, please add
> a small line to the comment above the if, stating that emul_read_data
> only gets allocated when vm_event gets enabled, otherwise (especially
> for newcomers) that check might look confusing.
>
> Otherwise:
>
> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>
>
> Thanks,
> Razvan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Since that came up wouldn't it be even nicer if we add a:

#define vm_event_initialized_on_vcpu(v)     (NULL != 
(v)->arch.vm_event.emul_read_data)

in asm-x86/vm_event.h above vm_event_init_domain and use that everywhere 
instead?

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 8/8] minor #include change
  2016-06-30 18:47 ` [PATCH 8/8] minor #include change Corneliu ZUZU
@ 2016-07-01  6:56   ` Razvan Cojocaru
  2016-07-01  7:02     ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  6:56 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 06/30/16 21:47, Corneliu ZUZU wrote:
> Move xen/paging.h #include from hvm/monitor.h to hvm/monitor.c (include strictly
> where needed) and also change to asm/paging.h (include strictly what's needed).
> 
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/hvm/monitor.c        | 1 +
>  xen/include/asm-x86/hvm/monitor.h | 1 -
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> index 472926c..d81b11b 100644
> --- a/xen/arch/x86/hvm/monitor.c
> +++ b/xen/arch/x86/hvm/monitor.c
> @@ -26,6 +26,7 @@
>  #include <asm/hvm/monitor.h>
>  #include <asm/monitor.h>
>  #include <asm/vm_event.h>
> +#include <asm/paging.h>
>  #include <public/vm_event.h>

If this #include is not required to be exactly where it is, I believe
alphabetical order is preferred (so #include <asm/paging.h> would be
above #include <asm/vm_event.h>).


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-01  6:56     ` Corneliu ZUZU
@ 2016-07-01  6:59       ` Razvan Cojocaru
  0 siblings, 0 replies; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  6:59 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, Jan Beulich

On 07/01/16 09:56, Corneliu ZUZU wrote:
> On 7/1/2016 9:47 AM, Razvan Cojocaru wrote:
>> On 06/30/16 21:45, Corneliu ZUZU wrote:
>>> The arch_vm_event structure is dynamically allocated and freed @
>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the
>>> toolstack user
>>> disables domain monitoring (xc_monitor_disable), which in turn
>>> effectively
>>> discards any information that was in arch_vm_event.write_data.
>>>
>>> But this can yield unexpected behavior since if a CR-write was
>>> awaiting to be
>>> committed on the scheduling tail
>>> (hvm_do_resume->arch_monitor_write_data)
>>> before xc_monitor_disable is called, then the domain CR write is
>>> wrongfully
>>> ignored, which of course, in these cases, can easily render a domain
>>> crash.
>>>
>>> To fix the issue, this patch makes only arch_vm_event.emul_read_data
>>> dynamically
>>> allocated instead of the whole arch_vm_event structure. With this we
>>> can avoid
>>> invalidation of an awaiting arch_vm_event.write_data by selectively
>>> cleaning up
>>> only emul_read_data and emulate_flags @ vm_event_cleanup_domain.
>>>
>>> Small note: arch_vm_event structure definition needed to be moved from
>>> asm-x86/vm_event.h to asm-x86/domain.h in the process.
>>>
>>> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
>>> ---
>>>   xen/arch/x86/domain.c          |  5 ++--
>>>   xen/arch/x86/hvm/emulate.c     |  8 +++---
>>>   xen/arch/x86/hvm/hvm.c         | 62
>>> ++++++++++++++++++------------------------
>>>   xen/arch/x86/mm/p2m.c          |  4 +--
>>>   xen/arch/x86/monitor.c         |  7 +----
>>>   xen/arch/x86/vm_event.c        | 16 +++++------
>>>   xen/include/asm-x86/domain.h   | 42 +++++++++++++++++-----------
>>>   xen/include/asm-x86/monitor.h  |  3 +-
>>>   xen/include/asm-x86/vm_event.h | 10 -------
>>>   9 files changed, 73 insertions(+), 84 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>> index bb59247..06e68ae 100644
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -492,8 +492,9 @@ int vcpu_initialise(struct vcpu *v)
>>>     void vcpu_destroy(struct vcpu *v)
>>>   {
>>> -    xfree(v->arch.vm_event);
>>> -    v->arch.vm_event = NULL;
>>> +    v->arch.vm_event.emulate_flags = 0;
>>> +    xfree(v->arch.vm_event.emul_read_data);
>>> +    v->arch.vm_event.emul_read_data = NULL;
>>>         if ( is_pv_32bit_vcpu(v) )
>>>       {
>>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>>> index 855af4d..68f5515 100644
>>> --- a/xen/arch/x86/hvm/emulate.c
>>> +++ b/xen/arch/x86/hvm/emulate.c
>>> @@ -73,12 +73,12 @@ static int set_context_data(void *buffer,
>>> unsigned int size)
>>>   {
>>>       struct vcpu *curr = current;
>>>   -    if ( curr->arch.vm_event )
>>> +    if ( curr->arch.vm_event.emul_read_data )
>>>       {
>>>           unsigned int safe_size =
>>> -            min(size, curr->arch.vm_event->emul_read_data.size);
>>> +            min(size, curr->arch.vm_event.emul_read_data->size);
>>>   -        memcpy(buffer, curr->arch.vm_event->emul_read_data.data,
>>> safe_size);
>>> +        memcpy(buffer, curr->arch.vm_event.emul_read_data->data,
>>> safe_size);
>>>           memset(buffer + safe_size, 0, size - safe_size);
>>>           return X86EMUL_OKAY;
>>>       }
>>> @@ -523,7 +523,7 @@ static int hvmemul_virtual_to_linear(
>>>        * vm_event being triggered for repeated writes to a whole page.
>>>        */
>>>       if (
>>> unlikely(current->domain->arch.mem_access_emulate_each_rep) &&
>>> -         current->arch.vm_event->emulate_flags != 0 )
>>> +         current->arch.vm_event.emulate_flags != 0 )
>>>          max_reps = 1;
>>>         /*
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 884ae40..03dffb8 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>>       if ( !handle_hvm_io_completion(v) )
>>>           return;
>>>   -    if ( unlikely(v->arch.vm_event) )
>>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>>       {
>>> -        if ( v->arch.vm_event->emulate_flags )
>>> -        {
>>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>>> +        enum emul_kind kind;
>>>   -            if ( v->arch.vm_event->emulate_flags &
>>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>> -                kind = EMUL_KIND_SET_CONTEXT;
>>> -            else if ( v->arch.vm_event->emulate_flags &
>>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>>> -                kind = EMUL_KIND_NOWRITE;
>>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>>   -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>>> +        kind = EMUL_KIND_NORMAL;
>> Why do the "kind = EMUL_KIND_NORMAL;" assignment separately after the
>> ASSERT()? Could it not be left the same way as before ("enum emul_kind
>> kind = EMUL_KIND_NORMAL;") above the ASSERT()?
>>
>> It's not a big change and I won't hold the patch over it, but small
>> changes add up in the review process so unnecessary changes are best
>> either avoided, or done in a standalone cleanup patch.
>>
>>> -            v->arch.vm_event->emulate_flags = 0;
>>> -        }
>>> +        if ( v->arch.vm_event.emulate_flags &
>>> VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>> +            kind = EMUL_KIND_SET_CONTEXT;
>>> +        else if ( v->arch.vm_event.emulate_flags &
>>> +                  VM_EVENT_FLAG_EMULATE_NOWRITE )
>>> +            kind = EMUL_KIND_NOWRITE;
>>> +
>>> +        hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>>> +                                   HVM_DELIVER_NO_ERROR_CODE);
>>> +
>>> +        v->arch.vm_event.emulate_flags = 0;
>>>       }
>>>         arch_monitor_write_data(v);
>>> @@ -2178,17 +2178,15 @@ int hvm_set_cr0(unsigned long value, bool_t
>>> may_defer)
>>>       if ( may_defer &&
>>> unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>>>                                 
>>> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>>>       {
>>> -        ASSERT(v->arch.vm_event);
>>> -
>>>           if ( hvm_monitor_crX(CR0, value, old_value) )
>>>           {
>>>               /*
>>>                * The actual write will occur in
>>> arch_monitor_write_data(), if
>>>                * permitted.
>>>                */
>>> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>>> -            v->arch.vm_event->write_data.status = MWS_CR0;
>>> -            v->arch.vm_event->write_data.value = value;
>>> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>>> +            v->arch.vm_event.write_data.status = MWS_CR0;
>>> +            v->arch.vm_event.write_data.value = value;
>>>                 return X86EMUL_OKAY;
>>>           }
>>> @@ -2284,17 +2282,15 @@ int hvm_set_cr3(unsigned long value, bool_t
>>> may_defer)
>>>       if ( may_defer &&
>>> unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>>>                                 
>>> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>>>       {
>>> -        ASSERT(v->arch.vm_event);
>>> -
>>>           if ( hvm_monitor_crX(CR3, value, old) )
>>>           {
>>>               /*
>>>                * The actual write will occur in
>>> arch_monitor_write_data(), if
>>>                * permitted.
>>>                */
>>> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>>> -            v->arch.vm_event->write_data.status = MWS_CR3;
>>> -            v->arch.vm_event->write_data.value = value;
>>> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>>> +            v->arch.vm_event.write_data.status = MWS_CR3;
>>> +            v->arch.vm_event.write_data.value = value;
>>>                 return X86EMUL_OKAY;
>>>           }
>>> @@ -2368,17 +2364,15 @@ int hvm_set_cr4(unsigned long value, bool_t
>>> may_defer)
>>>       if ( may_defer &&
>>> unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>>>                                 
>>> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>>>       {
>>> -        ASSERT(v->arch.vm_event);
>>> -
>>>           if ( hvm_monitor_crX(CR4, value, old_cr) )
>>>           {
>>>               /*
>>>                * The actual write will occur in
>>> arch_monitor_write_data(), if
>>>                * permitted.
>>>                */
>>> -            ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>>> -            v->arch.vm_event->write_data.status = MWS_CR4;
>>> -            v->arch.vm_event->write_data.value = value;
>>> +            ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>>> +            v->arch.vm_event.write_data.status = MWS_CR4;
>>> +            v->arch.vm_event.write_data.value = value;
>>>                 return X86EMUL_OKAY;
>>>           }
>>> @@ -3753,16 +3747,14 @@ int hvm_msr_write_intercept(unsigned int msr,
>>> uint64_t msr_content,
>>>         if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
>>>       {
>>> -        ASSERT(v->arch.vm_event);
>>> -
>>>           /*
>>>            * The actual write will occur in
>>> arch_monitor_write_data(), if
>>>            * permitted.
>>>            */
>>> -        ASSERT(MWS_NOWRITE == v->arch.vm_event->write_data.status);
>>> -        v->arch.vm_event->write_data.status = MWS_MSR;
>>> -        v->arch.vm_event->write_data.msr = msr;
>>> -        v->arch.vm_event->write_data.value = msr_content;
>>> +        ASSERT(MWS_NOWRITE == v->arch.vm_event.write_data.status);
>>> +        v->arch.vm_event.write_data.status = MWS_MSR;
>>> +        v->arch.vm_event.write_data.msr = msr;
>>> +        v->arch.vm_event.write_data.value = msr_content;
>>>             hvm_monitor_msr(msr, msr_content);
>>>           return X86EMUL_OKAY;
>>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>>> index 16733a4..9bcaa8a 100644
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -1639,10 +1639,10 @@ void p2m_mem_access_emulate_check(struct vcpu
>>> *v,
>>>               }
>>>           }
>>>   -        v->arch.vm_event->emulate_flags = violation ? rsp->flags : 0;
>>> +        v->arch.vm_event.emulate_flags = violation ? rsp->flags : 0;
>>>             if ( (rsp->flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA) )
>>> -            v->arch.vm_event->emul_read_data =
>>> rsp->data.emul_read_data;
>>> +            *v->arch.vm_event.emul_read_data =
>>> rsp->data.emul_read_data;
>>>       }
>>>   }
>>>   diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
>>> index 5c8d4da..88d14ae 100644
>>> --- a/xen/arch/x86/monitor.c
>>> +++ b/xen/arch/x86/monitor.c
>>> @@ -46,12 +46,7 @@ void arch_monitor_cleanup_domain(struct domain *d)
>>>     void arch_monitor_write_data(struct vcpu *v)
>>>   {
>>> -    struct monitor_write_data *w;
>>> -
>>> -    if ( likely(!v->arch.vm_event) )
>>> -        return;
>>> -
>>> -    w = &v->arch.vm_event->write_data;
>>> +    struct monitor_write_data *w = &v->arch.vm_event.write_data;
>>>         if ( likely(MWS_NOWRITE == w->status) )
>>>           return;
>>> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
>>> index 825da48..f21ff10 100644
>>> --- a/xen/arch/x86/vm_event.c
>>> +++ b/xen/arch/x86/vm_event.c
>>> @@ -30,12 +30,13 @@ int vm_event_init_domain(struct domain *d)
>>>         for_each_vcpu ( d, v )
>>>       {
>>> -        if ( v->arch.vm_event )
>>> +        if ( v->arch.vm_event.emul_read_data )
>>>               continue;
>>>   -        v->arch.vm_event = xzalloc(struct arch_vm_event);
>>> +        v->arch.vm_event.emul_read_data =
>>> +                xzalloc(struct vm_event_emul_read_data);
>>>   -        if ( !v->arch.vm_event )
>>> +        if ( !v->arch.vm_event.emul_read_data )
>>>               return -ENOMEM;
>>>       }
>>>   @@ -52,8 +53,9 @@ void vm_event_cleanup_domain(struct domain *d)
>>>         for_each_vcpu ( d, v )
>>>       {
>>> -        xfree(v->arch.vm_event);
>>> -        v->arch.vm_event = NULL;
>>> +        v->arch.vm_event.emulate_flags = 0;
>>> +        xfree(v->arch.vm_event.emul_read_data);
>>> +        v->arch.vm_event.emul_read_data = NULL;
>>>       }
>>>         d->arch.mem_access_emulate_each_rep = 0;
>>> @@ -73,13 +75,11 @@ void vm_event_register_write_resume(struct vcpu
>>> *v, vm_event_response_t *rsp)
>>>   {
>>>       if ( rsp->flags & VM_EVENT_FLAG_DENY )
>>>       {
>>> -        ASSERT(v->arch.vm_event);
>>> -
>>>           /* deny flag requires the vCPU to be paused */
>>>           if ( !atomic_read(&v->vm_event_pause_count) )
>>>               return;
>>>   -        v->arch.vm_event->write_data.status = MWS_NOWRITE;
>>> +        v->arch.vm_event.write_data.status = MWS_NOWRITE;
>>>       }
>>>   }
>>>   diff --git a/xen/include/asm-x86/domain.h
>>> b/xen/include/asm-x86/domain.h
>>> index a22ee6b..7ea5c8f 100644
>>> --- a/xen/include/asm-x86/domain.h
>>> +++ b/xen/include/asm-x86/domain.h
>>> @@ -259,21 +259,6 @@ struct pv_domain
>>>       struct cpuidmasks *cpuidmasks;
>>>   };
>>>   -enum monitor_write_status
>>> -{
>>> -    MWS_NOWRITE = 0,
>>> -    MWS_MSR,
>>> -    MWS_CR0,
>>> -    MWS_CR3,
>>> -    MWS_CR4,
>>> -};
>>> -
>>> -struct monitor_write_data {
>>> -    enum monitor_write_status status;
>>> -    uint32_t msr;
>>> -    uint64_t value;
>>> -};
>>> -
>>>   struct arch_domain
>>>   {
>>>       struct page_info *perdomain_l3_pg;
>>> @@ -496,6 +481,31 @@ typedef enum __packed {
>>>       SMAP_CHECK_DISABLED,        /* disable the check */
>>>   } smap_check_policy_t;
>>>   +enum monitor_write_status
>>> +{
>>> +    MWS_NOWRITE = 0,
>>> +    MWS_MSR,
>>> +    MWS_CR0,
>>> +    MWS_CR3,
>>> +    MWS_CR4,
>>> +};
>>> +
>>> +struct monitor_write_data {
>>> +    enum monitor_write_status status;
>>> +    uint32_t msr;
>>> +    uint64_t value;
>>> +};
>>> +
>>> +/*
>>> + * Should we emulate the next matching instruction on VCPU resume
>>> + * after a vm_event?
>>> + */
>>> +struct arch_vm_event {
>>> +    uint32_t emulate_flags;
>>> +    struct vm_event_emul_read_data *emul_read_data;
>>> +    struct monitor_write_data write_data;
>>> +};
>>> +
>>>   struct arch_vcpu
>>>   {
>>>       /*
>>> @@ -569,7 +579,7 @@ struct arch_vcpu
>>>       /* A secondary copy of the vcpu time info. */
>>>       XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
>>>   -    struct arch_vm_event *vm_event;
>>> +    struct arch_vm_event vm_event;
>>>   };
>>>     smap_check_policy_t smap_policy_change(struct vcpu *v,
>>> diff --git a/xen/include/asm-x86/monitor.h
>>> b/xen/include/asm-x86/monitor.h
>>> index 1068376..984ac4c 100644
>>> --- a/xen/include/asm-x86/monitor.h
>>> +++ b/xen/include/asm-x86/monitor.h
>>> @@ -48,7 +48,8 @@ int arch_monitor_domctl_op(struct domain *d, struct
>>> xen_domctl_monitor_op *mop)
>>>            * Enabling mem_access_emulate_each_rep without a vm_event
>>> subscriber
>>>            * is meaningless.
>>>            */
>>> -        if ( d->max_vcpus && d->vcpu[0] && d->vcpu[0]->arch.vm_event )
>>> +        if ( d->max_vcpus && d->vcpu[0] &&
>>> +             d->vcpu[0]->arch.vm_event.emul_read_data )
>> Again, I won't hold the patch over this, but if there are additional
>> reviews that require changes and cause another version of it, please add
>> a small line to the comment above the if, stating that emul_read_data
>> only gets allocated when vm_event gets enabled, otherwise (especially
>> for newcomers) that check might look confusing.
>>
>> Otherwise:
>>
>> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>
>>
>> Thanks,
>> Razvan
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 
> Since that came up wouldn't it be even nicer if we add a:
> 
> #define vm_event_initialized_on_vcpu(v)     (NULL !=
> (v)->arch.vm_event.emul_read_data)
> 
> in asm-x86/vm_event.h above vm_event_init_domain and use that everywhere
> instead?

Yes, I think that's the best way to go about that.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-01  6:53   ` Razvan Cojocaru
@ 2016-07-01  6:59     ` Corneliu ZUZU
  0 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-01  6:59 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 7/1/2016 9:53 AM, Razvan Cojocaru wrote:
> On 06/30/16 21:44, Corneliu ZUZU wrote:
>> After trapping a control-register write vm-event and -until- deciding if that
>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual write,
>> there cannot and should not be another trapped control-register write event.
>> That is, currently -only one- of the fields of monitor_write_data.do_write can
>> be true at any given moment and therefore it would be more appropriate to
>> replace those fields with an enum value.
>>
>> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
>> ---
>>   xen/arch/x86/hvm/hvm.c       | 18 +++++++++++-------
>>   xen/arch/x86/monitor.c       | 37 ++++++++++++++++++-------------------
>>   xen/arch/x86/vm_event.c      | 25 ++-----------------------
>>   xen/include/asm-x86/domain.h | 20 ++++++++++----------
>>   4 files changed, 41 insertions(+), 59 deletions(-)
> I'm not sure why neither Tamas' or my email address are in the CC list
> (I would have assumed either monitor.c or vm_event.c would trigger a
> response from get_maintainters.pl). In any case:
>
> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>
>
> Thanks,
> Razvan
>

I only now notice, it seems there's a problem with MAINTAINERS, the 2 
files are listed:
F:    xen/*/vm_event.c
F:    xen/*/monitor.c
there and the listing should have been:
F:    xen/arch/*/vm_event.c
F:    xen/arch/*/monitor.c

Will submit a separate patch ASAP with that adjustment.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 6/8] x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86
  2016-06-30 18:46 ` [PATCH 6/8] x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86 Corneliu ZUZU
@ 2016-07-01  7:02   ` Razvan Cojocaru
  0 siblings, 0 replies; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  7:02 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel; +Cc: Tamas K Lengyel

On 06/30/16 21:46, Corneliu ZUZU wrote:
> VM_EVENT_REASON_MOV_TO_MSR is X86-specific, surround w/ #ifdef accordingly.
> 
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/common/vm_event.c | 2 ++
>  1 file changed, 2 insertions(+)

Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>


Thanks,
Razvan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 8/8] minor #include change
  2016-07-01  6:56   ` Razvan Cojocaru
@ 2016-07-01  7:02     ` Corneliu ZUZU
  2016-07-01  7:31       ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-01  7:02 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 7/1/2016 9:56 AM, Razvan Cojocaru wrote:
> On 06/30/16 21:47, Corneliu ZUZU wrote:
>> Move xen/paging.h #include from hvm/monitor.h to hvm/monitor.c (include strictly
>> where needed) and also change to asm/paging.h (include strictly what's needed).
>>
>> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
>> ---
>>   xen/arch/x86/hvm/monitor.c        | 1 +
>>   xen/include/asm-x86/hvm/monitor.h | 1 -
>>   2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
>> index 472926c..d81b11b 100644
>> --- a/xen/arch/x86/hvm/monitor.c
>> +++ b/xen/arch/x86/hvm/monitor.c
>> @@ -26,6 +26,7 @@
>>   #include <asm/hvm/monitor.h>
>>   #include <asm/monitor.h>
>>   #include <asm/vm_event.h>
>> +#include <asm/paging.h>
>>   #include <public/vm_event.h>
> If this #include is not required to be exactly where it is, I believe
> alphabetical order is preferred (so #include <asm/paging.h> would be
> above #include <asm/vm_event.h>).
>
>
> Thanks,
> Razvan
>

Didn't know that was the rule, ack.

Thanks,
Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume
  2016-06-30 18:41 ` [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume Corneliu ZUZU
@ 2016-07-01  7:14   ` Razvan Cojocaru
  2016-07-01 16:51   ` Tamas K Lengyel
  1 sibling, 0 replies; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  7:14 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel; +Cc: Andrew Cooper, Tamas K Lengyel, Jan Beulich

On 06/30/16 21:41, Corneliu ZUZU wrote:
> A VM_EVENT_FLAG_VCPU_PAUSED flag in a vm-event response should only be treated
> as informative that the toolstack user wants the vm-event subsystem to unpause
> the target vCPU, but not be relied upon to decide if the target vCPU is actually
> paused.
> 
> That being said, this patch does the following:
> 
> * Fixes (replaces) the old behavior in vm_event_resume, which relied on
>   VM_EVENT_FLAG_VCPU_PAUSED to determine if the target vCPU is paused, by
>   actually checking the vCPU vm-event pause-count.
> 
> * ASSERTs that the vCPU is paused in vm_event_set_registers and
>   vm_event_toggle_singlestep.
> 
> * Ignores VM_EVENT_FLAG_DENY @ vm_event_register_write_resume if the target vCPU
>   is not paused. Also adjusts comment in public/vm_event.h to reflect that.
> 
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/vm_event.c       | 10 +++++++++-
>  xen/common/vm_event.c         |  6 ++++--
>  xen/include/public/vm_event.h |  1 +
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
> index a9d3861..80f84d6 100644
> --- a/xen/arch/x86/vm_event.c
> +++ b/xen/arch/x86/vm_event.c
> @@ -61,9 +61,11 @@ void vm_event_cleanup_domain(struct domain *d)
>  
>  void vm_event_toggle_singlestep(struct domain *d, struct vcpu *v)
>  {
> -    if ( !is_hvm_domain(d) || !atomic_read(&v->vm_event_pause_count) )
> +    if ( !is_hvm_domain(d) )
>          return;
>  
> +    ASSERT(atomic_read(&v->vm_event_pause_count));
> +
>      hvm_toggle_singlestep(v);
>  }

I'd like for us to have Tamas' ack on the above change. Based on that:

Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 8/8] minor #include change
  2016-07-01  7:02     ` Corneliu ZUZU
@ 2016-07-01  7:31       ` Jan Beulich
  2016-07-01  7:44         ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-01  7:31 UTC (permalink / raw)
  To: Corneliu ZUZU; +Cc: Andrew Cooper, Razvan Cojocaru, xen-devel

>>> On 01.07.16 at 09:02, <czuzu@bitdefender.com> wrote:
> On 7/1/2016 9:56 AM, Razvan Cojocaru wrote:
>> On 06/30/16 21:47, Corneliu ZUZU wrote:
>>> --- a/xen/arch/x86/hvm/monitor.c
>>> +++ b/xen/arch/x86/hvm/monitor.c
>>> @@ -26,6 +26,7 @@
>>>   #include <asm/hvm/monitor.h>
>>>   #include <asm/monitor.h>
>>>   #include <asm/vm_event.h>
>>> +#include <asm/paging.h>
>>>   #include <public/vm_event.h>
>> If this #include is not required to be exactly where it is, I believe
>> alphabetical order is preferred (so #include <asm/paging.h> would be
>> above #include <asm/vm_event.h>).
> 
> Didn't know that was the rule, ack.

The fundamental idea being that conflicts between patches are more
likely when everyone just adds to the end of a group.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 8/8] minor #include change
  2016-07-01  7:31       ` Jan Beulich
@ 2016-07-01  7:44         ` Corneliu ZUZU
  0 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-01  7:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Razvan Cojocaru, xen-devel

On 7/1/2016 10:31 AM, Jan Beulich wrote:
>>>> On 01.07.16 at 09:02, <czuzu@bitdefender.com> wrote:
>> On 7/1/2016 9:56 AM, Razvan Cojocaru wrote:
>>> On 06/30/16 21:47, Corneliu ZUZU wrote:
>>>> --- a/xen/arch/x86/hvm/monitor.c
>>>> +++ b/xen/arch/x86/hvm/monitor.c
>>>> @@ -26,6 +26,7 @@
>>>>    #include <asm/hvm/monitor.h>
>>>>    #include <asm/monitor.h>
>>>>    #include <asm/vm_event.h>
>>>> +#include <asm/paging.h>
>>>>    #include <public/vm_event.h>
>>> If this #include is not required to be exactly where it is, I believe
>>> alphabetical order is preferred (so #include <asm/paging.h> would be
>>> above #include <asm/vm_event.h>).
>> Didn't know that was the rule, ack.
> The fundamental idea being that conflicts between patches are more
> likely when everyone just adds to the end of a group.
>
> Jan

Makes sense, thanks.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-06-30 18:43 ` [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately Corneliu ZUZU
@ 2016-07-01  7:54   ` Razvan Cojocaru
  2016-07-04 10:22   ` Jan Beulich
  1 sibling, 0 replies; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-01  7:54 UTC (permalink / raw)
  To: Corneliu ZUZU, xen-devel
  Cc: Andrew Cooper, Kevin Tian, Tamas K Lengyel, Jun Nakajima, Jan Beulich

On 06/30/16 21:43, Corneliu ZUZU wrote:
> For readability:
> 
> * Add function arch_monitor_write_data (in x86/monitor.c) and separate handling
> of monitor_write_data there (previously done directly in hvm_do_resume).
> 
> * Add function write_ctrlreg_adjust_traps (in x86/monitor.c) and relocate
> enabling/disabling of CPU_BASED_CR3_LOAD_EXITING for CR3 monitor vm-events there
> (previously done through CR0 node @ vmx_update_guest_cr(v, 0)).
> 
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/hvm/hvm.c        | 48 +++++++++-------------
>  xen/arch/x86/hvm/vmx/vmx.c    |  5 ---
>  xen/arch/x86/monitor.c        | 94 +++++++++++++++++++++++++++++++++++++++----
>  xen/include/asm-x86/monitor.h |  2 +
>  4 files changed, 107 insertions(+), 42 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index c89ab6e..5481a6e 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -475,8 +475,6 @@ void hvm_do_resume(struct vcpu *v)
>  
>      if ( unlikely(v->arch.vm_event) )
>      {
> -        struct monitor_write_data *w = &v->arch.vm_event->write_data;
> -
>          if ( v->arch.vm_event->emulate_flags )
>          {
>              enum emul_kind kind = EMUL_KIND_NORMAL;
> @@ -493,32 +491,10 @@ void hvm_do_resume(struct vcpu *v)
>  
>              v->arch.vm_event->emulate_flags = 0;
>          }
> -
> -        if ( w->do_write.msr )
> -        {
> -            hvm_msr_write_intercept(w->msr, w->value, 0);
> -            w->do_write.msr = 0;
> -        }
> -
> -        if ( w->do_write.cr0 )
> -        {
> -            hvm_set_cr0(w->cr0, 0);
> -            w->do_write.cr0 = 0;
> -        }
> -
> -        if ( w->do_write.cr4 )
> -        {
> -            hvm_set_cr4(w->cr4, 0);
> -            w->do_write.cr4 = 0;
> -        }
> -
> -        if ( w->do_write.cr3 )
> -        {
> -            hvm_set_cr3(w->cr3, 0);
> -            w->do_write.cr3 = 0;
> -        }
>      }
>  
> +    arch_monitor_write_data(v);
> +
>      /* Inject pending hw/sw trap */
>      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
>      {
> @@ -2206,7 +2182,10 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>  
>          if ( hvm_monitor_crX(CR0, value, old_value) )
>          {
> -            /* The actual write will occur in hvm_do_resume(), if permitted. */
> +            /*
> +             * The actual write will occur in arch_monitor_write_data(), if
> +             * permitted.
> +             */
>              v->arch.vm_event->write_data.do_write.cr0 = 1;
>              v->arch.vm_event->write_data.cr0 = value;
>  
> @@ -2308,7 +2287,10 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
>  
>          if ( hvm_monitor_crX(CR3, value, old) )
>          {
> -            /* The actual write will occur in hvm_do_resume(), if permitted. */
> +            /*
> +             * The actual write will occur in arch_monitor_write_data(), if
> +             * permitted.
> +             */
>              v->arch.vm_event->write_data.do_write.cr3 = 1;
>              v->arch.vm_event->write_data.cr3 = value;
>  
> @@ -2388,7 +2370,10 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>  
>          if ( hvm_monitor_crX(CR4, value, old_cr) )
>          {
> -            /* The actual write will occur in hvm_do_resume(), if permitted. */
> +            /*
> +             * The actual write will occur in arch_monitor_write_data(), if
> +             * permitted.
> +             */
>              v->arch.vm_event->write_data.do_write.cr4 = 1;
>              v->arch.vm_event->write_data.cr4 = value;
>  
> @@ -3767,7 +3752,10 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>      {
>          ASSERT(v->arch.vm_event);
>  
> -        /* The actual write will occur in hvm_do_resume() (if permitted). */
> +        /*
> +         * The actual write will occur in arch_monitor_write_data(), if
> +         * permitted.
> +         */
>          v->arch.vm_event->write_data.do_write.msr = 1;
>          v->arch.vm_event->write_data.msr = msr;
>          v->arch.vm_event->write_data.value = msr_content;
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 15c84c2..de04e6c 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1442,11 +1442,6 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
>              if ( !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) )
>                  v->arch.hvm_vmx.exec_control |= cr3_ctls;
>  
> -            /* Trap CR3 updates if CR3 memory events are enabled. */
> -            if ( v->domain->arch.monitor.write_ctrlreg_enabled &
> -                 monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
> -                v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
> -
>              if ( old_ctls != v->arch.hvm_vmx.exec_control )
>                  vmx_update_cpu_exec_control(v);
>          }
> diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> index a271161..90e4856 100644
> --- a/xen/arch/x86/monitor.c
> +++ b/xen/arch/x86/monitor.c
> @@ -20,6 +20,9 @@
>   */
>  
>  #include <asm/monitor.h>
> +#include <asm/paging.h>
> +#include <asm/hvm/vmx/vmx.h>
> +#include <asm/vm_event.h>
>  #include <public/vm_event.h>

As previously stated, unless there's a compelling reason to do
otherwise, AFAIK asm/hvm/... goes above asm/monitor.h.

Otherwise, for the monitor parts:

Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume
  2016-06-30 18:41 ` [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume Corneliu ZUZU
  2016-07-01  7:14   ` Razvan Cojocaru
@ 2016-07-01 16:51   ` Tamas K Lengyel
  1 sibling, 0 replies; 59+ messages in thread
From: Tamas K Lengyel @ 2016-07-01 16:51 UTC (permalink / raw)
  To: Corneliu ZUZU; +Cc: Andrew Cooper, Razvan Cojocaru, Jan Beulich, Xen-devel

On Thu, Jun 30, 2016 at 12:41 PM, Corneliu ZUZU <czuzu@bitdefender.com> wrote:
> A VM_EVENT_FLAG_VCPU_PAUSED flag in a vm-event response should only be treated
> as informative that the toolstack user wants the vm-event subsystem to unpause
> the target vCPU, but not be relied upon to decide if the target vCPU is actually
> paused.
>
> That being said, this patch does the following:
>
> * Fixes (replaces) the old behavior in vm_event_resume, which relied on
>   VM_EVENT_FLAG_VCPU_PAUSED to determine if the target vCPU is paused, by
>   actually checking the vCPU vm-event pause-count.
>
> * ASSERTs that the vCPU is paused in vm_event_set_registers and
>   vm_event_toggle_singlestep.
>
> * Ignores VM_EVENT_FLAG_DENY @ vm_event_register_write_resume if the target vCPU
>   is not paused. Also adjusts comment in public/vm_event.h to reflect that.
>
> Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>

Thanks!

Acked-by: Tamas K Lengyel <tamas@tklengyel.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-06-30 18:43 ` [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately Corneliu ZUZU
  2016-07-01  7:54   ` Razvan Cojocaru
@ 2016-07-04 10:22   ` Jan Beulich
  2016-07-04 11:02     ` Corneliu ZUZU
  2016-07-04 13:22     ` Razvan Cojocaru
  1 sibling, 2 replies; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 10:22 UTC (permalink / raw)
  To: Corneliu ZUZU, Razvan Cojocaru
  Cc: Andrew Cooper, Kevin Tian, Tamas K Lengyel, Jun Nakajima, xen-devel

>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -475,8 +475,6 @@ void hvm_do_resume(struct vcpu *v)
>  
>      if ( unlikely(v->arch.vm_event) )
>      {
> -        struct monitor_write_data *w = &v->arch.vm_event->write_data;
> -
>          if ( v->arch.vm_event->emulate_flags )
>          {
>              enum emul_kind kind = EMUL_KIND_NORMAL;
> @@ -493,32 +491,10 @@ void hvm_do_resume(struct vcpu *v)
>  
>              v->arch.vm_event->emulate_flags = 0;
>          }
> -
> -        if ( w->do_write.msr )
> -        {
> -            hvm_msr_write_intercept(w->msr, w->value, 0);
> -            w->do_write.msr = 0;
> -        }
> -
> -        if ( w->do_write.cr0 )
> -        {
> -            hvm_set_cr0(w->cr0, 0);
> -            w->do_write.cr0 = 0;
> -        }
> -
> -        if ( w->do_write.cr4 )
> -        {
> -            hvm_set_cr4(w->cr4, 0);
> -            w->do_write.cr4 = 0;
> -        }
> -
> -        if ( w->do_write.cr3 )
> -        {
> -            hvm_set_cr3(w->cr3, 0);
> -            w->do_write.cr3 = 0;
> -        }
>      }
>  
> +    arch_monitor_write_data(v);

Why does this get moved outside the if(), with the same condition
getting added inside the function (inverted for bailing early)?

> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>      return test_bit(msr, bitmap);
>  }
>  
> +static void write_ctrlreg_adjust_traps(struct domain *d)
> +{
> +    struct vcpu *v;
> +    struct arch_vmx_struct *avmx;
> +    unsigned int cr3_bitmask;
> +    bool_t cr3_vmevent, cr3_ldexit;
> +
> +    /* Adjust CR3 load-exiting. */
> +
> +    /* vmx only */
> +    ASSERT(cpu_has_vmx);
> +
> +    /* non-hap domains trap CR3 writes unconditionally */
> +    if ( !paging_mode_hap(d) )
> +    {
> +        for_each_vcpu ( d, v )
> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
> +        return;
> +    }
> +
> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        avmx = &v->arch.hvm_vmx;
> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
> +
> +        if ( cr3_vmevent == cr3_ldexit )
> +            continue;
> +
> +        /*
> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
> +         * See vmx_update_guest_cr code motion for cr = 0.
> +         */
> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) 
> )
> +            continue;
> +
> +        if ( cr3_vmevent )
> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
> +        else
> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
> +
> +        vmx_vmcs_enter(v);
> +        vmx_update_cpu_exec_control(v);
> +        vmx_vmcs_exit(v);
> +    }
> +}

While Razvan gave his ack already, I wonder whether it's really a
good idea to put deeply VMX-specific code outside of a VMX-specific
file.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 10:22   ` Jan Beulich
@ 2016-07-04 11:02     ` Corneliu ZUZU
  2016-07-04 14:58       ` Jan Beulich
  2016-07-04 13:22     ` Razvan Cojocaru
  1 sibling, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 11:02 UTC (permalink / raw)
  To: Jan Beulich, Razvan Cojocaru
  Cc: Andrew Cooper, Kevin Tian, Tamas K Lengyel, Jun Nakajima, xen-devel

Hi Jan,

On 7/4/2016 1:22 PM, Jan Beulich wrote:
>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -475,8 +475,6 @@ void hvm_do_resume(struct vcpu *v)
>>   
>>       if ( unlikely(v->arch.vm_event) )
>>       {
>> -        struct monitor_write_data *w = &v->arch.vm_event->write_data;
>> -
>>           if ( v->arch.vm_event->emulate_flags )
>>           {
>>               enum emul_kind kind = EMUL_KIND_NORMAL;
>> @@ -493,32 +491,10 @@ void hvm_do_resume(struct vcpu *v)
>>   
>>               v->arch.vm_event->emulate_flags = 0;
>>           }
>> -
>> -        if ( w->do_write.msr )
>> -        {
>> -            hvm_msr_write_intercept(w->msr, w->value, 0);
>> -            w->do_write.msr = 0;
>> -        }
>> -
>> -        if ( w->do_write.cr0 )
>> -        {
>> -            hvm_set_cr0(w->cr0, 0);
>> -            w->do_write.cr0 = 0;
>> -        }
>> -
>> -        if ( w->do_write.cr4 )
>> -        {
>> -            hvm_set_cr4(w->cr4, 0);
>> -            w->do_write.cr4 = 0;
>> -        }
>> -
>> -        if ( w->do_write.cr3 )
>> -        {
>> -            hvm_set_cr3(w->cr3, 0);
>> -            w->do_write.cr3 = 0;
>> -        }
>>       }
>>   
>> +    arch_monitor_write_data(v);
> Why does this get moved outside the if(), with the same condition
> getting added inside the function (inverted for bailing early)?

I left that so because of patch 5/8 - specifically, monitor_write_data 
handling shouldn't depend on the vm_event subsystem being initialized.
But you're right, it still does depend on that initialization in this 
patch, so I should leave the call inside the if (and remove the check 
inside the function) as you suggest and only get it out in 5/8.
Will do that in v2.

>
>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>       return test_bit(msr, bitmap);
>>   }
>>   
>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>> +{
>> +    struct vcpu *v;
>> +    struct arch_vmx_struct *avmx;
>> +    unsigned int cr3_bitmask;
>> +    bool_t cr3_vmevent, cr3_ldexit;
>> +
>> +    /* Adjust CR3 load-exiting. */
>> +
>> +    /* vmx only */
>> +    ASSERT(cpu_has_vmx);
>> +
>> +    /* non-hap domains trap CR3 writes unconditionally */
>> +    if ( !paging_mode_hap(d) )
>> +    {
>> +        for_each_vcpu ( d, v )
>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>> +        return;
>> +    }
>> +
>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>> +
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        avmx = &v->arch.hvm_vmx;
>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>> +
>> +        if ( cr3_vmevent == cr3_ldexit )
>> +            continue;
>> +
>> +        /*
>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>> +         * See vmx_update_guest_cr code motion for cr = 0.
>> +         */
>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v)
>> )
>> +            continue;
>> +
>> +        if ( cr3_vmevent )
>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>> +        else
>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>> +
>> +        vmx_vmcs_enter(v);
>> +        vmx_update_cpu_exec_control(v);
>> +        vmx_vmcs_exit(v);
>> +    }
>> +}
> While Razvan gave his ack already, I wonder whether it's really a
> good idea to put deeply VMX-specific code outside of a VMX-specific
> file.
>
> Jan

Well, a summary of what this function does would sound like: "adjusts 
CR3 load-exiting for cr-write monitor vm-events". IMHO that's (monitor) 
vm-event specific enough to be placed within the vm-event subsystem.
Could you suggest concretely how this separation would look like? (where 
to put this function/parts of it (and what parts), what name should it 
have once moved). Another reason this was done (besides avoiding 
hackishly doing a CR0 update when we actually need a CR3 update 
specifically for a vm-event to happen) is keeping symmetry between 
ARM<->X86 in a future patch that would implement monitor CR vm-events 
for ARM. In that patch write_ctrlreg_adjust_traps is renamed and 
implemented per-architecture, on ARM it would have the same job, i.e. 
updating some hypervisor traps (~ vmx execution controls) for CR 
vm-events to happen.

On a different note, one thing I forgot to do though is to also move the 
following check (instead of completely removing it from 
arch_monitor_domctl_event):

	if ( VM_EVENT_X86_CR3 == mop->u.mov_to_cr.index )

inside write_ctrlreg_adjust_traps. Will remedy that in v2.

Thanks,
Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-06-30 18:44 ` [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum Corneliu ZUZU
  2016-07-01  6:53   ` Razvan Cojocaru
@ 2016-07-04 12:37   ` Jan Beulich
  2016-07-04 12:47     ` Corneliu ZUZU
  1 sibling, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 12:37 UTC (permalink / raw)
  To: Corneliu ZUZU; +Cc: Andrew Cooper, xen-devel

>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
> After trapping a control-register write vm-event and -until- deciding if that
> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual 
> write,
> there cannot and should not be another trapped control-register write event.

Is that true even for the case where full register state gets updated
for a vCPU? Is that updating-all-context case of no interest to a
monitoring application, now and forever?

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -259,19 +259,19 @@ struct pv_domain
>      struct cpuidmasks *cpuidmasks;
>  };
>  
> -struct monitor_write_data {
> -    struct {
> -        unsigned int msr : 1;
> -        unsigned int cr0 : 1;
> -        unsigned int cr3 : 1;
> -        unsigned int cr4 : 1;
> -    } do_write;
> +enum monitor_write_status
> +{
> +    MWS_NOWRITE = 0,

Please omit the "= 0" here - MWS_NOWRITE will be zero even
without that.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-06-30 18:45 ` [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup Corneliu ZUZU
  2016-07-01  6:47   ` Razvan Cojocaru
@ 2016-07-04 12:47   ` Jan Beulich
  2016-07-04 13:03     ` Corneliu ZUZU
  1 sibling, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 12:47 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
> The arch_vm_event structure is dynamically allocated and freed @
> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
> disables domain monitoring (xc_monitor_disable), which in turn effectively
> discards any information that was in arch_vm_event.write_data.

Isn't that rather a toolstack user bug, not warranting a relatively
extensive (even if mostly mechanical) hypervisor change like this
one? Sane monitor behavior, after all, is required anyway for the
monitored guest to survive.

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>      if ( !handle_hvm_io_completion(v) )
>          return;
>  
> -    if ( unlikely(v->arch.vm_event) )
> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>      {
> -        if ( v->arch.vm_event->emulate_flags )
> -        {
> -            enum emul_kind kind = EMUL_KIND_NORMAL;
> +        enum emul_kind kind;
>  
> -            if ( v->arch.vm_event->emulate_flags &
> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
> -                kind = EMUL_KIND_SET_CONTEXT;
> -            else if ( v->arch.vm_event->emulate_flags &
> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
> -                kind = EMUL_KIND_NOWRITE;
> +        ASSERT(v->arch.vm_event.emul_read_data);
>  
> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
> -                                       HVM_DELIVER_NO_ERROR_CODE);
> +        kind = EMUL_KIND_NORMAL;

Please keep this being the initializer of the variable.

>  
> -            v->arch.vm_event->emulate_flags = 0;
> -        }
> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )

Long line.

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -259,21 +259,6 @@ struct pv_domain
>      struct cpuidmasks *cpuidmasks;
>  };
>  
> -enum monitor_write_status
> -{
> -    MWS_NOWRITE = 0,
> -    MWS_MSR,
> -    MWS_CR0,
> -    MWS_CR3,
> -    MWS_CR4,
> -};
> -
> -struct monitor_write_data {
> -    enum monitor_write_status status;
> -    uint32_t msr;
> -    uint64_t value;
> -};
> -
>  struct arch_domain
>  {
>      struct page_info *perdomain_l3_pg;
> @@ -496,6 +481,31 @@ typedef enum __packed {
>      SMAP_CHECK_DISABLED,        /* disable the check */
>  } smap_check_policy_t;
>  
> +enum monitor_write_status
> +{
> +    MWS_NOWRITE = 0,
> +    MWS_MSR,
> +    MWS_CR0,
> +    MWS_CR3,
> +    MWS_CR4,
> +};
> +
> +struct monitor_write_data {
> +    enum monitor_write_status status;
> +    uint32_t msr;
> +    uint64_t value;
> +};

Instead of moving these around now, may I suggest you put them
into their final place right away in the previous patch?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 12:37   ` Jan Beulich
@ 2016-07-04 12:47     ` Corneliu ZUZU
  2016-07-04 13:07       ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 12:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>> After trapping a control-register write vm-event and -until- deciding if that
>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>> write,
>> there cannot and should not be another trapped control-register write event.
> Is that true even for the case where full register state gets updated
> for a vCPU?

AFAIK, the full register state cannot be updated _at once_, that is: 
after each trapped register update monitor_write_data must _always_ be 
handled _before reentering the vCPU_.

> Is that updating-all-context case of no interest to a
> monitoring application, now and forever?

As I said above, I'm don't see how such a case would (ever) be possible.

>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -259,19 +259,19 @@ struct pv_domain
>>       struct cpuidmasks *cpuidmasks;
>>   };
>>   
>> -struct monitor_write_data {
>> -    struct {
>> -        unsigned int msr : 1;
>> -        unsigned int cr0 : 1;
>> -        unsigned int cr3 : 1;
>> -        unsigned int cr4 : 1;
>> -    } do_write;
>> +enum monitor_write_status
>> +{
>> +    MWS_NOWRITE = 0,
> Please omit the "= 0" here - MWS_NOWRITE will be zero even
> without that.
>
> Jan
>
>

Ack.

Thanks,
Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 12:47   ` Jan Beulich
@ 2016-07-04 13:03     ` Corneliu ZUZU
  2016-07-04 13:11       ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 13:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>> The arch_vm_event structure is dynamically allocated and freed @
>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>> discards any information that was in arch_vm_event.write_data.
> Isn't that rather a toolstack user bug, not warranting a relatively
> extensive (even if mostly mechanical) hypervisor change like this
> one? Sane monitor behavior, after all, is required anyway for the
> monitored guest to survive.

Sorry but could you please rephrase this, I don't quite understand what 
you're saying.
The write_data field in arch_vm_event should _not ever_ be invalidated 
as a direct result of a toolstack user's action.

>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>       if ( !handle_hvm_io_completion(v) )
>>           return;
>>   
>> -    if ( unlikely(v->arch.vm_event) )
>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>       {
>> -        if ( v->arch.vm_event->emulate_flags )
>> -        {
>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>> +        enum emul_kind kind;
>>   
>> -            if ( v->arch.vm_event->emulate_flags &
>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>> -                kind = EMUL_KIND_SET_CONTEXT;
>> -            else if ( v->arch.vm_event->emulate_flags &
>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>> -                kind = EMUL_KIND_NOWRITE;
>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>   
>> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>> +        kind = EMUL_KIND_NORMAL;
> Please keep this being the initializer of the variable.

I put it there because of the ASSERT (to do that before anything else), 
but I will undo if you prefer.

>
>>   
>> -            v->arch.vm_event->emulate_flags = 0;
>> -        }
>> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )
> Long line.

Long but under 80 columns, isn't that the rule? :-)

>
>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -259,21 +259,6 @@ struct pv_domain
>>       struct cpuidmasks *cpuidmasks;
>>   };
>>   
>> -enum monitor_write_status
>> -{
>> -    MWS_NOWRITE = 0,
>> -    MWS_MSR,
>> -    MWS_CR0,
>> -    MWS_CR3,
>> -    MWS_CR4,
>> -};
>> -
>> -struct monitor_write_data {
>> -    enum monitor_write_status status;
>> -    uint32_t msr;
>> -    uint64_t value;
>> -};
>> -
>>   struct arch_domain
>>   {
>>       struct page_info *perdomain_l3_pg;
>> @@ -496,6 +481,31 @@ typedef enum __packed {
>>       SMAP_CHECK_DISABLED,        /* disable the check */
>>   } smap_check_policy_t;
>>   
>> +enum monitor_write_status
>> +{
>> +    MWS_NOWRITE = 0,
>> +    MWS_MSR,
>> +    MWS_CR0,
>> +    MWS_CR3,
>> +    MWS_CR4,
>> +};
>> +
>> +struct monitor_write_data {
>> +    enum monitor_write_status status;
>> +    uint32_t msr;
>> +    uint64_t value;
>> +};
> Instead of moving these around now, may I suggest you put them
> into their final place right away in the previous patch?
>
> Jan
>
>

Sounds good, will do.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 12:47     ` Corneliu ZUZU
@ 2016-07-04 13:07       ` Jan Beulich
  2016-07-04 13:21         ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 13:07 UTC (permalink / raw)
  To: Corneliu ZUZU; +Cc: Andrew Cooper, xen-devel

>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>> After trapping a control-register write vm-event and -until- deciding if that
>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>> write,
>>> there cannot and should not be another trapped control-register write event.
>> Is that true even for the case where full register state gets updated
>> for a vCPU?
> 
> AFAIK, the full register state cannot be updated _at once_, that is: 
> after each trapped register update monitor_write_data must _always_ be 
> handled _before reentering the vCPU_.

I'm thinking about operations like VCPUOP_initialise here. Of course
operations on current can't possibly update more than one register
at a time.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 13:03     ` Corneliu ZUZU
@ 2016-07-04 13:11       ` Jan Beulich
  2016-07-04 13:28         ` Corneliu ZUZU
  2016-07-04 13:50         ` Razvan Cojocaru
  0 siblings, 2 replies; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 13:11 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>> The arch_vm_event structure is dynamically allocated and freed @
>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>> discards any information that was in arch_vm_event.write_data.
>> Isn't that rather a toolstack user bug, not warranting a relatively
>> extensive (even if mostly mechanical) hypervisor change like this
>> one? Sane monitor behavior, after all, is required anyway for the
>> monitored guest to survive.
> 
> Sorry but could you please rephrase this, I don't quite understand what 
> you're saying.
> The write_data field in arch_vm_event should _not ever_ be invalidated 
> as a direct result of a toolstack user's action.

The monitoring app can cause all kinds of problems to the guest it
monitors. Why would this specific one need taking care of in the
hypervisor, instead of demanding that the app not disable monitoring
at the wrong time?

>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>>       if ( !handle_hvm_io_completion(v) )
>>>           return;
>>>   
>>> -    if ( unlikely(v->arch.vm_event) )
>>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>>       {
>>> -        if ( v->arch.vm_event->emulate_flags )
>>> -        {
>>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>>> +        enum emul_kind kind;
>>>   
>>> -            if ( v->arch.vm_event->emulate_flags &
>>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>> -                kind = EMUL_KIND_SET_CONTEXT;
>>> -            else if ( v->arch.vm_event->emulate_flags &
>>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>>> -                kind = EMUL_KIND_NOWRITE;
>>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>>   
>>> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>>> +        kind = EMUL_KIND_NORMAL;
>> Please keep this being the initializer of the variable.
> 
> I put it there because of the ASSERT (to do that before anything else), 
> but I will undo if you prefer.

Since the initializer is (very obviously) independent of the
condition the ASSERT() checks, I indeed would prefer it to remain
the way it is before this change.

>>> -            v->arch.vm_event->emulate_flags = 0;
>>> -        }
>>> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>> Long line.
> 
> Long but under 80 columns, isn't that the rule? :-)

I've counted 81 here.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 13:07       ` Jan Beulich
@ 2016-07-04 13:21         ` Corneliu ZUZU
  2016-07-04 14:08           ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 13:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 7/4/2016 4:07 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>>> After trapping a control-register write vm-event and -until- deciding if that
>>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>>> write,
>>>> there cannot and should not be another trapped control-register write event.
>>> Is that true even for the case where full register state gets updated
>>> for a vCPU?
>> AFAIK, the full register state cannot be updated _at once_, that is:
>> after each trapped register update monitor_write_data must _always_ be
>> handled _before reentering the vCPU_.
> I'm thinking about operations like VCPUOP_initialise here. Of course
> operations on current can't possibly update more than one register
> at a time.
>
> Jan

Yes but those register update operations happen outside the vm-event 
subsystem, i.e. in those cases the registers get updated directly, not 
by setting bits in monitor_write_data.
Only hypervisor-trapped register updates (e.g. hvm_set_cr0 w/ may_defer 
parameter = 1) which are deferred because of hvm_event_crX returning 
true use monitor_write_data.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 10:22   ` Jan Beulich
  2016-07-04 11:02     ` Corneliu ZUZU
@ 2016-07-04 13:22     ` Razvan Cojocaru
  2016-07-04 14:05       ` Jan Beulich
  1 sibling, 1 reply; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Jan Beulich, Corneliu ZUZU
  Cc: Andrew Cooper, Kevin Tian, Tamas K Lengyel, Jun Nakajima, xen-devel

On 07/04/16 13:22, Jan Beulich wrote:
>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -475,8 +475,6 @@ void hvm_do_resume(struct vcpu *v)
>>  
>>      if ( unlikely(v->arch.vm_event) )
>>      {
>> -        struct monitor_write_data *w = &v->arch.vm_event->write_data;
>> -
>>          if ( v->arch.vm_event->emulate_flags )
>>          {
>>              enum emul_kind kind = EMUL_KIND_NORMAL;
>> @@ -493,32 +491,10 @@ void hvm_do_resume(struct vcpu *v)
>>  
>>              v->arch.vm_event->emulate_flags = 0;
>>          }
>> -
>> -        if ( w->do_write.msr )
>> -        {
>> -            hvm_msr_write_intercept(w->msr, w->value, 0);
>> -            w->do_write.msr = 0;
>> -        }
>> -
>> -        if ( w->do_write.cr0 )
>> -        {
>> -            hvm_set_cr0(w->cr0, 0);
>> -            w->do_write.cr0 = 0;
>> -        }
>> -
>> -        if ( w->do_write.cr4 )
>> -        {
>> -            hvm_set_cr4(w->cr4, 0);
>> -            w->do_write.cr4 = 0;
>> -        }
>> -
>> -        if ( w->do_write.cr3 )
>> -        {
>> -            hvm_set_cr3(w->cr3, 0);
>> -            w->do_write.cr3 = 0;
>> -        }
>>      }
>>  
>> +    arch_monitor_write_data(v);
> 
> Why does this get moved outside the if(), with the same condition
> getting added inside the function (inverted for bailing early)?
> 
>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>      return test_bit(msr, bitmap);
>>  }
>>  
>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>> +{
>> +    struct vcpu *v;
>> +    struct arch_vmx_struct *avmx;
>> +    unsigned int cr3_bitmask;
>> +    bool_t cr3_vmevent, cr3_ldexit;
>> +
>> +    /* Adjust CR3 load-exiting. */
>> +
>> +    /* vmx only */
>> +    ASSERT(cpu_has_vmx);
>> +
>> +    /* non-hap domains trap CR3 writes unconditionally */
>> +    if ( !paging_mode_hap(d) )
>> +    {
>> +        for_each_vcpu ( d, v )
>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>> +        return;
>> +    }
>> +
>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>> +
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        avmx = &v->arch.hvm_vmx;
>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>> +
>> +        if ( cr3_vmevent == cr3_ldexit )
>> +            continue;
>> +
>> +        /*
>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>> +         * See vmx_update_guest_cr code motion for cr = 0.
>> +         */
>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) 
>> )
>> +            continue;
>> +
>> +        if ( cr3_vmevent )
>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>> +        else
>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>> +
>> +        vmx_vmcs_enter(v);
>> +        vmx_update_cpu_exec_control(v);
>> +        vmx_vmcs_exit(v);
>> +    }
>> +}
> 
> While Razvan gave his ack already, I wonder whether it's really a
> good idea to put deeply VMX-specific code outside of a VMX-specific
> file.

Didn't I add "for monitor / vm_event parts Acked-by: ..."? If I didn't,
I meant to. Obviously VMX code maintainers outrank me on these issues.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 13:11       ` Jan Beulich
@ 2016-07-04 13:28         ` Corneliu ZUZU
  2016-07-04 14:13           ` Jan Beulich
  2016-07-04 13:50         ` Razvan Cojocaru
  1 sibling, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 13:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

On 7/4/2016 4:11 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>> discards any information that was in arch_vm_event.write_data.
>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>> extensive (even if mostly mechanical) hypervisor change like this
>>> one? Sane monitor behavior, after all, is required anyway for the
>>> monitored guest to survive.
>> Sorry but could you please rephrase this, I don't quite understand what
>> you're saying.
>> The write_data field in arch_vm_event should _not ever_ be invalidated
>> as a direct result of a toolstack user's action.
> The monitoring app can cause all kinds of problems to the guest it
> monitors. Why would this specific one need taking care of in the
> hypervisor, instead of demanding that the app not disable monitoring
> at the wrong time?

Because it wouldn't be the wrong time to disable monitoring.
This is not a case of wrong toolstack usage, but rather a case of 
xc_monitor_disable doing the wrong thing along the way.

>
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>>>        if ( !handle_hvm_io_completion(v) )
>>>>            return;
>>>>    
>>>> -    if ( unlikely(v->arch.vm_event) )
>>>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>>>        {
>>>> -        if ( v->arch.vm_event->emulate_flags )
>>>> -        {
>>>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>>>> +        enum emul_kind kind;
>>>>    
>>>> -            if ( v->arch.vm_event->emulate_flags &
>>>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>>> -                kind = EMUL_KIND_SET_CONTEXT;
>>>> -            else if ( v->arch.vm_event->emulate_flags &
>>>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>>>> -                kind = EMUL_KIND_NOWRITE;
>>>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>>>    
>>>> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>>>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>>>> +        kind = EMUL_KIND_NORMAL;
>>> Please keep this being the initializer of the variable.
>> I put it there because of the ASSERT (to do that before anything else),
>> but I will undo if you prefer.
> Since the initializer is (very obviously) independent of the
> condition the ASSERT() checks, I indeed would prefer it to remain
> the way it is before this change.
>
>>>> -            v->arch.vm_event->emulate_flags = 0;
>>>> -        }
>>>> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>> Long line.
>> Long but under 80 columns, isn't that the rule? :-)
> I've counted 81 here.
>
> Jan

You may have counted the beginning '+' as well. Is the rule "<= 80 
columns in the source file" (in which case you're wrong) or is it "<= 80 
columns in the resulting diff" (in which case I'm wrong)?

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 13:11       ` Jan Beulich
  2016-07-04 13:28         ` Corneliu ZUZU
@ 2016-07-04 13:50         ` Razvan Cojocaru
  2016-07-04 14:05           ` Corneliu ZUZU
  2016-07-04 14:17           ` Jan Beulich
  1 sibling, 2 replies; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-04 13:50 UTC (permalink / raw)
  To: Jan Beulich, Corneliu ZUZU
  Cc: George Dunlap, Andrew Cooper, Tamas K Lengyel, Paul Durrant, xen-devel

On 07/04/16 16:11, Jan Beulich wrote:
>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>> discards any information that was in arch_vm_event.write_data.
>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>> extensive (even if mostly mechanical) hypervisor change like this
>>> one? Sane monitor behavior, after all, is required anyway for the
>>> monitored guest to survive.
>>
>> Sorry but could you please rephrase this, I don't quite understand what 
>> you're saying.
>> The write_data field in arch_vm_event should _not ever_ be invalidated 
>> as a direct result of a toolstack user's action.
> 
> The monitoring app can cause all kinds of problems to the guest it
> monitors. Why would this specific one need taking care of in the
> hypervisor, instead of demanding that the app not disable monitoring
> at the wrong time?

I'm not sure there's a right time here. The problem is that, if I
understand this correctly, a race is possible between the moment the
userspace application responds to the vm_event _and_ call
xc_monitor_disable() and the time hvm_do_resume() gets called.

If xc_monitor_disable() happened before hvm_do_resume() springs into
action, we lose a register write. There's no guaranteed way for this not
to happen as far as I can see, although it's true that the race should
pretty much never happen in practice - at least we've never come across
such a case so far.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 13:50         ` Razvan Cojocaru
@ 2016-07-04 14:05           ` Corneliu ZUZU
  2016-07-04 14:17           ` Jan Beulich
  1 sibling, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 14:05 UTC (permalink / raw)
  To: Razvan Cojocaru, Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Tamas K Lengyel, Paul Durrant, xen-devel

On 7/4/2016 4:50 PM, Razvan Cojocaru wrote:
> On 07/04/16 16:11, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>> discards any information that was in arch_vm_event.write_data.
>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>> monitored guest to survive.
>>> Sorry but could you please rephrase this, I don't quite understand what
>>> you're saying.
>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>> as a direct result of a toolstack user's action.
>> The monitoring app can cause all kinds of problems to the guest it
>> monitors. Why would this specific one need taking care of in the
>> hypervisor, instead of demanding that the app not disable monitoring
>> at the wrong time?
> I'm not sure there's a right time here. The problem is that, if I
> understand this correctly, a race is possible between the moment the
> userspace application responds to the vm_event _and_ call
> xc_monitor_disable() and the time hvm_do_resume() gets called.
>
> If xc_monitor_disable() happened before hvm_do_resume() springs into
> action, we lose a register write. There's no guaranteed way for this not
> to happen as far as I can see, although it's true that the race should
> pretty much never happen in practice - at least we've never come across
> such a case so far.
>
>
> Thanks,
> Razvan
>

Perfectly pointed, thanks. Note that xc_monitor_disable() may happen 
before hvm_do_resume() because the latter only happens _when the 
scheduler reschedules the target vCPU_, which _may not happen between_ 
the moment the toolstack user _handles the vm-event_ and the moment when 
he _calls xc_monitor_disable()_, but rather after xc_monitor_disable() 
is called.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 13:22     ` Razvan Cojocaru
@ 2016-07-04 14:05       ` Jan Beulich
  2016-07-04 14:16         ` Razvan Cojocaru
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 14:05 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Kevin Tian, Tamas K Lengyel, Andrew Cooper, xen-devel,
	Jun Nakajima, Corneliu ZUZU

>>> On 04.07.16 at 15:22, <rcojocaru@bitdefender.com> wrote:
> On 07/04/16 13:22, Jan Beulich wrote:
>>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>>      return test_bit(msr, bitmap);
>>>  }
>>>  
>>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>>> +{
>>> +    struct vcpu *v;
>>> +    struct arch_vmx_struct *avmx;
>>> +    unsigned int cr3_bitmask;
>>> +    bool_t cr3_vmevent, cr3_ldexit;
>>> +
>>> +    /* Adjust CR3 load-exiting. */
>>> +
>>> +    /* vmx only */
>>> +    ASSERT(cpu_has_vmx);
>>> +
>>> +    /* non-hap domains trap CR3 writes unconditionally */
>>> +    if ( !paging_mode_hap(d) )
>>> +    {
>>> +        for_each_vcpu ( d, v )
>>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>> +        return;
>>> +    }
>>> +
>>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>>> +
>>> +    for_each_vcpu ( d, v )
>>> +    {
>>> +        avmx = &v->arch.hvm_vmx;
>>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>> +
>>> +        if ( cr3_vmevent == cr3_ldexit )
>>> +            continue;
>>> +
>>> +        /*
>>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>>> +         * See vmx_update_guest_cr code motion for cr = 0.
>>> +         */
>>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) 
>>> )
>>> +            continue;
>>> +
>>> +        if ( cr3_vmevent )
>>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>> +        else
>>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>>> +
>>> +        vmx_vmcs_enter(v);
>>> +        vmx_update_cpu_exec_control(v);
>>> +        vmx_vmcs_exit(v);
>>> +    }
>>> +}
>> 
>> While Razvan gave his ack already, I wonder whether it's really a
>> good idea to put deeply VMX-specific code outside of a VMX-specific
>> file.
> 
> Didn't I add "for monitor / vm_event parts Acked-by: ..."? If I didn't,
> I meant to.

Well - this is a monitor file (monitor.c).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 13:21         ` Corneliu ZUZU
@ 2016-07-04 14:08           ` Jan Beulich
  2016-07-04 14:23             ` Razvan Cojocaru
  2016-07-04 14:24             ` Corneliu ZUZU
  0 siblings, 2 replies; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 14:08 UTC (permalink / raw)
  To: Corneliu ZUZU; +Cc: Andrew Cooper, xen-devel

>>> On 04.07.16 at 15:21, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 4:07 PM, Jan Beulich wrote:
>>>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
>>> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>>>> After trapping a control-register write vm-event and -until- deciding if that
>>>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>>>> write,
>>>>> there cannot and should not be another trapped control-register write event.
>>>> Is that true even for the case where full register state gets updated
>>>> for a vCPU?
>>> AFAIK, the full register state cannot be updated _at once_, that is:
>>> after each trapped register update monitor_write_data must _always_ be
>>> handled _before reentering the vCPU_.
>> I'm thinking about operations like VCPUOP_initialise here. Of course
>> operations on current can't possibly update more than one register
>> at a time.
> 
> Yes but those register update operations happen outside the vm-event 
> subsystem, i.e. in those cases the registers get updated directly, not 
> by setting bits in monitor_write_data.
> Only hypervisor-trapped register updates (e.g. hvm_set_cr0 w/ may_defer 
> parameter = 1) which are deferred because of hvm_event_crX returning 
> true use monitor_write_data.

That's why I had added "Is that updating-all-context case of no
interest to a monitoring application, now and forever?" After all I
gave VCPUOP_initialise as an example because the guest itself
can invoke it, and so I had assumed this to be of interest to a
monitoring app.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 13:28         ` Corneliu ZUZU
@ 2016-07-04 14:13           ` Jan Beulich
  2016-07-04 14:31             ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 14:13 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

>>> On 04.07.16 at 15:28, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 4:11 PM, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack 
> user
>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>> discards any information that was in arch_vm_event.write_data.
>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>> monitored guest to survive.
>>> Sorry but could you please rephrase this, I don't quite understand what
>>> you're saying.
>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>> as a direct result of a toolstack user's action.
>> The monitoring app can cause all kinds of problems to the guest it
>> monitors. Why would this specific one need taking care of in the
>> hypervisor, instead of demanding that the app not disable monitoring
>> at the wrong time?
> 
> Because it wouldn't be the wrong time to disable monitoring.
> This is not a case of wrong toolstack usage, but rather a case of 
> xc_monitor_disable doing the wrong thing along the way.
> 
>>
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>>>>        if ( !handle_hvm_io_completion(v) )
>>>>>            return;
>>>>>    
>>>>> -    if ( unlikely(v->arch.vm_event) )
>>>>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>>>>        {
>>>>> -        if ( v->arch.vm_event->emulate_flags )
>>>>> -        {
>>>>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>>>>> +        enum emul_kind kind;
>>>>>    
>>>>> -            if ( v->arch.vm_event->emulate_flags &
>>>>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>>>> -                kind = EMUL_KIND_SET_CONTEXT;
>>>>> -            else if ( v->arch.vm_event->emulate_flags &
>>>>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>>>>> -                kind = EMUL_KIND_NOWRITE;
>>>>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>>>>    
>>>>> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>>>>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>>>>> +        kind = EMUL_KIND_NORMAL;
>>>> Please keep this being the initializer of the variable.
>>> I put it there because of the ASSERT (to do that before anything else),
>>> but I will undo if you prefer.
>> Since the initializer is (very obviously) independent of the
>> condition the ASSERT() checks, I indeed would prefer it to remain
>> the way it is before this change.
>>
>>>>> -            v->arch.vm_event->emulate_flags = 0;
>>>>> -        }
>>>>> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA 
> )
>>>> Long line.
>>> Long but under 80 columns, isn't that the rule? :-)
>> I've counted 81 here.
> 
> You may have counted the beginning '+' as well. Is the rule "<= 80 
> columns in the source file" (in which case you're wrong) or is it "<= 80 
> columns in the resulting diff" (in which case I'm wrong)?

Ah - you first said "under 80" but now say "<= 80". The former
is what ./CODING_STYLE asks for. (And ftr the 81 included the
newline, so indeed I was off by one too.)

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 14:05       ` Jan Beulich
@ 2016-07-04 14:16         ` Razvan Cojocaru
  2016-07-04 14:35           ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-04 14:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Tamas K Lengyel, Andrew Cooper, xen-devel,
	Jun Nakajima, Corneliu ZUZU

On 07/04/16 17:05, Jan Beulich wrote:
>>>> On 04.07.16 at 15:22, <rcojocaru@bitdefender.com> wrote:
>> On 07/04/16 13:22, Jan Beulich wrote:
>>>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>>>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>>>      return test_bit(msr, bitmap);
>>>>  }
>>>>  
>>>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>>>> +{
>>>> +    struct vcpu *v;
>>>> +    struct arch_vmx_struct *avmx;
>>>> +    unsigned int cr3_bitmask;
>>>> +    bool_t cr3_vmevent, cr3_ldexit;
>>>> +
>>>> +    /* Adjust CR3 load-exiting. */
>>>> +
>>>> +    /* vmx only */
>>>> +    ASSERT(cpu_has_vmx);
>>>> +
>>>> +    /* non-hap domains trap CR3 writes unconditionally */
>>>> +    if ( !paging_mode_hap(d) )
>>>> +    {
>>>> +        for_each_vcpu ( d, v )
>>>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>>>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>>>> +
>>>> +    for_each_vcpu ( d, v )
>>>> +    {
>>>> +        avmx = &v->arch.hvm_vmx;
>>>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>> +
>>>> +        if ( cr3_vmevent == cr3_ldexit )
>>>> +            continue;
>>>> +
>>>> +        /*
>>>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>>>> +         * See vmx_update_guest_cr code motion for cr = 0.
>>>> +         */
>>>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v) 
>>>> )
>>>> +            continue;
>>>> +
>>>> +        if ( cr3_vmevent )
>>>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>> +        else
>>>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>>>> +
>>>> +        vmx_vmcs_enter(v);
>>>> +        vmx_update_cpu_exec_control(v);
>>>> +        vmx_vmcs_exit(v);
>>>> +    }
>>>> +}
>>>
>>> While Razvan gave his ack already, I wonder whether it's really a
>>> good idea to put deeply VMX-specific code outside of a VMX-specific
>>> file.
>>
>> Didn't I add "for monitor / vm_event parts Acked-by: ..."? If I didn't,
>> I meant to.
> 
> Well - this is a monitor file (monitor.c).

Fair enough, I should have been more detailed here. I do see the merit
of your suggestion, and so FWIW I second your suggestion to move the
code to some VMX-specific part of the tree if possible.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 13:50         ` Razvan Cojocaru
  2016-07-04 14:05           ` Corneliu ZUZU
@ 2016-07-04 14:17           ` Jan Beulich
  2016-07-04 14:21             ` Razvan Cojocaru
  2016-07-04 14:45             ` Corneliu ZUZU
  1 sibling, 2 replies; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 14:17 UTC (permalink / raw)
  To: Corneliu ZUZU, Razvan Cojocaru
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
> On 07/04/16 16:11, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack 
> user
>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>> discards any information that was in arch_vm_event.write_data.
>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>> monitored guest to survive.
>>>
>>> Sorry but could you please rephrase this, I don't quite understand what 
>>> you're saying.
>>> The write_data field in arch_vm_event should _not ever_ be invalidated 
>>> as a direct result of a toolstack user's action.
>> 
>> The monitoring app can cause all kinds of problems to the guest it
>> monitors. Why would this specific one need taking care of in the
>> hypervisor, instead of demanding that the app not disable monitoring
>> at the wrong time?
> 
> I'm not sure there's a right time here. The problem is that, if I
> understand this correctly, a race is possible between the moment the
> userspace application responds to the vm_event _and_ call
> xc_monitor_disable() and the time hvm_do_resume() gets called.

It's that _and_ in your reply that I put under question, but I admit
that questioning may be due to my limited (or should I say non-
existent) knowledge on the user space parts here: Why would the
app be _required_ to "responds to the vm_event _and_ call
xc_monitor_disable()", rather than delaying the latter for long
enough?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 14:17           ` Jan Beulich
@ 2016-07-04 14:21             ` Razvan Cojocaru
  2016-07-04 15:07               ` Jan Beulich
  2016-07-04 14:45             ` Corneliu ZUZU
  1 sibling, 1 reply; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-04 14:21 UTC (permalink / raw)
  To: Jan Beulich, Corneliu ZUZU
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

On 07/04/16 17:17, Jan Beulich wrote:
>>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack 
>> user
>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>> monitored guest to survive.
>>>>
>>>> Sorry but could you please rephrase this, I don't quite understand what 
>>>> you're saying.
>>>> The write_data field in arch_vm_event should _not ever_ be invalidated 
>>>> as a direct result of a toolstack user's action.
>>>
>>> The monitoring app can cause all kinds of problems to the guest it
>>> monitors. Why would this specific one need taking care of in the
>>> hypervisor, instead of demanding that the app not disable monitoring
>>> at the wrong time?
>>
>> I'm not sure there's a right time here. The problem is that, if I
>> understand this correctly, a race is possible between the moment the
>> userspace application responds to the vm_event _and_ call
>> xc_monitor_disable() and the time hvm_do_resume() gets called.
> 
> It's that _and_ in your reply that I put under question, but I admit
> that questioning may be due to my limited (or should I say non-
> existent) knowledge on the user space parts here: Why would the
> app be _required_ to "responds to the vm_event _and_ call
> xc_monitor_disable()", rather than delaying the latter for long
> enough?

It's not required to do that, it's just that we don't really know what
"long enough means". I suppose a second should do be more than enough,
but obviously we'd prefer a fool-proof solution with better guarantees
and no lag - I thought that the hypervisor change is trivial enough to
not make the tradeoff into much of an issue.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 14:08           ` Jan Beulich
@ 2016-07-04 14:23             ` Razvan Cojocaru
  2016-07-04 14:24             ` Corneliu ZUZU
  1 sibling, 0 replies; 59+ messages in thread
From: Razvan Cojocaru @ 2016-07-04 14:23 UTC (permalink / raw)
  To: Jan Beulich, Corneliu ZUZU; +Cc: Andrew Cooper, xen-devel

On 07/04/16 17:08, Jan Beulich wrote:
>>>> On 04.07.16 at 15:21, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 4:07 PM, Jan Beulich wrote:
>>>>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
>>>> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>>>>> After trapping a control-register write vm-event and -until- deciding if that
>>>>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>>>>> write,
>>>>>> there cannot and should not be another trapped control-register write event.
>>>>> Is that true even for the case where full register state gets updated
>>>>> for a vCPU?
>>>> AFAIK, the full register state cannot be updated _at once_, that is:
>>>> after each trapped register update monitor_write_data must _always_ be
>>>> handled _before reentering the vCPU_.
>>> I'm thinking about operations like VCPUOP_initialise here. Of course
>>> operations on current can't possibly update more than one register
>>> at a time.
>>
>> Yes but those register update operations happen outside the vm-event 
>> subsystem, i.e. in those cases the registers get updated directly, not 
>> by setting bits in monitor_write_data.
>> Only hypervisor-trapped register updates (e.g. hvm_set_cr0 w/ may_defer 
>> parameter = 1) which are deferred because of hvm_event_crX returning 
>> true use monitor_write_data.
> 
> That's why I had added "Is that updating-all-context case of no
> interest to a monitoring application, now and forever?" After all I
> gave VCPUOP_initialise as an example because the guest itself
> can invoke it, and so I had assumed this to be of interest to a
> monitoring app.

Well, the change does simplify the code for now but I can't bet on
"forever" - maybe Tamas has some ideas here also? It may be that the
prudent thing to do is leave the code without the enum change.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 14:08           ` Jan Beulich
  2016-07-04 14:23             ` Razvan Cojocaru
@ 2016-07-04 14:24             ` Corneliu ZUZU
  2016-07-04 15:03               ` Jan Beulich
  1 sibling, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 14:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 7/4/2016 5:08 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 15:21, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 4:07 PM, Jan Beulich wrote:
>>>>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
>>>> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>>>>> After trapping a control-register write vm-event and -until- deciding if that
>>>>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>>>>> write,
>>>>>> there cannot and should not be another trapped control-register write event.
>>>>> Is that true even for the case where full register state gets updated
>>>>> for a vCPU?
>>>> AFAIK, the full register state cannot be updated _at once_, that is:
>>>> after each trapped register update monitor_write_data must _always_ be
>>>> handled _before reentering the vCPU_.
>>> I'm thinking about operations like VCPUOP_initialise here. Of course
>>> operations on current can't possibly update more than one register
>>> at a time.
>> Yes but those register update operations happen outside the vm-event
>> subsystem, i.e. in those cases the registers get updated directly, not
>> by setting bits in monitor_write_data.
>> Only hypervisor-trapped register updates (e.g. hvm_set_cr0 w/ may_defer
>> parameter = 1) which are deferred because of hvm_event_crX returning
>> true use monitor_write_data.
> That's why I had added "Is that updating-all-context case of no
> interest to a monitoring application, now and forever?" After all I
> gave VCPUOP_initialise as an example because the guest itself
> can invoke it, and so I had assumed this to be of interest to a
> monitoring app.
>
> Jan
>
>

Hmm, didn't have in mind the fact that the guest can do those kind of 
operations, I suppose that applies to PV guests, right? (in which case 
this scenario would be triggered by an in-guest PV driver, right?)
CR-write monitor vm-events currently only trap _machine instructions_ 
which cause CR writes, but in the future I can see why it might be 
useful for a monitoring app to trap _any kind of guest behavior_ which 
would cause that.
Good point, maybe Razvan and Tamas can chime in on this.

Thanks,
Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 14:13           ` Jan Beulich
@ 2016-07-04 14:31             ` Corneliu ZUZU
  2016-07-04 15:09               ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 14:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

On 7/4/2016 5:13 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 15:28, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 4:11 PM, Jan Beulich wrote:
>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack
>> user
>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>> monitored guest to survive.
>>>> Sorry but could you please rephrase this, I don't quite understand what
>>>> you're saying.
>>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>>> as a direct result of a toolstack user's action.
>>> The monitoring app can cause all kinds of problems to the guest it
>>> monitors. Why would this specific one need taking care of in the
>>> hypervisor, instead of demanding that the app not disable monitoring
>>> at the wrong time?
>> Because it wouldn't be the wrong time to disable monitoring.
>> This is not a case of wrong toolstack usage, but rather a case of
>> xc_monitor_disable doing the wrong thing along the way.
>>
>>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>>> @@ -473,24 +473,24 @@ void hvm_do_resume(struct vcpu *v)
>>>>>>         if ( !handle_hvm_io_completion(v) )
>>>>>>             return;
>>>>>>     
>>>>>> -    if ( unlikely(v->arch.vm_event) )
>>>>>> +    if ( unlikely(v->arch.vm_event.emulate_flags) )
>>>>>>         {
>>>>>> -        if ( v->arch.vm_event->emulate_flags )
>>>>>> -        {
>>>>>> -            enum emul_kind kind = EMUL_KIND_NORMAL;
>>>>>> +        enum emul_kind kind;
>>>>>>     
>>>>>> -            if ( v->arch.vm_event->emulate_flags &
>>>>>> -                 VM_EVENT_FLAG_SET_EMUL_READ_DATA )
>>>>>> -                kind = EMUL_KIND_SET_CONTEXT;
>>>>>> -            else if ( v->arch.vm_event->emulate_flags &
>>>>>> -                      VM_EVENT_FLAG_EMULATE_NOWRITE )
>>>>>> -                kind = EMUL_KIND_NOWRITE;
>>>>>> +        ASSERT(v->arch.vm_event.emul_read_data);
>>>>>>     
>>>>>> -            hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
>>>>>> -                                       HVM_DELIVER_NO_ERROR_CODE);
>>>>>> +        kind = EMUL_KIND_NORMAL;
>>>>> Please keep this being the initializer of the variable.
>>>> I put it there because of the ASSERT (to do that before anything else),
>>>> but I will undo if you prefer.
>>> Since the initializer is (very obviously) independent of the
>>> condition the ASSERT() checks, I indeed would prefer it to remain
>>> the way it is before this change.
>>>
>>>>>> -            v->arch.vm_event->emulate_flags = 0;
>>>>>> -        }
>>>>>> +        if ( v->arch.vm_event.emulate_flags & VM_EVENT_FLAG_SET_EMUL_READ_DATA
>> )
>>>>> Long line.
>>>> Long but under 80 columns, isn't that the rule? :-)
>>> I've counted 81 here.
>> You may have counted the beginning '+' as well. Is the rule "<= 80
>> columns in the source file" (in which case you're wrong) or is it "<= 80
>> columns in the resulting diff" (in which case I'm wrong)?
> Ah - you first said "under 80" but now say "<= 80". The former
> is what ./CODING_STYLE asks for. (And ftr the 81 included the
> newline, so indeed I was off by one too.)
>
> Jan

Hmm, I meant "<= 80" the first time too, I was under the impression that 
that's what CODING_STYLE asks for too, but, quoting:
     'Lines should be less than 80 characters in length.'
Why not <= 80? Shouldn't the limit (i.e. "<=") be 80 columns, including 
the 80-th? (this points me otherwise 
http://programmers.stackexchange.com/questions/148677/why-is-80-characters-the-standard-limit-for-code-width 
)

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 14:16         ` Razvan Cojocaru
@ 2016-07-04 14:35           ` Corneliu ZUZU
  0 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 14:35 UTC (permalink / raw)
  To: Razvan Cojocaru, Jan Beulich
  Cc: Andrew Cooper, Kevin Tian, Tamas K Lengyel, Jun Nakajima, xen-devel

On 7/4/2016 5:16 PM, Razvan Cojocaru wrote:
> On 07/04/16 17:05, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:22, <rcojocaru@bitdefender.com> wrote:
>>> On 07/04/16 13:22, Jan Beulich wrote:
>>>>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>>>>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>>>>       return test_bit(msr, bitmap);
>>>>>   }
>>>>>   
>>>>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>>>>> +{
>>>>> +    struct vcpu *v;
>>>>> +    struct arch_vmx_struct *avmx;
>>>>> +    unsigned int cr3_bitmask;
>>>>> +    bool_t cr3_vmevent, cr3_ldexit;
>>>>> +
>>>>> +    /* Adjust CR3 load-exiting. */
>>>>> +
>>>>> +    /* vmx only */
>>>>> +    ASSERT(cpu_has_vmx);
>>>>> +
>>>>> +    /* non-hap domains trap CR3 writes unconditionally */
>>>>> +    if ( !paging_mode_hap(d) )
>>>>> +    {
>>>>> +        for_each_vcpu ( d, v )
>>>>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>>>>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>>>>> +
>>>>> +    for_each_vcpu ( d, v )
>>>>> +    {
>>>>> +        avmx = &v->arch.hvm_vmx;
>>>>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>>> +
>>>>> +        if ( cr3_vmevent == cr3_ldexit )
>>>>> +            continue;
>>>>> +
>>>>> +        /*
>>>>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>>>>> +         * See vmx_update_guest_cr code motion for cr = 0.
>>>>> +         */
>>>>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v)
>>>>> )
>>>>> +            continue;
>>>>> +
>>>>> +        if ( cr3_vmevent )
>>>>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>> +        else
>>>>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>>>>> +
>>>>> +        vmx_vmcs_enter(v);
>>>>> +        vmx_update_cpu_exec_control(v);
>>>>> +        vmx_vmcs_exit(v);
>>>>> +    }
>>>>> +}
>>>> While Razvan gave his ack already, I wonder whether it's really a
>>>> good idea to put deeply VMX-specific code outside of a VMX-specific
>>>> file.
>>> Didn't I add "for monitor / vm_event parts Acked-by: ..."? If I didn't,
>>> I meant to.
>> Well - this is a monitor file (monitor.c).
> Fair enough, I should have been more detailed here. I do see the merit
> of your suggestion, and so FWIW I second your suggestion to move the
> code to some VMX-specific part of the tree if possible.

But did you also see my reply concerning this suggestion? :-)

>
> Thanks,
> Razvan
>

Thanks,
Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 14:17           ` Jan Beulich
  2016-07-04 14:21             ` Razvan Cojocaru
@ 2016-07-04 14:45             ` Corneliu ZUZU
  1 sibling, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 14:45 UTC (permalink / raw)
  To: Jan Beulich, Razvan Cojocaru
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

On 7/4/2016 5:17 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack
>> user
>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>> monitored guest to survive.
>>>> Sorry but could you please rephrase this, I don't quite understand what
>>>> you're saying.
>>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>>> as a direct result of a toolstack user's action.
>>> The monitoring app can cause all kinds of problems to the guest it
>>> monitors. Why would this specific one need taking care of in the
>>> hypervisor, instead of demanding that the app not disable monitoring
>>> at the wrong time?
>> I'm not sure there's a right time here. The problem is that, if I
>> understand this correctly, a race is possible between the moment the
>> userspace application responds to the vm_event _and_ call
>> xc_monitor_disable() and the time hvm_do_resume() gets called.
> It's that _and_ in your reply that I put under question, but I admit
> that questioning may be due to my limited (or should I say non-
> existent) knowledge on the user space parts here: Why would the
> app be _required_ to "responds to the vm_event _and_ call
> xc_monitor_disable()", rather than delaying the latter for long
> enough?
>
> Jan
>
>

Because it's a limitation that can easily be avoided and also a rule 
which the user has to keep in mind (why should we impose this effort on 
the toolstack user if not necessary?).

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 11:02     ` Corneliu ZUZU
@ 2016-07-04 14:58       ` Jan Beulich
  2016-07-04 16:00         ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 14:58 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Kevin Tian, Tamas K Lengyel, Razvan Cojocaru, Andrew Cooper,
	xen-devel, Jun Nakajima

>>> On 04.07.16 at 13:02, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 1:22 PM, Jan Beulich wrote:
>>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>>       return test_bit(msr, bitmap);
>>>   }
>>>   
>>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>>> +{
>>> +    struct vcpu *v;
>>> +    struct arch_vmx_struct *avmx;
>>> +    unsigned int cr3_bitmask;
>>> +    bool_t cr3_vmevent, cr3_ldexit;
>>> +
>>> +    /* Adjust CR3 load-exiting. */
>>> +
>>> +    /* vmx only */
>>> +    ASSERT(cpu_has_vmx);
>>> +
>>> +    /* non-hap domains trap CR3 writes unconditionally */
>>> +    if ( !paging_mode_hap(d) )
>>> +    {
>>> +        for_each_vcpu ( d, v )
>>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>> +        return;
>>> +    }
>>> +
>>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>>> +
>>> +    for_each_vcpu ( d, v )
>>> +    {
>>> +        avmx = &v->arch.hvm_vmx;
>>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>> +
>>> +        if ( cr3_vmevent == cr3_ldexit )
>>> +            continue;
>>> +
>>> +        /*
>>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>>> +         * See vmx_update_guest_cr code motion for cr = 0.
>>> +         */
>>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v)
>>> )
>>> +            continue;
>>> +
>>> +        if ( cr3_vmevent )
>>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>> +        else
>>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>>> +
>>> +        vmx_vmcs_enter(v);
>>> +        vmx_update_cpu_exec_control(v);
>>> +        vmx_vmcs_exit(v);
>>> +    }
>>> +}
>> While Razvan gave his ack already, I wonder whether it's really a
>> good idea to put deeply VMX-specific code outside of a VMX-specific
>> file.
> 
> Well, a summary of what this function does would sound like: "adjusts 
> CR3 load-exiting for cr-write monitor vm-events". IMHO that's (monitor) 
> vm-event specific enough to be placed within the vm-event subsystem.
> Could you suggest concretely how this separation would look like? (where 
> to put this function/parts of it (and what parts), what name should it 
> have once moved).

I won't go into that level of detail. Fact is that VMX-specific code
should be kept out of here. Whether you move the entire function
behind a hvm_funcs hook or just part of it is of no interest to me.
In no case should, if and when SVM eventually gets supported for
vm-event/monitor too, this function end up doing both VMX and SVM
specific things.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 14:24             ` Corneliu ZUZU
@ 2016-07-04 15:03               ` Jan Beulich
  2016-07-04 15:10                 ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 15:03 UTC (permalink / raw)
  To: Corneliu ZUZU; +Cc: Andrew Cooper, xen-devel

>>> On 04.07.16 at 16:24, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 5:08 PM, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:21, <czuzu@bitdefender.com> wrote:
>>> On 7/4/2016 4:07 PM, Jan Beulich wrote:
>>>>>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
>>>>> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>>>>>> After trapping a control-register write vm-event and -until- deciding if that
>>>>>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>>>>>> write,
>>>>>>> there cannot and should not be another trapped control-register write event.
>>>>>> Is that true even for the case where full register state gets updated
>>>>>> for a vCPU?
>>>>> AFAIK, the full register state cannot be updated _at once_, that is:
>>>>> after each trapped register update monitor_write_data must _always_ be
>>>>> handled _before reentering the vCPU_.
>>>> I'm thinking about operations like VCPUOP_initialise here. Of course
>>>> operations on current can't possibly update more than one register
>>>> at a time.
>>> Yes but those register update operations happen outside the vm-event
>>> subsystem, i.e. in those cases the registers get updated directly, not
>>> by setting bits in monitor_write_data.
>>> Only hypervisor-trapped register updates (e.g. hvm_set_cr0 w/ may_defer
>>> parameter = 1) which are deferred because of hvm_event_crX returning
>>> true use monitor_write_data.
>> That's why I had added "Is that updating-all-context case of no
>> interest to a monitoring application, now and forever?" After all I
>> gave VCPUOP_initialise as an example because the guest itself
>> can invoke it, and so I had assumed this to be of interest to a
>> monitoring app.
> 
> Hmm, didn't have in mind the fact that the guest can do those kind of 
> operations, I suppose that applies to PV guests, right? (in which case 
> this scenario would be triggered by an in-guest PV driver, right?)

A driver wouldn't normally do such things, unless the OS is really
highly modularized (far beyond what Linux allows). But yes, this
is a PV operation, but equally applies to PVHv2 (see commit
192df6f912 ["x86: allow HVM guests to use hypercalls to bring up
vCPUs"]) and by extension/generalization HVM.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 14:21             ` Razvan Cojocaru
@ 2016-07-04 15:07               ` Jan Beulich
  2016-07-04 15:21                 ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 15:07 UTC (permalink / raw)
  To: Corneliu ZUZU, Razvan Cojocaru
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

>>> On 04.07.16 at 16:21, <rcojocaru@bitdefender.com> wrote:
> On 07/04/16 17:17, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
>>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack 
>>> user
>>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>>> monitored guest to survive.
>>>>>
>>>>> Sorry but could you please rephrase this, I don't quite understand what 
>>>>> you're saying.
>>>>> The write_data field in arch_vm_event should _not ever_ be invalidated 
>>>>> as a direct result of a toolstack user's action.
>>>>
>>>> The monitoring app can cause all kinds of problems to the guest it
>>>> monitors. Why would this specific one need taking care of in the
>>>> hypervisor, instead of demanding that the app not disable monitoring
>>>> at the wrong time?
>>>
>>> I'm not sure there's a right time here. The problem is that, if I
>>> understand this correctly, a race is possible between the moment the
>>> userspace application responds to the vm_event _and_ call
>>> xc_monitor_disable() and the time hvm_do_resume() gets called.
>> 
>> It's that _and_ in your reply that I put under question, but I admit
>> that questioning may be due to my limited (or should I say non-
>> existent) knowledge on the user space parts here: Why would the
>> app be _required_ to "responds to the vm_event _and_ call
>> xc_monitor_disable()", rather than delaying the latter for long
>> enough?
> 
> It's not required to do that, it's just that we don't really know what
> "long enough means". I suppose a second should do be more than enough,
> but obviously we'd prefer a fool-proof solution with better guarantees
> and no lag - I thought that the hypervisor change is trivial enough to
> not make the tradeoff into much of an issue.

It's mostly mechanical indeed, but iirc it grows struct vcpu (or was
it struct domain?), which is never really desirable (but admittedly
often unavoidable). I'm therefore simply trying to understand what
alternatives there are.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 14:31             ` Corneliu ZUZU
@ 2016-07-04 15:09               ` Jan Beulich
  2016-07-04 15:24                 ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 15:09 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

>>> On 04.07.16 at 16:31, <czuzu@bitdefender.com> wrote:
> Hmm, I meant "<= 80" the first time too, I was under the impression that 
> that's what CODING_STYLE asks for too, but, quoting:
>      'Lines should be less than 80 characters in length.'
> Why not <= 80? Shouldn't the limit (i.e. "<=") be 80 columns, including 
> the 80-th? (this points me otherwise 
> http://programmers.stackexchange.com/questions/148677/why-is-80-characters-t 
> he-standard-limit-for-code-width 
> )

I have no idea what (editing) tools have limitation at precisely which
line length. Hence I can't answer the "why". I can only refer to the
documented requirement, which aiui is pretty clear.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum
  2016-07-04 15:03               ` Jan Beulich
@ 2016-07-04 15:10                 ` Corneliu ZUZU
  0 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 15:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 7/4/2016 6:03 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 16:24, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 5:08 PM, Jan Beulich wrote:
>>>>>> On 04.07.16 at 15:21, <czuzu@bitdefender.com> wrote:
>>>> On 7/4/2016 4:07 PM, Jan Beulich wrote:
>>>>>>>> On 04.07.16 at 14:47, <czuzu@bitdefender.com> wrote:
>>>>>> On 7/4/2016 3:37 PM, Jan Beulich wrote:
>>>>>>>>>> On 30.06.16 at 20:44, <czuzu@bitdefender.com> wrote:
>>>>>>>> After trapping a control-register write vm-event and -until- deciding if that
>>>>>>>> write is to be permitted or not (VM_EVENT_FLAG_DENY) and doing the actual
>>>>>>>> write,
>>>>>>>> there cannot and should not be another trapped control-register write event.
>>>>>>> Is that true even for the case where full register state gets updated
>>>>>>> for a vCPU?
>>>>>> AFAIK, the full register state cannot be updated _at once_, that is:
>>>>>> after each trapped register update monitor_write_data must _always_ be
>>>>>> handled _before reentering the vCPU_.
>>>>> I'm thinking about operations like VCPUOP_initialise here. Of course
>>>>> operations on current can't possibly update more than one register
>>>>> at a time.
>>>> Yes but those register update operations happen outside the vm-event
>>>> subsystem, i.e. in those cases the registers get updated directly, not
>>>> by setting bits in monitor_write_data.
>>>> Only hypervisor-trapped register updates (e.g. hvm_set_cr0 w/ may_defer
>>>> parameter = 1) which are deferred because of hvm_event_crX returning
>>>> true use monitor_write_data.
>>> That's why I had added "Is that updating-all-context case of no
>>> interest to a monitoring application, now and forever?" After all I
>>> gave VCPUOP_initialise as an example because the guest itself
>>> can invoke it, and so I had assumed this to be of interest to a
>>> monitoring app.
>> Hmm, didn't have in mind the fact that the guest can do those kind of
>> operations, I suppose that applies to PV guests, right? (in which case
>> this scenario would be triggered by an in-guest PV driver, right?)
> A driver wouldn't normally do such things, unless the OS is really
> highly modularized (far beyond what Linux allows). But yes, this
> is a PV operation, but equally applies to PVHv2 (see commit
> 192df6f912 ["x86: allow HVM guests to use hypercalls to bring up
> vCPUs"]) and by extension/generalization HVM.
>
> Jan
>
>

Right, then I agree w/ making this a bitfield as it was (if Tamas has 
nothing to object). And now it also makes sense to _not make this an 
enum_ on ARM too, thanks for pointing this out.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 15:07               ` Jan Beulich
@ 2016-07-04 15:21                 ` Corneliu ZUZU
  2016-07-04 15:57                   ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 15:21 UTC (permalink / raw)
  To: Jan Beulich, Razvan Cojocaru
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

On 7/4/2016 6:07 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 16:21, <rcojocaru@bitdefender.com> wrote:
>> On 07/04/16 17:17, Jan Beulich wrote:
>>>>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
>>>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack
>>>> user
>>>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>>>> monitored guest to survive.
>>>>>> Sorry but could you please rephrase this, I don't quite understand what
>>>>>> you're saying.
>>>>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>>>>> as a direct result of a toolstack user's action.
>>>>> The monitoring app can cause all kinds of problems to the guest it
>>>>> monitors. Why would this specific one need taking care of in the
>>>>> hypervisor, instead of demanding that the app not disable monitoring
>>>>> at the wrong time?
>>>> I'm not sure there's a right time here. The problem is that, if I
>>>> understand this correctly, a race is possible between the moment the
>>>> userspace application responds to the vm_event _and_ call
>>>> xc_monitor_disable() and the time hvm_do_resume() gets called.
>>> It's that _and_ in your reply that I put under question, but I admit
>>> that questioning may be due to my limited (or should I say non-
>>> existent) knowledge on the user space parts here: Why would the
>>> app be _required_ to "responds to the vm_event _and_ call
>>> xc_monitor_disable()", rather than delaying the latter for long
>>> enough?
>> It's not required to do that, it's just that we don't really know what
>> "long enough means". I suppose a second should do be more than enough,
>> but obviously we'd prefer a fool-proof solution with better guarantees
>> and no lag - I thought that the hypervisor change is trivial enough to
>> not make the tradeoff into much of an issue.
> It's mostly mechanical indeed, but iirc it grows struct vcpu (or was
> it struct domain?), which is never really desirable (but admittedly
> often unavoidable). I'm therefore simply trying to understand what
> alternatives there are.
>
> Jan

The change adds 20 bytes to the structure, that's less than 3 
uint64_t's, not that much if you ask me.
But given the discussion we're having over patch 4/8, monitor_write_data 
would indeed get larger if I revert that, so what I propose instead is 
going for something in-between what was before and what I did after, 
i.e. don't dynamically allocate arch_vm_event entirely, but don't 
statically allocate monitor_write_data either. I propose turning 
arch_vm_event into:

struct arch_vm_event {
     uint32_t emulate_flags;
     struct vm_event_emul_read_data *emul_read_data;
     struct monitor_write_data *write_data;
};

This way we can still selectively invalidate write_data from arch_vm_event.
Sounds good?

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 15:09               ` Jan Beulich
@ 2016-07-04 15:24                 ` Corneliu ZUZU
  0 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 15:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

On 7/4/2016 6:09 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 16:31, <czuzu@bitdefender.com> wrote:
>> Hmm, I meant "<= 80" the first time too, I was under the impression that
>> that's what CODING_STYLE asks for too, but, quoting:
>>       'Lines should be less than 80 characters in length.'
>> Why not <= 80? Shouldn't the limit (i.e. "<=") be 80 columns, including
>> the 80-th? (this points me otherwise
>> http://programmers.stackexchange.com/questions/148677/why-is-80-characters-t
>> he-standard-limit-for-code-width
>> )
> I have no idea what (editing) tools have limitation at precisely which
> line length. Hence I can't answer the "why". I can only refer to the
> documented requirement, which aiui is pretty clear.
>
> Jan

Without understanding the "why" we can't be sure we're doing the right 
thing.
I was hoping someone else would provide feedback on this, as I'm almost 
sure who wrote CODING_STYLE actually meant "less than or equal" instead 
of "less than".

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 15:21                 ` Corneliu ZUZU
@ 2016-07-04 15:57                   ` Jan Beulich
  2016-07-04 16:09                     ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 15:57 UTC (permalink / raw)
  To: Corneliu ZUZU, Razvan Cojocaru
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

>>> On 04.07.16 at 17:21, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 6:07 PM, Jan Beulich wrote:
>>>>> On 04.07.16 at 16:21, <rcojocaru@bitdefender.com> wrote:
>>> On 07/04/16 17:17, Jan Beulich wrote:
>>>>>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
>>>>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack
>>>>> user
>>>>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>>>>> monitored guest to survive.
>>>>>>> Sorry but could you please rephrase this, I don't quite understand what
>>>>>>> you're saying.
>>>>>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>>>>>> as a direct result of a toolstack user's action.
>>>>>> The monitoring app can cause all kinds of problems to the guest it
>>>>>> monitors. Why would this specific one need taking care of in the
>>>>>> hypervisor, instead of demanding that the app not disable monitoring
>>>>>> at the wrong time?
>>>>> I'm not sure there's a right time here. The problem is that, if I
>>>>> understand this correctly, a race is possible between the moment the
>>>>> userspace application responds to the vm_event _and_ call
>>>>> xc_monitor_disable() and the time hvm_do_resume() gets called.
>>>> It's that _and_ in your reply that I put under question, but I admit
>>>> that questioning may be due to my limited (or should I say non-
>>>> existent) knowledge on the user space parts here: Why would the
>>>> app be _required_ to "responds to the vm_event _and_ call
>>>> xc_monitor_disable()", rather than delaying the latter for long
>>>> enough?
>>> It's not required to do that, it's just that we don't really know what
>>> "long enough means". I suppose a second should do be more than enough,
>>> but obviously we'd prefer a fool-proof solution with better guarantees
>>> and no lag - I thought that the hypervisor change is trivial enough to
>>> not make the tradeoff into much of an issue.
>> It's mostly mechanical indeed, but iirc it grows struct vcpu (or was
>> it struct domain?), which is never really desirable (but admittedly
>> often unavoidable). I'm therefore simply trying to understand what
>> alternatives there are.
> 
> The change adds 20 bytes to the structure, that's less than 3 
> uint64_t's, not that much if you ask me.
> But given the discussion we're having over patch 4/8, monitor_write_data 
> would indeed get larger if I revert that, so what I propose instead is 
> going for something in-between what was before and what I did after, 
> i.e. don't dynamically allocate arch_vm_event entirely, but don't 
> statically allocate monitor_write_data either. I propose turning 
> arch_vm_event into:
> 
> struct arch_vm_event {
>      uint32_t emulate_flags;
>      struct vm_event_emul_read_data *emul_read_data;
>      struct monitor_write_data *write_data;
> };
> 
> This way we can still selectively invalidate write_data from arch_vm_event.
> Sounds good?

Well, ideally I'd like there to remain a single pointer in struct vcpu.
What this points to and when that backing memory gets allocated
would then of less interest (i.e. perhaps you could allocate the
part that needs to remain active after disabling monitoring
independent from what can go away at that point). Systems
without any monitoring in use would then still pay no larger
penalty than they do today.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 14:58       ` Jan Beulich
@ 2016-07-04 16:00         ` Corneliu ZUZU
  2016-07-04 16:12           ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 16:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Tamas K Lengyel, Razvan Cojocaru, Andrew Cooper,
	xen-devel, Jun Nakajima

On 7/4/2016 5:58 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 13:02, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 1:22 PM, Jan Beulich wrote:
>>>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>>>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>>>        return test_bit(msr, bitmap);
>>>>    }
>>>>    
>>>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>>>> +{
>>>> +    struct vcpu *v;
>>>> +    struct arch_vmx_struct *avmx;
>>>> +    unsigned int cr3_bitmask;
>>>> +    bool_t cr3_vmevent, cr3_ldexit;
>>>> +
>>>> +    /* Adjust CR3 load-exiting. */
>>>> +
>>>> +    /* vmx only */
>>>> +    ASSERT(cpu_has_vmx);
>>>> +
>>>> +    /* non-hap domains trap CR3 writes unconditionally */
>>>> +    if ( !paging_mode_hap(d) )
>>>> +    {
>>>> +        for_each_vcpu ( d, v )
>>>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>>>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>>>> +
>>>> +    for_each_vcpu ( d, v )
>>>> +    {
>>>> +        avmx = &v->arch.hvm_vmx;
>>>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>> +
>>>> +        if ( cr3_vmevent == cr3_ldexit )
>>>> +            continue;
>>>> +
>>>> +        /*
>>>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>>>> +         * See vmx_update_guest_cr code motion for cr = 0.
>>>> +         */
>>>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v)
>>>> )
>>>> +            continue;
>>>> +
>>>> +        if ( cr3_vmevent )
>>>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>> +        else
>>>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>>>> +
>>>> +        vmx_vmcs_enter(v);
>>>> +        vmx_update_cpu_exec_control(v);
>>>> +        vmx_vmcs_exit(v);
>>>> +    }
>>>> +}
>>> While Razvan gave his ack already, I wonder whether it's really a
>>> good idea to put deeply VMX-specific code outside of a VMX-specific
>>> file.
>> Well, a summary of what this function does would sound like: "adjusts
>> CR3 load-exiting for cr-write monitor vm-events". IMHO that's (monitor)
>> vm-event specific enough to be placed within the vm-event subsystem.
>> Could you suggest concretely how this separation would look like? (where
>> to put this function/parts of it (and what parts), what name should it
>> have once moved).
> I won't go into that level of detail. Fact is that VMX-specific code
> should be kept out of here. Whether you move the entire function
> behind a hvm_funcs hook or just part of it is of no interest to me.
> In no case should, if and when SVM eventually gets supported for
> vm-event/monitor too, this function end up doing both VMX and SVM
> specific things.
>
> Jan

Why move it behind a hvm_funcs hook if it's only valid for VMX? SVM 
support is not currently implemented, hence the ASSERT(cpu_has_vmx) at 
the beginning of the function.
And of course if @ some point SVM support will be implemented then the 
right thing to do is what you say, i.e. make this function part of 
hvm_function_table, but until then I don't see why we should do that.
Note that arch_monitor_get_capabilities also returns no capability at 
the moment if !cpu_has_vmx.

What if I move the vmx-specific parts to vmx.c in a function called 
something like vmx_vm_event_update_cr3_traps() and call it from 
write_ctrlreg_adjust_traps instead?...

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 15:57                   ` Jan Beulich
@ 2016-07-04 16:09                     ` Corneliu ZUZU
  2016-07-04 16:16                       ` Jan Beulich
  0 siblings, 1 reply; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 16:09 UTC (permalink / raw)
  To: Jan Beulich, Razvan Cojocaru
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Tamas K Lengyel, xen-devel

On 7/4/2016 6:57 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 17:21, <czuzu@bitdefender.com> wrote:
>> On 7/4/2016 6:07 PM, Jan Beulich wrote:
>>>>>> On 04.07.16 at 16:21, <rcojocaru@bitdefender.com> wrote:
>>>> On 07/04/16 17:17, Jan Beulich wrote:
>>>>>>>> On 04.07.16 at 15:50, <rcojocaru@bitdefender.com> wrote:
>>>>>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>>>>>> On 04.07.16 at 15:03, <czuzu@bitdefender.com> wrote:
>>>>>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>>>>>> On 30.06.16 at 20:45, <czuzu@bitdefender.com> wrote:
>>>>>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack
>>>>>> user
>>>>>>>>>> disables domain monitoring (xc_monitor_disable), which in turn effectively
>>>>>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>>>>>> monitored guest to survive.
>>>>>>>> Sorry but could you please rephrase this, I don't quite understand what
>>>>>>>> you're saying.
>>>>>>>> The write_data field in arch_vm_event should _not ever_ be invalidated
>>>>>>>> as a direct result of a toolstack user's action.
>>>>>>> The monitoring app can cause all kinds of problems to the guest it
>>>>>>> monitors. Why would this specific one need taking care of in the
>>>>>>> hypervisor, instead of demanding that the app not disable monitoring
>>>>>>> at the wrong time?
>>>>>> I'm not sure there's a right time here. The problem is that, if I
>>>>>> understand this correctly, a race is possible between the moment the
>>>>>> userspace application responds to the vm_event _and_ call
>>>>>> xc_monitor_disable() and the time hvm_do_resume() gets called.
>>>>> It's that _and_ in your reply that I put under question, but I admit
>>>>> that questioning may be due to my limited (or should I say non-
>>>>> existent) knowledge on the user space parts here: Why would the
>>>>> app be _required_ to "responds to the vm_event _and_ call
>>>>> xc_monitor_disable()", rather than delaying the latter for long
>>>>> enough?
>>>> It's not required to do that, it's just that we don't really know what
>>>> "long enough means". I suppose a second should do be more than enough,
>>>> but obviously we'd prefer a fool-proof solution with better guarantees
>>>> and no lag - I thought that the hypervisor change is trivial enough to
>>>> not make the tradeoff into much of an issue.
>>> It's mostly mechanical indeed, but iirc it grows struct vcpu (or was
>>> it struct domain?), which is never really desirable (but admittedly
>>> often unavoidable). I'm therefore simply trying to understand what
>>> alternatives there are.
>> The change adds 20 bytes to the structure, that's less than 3
>> uint64_t's, not that much if you ask me.
>> But given the discussion we're having over patch 4/8, monitor_write_data
>> would indeed get larger if I revert that, so what I propose instead is
>> going for something in-between what was before and what I did after,
>> i.e. don't dynamically allocate arch_vm_event entirely, but don't
>> statically allocate monitor_write_data either. I propose turning
>> arch_vm_event into:
>>
>> struct arch_vm_event {
>>       uint32_t emulate_flags;
>>       struct vm_event_emul_read_data *emul_read_data;
>>       struct monitor_write_data *write_data;
>> };
>>
>> This way we can still selectively invalidate write_data from arch_vm_event.
>> Sounds good?
> Well, ideally I'd like there to remain a single pointer in struct vcpu.
> What this points to and when that backing memory gets allocated
> would then of less interest (i.e. perhaps you could allocate the
> part that needs to remain active after disabling monitoring
> independent from what can go away at that point). Systems
> without any monitoring in use would then still pay no larger
> penalty than they do today.
>
> Jan

Now that I think about it, that's feasible too. So then, make 
arch_vm_event be dynamically allocated as it was, but slightly change 
its definition to:

struct arch_vm_event {
      uint32_t emulate_flags;
      struct vm_event_emul_read_data *emul_read_data;
      struct monitor_write_data *write_data;
};

, allocate it when it was previously allocated along emul_read_data and 
write_data but don't ever deallocate it entirely, instead only 
deallocate  and reallocate selectively (emul_read_data) as needed, correct?
This way indeed there's no memory overhead in arch_vcpu compared to how 
it was before.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately
  2016-07-04 16:00         ` Corneliu ZUZU
@ 2016-07-04 16:12           ` Jan Beulich
  0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 16:12 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Kevin Tian, Tamas K Lengyel, Razvan Cojocaru, Andrew Cooper,
	xen-devel, Jun Nakajima

>>> On 04.07.16 at 18:00, <czuzu@bitdefender.com> wrote:
> On 7/4/2016 5:58 PM, Jan Beulich wrote:
>>>>> On 04.07.16 at 13:02, <czuzu@bitdefender.com> wrote:
>>> On 7/4/2016 1:22 PM, Jan Beulich wrote:
>>>>>>> On 30.06.16 at 20:43, <czuzu@bitdefender.com> wrote:
>>>>> @@ -119,6 +156,55 @@ bool_t monitored_msr(const struct domain *d, u32 msr)
>>>>>        return test_bit(msr, bitmap);
>>>>>    }
>>>>>    
>>>>> +static void write_ctrlreg_adjust_traps(struct domain *d)
>>>>> +{
>>>>> +    struct vcpu *v;
>>>>> +    struct arch_vmx_struct *avmx;
>>>>> +    unsigned int cr3_bitmask;
>>>>> +    bool_t cr3_vmevent, cr3_ldexit;
>>>>> +
>>>>> +    /* Adjust CR3 load-exiting. */
>>>>> +
>>>>> +    /* vmx only */
>>>>> +    ASSERT(cpu_has_vmx);
>>>>> +
>>>>> +    /* non-hap domains trap CR3 writes unconditionally */
>>>>> +    if ( !paging_mode_hap(d) )
>>>>> +    {
>>>>> +        for_each_vcpu ( d, v )
>>>>> +            ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    cr3_bitmask = monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3);
>>>>> +    cr3_vmevent = !!(d->arch.monitor.write_ctrlreg_enabled & cr3_bitmask);
>>>>> +
>>>>> +    for_each_vcpu ( d, v )
>>>>> +    {
>>>>> +        avmx = &v->arch.hvm_vmx;
>>>>> +        cr3_ldexit = !!(avmx->exec_control & CPU_BASED_CR3_LOAD_EXITING);
>>>>> +
>>>>> +        if ( cr3_vmevent == cr3_ldexit )
>>>>> +            continue;
>>>>> +
>>>>> +        /*
>>>>> +         * If CR0.PE=0, CR3 load exiting must remain enabled.
>>>>> +         * See vmx_update_guest_cr code motion for cr = 0.
>>>>> +         */
>>>>> +        if ( cr3_ldexit && !hvm_paging_enabled(v) && !vmx_unrestricted_guest(v)
>>>>> )
>>>>> +            continue;
>>>>> +
>>>>> +        if ( cr3_vmevent )
>>>>> +            avmx->exec_control |= CPU_BASED_CR3_LOAD_EXITING;
>>>>> +        else
>>>>> +            avmx->exec_control &= ~CPU_BASED_CR3_LOAD_EXITING;
>>>>> +
>>>>> +        vmx_vmcs_enter(v);
>>>>> +        vmx_update_cpu_exec_control(v);
>>>>> +        vmx_vmcs_exit(v);
>>>>> +    }
>>>>> +}
>>>> While Razvan gave his ack already, I wonder whether it's really a
>>>> good idea to put deeply VMX-specific code outside of a VMX-specific
>>>> file.
>>> Well, a summary of what this function does would sound like: "adjusts
>>> CR3 load-exiting for cr-write monitor vm-events". IMHO that's (monitor)
>>> vm-event specific enough to be placed within the vm-event subsystem.
>>> Could you suggest concretely how this separation would look like? (where
>>> to put this function/parts of it (and what parts), what name should it
>>> have once moved).
>> I won't go into that level of detail. Fact is that VMX-specific code
>> should be kept out of here. Whether you move the entire function
>> behind a hvm_funcs hook or just part of it is of no interest to me.
>> In no case should, if and when SVM eventually gets supported for
>> vm-event/monitor too, this function end up doing both VMX and SVM
>> specific things.
> 
> Why move it behind a hvm_funcs hook if it's only valid for VMX? SVM 
> support is not currently implemented, hence the ASSERT(cpu_has_vmx) at 
> the beginning of the function.
> And of course if @ some point SVM support will be implemented then the 
> right thing to do is what you say, i.e. make this function part of 
> hvm_function_table, but until then I don't see why we should do that.
> Note that arch_monitor_get_capabilities also returns no capability at 
> the moment if !cpu_has_vmx.
> 
> What if I move the vmx-specific parts to vmx.c in a function called 
> something like vmx_vm_event_update_cr3_traps() and call it from 
> write_ctrlreg_adjust_traps instead?...

That would be fine with me.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 16:09                     ` Corneliu ZUZU
@ 2016-07-04 16:16                       ` Jan Beulich
  2016-07-04 16:26                         ` Corneliu ZUZU
  0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2016-07-04 16:16 UTC (permalink / raw)
  To: Corneliu ZUZU
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

>>> On 04.07.16 at 18:09, <czuzu@bitdefender.com> wrote:
> Now that I think about it, that's feasible too. So then, make 
> arch_vm_event be dynamically allocated as it was, but slightly change 
> its definition to:
> 
> struct arch_vm_event {
>       uint32_t emulate_flags;
>       struct vm_event_emul_read_data *emul_read_data;
>       struct monitor_write_data *write_data;
> };
> 
> , allocate it when it was previously allocated along emul_read_data and 
> write_data but don't ever deallocate it entirely, instead only 
> deallocate  and reallocate selectively (emul_read_data) as needed, correct?

Yes, if that also fits you.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup
  2016-07-04 16:16                       ` Jan Beulich
@ 2016-07-04 16:26                         ` Corneliu ZUZU
  0 siblings, 0 replies; 59+ messages in thread
From: Corneliu ZUZU @ 2016-07-04 16:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Razvan Cojocaru, George Dunlap, Andrew Cooper,
	xen-devel, Paul Durrant

On 7/4/2016 7:16 PM, Jan Beulich wrote:
>>>> On 04.07.16 at 18:09, <czuzu@bitdefender.com> wrote:
>> Now that I think about it, that's feasible too. So then, make
>> arch_vm_event be dynamically allocated as it was, but slightly change
>> its definition to:
>>
>> struct arch_vm_event {
>>        uint32_t emulate_flags;
>>        struct vm_event_emul_read_data *emul_read_data;
>>        struct monitor_write_data *write_data;
>> };
>>
>> , allocate it when it was previously allocated along emul_read_data and
>> write_data but don't ever deallocate it entirely, instead only
>> deallocate  and reallocate selectively (emul_read_data) as needed, correct?
> Yes, if that also fits you.
>
> Jan

Heh, within the above scheme I only now realized it would make no sense 
to have write_data dynamically allocated.
With that settled, thanks for the great feedback.

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2016-07-04 16:26 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-30 18:40 [PATCH 0/8] x86/vm-event: Adjustments & fixes Corneliu ZUZU
2016-06-30 18:41 ` [PATCH 1/8] x86/vm-event: proper vCPU-paused checks at resume Corneliu ZUZU
2016-07-01  7:14   ` Razvan Cojocaru
2016-07-01 16:51   ` Tamas K Lengyel
2016-06-30 18:42 ` [PATCH 2/8] x86/vmx_update_guest_cr: minor optimization Corneliu ZUZU
2016-06-30 18:43 ` [PATCH 3/8] x86/vm-event/monitor: relocate code-motion more appropriately Corneliu ZUZU
2016-07-01  7:54   ` Razvan Cojocaru
2016-07-04 10:22   ` Jan Beulich
2016-07-04 11:02     ` Corneliu ZUZU
2016-07-04 14:58       ` Jan Beulich
2016-07-04 16:00         ` Corneliu ZUZU
2016-07-04 16:12           ` Jan Beulich
2016-07-04 13:22     ` Razvan Cojocaru
2016-07-04 14:05       ` Jan Beulich
2016-07-04 14:16         ` Razvan Cojocaru
2016-07-04 14:35           ` Corneliu ZUZU
2016-06-30 18:44 ` [PATCH 4/8] x86/vm-event/monitor: turn monitor_write_data.do_write into enum Corneliu ZUZU
2016-07-01  6:53   ` Razvan Cojocaru
2016-07-01  6:59     ` Corneliu ZUZU
2016-07-04 12:37   ` Jan Beulich
2016-07-04 12:47     ` Corneliu ZUZU
2016-07-04 13:07       ` Jan Beulich
2016-07-04 13:21         ` Corneliu ZUZU
2016-07-04 14:08           ` Jan Beulich
2016-07-04 14:23             ` Razvan Cojocaru
2016-07-04 14:24             ` Corneliu ZUZU
2016-07-04 15:03               ` Jan Beulich
2016-07-04 15:10                 ` Corneliu ZUZU
2016-06-30 18:45 ` [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup Corneliu ZUZU
2016-07-01  6:47   ` Razvan Cojocaru
2016-07-01  6:56     ` Corneliu ZUZU
2016-07-01  6:59       ` Razvan Cojocaru
2016-07-04 12:47   ` Jan Beulich
2016-07-04 13:03     ` Corneliu ZUZU
2016-07-04 13:11       ` Jan Beulich
2016-07-04 13:28         ` Corneliu ZUZU
2016-07-04 14:13           ` Jan Beulich
2016-07-04 14:31             ` Corneliu ZUZU
2016-07-04 15:09               ` Jan Beulich
2016-07-04 15:24                 ` Corneliu ZUZU
2016-07-04 13:50         ` Razvan Cojocaru
2016-07-04 14:05           ` Corneliu ZUZU
2016-07-04 14:17           ` Jan Beulich
2016-07-04 14:21             ` Razvan Cojocaru
2016-07-04 15:07               ` Jan Beulich
2016-07-04 15:21                 ` Corneliu ZUZU
2016-07-04 15:57                   ` Jan Beulich
2016-07-04 16:09                     ` Corneliu ZUZU
2016-07-04 16:16                       ` Jan Beulich
2016-07-04 16:26                         ` Corneliu ZUZU
2016-07-04 14:45             ` Corneliu ZUZU
2016-06-30 18:46 ` [PATCH 6/8] x86/vm_event_resume: surround VM_EVENT_REASON_MOV_TO_MSR w/ CONFIG_X86 Corneliu ZUZU
2016-07-01  7:02   ` Razvan Cojocaru
2016-06-30 18:47 ` [PATCH 7/8] minor fixes (formatting, comments, unused includes etc.) Corneliu ZUZU
2016-06-30 18:47 ` [PATCH 8/8] minor #include change Corneliu ZUZU
2016-07-01  6:56   ` Razvan Cojocaru
2016-07-01  7:02     ` Corneliu ZUZU
2016-07-01  7:31       ` Jan Beulich
2016-07-01  7:44         ` Corneliu ZUZU

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.