* [PATCH 0/2] x86/monitor: More generic livelock avoidance
@ 2018-10-23 14:35 Andrew Cooper
  2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
  2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 14:35 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Razvan Cojocaru, Andrew Cooper, Jun Nakajima, Boris Ostrovsky,
	Brian Woods, Suravee Suthikulpanit

This is another split-out part of the v1 debugging series.  It is semi-RFC
as it has only had extremely light testing.

Andrew Cooper (2):
  x86/monitor: Introduce a boolean to suppress nested monitoring events
  x86/hvm: Drop the may_defer boolean from hvm_* helpers

 xen/arch/x86/hvm/emulate.c        |  8 ++++----
 xen/arch/x86/hvm/hvm.c            | 39 +++++++++++++++++++++++----------------
 xen/arch/x86/hvm/monitor.c        | 24 ++++++++++++++++++++++--
 xen/arch/x86/hvm/svm/nestedsvm.c  | 14 +++++++-------
 xen/arch/x86/hvm/svm/svm.c        |  2 +-
 xen/arch/x86/hvm/vm_event.c       |  9 ++++-----
 xen/arch/x86/hvm/vmx/vmx.c        |  4 ++--
 xen/arch/x86/hvm/vmx/vvmx.c       | 16 ++++++++--------
 xen/include/asm-x86/hvm/support.h |  8 ++++----
 xen/include/xen/sched.h           |  8 ++++++++
 10 files changed, 83 insertions(+), 49 deletions(-)

-- 
2.1.4



* [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
  2018-10-23 14:35 [PATCH 0/2] x86/monitor: More generic livelock avoidance Andrew Cooper
@ 2018-10-23 14:35 ` Andrew Cooper
  2018-10-23 14:54   ` Razvan Cojocaru
  2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 14:35 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Tamas K Lengyel, Wei Liu, Jan Beulich, Razvan Cojocaru

When applying vm_event actions, monitoring events can nest and effectively
livelock the vcpu.  Introduce a boolean which is set for a specific portion of
the hvm_do_resume() path, which causes the hvm_monitor_* helpers to exit
early.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>

RFC - This probably wants wiring up on ARM as well, but all I see is
monitor_smc() and no equivalent parts to hvm_do_resume() where we may inject
an exception from a vm_event.  Should this be included in non-x86 monitor
functions for consistency at this point?
---
 xen/arch/x86/hvm/hvm.c     |  8 ++++++++
 xen/arch/x86/hvm/monitor.c | 24 ++++++++++++++++++++++--
 xen/include/xen/sched.h    |  8 ++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9bc8078..4b4d9d6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -512,6 +512,12 @@ void hvm_do_resume(struct vcpu *v)
     if ( !handle_hvm_io_completion(v) )
         return;
 
+    /*
+     * We are about to apply actions requested by the introspection
+     * agent.  Don't trigger further monitoring.
+     */
+    v->monitor.suppress = true;
+
     if ( unlikely(v->arch.vm_event) )
         hvm_vm_event_do_resume(v);
 
@@ -526,6 +532,8 @@ void hvm_do_resume(struct vcpu *v)
         v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
     }
 
+    v->monitor.suppress = false;
+
     if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
     {
         struct x86_event info;
diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index 2a41ccc..f1a196f 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
     struct arch_domain *ad = &curr->domain->arch;
     unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
 
+    if ( curr->monitor.suppress )
+        return 0;
+
     if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
         value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
 
@@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
         .vcpu_id  = curr->vcpu_id,
     };
 
+    if ( curr->monitor.suppress )
+        return false;
+
     return curr->domain->arch.monitor.emul_unimplemented_enabled &&
         monitor_traps(curr, true, &req) == 1;
 }
@@ -81,6 +87,9 @@ void hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t old_value)
 {
     struct vcpu *curr = current;
 
+    if ( curr->monitor.suppress )
+        return;
+
     if ( monitored_msr(curr->domain, msr) &&
          (!monitored_msr_onchangeonly(curr->domain, msr) ||
            new_value != old_value) )
@@ -100,12 +109,16 @@ void hvm_monitor_descriptor_access(uint64_t exit_info,
                                    uint64_t vmx_exit_qualification,
                                    uint8_t descriptor, bool is_write)
 {
+    struct vcpu *curr = current;
     vm_event_request_t req = {
         .reason = VM_EVENT_REASON_DESCRIPTOR_ACCESS,
         .u.desc_access.descriptor = descriptor,
         .u.desc_access.is_write = is_write,
     };
 
+    if ( curr->monitor.suppress )
+        return;
+
     if ( cpu_has_vmx )
     {
         req.u.desc_access.arch.vmx.instr_info = exit_info;
@@ -116,7 +129,7 @@ void hvm_monitor_descriptor_access(uint64_t exit_info,
         req.u.desc_access.arch.svm.exitinfo = exit_info;
     }
 
-    monitor_traps(current, true, &req);
+    monitor_traps(curr, true, &req);
 }
 
 static inline unsigned long gfn_of_rip(unsigned long rip)
@@ -146,6 +159,9 @@ int hvm_monitor_debug(unsigned long rip, enum hvm_monitor_debug_type type,
     vm_event_request_t req = {};
     bool sync;
 
+    if ( curr->monitor.suppress )
+        return 0;
+
     switch ( type )
     {
     case HVM_MONITOR_SOFTWARE_BREAKPOINT:
@@ -204,6 +220,7 @@ int hvm_monitor_cpuid(unsigned long insn_length, unsigned int leaf,
 void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
                            unsigned int err, uint64_t cr2)
 {
+    struct vcpu *curr = current;
     vm_event_request_t req = {
         .reason = VM_EVENT_REASON_INTERRUPT,
         .u.interrupt.x86.vector = vector,
@@ -212,7 +229,10 @@ void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
         .u.interrupt.x86.cr2 = cr2,
     };
 
-    monitor_traps(current, 1, &req);
+    if ( curr->monitor.suppress )
+        return;
+
+    monitor_traps(curr, 1, &req);
 }
 
 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3171eab..b32089a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -268,6 +268,14 @@ struct vcpu
 
     struct evtchn_fifo_vcpu *evtchn_fifo;
 
+    struct {
+        /*
+         * Suppress monitoring events.  Used to prevent vm_event-generated
+         * actions causing new monitoring events, and livelock the vcpu.
+         */
+        bool suppress;
+    } monitor;
+
     /* vPCI per-vCPU area, used to store data for long running operations. */
     struct vpci_vcpu vpci;
 
-- 
2.1.4



* [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers
  2018-10-23 14:35 [PATCH 0/2] x86/monitor: More generic livelock avoidance Andrew Cooper
  2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
@ 2018-10-23 14:35 ` Andrew Cooper
  2018-10-23 15:24   ` Razvan Cojocaru
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 14:35 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Razvan Cojocaru, Andrew Cooper, Jun Nakajima, Boris Ostrovsky,
	Brian Woods, Suravee Suthikulpanit

The may_defer booleans were introduced with the monitor infrastructure, but
their purpose is not obvious and not described anywhere.

They exist to avoid triggering nested monitoring events from introspection
activities, but with the introduction of the general monitor.suppress
infrastructure, they are no longer needed.  Drop them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Brian Woods <brian.woods@amd.com>
---
 xen/arch/x86/hvm/emulate.c        |  8 ++++----
 xen/arch/x86/hvm/hvm.c            | 31 +++++++++++++++----------------
 xen/arch/x86/hvm/svm/nestedsvm.c  | 14 +++++++-------
 xen/arch/x86/hvm/svm/svm.c        |  2 +-
 xen/arch/x86/hvm/vm_event.c       |  9 ++++-----
 xen/arch/x86/hvm/vmx/vmx.c        |  4 ++--
 xen/arch/x86/hvm/vmx/vvmx.c       | 16 ++++++++--------
 xen/include/asm-x86/hvm/support.h |  8 ++++----
 8 files changed, 45 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index cd1d9a7..43f18c2 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2024,7 +2024,7 @@ static int hvmemul_write_cr(
     switch ( reg )
     {
     case 0:
-        rc = hvm_set_cr0(val, 1);
+        rc = hvm_set_cr0(val);
         break;
 
     case 2:
@@ -2033,11 +2033,11 @@ static int hvmemul_write_cr(
         break;
 
     case 3:
-        rc = hvm_set_cr3(val, 1);
+        rc = hvm_set_cr3(val);
         break;
 
     case 4:
-        rc = hvm_set_cr4(val, 1);
+        rc = hvm_set_cr4(val);
         break;
 
     default:
@@ -2092,7 +2092,7 @@ static int hvmemul_write_msr(
     uint64_t val,
     struct x86_emulate_ctxt *ctxt)
 {
-    int rc = hvm_msr_write_intercept(reg, val, 1);
+    int rc = hvm_msr_write_intercept(reg, val);
 
     if ( rc == X86EMUL_EXCEPTION )
         x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4b4d9d6..296b967 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2046,15 +2046,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
     switch ( cr )
     {
     case 0:
-        rc = hvm_set_cr0(val, 1);
+        rc = hvm_set_cr0(val);
         break;
 
     case 3:
-        rc = hvm_set_cr3(val, 1);
+        rc = hvm_set_cr3(val);
         break;
 
     case 4:
-        rc = hvm_set_cr4(val, 1);
+        rc = hvm_set_cr4(val);
         break;
 
     case 8:
@@ -2150,7 +2150,7 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
     hvm_update_guest_cr(v, cr);
 }
 
-int hvm_set_cr0(unsigned long value, bool_t may_defer)
+int hvm_set_cr0(unsigned long value)
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
@@ -2176,8 +2176,8 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
          (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
         return X86EMUL_EXCEPTION;
 
-    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
-                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
+    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
     {
         ASSERT(v->arch.vm_event);
 
@@ -2268,15 +2268,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
     return X86EMUL_OKAY;
 }
 
-int hvm_set_cr3(unsigned long value, bool_t may_defer)
+int hvm_set_cr3(unsigned long value)
 {
     struct vcpu *v = current;
     struct page_info *page;
     unsigned long old = v->arch.hvm.guest_cr[3];
     bool noflush = false;
 
-    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
-                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
+    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
     {
         ASSERT(v->arch.vm_event);
 
@@ -2322,7 +2322,7 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
     return X86EMUL_UNHANDLEABLE;
 }
 
-int hvm_set_cr4(unsigned long value, bool_t may_defer)
+int hvm_set_cr4(unsigned long value)
 {
     struct vcpu *v = current;
     unsigned long old_cr;
@@ -2356,8 +2356,8 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
         return X86EMUL_EXCEPTION;
     }
 
-    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
-                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
+    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
     {
         ASSERT(v->arch.vm_event);
 
@@ -2989,7 +2989,7 @@ void hvm_task_switch(
     if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
         goto out;
 
-    rc = hvm_set_cr3(tss.cr3, 1);
+    rc = hvm_set_cr3(tss.cr3);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if ( rc != X86EMUL_OKAY )
@@ -3497,8 +3497,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     goto out;
 }
 
-int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
-                            bool may_defer)
+int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
@@ -3507,7 +3506,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
     HVMTRACE_3D(MSR_WRITE, msr,
                (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
 
-    if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
+    if ( unlikely(monitored_msr(v->domain, msr)) )
     {
         uint64_t msr_old_content;
 
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 78a1016..399ff05 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -285,7 +285,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
 
     /* CR4 */
     v->arch.hvm.guest_cr[4] = n1vmcb->_cr4;
-    rc = hvm_set_cr4(n1vmcb->_cr4, 1);
+    rc = hvm_set_cr4(n1vmcb->_cr4);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -296,7 +296,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
         svm->ns_cr0, v->arch.hvm.guest_cr[0]);
     v->arch.hvm.guest_cr[0] = n1vmcb->_cr0 | X86_CR0_PE;
     n1vmcb->rflags &= ~X86_EFLAGS_VM;
-    rc = hvm_set_cr0(n1vmcb->_cr0 | X86_CR0_PE, 1);
+    rc = hvm_set_cr0(n1vmcb->_cr0 | X86_CR0_PE);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -324,7 +324,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
         v->arch.guest_table = pagetable_null();
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
     }
-    rc = hvm_set_cr3(n1vmcb->_cr3, 1);
+    rc = hvm_set_cr3(n1vmcb->_cr3);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -556,7 +556,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
 
     /* CR4 */
     v->arch.hvm.guest_cr[4] = ns_vmcb->_cr4;
-    rc = hvm_set_cr4(ns_vmcb->_cr4, 1);
+    rc = hvm_set_cr4(ns_vmcb->_cr4);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -566,7 +566,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
     svm->ns_cr0 = v->arch.hvm.guest_cr[0];
     cr0 = nestedsvm_fpu_vmentry(svm->ns_cr0, ns_vmcb, n1vmcb, n2vmcb);
     v->arch.hvm.guest_cr[0] = ns_vmcb->_cr0;
-    rc = hvm_set_cr0(cr0, 1);
+    rc = hvm_set_cr0(cr0);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -584,7 +584,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
         nestedsvm_vmcb_set_nestedp2m(v, ns_vmcb, n2vmcb);
 
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3, 1);
+        rc = hvm_set_cr3(ns_vmcb->_cr3);
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         if (rc != X86EMUL_OKAY)
@@ -598,7 +598,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
          * we assume it intercepts page faults.
          */
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3, 1);
+        rc = hvm_set_cr3(ns_vmcb->_cr3);
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         if (rc != X86EMUL_OKAY)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 41427e7..80bb8fd 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2333,7 +2333,7 @@ static void svm_do_msr_access(struct cpu_user_regs *regs)
             msr_split(regs, msr_content);
     }
     else
-        rc = hvm_msr_write_intercept(regs->ecx, msr_fold(regs), 1);
+        rc = hvm_msr_write_intercept(regs->ecx, msr_fold(regs));
 
     if ( rc == X86EMUL_OKAY )
         __update_guest_eip(regs, inst_len);
diff --git a/xen/arch/x86/hvm/vm_event.c b/xen/arch/x86/hvm/vm_event.c
index 28d08a6..0ae6a0f 100644
--- a/xen/arch/x86/hvm/vm_event.c
+++ b/xen/arch/x86/hvm/vm_event.c
@@ -94,7 +94,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
 
     if ( unlikely(w->do_write.cr0) )
     {
-        if ( hvm_set_cr0(w->cr0, 0) == X86EMUL_EXCEPTION )
+        if ( hvm_set_cr0(w->cr0) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         w->do_write.cr0 = 0;
@@ -102,7 +102,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
 
     if ( unlikely(w->do_write.cr4) )
     {
-        if ( hvm_set_cr4(w->cr4, 0) == X86EMUL_EXCEPTION )
+        if ( hvm_set_cr4(w->cr4) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         w->do_write.cr4 = 0;
@@ -110,7 +110,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
 
     if ( unlikely(w->do_write.cr3) )
     {
-        if ( hvm_set_cr3(w->cr3, 0) == X86EMUL_EXCEPTION )
+        if ( hvm_set_cr3(w->cr3) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         w->do_write.cr3 = 0;
@@ -118,8 +118,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
 
     if ( unlikely(w->do_write.msr) )
     {
-        if ( hvm_msr_write_intercept(w->msr, w->value, 0) ==
-             X86EMUL_EXCEPTION )
+        if ( hvm_msr_write_intercept(w->msr, w->value) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         w->do_write.msr = 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 46f51c3..3c39bf3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2654,7 +2654,7 @@ static int vmx_cr_access(cr_access_qual_t qual)
                  (X86_CR0_PE|X86_CR0_MP|X86_CR0_EM|X86_CR0_TS));
         HVMTRACE_LONG_1D(LMSW, value);
 
-        if ( (rc = hvm_set_cr0(value, 1)) == X86EMUL_EXCEPTION )
+        if ( (rc = hvm_set_cr0(value)) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         return rc;
@@ -3990,7 +3990,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     }
 
     case EXIT_REASON_MSR_WRITE:
-        switch ( hvm_msr_write_intercept(regs->ecx, msr_fold(regs), 1) )
+        switch ( hvm_msr_write_intercept(regs->ecx, msr_fold(regs)) )
         {
         case X86EMUL_OKAY:
             update_guest_eip(); /* Safe: WRMSR */
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 0e45db8..19f925e 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1059,15 +1059,15 @@ static void load_shadow_guest_state(struct vcpu *v)
     nvcpu->guest_cr[0] = get_vvmcs(v, CR0_READ_SHADOW);
     nvcpu->guest_cr[4] = get_vvmcs(v, CR4_READ_SHADOW);
 
-    rc = hvm_set_cr0(get_vvmcs(v, GUEST_CR0), 1);
+    rc = hvm_set_cr0(get_vvmcs(v, GUEST_CR0));
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr4(get_vvmcs(v, GUEST_CR4), 1);
+    rc = hvm_set_cr4(get_vvmcs(v, GUEST_CR4));
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), 1);
+    rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3));
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
@@ -1077,7 +1077,7 @@ static void load_shadow_guest_state(struct vcpu *v)
     if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
     {
         rc = hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
-                                     get_vvmcs(v, GUEST_PERF_GLOBAL_CTRL), 0);
+                                     get_vvmcs(v, GUEST_PERF_GLOBAL_CTRL));
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
     }
@@ -1265,15 +1265,15 @@ static void load_vvmcs_host_state(struct vcpu *v)
         __vmwrite(vmcs_h2g_field[i].guest_field, r);
     }
 
-    rc = hvm_set_cr0(get_vvmcs(v, HOST_CR0), 1);
+    rc = hvm_set_cr0(get_vvmcs(v, HOST_CR0));
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr4(get_vvmcs(v, HOST_CR4), 1);
+    rc = hvm_set_cr4(get_vvmcs(v, HOST_CR4));
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), 1);
+    rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3));
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
@@ -1283,7 +1283,7 @@ static void load_vvmcs_host_state(struct vcpu *v)
     if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
     {
         rc = hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
-                                     get_vvmcs(v, HOST_PERF_GLOBAL_CTRL), 1);
+                                     get_vvmcs(v, HOST_PERF_GLOBAL_CTRL));
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
     }
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 7222939..7aaa4b1 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -134,9 +134,9 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value);
  * returned.
  */
 int hvm_set_efer(uint64_t value);
-int hvm_set_cr0(unsigned long value, bool_t may_defer);
-int hvm_set_cr3(unsigned long value, bool_t may_defer);
-int hvm_set_cr4(unsigned long value, bool_t may_defer);
+int hvm_set_cr0(unsigned long value);
+int hvm_set_cr3(unsigned long value);
+int hvm_set_cr4(unsigned long value);
 int hvm_descriptor_access_intercept(uint64_t exit_info,
                                     uint64_t vmx_exit_qualification,
                                     unsigned int descriptor, bool is_write);
@@ -151,7 +151,7 @@ void hvm_ud_intercept(struct cpu_user_regs *);
 int __must_check hvm_msr_read_intercept(
     unsigned int msr, uint64_t *msr_content);
 int __must_check hvm_msr_write_intercept(
-    unsigned int msr, uint64_t msr_content, bool may_defer);
+    unsigned int msr, uint64_t msr_content);
 
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
 
-- 
2.1.4



* Re: [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
  2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
@ 2018-10-23 14:54   ` Razvan Cojocaru
  2018-10-23 15:08     ` Andrew Cooper
  0 siblings, 1 reply; 8+ messages in thread
From: Razvan Cojocaru @ 2018-10-23 14:54 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Tamas K Lengyel, Wei Liu, Jan Beulich

On 10/23/18 5:35 PM, Andrew Cooper wrote:
> When applying vm_event actions, monitoring events can nest and effectively
> livelock the vcpu.  Introduce a boolean which is set for a specific portion of
> the hvm_do_resume() path, which causes the hvm_monitor_* helpers to exit
> early.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> 
> RFC - This probably wants wiring up on ARM as well, but all I see is
> monitor_smc() and no equivalent parts to hvm_do_resume() where we may inject
> an exception from a vm_event.  Should this be included in non-x86 monitor
> functions for consistency at this point?
> ---
>  xen/arch/x86/hvm/hvm.c     |  8 ++++++++
>  xen/arch/x86/hvm/monitor.c | 24 ++++++++++++++++++++++--
>  xen/include/xen/sched.h    |  8 ++++++++
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 9bc8078..4b4d9d6 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -512,6 +512,12 @@ void hvm_do_resume(struct vcpu *v)
>      if ( !handle_hvm_io_completion(v) )
>          return;
>  
> +    /*
> +     * We are about to apply actions requested by the introspection
> +     * agent.  Don't trigger further monitoring.
> +     */
> +    v->monitor.suppress = true;
> +
>      if ( unlikely(v->arch.vm_event) )
>          hvm_vm_event_do_resume(v);
>  
> @@ -526,6 +532,8 @@ void hvm_do_resume(struct vcpu *v)
>          v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
>      }
>  
> +    v->monitor.suppress = false;
> +
>      if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
>      {
>          struct x86_event info;
> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> index 2a41ccc..f1a196f 100644
> --- a/xen/arch/x86/hvm/monitor.c
> +++ b/xen/arch/x86/hvm/monitor.c
> @@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
>      struct arch_domain *ad = &curr->domain->arch;
>      unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>  
> +    if ( curr->monitor.suppress )
> +        return 0;
> +
>      if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
>          value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
>  
> @@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
>          .vcpu_id  = curr->vcpu_id,
>      };
>  
> +    if ( curr->monitor.suppress )
> +        return false;

Rather than doing this for each event, I think we may be able to do it
only in monitor_traps(). Am I missing something?


Thanks,
Razvan


* Re: [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
  2018-10-23 14:54   ` Razvan Cojocaru
@ 2018-10-23 15:08     ` Andrew Cooper
  2018-10-23 15:27       ` Razvan Cojocaru
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 15:08 UTC (permalink / raw)
  To: Razvan Cojocaru, Xen-devel; +Cc: Tamas K Lengyel, Wei Liu, Jan Beulich

On 23/10/18 15:54, Razvan Cojocaru wrote:
> On 10/23/18 5:35 PM, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
>> index 2a41ccc..f1a196f 100644
>> --- a/xen/arch/x86/hvm/monitor.c
>> +++ b/xen/arch/x86/hvm/monitor.c
>> @@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
>>      struct arch_domain *ad = &curr->domain->arch;
>>      unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>>  
>> +    if ( curr->monitor.suppress )
>> +        return 0;
>> +
>>      if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
>>          value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
>>  
>> @@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
>>          .vcpu_id  = curr->vcpu_id,
>>      };
>>  
>> +    if ( curr->monitor.suppress )
>> +        return false;
> Rather than doing this for each event, I think we may be able to do it
> only in monitor_traps(). Am I missing something?

I guess that depends on how expensive it is to collect together the
other data being fed into the monitor ring.  I suppose it is only the
hvm_do_resume() path which will suffer, and only on a reply from
introspection, which isn't exactly a fastpath.

~Andrew


* Re: [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers
  2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
@ 2018-10-23 15:24   ` Razvan Cojocaru
  2018-10-24 10:00     ` Andrew Cooper
  0 siblings, 1 reply; 8+ messages in thread
From: Razvan Cojocaru @ 2018-10-23 15:24 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich, Jun Nakajima,
	Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

On 10/23/18 5:35 PM, Andrew Cooper wrote:
> The may_defer booleans were introduced with the monitor infrastructure, but
> their purpose is not obvious and not described anywhere.
> 
> They exist to avoid triggering nested monitoring events from introspection
> activities, but with the introduction of the general monitor.suppress
> infrastructure, they are no longer needed.  Drop them.

I admit their purpose may not be obvious, but they don't exist only for
the reason you've given. They exist so that we may be able to send out
vm_events _before_ a write happens (so that we are then able to veto the
CR or MSR write from the introspection agent).

So defer means that we defer the write until after the introspection
agent replies. The "may" part refers to the fact that the introspection
agent may not be interested in that event, so you're telling the function
"please don't write the value in this MSR, just send a vm_event for now,
_unless_ the introspection agent didn't subscribe to writes in this
particular MSR".

The actual write is done in the code called by hvm_vm_event_do_resume(),
if the vm_event reply allows it.
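
For illustration, here is a self-contained model of that defer/veto
protocol (stand-in types and names only, not the actual Xen code):

#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for v->arch.vm_event->write_data. */
struct write_data {
    bool do_write_msr;   /* a write is pending until the agent replies */
    uint32_t msr;
    uint64_t value;
};

/* Intercept path: record the write and raise an event instead of applying it. */
static void msr_write_intercept(struct write_data *w, uint32_t msr, uint64_t val)
{
    w->do_write_msr = true;
    w->msr = msr;
    w->value = val;
    printf("vm_event: write of %#" PRIx64 " to MSR %#" PRIx32 " deferred\n",
           val, msr);
}

/* Resume path: apply the deferred write only if the agent's reply allows it. */
static void do_resume(struct write_data *w, bool agent_permitted)
{
    if ( w->do_write_msr && agent_permitted )
        printf("wrmsr(%#" PRIx32 ", %#" PRIx64 ")\n", w->msr, w->value);
    w->do_write_msr = false;
}

int main(void)
{
    struct write_data w = { 0 };

    msr_write_intercept(&w, 0x1a0, 0x1234);
    do_resume(&w, false);   /* agent vetoed: the write never happens */
    return 0;
}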

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Brian Woods <brian.woods@amd.com>
> ---
>  xen/arch/x86/hvm/emulate.c        |  8 ++++----
>  xen/arch/x86/hvm/hvm.c            | 31 +++++++++++++++----------------
>  xen/arch/x86/hvm/svm/nestedsvm.c  | 14 +++++++-------
>  xen/arch/x86/hvm/svm/svm.c        |  2 +-
>  xen/arch/x86/hvm/vm_event.c       |  9 ++++-----
>  xen/arch/x86/hvm/vmx/vmx.c        |  4 ++--
>  xen/arch/x86/hvm/vmx/vvmx.c       | 16 ++++++++--------
>  xen/include/asm-x86/hvm/support.h |  8 ++++----
>  8 files changed, 45 insertions(+), 47 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index cd1d9a7..43f18c2 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2024,7 +2024,7 @@ static int hvmemul_write_cr(
>      switch ( reg )
>      {
>      case 0:
> -        rc = hvm_set_cr0(val, 1);
> +        rc = hvm_set_cr0(val);
>          break;
>  
>      case 2:
> @@ -2033,11 +2033,11 @@ static int hvmemul_write_cr(
>          break;
>  
>      case 3:
> -        rc = hvm_set_cr3(val, 1);
> +        rc = hvm_set_cr3(val);
>          break;
>  
>      case 4:
> -        rc = hvm_set_cr4(val, 1);
> +        rc = hvm_set_cr4(val);
>          break;
>  
>      default:
> @@ -2092,7 +2092,7 @@ static int hvmemul_write_msr(
>      uint64_t val,
>      struct x86_emulate_ctxt *ctxt)
>  {
> -    int rc = hvm_msr_write_intercept(reg, val, 1);
> +    int rc = hvm_msr_write_intercept(reg, val);
>  
>      if ( rc == X86EMUL_EXCEPTION )
>          x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4b4d9d6..296b967 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2046,15 +2046,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
>      switch ( cr )
>      {
>      case 0:
> -        rc = hvm_set_cr0(val, 1);
> +        rc = hvm_set_cr0(val);
>          break;
>  
>      case 3:
> -        rc = hvm_set_cr3(val, 1);
> +        rc = hvm_set_cr3(val);
>          break;
>  
>      case 4:
> -        rc = hvm_set_cr4(val, 1);
> +        rc = hvm_set_cr4(val);
>          break;
>  
>      case 8:
> @@ -2150,7 +2150,7 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
>      hvm_update_guest_cr(v, cr);
>  }
>  
> -int hvm_set_cr0(unsigned long value, bool_t may_defer)
> +int hvm_set_cr0(unsigned long value)
>  {
>      struct vcpu *v = current;
>      struct domain *d = v->domain;
> @@ -2176,8 +2176,8 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>           (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
>          return X86EMUL_EXCEPTION;
>  
> -    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> -                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
> +    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> +                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>      {
>          ASSERT(v->arch.vm_event);
>  
> @@ -2268,15 +2268,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>      return X86EMUL_OKAY;
>  }
>  
> -int hvm_set_cr3(unsigned long value, bool_t may_defer)
> +int hvm_set_cr3(unsigned long value)
>  {
>      struct vcpu *v = current;
>      struct page_info *page;
>      unsigned long old = v->arch.hvm.guest_cr[3];
>      bool noflush = false;
>  
> -    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> -                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
> +    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> +                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>      {
>          ASSERT(v->arch.vm_event);
>  
> @@ -2322,7 +2322,7 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
>      return X86EMUL_UNHANDLEABLE;
>  }
>  
> -int hvm_set_cr4(unsigned long value, bool_t may_defer)
> +int hvm_set_cr4(unsigned long value)
>  {
>      struct vcpu *v = current;
>      unsigned long old_cr;
> @@ -2356,8 +2356,8 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>          return X86EMUL_EXCEPTION;
>      }
>  
> -    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> -                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
> +    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> +                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>      {
>          ASSERT(v->arch.vm_event);
>  
> @@ -2989,7 +2989,7 @@ void hvm_task_switch(
>      if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
>          goto out;
>  
> -    rc = hvm_set_cr3(tss.cr3, 1);
> +    rc = hvm_set_cr3(tss.cr3);
>      if ( rc == X86EMUL_EXCEPTION )
>          hvm_inject_hw_exception(TRAP_gp_fault, 0);
>      if ( rc != X86EMUL_OKAY )
> @@ -3497,8 +3497,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>      goto out;
>  }
>  
> -int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> -                            bool may_defer)
> +int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>  {
>      struct vcpu *v = current;
>      struct domain *d = v->domain;
> @@ -3507,7 +3506,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>      HVMTRACE_3D(MSR_WRITE, msr,
>                 (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
>  
> -    if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
> +    if ( unlikely(monitored_msr(v->domain, msr)) )
>      {
>          uint64_t msr_old_content;
>  

I don't see how this could work. The beginning of this function looks as
follows:

3492 int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
3493                             bool may_defer)
3494 {
3495     struct vcpu *v = current;
3496     struct domain *d = v->domain;
3497     int ret;
3498
3499     HVMTRACE_3D(MSR_WRITE, msr,
3500                (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
3501
3502     if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
3503     {
3504         uint64_t msr_old_content;
3505
3506         ret = hvm_msr_read_intercept(msr, &msr_old_content);
3507         if ( ret != X86EMUL_OKAY )
3508             return ret;
3509
3510         ASSERT(v->arch.vm_event);
3511
3512         /* The actual write will occur in hvm_do_resume() (if permitted). */
3513         v->arch.vm_event->write_data.do_write.msr = 1;
3514         v->arch.vm_event->write_data.msr = msr;
3515         v->arch.vm_event->write_data.value = msr_content;
3516
3517         hvm_monitor_msr(msr, msr_content, msr_old_content);
3518         return X86EMUL_OKAY;
3519     }
3520
3521     if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
3522         return ret;

By dumping may_defer, you're now making sure that this function will
never get to guest_wrmsr() as long as we're dealing with a monitored_msr().

But the code currently calls hvm_msr_write_intercept() with a 0 value
for may_defer not only in hvm_vm_event_do_resume(), but also in
load_shadow_guest_state() in vvmx.c, for example.

Speaking of which, removing may_defer from these functions without
looking at v->monitor.suppress won't work. I think what you were aiming
at was perhaps to replace may_defer with an equivalent test on
v->monitor.suppress in the body of the function instead of simply
erasing may_defer from everywhere.
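
In diff form, that substitution for the MSR path might look like this
(hypothetical sketch, not a posted patch):

-    if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
+    /* Hypothetical: test the vcpu's suppress flag instead of may_defer. */
+    if ( !v->monitor.suppress && unlikely(monitored_msr(v->domain, msr)) )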


Thanks,
Razvan


* Re: [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
  2018-10-23 15:08     ` Andrew Cooper
@ 2018-10-23 15:27       ` Razvan Cojocaru
  0 siblings, 0 replies; 8+ messages in thread
From: Razvan Cojocaru @ 2018-10-23 15:27 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Tamas K Lengyel, Wei Liu, Jan Beulich

On 10/23/18 6:08 PM, Andrew Cooper wrote:
> On 23/10/18 15:54, Razvan Cojocaru wrote:
>> On 10/23/18 5:35 PM, Andrew Cooper wrote:
>>> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
>>> index 2a41ccc..f1a196f 100644
>>> --- a/xen/arch/x86/hvm/monitor.c
>>> +++ b/xen/arch/x86/hvm/monitor.c
>>> @@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
>>>      struct arch_domain *ad = &curr->domain->arch;
>>>      unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>>>  
>>> +    if ( curr->monitor.suppress )
>>> +        return 0;
>>> +
>>>      if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
>>>          value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
>>>  
>>> @@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
>>>          .vcpu_id  = curr->vcpu_id,
>>>      };
>>>  
>>> +    if ( curr->monitor.suppress )
>>> +        return false;
>> Rather than doing this for each event, I think we may be able to do it
>> only in monitor_traps(). Am I missing something?
> 
> I guess that depends on how expensive it is to collect together the
> other data being fed into the monitor ring.  I suppose it is only the
> hvm_do_resume() path which will suffer, and only on a reply from
> introspection, which isn't exactly a fastpath.

monitor_traps() calls vm_event_fill_regs(req); at the very end, which
you can short-circuit by returning sooner. The rest of the information I
believe has already been collected where you test v->monitor.suppress:

 91 int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
 92 {
 93     int rc;
 94     struct domain *d = v->domain;
 95
 96     rc = vm_event_claim_slot(d, d->vm_event_monitor);
 97     switch ( rc )
 98     {
 99     case 0:
100         break;
101     case -ENOSYS:
102         /*
103          * If there was no ring to handle the event, then
104          * simply continue executing normally.
105          */
106         return 0;
107     default:
108         return rc;
109     };
110
111     req->vcpu_id = v->vcpu_id;
112
113     if ( sync )
114     {
115         req->flags |= VM_EVENT_FLAG_VCPU_PAUSED;
116         vm_event_vcpu_pause(v);
117         rc = 1;
118     }
119
120     if ( altp2m_active(d) )
121     {
122         req->flags |= VM_EVENT_FLAG_ALTERNATE_P2M;
123         req->altp2m_idx = altp2m_vcpu_idx(v);
124     }
125
126     vm_event_fill_regs(req);
127     vm_event_put_request(d, d->vm_event_monitor, req);
128
129     return rc;
130 }
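
A minimal sketch of that short-circuit, with the check placed at the top
of the function (hypothetical placement, not a posted patch):

int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
{
    /* Hypothetical: suppress nested events before any ring work. */
    if ( v->monitor.suppress )
        return 0;

    /* ... existing body, including vm_event_fill_regs(), unchanged ... */
}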


Thanks,
Razvan


* Re: [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers
  2018-10-23 15:24   ` Razvan Cojocaru
@ 2018-10-24 10:00     ` Andrew Cooper
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Cooper @ 2018-10-24 10:00 UTC (permalink / raw)
  To: Razvan Cojocaru, Xen-devel
  Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich, Jun Nakajima,
	Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

On 23/10/18 16:24, Razvan Cojocaru wrote:
> On 10/23/18 5:35 PM, Andrew Cooper wrote:
>> The may_defer booleans were introduced with the monitor infrastructure, but
>> their purpose is not obvious and not described anywhere.
>>
>> They exist to avoid triggering nested monitoring events from introspection
>> activities, but with the introduction of the general monitor.suppress
>> infrastructure, they are no longer needed.  Drop them.
> I admit their purpose may not be obvious, but they don't exist only for
> the reason you've given. They exist so that we may be able to send out
> vm_events _before_ a write happens (so that we are then able to veto the
> CR or MSR write from the introspection agent).
>
> So defer means that we defer the write until after the introspection
> agent replies. The "may" part refers to the fact that the introspection
> agent may not be interested in that event, so you're telling the function
> "please don't write the value in this MSR, just send a vm_event for now,
> _unless_ the introspection agent didn't subscribe to writes in this
> particular MSR".
>
> The actual write is done in the code called by hvm_vm_event_do_resume(),
> if the vm_event reply allows it.
>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> CC: Tamas K Lengyel <tamas@tklengyel.com>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> CC: Brian Woods <brian.woods@amd.com>
>> ---
>>  xen/arch/x86/hvm/emulate.c        |  8 ++++----
>>  xen/arch/x86/hvm/hvm.c            | 31 +++++++++++++++----------------
>>  xen/arch/x86/hvm/svm/nestedsvm.c  | 14 +++++++-------
>>  xen/arch/x86/hvm/svm/svm.c        |  2 +-
>>  xen/arch/x86/hvm/vm_event.c       |  9 ++++-----
>>  xen/arch/x86/hvm/vmx/vmx.c        |  4 ++--
>>  xen/arch/x86/hvm/vmx/vvmx.c       | 16 ++++++++--------
>>  xen/include/asm-x86/hvm/support.h |  8 ++++----
>>  8 files changed, 45 insertions(+), 47 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>> index cd1d9a7..43f18c2 100644
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -2024,7 +2024,7 @@ static int hvmemul_write_cr(
>>      switch ( reg )
>>      {
>>      case 0:
>> -        rc = hvm_set_cr0(val, 1);
>> +        rc = hvm_set_cr0(val);
>>          break;
>>  
>>      case 2:
>> @@ -2033,11 +2033,11 @@ static int hvmemul_write_cr(
>>          break;
>>  
>>      case 3:
>> -        rc = hvm_set_cr3(val, 1);
>> +        rc = hvm_set_cr3(val);
>>          break;
>>  
>>      case 4:
>> -        rc = hvm_set_cr4(val, 1);
>> +        rc = hvm_set_cr4(val);
>>          break;
>>  
>>      default:
>> @@ -2092,7 +2092,7 @@ static int hvmemul_write_msr(
>>      uint64_t val,
>>      struct x86_emulate_ctxt *ctxt)
>>  {
>> -    int rc = hvm_msr_write_intercept(reg, val, 1);
>> +    int rc = hvm_msr_write_intercept(reg, val);
>>  
>>      if ( rc == X86EMUL_EXCEPTION )
>>          x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 4b4d9d6..296b967 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -2046,15 +2046,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
>>      switch ( cr )
>>      {
>>      case 0:
>> -        rc = hvm_set_cr0(val, 1);
>> +        rc = hvm_set_cr0(val);
>>          break;
>>  
>>      case 3:
>> -        rc = hvm_set_cr3(val, 1);
>> +        rc = hvm_set_cr3(val);
>>          break;
>>  
>>      case 4:
>> -        rc = hvm_set_cr4(val, 1);
>> +        rc = hvm_set_cr4(val);
>>          break;
>>  
>>      case 8:
>> @@ -2150,7 +2150,7 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
>>      hvm_update_guest_cr(v, cr);
>>  }
>>  
>> -int hvm_set_cr0(unsigned long value, bool_t may_defer)
>> +int hvm_set_cr0(unsigned long value)
>>  {
>>      struct vcpu *v = current;
>>      struct domain *d = v->domain;
>> @@ -2176,8 +2176,8 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>>           (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
>>          return X86EMUL_EXCEPTION;
>>  
>> -    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> -                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>> +    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> +                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>>      {
>>          ASSERT(v->arch.vm_event);
>>  
>> @@ -2268,15 +2268,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>>      return X86EMUL_OKAY;
>>  }
>>  
>> -int hvm_set_cr3(unsigned long value, bool_t may_defer)
>> +int hvm_set_cr3(unsigned long value)
>>  {
>>      struct vcpu *v = current;
>>      struct page_info *page;
>>      unsigned long old = v->arch.hvm.guest_cr[3];
>>      bool noflush = false;
>>  
>> -    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> -                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>> +    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> +                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>>      {
>>          ASSERT(v->arch.vm_event);
>>  
>> @@ -2322,7 +2322,7 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
>>      return X86EMUL_UNHANDLEABLE;
>>  }
>>  
>> -int hvm_set_cr4(unsigned long value, bool_t may_defer)
>> +int hvm_set_cr4(unsigned long value)
>>  {
>>      struct vcpu *v = current;
>>      unsigned long old_cr;
>> @@ -2356,8 +2356,8 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>>          return X86EMUL_EXCEPTION;
>>      }
>>  
>> -    if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> -                               monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>> +    if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> +                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>>      {
>>          ASSERT(v->arch.vm_event);
>>  
>> @@ -2989,7 +2989,7 @@ void hvm_task_switch(
>>      if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
>>          goto out;
>>  
>> -    rc = hvm_set_cr3(tss.cr3, 1);
>> +    rc = hvm_set_cr3(tss.cr3);
>>      if ( rc == X86EMUL_EXCEPTION )
>>          hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>      if ( rc != X86EMUL_OKAY )
>> @@ -3497,8 +3497,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>>      goto out;
>>  }
>>  
>> -int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>> -                            bool may_defer)
>> +int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>  {
>>      struct vcpu *v = current;
>>      struct domain *d = v->domain;
>> @@ -3507,7 +3506,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>>      HVMTRACE_3D(MSR_WRITE, msr,
>>                 (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
>>  
>> -    if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
>> +    if ( unlikely(monitored_msr(v->domain, msr)) )
>>      {
>>          uint64_t msr_old_content;
>>  
> I don't see how this could work. The beginning of this function looks as
> follows:
>
> 3492 int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> 3493                             bool may_defer)
> 3494 {
> 3495     struct vcpu *v = current;
> 3496     struct domain *d = v->domain;
> 3497     int ret;
> 3498
> 3499     HVMTRACE_3D(MSR_WRITE, msr,
> 3500                (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
> 3501
> 3502     if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
> 3503     {
> 3504         uint64_t msr_old_content;
> 3505
> 3506         ret = hvm_msr_read_intercept(msr, &msr_old_content);
> 3507         if ( ret != X86EMUL_OKAY )
> 3508             return ret;
> 3509
> 3510         ASSERT(v->arch.vm_event);
> 3511
> 3512         /* The actual write will occur in hvm_do_resume() (if
> permitted). */
> 3513         v->arch.vm_event->write_data.do_write.msr = 1;
> 3514         v->arch.vm_event->write_data.msr = msr;
> 3515         v->arch.vm_event->write_data.value = msr_content;
> 3516
> 3517         hvm_monitor_msr(msr, msr_content, msr_old_content);
> 3518         return X86EMUL_OKAY;
> 3519     }
> 3520
> 3521     if ( (ret = guest_wrmsr(v, msr, msr_content)) !=
> X86EMUL_UNHANDLEABLE )
> 3522         return ret;
>
> By dumping may_defer, you're now making sure that this function will
> never get to guest_wrmsr() as long as we're dealing with a monitored_msr().
>
> But the code currently calls hvm_msr_write_intercept() with a 0 value
> for may_defer not only in hvm_vm_event_do_resume(), but also in
> load_shadow_guest_state() in vvmx.c, for example.
>
> Speaking of which, removing may_defer from these functions without
> looking at v->monitor.suppress won't work. I think what you were aiming
> at was perhaps to replace may_defer with an equilvalent test on
> v->monitor.suppress in the body of the function instead of simply
> erasing may_defer from everywhere.

Hmm - good point.  This will break things even more.  The
monitor.suppress check needs to be at this point, rather than later.

I'll see about wrapping the monitor checks up into some static inlines
which better model the intercepts they are built from, and check
suppress as the first action.  That should resolve the issues here.
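
A hedged sketch of the shape such a wrapper might take (the name and
placement are illustrative, nothing below is from a posted patch):

/* Hypothetical helper: should this CR write reach the monitor ring? */
static inline bool hvm_monitor_crX_write_enabled(const struct vcpu *v,
                                                 unsigned int cr)
{
    if ( v->monitor.suppress )
        return false;

    return v->domain->arch.monitor.write_ctrlreg_enabled &
           monitor_ctrlreg_bitmask(cr);
}

hvm_set_cr0() and friends would then call this instead of open-coding the
test, with the suppress check guaranteed to come first.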

~Andrew

