* [PATCH 0/2] x86/monitor: More generic livelock avoidance
@ 2018-10-23 14:35 Andrew Cooper
2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 14:35 UTC (permalink / raw)
To: Xen-devel
Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich,
Razvan Cojocaru, Andrew Cooper, Jun Nakajima, Boris Ostrovsky,
Brian Woods, Suravee Suthikulpanit
This is another split-out part of the v1 debugging series. It is semi-RFC
as it has only had extremely light testing.
Andrew Cooper (2):
x86/monitor: Introduce a boolean to suppress nested monitoring events
x86/hvm: Drop the may_defer boolean from hvm_* helpers
xen/arch/x86/hvm/emulate.c | 8 ++++----
xen/arch/x86/hvm/hvm.c | 39 +++++++++++++++++++++++----------------
xen/arch/x86/hvm/monitor.c | 24 ++++++++++++++++++++++--
xen/arch/x86/hvm/svm/nestedsvm.c | 14 +++++++-------
xen/arch/x86/hvm/svm/svm.c | 2 +-
xen/arch/x86/hvm/vm_event.c | 9 ++++-----
xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
xen/arch/x86/hvm/vmx/vvmx.c | 16 ++++++++--------
xen/include/asm-x86/hvm/support.h | 8 ++++----
xen/include/xen/sched.h | 8 ++++++++
10 files changed, 83 insertions(+), 49 deletions(-)
--
2.1.4
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
2018-10-23 14:35 [PATCH 0/2] x86/monitor: More generic livelock avoidance Andrew Cooper
@ 2018-10-23 14:35 ` Andrew Cooper
2018-10-23 14:54 ` Razvan Cojocaru
2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 14:35 UTC (permalink / raw)
To: Xen-devel
Cc: Andrew Cooper, Tamas K Lengyel, Wei Liu, Jan Beulich, Razvan Cojocaru
When applying vm_event actions, monitoring events can nest and effectively
livelock the vcpu. Introduce a boolean which is set for a specific portion of
the hvm_do_resume() path, which causes the hvm_monitor_* helpers to exit
early.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
RFC - This probably wants wiring up on ARM as well, but all I see is
monitor_smc() and no equivalent parts to hvm_do_resume() where we may inject
an exception from a vm_event. Should this be included in non-x86 monitor
functions for consistency at this point?
---
xen/arch/x86/hvm/hvm.c | 8 ++++++++
xen/arch/x86/hvm/monitor.c | 24 ++++++++++++++++++++++--
xen/include/xen/sched.h | 8 ++++++++
3 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9bc8078..4b4d9d6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -512,6 +512,12 @@ void hvm_do_resume(struct vcpu *v)
if ( !handle_hvm_io_completion(v) )
return;
+ /*
+ * We are about to apply actions requested by the introspection
+ * agent. Don't trigger further monitoring.
+ */
+ v->monitor.suppress = true;
+
if ( unlikely(v->arch.vm_event) )
hvm_vm_event_do_resume(v);
@@ -526,6 +532,8 @@ void hvm_do_resume(struct vcpu *v)
v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
}
+ v->monitor.suppress = false;
+
if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
{
struct x86_event info;
diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index 2a41ccc..f1a196f 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
struct arch_domain *ad = &curr->domain->arch;
unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
+ if ( curr->monitor.suppress )
+ return 0;
+
if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
@@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
.vcpu_id = curr->vcpu_id,
};
+ if ( curr->monitor.suppress )
+ return false;
+
return curr->domain->arch.monitor.emul_unimplemented_enabled &&
monitor_traps(curr, true, &req) == 1;
}
@@ -81,6 +87,9 @@ void hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t old_value)
{
struct vcpu *curr = current;
+ if ( curr->monitor.suppress )
+ return;
+
if ( monitored_msr(curr->domain, msr) &&
(!monitored_msr_onchangeonly(curr->domain, msr) ||
new_value != old_value) )
@@ -100,12 +109,16 @@ void hvm_monitor_descriptor_access(uint64_t exit_info,
uint64_t vmx_exit_qualification,
uint8_t descriptor, bool is_write)
{
+ struct vcpu *curr = current;
vm_event_request_t req = {
.reason = VM_EVENT_REASON_DESCRIPTOR_ACCESS,
.u.desc_access.descriptor = descriptor,
.u.desc_access.is_write = is_write,
};
+ if ( curr->monitor.suppress )
+ return;
+
if ( cpu_has_vmx )
{
req.u.desc_access.arch.vmx.instr_info = exit_info;
@@ -116,7 +129,7 @@ void hvm_monitor_descriptor_access(uint64_t exit_info,
req.u.desc_access.arch.svm.exitinfo = exit_info;
}
- monitor_traps(current, true, &req);
+ monitor_traps(curr, true, &req);
}
static inline unsigned long gfn_of_rip(unsigned long rip)
@@ -146,6 +159,9 @@ int hvm_monitor_debug(unsigned long rip, enum hvm_monitor_debug_type type,
vm_event_request_t req = {};
bool sync;
+ if ( curr->monitor.suppress )
+ return 0;
+
switch ( type )
{
case HVM_MONITOR_SOFTWARE_BREAKPOINT:
@@ -204,6 +220,7 @@ int hvm_monitor_cpuid(unsigned long insn_length, unsigned int leaf,
void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
unsigned int err, uint64_t cr2)
{
+ struct vcpu *curr = current;
vm_event_request_t req = {
.reason = VM_EVENT_REASON_INTERRUPT,
.u.interrupt.x86.vector = vector,
@@ -212,7 +229,10 @@ void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
.u.interrupt.x86.cr2 = cr2,
};
- monitor_traps(current, 1, &req);
+ if ( curr->monitor.suppress )
+ return;
+
+ monitor_traps(curr, 1, &req);
}
/*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3171eab..b32089a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -268,6 +268,14 @@ struct vcpu
struct evtchn_fifo_vcpu *evtchn_fifo;
+ struct {
+ /*
+ * Suppress monitoring events. Used to prevent vm_event-generated
+ * actions causing new monitoring events, and livelock the vcpu.
+ */
+ bool suppress;
+ } monitor;
+
/* vPCI per-vCPU area, used to store data for long running operations. */
struct vpci_vcpu vpci;
--
2.1.4
* [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers
2018-10-23 14:35 [PATCH 0/2] x86/monitor: More generic livelock avoidance Andrew Cooper
2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
@ 2018-10-23 14:35 ` Andrew Cooper
2018-10-23 15:24 ` Razvan Cojocaru
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 14:35 UTC (permalink / raw)
To: Xen-devel
Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich,
Razvan Cojocaru, Andrew Cooper, Jun Nakajima, Boris Ostrovsky,
Brian Woods, Suravee Suthikulpanit
The may_defer booleans were introduced with the monitor infrastructure, but
their purpose is not obvious and not described anywhere.
They exist to avoid triggering nested monitoring events from introspection
activities, but with the introduction of the general monitor.suppress
infrastructure, they are no longer needed. Drop them.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Brian Woods <brian.woods@amd.com>
---
xen/arch/x86/hvm/emulate.c | 8 ++++----
xen/arch/x86/hvm/hvm.c | 31 +++++++++++++++----------------
xen/arch/x86/hvm/svm/nestedsvm.c | 14 +++++++-------
xen/arch/x86/hvm/svm/svm.c | 2 +-
xen/arch/x86/hvm/vm_event.c | 9 ++++-----
xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
xen/arch/x86/hvm/vmx/vvmx.c | 16 ++++++++--------
xen/include/asm-x86/hvm/support.h | 8 ++++----
8 files changed, 45 insertions(+), 47 deletions(-)
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index cd1d9a7..43f18c2 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2024,7 +2024,7 @@ static int hvmemul_write_cr(
switch ( reg )
{
case 0:
- rc = hvm_set_cr0(val, 1);
+ rc = hvm_set_cr0(val);
break;
case 2:
@@ -2033,11 +2033,11 @@ static int hvmemul_write_cr(
break;
case 3:
- rc = hvm_set_cr3(val, 1);
+ rc = hvm_set_cr3(val);
break;
case 4:
- rc = hvm_set_cr4(val, 1);
+ rc = hvm_set_cr4(val);
break;
default:
@@ -2092,7 +2092,7 @@ static int hvmemul_write_msr(
uint64_t val,
struct x86_emulate_ctxt *ctxt)
{
- int rc = hvm_msr_write_intercept(reg, val, 1);
+ int rc = hvm_msr_write_intercept(reg, val);
if ( rc == X86EMUL_EXCEPTION )
x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4b4d9d6..296b967 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2046,15 +2046,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
switch ( cr )
{
case 0:
- rc = hvm_set_cr0(val, 1);
+ rc = hvm_set_cr0(val);
break;
case 3:
- rc = hvm_set_cr3(val, 1);
+ rc = hvm_set_cr3(val);
break;
case 4:
- rc = hvm_set_cr4(val, 1);
+ rc = hvm_set_cr4(val);
break;
case 8:
@@ -2150,7 +2150,7 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
hvm_update_guest_cr(v, cr);
}
-int hvm_set_cr0(unsigned long value, bool_t may_defer)
+int hvm_set_cr0(unsigned long value)
{
struct vcpu *v = current;
struct domain *d = v->domain;
@@ -2176,8 +2176,8 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
(value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
return X86EMUL_EXCEPTION;
- if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
- monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
+ if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+ monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
{
ASSERT(v->arch.vm_event);
@@ -2268,15 +2268,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
return X86EMUL_OKAY;
}
-int hvm_set_cr3(unsigned long value, bool_t may_defer)
+int hvm_set_cr3(unsigned long value)
{
struct vcpu *v = current;
struct page_info *page;
unsigned long old = v->arch.hvm.guest_cr[3];
bool noflush = false;
- if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
- monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
+ if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+ monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
{
ASSERT(v->arch.vm_event);
@@ -2322,7 +2322,7 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
return X86EMUL_UNHANDLEABLE;
}
-int hvm_set_cr4(unsigned long value, bool_t may_defer)
+int hvm_set_cr4(unsigned long value)
{
struct vcpu *v = current;
unsigned long old_cr;
@@ -2356,8 +2356,8 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
return X86EMUL_EXCEPTION;
}
- if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
- monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
+ if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
+ monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
{
ASSERT(v->arch.vm_event);
@@ -2989,7 +2989,7 @@ void hvm_task_switch(
if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
goto out;
- rc = hvm_set_cr3(tss.cr3, 1);
+ rc = hvm_set_cr3(tss.cr3);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if ( rc != X86EMUL_OKAY )
@@ -3497,8 +3497,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
goto out;
}
-int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
- bool may_defer)
+int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
{
struct vcpu *v = current;
struct domain *d = v->domain;
@@ -3507,7 +3506,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
HVMTRACE_3D(MSR_WRITE, msr,
(uint32_t)msr_content, (uint32_t)(msr_content >> 32));
- if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
+ if ( unlikely(monitored_msr(v->domain, msr)) )
{
uint64_t msr_old_content;
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 78a1016..399ff05 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -285,7 +285,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
/* CR4 */
v->arch.hvm.guest_cr[4] = n1vmcb->_cr4;
- rc = hvm_set_cr4(n1vmcb->_cr4, 1);
+ rc = hvm_set_cr4(n1vmcb->_cr4);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
@@ -296,7 +296,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
svm->ns_cr0, v->arch.hvm.guest_cr[0]);
v->arch.hvm.guest_cr[0] = n1vmcb->_cr0 | X86_CR0_PE;
n1vmcb->rflags &= ~X86_EFLAGS_VM;
- rc = hvm_set_cr0(n1vmcb->_cr0 | X86_CR0_PE, 1);
+ rc = hvm_set_cr0(n1vmcb->_cr0 | X86_CR0_PE);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
@@ -324,7 +324,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
v->arch.guest_table = pagetable_null();
/* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
}
- rc = hvm_set_cr3(n1vmcb->_cr3, 1);
+ rc = hvm_set_cr3(n1vmcb->_cr3);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
@@ -556,7 +556,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
/* CR4 */
v->arch.hvm.guest_cr[4] = ns_vmcb->_cr4;
- rc = hvm_set_cr4(ns_vmcb->_cr4, 1);
+ rc = hvm_set_cr4(ns_vmcb->_cr4);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
@@ -566,7 +566,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
svm->ns_cr0 = v->arch.hvm.guest_cr[0];
cr0 = nestedsvm_fpu_vmentry(svm->ns_cr0, ns_vmcb, n1vmcb, n2vmcb);
v->arch.hvm.guest_cr[0] = ns_vmcb->_cr0;
- rc = hvm_set_cr0(cr0, 1);
+ rc = hvm_set_cr0(cr0);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
@@ -584,7 +584,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
nestedsvm_vmcb_set_nestedp2m(v, ns_vmcb, n2vmcb);
/* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
- rc = hvm_set_cr3(ns_vmcb->_cr3, 1);
+ rc = hvm_set_cr3(ns_vmcb->_cr3);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
@@ -598,7 +598,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
* we assume it intercepts page faults.
*/
/* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
- rc = hvm_set_cr3(ns_vmcb->_cr3, 1);
+ rc = hvm_set_cr3(ns_vmcb->_cr3);
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
if (rc != X86EMUL_OKAY)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 41427e7..80bb8fd 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2333,7 +2333,7 @@ static void svm_do_msr_access(struct cpu_user_regs *regs)
msr_split(regs, msr_content);
}
else
- rc = hvm_msr_write_intercept(regs->ecx, msr_fold(regs), 1);
+ rc = hvm_msr_write_intercept(regs->ecx, msr_fold(regs));
if ( rc == X86EMUL_OKAY )
__update_guest_eip(regs, inst_len);
diff --git a/xen/arch/x86/hvm/vm_event.c b/xen/arch/x86/hvm/vm_event.c
index 28d08a6..0ae6a0f 100644
--- a/xen/arch/x86/hvm/vm_event.c
+++ b/xen/arch/x86/hvm/vm_event.c
@@ -94,7 +94,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
if ( unlikely(w->do_write.cr0) )
{
- if ( hvm_set_cr0(w->cr0, 0) == X86EMUL_EXCEPTION )
+ if ( hvm_set_cr0(w->cr0) == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
w->do_write.cr0 = 0;
@@ -102,7 +102,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
if ( unlikely(w->do_write.cr4) )
{
- if ( hvm_set_cr4(w->cr4, 0) == X86EMUL_EXCEPTION )
+ if ( hvm_set_cr4(w->cr4) == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
w->do_write.cr4 = 0;
@@ -110,7 +110,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
if ( unlikely(w->do_write.cr3) )
{
- if ( hvm_set_cr3(w->cr3, 0) == X86EMUL_EXCEPTION )
+ if ( hvm_set_cr3(w->cr3) == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
w->do_write.cr3 = 0;
@@ -118,8 +118,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
if ( unlikely(w->do_write.msr) )
{
- if ( hvm_msr_write_intercept(w->msr, w->value, 0) ==
- X86EMUL_EXCEPTION )
+ if ( hvm_msr_write_intercept(w->msr, w->value) == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
w->do_write.msr = 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 46f51c3..3c39bf3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2654,7 +2654,7 @@ static int vmx_cr_access(cr_access_qual_t qual)
(X86_CR0_PE|X86_CR0_MP|X86_CR0_EM|X86_CR0_TS));
HVMTRACE_LONG_1D(LMSW, value);
- if ( (rc = hvm_set_cr0(value, 1)) == X86EMUL_EXCEPTION )
+ if ( (rc = hvm_set_cr0(value)) == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
return rc;
@@ -3990,7 +3990,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
}
case EXIT_REASON_MSR_WRITE:
- switch ( hvm_msr_write_intercept(regs->ecx, msr_fold(regs), 1) )
+ switch ( hvm_msr_write_intercept(regs->ecx, msr_fold(regs)) )
{
case X86EMUL_OKAY:
update_guest_eip(); /* Safe: WRMSR */
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 0e45db8..19f925e 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1059,15 +1059,15 @@ static void load_shadow_guest_state(struct vcpu *v)
nvcpu->guest_cr[0] = get_vvmcs(v, CR0_READ_SHADOW);
nvcpu->guest_cr[4] = get_vvmcs(v, CR4_READ_SHADOW);
- rc = hvm_set_cr0(get_vvmcs(v, GUEST_CR0), 1);
+ rc = hvm_set_cr0(get_vvmcs(v, GUEST_CR0));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
- rc = hvm_set_cr4(get_vvmcs(v, GUEST_CR4), 1);
+ rc = hvm_set_cr4(get_vvmcs(v, GUEST_CR4));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
- rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), 1);
+ rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
@@ -1077,7 +1077,7 @@ static void load_shadow_guest_state(struct vcpu *v)
if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
{
rc = hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
- get_vvmcs(v, GUEST_PERF_GLOBAL_CTRL), 0);
+ get_vvmcs(v, GUEST_PERF_GLOBAL_CTRL));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
}
@@ -1265,15 +1265,15 @@ static void load_vvmcs_host_state(struct vcpu *v)
__vmwrite(vmcs_h2g_field[i].guest_field, r);
}
- rc = hvm_set_cr0(get_vvmcs(v, HOST_CR0), 1);
+ rc = hvm_set_cr0(get_vvmcs(v, HOST_CR0));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
- rc = hvm_set_cr4(get_vvmcs(v, HOST_CR4), 1);
+ rc = hvm_set_cr4(get_vvmcs(v, HOST_CR4));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
- rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), 1);
+ rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
@@ -1283,7 +1283,7 @@ static void load_vvmcs_host_state(struct vcpu *v)
if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
{
rc = hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
- get_vvmcs(v, HOST_PERF_GLOBAL_CTRL), 1);
+ get_vvmcs(v, HOST_PERF_GLOBAL_CTRL));
if ( rc == X86EMUL_EXCEPTION )
hvm_inject_hw_exception(TRAP_gp_fault, 0);
}
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 7222939..7aaa4b1 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -134,9 +134,9 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value);
* returned.
*/
int hvm_set_efer(uint64_t value);
-int hvm_set_cr0(unsigned long value, bool_t may_defer);
-int hvm_set_cr3(unsigned long value, bool_t may_defer);
-int hvm_set_cr4(unsigned long value, bool_t may_defer);
+int hvm_set_cr0(unsigned long value);
+int hvm_set_cr3(unsigned long value);
+int hvm_set_cr4(unsigned long value);
int hvm_descriptor_access_intercept(uint64_t exit_info,
uint64_t vmx_exit_qualification,
unsigned int descriptor, bool is_write);
@@ -151,7 +151,7 @@ void hvm_ud_intercept(struct cpu_user_regs *);
int __must_check hvm_msr_read_intercept(
unsigned int msr, uint64_t *msr_content);
int __must_check hvm_msr_write_intercept(
- unsigned int msr, uint64_t msr_content, bool may_defer);
+ unsigned int msr, uint64_t msr_content);
#endif /* __ASM_X86_HVM_SUPPORT_H__ */
--
2.1.4
* Re: [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
@ 2018-10-23 14:54 ` Razvan Cojocaru
2018-10-23 15:08 ` Andrew Cooper
0 siblings, 1 reply; 8+ messages in thread
From: Razvan Cojocaru @ 2018-10-23 14:54 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Tamas K Lengyel, Wei Liu, Jan Beulich
On 10/23/18 5:35 PM, Andrew Cooper wrote:
> When applying vm_event actions, monitoring events can nest and effectively
> livelock the vcpu. Introduce a boolean which is set for a specific portion of
> the hvm_do_resume() path, which causes the hvm_monitor_* helpers to exit
> early.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
>
> RFC - This probably wants wiring up on ARM as well, but all I see is
> monitor_smc() and no equivalent parts to hvm_do_resume() where we may inject
> an exception from a vm_event. Should this be included in non-x86 monitor
> functions for consistency at this point?
> ---
> xen/arch/x86/hvm/hvm.c | 8 ++++++++
> xen/arch/x86/hvm/monitor.c | 24 ++++++++++++++++++++++--
> xen/include/xen/sched.h | 8 ++++++++
> 3 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 9bc8078..4b4d9d6 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -512,6 +512,12 @@ void hvm_do_resume(struct vcpu *v)
> if ( !handle_hvm_io_completion(v) )
> return;
>
> + /*
> + * We are about to apply actions requested by the introspection
> + * agent. Don't trigger further monitoring.
> + */
> + v->monitor.suppress = true;
> +
> if ( unlikely(v->arch.vm_event) )
> hvm_vm_event_do_resume(v);
>
> @@ -526,6 +532,8 @@ void hvm_do_resume(struct vcpu *v)
> v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
> }
>
> + v->monitor.suppress = false;
> +
> if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
> {
> struct x86_event info;
> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> index 2a41ccc..f1a196f 100644
> --- a/xen/arch/x86/hvm/monitor.c
> +++ b/xen/arch/x86/hvm/monitor.c
> @@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
> struct arch_domain *ad = &curr->domain->arch;
> unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>
> + if ( curr->monitor.suppress )
> + return 0;
> +
> if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
> value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
>
> @@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
> .vcpu_id = curr->vcpu_id,
> };
>
> + if ( curr->monitor.suppress )
> + return false;
Rather than doing this for each event, I think we may be able to do it
only in monitor_traps(). Am I missing something?
Thanks,
Razvan
* Re: [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
2018-10-23 14:54 ` Razvan Cojocaru
@ 2018-10-23 15:08 ` Andrew Cooper
2018-10-23 15:27 ` Razvan Cojocaru
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2018-10-23 15:08 UTC (permalink / raw)
To: Razvan Cojocaru, Xen-devel; +Cc: Tamas K Lengyel, Wei Liu, Jan Beulich
On 23/10/18 15:54, Razvan Cojocaru wrote:
> On 10/23/18 5:35 PM, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
>> index 2a41ccc..f1a196f 100644
>> --- a/xen/arch/x86/hvm/monitor.c
>> +++ b/xen/arch/x86/hvm/monitor.c
>> @@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
>> struct arch_domain *ad = &curr->domain->arch;
>> unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>>
>> + if ( curr->monitor.suppress )
>> + return 0;
>> +
>> if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
>> value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
>>
>> @@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
>> .vcpu_id = curr->vcpu_id,
>> };
>>
>> + if ( curr->monitor.suppress )
>> + return false;
> Rather than doing this for each event, I think we may be able to do it
> only in monitor_traps(). Am I missing something?
I guess that depends on how expensive it is to collect together the
other data being fed into the monitor ring. I suppose it is only the
hvm_do_resume() path which will suffer, and only on a reply from
introspection, which isn't exactly a fastpath.
~Andrew
* Re: [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers
2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
@ 2018-10-23 15:24 ` Razvan Cojocaru
2018-10-24 10:00 ` Andrew Cooper
0 siblings, 1 reply; 8+ messages in thread
From: Razvan Cojocaru @ 2018-10-23 15:24 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel
Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich, Jun Nakajima,
Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit
On 10/23/18 5:35 PM, Andrew Cooper wrote:
> The may_defer booleans were introduced with the monitor infrastructure, but
> their purpose is not obvious and not described anywhere.
>
> They exist to avoid triggering nested monitoring events from introspection
> activities, but with the introduction of the general monitor.suppress
> infrastructure, they are no longer needed. Drop them.
I admit their purpose may not be obvious, but they don't exist only for
the reason you've given. They exist so that we may be able to send out
vm_events _before_ a write happens (so that we are then able to veto the
CR or MSR write from the introspection agent).
So defer means that we defer the write until after the introspection
agent replies. The "may" part refers to the fact that the introspection
may not be interested in that event, so you're telling the function
"please don't write the value in this MSR, just send a vm_event for now,
_unless_ the introspection agent didn't subscribe to writes in this
particular MSR".
The actual write is done in the code called by hvm_vm_event_do_resume(),
if the vm_event reply allows it.
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Brian Woods <brian.woods@amd.com>
> ---
> xen/arch/x86/hvm/emulate.c | 8 ++++----
> xen/arch/x86/hvm/hvm.c | 31 +++++++++++++++----------------
> xen/arch/x86/hvm/svm/nestedsvm.c | 14 +++++++-------
> xen/arch/x86/hvm/svm/svm.c | 2 +-
> xen/arch/x86/hvm/vm_event.c | 9 ++++-----
> xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
> xen/arch/x86/hvm/vmx/vvmx.c | 16 ++++++++--------
> xen/include/asm-x86/hvm/support.h | 8 ++++----
> 8 files changed, 45 insertions(+), 47 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index cd1d9a7..43f18c2 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2024,7 +2024,7 @@ static int hvmemul_write_cr(
> switch ( reg )
> {
> case 0:
> - rc = hvm_set_cr0(val, 1);
> + rc = hvm_set_cr0(val);
> break;
>
> case 2:
> @@ -2033,11 +2033,11 @@ static int hvmemul_write_cr(
> break;
>
> case 3:
> - rc = hvm_set_cr3(val, 1);
> + rc = hvm_set_cr3(val);
> break;
>
> case 4:
> - rc = hvm_set_cr4(val, 1);
> + rc = hvm_set_cr4(val);
> break;
>
> default:
> @@ -2092,7 +2092,7 @@ static int hvmemul_write_msr(
> uint64_t val,
> struct x86_emulate_ctxt *ctxt)
> {
> - int rc = hvm_msr_write_intercept(reg, val, 1);
> + int rc = hvm_msr_write_intercept(reg, val);
>
> if ( rc == X86EMUL_EXCEPTION )
> x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4b4d9d6..296b967 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2046,15 +2046,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
> switch ( cr )
> {
> case 0:
> - rc = hvm_set_cr0(val, 1);
> + rc = hvm_set_cr0(val);
> break;
>
> case 3:
> - rc = hvm_set_cr3(val, 1);
> + rc = hvm_set_cr3(val);
> break;
>
> case 4:
> - rc = hvm_set_cr4(val, 1);
> + rc = hvm_set_cr4(val);
> break;
>
> case 8:
> @@ -2150,7 +2150,7 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
> hvm_update_guest_cr(v, cr);
> }
>
> -int hvm_set_cr0(unsigned long value, bool_t may_defer)
> +int hvm_set_cr0(unsigned long value)
> {
> struct vcpu *v = current;
> struct domain *d = v->domain;
> @@ -2176,8 +2176,8 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
> (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
> return X86EMUL_EXCEPTION;
>
> - if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> - monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
> + if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
> {
> ASSERT(v->arch.vm_event);
>
> @@ -2268,15 +2268,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
> return X86EMUL_OKAY;
> }
>
> -int hvm_set_cr3(unsigned long value, bool_t may_defer)
> +int hvm_set_cr3(unsigned long value)
> {
> struct vcpu *v = current;
> struct page_info *page;
> unsigned long old = v->arch.hvm.guest_cr[3];
> bool noflush = false;
>
> - if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> - monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
> + if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
> {
> ASSERT(v->arch.vm_event);
>
> @@ -2322,7 +2322,7 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
> return X86EMUL_UNHANDLEABLE;
> }
>
> -int hvm_set_cr4(unsigned long value, bool_t may_defer)
> +int hvm_set_cr4(unsigned long value)
> {
> struct vcpu *v = current;
> unsigned long old_cr;
> @@ -2356,8 +2356,8 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
> return X86EMUL_EXCEPTION;
> }
>
> - if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> - monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
> + if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
> + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
> {
> ASSERT(v->arch.vm_event);
>
> @@ -2989,7 +2989,7 @@ void hvm_task_switch(
> if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
> goto out;
>
> - rc = hvm_set_cr3(tss.cr3, 1);
> + rc = hvm_set_cr3(tss.cr3);
> if ( rc == X86EMUL_EXCEPTION )
> hvm_inject_hw_exception(TRAP_gp_fault, 0);
> if ( rc != X86EMUL_OKAY )
> @@ -3497,8 +3497,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
> goto out;
> }
>
> -int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> - bool may_defer)
> +int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
> {
> struct vcpu *v = current;
> struct domain *d = v->domain;
> @@ -3507,7 +3506,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> HVMTRACE_3D(MSR_WRITE, msr,
> (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
>
> - if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
> + if ( unlikely(monitored_msr(v->domain, msr)) )
> {
> uint64_t msr_old_content;
>
I don't see how this could work. The beginning of this function looks as
follows:
3492 int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
3493 bool may_defer)
3494 {
3495 struct vcpu *v = current;
3496 struct domain *d = v->domain;
3497 int ret;
3498
3499 HVMTRACE_3D(MSR_WRITE, msr,
3500 (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
3501
3502 if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
3503 {
3504 uint64_t msr_old_content;
3505
3506 ret = hvm_msr_read_intercept(msr, &msr_old_content);
3507 if ( ret != X86EMUL_OKAY )
3508 return ret;
3509
3510 ASSERT(v->arch.vm_event);
3511
3513 /* The actual write will occur in hvm_do_resume() (if permitted). */
3513 v->arch.vm_event->write_data.do_write.msr = 1;
3514 v->arch.vm_event->write_data.msr = msr;
3515 v->arch.vm_event->write_data.value = msr_content;
3516
3517 hvm_monitor_msr(msr, msr_content, msr_old_content);
3518 return X86EMUL_OKAY;
3519 }
3520
3521 if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
3522 return ret;
By dumping may_defer, you're now making sure that this function will
never get to guest_wrmsr() as long as we're dealing with a monitored_msr().
But the code currently calls hvm_msr_write_intercept() with a 0 value
for may_defer not only in hvm_vm_event_do_resume(), but also in
load_shadow_guest_state() in vvmx.c, for example.
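To make the defer-then-resume dance concrete, here is a self-contained toy model of it (all names are stand-ins for illustration, not the real Xen structures): the intercept path records the pending write and conceptually sends the vm_event, and the resume path commits the write only if the introspection agent's reply permits it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for v->arch.vm_event->write_data (hypothetical layout). */
struct write_data {
    bool do_write;
    unsigned int msr;
    uint64_t value;
};

static uint64_t toy_msrs[16];   /* toy MSR file indexed by a small id */

/* Intercept path: record the write and (conceptually) send a vm_event. */
static void defer_msr_write(struct write_data *w, unsigned int msr,
                            uint64_t value)
{
    w->do_write = true;
    w->msr = msr;
    w->value = value;
}

/* Resume path: commit the pending write unless the agent vetoed it. */
static void resume_after_reply(struct write_data *w, bool deny)
{
    if ( w->do_write && !deny )
        toy_msrs[w->msr] = w->value;

    w->do_write = false;
}
```

In this model the objection above is easy to see: if the resume path re-enters the intercept and the monitored-MSR check still fires unconditionally, the write is deferred again instead of landing, and guest_wrmsr() is never reached.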
Speaking of which, removing may_defer from these functions without
looking at v->monitor.suppress won't work. I think what you were aiming
at was perhaps to replace may_defer with an equivalent test on
v->monitor.suppress in the body of the function instead of simply
erasing may_defer from everywhere.
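A sketch of that shape (minimal stand-in types and hypothetical names, not the actual Xen code): the per-vCPU suppress flag does the job of the old may_defer argument inside the function itself, so callers need no extra parameter and the resume path falls through to the real write.

```c
#include <stdbool.h>
#include <stdint.h>

struct vcpu {
    struct { bool suppress; } monitor;  /* set while servicing a vm_event reply */
};

/* Toy predicate: pretend exactly one MSR is monitored. */
static bool monitored_msr(unsigned int msr)
{
    return msr == 0x277;
}

/*
 * Proposed shape: no may_defer argument; the function consults
 * v->monitor.suppress itself.  Returns 1 when the write is deferred
 * to the vm_event machinery, 0 when it proceeds immediately.
 */
static int msr_write_intercept(struct vcpu *v, unsigned int msr, uint64_t val)
{
    (void)val;

    if ( !v->monitor.suppress && monitored_msr(msr) )
        return 1;   /* send vm_event; actual write happens on resume */

    return 0;       /* fall through to the guest_wrmsr() equivalent */
}
```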
Thanks,
Razvan
* Re: [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events
2018-10-23 15:08 ` Andrew Cooper
@ 2018-10-23 15:27 ` Razvan Cojocaru
0 siblings, 0 replies; 8+ messages in thread
From: Razvan Cojocaru @ 2018-10-23 15:27 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Tamas K Lengyel, Wei Liu, Jan Beulich
On 10/23/18 6:08 PM, Andrew Cooper wrote:
> On 23/10/18 15:54, Razvan Cojocaru wrote:
>> On 10/23/18 5:35 PM, Andrew Cooper wrote:
>>> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
>>> index 2a41ccc..f1a196f 100644
>>> --- a/xen/arch/x86/hvm/monitor.c
>>> +++ b/xen/arch/x86/hvm/monitor.c
>>> @@ -36,6 +36,9 @@ bool hvm_monitor_cr(unsigned int index, unsigned long value, unsigned long old)
>>> struct arch_domain *ad = &curr->domain->arch;
>>> unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>>>
>>> + if ( curr->monitor.suppress )
>>> + return 0;
>>> +
>>> if ( index == VM_EVENT_X86_CR3 && hvm_pcid_enabled(curr) )
>>> value &= ~X86_CR3_NOFLUSH; /* Clear the noflush bit. */
>>>
>>> @@ -73,6 +76,9 @@ bool hvm_monitor_emul_unimplemented(void)
>>> .vcpu_id = curr->vcpu_id,
>>> };
>>>
>>> + if ( curr->monitor.suppress )
>>> + return false;
>> Rather than doing this for each event, I think we may be able to do it
>> only in monitor_traps(). Am I missing something?
>
> I guess that depends on how expensive it is to collect together the
> other data being fed into the monitor ring. I suppose it is only the
> hvm_do_resume() path which will suffer, and only on a reply from
> introspection, which isn't exactly a fastpath.
monitor_traps() calls vm_event_fill_regs(req); at the very end, which
you can short-circuit by returning sooner. The rest of the information I
believe has already been collected where you test v->monitor.suppress:
91 int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
92 {
93 int rc;
94 struct domain *d = v->domain;
95
96 rc = vm_event_claim_slot(d, d->vm_event_monitor);
97 switch ( rc )
98 {
99 case 0:
100 break;
101 case -ENOSYS:
102 /*
103 * If there was no ring to handle the event, then
104 * simply continue executing normally.
105 */
106 return 0;
107 default:
108 return rc;
109 };
110
111 req->vcpu_id = v->vcpu_id;
112
113 if ( sync )
114 {
115 req->flags |= VM_EVENT_FLAG_VCPU_PAUSED;
116 vm_event_vcpu_pause(v);
117 rc = 1;
118 }
119
120 if ( altp2m_active(d) )
121 {
122 req->flags |= VM_EVENT_FLAG_ALTERNATE_P2M;
123 req->altp2m_idx = altp2m_vcpu_idx(v);
124 }
125
126 vm_event_fill_regs(req);
127 vm_event_put_request(d, d->vm_event_monitor, req);
128
129 return rc;
130 }
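The suggested single check could look something like the following (self-contained sketch with stand-in types; only the control flow is meant to match the listing above): one suppress test at the top of monitor_traps() covers every event type, so none of the expensive work runs.

```c
#include <stdbool.h>

struct vcpu {
    struct { bool suppress; } monitor;
};

/* Counters standing in for the expensive work monitor_traps() does. */
static int regs_filled, requests_put;

/*
 * Sketch: one suppress check at the top of monitor_traps() covers every
 * event type, instead of a check in each hvm_monitor_* helper.
 */
static int monitor_traps(struct vcpu *v, bool sync)
{
    if ( v->monitor.suppress )
        return 0;               /* no ring slot claimed, continue normally */

    /* ... claim slot, set VM_EVENT_FLAG_VCPU_PAUSED if sync ... */
    regs_filled++;              /* vm_event_fill_regs(req) */
    requests_put++;             /* vm_event_put_request(...) */

    return sync ? 1 : 0;
}
```

The trade-off is the one discussed above: the per-helper check skips gathering the event payload earlier, while the monitor_traps() check keeps the suppress logic in exactly one place.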
Thanks,
Razvan
* Re: [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers
2018-10-23 15:24 ` Razvan Cojocaru
@ 2018-10-24 10:00 ` Andrew Cooper
0 siblings, 0 replies; 8+ messages in thread
From: Andrew Cooper @ 2018-10-24 10:00 UTC (permalink / raw)
To: Razvan Cojocaru, Xen-devel
Cc: Kevin Tian, Tamas K Lengyel, Wei Liu, Jan Beulich, Jun Nakajima,
Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit
On 23/10/18 16:24, Razvan Cojocaru wrote:
> On 10/23/18 5:35 PM, Andrew Cooper wrote:
>> The may_defer booleans were introduced with the monitor infrastructure, but
>> their purpose is not obvious and not described anywhere.
>>
>> They exist to avoid triggering nested monitoring events from introspection
>> activities, but with the introduction of the general monitor.suppress
>> infrastructure, they are no longer needed. Drop them.
> I admit their purpose may not be obvious, but they don't exist only for
> the reason you've given. They exist so that we may be able to send out
> vm_events _before_ a write happens (so that we are then able to veto the
> CR or MSR write from the introspection agent).
>
> So defer means that we defer the write to until after the introspection
> agent replies. The "may" part refers to the fact that the introspection
> may not be interested in that event, so you're telling the function
> "please don't write the value in this MSR, just send a vm_event for now,
> _unless_ the introspection agent didn't subscribe to writes in this
> particular MSR".
>
> The actual write is done in the code called by hvm_vm_event_do_resume(),
> if the vm_event reply allows it.
>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> CC: Tamas K Lengyel <tamas@tklengyel.com>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> CC: Brian Woods <brian.woods@amd.com>
>> ---
>> xen/arch/x86/hvm/emulate.c | 8 ++++----
>> xen/arch/x86/hvm/hvm.c | 31 +++++++++++++++----------------
>> xen/arch/x86/hvm/svm/nestedsvm.c | 14 +++++++-------
>> xen/arch/x86/hvm/svm/svm.c | 2 +-
>> xen/arch/x86/hvm/vm_event.c | 9 ++++-----
>> xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
>> xen/arch/x86/hvm/vmx/vvmx.c | 16 ++++++++--------
>> xen/include/asm-x86/hvm/support.h | 8 ++++----
>> 8 files changed, 45 insertions(+), 47 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>> index cd1d9a7..43f18c2 100644
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -2024,7 +2024,7 @@ static int hvmemul_write_cr(
>> switch ( reg )
>> {
>> case 0:
>> - rc = hvm_set_cr0(val, 1);
>> + rc = hvm_set_cr0(val);
>> break;
>>
>> case 2:
>> @@ -2033,11 +2033,11 @@ static int hvmemul_write_cr(
>> break;
>>
>> case 3:
>> - rc = hvm_set_cr3(val, 1);
>> + rc = hvm_set_cr3(val);
>> break;
>>
>> case 4:
>> - rc = hvm_set_cr4(val, 1);
>> + rc = hvm_set_cr4(val);
>> break;
>>
>> default:
>> @@ -2092,7 +2092,7 @@ static int hvmemul_write_msr(
>> uint64_t val,
>> struct x86_emulate_ctxt *ctxt)
>> {
>> - int rc = hvm_msr_write_intercept(reg, val, 1);
>> + int rc = hvm_msr_write_intercept(reg, val);
>>
>> if ( rc == X86EMUL_EXCEPTION )
>> x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 4b4d9d6..296b967 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -2046,15 +2046,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
>> switch ( cr )
>> {
>> case 0:
>> - rc = hvm_set_cr0(val, 1);
>> + rc = hvm_set_cr0(val);
>> break;
>>
>> case 3:
>> - rc = hvm_set_cr3(val, 1);
>> + rc = hvm_set_cr3(val);
>> break;
>>
>> case 4:
>> - rc = hvm_set_cr4(val, 1);
>> + rc = hvm_set_cr4(val);
>> break;
>>
>> case 8:
>> @@ -2150,7 +2150,7 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
>> hvm_update_guest_cr(v, cr);
>> }
>>
>> -int hvm_set_cr0(unsigned long value, bool_t may_defer)
>> +int hvm_set_cr0(unsigned long value)
>> {
>> struct vcpu *v = current;
>> struct domain *d = v->domain;
>> @@ -2176,8 +2176,8 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>> (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
>> return X86EMUL_EXCEPTION;
>>
>> - if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> - monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>> + if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) )
>> {
>> ASSERT(v->arch.vm_event);
>>
>> @@ -2268,15 +2268,15 @@ int hvm_set_cr0(unsigned long value, bool_t may_defer)
>> return X86EMUL_OKAY;
>> }
>>
>> -int hvm_set_cr3(unsigned long value, bool_t may_defer)
>> +int hvm_set_cr3(unsigned long value)
>> {
>> struct vcpu *v = current;
>> struct page_info *page;
>> unsigned long old = v->arch.hvm.guest_cr[3];
>> bool noflush = false;
>>
>> - if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> - monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>> + if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
>> {
>> ASSERT(v->arch.vm_event);
>>
>> @@ -2322,7 +2322,7 @@ int hvm_set_cr3(unsigned long value, bool_t may_defer)
>> return X86EMUL_UNHANDLEABLE;
>> }
>>
>> -int hvm_set_cr4(unsigned long value, bool_t may_defer)
>> +int hvm_set_cr4(unsigned long value)
>> {
>> struct vcpu *v = current;
>> unsigned long old_cr;
>> @@ -2356,8 +2356,8 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>> return X86EMUL_EXCEPTION;
>> }
>>
>> - if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> - monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>> + if ( unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
>> + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) )
>> {
>> ASSERT(v->arch.vm_event);
>>
>> @@ -2989,7 +2989,7 @@ void hvm_task_switch(
>> if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
>> goto out;
>>
>> - rc = hvm_set_cr3(tss.cr3, 1);
>> + rc = hvm_set_cr3(tss.cr3);
>> if ( rc == X86EMUL_EXCEPTION )
>> hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> if ( rc != X86EMUL_OKAY )
>> @@ -3497,8 +3497,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>> goto out;
>> }
>>
>> -int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>> - bool may_defer)
>> +int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>> {
>> struct vcpu *v = current;
>> struct domain *d = v->domain;
>> @@ -3507,7 +3506,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>> HVMTRACE_3D(MSR_WRITE, msr,
>> (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
>>
>> - if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
>> + if ( unlikely(monitored_msr(v->domain, msr)) )
>> {
>> uint64_t msr_old_content;
>>
> I don't see how this could work. The beginning of this function looks as
> follows:
>
> 3492 int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> 3493 bool may_defer)
> 3494 {
> 3495 struct vcpu *v = current;
> 3496 struct domain *d = v->domain;
> 3497 int ret;
> 3498
> 3499 HVMTRACE_3D(MSR_WRITE, msr,
> 3500 (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
> 3501
> 3502 if ( may_defer && unlikely(monitored_msr(v->domain, msr)) )
> 3503 {
> 3504 uint64_t msr_old_content;
> 3505
> 3506 ret = hvm_msr_read_intercept(msr, &msr_old_content);
> 3507 if ( ret != X86EMUL_OKAY )
> 3508 return ret;
> 3509
> 3510 ASSERT(v->arch.vm_event);
> 3511
> 3513 /* The actual write will occur in hvm_do_resume() (if permitted). */
> 3513 v->arch.vm_event->write_data.do_write.msr = 1;
> 3514 v->arch.vm_event->write_data.msr = msr;
> 3515 v->arch.vm_event->write_data.value = msr_content;
> 3516
> 3517 hvm_monitor_msr(msr, msr_content, msr_old_content);
> 3518 return X86EMUL_OKAY;
> 3519 }
> 3520
> 3521 if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
> 3522 return ret;
>
> By dumping may_defer, you're now making sure that this function will
> never get to guest_wrmsr() as long as we're dealing with a monitored_msr().
>
> But the code currently calls hvm_msr_write_intercept() with a 0 value
> for may_defer not only in hvm_vm_event_do_resume(), but also in
> load_shadow_guest_state() in vvmx.c, for example.
>
> Speaking of which, removing may_defer from these functions without
> looking at v->monitor.suppress won't work. I think what you were aiming
> at was perhaps to replace may_defer with an equivalent test on
> v->monitor.suppress in the body of the function instead of simply
> erasing may_defer from everywhere.
Hmm - good point. This will break things even more. The
monitor.suppress check needs to be at this point, rather than later.
I'll see about wrapping the monitor checks up into some static inlines
which better model the intercepts they are built from, and check
suppress as the first action. That should resolve the issues here.
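A possible shape for such an inline (hypothetical names; the real patch may well differ): the predicate models the intercept it is built from, with the suppress test as its very first action, so no caller needs a may_defer argument and replies from introspection can never re-trigger events.

```c
#include <stdbool.h>

#define monitor_ctrlreg_bitmask(cr_index) (1u << (cr_index))

struct vcpu {
    struct { bool suppress; } monitor;
    struct {
        unsigned int write_ctrlreg_enabled;   /* per-domain in reality */
    } monitor_cfg;
};

/*
 * Sketch: predicate modelling the CR-write intercept, checking
 * suppress before the per-control-register enable bitmap.
 */
static inline bool monitored_cr_write(const struct vcpu *v, unsigned int cr)
{
    if ( v->monitor.suppress )
        return false;

    return v->monitor_cfg.write_ctrlreg_enabled & monitor_ctrlreg_bitmask(cr);
}
```

hvm_set_cr0/3/4() would then gate their vm_event path on this predicate alone, instead of on a caller-supplied flag.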
~Andrew
Thread overview: 8+ messages (newest: 2018-10-24 10:00 UTC)
2018-10-23 14:35 [PATCH 0/2] x86/monitor: More generic livelock avoidance Andrew Cooper
2018-10-23 14:35 ` [PATCH 1/2] x86/monitor: Introduce a boolean to suppress nested monitoring events Andrew Cooper
2018-10-23 14:54 ` Razvan Cojocaru
2018-10-23 15:08 ` Andrew Cooper
2018-10-23 15:27 ` Razvan Cojocaru
2018-10-23 14:35 ` [PATCH 2/2] x86/hvm: Drop the may_defer boolean from hvm_* helpers Andrew Cooper
2018-10-23 15:24 ` Razvan Cojocaru
2018-10-24 10:00 ` Andrew Cooper