All of lore.kernel.org
 help / color / mirror / Atom feed
* Vm_event memory introspection helpers
@ 2015-06-15  9:03 Razvan Cojocaru
  2015-06-15  9:03 ` [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding Razvan Cojocaru
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-15  9:03 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, eddie.dong, stefano.stabellini, jun.nakajima,
	andrew.cooper3, ian.jackson, tim, Aravind.Gopalakrishnan,
	jbeulich, wei.liu2, boris.ostrovsky, suravee.suthikulpanit,
	ian.campbell

Hello,

Since the first version of the series, which consisted of five patches,
the following major changes have happened:

1. The first patch in the previous series has already been accepted,
under the name "xen/vm_event: Clean up control-register-write vm_events
and add XCR0 event". The patch was "xen/vm_event: Added support for
XSETBV events" - the XCR0 is the previously-named XSETBV event.

2. It has been decided that having a single DENY vm_event flag that
applies to both CR and MSR writes is not an outlandish idea, and as
such the last patch (patch 5) from the previous series has been
removed, and the CR0, CR3 and CR4 write events are now being sent
as pre-write events, to be denied the same way as the MSR writes
are, in patch #3.

So the current version now went from 5 patches to 3:

[PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
[PATCH V2 2/3] xen/vm_event: Support for guest-requested events
[PATCH V2 3/3] xen/vm_event: Deny register writes if refused by
vm_event reply.


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
  2015-06-15  9:03 Vm_event memory introspection helpers Razvan Cojocaru
@ 2015-06-15  9:03 ` Razvan Cojocaru
  2015-06-25 15:57   ` Jan Beulich
  2015-06-15  9:03 ` [PATCH V2 2/3] xen/vm_event: Support for guest-requested events Razvan Cojocaru
  2015-06-15  9:03 ` [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply Razvan Cojocaru
  2 siblings, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-15  9:03 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, eddie.dong, Razvan Cojocaru,
	stefano.stabellini, jun.nakajima, andrew.cooper3, ian.jackson,
	tim, Aravind.Gopalakrishnan, jbeulich, wei.liu2, boris.ostrovsky,
	suravee.suthikulpanit, ian.campbell

This patch adds support for memory-content hiding, by modifying the
value returned by emulated instructions that read certain memory
addresses that contain sensitive data. The patch only applies to
cases where MEM_ACCESS_EMULATE or MEM_ACCESS_EMULATE_NOWRITE have
been set to a vm_event response.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

---
Changes since V1:
 - Using min() and max() where applicable.
 - Fixed memcpy() length in hvmemul_read_set_context().
 - "struct vcpu* curr" became "struct vcpu *curr".
 - Put "NORMAL" as the first case in the switch() in
   hvm_mem_access_emulate_one().
 - If ( unlikely(set_context) ) overwrite a safe chunk of the buffer
   instead of completely ignoring *reps and bytes in
   _hvmemul_rep_movs().
 - Added a comment to explain vm_event_emul_read_data.data's size
   and updated the header to no longer use a magic constant for the
   size.
 - A safe portion of the buffer is now overwritten with the provided
   bytes (no more lost bytes or ignored reps count).
 - Added handlers for hvmemul_cmpxchg(), hvmemul_read_io() and
   hvmemul_rep_outs().
 - Named the regs / emul_read_data union.
 - Modified xen-access.c to compile after naming the regs /
   emul_read_data union.
---
 tools/tests/xen-access/xen-access.c |    2 +-
 xen/arch/x86/domain.c               |    1 +
 xen/arch/x86/hvm/emulate.c          |  218 +++++++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/event.c            |   50 ++++----
 xen/arch/x86/mm/p2m.c               |   90 ++++++++-------
 xen/common/vm_event.c               |   23 ++++
 xen/include/asm-x86/domain.h        |    2 +
 xen/include/asm-x86/hvm/emulate.h   |   10 +-
 xen/include/public/vm_event.h       |   21 +++-
 9 files changed, 339 insertions(+), 78 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c
index 12ab921..e6ca9ba 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -530,7 +530,7 @@ int main(int argc, char *argv[])
                 break;
             case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
                 printf("Breakpoint: rip=%016"PRIx64", gfn=%"PRIx64" (vcpu %d)\n",
-                       req.regs.x86.rip,
+                       req.data.regs.x86.rip,
                        req.u.software_breakpoint.gfn,
                        req.vcpu_id);
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index db073a6..95eb190 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -269,6 +269,7 @@ struct vcpu *alloc_vcpu_struct(void)
 
 void free_vcpu_struct(struct vcpu *v)
 {
+    xfree(v->arch.vm_event.emul_read_data);
     free_xenheap_page(v);
 }
 
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index ac9c9d6..2f7081d 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -578,6 +578,28 @@ static int hvmemul_read(
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
 }
 
+static int hvmemul_read_set_context(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+    unsigned int len;
+
+    if ( !curr->arch.vm_event.emul_read_data )
+        return X86EMUL_UNHANDLEABLE;
+
+    len = min_t(unsigned int,
+        bytes, curr->arch.vm_event.emul_read_data->size);
+
+    if ( len )
+        memcpy(p_data, curr->arch.vm_event.emul_read_data->data, len);
+
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_insn_fetch(
     enum x86_segment seg,
     unsigned long offset,
@@ -815,6 +837,27 @@ static int hvmemul_cmpxchg(
     return hvmemul_write(seg, offset, p_new, bytes, ctxt);
 }
 
+static int hvmemul_cmpxchg_set_context(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+
+    if ( curr->arch.vm_event.emul_read_data )
+    {
+        unsigned int safe_bytes = min_t(unsigned int, bytes,
+                curr->arch.vm_event.emul_read_data->size);
+
+        memcpy(p_new, curr->arch.vm_event.emul_read_data->data, safe_bytes);
+    }
+
+    return hvmemul_write(seg, offset, p_new, bytes, ctxt);
+}
+
 static int hvmemul_rep_ins(
     uint16_t src_port,
     enum x86_segment dst_seg,
@@ -891,14 +934,37 @@ static int hvmemul_rep_outs(
                           !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
 }
 
-static int hvmemul_rep_movs(
+static int hvmemul_rep_outs_set_context(
+    enum x86_segment src_seg,
+    unsigned long src_offset,
+    uint16_t dst_port,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+    unsigned int safe_bytes;
+
+    if ( !curr->arch.vm_event.emul_read_data )
+        return X86EMUL_UNHANDLEABLE;
+
+    safe_bytes = min_t(unsigned int, bytes_per_rep,
+        curr->arch.vm_event.emul_read_data->size);
+
+    return hvmemul_do_pio(dst_port, reps, safe_bytes, 0, IOREQ_WRITE,
+                          !!(ctxt->regs->eflags & X86_EFLAGS_DF),
+                          curr->arch.vm_event.emul_read_data->data);
+}
+
+static int _hvmemul_rep_movs(
    enum x86_segment src_seg,
    unsigned long src_offset,
    enum x86_segment dst_seg,
    unsigned long dst_offset,
    unsigned int bytes_per_rep,
    unsigned long *reps,
-   struct x86_emulate_ctxt *ctxt)
+   struct x86_emulate_ctxt *ctxt,
+   bool_t set_context)
 {
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
@@ -981,7 +1047,19 @@ static int hvmemul_rep_movs(
      */
     rc = hvm_copy_from_guest_phys(buf, sgpa, bytes);
     if ( rc == HVMCOPY_okay )
+    {
+        struct vcpu *curr = current;
+
+        if ( unlikely(set_context) && curr->arch.vm_event.emul_read_data )
+        {
+            unsigned long safe_bytes = min_t(unsigned long, bytes,
+                curr->arch.vm_event.emul_read_data->size);
+
+            memcpy(buf, curr->arch.vm_event.emul_read_data->data, safe_bytes);
+        }
+
         rc = hvm_copy_to_guest_phys(dgpa, buf, bytes);
+    }
 
     xfree(buf);
 
@@ -1000,13 +1078,40 @@ static int hvmemul_rep_movs(
     return X86EMUL_OKAY;
 }
 
-static int hvmemul_rep_stos(
+static int hvmemul_rep_movs(
+   enum x86_segment src_seg,
+   unsigned long src_offset,
+   enum x86_segment dst_seg,
+   unsigned long dst_offset,
+   unsigned int bytes_per_rep,
+   unsigned long *reps,
+   struct x86_emulate_ctxt *ctxt)
+{
+    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
+                             bytes_per_rep, reps, ctxt, 0);
+}
+
+static int hvmemul_rep_movs_set_context(
+   enum x86_segment src_seg,
+   unsigned long src_offset,
+   enum x86_segment dst_seg,
+   unsigned long dst_offset,
+   unsigned int bytes_per_rep,
+   unsigned long *reps,
+   struct x86_emulate_ctxt *ctxt)
+{
+    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
+                             bytes_per_rep, reps, ctxt, 1);
+}
+
+static int _hvmemul_rep_stos(
     void *p_data,
     enum x86_segment seg,
     unsigned long offset,
     unsigned int bytes_per_rep,
     unsigned long *reps,
-    struct x86_emulate_ctxt *ctxt)
+    struct x86_emulate_ctxt *ctxt,
+    bool_t set_context)
 {
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
@@ -1016,6 +1121,7 @@ static int hvmemul_rep_stos(
     bool_t df = !!(ctxt->regs->eflags & X86_EFLAGS_DF);
     int rc = hvmemul_virtual_to_linear(seg, offset, bytes_per_rep, reps,
                                        hvm_access_write, hvmemul_ctxt, &addr);
+    struct vcpu *curr = current;
 
     if ( rc == X86EMUL_OKAY )
     {
@@ -1080,6 +1186,14 @@ static int hvmemul_rep_stos(
         if ( df )
             gpa -= bytes - bytes_per_rep;
 
+        if ( unlikely(set_context) && curr->arch.vm_event.emul_read_data )
+        {
+            unsigned long safe_bytes = min_t(unsigned long, bytes,
+                curr->arch.vm_event.emul_read_data->size);
+
+            memcpy(buf, curr->arch.vm_event.emul_read_data->data, safe_bytes);
+        }
+
         rc = hvm_copy_to_guest_phys(gpa, buf, bytes);
 
         if ( buf != p_data )
@@ -1107,6 +1221,31 @@ static int hvmemul_rep_stos(
     }
 }
 
+static int hvmemul_rep_stos(
+    void *p_data,
+    enum x86_segment seg,
+    unsigned long offset,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return _hvmemul_rep_stos(p_data, seg, offset, bytes_per_rep,
+                             reps, ctxt, 0);
+}
+
+
+static int hvmemul_rep_stos_set_context(
+    void *p_data,
+    enum x86_segment seg,
+    unsigned long offset,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return _hvmemul_rep_stos(p_data, seg, offset, bytes_per_rep,
+                             reps, ctxt, 1);
+}
+
 static int hvmemul_read_segment(
     enum x86_segment seg,
     struct segment_register *reg,
@@ -1145,6 +1284,28 @@ static int hvmemul_read_io(
     return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
 }
 
+static int hvmemul_read_io_set_context(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long *val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+    unsigned int safe_bytes;
+
+    *val = 0;
+
+    if ( !curr->arch.vm_event.emul_read_data )
+        return X86EMUL_UNHANDLEABLE;
+
+    safe_bytes = min_t(unsigned int, bytes,
+        curr->arch.vm_event.emul_read_data->size);
+
+    memcpy(val, curr->arch.vm_event.emul_read_data->data, safe_bytes);
+
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_write_io(
     unsigned int port,
     unsigned int bytes,
@@ -1408,6 +1569,32 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
     .invlpg        = hvmemul_invlpg
 };
 
+static const struct x86_emulate_ops hvm_emulate_ops_set_context = {
+    .read          = hvmemul_read_set_context,
+    .insn_fetch    = hvmemul_insn_fetch,
+    .write         = hvmemul_write,
+    .cmpxchg       = hvmemul_cmpxchg_set_context,
+    .rep_ins       = hvmemul_rep_ins,
+    .rep_outs      = hvmemul_rep_outs_set_context,
+    .rep_movs      = hvmemul_rep_movs_set_context,
+    .rep_stos      = hvmemul_rep_stos_set_context,
+    .read_segment  = hvmemul_read_segment,
+    .write_segment = hvmemul_write_segment,
+    .read_io       = hvmemul_read_io_set_context,
+    .write_io      = hvmemul_write_io,
+    .read_cr       = hvmemul_read_cr,
+    .write_cr      = hvmemul_write_cr,
+    .read_msr      = hvmemul_read_msr,
+    .write_msr     = hvmemul_write_msr,
+    .wbinvd        = hvmemul_wbinvd,
+    .cpuid         = hvmemul_cpuid,
+    .inject_hw_exception = hvmemul_inject_hw_exception,
+    .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
+    .get_fpu       = hvmemul_get_fpu,
+    .put_fpu       = hvmemul_put_fpu,
+    .invlpg        = hvmemul_invlpg
+};
+
 static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     const struct x86_emulate_ops *ops)
 {
@@ -1528,18 +1715,31 @@ int hvm_emulate_one_no_write(
     return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_no_write);
 }
 
-void hvm_mem_access_emulate_one(bool_t nowrite, unsigned int trapnr,
+int hvm_emulate_one_set_context(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_set_context);
+}
+
+void hvm_mem_access_emulate_one(enum emul_kind kind, unsigned int trapnr,
     unsigned int errcode)
 {
     struct hvm_emulate_ctxt ctx = {{ 0 }};
-    int rc;
+    int rc = X86EMUL_UNHANDLEABLE;
 
     hvm_emulate_prepare(&ctx, guest_cpu_user_regs());
 
-    if ( nowrite )
-        rc = hvm_emulate_one_no_write(&ctx);
-    else
+    switch ( kind ) {
+    case EMUL_KIND_NORMAL:
         rc = hvm_emulate_one(&ctx);
+        break;
+    case EMUL_KIND_NOWRITE:
+        rc = hvm_emulate_one_no_write(&ctx);
+        break;
+    case EMUL_KIND_SET_CONTEXT:
+        rc = hvm_emulate_one_set_context(&ctx);
+        break;
+    }
 
     switch ( rc )
     {
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 53b9ca4..5341937 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -30,31 +30,31 @@ static void hvm_event_fill_regs(vm_event_request_t *req)
     const struct cpu_user_regs *regs = guest_cpu_user_regs();
     const struct vcpu *curr = current;
 
-    req->regs.x86.rax = regs->eax;
-    req->regs.x86.rcx = regs->ecx;
-    req->regs.x86.rdx = regs->edx;
-    req->regs.x86.rbx = regs->ebx;
-    req->regs.x86.rsp = regs->esp;
-    req->regs.x86.rbp = regs->ebp;
-    req->regs.x86.rsi = regs->esi;
-    req->regs.x86.rdi = regs->edi;
-
-    req->regs.x86.r8  = regs->r8;
-    req->regs.x86.r9  = regs->r9;
-    req->regs.x86.r10 = regs->r10;
-    req->regs.x86.r11 = regs->r11;
-    req->regs.x86.r12 = regs->r12;
-    req->regs.x86.r13 = regs->r13;
-    req->regs.x86.r14 = regs->r14;
-    req->regs.x86.r15 = regs->r15;
-
-    req->regs.x86.rflags = regs->eflags;
-    req->regs.x86.rip    = regs->eip;
-
-    req->regs.x86.msr_efer = curr->arch.hvm_vcpu.guest_efer;
-    req->regs.x86.cr0 = curr->arch.hvm_vcpu.guest_cr[0];
-    req->regs.x86.cr3 = curr->arch.hvm_vcpu.guest_cr[3];
-    req->regs.x86.cr4 = curr->arch.hvm_vcpu.guest_cr[4];
+    req->data.regs.x86.rax = regs->eax;
+    req->data.regs.x86.rcx = regs->ecx;
+    req->data.regs.x86.rdx = regs->edx;
+    req->data.regs.x86.rbx = regs->ebx;
+    req->data.regs.x86.rsp = regs->esp;
+    req->data.regs.x86.rbp = regs->ebp;
+    req->data.regs.x86.rsi = regs->esi;
+    req->data.regs.x86.rdi = regs->edi;
+
+    req->data.regs.x86.r8  = regs->r8;
+    req->data.regs.x86.r9  = regs->r9;
+    req->data.regs.x86.r10 = regs->r10;
+    req->data.regs.x86.r11 = regs->r11;
+    req->data.regs.x86.r12 = regs->r12;
+    req->data.regs.x86.r13 = regs->r13;
+    req->data.regs.x86.r14 = regs->r14;
+    req->data.regs.x86.r15 = regs->r15;
+
+    req->data.regs.x86.rflags = regs->eflags;
+    req->data.regs.x86.rip    = regs->eip;
+
+    req->data.regs.x86.msr_efer = curr->arch.hvm_vcpu.guest_efer;
+    req->data.regs.x86.cr0 = curr->arch.hvm_vcpu.guest_cr[0];
+    req->data.regs.x86.cr3 = curr->arch.hvm_vcpu.guest_cr[3];
+    req->data.regs.x86.cr4 = curr->arch.hvm_vcpu.guest_cr[4];
 }
 
 static int hvm_event_traps(uint8_t sync, vm_event_request_t *req)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1fd1194..0ab74bc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1369,49 +1369,49 @@ static void p2m_vm_event_fill_regs(vm_event_request_t *req)
     /* Architecture-specific vmcs/vmcb bits */
     hvm_funcs.save_cpu_ctxt(curr, &ctxt);
 
-    req->regs.x86.rax = regs->eax;
-    req->regs.x86.rcx = regs->ecx;
-    req->regs.x86.rdx = regs->edx;
-    req->regs.x86.rbx = regs->ebx;
-    req->regs.x86.rsp = regs->esp;
-    req->regs.x86.rbp = regs->ebp;
-    req->regs.x86.rsi = regs->esi;
-    req->regs.x86.rdi = regs->edi;
-
-    req->regs.x86.r8  = regs->r8;
-    req->regs.x86.r9  = regs->r9;
-    req->regs.x86.r10 = regs->r10;
-    req->regs.x86.r11 = regs->r11;
-    req->regs.x86.r12 = regs->r12;
-    req->regs.x86.r13 = regs->r13;
-    req->regs.x86.r14 = regs->r14;
-    req->regs.x86.r15 = regs->r15;
-
-    req->regs.x86.rflags = regs->eflags;
-    req->regs.x86.rip    = regs->eip;
-
-    req->regs.x86.dr7 = curr->arch.debugreg[7];
-    req->regs.x86.cr0 = ctxt.cr0;
-    req->regs.x86.cr2 = ctxt.cr2;
-    req->regs.x86.cr3 = ctxt.cr3;
-    req->regs.x86.cr4 = ctxt.cr4;
-
-    req->regs.x86.sysenter_cs = ctxt.sysenter_cs;
-    req->regs.x86.sysenter_esp = ctxt.sysenter_esp;
-    req->regs.x86.sysenter_eip = ctxt.sysenter_eip;
-
-    req->regs.x86.msr_efer = ctxt.msr_efer;
-    req->regs.x86.msr_star = ctxt.msr_star;
-    req->regs.x86.msr_lstar = ctxt.msr_lstar;
+    req->data.regs.x86.rax = regs->eax;
+    req->data.regs.x86.rcx = regs->ecx;
+    req->data.regs.x86.rdx = regs->edx;
+    req->data.regs.x86.rbx = regs->ebx;
+    req->data.regs.x86.rsp = regs->esp;
+    req->data.regs.x86.rbp = regs->ebp;
+    req->data.regs.x86.rsi = regs->esi;
+    req->data.regs.x86.rdi = regs->edi;
+
+    req->data.regs.x86.r8  = regs->r8;
+    req->data.regs.x86.r9  = regs->r9;
+    req->data.regs.x86.r10 = regs->r10;
+    req->data.regs.x86.r11 = regs->r11;
+    req->data.regs.x86.r12 = regs->r12;
+    req->data.regs.x86.r13 = regs->r13;
+    req->data.regs.x86.r14 = regs->r14;
+    req->data.regs.x86.r15 = regs->r15;
+
+    req->data.regs.x86.rflags = regs->eflags;
+    req->data.regs.x86.rip    = regs->eip;
+
+    req->data.regs.x86.dr7 = curr->arch.debugreg[7];
+    req->data.regs.x86.cr0 = ctxt.cr0;
+    req->data.regs.x86.cr2 = ctxt.cr2;
+    req->data.regs.x86.cr3 = ctxt.cr3;
+    req->data.regs.x86.cr4 = ctxt.cr4;
+
+    req->data.regs.x86.sysenter_cs = ctxt.sysenter_cs;
+    req->data.regs.x86.sysenter_esp = ctxt.sysenter_esp;
+    req->data.regs.x86.sysenter_eip = ctxt.sysenter_eip;
+
+    req->data.regs.x86.msr_efer = ctxt.msr_efer;
+    req->data.regs.x86.msr_star = ctxt.msr_star;
+    req->data.regs.x86.msr_lstar = ctxt.msr_lstar;
 
     hvm_get_segment_register(curr, x86_seg_fs, &seg);
-    req->regs.x86.fs_base = seg.base;
+    req->data.regs.x86.fs_base = seg.base;
 
     hvm_get_segment_register(curr, x86_seg_gs, &seg);
-    req->regs.x86.gs_base = seg.base;
+    req->data.regs.x86.gs_base = seg.base;
 
     hvm_get_segment_register(curr, x86_seg_cs, &seg);
-    req->regs.x86.cs_arbytes = seg.attr.bytes;
+    req->data.regs.x86.cs_arbytes = seg.attr.bytes;
 }
 
 void p2m_mem_access_emulate_check(struct vcpu *v,
@@ -1466,6 +1466,10 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
         }
 
         v->arch.vm_event.emulate_flags = violation ? rsp->flags : 0;
+
+        if ( rsp->flags & MEM_ACCESS_SET_EMUL_READ_DATA &&
+             v->arch.vm_event.emul_read_data )
+            *v->arch.vm_event.emul_read_data = rsp->data.emul_read_data;
     }
 }
 
@@ -1552,9 +1556,15 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
 
     if ( v->arch.vm_event.emulate_flags )
     {
-        hvm_mem_access_emulate_one((v->arch.vm_event.emulate_flags &
-                                    MEM_ACCESS_EMULATE_NOWRITE) != 0,
-                                   TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        enum emul_kind kind = EMUL_KIND_NORMAL;
+
+        if ( v->arch.vm_event.emulate_flags & MEM_ACCESS_SET_EMUL_READ_DATA )
+            kind = EMUL_KIND_SET_CONTEXT;
+        else if ( v->arch.vm_event.emulate_flags & MEM_ACCESS_EMULATE_NOWRITE )
+            kind = EMUL_KIND_NOWRITE;
+
+        hvm_mem_access_emulate_one(kind, TRAP_invalid_op,
+                                   HVM_DELIVER_NO_ERROR_CODE);
 
         v->arch.vm_event.emulate_flags = 0;
         return 1;
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 120a78a..d635b36 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -48,6 +48,7 @@ static int vm_event_enable(
 {
     int rc;
     unsigned long ring_gfn = d->arch.hvm_domain.params[param];
+    struct vcpu *v;
 
     /* Only one helper at a time. If the helper crashed,
      * the ring is in an undefined state and so is the guest.
@@ -63,6 +64,21 @@ static int vm_event_enable(
     vm_event_ring_lock_init(ved);
     vm_event_ring_lock(ved);
 
+    for_each_vcpu( d, v )
+    {
+        if ( v->arch.vm_event.emul_read_data )
+            break;
+
+        v->arch.vm_event.emul_read_data =
+            xmalloc(struct vm_event_emul_read_data);
+
+        if ( !v->arch.vm_event.emul_read_data )
+        {
+            rc = -ENOMEM;
+            goto err;
+        }
+    }
+
     rc = prepare_ring_for_helper(d, ring_gfn, &ved->ring_pg_struct,
                                     &ved->ring_page);
     if ( rc < 0 )
@@ -225,6 +241,13 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain *ved)
 
         destroy_ring_for_helper(&ved->ring_page,
                                 ved->ring_pg_struct);
+
+        for_each_vcpu( d, v )
+        {
+            xfree(v->arch.vm_event.emul_read_data);
+            v->arch.vm_event.emul_read_data = NULL;
+        }
+
         vm_event_ring_unlock(ved);
     }
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a3c117f..24011d5 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -10,6 +10,7 @@
 #include <asm/mce.h>
 #include <public/vcpu.h>
 #include <public/hvm/hvm_info_table.h>
+#include <public/vm_event.h>
 
 #define has_32bit_shinfo(d)    ((d)->arch.has_32bit_shinfo)
 #define is_pv_32bit_domain(d)  ((d)->arch.is_32bit_pv)
@@ -504,6 +505,7 @@ struct arch_vcpu
         uint32_t emulate_flags;
         unsigned long gpa;
         unsigned long eip;
+        struct vm_event_emul_read_data *emul_read_data;
     } vm_event;
 
 };
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index b3971c8..65ccfd8 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -34,11 +34,19 @@ struct hvm_emulate_ctxt {
     uint32_t intr_shadow;
 };
 
+enum emul_kind {
+    EMUL_KIND_NORMAL,
+    EMUL_KIND_NOWRITE,
+    EMUL_KIND_SET_CONTEXT
+};
+
 int hvm_emulate_one(
     struct hvm_emulate_ctxt *hvmemul_ctxt);
 int hvm_emulate_one_no_write(
     struct hvm_emulate_ctxt *hvmemul_ctxt);
-void hvm_mem_access_emulate_one(bool_t nowrite,
+int hvm_emulate_one_set_context(
+    struct hvm_emulate_ctxt *hvmemul_ctxt);
+void hvm_mem_access_emulate_one(enum emul_kind kind,
     unsigned int trapnr,
     unsigned int errcode);
 void hvm_emulate_prepare(
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 577e971..3223bb4 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -149,6 +149,13 @@ struct vm_event_regs_x86 {
  * potentially having side effects (like memory mapped or port I/O) disabled.
  */
 #define MEM_ACCESS_EMULATE_NOWRITE      (1 << 7)
+/*
+ * Data is being sent back to the hypervisor in the event response, to be
+ * returned by the read function when emulating an instruction.
+ * This flag is only useful when combined with MEM_ACCESS_EMULATE or
+ * MEM_ACCESS_EMULATE_NOWRITE.
+ */
+#define MEM_ACCESS_SET_EMUL_READ_DATA   (1 << 8)
 
 struct vm_event_mem_access {
     uint64_t gfn;
@@ -189,6 +196,12 @@ struct vm_event_sharing {
     uint32_t _pad;
 };
 
+struct vm_event_emul_read_data {
+    uint32_t size;
+    /* The struct is used in a union with vm_event_regs_x86. */
+    uint8_t  data[sizeof(struct vm_event_regs_x86) - sizeof(uint32_t)];
+};
+
 typedef struct vm_event_st {
     uint32_t version;   /* VM_EVENT_INTERFACE_VERSION */
     uint32_t flags;     /* VM_EVENT_FLAG_* */
@@ -206,8 +219,12 @@ typedef struct vm_event_st {
     } u;
 
     union {
-        struct vm_event_regs_x86 x86;
-    } regs;
+        union {
+            struct vm_event_regs_x86 x86;
+        } regs;
+
+        struct vm_event_emul_read_data emul_read_data;
+    } data;
 } vm_event_request_t, vm_event_response_t;
 
 DEFINE_RING_TYPES(vm_event, vm_event_request_t, vm_event_response_t);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-15  9:03 Vm_event memory introspection helpers Razvan Cojocaru
  2015-06-15  9:03 ` [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding Razvan Cojocaru
@ 2015-06-15  9:03 ` Razvan Cojocaru
  2015-06-24 14:56   ` Razvan Cojocaru
  2015-06-26  7:02   ` Jan Beulich
  2015-06-15  9:03 ` [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply Razvan Cojocaru
  2 siblings, 2 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-15  9:03 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, eddie.dong, Razvan Cojocaru,
	stefano.stabellini, jun.nakajima, andrew.cooper3, ian.jackson,
	tim, Aravind.Gopalakrishnan, jbeulich, wei.liu2, boris.ostrovsky,
	suravee.suthikulpanit, ian.campbell

Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
sent via HVMOP_request_vm_event. The guest can request that a
generic vm_event (containing only the vm_event-filled guest registers
as information) be sent to userspace by setting up the correct
registers and doing a VMCALL. For example, for a 64-bit guest, this
means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

---
Changes since V1:
 - Using "curr" consistently in hvm_event_requested().
 - Renamed VM_EVENT_REASON_REQUEST to VM_EVENT_REASON_GUEST_REQUEST.
---
 tools/libxc/include/xenctrl.h   |    2 ++
 tools/libxc/xc_monitor.c        |   15 +++++++++++++++
 xen/arch/x86/hvm/event.c        |   14 ++++++++++++++
 xen/arch/x86/hvm/hvm.c          |    4 ++++
 xen/arch/x86/monitor.c          |   16 ++++++++++++++++
 xen/include/asm-x86/domain.h    |   16 +++++++++-------
 xen/include/asm-x86/hvm/event.h |    1 +
 xen/include/public/domctl.h     |    6 ++++++
 xen/include/public/hvm/hvm_op.h |    4 ++++
 xen/include/public/vm_event.h   |    2 ++
 10 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 50fa9e7..ffedeee 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2383,6 +2383,8 @@ int xc_monitor_mov_to_msr(xc_interface *xch, domid_t domain_id, bool enable,
 int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable);
 int xc_monitor_software_breakpoint(xc_interface *xch, domid_t domain_id,
                                    bool enable);
+int xc_monitor_requested(xc_interface *xch, domid_t domain_id, bool enable,
+                         bool sync);
 
 /***
  * Memory sharing operations.
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index 63013de..545d427 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -105,3 +105,18 @@ int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id,
 
     return do_domctl(xch, &domctl);
 }
+
+int xc_monitor_requested(xc_interface *xch, domid_t domain_id, bool enable,
+                         bool sync)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_monitor_op;
+    domctl.domain = domain_id;
+    domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+                                    : XEN_DOMCTL_MONITOR_OP_DISABLE;
+    domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST;
+    domctl.u.monitor_op.u.request.sync = sync;
+
+    return do_domctl(xch, &domctl);
+}
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 5341937..5df0e31 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -126,6 +126,20 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
         hvm_event_traps(1, &req);
 }
 
+void hvm_event_requested(void)
+{
+    struct vcpu *curr = current;
+    struct arch_domain *currad = &curr->domain->arch;
+
+    vm_event_request_t req = {
+        .reason = VM_EVENT_REASON_GUEST_REQUEST,
+        .vcpu_id = curr->vcpu_id,
+    };
+
+    if ( currad->monitor.request_enabled )
+        hvm_event_traps(currad->monitor.request_sync, &req);
+}
+
 int hvm_event_int3(unsigned long gla)
 {
     int rc = 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b1023bb..02a3df8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6373,6 +6373,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case HVMOP_request_vm_event:
+        hvm_event_requested();
+        break;
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 896acf7..11c66bd 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -161,6 +161,22 @@ int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop)
         break;
     }
 
+    case XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST:
+    {
+        bool_t status = ad->monitor.request_enabled;
+
+        rc = status_check(mop, status);
+        if ( rc )
+            return rc;
+
+        ad->monitor.request_sync = mop->u.request.sync;
+
+        domain_pause(d);
+        ad->monitor.request_enabled = !status;
+        domain_unpause(d);
+        break;
+    }
+
     default:
         return -EOPNOTSUPP;
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 24011d5..a9af5f4 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -342,13 +342,15 @@ struct arch_domain
 
     /* Monitor options */
     struct {
-        uint16_t write_ctrlreg_enabled       : 4;
-        uint16_t write_ctrlreg_sync          : 4;
-        uint16_t write_ctrlreg_onchangeonly  : 4;
-        uint16_t mov_to_msr_enabled          : 1;
-        uint16_t mov_to_msr_extended         : 1;
-        uint16_t singlestep_enabled          : 1;
-        uint16_t software_breakpoint_enabled : 1;
+        uint32_t write_ctrlreg_enabled       : 4;
+        uint32_t write_ctrlreg_sync          : 4;
+        uint32_t write_ctrlreg_onchangeonly  : 4;
+        uint32_t mov_to_msr_enabled          : 1;
+        uint32_t mov_to_msr_extended         : 1;
+        uint32_t singlestep_enabled          : 1;
+        uint32_t software_breakpoint_enabled : 1;
+        uint32_t request_enabled             : 1;
+        uint32_t request_sync                : 1;
     } monitor;
 
     /* Mem_access emulation control */
diff --git a/xen/include/asm-x86/hvm/event.h b/xen/include/asm-x86/hvm/event.h
index 819ef96..8d93f6a 100644
--- a/xen/include/asm-x86/hvm/event.h
+++ b/xen/include/asm-x86/hvm/event.h
@@ -26,6 +26,7 @@ void hvm_event_msr(unsigned int msr, uint64_t value);
 /* Called for current VCPU: returns -1 if no listener */
 int hvm_event_int3(unsigned long gla);
 int hvm_event_single_step(unsigned long gla);
+void hvm_event_requested(void);
 
 #endif /* __ASM_X86_HVM_EVENT_H__ */
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..f75af87 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1012,6 +1012,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cmt_op_t);
 #define XEN_DOMCTL_MONITOR_EVENT_MOV_TO_MSR            1
 #define XEN_DOMCTL_MONITOR_EVENT_SINGLESTEP            2
 #define XEN_DOMCTL_MONITOR_EVENT_SOFTWARE_BREAKPOINT   3
+#define XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST         4
 
 struct xen_domctl_monitor_op {
     uint32_t op; /* XEN_DOMCTL_MONITOR_OP_* */
@@ -1034,6 +1035,11 @@ struct xen_domctl_monitor_op {
             /* Enable the capture of an extended set of MSRs */
             uint8_t extended_capture;
         } mov_to_msr;
+
+        struct {
+            /* Pause vCPU until response */
+            uint8_t sync;
+        } request;
     } u;
 };
 typedef struct xen_domctl_monitor_op xen_domctl_monitor_op_t;
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index cde3571..cb5168a 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -389,6 +389,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
 
 #endif /* defined(__i386__) || defined(__x86_64__) */
 
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+#define HVMOP_request_vm_event 24
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
 
 /*
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 3223bb4..8c39549 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -68,6 +68,8 @@
 #define VM_EVENT_REASON_SOFTWARE_BREAKPOINT     6
 /* Single-step (e.g. MTF) */
 #define VM_EVENT_REASON_SINGLESTEP              7
+/* An event has been requested via HVMOP_request_vm_event. */
+#define VM_EVENT_REASON_GUEST_REQUEST           8
 
 /* Supported values for the vm_event_write_ctrlreg index. */
 #define VM_EVENT_X86_CR0    0
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
  2015-06-15  9:03 Vm_event memory introspection helpers Razvan Cojocaru
  2015-06-15  9:03 ` [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding Razvan Cojocaru
  2015-06-15  9:03 ` [PATCH V2 2/3] xen/vm_event: Support for guest-requested events Razvan Cojocaru
@ 2015-06-15  9:03 ` Razvan Cojocaru
  2015-06-26  8:28   ` Jan Beulich
  2 siblings, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-15  9:03 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, eddie.dong, Razvan Cojocaru,
	stefano.stabellini, jun.nakajima, andrew.cooper3, ian.jackson,
	tim, Aravind.Gopalakrishnan, jbeulich, wei.liu2, boris.ostrovsky,
	suravee.suthikulpanit, ian.campbell

Deny register writes if a vm_client subscribed to mov_to_msr events
forbids them. Currently supported for MSR, CR0, CR3 and CR4 events.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

---
Changes since V1:
 - Renamed MEM_ACCESS_SKIP_MSR_WRITE to MEM_ACCESS_DENY, to be used
   as a generic deny flag for whatever write operation triggered the
   event.
 - Write events for CR0, CR3 and CR4 are now pre-write.
---
 xen/arch/x86/domain.c             |    2 +
 xen/arch/x86/hvm/emulate.c        |    8 +--
 xen/arch/x86/hvm/event.c          |    5 +-
 xen/arch/x86/hvm/hvm.c            |  120 ++++++++++++++++++++++++++++++++-----
 xen/arch/x86/hvm/svm/nestedsvm.c  |   14 ++---
 xen/arch/x86/hvm/svm/svm.c        |    2 +-
 xen/arch/x86/hvm/vmx/vmx.c        |    6 +-
 xen/arch/x86/hvm/vmx/vvmx.c       |   18 +++---
 xen/arch/x86/mm/p2m.c             |   29 +++++++++
 xen/common/vm_event.c             |    6 ++
 xen/include/asm-x86/domain.h      |   18 +++++-
 xen/include/asm-x86/hvm/event.h   |    3 +-
 xen/include/asm-x86/hvm/support.h |    9 +--
 xen/include/public/vm_event.h     |    5 ++
 14 files changed, 201 insertions(+), 44 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 95eb190..3b706b0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -659,6 +659,8 @@ void arch_domain_destroy(struct domain *d)
     cleanup_domain_irq_mapping(d);
 
     psr_free_rmid(d);
+
+    xfree(d->arch.event_write_data);
 }
 
 void arch_domain_shutdown(struct domain *d)
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2f7081d..f59bc7f 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1346,14 +1346,14 @@ static int hvmemul_write_cr(
     switch ( reg )
     {
     case 0:
-        return hvm_set_cr0(val);
+        return hvm_set_cr0(val, 1);
     case 2:
         current->arch.hvm_vcpu.guest_cr[2] = val;
         return X86EMUL_OKAY;
     case 3:
-        return hvm_set_cr3(val);
+        return hvm_set_cr3(val, 1);
     case 4:
-        return hvm_set_cr4(val);
+        return hvm_set_cr4(val, 1);
     default:
         break;
     }
@@ -1374,7 +1374,7 @@ static int hvmemul_write_msr(
     uint64_t val,
     struct x86_emulate_ctxt *ctxt)
 {
-    return hvm_msr_write_intercept(reg, val);
+    return hvm_msr_write_intercept(reg, val, 1);
 }
 
 static int hvmemul_wbinvd(
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 5df0e31..9ac8583 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t *req)
     return 1;
 }
 
-void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
+bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 {
     struct arch_domain *currad = &current->domain->arch;
     unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
@@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 
         hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
                         &req);
+        return 1;
     }
+
+    return 0;
 }
 
 void hvm_event_msr(unsigned int msr, uint64_t value)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 02a3df8..32ebfe1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -52,6 +52,7 @@
 #include <asm/traps.h>
 #include <asm/mc146818rtc.h>
 #include <asm/mce.h>
+#include <asm/monitor.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/support.h>
@@ -468,6 +469,37 @@ void hvm_do_resume(struct vcpu *v)
         }
     }
 
+    ASSERT(v == current);
+
+    if ( d->arch.event_write_data )
+    {
+        struct monitor_write_data *w = &d->arch.event_write_data[v->vcpu_id];
+
+        if ( w->do_write.msr )
+        {
+            hvm_msr_write_intercept(w->msr, w->value, 0);
+            w->do_write.msr = 0;
+        }
+
+        if ( w->do_write.cr0 )
+        {
+            hvm_set_cr0(w->cr0, 0);
+            w->do_write.cr0 = 0;
+        }
+
+        if ( w->do_write.cr3 )
+        {
+            hvm_set_cr3(w->cr3, 0);
+            w->do_write.cr3 = 0;
+        }
+
+        if ( w->do_write.cr4 )
+        {
+            hvm_set_cr4(w->cr4, 0);
+            w->do_write.cr4 = 0;
+        }
+    }
+
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -3086,13 +3118,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
     switch ( cr )
     {
     case 0:
-        return hvm_set_cr0(val);
+        return hvm_set_cr0(val, 1);
 
     case 3:
-        return hvm_set_cr3(val);
+        return hvm_set_cr3(val, 1);
 
     case 4:
-        return hvm_set_cr4(val);
+        return hvm_set_cr4(val, 1);
 
     case 8:
         vlapic_set_reg(vcpu_vlapic(curr), APIC_TASKPRI, ((val & 0x0f) << 4));
@@ -3189,12 +3221,13 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
     hvm_update_guest_cr(v, cr);
 }
 
-int hvm_set_cr0(unsigned long value)
+int hvm_set_cr0(unsigned long value, bool_t event_only)
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
     unsigned long gfn, old_value = v->arch.hvm_vcpu.guest_cr[0];
     struct page_info *page;
+    struct arch_domain *currad = &v->domain->arch;
 
     HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR0 value = %lx", value);
 
@@ -3224,6 +3257,22 @@ int hvm_set_cr0(unsigned long value)
         goto gpf;
     }
 
+    if ( event_only && unlikely(currad->monitor.write_ctrlreg_enabled &
+                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) &&
+                                value != old_value )
+    {
+        ASSERT(currad->event_write_data != NULL);
+
+        if ( hvm_event_crX(CR0, value, old_value) )
+        {
+            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            currad->event_write_data[v->vcpu_id].do_write.cr0 = 1;
+            currad->event_write_data[v->vcpu_id].cr0 = value;
+
+            return X86EMUL_OKAY;
+        }
+    }
+
     if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )
     {
         if ( v->arch.hvm_vcpu.guest_efer & EFER_LME )
@@ -3290,7 +3339,6 @@ int hvm_set_cr0(unsigned long value)
         hvm_funcs.handle_cd(v, value);
 
     hvm_update_cr(v, 0, value);
-    hvm_event_crX(CR0, value, old_value);
 
     if ( (value ^ old_value) & X86_CR0_PG ) {
         if ( !nestedhvm_vmswitch_in_progress(v) && nestedhvm_vcpu_in_guestmode(v) )
@@ -3306,11 +3354,28 @@ int hvm_set_cr0(unsigned long value)
     return X86EMUL_EXCEPTION;
 }
 
-int hvm_set_cr3(unsigned long value)
+int hvm_set_cr3(unsigned long value, bool_t event_only)
 {
     struct vcpu *v = current;
     struct page_info *page;
-    unsigned long old;
+    unsigned long old = v->arch.hvm_vcpu.guest_cr[3];
+    struct arch_domain *currad = &v->domain->arch;
+
+    if ( event_only && unlikely(currad->monitor.write_ctrlreg_enabled &
+                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) &&
+                                value != old )
+    {
+        ASSERT(currad->event_write_data != NULL);
+
+        if ( hvm_event_crX(CR3, value, old) )
+        {
+            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            currad->event_write_data[v->vcpu_id].do_write.cr3 = 1;
+            currad->event_write_data[v->vcpu_id].cr3 = value;
+
+            return X86EMUL_OKAY;
+        }
+    }
 
     if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) &&
          (value != v->arch.hvm_vcpu.guest_cr[3]) )
@@ -3328,10 +3393,8 @@ int hvm_set_cr3(unsigned long value)
         HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx", value);
     }
 
-    old=v->arch.hvm_vcpu.guest_cr[3];
     v->arch.hvm_vcpu.guest_cr[3] = value;
     paging_update_cr3(v);
-    hvm_event_crX(CR3, value, old);
     return X86EMUL_OKAY;
 
  bad_cr3:
@@ -3340,10 +3403,11 @@ int hvm_set_cr3(unsigned long value)
     return X86EMUL_UNHANDLEABLE;
 }
 
-int hvm_set_cr4(unsigned long value)
+int hvm_set_cr4(unsigned long value, bool_t event_only)
 {
     struct vcpu *v = current;
     unsigned long old_cr;
+    struct arch_domain *currad = &v->domain->arch;
 
     if ( value & hvm_cr4_guest_reserved_bits(v, 0) )
     {
@@ -3371,8 +3435,23 @@ int hvm_set_cr4(unsigned long value)
         goto gpf;
     }
 
+    if ( event_only && unlikely(currad->monitor.write_ctrlreg_enabled &
+                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) &&
+                                value != old_cr )
+    {
+        ASSERT(currad->event_write_data != NULL);
+
+        if ( hvm_event_crX(CR4, value, old_cr) )
+        {
+            /* The actual write will occur in hvm_do_resume(), if permitted. */
+            currad->event_write_data[v->vcpu_id].do_write.cr4 = 1;
+            currad->event_write_data[v->vcpu_id].cr4 = value;
+
+            return X86EMUL_OKAY;
+        }
+    }
+
     hvm_update_cr(v, 4, value);
-    hvm_event_crX(CR4, value, old_cr);
 
     /*
      * Modifying CR4.{PSE,PAE,PGE,SMEP}, or clearing CR4.PCIDE
@@ -3836,7 +3915,7 @@ void hvm_task_switch(
         goto out;
 
 
-    if ( hvm_set_cr3(tss.cr3) )
+    if ( hvm_set_cr3(tss.cr3, 1) )
         goto out;
 
     regs->eip    = tss.eip;
@@ -4538,12 +4617,14 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     goto out;
 }
 
-int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
+                            bool_t event_only)
 {
     struct vcpu *v = current;
     bool_t mtrr;
     unsigned int edx, index;
     int ret = X86EMUL_OKAY;
+    struct arch_domain *currad = &current->domain->arch;
 
     HVMTRACE_3D(MSR_WRITE, msr,
                (uint32_t)msr_content, (uint32_t)(msr_content >> 32));
@@ -4551,7 +4632,18 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     hvm_cpuid(1, NULL, NULL, NULL, &edx);
     mtrr = !!(edx & cpufeat_mask(X86_FEATURE_MTRR));
 
-    hvm_event_msr(msr, msr_content);
+    if ( event_only && unlikely(currad->monitor.mov_to_msr_enabled) )
+    {
+        ASSERT(currad->event_write_data != NULL);
+
+        /* The actual write will occur in hvm_do_resume() (if permitted). */
+        currad->event_write_data[v->vcpu_id].do_write.msr = 1;
+        currad->event_write_data[v->vcpu_id].msr = msr;
+        currad->event_write_data[v->vcpu_id].value = msr_content;
+
+        hvm_event_msr(msr, msr_content);
+        return X86EMUL_OKAY;
+    }
 
     switch ( msr )
     {
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index be5797a..07e3cad 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -274,7 +274,7 @@ int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
 
     /* CR4 */
     v->arch.hvm_vcpu.guest_cr[4] = n1vmcb->_cr4;
-    rc = hvm_set_cr4(n1vmcb->_cr4);
+    rc = hvm_set_cr4(n1vmcb->_cr4, 1);
     if (rc != X86EMUL_OKAY)
         gdprintk(XENLOG_ERR, "hvm_set_cr4 failed, rc: %u\n", rc);
 
@@ -283,7 +283,7 @@ int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
         svm->ns_cr0, v->arch.hvm_vcpu.guest_cr[0]);
     v->arch.hvm_vcpu.guest_cr[0] = n1vmcb->_cr0 | X86_CR0_PE;
     n1vmcb->rflags &= ~X86_EFLAGS_VM;
-    rc = hvm_set_cr0(n1vmcb->_cr0 | X86_CR0_PE);
+    rc = hvm_set_cr0(n1vmcb->_cr0 | X86_CR0_PE, 1);
     if (rc != X86EMUL_OKAY)
         gdprintk(XENLOG_ERR, "hvm_set_cr0 failed, rc: %u\n", rc);
     svm->ns_cr0 = v->arch.hvm_vcpu.guest_cr[0];
@@ -309,7 +309,7 @@ int nsvm_vcpu_hostrestore(struct vcpu *v, struct cpu_user_regs *regs)
         v->arch.guest_table = pagetable_null();
         /* hvm_set_cr3() below sets v->arch.hvm_vcpu.guest_cr[3] for us. */
     }
-    rc = hvm_set_cr3(n1vmcb->_cr3);
+    rc = hvm_set_cr3(n1vmcb->_cr3, 1);
     if (rc != X86EMUL_OKAY)
         gdprintk(XENLOG_ERR, "hvm_set_cr3 failed, rc: %u\n", rc);
 
@@ -534,7 +534,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
 
     /* CR4 */
     v->arch.hvm_vcpu.guest_cr[4] = ns_vmcb->_cr4;
-    rc = hvm_set_cr4(ns_vmcb->_cr4);
+    rc = hvm_set_cr4(ns_vmcb->_cr4, 1);
     if (rc != X86EMUL_OKAY)
         gdprintk(XENLOG_ERR, "hvm_set_cr4 failed, rc: %u\n", rc);
 
@@ -542,7 +542,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
     svm->ns_cr0 = v->arch.hvm_vcpu.guest_cr[0];
     cr0 = nestedsvm_fpu_vmentry(svm->ns_cr0, ns_vmcb, n1vmcb, n2vmcb);
     v->arch.hvm_vcpu.guest_cr[0] = ns_vmcb->_cr0;
-    rc = hvm_set_cr0(cr0);
+    rc = hvm_set_cr0(cr0, 1);
     if (rc != X86EMUL_OKAY)
         gdprintk(XENLOG_ERR, "hvm_set_cr0 failed, rc: %u\n", rc);
 
@@ -558,7 +558,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
         nestedsvm_vmcb_set_nestedp2m(v, ns_vmcb, n2vmcb);
 
         /* hvm_set_cr3() below sets v->arch.hvm_vcpu.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3);
+        rc = hvm_set_cr3(ns_vmcb->_cr3, 1);
         if (rc != X86EMUL_OKAY)
             gdprintk(XENLOG_ERR, "hvm_set_cr3 failed, rc: %u\n", rc);
     } else if (paging_mode_hap(v->domain)) {
@@ -570,7 +570,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs)
          * we assume it intercepts page faults.
          */
         /* hvm_set_cr3() below sets v->arch.hvm_vcpu.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3);
+        rc = hvm_set_cr3(ns_vmcb->_cr3, 1);
         if (rc != X86EMUL_OKAY)
             gdprintk(XENLOG_ERR, "hvm_set_cr3 failed, rc: %u\n", rc);
     } else {
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 6734fb6..b9b4791 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1945,7 +1945,7 @@ static void svm_do_msr_access(struct cpu_user_regs *regs)
         if ( (inst_len = __get_instruction_length(v, INSTR_WRMSR)) == 0 )
             return;
         msr_content = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax;
-        rc = hvm_msr_write_intercept(regs->ecx, msr_content);
+        rc = hvm_msr_write_intercept(regs->ecx, msr_content, 1);
     }
 
     if ( rc == X86EMUL_OKAY )
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index af257db..a8757da 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2010,9 +2010,9 @@ static int vmx_cr_access(unsigned long exit_qualification)
     }
     case VMX_CONTROL_REG_ACCESS_TYPE_CLTS: {
         unsigned long old = curr->arch.hvm_vcpu.guest_cr[0];
+        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
         curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS;
         vmx_update_guest_cr(curr, 0);
-        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
         HVMTRACE_0D(CLTS);
         break;
     }
@@ -2024,7 +2024,7 @@ static int vmx_cr_access(unsigned long exit_qualification)
                 (VMX_CONTROL_REG_ACCESS_DATA(exit_qualification) &
                  (X86_CR0_PE|X86_CR0_MP|X86_CR0_EM|X86_CR0_TS));
         HVMTRACE_LONG_1D(LMSW, value);
-        return hvm_set_cr0(value);
+        return hvm_set_cr0(value, 1);
     }
     default:
         BUG();
@@ -3046,7 +3046,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     {
         uint64_t msr_content;
         msr_content = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax;
-        if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY )
+        if ( hvm_msr_write_intercept(regs->ecx, msr_content, 1) == X86EMUL_OKAY )
             update_guest_eip(); /* Safe: WRMSR */
         break;
     }
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ac6e3b3..3184af2 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1048,15 +1048,16 @@ static void load_shadow_guest_state(struct vcpu *v)
 
     nvcpu->guest_cr[0] = __get_vvmcs(vvmcs, CR0_READ_SHADOW);
     nvcpu->guest_cr[4] = __get_vvmcs(vvmcs, CR4_READ_SHADOW);
-    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
-    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
-    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3));
+    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0), 1);
+    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4), 1);
+    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3), 1);
 
     control = __get_vvmcs(vvmcs, VM_ENTRY_CONTROLS);
     if ( control & VM_ENTRY_LOAD_GUEST_PAT )
         hvm_set_guest_pat(v, __get_vvmcs(vvmcs, GUEST_PAT));
     if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
-        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL));
+        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
+                                __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL), 1);
 
     hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
 
@@ -1249,15 +1250,16 @@ static void load_vvmcs_host_state(struct vcpu *v)
         __vmwrite(vmcs_h2g_field[i].guest_field, r);
     }
 
-    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0));
-    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4));
-    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3));
+    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0), 1);
+    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4), 1);
+    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3), 1);
 
     control = __get_vvmcs(vvmcs, VM_EXIT_CONTROLS);
     if ( control & VM_EXIT_LOAD_HOST_PAT )
         hvm_set_guest_pat(v, __get_vvmcs(vvmcs, HOST_PAT));
     if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
-        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL));
+        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
+                                __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL), 1);
 
     hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 0ab74bc..451b674 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1417,6 +1417,35 @@ static void p2m_vm_event_fill_regs(vm_event_request_t *req)
 void p2m_mem_access_emulate_check(struct vcpu *v,
                                   const vm_event_response_t *rsp)
 {
+    ASSERT(v->domain->arch.event_write_data != NULL);
+
+    if ( rsp->flags & MEM_ACCESS_DENY )
+    {
+        struct monitor_write_data *w =
+            &v->domain->arch.event_write_data[v->vcpu_id];
+
+        switch ( rsp->reason )
+        {
+        case VM_EVENT_REASON_MOV_TO_MSR:
+            w->do_write.msr = 0;
+            break;
+        case VM_EVENT_REASON_WRITE_CTRLREG:
+            switch ( rsp->u.write_ctrlreg.index )
+            {
+            case VM_EVENT_X86_CR0:
+                w->do_write.cr0 = 0;
+                break;
+            case VM_EVENT_X86_CR3:
+                w->do_write.cr3 = 0;
+                break;
+            case VM_EVENT_X86_CR4:
+                w->do_write.cr4 = 0;
+                break;
+            }
+            break;
+        }
+    }
+
     /* Mark vcpu for skipping one instruction upon rescheduling. */
     if ( rsp->flags & MEM_ACCESS_EMULATE )
     {
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index d635b36..c1a1573 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -79,6 +79,10 @@ static int vm_event_enable(
         }
     }
 
+    if ( !d->arch.event_write_data )
+        d->arch.event_write_data = xzalloc_array(struct monitor_write_data,
+                                                 d->max_vcpus);
+
     rc = prepare_ring_for_helper(d, ring_gfn, &ved->ring_pg_struct,
                                     &ved->ring_page);
     if ( rc < 0 )
@@ -410,6 +414,8 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
 
 #ifdef HAS_MEM_ACCESS
         case VM_EVENT_REASON_MEM_ACCESS:
+        case VM_EVENT_REASON_MOV_TO_MSR:
+        case VM_EVENT_REASON_WRITE_CTRLREG:
             mem_access_resume(v, &rsp);
             break;
 #endif
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a9af5f4..fbfc5c4 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -250,6 +250,21 @@ struct pv_domain
     struct mapcache_domain mapcache;
 };
 
+struct monitor_write_data {
+    struct {
+        uint8_t msr : 1;
+        uint8_t cr0 : 1;
+        uint8_t cr3 : 1;
+        uint8_t cr4 : 1;
+    } do_write;
+
+    uint64_t msr;
+    uint64_t value;
+    uint64_t cr0;
+    uint64_t cr3;
+    uint64_t cr4;
+};
+
 struct arch_domain
 {
     struct page_info *perdomain_l3_pg;
@@ -355,6 +370,8 @@ struct arch_domain
 
     /* Mem_access emulation control */
     bool_t mem_access_emulate_enabled;
+
+    struct monitor_write_data *event_write_data;
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
@@ -509,7 +526,6 @@ struct arch_vcpu
         unsigned long eip;
         struct vm_event_emul_read_data *emul_read_data;
     } vm_event;
-
 };
 
 smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/asm-x86/hvm/event.h b/xen/include/asm-x86/hvm/event.h
index 8d93f6a..d9b3581 100644
--- a/xen/include/asm-x86/hvm/event.h
+++ b/xen/include/asm-x86/hvm/event.h
@@ -19,7 +19,8 @@
 #define __ASM_X86_HVM_EVENT_H__
 
 /* Called for current VCPU on crX/MSR changes by guest */
-void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old);
+bool_t hvm_event_cr(unsigned int index, unsigned long value,
+                    unsigned long old);
 #define hvm_event_crX(what, new, old) \
     hvm_event_cr(VM_EVENT_X86_##what, new, old)
 void hvm_event_msr(unsigned int msr, uint64_t value);
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 05ef5c5..9c06fd3 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -124,11 +124,12 @@ void hvm_shadow_handle_cd(struct vcpu *v, unsigned long value);
 
 /* These functions all return X86EMUL return codes. */
 int hvm_set_efer(uint64_t value);
-int hvm_set_cr0(unsigned long value);
-int hvm_set_cr3(unsigned long value);
-int hvm_set_cr4(unsigned long value);
+int hvm_set_cr0(unsigned long value, bool_t event_only);
+int hvm_set_cr3(unsigned long value, bool_t event_only);
+int hvm_set_cr4(unsigned long value, bool_t event_only);
 int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
-int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content);
+int hvm_msr_write_intercept(
+    unsigned int msr, uint64_t msr_content, bool_t event_only);
 int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
 int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
 
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 8c39549..e96491d 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -158,6 +158,11 @@ struct vm_event_regs_x86 {
  * MEM_ACCESS_EMULATE_NOWRITE.
  */
 #define MEM_ACCESS_SET_EMUL_READ_DATA   (1 << 8)
+ /*
+  * Deny completion of the operation that triggered the event.
+  * Currently only useful for MSR write events.
+  */
+#define MEM_ACCESS_DENY                 (1 << 9)
 
 struct vm_event_mem_access {
     uint64_t gfn;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-15  9:03 ` [PATCH V2 2/3] xen/vm_event: Support for guest-requested events Razvan Cojocaru
@ 2015-06-24 14:56   ` Razvan Cojocaru
  2015-06-24 15:03     ` Jan Beulich
  2015-06-26  7:02   ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-24 14:56 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, jbeulich, stefano.stabellini, andrew.cooper3,
	suravee.suthikulpanit, eddie.dong, tim, Aravind.Gopalakrishnan,
	jun.nakajima, wei.liu2, boris.ostrovsky, ian.jackson,
	ian.campbell

On 06/15/2015 12:03 PM, Razvan Cojocaru wrote:
> Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
> sent via HVMOP_request_vm_event. The guest can request that a
> generic vm_event (containing only the vm_event-filled guest registers
> as information) be sent to userspace by setting up the correct
> registers and doing a VMCALL. For example, for a 64-bit guest, this
> means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).
> 
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> 
> ---
> Changes since V1:
>  - Using "curr" consistently in hvm_event_requested().
>  - Renamed VM_EVENT_REASON_REQUEST to VM_EVENT_REASON_GUEST_REQUEST.

In the meantime, my colleague Corneliu Zuzu working on an ARM-related
project with Xen has tested my patch and discovered that some registers
get clobbered when Xen is compiled in debug mode and a GUEST_REQUEST
event is being sent out.

Here's the ARM part, in do_trap_hypercall(), xen/arch/arm/traps.c:

1399 #ifndef NDEBUG
1400     /*
1401      * Clobber argument registers only if pc is unchanged, otherwise
1402      * this is a hypercall continuation.
1403      */
1404     if ( orig_pc == regs->pc )
1405     {
1406         switch ( arm_hypercall_table[*nr].nr_args ) {
1407         case 5: HYPERCALL_ARG5(regs) = 0xDEADBEEF;
1408         case 4: HYPERCALL_ARG4(regs) = 0xDEADBEEF;
1409         case 3: HYPERCALL_ARG3(regs) = 0xDEADBEEF;
1410         case 2: HYPERCALL_ARG2(regs) = 0xDEADBEEF;
1411         case 1: /* Don't clobber x0/r0 -- it's the return value */
1412             break;
1413         default: BUG();
1414         }
1415         *nr = 0xDEADBEEF;
1416     }
1417 #endif

Would it be fair to say that HVMOP_request_vm_event should be an
exception to this rule (i.e. something along the lines of "if (
unlikely(r12 == HVMOP_request_vm_event) && orig_pc == regs->pc )",
etc.)? Even in debug mode, a GUEST_REQUEST vm_event should keep the
state of the guest intact.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-24 14:56   ` Razvan Cojocaru
@ 2015-06-24 15:03     ` Jan Beulich
  2015-06-25  7:55       ` Razvan Cojocaru
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-06-24 15:03 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 24.06.15 at 16:56, <rcojocaru@bitdefender.com> wrote:
> On 06/15/2015 12:03 PM, Razvan Cojocaru wrote:
>> Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
>> sent via HVMOP_request_vm_event. The guest can request that a
>> generic vm_event (containing only the vm_event-filled guest registers
>> as information) be sent to userspace by setting up the correct
>> registers and doing a VMCALL. For example, for a 64-bit guest, this
>> means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).
>> 
>> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> 
>> ---
>> Changes since V1:
>>  - Using "curr" consistently in hvm_event_requested().
>>  - Renamed VM_EVENT_REASON_REQUEST to VM_EVENT_REASON_GUEST_REQUEST.
> 
> In the meantime, my colleague Corneliu Zuzu working on an ARM-related
> project with Xen has tested my patch and discovered that some registers
> get clobbered when Xen is compiled in debug mode and a GUEST_REQUEST
> event is being sent out.
> 
> Here's the ARM part, in do_trap_hypercall(), xen/arch/arm/traps.c:
> 
> 1399 #ifndef NDEBUG
> 1400     /*
> 1401      * Clobber argument registers only if pc is unchanged, otherwise
> 1402      * this is a hypercall continuation.
> 1403      */
> 1404     if ( orig_pc == regs->pc )
> 1405     {
> 1406         switch ( arm_hypercall_table[*nr].nr_args ) {
> 1407         case 5: HYPERCALL_ARG5(regs) = 0xDEADBEEF;
> 1408         case 4: HYPERCALL_ARG4(regs) = 0xDEADBEEF;
> 1409         case 3: HYPERCALL_ARG3(regs) = 0xDEADBEEF;
> 1410         case 2: HYPERCALL_ARG2(regs) = 0xDEADBEEF;
> 1411         case 1: /* Don't clobber x0/r0 -- it's the return value */
> 1412             break;
> 1413         default: BUG();
> 1414         }
> 1415         *nr = 0xDEADBEEF;
> 1416     }
> 1417 #endif
> 
> Would it be fair to say that HVMOP_request_vm_event should be an
> exception to this rule (i.e. something along the lines of "if (
> unlikely(r12 == HVMOP_request_vm_event) && orig_pc == regs->pc )",
> etc.)? Even in debug mode, a GUEST_REQUEST vm_event should keep the
> state of the guest intact.

It's a hypercall, and hence should follow hypercall rules. If the
guest wants certain registers preserved, it should do so itself.

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-24 15:03     ` Jan Beulich
@ 2015-06-25  7:55       ` Razvan Cojocaru
  2015-06-25  8:37         ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-25  7:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 06/24/2015 06:03 PM, Jan Beulich wrote:
>>>> On 24.06.15 at 16:56, <rcojocaru@bitdefender.com> wrote:
>> Would it be fair to say that HVMOP_request_vm_event should be an
>> exception to this rule (i.e. something along the lines of "if (
>> unlikely(r12 == HVMOP_request_vm_event) && orig_pc == regs->pc )",
>> etc.)? Even in debug mode, a GUEST_REQUEST vm_event should keep the
>> state of the guest intact.
> 
> It's a hypercall, and hence should follow hypercall rules. If the
> guest wants certain registers preserved, it should do so itself.

Right, that's sensible, hypercalls should behave in a consistent manner.
The only issue here is that (and I think this would prove useful for
many future introspection clients), the trigger VMCALL doesn't often
happen from an in-guest driver.

>From what I understand from our introspection logic team (maybe they'll
chime in), VMCALLs are being used as markers in the code the OS is
running: "now we've reached this function, check out the guest
environment", "now we've reached this other function", etc.

They seem to be doing this often by overwriting (from the dom0
application) the beginning of a function that needs hooked, and the
larger this custom code becomes, the more problems arise. In the
registers clobber scenario, both registers holding HVMOP and
HVMOP_request_vm_event need to be saved and restored (pushed and
popped), and now additionally, for safety, all the clobbered registers,
so the necessary code has increased several times since the early days
of "push eax, mov eax MAGIC_CONSTANT, vmcall, pop eax".

Hence the request to at least not clobber the registers in this case,
and a reason why this initally was not a hypercall although the code
_almost_ acted like one.

Any suggestions (other than to just live with the situation)? :)


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-25  7:55       ` Razvan Cojocaru
@ 2015-06-25  8:37         ` Jan Beulich
  2015-06-25  9:09           ` Razvan Cojocaru
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-06-25  8:37 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 25.06.15 at 09:55, <rcojocaru@bitdefender.com> wrote:
> On 06/24/2015 06:03 PM, Jan Beulich wrote:
>>>>> On 24.06.15 at 16:56, <rcojocaru@bitdefender.com> wrote:
>>> Would it be fair to say that HVMOP_request_vm_event should be an
>>> exception to this rule (i.e. something along the lines of "if (
>>> unlikely(r12 == HVMOP_request_vm_event) && orig_pc == regs->pc )",
>>> etc.)? Even in debug mode, a GUEST_REQUEST vm_event should keep the
>>> state of the guest intact.
>> 
>> It's a hypercall, and hence should follow hypercall rules. If the
>> guest wants certain registers preserved, it should do so itself.
> 
> Right, that's sensible, hypercalls should behave in a consistent manner.
> The only issue here is that (and I think this would prove useful for
> many future introspection clients), the trigger VMCALL doesn't often
> happen from an in-guest driver.
> 
> From what I understand from our introspection logic team (maybe they'll
> chime in), VMCALLs are being used as markers in the code the OS is
> running: "now we've reached this function, check out the guest
> environment", "now we've reached this other function", etc.
> 
> They seem to be doing this often by overwriting (from the dom0
> application) the beginning of a function that needs hooked, and the
> larger this custom code becomes, the more problems arise. In the
> registers clobber scenario, both registers holding HVMOP and
> HVMOP_request_vm_event need to be saved and restored (pushed and
> popped), and now additionally, for safety, all the clobbered registers,
> so the necessary code has increased several times since the early days
> of "push eax, mov eax MAGIC_CONSTANT, vmcall, pop eax".
> 
> Hence the request to at least not clobber the registers in this case,
> and a reason why this initally was not a hypercall although the code
> _almost_ acted like one.
> 
> Any suggestions (other than to just live with the situation)? :)

Even for the original code sequence you quote above it would be
less intrusive to the patched code if they just patched in a call.
Namely in Windows, with its reportedly pretty uniform function
prologues, there shouldn't be too many stub variants necessary
that such calls would target.

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-25  8:37         ` Jan Beulich
@ 2015-06-25  9:09           ` Razvan Cojocaru
  0 siblings, 0 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-25  9:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 06/25/2015 11:37 AM, Jan Beulich wrote:
>>>> On 25.06.15 at 09:55, <rcojocaru@bitdefender.com> wrote:
>> On 06/24/2015 06:03 PM, Jan Beulich wrote:
>>>>>> On 24.06.15 at 16:56, <rcojocaru@bitdefender.com> wrote:
>>>> Would it be fair to say that HVMOP_request_vm_event should be an
>>>> exception to this rule (i.e. something along the lines of "if (
>>>> unlikely(r12 == HVMOP_request_vm_event) && orig_pc == regs->pc )",
>>>> etc.)? Even in debug mode, a GUEST_REQUEST vm_event should keep the
>>>> state of the guest intact.
>>>
>>> It's a hypercall, and hence should follow hypercall rules. If the
>>> guest wants certain registers preserved, it should do so itself.
>>
>> Right, that's sensible, hypercalls should behave in a consistent manner.
>> The only issue here is that (and I think this would prove useful for
>> many future introspection clients), the trigger VMCALL doesn't often
>> happen from an in-guest driver.
>>
>> From what I understand from our introspection logic team (maybe they'll
>> chime in), VMCALLs are being used as markers in the code the OS is
>> running: "now we've reached this function, check out the guest
>> environment", "now we've reached this other function", etc.
>>
>> They seem to be doing this often by overwriting (from the dom0
>> application) the beginning of a function that needs hooked, and the
>> larger this custom code becomes, the more problems arise. In the
>> registers clobber scenario, both registers holding HVMOP and
>> HVMOP_request_vm_event need to be saved and restored (pushed and
>> popped), and now additionally, for safety, all the clobbered registers,
>> so the necessary code has increased several times since the early days
>> of "push eax, mov eax MAGIC_CONSTANT, vmcall, pop eax".
>>
>> Hence the request to at least not clobber the registers in this case,
>> and a reason why this initally was not a hypercall although the code
>> _almost_ acted like one.
>>
>> Any suggestions (other than to just live with the situation)? :)
> 
> Even for the original code sequence you quote above it would be
> less intrusive to the patched code if they just patched in a call.
> Namely in Windows, with its reportedly pretty uniform function
> prologues, there shouldn't be too many stub variants necessary
> that such calls would target.

That makes great sense, that's very likely going to be the way to go
forward, so the patch can stay as it is.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
  2015-06-15  9:03 ` [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding Razvan Cojocaru
@ 2015-06-25 15:57   ` Jan Beulich
  2015-06-26  8:22     ` Razvan Cojocaru
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-06-25 15:57 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -578,6 +578,28 @@ static int hvmemul_read(
>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
>  }
>  
> +static int hvmemul_read_set_context(
> +    enum x86_segment seg,
> +    unsigned long offset,
> +    void *p_data,
> +    unsigned int bytes,
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    struct vcpu *curr = current;
> +    unsigned int len;
> +
> +    if ( !curr->arch.vm_event.emul_read_data )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    len = min_t(unsigned int,
> +        bytes, curr->arch.vm_event.emul_read_data->size);
> +
> +    if ( len )
> +        memcpy(p_data, curr->arch.vm_event.emul_read_data->data, len);

And the rest of the destination simply remains unmodified /
uninitialized?

> @@ -891,14 +934,37 @@ static int hvmemul_rep_outs(
>                            !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
>  }
>  
> -static int hvmemul_rep_movs(
> +static int hvmemul_rep_outs_set_context(
> +    enum x86_segment src_seg,
> +    unsigned long src_offset,
> +    uint16_t dst_port,
> +    unsigned int bytes_per_rep,
> +    unsigned long *reps,
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    struct vcpu *curr = current;
> +    unsigned int safe_bytes;
> +
> +    if ( !curr->arch.vm_event.emul_read_data )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    safe_bytes = min_t(unsigned int, bytes_per_rep,
> +        curr->arch.vm_event.emul_read_data->size);
> +
> +    return hvmemul_do_pio(dst_port, reps, safe_bytes, 0, IOREQ_WRITE,
> +                          !!(ctxt->regs->eflags & X86_EFLAGS_DF),
> +                          curr->arch.vm_event.emul_read_data->data);

This isn't consistent with e.g. rep_movs below - you shouldn't
reduce the width of the operation.

Also - did I overlook where *reps gets forced to 1? If it's being
done elsewhere, perhaps worth an ASSERT()?

> @@ -981,7 +1047,19 @@ static int hvmemul_rep_movs(
>       */
>      rc = hvm_copy_from_guest_phys(buf, sgpa, bytes);
>      if ( rc == HVMCOPY_okay )
> +    {
> +        struct vcpu *curr = current;
> +
> +        if ( unlikely(set_context) && curr->arch.vm_event.emul_read_data )
> +        {
> +            unsigned long safe_bytes = min_t(unsigned long, bytes,
> +                curr->arch.vm_event.emul_read_data->size);

The variable doesn't need to be unsigned long.

> @@ -1000,13 +1078,40 @@ static int hvmemul_rep_movs(
>      return X86EMUL_OKAY;
>  }
>  
> -static int hvmemul_rep_stos(
> +static int hvmemul_rep_movs(
> +   enum x86_segment src_seg,
> +   unsigned long src_offset,
> +   enum x86_segment dst_seg,
> +   unsigned long dst_offset,
> +   unsigned int bytes_per_rep,
> +   unsigned long *reps,
> +   struct x86_emulate_ctxt *ctxt)
> +{
> +    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
> +                             bytes_per_rep, reps, ctxt, 0);
> +}
> +
> +static int hvmemul_rep_movs_set_context(
> +   enum x86_segment src_seg,
> +   unsigned long src_offset,
> +   enum x86_segment dst_seg,
> +   unsigned long dst_offset,
> +   unsigned int bytes_per_rep,
> +   unsigned long *reps,
> +   struct x86_emulate_ctxt *ctxt)
> +{
> +    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
> +                             bytes_per_rep, reps, ctxt, 1);
> +}

Perhaps putting a flag hvmemul_ctxt would be better?

> @@ -1408,6 +1569,32 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
>      .invlpg        = hvmemul_invlpg
>  };
>  
> +static const struct x86_emulate_ops hvm_emulate_ops_set_context = {
> +    .read          = hvmemul_read_set_context,
> +    .insn_fetch    = hvmemul_insn_fetch,
> +    .write         = hvmemul_write,
> +    .cmpxchg       = hvmemul_cmpxchg_set_context,
> +    .rep_ins       = hvmemul_rep_ins,
> +    .rep_outs      = hvmemul_rep_outs_set_context,
> +    .rep_movs      = hvmemul_rep_movs_set_context,
> +    .rep_stos      = hvmemul_rep_stos_set_context,

If you don't override .write, why would you override .rep_stos?

> @@ -1528,18 +1715,31 @@ int hvm_emulate_one_no_write(
>      return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_no_write);
>  }
>  
> -void hvm_mem_access_emulate_one(bool_t nowrite, unsigned int trapnr,
> +int hvm_emulate_one_set_context(

static?

> +void hvm_mem_access_emulate_one(enum emul_kind kind, unsigned int trapnr,
>      unsigned int errcode)
>  {
>      struct hvm_emulate_ctxt ctx = {{ 0 }};
> -    int rc;
> +    int rc = X86EMUL_UNHANDLEABLE;
>  
>      hvm_emulate_prepare(&ctx, guest_cpu_user_regs());
>  
> -    if ( nowrite )
> -        rc = hvm_emulate_one_no_write(&ctx);
> -    else
> +    switch ( kind ) {
> +    case EMUL_KIND_NORMAL:
>          rc = hvm_emulate_one(&ctx);

Perhaps this should be the default case? If not, the initialization
of rc would better be put in the default case instead of at the top
of the function.

> @@ -63,6 +64,21 @@ static int vm_event_enable(
>      vm_event_ring_lock_init(ved);
>      vm_event_ring_lock(ved);
>  
> +    for_each_vcpu( d, v )
> +    {
> +        if ( v->arch.vm_event.emul_read_data )
> +            break;

continue?

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-15  9:03 ` [PATCH V2 2/3] xen/vm_event: Support for guest-requested events Razvan Cojocaru
  2015-06-24 14:56   ` Razvan Cojocaru
@ 2015-06-26  7:02   ` Jan Beulich
  2015-06-26  7:17     ` Razvan Cojocaru
  2015-06-30 14:23     ` Razvan Cojocaru
  1 sibling, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2015-06-26  7:02 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky, dgdegra

>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
> Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
> sent via HVMOP_request_vm_event. The guest can request that a
> generic vm_event (containing only the vm_event-filled guest registers
> as information) be sent to userspace by setting up the correct
> registers and doing a VMCALL. For example, for a 64-bit guest, this
> means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).

I suppose you mean a 32-bit guest here? Also I'm not sure it's a good
idea to explicitly define a guest exposed hypercall to omit one of the
arguments normally required for it (the interface structure pointer):
Should there ever be a reason to allow the guest to control further
aspects of the operation by passing a structure, you'd then have to
define a new sub-op instead of being able to re-use the current one.
I.e. I'd strongly recommend requiring NULL to be passed here, and
checking this in the implementation of the handler.

> --- a/xen/arch/x86/hvm/event.c
> +++ b/xen/arch/x86/hvm/event.c
> @@ -126,6 +126,20 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
>          hvm_event_traps(1, &req);
>  }
>  
> +void hvm_event_requested(void)
> +{
> +    struct vcpu *curr = current;
> +    struct arch_domain *currad = &curr->domain->arch;
> +
> +    vm_event_request_t req = {

Please avoid blank lines between declarations - a blank line following
declaration is supposed to delimit them from statements.

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -6373,6 +6373,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case HVMOP_request_vm_event:
> +        hvm_event_requested();
> +        break;

No XSM check here or in the handler? Shouldn't the admin controlling
guest properties from the host perspective be permitted control here?
Cc-ing Daniel for his input ...

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -342,13 +342,15 @@ struct arch_domain
>  
>      /* Monitor options */
>      struct {
> -        uint16_t write_ctrlreg_enabled       : 4;
> -        uint16_t write_ctrlreg_sync          : 4;
> -        uint16_t write_ctrlreg_onchangeonly  : 4;
> -        uint16_t mov_to_msr_enabled          : 1;
> -        uint16_t mov_to_msr_extended         : 1;
> -        uint16_t singlestep_enabled          : 1;
> -        uint16_t software_breakpoint_enabled : 1;
> +        uint32_t write_ctrlreg_enabled       : 4;
> +        uint32_t write_ctrlreg_sync          : 4;
> +        uint32_t write_ctrlreg_onchangeonly  : 4;
> +        uint32_t mov_to_msr_enabled          : 1;
> +        uint32_t mov_to_msr_extended         : 1;
> +        uint32_t singlestep_enabled          : 1;
> +        uint32_t software_breakpoint_enabled : 1;
> +        uint32_t request_enabled             : 1;
> +        uint32_t request_sync                : 1;

Can you please switch to plain unsigned int if you already have to
touch this? There's no reason I can see to use a fixed width integer
type here.

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -389,6 +389,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
>  
>  #endif /* defined(__i386__) || defined(__x86_64__) */
>  
> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
> +#define HVMOP_request_vm_event 24
> +#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */

Isn't this _specifically_ meant to be usable by a guest?

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-26  7:02   ` Jan Beulich
@ 2015-06-26  7:17     ` Razvan Cojocaru
  2015-06-26  8:45       ` Jan Beulich
  2015-06-30 14:48       ` Lengyel, Tamas
  2015-06-30 14:23     ` Razvan Cojocaru
  1 sibling, 2 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-26  7:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky, dgdegra

On 06/26/2015 10:02 AM, Jan Beulich wrote:
>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>> Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
>> sent via HVMOP_request_vm_event. The guest can request that a
>> generic vm_event (containing only the vm_event-filled guest registers
>> as information) be sent to userspace by setting up the correct
>> registers and doing a VMCALL. For example, for a 64-bit guest, this
>> means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).
> 
> I suppose you mean a 32-bit guest here? Also I'm not sure it's a good
> idea to explicitly define a guest exposed hypercall to omit one of the
> arguments normally required for it (the interface structure pointer):
> Should there ever be a reason to allow the guest to control further
> aspects of the operation by passing a structure, you'd then have to
> define a new sub-op instead of being able to re-use the current one.
> I.e. I'd strongly recommend requiring NULL to be passed here, and
> checking this in the implementation of the handler.

You're right, I've tested it on a 64-bit guest but indeed compiled the
executable as a Win32 binary, hence the confusion. I'll correct the
patch comment.

Will look into the NULL parameter option.

>> --- a/xen/arch/x86/hvm/event.c
>> +++ b/xen/arch/x86/hvm/event.c
>> @@ -126,6 +126,20 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
>>          hvm_event_traps(1, &req);
>>  }
>>  
>> +void hvm_event_requested(void)
>> +{
>> +    struct vcpu *curr = current;
>> +    struct arch_domain *currad = &curr->domain->arch;
>> +
>> +    vm_event_request_t req = {
> 
> Please avoid blank lines between declarations - a blank line following
> declaration is supposed to delimit them from statements.

Ack.

>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -6373,6 +6373,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case HVMOP_request_vm_event:
>> +        hvm_event_requested();
>> +        break;
> 
> No XSM check here or in the handler? Shouldn't the admin controlling
> guest properties from the host perspective be permitted control here?
> Cc-ing Daniel for his input ...

Not sure this warrants XSM checks, but sure, if they're required I'll
try to put them in.

>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -342,13 +342,15 @@ struct arch_domain
>>  
>>      /* Monitor options */
>>      struct {
>> -        uint16_t write_ctrlreg_enabled       : 4;
>> -        uint16_t write_ctrlreg_sync          : 4;
>> -        uint16_t write_ctrlreg_onchangeonly  : 4;
>> -        uint16_t mov_to_msr_enabled          : 1;
>> -        uint16_t mov_to_msr_extended         : 1;
>> -        uint16_t singlestep_enabled          : 1;
>> -        uint16_t software_breakpoint_enabled : 1;
>> +        uint32_t write_ctrlreg_enabled       : 4;
>> +        uint32_t write_ctrlreg_sync          : 4;
>> +        uint32_t write_ctrlreg_onchangeonly  : 4;
>> +        uint32_t mov_to_msr_enabled          : 1;
>> +        uint32_t mov_to_msr_extended         : 1;
>> +        uint32_t singlestep_enabled          : 1;
>> +        uint32_t software_breakpoint_enabled : 1;
>> +        uint32_t request_enabled             : 1;
>> +        uint32_t request_sync                : 1;
> 
> Can you please switch to plain unsigned int if you already have to
> touch this? There's no reason I can see to use a fixed width integer
> type here.

Ack, will make it plain int.

>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -389,6 +389,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
>>  
>>  #endif /* defined(__i386__) || defined(__x86_64__) */
>>  
>> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
>> +#define HVMOP_request_vm_event 24
>> +#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
> 
> Isn't this _specifically_ meant to be usable by a guest?

Yes, but we still need Xen tools to subscribe to the events, control how
they are received (in a dom0 or similarly privileged domain) and process
them. That was my train of thought anyway, maybe I'm missing something
(or I'm misinterpreting the macros).


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
  2015-06-25 15:57   ` Jan Beulich
@ 2015-06-26  8:22     ` Razvan Cojocaru
  2015-06-26  8:44       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-26  8:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 06/25/2015 06:57 PM, Jan Beulich wrote:
>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -578,6 +578,28 @@ static int hvmemul_read(
>>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
>>  }
>>  
>> +static int hvmemul_read_set_context(
>> +    enum x86_segment seg,
>> +    unsigned long offset,
>> +    void *p_data,
>> +    unsigned int bytes,
>> +    struct x86_emulate_ctxt *ctxt)
>> +{
>> +    struct vcpu *curr = current;
>> +    unsigned int len;
>> +
>> +    if ( !curr->arch.vm_event.emul_read_data )
>> +        return X86EMUL_UNHANDLEABLE;
>> +
>> +    len = min_t(unsigned int,
>> +        bytes, curr->arch.vm_event.emul_read_data->size);
>> +
>> +    if ( len )
>> +        memcpy(p_data, curr->arch.vm_event.emul_read_data->data, len);
> 
> And the rest of the destination simply remains unmodified /
> uninitialized?

It does, yes. Should I memset() it to 0 or something else before the
memcpy()?

>> @@ -891,14 +934,37 @@ static int hvmemul_rep_outs(
>>                            !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
>>  }
>>  
>> -static int hvmemul_rep_movs(
>> +static int hvmemul_rep_outs_set_context(
>> +    enum x86_segment src_seg,
>> +    unsigned long src_offset,
>> +    uint16_t dst_port,
>> +    unsigned int bytes_per_rep,
>> +    unsigned long *reps,
>> +    struct x86_emulate_ctxt *ctxt)
>> +{
>> +    struct vcpu *curr = current;
>> +    unsigned int safe_bytes;
>> +
>> +    if ( !curr->arch.vm_event.emul_read_data )
>> +        return X86EMUL_UNHANDLEABLE;
>> +
>> +    safe_bytes = min_t(unsigned int, bytes_per_rep,
>> +        curr->arch.vm_event.emul_read_data->size);
>> +
>> +    return hvmemul_do_pio(dst_port, reps, safe_bytes, 0, IOREQ_WRITE,
>> +                          !!(ctxt->regs->eflags & X86_EFLAGS_DF),
>> +                          curr->arch.vm_event.emul_read_data->data);
> 
> This isn't consistent with e.g. rep_movs below - you shouldn't
> reduce the width of the operation.

I agree, but to be honest I wasn't sure of how to better go about this,
it seem a bit more involved that the rest of the cases. I could perhaps
allocate a bytes_per_rep-sized buffer, memset() it to zero and then copy
safe_bytes from curr->arch.vm_event.emul_read_data->data to the
beginning of it?

> Also - did I overlook where *reps gets forced to 1? If it's being
> done elsewhere, perhaps worth an ASSERT()?

*reps got forced to 1 in hvmemul_rep_stos_set_context() in V1, I took
that out as per your comments, please see:

http://lists.xen.org/archives/html/xen-devel/2015-05/msg01054.html

>> @@ -981,7 +1047,19 @@ static int hvmemul_rep_movs(
>>       */
>>      rc = hvm_copy_from_guest_phys(buf, sgpa, bytes);
>>      if ( rc == HVMCOPY_okay )
>> +    {
>> +        struct vcpu *curr = current;
>> +
>> +        if ( unlikely(set_context) && curr->arch.vm_event.emul_read_data )
>> +        {
>> +            unsigned long safe_bytes = min_t(unsigned long, bytes,
>> +                curr->arch.vm_event.emul_read_data->size);
> 
> The variable doesn't need to be unsigned long.

bytes is unsigned long in the function above:

unsigned long saddr, daddr, bytes;

I think that's the only place where that is the case, but I had assumed
that whoever wrote the code knew better and followed suit. I'm happy to
change both places though.

>> @@ -1000,13 +1078,40 @@ static int hvmemul_rep_movs(
>>      return X86EMUL_OKAY;
>>  }
>>  
>> -static int hvmemul_rep_stos(
>> +static int hvmemul_rep_movs(
>> +   enum x86_segment src_seg,
>> +   unsigned long src_offset,
>> +   enum x86_segment dst_seg,
>> +   unsigned long dst_offset,
>> +   unsigned int bytes_per_rep,
>> +   unsigned long *reps,
>> +   struct x86_emulate_ctxt *ctxt)
>> +{
>> +    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
>> +                             bytes_per_rep, reps, ctxt, 0);
>> +}
>> +
>> +static int hvmemul_rep_movs_set_context(
>> +   enum x86_segment src_seg,
>> +   unsigned long src_offset,
>> +   enum x86_segment dst_seg,
>> +   unsigned long dst_offset,
>> +   unsigned int bytes_per_rep,
>> +   unsigned long *reps,
>> +   struct x86_emulate_ctxt *ctxt)
>> +{
>> +    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
>> +                             bytes_per_rep, reps, ctxt, 1);
>> +}
> 
> Perhaps putting a flag hvmemul_ctxt would be better?

Sorry, I don't understand the comment. Can you, please, clarify?

>> @@ -1408,6 +1569,32 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
>>      .invlpg        = hvmemul_invlpg
>>  };
>>  
>> +static const struct x86_emulate_ops hvm_emulate_ops_set_context = {
>> +    .read          = hvmemul_read_set_context,
>> +    .insn_fetch    = hvmemul_insn_fetch,
>> +    .write         = hvmemul_write,
>> +    .cmpxchg       = hvmemul_cmpxchg_set_context,
>> +    .rep_ins       = hvmemul_rep_ins,
>> +    .rep_outs      = hvmemul_rep_outs_set_context,
>> +    .rep_movs      = hvmemul_rep_movs_set_context,
>> +    .rep_stos      = hvmemul_rep_stos_set_context,
> 
> If you don't override .write, why would you override .rep_stos?

I shouldn't, I got carried away with it (twice). I apologize.
I'll remove the rep_stos customization in V3.

>> @@ -1528,18 +1715,31 @@ int hvm_emulate_one_no_write(
>>      return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_no_write);
>>  }
>>  
>> -void hvm_mem_access_emulate_one(bool_t nowrite, unsigned int trapnr,
>> +int hvm_emulate_one_set_context(
> 
> static?

Ack.

>> +void hvm_mem_access_emulate_one(enum emul_kind kind, unsigned int trapnr,
>>      unsigned int errcode)
>>  {
>>      struct hvm_emulate_ctxt ctx = {{ 0 }};
>> -    int rc;
>> +    int rc = X86EMUL_UNHANDLEABLE;
>>  
>>      hvm_emulate_prepare(&ctx, guest_cpu_user_regs());
>>  
>> -    if ( nowrite )
>> -        rc = hvm_emulate_one_no_write(&ctx);
>> -    else
>> +    switch ( kind ) {
>> +    case EMUL_KIND_NORMAL:
>>          rc = hvm_emulate_one(&ctx);
> 
> Perhaps this should be the default case? If not, the initialization
> of rc would better be put in the default case instead of at the top
> of the function.

Ack, will make that the default case.

>> @@ -63,6 +64,21 @@ static int vm_event_enable(
>>      vm_event_ring_lock_init(ved);
>>      vm_event_ring_lock(ved);
>>  
>> +    for_each_vcpu( d, v )
>> +    {
>> +        if ( v->arch.vm_event.emul_read_data )
>> +            break;
> 
> continue?

Of course, thanks.

In addition to all the comments, Tamas has kindly pointed out that the
correct name is mem_access, not vm_event (in the patch title), I'll be
fixing that too in V3. Sorry for the typo.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
  2015-06-15  9:03 ` [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply Razvan Cojocaru
@ 2015-06-26  8:28   ` Jan Beulich
  2015-06-26  9:17     ` Razvan Cojocaru
  2015-07-01 15:21     ` Razvan Cojocaru
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2015-06-26  8:28 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
> Deny register writes if a vm_client subscribed to mov_to_msr events
> forbids them. Currently supported for MSR, CR0, CR3 and CR4 events.

Is the first sentence referring the mov_to_msr stale?

> --- a/xen/arch/x86/hvm/event.c
> +++ b/xen/arch/x86/hvm/event.c
> @@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t *req)
>      return 1;
>  }
>  
> -void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
> +bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
>  {
>      struct arch_domain *currad = &current->domain->arch;
>      unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
> @@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
>  
>          hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
>                          &req);
> +        return 1;
>      }
> +
> +    return 0;
>  }
>  
>  void hvm_event_msr(unsigned int msr, uint64_t value)

Why is knowing whether an event was sent relevant for
hvm_event_cr(), but not for hvm_event_msr()?

> @@ -468,6 +469,37 @@ void hvm_do_resume(struct vcpu *v)
>          }
>      }
>  
> +    ASSERT(v == current);
> +
> +    if ( d->arch.event_write_data )

unlikely() to not penalize the common case? Also an ASSERT() like
this belongs either at the beginning of the function or, if really only
relevant with the changes you do here, inside the if().

> @@ -3189,12 +3221,13 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
>      hvm_update_guest_cr(v, cr);
>  }
>  
> -int hvm_set_cr0(unsigned long value)
> +int hvm_set_cr0(unsigned long value, bool_t event_only)

"event_only" seems pretty misleading, as it limits the function's
operation only when ...

> @@ -3224,6 +3257,22 @@ int hvm_set_cr0(unsigned long value)
>          goto gpf;
>      }
>  
> +    if ( event_only && unlikely(currad->monitor.write_ctrlreg_enabled &
> +                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) &&
> +                                value != old_value )

... a number of other conditions are true. Maybe "may_defer" or
some such making clear this is conditional behavior?

Also - indentation.

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2010,9 +2010,9 @@ static int vmx_cr_access(unsigned long exit_qualification)
>      }
>      case VMX_CONTROL_REG_ACCESS_TYPE_CLTS: {
>          unsigned long old = curr->arch.hvm_vcpu.guest_cr[0];
> +        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
>          curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS;
>          vmx_update_guest_cr(curr, 0);
> -        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
>          HVMTRACE_0D(CLTS);
>          break;
>      }

I suppose it is intentional to ignore a possible deny here? If so,
shouldn't the be documented by way of a comment?

Also, since you already touch this code, please add a blank line
between declaration and statements.

> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1048,15 +1048,16 @@ static void load_shadow_guest_state(struct vcpu *v)
>  
>      nvcpu->guest_cr[0] = __get_vvmcs(vvmcs, CR0_READ_SHADOW);
>      nvcpu->guest_cr[4] = __get_vvmcs(vvmcs, CR4_READ_SHADOW);
> -    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
> -    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
> -    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3));
> +    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0), 1);
> +    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4), 1);
> +    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3), 1);

Are you sure there are no dependencies between the individual
registers getting set? And what about the order here versus how
you carry out the deferred writes in hvm_do_resume()? I don't
think any good can come from updating CR3 before CR4...

>      control = __get_vvmcs(vvmcs, VM_ENTRY_CONTROLS);
>      if ( control & VM_ENTRY_LOAD_GUEST_PAT )
>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, GUEST_PAT));
>      if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL));
> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
> +                                __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL), 1);

So one special MSR gets potentially denied writes to - how about
all the other ones like PAT (visible above), EFER, etc? And once
you send events for multiple ones - how would you know which
ones were denied access to be the time to reach hvm_do_resume()?

All the same questions of course apply to nested SVM code too, just
that there the problems are leass easily visible from looking at the
patch.

> @@ -1249,15 +1250,16 @@ static void load_vvmcs_host_state(struct vcpu *v)
>          __vmwrite(vmcs_h2g_field[i].guest_field, r);
>      }
>  
> -    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0));
> -    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4));
> -    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3));
> +    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0), 1);
> +    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4), 1);
> +    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3), 1);
>  
>      control = __get_vvmcs(vvmcs, VM_EXIT_CONTROLS);
>      if ( control & VM_EXIT_LOAD_HOST_PAT )
>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, HOST_PAT));
>      if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL));
> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
> +                                __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL), 1);

Considering these together with the above - do you really want/
need to intercept and send events for both host and guest shadow
state changes?

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1417,6 +1417,35 @@ static void p2m_vm_event_fill_regs(vm_event_request_t *req)
>  void p2m_mem_access_emulate_check(struct vcpu *v,
>                                    const vm_event_response_t *rsp)
>  {
> +    ASSERT(v->domain->arch.event_write_data != NULL);
> +
> +    if ( rsp->flags & MEM_ACCESS_DENY )
> +    {
> +        struct monitor_write_data *w =
> +            &v->domain->arch.event_write_data[v->vcpu_id];
> +

I think the ASSERT() above belongs here.

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -250,6 +250,21 @@ struct pv_domain
>      struct mapcache_domain mapcache;
>  };
>  
> +struct monitor_write_data {
> +    struct {
> +        uint8_t msr : 1;
> +        uint8_t cr0 : 1;
> +        uint8_t cr3 : 1;
> +        uint8_t cr4 : 1;

unsigned int

> +    } do_write;
> +
> +    uint64_t msr;

unsigned int or uint32_t

> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -158,6 +158,11 @@ struct vm_event_regs_x86 {
>   * MEM_ACCESS_EMULATE_NOWRITE.
>   */
>  #define MEM_ACCESS_SET_EMUL_READ_DATA   (1 << 8)
> + /*
> +  * Deny completion of the operation that triggered the event.
> +  * Currently only useful for MSR write events.
> +  */
> +#define MEM_ACCESS_DENY                 (1 << 9)

Same as for the description - isn't the comment, referring to only
MSR writes, stale?

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
  2015-06-26  8:22     ` Razvan Cojocaru
@ 2015-06-26  8:44       ` Jan Beulich
  2015-06-26  9:49         ` Razvan Cojocaru
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-06-26  8:44 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 26.06.15 at 10:22, <rcojocaru@bitdefender.com> wrote:
> On 06/25/2015 06:57 PM, Jan Beulich wrote:
>>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>>> --- a/xen/arch/x86/hvm/emulate.c
>>> +++ b/xen/arch/x86/hvm/emulate.c
>>> @@ -578,6 +578,28 @@ static int hvmemul_read(
>>>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
>>>  }
>>>  
>>> +static int hvmemul_read_set_context(
>>> +    enum x86_segment seg,
>>> +    unsigned long offset,
>>> +    void *p_data,
>>> +    unsigned int bytes,
>>> +    struct x86_emulate_ctxt *ctxt)
>>> +{
>>> +    struct vcpu *curr = current;
>>> +    unsigned int len;
>>> +
>>> +    if ( !curr->arch.vm_event.emul_read_data )
>>> +        return X86EMUL_UNHANDLEABLE;
>>> +
>>> +    len = min_t(unsigned int,
>>> +        bytes, curr->arch.vm_event.emul_read_data->size);
>>> +
>>> +    if ( len )
>>> +        memcpy(p_data, curr->arch.vm_event.emul_read_data->data, len);
>> 
>> And the rest of the destination simply remains unmodified /
>> uninitialized?
> 
> It does, yes. Should I memset() it to 0 or something else before the
> memcpy()?

That really depends on your intentions. It just seems (together
with the other cases) inconsistently handled at present. I see
you having three options:
- return actual memory contents for parts of the writes outside
  the overridden range,
- return zero (or another preset value) for such parts,
- fail such operations.
What you shouldn't do is leave data potentially uninitialized (or
else you'd need to make sure that this doesn't result in leaking
hypervisor stack contents anywhere).

>>> @@ -891,14 +934,37 @@ static int hvmemul_rep_outs(
>>>                            !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
>>>  }
>>>  
>>> -static int hvmemul_rep_movs(
>>> +static int hvmemul_rep_outs_set_context(
>>> +    enum x86_segment src_seg,
>>> +    unsigned long src_offset,
>>> +    uint16_t dst_port,
>>> +    unsigned int bytes_per_rep,
>>> +    unsigned long *reps,
>>> +    struct x86_emulate_ctxt *ctxt)
>>> +{
>>> +    struct vcpu *curr = current;
>>> +    unsigned int safe_bytes;
>>> +
>>> +    if ( !curr->arch.vm_event.emul_read_data )
>>> +        return X86EMUL_UNHANDLEABLE;
>>> +
>>> +    safe_bytes = min_t(unsigned int, bytes_per_rep,
>>> +        curr->arch.vm_event.emul_read_data->size);
>>> +
>>> +    return hvmemul_do_pio(dst_port, reps, safe_bytes, 0, IOREQ_WRITE,
>>> +                          !!(ctxt->regs->eflags & X86_EFLAGS_DF),
>>> +                          curr->arch.vm_event.emul_read_data->data);
>> 
>> This isn't consistent with e.g. rep_movs below - you shouldn't
>> reduce the width of the operation.
> 
> I agree, but to be honest I wasn't sure of how to better go about this,
> it seem a bit more involved that the rest of the cases. I could perhaps
> allocate a bytes_per_rep-sized buffer, memset() it to zero and then copy
> safe_bytes from curr->arch.vm_event.emul_read_data->data to the
> beginning of it?

See above - you first need to define the model you want to follow,
and then you need to handle this uniformly everywhere.

>> Also - did I overlook where *reps gets forced to 1? If it's being
>> done elsewhere, perhaps worth an ASSERT()?
> 
> *reps got forced to 1 in hvmemul_rep_stos_set_context() in V1, I took
> that out as per your comments, please see:
> 
> http://lists.xen.org/archives/html/xen-devel/2015-05/msg01054.html 

That's fine, but then you can't simply ignore it in the safe_bytes
calculation.

>>> @@ -981,7 +1047,19 @@ static int hvmemul_rep_movs(
>>>       */
>>>      rc = hvm_copy_from_guest_phys(buf, sgpa, bytes);
>>>      if ( rc == HVMCOPY_okay )
>>> +    {
>>> +        struct vcpu *curr = current;
>>> +
>>> +        if ( unlikely(set_context) && curr->arch.vm_event.emul_read_data )
>>> +        {
>>> +            unsigned long safe_bytes = min_t(unsigned long, bytes,
>>> +                curr->arch.vm_event.emul_read_data->size);
>> 
>> The variable doesn't need to be unsigned long.
> 
> bytes is unsigned long in the function above:
> 
> unsigned long saddr, daddr, bytes;
> 
> I think that's the only place where that is the case, but I had assumed
> that whoever wrote the code knew better and followed suit. I'm happy to
> change both places though.

No, that's not the point. The point is that the result of
min(<unsigned int>, <unsigned long>) always fits in
unsigned int.

>>> @@ -1000,13 +1078,40 @@ static int hvmemul_rep_movs(
>>>      return X86EMUL_OKAY;
>>>  }
>>>  
>>> -static int hvmemul_rep_stos(
>>> +static int hvmemul_rep_movs(
>>> +   enum x86_segment src_seg,
>>> +   unsigned long src_offset,
>>> +   enum x86_segment dst_seg,
>>> +   unsigned long dst_offset,
>>> +   unsigned int bytes_per_rep,
>>> +   unsigned long *reps,
>>> +   struct x86_emulate_ctxt *ctxt)
>>> +{
>>> +    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
>>> +                             bytes_per_rep, reps, ctxt, 0);
>>> +}
>>> +
>>> +static int hvmemul_rep_movs_set_context(
>>> +   enum x86_segment src_seg,
>>> +   unsigned long src_offset,
>>> +   enum x86_segment dst_seg,
>>> +   unsigned long dst_offset,
>>> +   unsigned int bytes_per_rep,
>>> +   unsigned long *reps,
>>> +   struct x86_emulate_ctxt *ctxt)
>>> +{
>>> +    return _hvmemul_rep_movs(src_seg, src_offset, dst_seg, dst_offset,
>>> +                             bytes_per_rep, reps, ctxt, 1);
>>> +}
>> 
>> Perhaps putting a flag hvmemul_ctxt would be better?
> 
> Sorry, I don't understand the comment. Can you, please, clarify?

All these wrappers irritate me - a flag in hvmemul_ctxt would - afaict -
eliminate the need for all of them.

> In addition to all the comments, Tamas has kindly pointed out that the
> correct name is mem_access, not vm_event (in the patch title), I'll be
> fixing that too in V3. Sorry for the typo.

Sure - I simply try to not repeat comments others have made.

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-26  7:17     ` Razvan Cojocaru
@ 2015-06-26  8:45       ` Jan Beulich
  2015-06-30 14:48       ` Lengyel, Tamas
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2015-06-26  8:45 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky, dgdegra

>>> On 26.06.15 at 09:17, <rcojocaru@bitdefender.com> wrote:
> On 06/26/2015 10:02 AM, Jan Beulich wrote:
>>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>>> --- a/xen/include/public/hvm/hvm_op.h
>>> +++ b/xen/include/public/hvm/hvm_op.h
>>> @@ -389,6 +389,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
>>>  
>>>  #endif /* defined(__i386__) || defined(__x86_64__) */
>>>  
>>> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
>>> +#define HVMOP_request_vm_event 24
>>> +#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>> 
>> Isn't this _specifically_ meant to be usable by a guest?
> 
> Yes, but we still need Xen tools to subscribe to the events, control how
> they are received (in a dom0 or similarly privileged domain) and process
> them. That was my train of thought anyway, maybe I'm missing something
> (or I'm misinterpreting the macros).

The way it's done above you _limit_ the #define's visibility to
Xen and the tool stack, i.e. you namely exclude guests from
seeing (and hence using) it.

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
  2015-06-26  8:28   ` Jan Beulich
@ 2015-06-26  9:17     ` Razvan Cojocaru
  2015-06-26  9:39       ` Jan Beulich
  2015-07-01 15:21     ` Razvan Cojocaru
  1 sibling, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-26  9:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 06/26/2015 11:28 AM, Jan Beulich wrote:
>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>> Deny register writes if a vm_client subscribed to mov_to_msr events
>> forbids them. Currently supported for MSR, CR0, CR3 and CR4 events.
> 
> Is the first sentence referring the mov_to_msr stale?

Yes, the patch now applies to MSR, CR0, CR3 and CR4. I'll update the
patch description.

>> --- a/xen/arch/x86/hvm/event.c
>> +++ b/xen/arch/x86/hvm/event.c
>> @@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t *req)
>>      return 1;
>>  }
>>  
>> -void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
>> +bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
>>  {
>>      struct arch_domain *currad = &current->domain->arch;
>>      unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
>> @@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
>>  
>>          hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
>>                          &req);
>> +        return 1;
>>      }
>> +
>> +    return 0;
>>  }
>>  
>>  void hvm_event_msr(unsigned int msr, uint64_t value)
> 
> Why is knowing whether an event was sent relevant for
> hvm_event_cr(), but not for hvm_event_msr()?

MSR events don't honor onchangeonly, so they're always being sent. A CR
event won't be sent if onchangeonly is true and the register is being
set to the same value again.

Since the change prompted the question, I should perhaps add a comment
to explain that?

>> @@ -468,6 +469,37 @@ void hvm_do_resume(struct vcpu *v)
>>          }
>>      }
>>  
>> +    ASSERT(v == current);
>> +
>> +    if ( d->arch.event_write_data )
> 
> unlikely() to not penalize the common case? Also an ASSERT() like
> this belongs either at the beginning of the function or, if really only
> relevant with the changes you do here, inside the if().

Ack, will move the ASSERT() and wrap the condition with an unlikely().

>> @@ -3189,12 +3221,13 @@ static void hvm_update_cr(struct vcpu *v, unsigned int cr, unsigned long value)
>>      hvm_update_guest_cr(v, cr);
>>  }
>>  
>> -int hvm_set_cr0(unsigned long value)
>> +int hvm_set_cr0(unsigned long value, bool_t event_only)
> 
> "event_only" seems pretty misleading, as it limits the function's
> operation only when ...
> 
>> @@ -3224,6 +3257,22 @@ int hvm_set_cr0(unsigned long value)
>>          goto gpf;
>>      }
>>  
>> +    if ( event_only && unlikely(currad->monitor.write_ctrlreg_enabled &
>> +                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR0)) &&
>> +                                value != old_value )
> 
> ... a number of other conditions are true. Maybe "may_defer" or
> some such making clear this is conditional behavior?
> 
> Also - indentation.

Ack on both counts.

>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2010,9 +2010,9 @@ static int vmx_cr_access(unsigned long exit_qualification)
>>      }
>>      case VMX_CONTROL_REG_ACCESS_TYPE_CLTS: {
>>          unsigned long old = curr->arch.hvm_vcpu.guest_cr[0];
>> +        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
>>          curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS;
>>          vmx_update_guest_cr(curr, 0);
>> -        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
>>          HVMTRACE_0D(CLTS);
>>          break;
>>      }
> 
> I suppose it is intentional to ignore a possible deny here? If so,
> shouldn't the be documented by way of a comment?

Actually on second thought I should probably not ignore a deny there.
While this wasn't required in tests, it's probably better to be consistent.

> Also, since you already touch this code, please add a blank line
> between declaration and statements.

Ack.

>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>> @@ -1048,15 +1048,16 @@ static void load_shadow_guest_state(struct vcpu *v)
>>  
>>      nvcpu->guest_cr[0] = __get_vvmcs(vvmcs, CR0_READ_SHADOW);
>>      nvcpu->guest_cr[4] = __get_vvmcs(vvmcs, CR4_READ_SHADOW);
>> -    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
>> -    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
>> -    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3));
>> +    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0), 1);
>> +    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4), 1);
>> +    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3), 1);
> 
> Are you sure there are no dependencies between the individual
> registers getting set? And what about the order here versus how
> you carry out the deferred writes in hvm_do_resume()? I don't
> think any good can come from updating CR3 before CR4...

Ack, I'll mirror this order in hvm_do_resume().

>>      control = __get_vvmcs(vvmcs, VM_ENTRY_CONTROLS);
>>      if ( control & VM_ENTRY_LOAD_GUEST_PAT )
>>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, GUEST_PAT));
>>      if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
>> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL));
>> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
>> +                                __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL), 1);
> 
> So one special MSR gets potentially denied writes to - how about
> all the other ones like PAT (visible above), EFER, etc? And once
> you send events for multiple ones - how would you know which
> ones were denied access to be the time to reach hvm_do_resume()?
> 
> All the same questions of course apply to nested SVM code too, just
> that there the problems are leass easily visible from looking at the
> patch.

"Regular" MSRs (the ones that go through hvm_msr_write_intercept()) are
sufficient in all the introspection use cases we've tested. The code
would of course have the problem you've mentioned if the code would be
modified to include PAT & friends, but for now this doesn't seem to be
an issue.

Should I add a comment, maybe in hvm_do_resume()?

>> @@ -1249,15 +1250,16 @@ static void load_vvmcs_host_state(struct vcpu *v)
>>          __vmwrite(vmcs_h2g_field[i].guest_field, r);
>>      }
>>  
>> -    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0));
>> -    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4));
>> -    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3));
>> +    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0), 1);
>> +    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4), 1);
>> +    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3), 1);
>>  
>>      control = __get_vvmcs(vvmcs, VM_EXIT_CONTROLS);
>>      if ( control & VM_EXIT_LOAD_HOST_PAT )
>>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, HOST_PAT));
>>      if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
>> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL));
>> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
>> +                                __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL), 1);
> 
> Considering these together with the above - do you really want/
> need to intercept and send events for both host and guest shadow
> state changes?

Guest MSRs should suffice indeed. I was just trying to be consistent,
but no, the host changes should not be necessary.

>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1417,6 +1417,35 @@ static void p2m_vm_event_fill_regs(vm_event_request_t *req)
>>  void p2m_mem_access_emulate_check(struct vcpu *v,
>>                                    const vm_event_response_t *rsp)
>>  {
>> +    ASSERT(v->domain->arch.event_write_data != NULL);
>> +
>> +    if ( rsp->flags & MEM_ACCESS_DENY )
>> +    {
>> +        struct monitor_write_data *w =
>> +            &v->domain->arch.event_write_data[v->vcpu_id];
>> +
> 
> I think the ASSERT() above belongs here.

Ack.

>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -250,6 +250,21 @@ struct pv_domain
>>      struct mapcache_domain mapcache;
>>  };
>>  
>> +struct monitor_write_data {
>> +    struct {
>> +        uint8_t msr : 1;
>> +        uint8_t cr0 : 1;
>> +        uint8_t cr3 : 1;
>> +        uint8_t cr4 : 1;
> 
> unsigned int

Ack.

> 
>> +    } do_write;
>> +
>> +    uint64_t msr;
> 
> unsigned int or uint32_t

Ack.

> 
>> --- a/xen/include/public/vm_event.h
>> +++ b/xen/include/public/vm_event.h
>> @@ -158,6 +158,11 @@ struct vm_event_regs_x86 {
>>   * MEM_ACCESS_EMULATE_NOWRITE.
>>   */
>>  #define MEM_ACCESS_SET_EMUL_READ_DATA   (1 << 8)
>> + /*
>> +  * Deny completion of the operation that triggered the event.
>> +  * Currently only useful for MSR write events.
>> +  */
>> +#define MEM_ACCESS_DENY                 (1 << 9)
> 
> Same as for the description - isn't the comment, referring to only
> MSR writes, stale?

It is. Will update it.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
  2015-06-26  9:17     ` Razvan Cojocaru
@ 2015-06-26  9:39       ` Jan Beulich
  2015-06-26 10:33         ` Razvan Cojocaru
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-06-26  9:39 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 26.06.15 at 11:17, <rcojocaru@bitdefender.com> wrote:
> On 06/26/2015 11:28 AM, Jan Beulich wrote:
>>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>>> @@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
>>>  
>>>          hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
>>>                          &req);
>>> +        return 1;
>>>      }
>>> +
>>> +    return 0;
>>>  }
>>>  
>>>  void hvm_event_msr(unsigned int msr, uint64_t value)
>> 
>> Why is knowing whether an event was sent relevant for
>> hvm_event_cr(), but not for hvm_event_msr()?
> 
> MSR events don't honor onchangeonly, so they're always being sent. A CR
> event won't be sent if onchangeonly is true and the register is being
> set to the same value again.
> 
> Since the change prompted the question, I should perhaps add a comment
> to explain that?

Might be worth it.

>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>> @@ -1048,15 +1048,16 @@ static void load_shadow_guest_state(struct vcpu *v)
>>>  
>>>      nvcpu->guest_cr[0] = __get_vvmcs(vvmcs, CR0_READ_SHADOW);
>>>      nvcpu->guest_cr[4] = __get_vvmcs(vvmcs, CR4_READ_SHADOW);
>>> -    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
>>> -    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
>>> -    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3));
>>> +    hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0), 1);
>>> +    hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4), 1);
>>> +    hvm_set_cr3(__get_vvmcs(vvmcs, GUEST_CR3), 1);
>> 
>> Are you sure there are no dependencies between the individual
>> registers getting set? And what about the order here versus how
>> you carry out the deferred writes in hvm_do_resume()? I don't
>> think any good can come from updating CR3 before CR4...
> 
> Ack, I'll mirror this order in hvm_do_resume().

With the caveat that viewed globally there may not be one single
correct order, namely when also taking MSRs into account.

>>>      control = __get_vvmcs(vvmcs, VM_ENTRY_CONTROLS);
>>>      if ( control & VM_ENTRY_LOAD_GUEST_PAT )
>>>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, GUEST_PAT));
>>>      if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
>>> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL));
>>> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
>>> +                                __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL), 1);
>> 
>> So one special MSR gets potentially denied writes to - how about
>> all the other ones like PAT (visible above), EFER, etc? And once
>> you send events for multiple ones - how would you know which
>> ones were denied access to be the time to reach hvm_do_resume()?
>> 
>> All the same questions of course apply to nested SVM code too, just
>> that there the problems are leass easily visible from looking at the
>> patch.
> 
> "Regular" MSRs (the ones that go through hvm_msr_write_intercept()) are
> sufficient in all the introspection use cases we've tested. The code
> would of course have the problem you've mentioned if the code would be
> modified to include PAT & friends, but for now this doesn't seem to be
> an issue.
> 
> Should I add a comment, maybe in hvm_do_resume()?

Not sure where it would best go, but explaining that there are
differences between how MSRs get handled seems very necessary,
along with what the differences are and why.

>>> @@ -1249,15 +1250,16 @@ static void load_vvmcs_host_state(struct vcpu *v)
>>>          __vmwrite(vmcs_h2g_field[i].guest_field, r);
>>>      }
>>>  
>>> -    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0));
>>> -    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4));
>>> -    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3));
>>> +    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0), 1);
>>> +    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4), 1);
>>> +    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3), 1);
>>>  
>>>      control = __get_vvmcs(vvmcs, VM_EXIT_CONTROLS);
>>>      if ( control & VM_EXIT_LOAD_HOST_PAT )
>>>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, HOST_PAT));
>>>      if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
>>> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL));
>>> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
>>> +                                __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL), 1);
>> 
>> Considering these together with the above - do you really want/
>> need to intercept and send events for both host and guest shadow
>> state changes?
> 
> Guest MSRs should suffice indeed. I was just trying to be consistent,
> but no, the host changes should not be necessary.

Aren't you inverting things here? We're talking about the guest's host
vs (L2) guest values here afaict (note the __get_vvmcs() uses).

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
  2015-06-26  8:44       ` Jan Beulich
@ 2015-06-26  9:49         ` Razvan Cojocaru
  2015-06-26  9:59           ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-26  9:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	andrew.cooper3, tim, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, jun.nakajima, suravee.suthikulpanit,
	boris.ostrovsky, keir, ian.jackson

On 06/26/2015 11:44 AM, Jan Beulich wrote:
>>>> @@ -891,14 +934,37 @@ static int hvmemul_rep_outs(
>>>> >>>                            !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
>>>> >>>  }
>>>> >>>  
>>>> >>> -static int hvmemul_rep_movs(
>>>> >>> +static int hvmemul_rep_outs_set_context(
>>>> >>> +    enum x86_segment src_seg,
>>>> >>> +    unsigned long src_offset,
>>>> >>> +    uint16_t dst_port,
>>>> >>> +    unsigned int bytes_per_rep,
>>>> >>> +    unsigned long *reps,
>>>> >>> +    struct x86_emulate_ctxt *ctxt)
>>>> >>> +{
>>>> >>> +    struct vcpu *curr = current;
>>>> >>> +    unsigned int safe_bytes;
>>>> >>> +
>>>> >>> +    if ( !curr->arch.vm_event.emul_read_data )
>>>> >>> +        return X86EMUL_UNHANDLEABLE;
>>>> >>> +
>>>> >>> +    safe_bytes = min_t(unsigned int, bytes_per_rep,
>>>> >>> +        curr->arch.vm_event.emul_read_data->size);
>>>> >>> +
>>>> >>> +    return hvmemul_do_pio(dst_port, reps, safe_bytes, 0, IOREQ_WRITE,
>>>> >>> +                          !!(ctxt->regs->eflags & X86_EFLAGS_DF),
>>>> >>> +                          curr->arch.vm_event.emul_read_data->data);
>>> >> 
>>> >> This isn't consistent with e.g. rep_movs below - you shouldn't
>>> >> reduce the width of the operation.
>> > 
>> > I agree, but to be honest I wasn't sure of how to better go about this,
>> > it seem a bit more involved that the rest of the cases. I could perhaps
>> > allocate a bytes_per_rep-sized buffer, memset() it to zero and then copy
>> > safe_bytes from curr->arch.vm_event.emul_read_data->data to the
>> > beginning of it?
> See above - you first need to define the model you want to follow,
> and then you need to handle this uniformly everywhere.
> 
>>> >> Also - did I overlook where *reps gets forced to 1? If it's being
>>> >> done elsewhere, perhaps worth an ASSERT()?
>> > 
>> > *reps got forced to 1 in hvmemul_rep_stos_set_context() in V1, I took
>> > that out as per your comments, please see:
>> > 
>> > http://lists.xen.org/archives/html/xen-devel/2015-05/msg01054.html 
> That's fine, but then you can't simply ignore it in the safe_bytes
> calculation.

Sorry, I don't follow - I did truncate the data if bytes_per_rep >
curr->arch.vm_event.emul_read_data->size. But I'm passing reps on
untouched to hvmemul_do_pio(). Did I misunderstand the hvmemul_do_pio()
code? Should I pass a (*reps * bytes_per_rep) sized buffer to
hvmemul_do_pio() when its last parameter is != NULL?


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding
  2015-06-26  9:49         ` Razvan Cojocaru
@ 2015-06-26  9:59           ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2015-06-26  9:59 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 26.06.15 at 11:49, <rcojocaru@bitdefender.com> wrote:
> Sorry, I don't follow - I did truncate the data if bytes_per_rep >
> curr->arch.vm_event.emul_read_data->size. But I'm passing reps on
> untouched to hvmemul_do_pio(). Did I misunderstand the hvmemul_do_pio()
> code? Should I pass a (*reps * bytes_per_rep) sized buffer to
> hvmemul_do_pio() when its last parameter is != NULL?

Just consider how REP OUTS works: It reads equally sized items
from successive memory locations (one per iteration, with the
memory address being incremented/decremented after each
iteration) and sends them to the designated port (again once per
iteration).

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
  2015-06-26  9:39       ` Jan Beulich
@ 2015-06-26 10:33         ` Razvan Cojocaru
  0 siblings, 0 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-26 10:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 06/26/2015 12:39 PM, Jan Beulich wrote:
>>>> @@ -1249,15 +1250,16 @@ static void load_vvmcs_host_state(struct vcpu *v)
>>>> >>>          __vmwrite(vmcs_h2g_field[i].guest_field, r);
>>>> >>>      }
>>>> >>>  
>>>> >>> -    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0));
>>>> >>> -    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4));
>>>> >>> -    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3));
>>>> >>> +    hvm_set_cr0(__get_vvmcs(vvmcs, HOST_CR0), 1);
>>>> >>> +    hvm_set_cr4(__get_vvmcs(vvmcs, HOST_CR4), 1);
>>>> >>> +    hvm_set_cr3(__get_vvmcs(vvmcs, HOST_CR3), 1);
>>>> >>>  
>>>> >>>      control = __get_vvmcs(vvmcs, VM_EXIT_CONTROLS);
>>>> >>>      if ( control & VM_EXIT_LOAD_HOST_PAT )
>>>> >>>          hvm_set_guest_pat(v, __get_vvmcs(vvmcs, HOST_PAT));
>>>> >>>      if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
>>>> >>> -        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL));
>>>> >>> +        hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL,
>>>> >>> +                                __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL), 1);
>>> >> 
>>> >> Considering these together with the above - do you really want/
>>> >> need to intercept and send events for both host and guest shadow
>>> >> state changes?
>> > 
>> > Guest MSRs should suffice indeed. I was just trying to be consistent,
>> > but no, the host changes should not be necessary.
> Aren't you inverting things here? We're talking about the guest's host
> vs (L2) guest values here afaict (note the __get_vvmcs() uses).

I am indeed, clarification appreciated.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-26  7:02   ` Jan Beulich
  2015-06-26  7:17     ` Razvan Cojocaru
@ 2015-06-30 14:23     ` Razvan Cojocaru
  2015-07-06 10:27       ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-30 14:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky, dgdegra

On 06/26/2015 10:02 AM, Jan Beulich wrote:
>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>> Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
>> sent via HVMOP_request_vm_event. The guest can request that a
>> generic vm_event (containing only the vm_event-filled guest registers
>> as information) be sent to userspace by setting up the correct
>> registers and doing a VMCALL. For example, for a 64-bit guest, this
>> means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).
> 
> I suppose you mean a 32-bit guest here? Also I'm not sure it's a good
> idea to explicitly define a guest exposed hypercall to omit one of the
> arguments normally required for it (the interface structure pointer):
> Should there ever be a reason to allow the guest to control further
> aspects of the operation by passing a structure, you'd then have to
> define a new sub-op instead of being able to re-use the current one.
> I.e. I'd strongly recommend requiring NULL to be passed here, and
> checking this in the implementation of the handler.

Would something like this do?

6391     case HVMOP_guest_request_vm_event:
6392         if ( !guest_handle_is_null(arg) )
6393             rc = -EINVAL;
6394         else
6395             hvm_event_guest_request();
6396         break;

>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -6373,6 +6373,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case HVMOP_request_vm_event:
>> +        hvm_event_requested();
>> +        break;
> 
> No XSM check here or in the handler? Shouldn't the admin controlling
> guest properties from the host perspective be permitted control here?
> Cc-ing Daniel for his input ...

Thinking more about this, the goal here is to be able to monitor
non-privileged guests from a privileged domain. Being able to subscribe
to these events is subject to XSM checks (so an application in dom0
would be able to receive them), but if XSM checks are needed for the
guest as well, then, at least for the purpose the code is intended for
now, the default would need to be to allow this to happen.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-26  7:17     ` Razvan Cojocaru
  2015-06-26  8:45       ` Jan Beulich
@ 2015-06-30 14:48       ` Lengyel, Tamas
  2015-06-30 15:22         ` Razvan Cojocaru
                           ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Lengyel, Tamas @ 2015-06-30 14:48 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Tim Deegan, Daniel De Graaf,
	Xen-devel, eddie.dong, Aravind.Gopalakrishnan, Jan Beulich,
	suravee.suthikulpanit, boris.ostrovsky, keir, Ian Jackson


[-- Attachment #1.1: Type: text/plain, Size: 1465 bytes --]

>> --- a/xen/include/asm-x86/domain.h

> >> +++ b/xen/include/asm-x86/domain.h
> >> @@ -342,13 +342,15 @@ struct arch_domain
> >>
> >>      /* Monitor options */
> >>      struct {
> >> -        uint16_t write_ctrlreg_enabled       : 4;
> >> -        uint16_t write_ctrlreg_sync          : 4;
> >> -        uint16_t write_ctrlreg_onchangeonly  : 4;
> >> -        uint16_t mov_to_msr_enabled          : 1;
> >> -        uint16_t mov_to_msr_extended         : 1;
> >> -        uint16_t singlestep_enabled          : 1;
> >> -        uint16_t software_breakpoint_enabled : 1;
> >> +        uint32_t write_ctrlreg_enabled       : 4;
> >> +        uint32_t write_ctrlreg_sync          : 4;
> >> +        uint32_t write_ctrlreg_onchangeonly  : 4;
> >> +        uint32_t mov_to_msr_enabled          : 1;
> >> +        uint32_t mov_to_msr_extended         : 1;
> >> +        uint32_t singlestep_enabled          : 1;
> >> +        uint32_t software_breakpoint_enabled : 1;
> >> +        uint32_t request_enabled             : 1;
> >> +        uint32_t request_sync                : 1;
> >
> > Can you please switch to plain unsigned int if you already have to
> > touch this? There's no reason I can see to use a fixed width integer
> > type here.
>
> Ack, will make it plain int.


IMHO having it fix-width is easier to read when adding new elements to see
how many bits we have left free. I would not want this changed unless there
is a clear benefit to doing so.

Tamas

[-- Attachment #1.2: Type: text/html, Size: 1960 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-30 14:48       ` Lengyel, Tamas
@ 2015-06-30 15:22         ` Razvan Cojocaru
  2015-07-01  8:24         ` Razvan Cojocaru
  2015-07-06 10:26         ` Jan Beulich
  2 siblings, 0 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-06-30 15:22 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: kevin.tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Tim Deegan, Daniel De Graaf,
	Xen-devel, eddie.dong, Aravind.Gopalakrishnan, Jan Beulich,
	suravee.suthikulpanit, boris.ostrovsky, keir, Ian Jackson

On 06/30/2015 05:48 PM, Lengyel, Tamas wrote:
>>> --- a/xen/include/asm-x86/domain.h
> 
>     >> +++ b/xen/include/asm-x86/domain.h
>     >> @@ -342,13 +342,15 @@ struct arch_domain
>     >>
>     >>      /* Monitor options */
>     >>      struct {
>     >> -        uint16_t write_ctrlreg_enabled       : 4;
>     >> -        uint16_t write_ctrlreg_sync          : 4;
>     >> -        uint16_t write_ctrlreg_onchangeonly  : 4;
>     >> -        uint16_t mov_to_msr_enabled          : 1;
>     >> -        uint16_t mov_to_msr_extended         : 1;
>     >> -        uint16_t singlestep_enabled          : 1;
>     >> -        uint16_t software_breakpoint_enabled : 1;
>     >> +        uint32_t write_ctrlreg_enabled       : 4;
>     >> +        uint32_t write_ctrlreg_sync          : 4;
>     >> +        uint32_t write_ctrlreg_onchangeonly  : 4;
>     >> +        uint32_t mov_to_msr_enabled          : 1;
>     >> +        uint32_t mov_to_msr_extended         : 1;
>     >> +        uint32_t singlestep_enabled          : 1;
>     >> +        uint32_t software_breakpoint_enabled : 1;
>     >> +        uint32_t request_enabled             : 1;
>     >> +        uint32_t request_sync                : 1;
>     >
>     > Can you please switch to plain unsigned int if you already have to
>     > touch this? There's no reason I can see to use a fixed width integer
>     > type here.
> 
>     Ack, will make it plain int.
> 
> 
> IMHO having it fix-width is easier to read when adding new elements to
> see how many bits we have left free. I would not want this changed
> unless there is a clear benefit to doing so.

I don't really have a preference one way or the other about this based
on readability, so I'm fine with going either way.

However, while in practice this modification is safe, AFAIK the C
standard does not guarantee that an (unsigned) int is at least 32-bits
wide, so it might raise the question of what might happen if somebody
builds the hypervisor with a 16-bit int compiler (though I would imagine
there would be some compile-time warning / error in that case). Do we
care about that?


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-30 14:48       ` Lengyel, Tamas
  2015-06-30 15:22         ` Razvan Cojocaru
@ 2015-07-01  8:24         ` Razvan Cojocaru
  2015-07-06 10:26         ` Jan Beulich
  2 siblings, 0 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-07-01  8:24 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: kevin.tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Tim Deegan, Daniel De Graaf,
	Xen-devel, eddie.dong, Aravind.Gopalakrishnan, Jan Beulich,
	suravee.suthikulpanit, boris.ostrovsky, keir, Ian Jackson

On 06/30/2015 05:48 PM, Lengyel, Tamas wrote:
>>> --- a/xen/include/asm-x86/domain.h
> 
>     >> +++ b/xen/include/asm-x86/domain.h
>     >> @@ -342,13 +342,15 @@ struct arch_domain
>     >>
>     >>      /* Monitor options */
>     >>      struct {
>     >> -        uint16_t write_ctrlreg_enabled       : 4;
>     >> -        uint16_t write_ctrlreg_sync          : 4;
>     >> -        uint16_t write_ctrlreg_onchangeonly  : 4;
>     >> -        uint16_t mov_to_msr_enabled          : 1;
>     >> -        uint16_t mov_to_msr_extended         : 1;
>     >> -        uint16_t singlestep_enabled          : 1;
>     >> -        uint16_t software_breakpoint_enabled : 1;
>     >> +        uint32_t write_ctrlreg_enabled       : 4;
>     >> +        uint32_t write_ctrlreg_sync          : 4;
>     >> +        uint32_t write_ctrlreg_onchangeonly  : 4;
>     >> +        uint32_t mov_to_msr_enabled          : 1;
>     >> +        uint32_t mov_to_msr_extended         : 1;
>     >> +        uint32_t singlestep_enabled          : 1;
>     >> +        uint32_t software_breakpoint_enabled : 1;
>     >> +        uint32_t request_enabled             : 1;
>     >> +        uint32_t request_sync                : 1;
>     >
>     > Can you please switch to plain unsigned int if you already have to
>     > touch this? There's no reason I can see to use a fixed width integer
>     > type here.
> 
>     Ack, will make it plain int.
> 
> 
> IMHO having it fix-width is easier to read when adding new elements to
> see how many bits we have left free. I would not want this changed
> unless there is a clear benefit to doing so.

There's also another case to be made for your view, and that is that
where bitfields are concerned the current Xen style uses fixed width
types. This can be seen by, e.g. "grep -R ": 1" *" in xen/include.

I'll leave it as uint32_t for now, and hopefully Jan will let us know
more when he returns from his vacation.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
  2015-06-26  8:28   ` Jan Beulich
  2015-06-26  9:17     ` Razvan Cojocaru
@ 2015-07-01 15:21     ` Razvan Cojocaru
  1 sibling, 0 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-07-01 15:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 06/26/2015 11:28 AM, Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -2010,9 +2010,9 @@ static int vmx_cr_access(unsigned long exit_qualification)
>> >      }
>> >      case VMX_CONTROL_REG_ACCESS_TYPE_CLTS: {
>> >          unsigned long old = curr->arch.hvm_vcpu.guest_cr[0];
>> > +        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
>> >          curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS;
>> >          vmx_update_guest_cr(curr, 0);
>> > -        hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old);
>> >          HVMTRACE_0D(CLTS);
>> >          break;
>> >      }
> I suppose it is intentional to ignore a possible deny here? If so,
> shouldn't the be documented by way of a comment?

On second though, I will document it by way of a comment and leave it as
it is, since this is a corner case that would need special handling
(vmx_update_guest_cr(curr, 0), etc. vs how it's now done in
hvm_do_resume(), which would need a new parameter being passed around,
conditional code, etc.), and AFAIK this event is not interesting from a
DENY perspective and so not currently worth the overhead.

Looking at the code again, I'll also fix another mistake: when cutting
and pasting the hvm_event_crX() above to make it a pre-write event to be
consistent, the hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old)
code became wrong: old == curr->arch.hvm_vcpu.guest_cr[0], whereas
before, curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS before that call.
I'll fix this too in V3.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-30 14:48       ` Lengyel, Tamas
  2015-06-30 15:22         ` Razvan Cojocaru
  2015-07-01  8:24         ` Razvan Cojocaru
@ 2015-07-06 10:26         ` Jan Beulich
  2015-07-06 13:46           ` Lengyel, Tamas
  2 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-07-06 10:26 UTC (permalink / raw)
  To: Razvan Cojocaru, Tamas Lengyel
  Cc: Tim Deegan, kevin.tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Xen-devel, eddie.dong, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, keir, boris.ostrovsky, Daniel De Graaf

>>> On 30.06.15 at 16:48, <tlengyel@novetta.com> wrote:
>> > --- a/xen/include/asm-x86/domain.h
> 
>> >> +++ b/xen/include/asm-x86/domain.h
>> >> @@ -342,13 +342,15 @@ struct arch_domain
>> >>
>> >>      /* Monitor options */
>> >>      struct {
>> >> -        uint16_t write_ctrlreg_enabled       : 4;
>> >> -        uint16_t write_ctrlreg_sync          : 4;
>> >> -        uint16_t write_ctrlreg_onchangeonly  : 4;
>> >> -        uint16_t mov_to_msr_enabled          : 1;
>> >> -        uint16_t mov_to_msr_extended         : 1;
>> >> -        uint16_t singlestep_enabled          : 1;
>> >> -        uint16_t software_breakpoint_enabled : 1;
>> >> +        uint32_t write_ctrlreg_enabled       : 4;
>> >> +        uint32_t write_ctrlreg_sync          : 4;
>> >> +        uint32_t write_ctrlreg_onchangeonly  : 4;
>> >> +        uint32_t mov_to_msr_enabled          : 1;
>> >> +        uint32_t mov_to_msr_extended         : 1;
>> >> +        uint32_t singlestep_enabled          : 1;
>> >> +        uint32_t software_breakpoint_enabled : 1;
>> >> +        uint32_t request_enabled             : 1;
>> >> +        uint32_t request_sync                : 1;
>> >
>> > Can you please switch to plain unsigned int if you already have to
>> > touch this? There's no reason I can see to use a fixed width integer
>> > type here.
>>
>> Ack, will make it plain int.
> 
> 
> IMHO having it fix-width is easier to read when adding new elements to see
> how many bits we have left free. I would not want this changed unless there
> is a clear benefit to doing so.

I can't see what benefit using fixed widht types here is: Best
demonstrated with uint64_t, gcc doesn't honor the type of the
field anyway. And hence counting the number of left bits (with
some unknown definition of "left") is quite pointless too - it's an
internal structure, and when a new bit field needs to be added,
it doesn't really matter whether it grows the structure (at least
no more than any other field addition).

Bottom line - no fixed width types when not really warranted,
please.

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-06-30 14:23     ` Razvan Cojocaru
@ 2015-07-06 10:27       ` Jan Beulich
  2015-07-06 14:35         ` Razvan Cojocaru
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2015-07-06 10:27 UTC (permalink / raw)
  To: Razvan Cojocaru, dgdegra
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

>>> On 30.06.15 at 16:23, <rcojocaru@bitdefender.com> wrote:
> On 06/26/2015 10:02 AM, Jan Beulich wrote:
>>>>> On 15.06.15 at 11:03, <rcojocaru@bitdefender.com> wrote:
>>> Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
>>> sent via HVMOP_request_vm_event. The guest can request that a
>>> generic vm_event (containing only the vm_event-filled guest registers
>>> as information) be sent to userspace by setting up the correct
>>> registers and doing a VMCALL. For example, for a 64-bit guest, this
>>> means: EAX = 34 (hvmop), EBX = 24 (HVMOP_request_vm_event).
>> 
>> I suppose you mean a 32-bit guest here? Also I'm not sure it's a good
>> idea to explicitly define a guest exposed hypercall to omit one of the
>> arguments normally required for it (the interface structure pointer):
>> Should there ever be a reason to allow the guest to control further
>> aspects of the operation by passing a structure, you'd then have to
>> define a new sub-op instead of being able to re-use the current one.
>> I.e. I'd strongly recommend requiring NULL to be passed here, and
>> checking this in the implementation of the handler.
> 
> Would something like this do?
> 
> 6391     case HVMOP_guest_request_vm_event:
> 6392         if ( !guest_handle_is_null(arg) )
> 6393             rc = -EINVAL;
> 6394         else
> 6395             hvm_event_guest_request();
> 6396         break;

Yes, except that I'd recommend inverting the condition and
swapping the branches.

>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -6373,6 +6373,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          break;
>>>      }
>>>  
>>> +    case HVMOP_request_vm_event:
>>> +        hvm_event_requested();
>>> +        break;
>> 
>> No XSM check here or in the handler? Shouldn't the admin controlling
>> guest properties from the host perspective be permitted control here?
>> Cc-ing Daniel for his input ...
> 
> Thinking more about this, the goal here is to be able to monitor
> non-privileged guests from a privileged domain. Being able to subscribe
> to these events is subject to XSM checks (so an application in dom0
> would be able to receive them), but if XSM checks are needed for the
> guest as well, then, at least for the purpose the code is intended for
> now, the default would need to be to allow this to happen.

Daniel?

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-07-06 10:26         ` Jan Beulich
@ 2015-07-06 13:46           ` Lengyel, Tamas
  0 siblings, 0 replies; 30+ messages in thread
From: Lengyel, Tamas @ 2015-07-06 13:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Razvan Cojocaru, Xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2248 bytes --]

On Mon, Jul 6, 2015 at 6:26 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 30.06.15 at 16:48, <tlengyel@novetta.com> wrote:
> >> > --- a/xen/include/asm-x86/domain.h
> >
> >> >> +++ b/xen/include/asm-x86/domain.h
> >> >> @@ -342,13 +342,15 @@ struct arch_domain
> >> >>
> >> >>      /* Monitor options */
> >> >>      struct {
> >> >> -        uint16_t write_ctrlreg_enabled       : 4;
> >> >> -        uint16_t write_ctrlreg_sync          : 4;
> >> >> -        uint16_t write_ctrlreg_onchangeonly  : 4;
> >> >> -        uint16_t mov_to_msr_enabled          : 1;
> >> >> -        uint16_t mov_to_msr_extended         : 1;
> >> >> -        uint16_t singlestep_enabled          : 1;
> >> >> -        uint16_t software_breakpoint_enabled : 1;
> >> >> +        uint32_t write_ctrlreg_enabled       : 4;
> >> >> +        uint32_t write_ctrlreg_sync          : 4;
> >> >> +        uint32_t write_ctrlreg_onchangeonly  : 4;
> >> >> +        uint32_t mov_to_msr_enabled          : 1;
> >> >> +        uint32_t mov_to_msr_extended         : 1;
> >> >> +        uint32_t singlestep_enabled          : 1;
> >> >> +        uint32_t software_breakpoint_enabled : 1;
> >> >> +        uint32_t request_enabled             : 1;
> >> >> +        uint32_t request_sync                : 1;
> >> >
> >> > Can you please switch to plain unsigned int if you already have to
> >> > touch this? There's no reason I can see to use a fixed width integer
> >> > type here.
> >>
> >> Ack, will make it plain int.
> >
> >
> > IMHO having it fix-width is easier to read when adding new elements to
> see
> > how many bits we have left free. I would not want this changed unless
> there
> > is a clear benefit to doing so.
>
> I can't see what benefit using fixed widht types here is: Best
> demonstrated with uint64_t, gcc doesn't honor the type of the
> field anyway. And hence counting the number of left bits (with
> some unknown definition of "left") is quite pointless too - it's an
> internal structure, and when a new bit field needs to be added,
> it doesn't really matter whether it grows the structure (at least
> no more than any other field addition).
>
> Bottom line - no fixed width types when not really warranted,
> please.
>
> Jan
>

SGTM.

Thanks,
Tamas

[-- Attachment #1.2: Type: text/html, Size: 3264 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 2/3] xen/vm_event: Support for guest-requested events
  2015-07-06 10:27       ` Jan Beulich
@ 2015-07-06 14:35         ` Razvan Cojocaru
  0 siblings, 0 replies; 30+ messages in thread
From: Razvan Cojocaru @ 2015-07-06 14:35 UTC (permalink / raw)
  To: Jan Beulich, dgdegra
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, eddie.dong,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, keir,
	boris.ostrovsky

On 07/06/2015 01:27 PM, Jan Beulich wrote:
>>> No XSM check here or in the handler? Shouldn't the admin controlling
>>> guest properties from the host perspective be permitted control here?
>>> Cc-ing Daniel for his input ...
>>
>> Thinking more about this, the goal here is to be able to monitor
>> non-privileged guests from a privileged domain. Being able to subscribe
>> to these events is subject to XSM checks (so an application in dom0
>> would be able to receive them), but if XSM checks are needed for the
>> guest as well, then, at least for the purpose the code is intended for
>> now, the default would need to be to allow this to happen.
> 
> Daniel?

The examples I've seen of XSM checks in hvm_do_op() require that an
argument is provided so that the domain id can be retrieved:

6156     case HVMOP_track_dirty_vram:
6157     {
6158         struct xen_hvm_track_dirty_vram a;
6159         struct domain *d;
6160
6161         if ( copy_from_guest(&a, arg, 1) )
6162             return -EFAULT;
6163
6164         rc = rcu_lock_remote_domain_by_id(a.domid, &d);
6165         if ( rc != 0 )
6166             return rc;

[...]

6175         rc = xsm_hvm_control(XSM_DM_PRIV, d, op);
6176         if ( rc )
6177             goto param_fail2;

We'll now be sending NULL as the hypercall argument (as previously
discussed), but even if we decided to set it to an useful value, I'm not
sure how a HVM guest, who presumably is not even aware it's running on
top of Xen, can pass a correct ID to the hypervisor for XSM checking here.

Also, I'm not quite following how this is different from the other
vm_events as far as XSM is concerned. Special permissions are not
required for EPT, CR or MSR events, and while the VMCALL-based
guest-requested events are bit more involved, in the end it's just as
easy (or at least not that more difficult) to run a VMCALL in the guest
as it is to write a value to a control register.

Unless we get a reply from Daniel soon, I'll send V3 later today so that
the rest of the changes discussed last week will have a shot at being
reviewed, and I'll of course change the code in V4 should more XSM
checks be required.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-07-06 14:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-15  9:03 Vm_event memory introspection helpers Razvan Cojocaru
2015-06-15  9:03 ` [PATCH V2 1/3] xen/vm_access: Support for memory-content hiding Razvan Cojocaru
2015-06-25 15:57   ` Jan Beulich
2015-06-26  8:22     ` Razvan Cojocaru
2015-06-26  8:44       ` Jan Beulich
2015-06-26  9:49         ` Razvan Cojocaru
2015-06-26  9:59           ` Jan Beulich
2015-06-15  9:03 ` [PATCH V2 2/3] xen/vm_event: Support for guest-requested events Razvan Cojocaru
2015-06-24 14:56   ` Razvan Cojocaru
2015-06-24 15:03     ` Jan Beulich
2015-06-25  7:55       ` Razvan Cojocaru
2015-06-25  8:37         ` Jan Beulich
2015-06-25  9:09           ` Razvan Cojocaru
2015-06-26  7:02   ` Jan Beulich
2015-06-26  7:17     ` Razvan Cojocaru
2015-06-26  8:45       ` Jan Beulich
2015-06-30 14:48       ` Lengyel, Tamas
2015-06-30 15:22         ` Razvan Cojocaru
2015-07-01  8:24         ` Razvan Cojocaru
2015-07-06 10:26         ` Jan Beulich
2015-07-06 13:46           ` Lengyel, Tamas
2015-06-30 14:23     ` Razvan Cojocaru
2015-07-06 10:27       ` Jan Beulich
2015-07-06 14:35         ` Razvan Cojocaru
2015-06-15  9:03 ` [PATCH V2 3/3] xen/vm_event: Deny register writes if refused by vm_event reply Razvan Cojocaru
2015-06-26  8:28   ` Jan Beulich
2015-06-26  9:17     ` Razvan Cojocaru
2015-06-26  9:39       ` Jan Beulich
2015-06-26 10:33         ` Razvan Cojocaru
2015-07-01 15:21     ` Razvan Cojocaru

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.