* [PATCH RFC V11 1/5] xen: Emulate with no writes
From: Razvan Cojocaru @ 2014-09-02 15:37 UTC
  To: xen-devel
  Cc: kevin.tian, ian.campbell, Razvan Cojocaru, stefano.stabellini,
	jun.nakajima, andrew.cooper3, eddie.dong, tim, JBeulich,
	ian.jackson

Added support for emulating an instruction with no memory writes.
Additionally, introduced hvm_mem_event_emulate_one() (called
hvm_emulate_one_full() in earlier versions of this series), which
inspects the possible return values of the hvm_emulate_one() variants
(EXCEPTION, UNHANDLEABLE) and acts on them.
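
(Illustration only, not part of the patch: a minimal sketch of the
intended use of the new helper, mirroring the call made from
p2m_mem_access_check() in patch 5/5; the wrapper name is hypothetical.)

    /* Emulate the instruction that triggered a mem_event. With
     * deny_writes set, the no-write ops table is used, so memory, I/O
     * and MSR writes are discarded. On X86EMUL_UNHANDLEABLE a #UD
     * (TRAP_invalid_op) is injected into the guest instead. */
    static void example_resume_after_mem_event(bool_t deny_writes)
    {
        hvm_mem_event_emulate_one(deny_writes, TRAP_invalid_op,
                                  HVM_DELIVER_NO_ERROR_CODE);
    }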

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Changes since V7:
 - Renamed hvm_emulate_one_full() to hvm_mem_event_emulate_one().
 - Added a comment explaining how X86EMUL_RETRY is being treated.

Changes since V6:
 - Added discard versions of write_msr and wbinvd.
 - Now calling hvm_emulate_writeback() for the X86EMUL_UNHANDLEABLE
   case as well.

Changes since V5:
 - Added discard versions of write_io and rep_outs.

Changes since V4:
 - Also discarding IO reads (dummy read_io handler).

Changes since V3:
 - The rep_ins, rep_movs and cmpxchg handlers are
   now also inactive.

Changes since V2:
 - Renamed hvmemul_write_dummy() to hvmemul_write_discard().
 - Fixed a comment (Xen coding style rules).
 - Renamed hvm_emulate_one_with_ops() to _hvm_emulate_one().
 - Changed stack variable hvm_emulate_ops_no_write into a static const.
 - Modified struct hvm_emulate_ctxt ctx initialization syntax.
 - Renamed unhandleable_trapnr to trapnr and unhandleable_errcode
   to errcode.
 - Changed errcode's type to unsigned int.
 - Removed hvm_emulate_one() loop that went on until it returned
   something other than X86EMUL_RETRY (to prevent potential
   blocking against the time calibration rendezvous).

Changes since V1:
 - Removed the Linux code that computes the length of an instruction.
 - Unused function parameters are no longer marked.
 - Refactored the code to eliminate redundancy.
 - Made the exception passed on to the guest by hvm_emulate_one_full()
   configurable.
---
 xen/arch/x86/hvm/emulate.c        |  175 ++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/emulate.h |    5 ++
 2 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 86cf432..6ab06e0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -690,6 +690,94 @@ static int hvmemul_write(
     return X86EMUL_OKAY;
 }
 
+static int hvmemul_write_discard(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    /* Discarding the write. */
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_ins_discard(
+    uint16_t src_port,
+    enum x86_segment dst_seg,
+    unsigned long dst_offset,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_movs_discard(
+   enum x86_segment src_seg,
+   unsigned long src_offset,
+   enum x86_segment dst_seg,
+   unsigned long dst_offset,
+   unsigned int bytes_per_rep,
+   unsigned long *reps,
+   struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_outs_discard(
+    enum x86_segment src_seg,
+    unsigned long src_offset,
+    uint16_t dst_port,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_cmpxchg_discard(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_read_io_discard(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long *val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_write_io_discard(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_write_msr_discard(
+    unsigned long reg,
+    uint64_t val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_wbinvd_discard(
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_cmpxchg(
     enum x86_segment seg,
     unsigned long offset,
@@ -1140,8 +1228,33 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
     .invlpg        = hvmemul_invlpg
 };
 
-int hvm_emulate_one(
-    struct hvm_emulate_ctxt *hvmemul_ctxt)
+static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
+    .read          = hvmemul_read,
+    .insn_fetch    = hvmemul_insn_fetch,
+    .write         = hvmemul_write_discard,
+    .cmpxchg       = hvmemul_cmpxchg_discard,
+    .rep_ins       = hvmemul_rep_ins_discard,
+    .rep_outs      = hvmemul_rep_outs_discard,
+    .rep_movs      = hvmemul_rep_movs_discard,
+    .read_segment  = hvmemul_read_segment,
+    .write_segment = hvmemul_write_segment,
+    .read_io       = hvmemul_read_io_discard,
+    .write_io      = hvmemul_write_io_discard,
+    .read_cr       = hvmemul_read_cr,
+    .write_cr      = hvmemul_write_cr,
+    .read_msr      = hvmemul_read_msr,
+    .write_msr     = hvmemul_write_msr_discard,
+    .wbinvd        = hvmemul_wbinvd_discard,
+    .cpuid         = hvmemul_cpuid,
+    .inject_hw_exception = hvmemul_inject_hw_exception,
+    .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
+    .get_fpu       = hvmemul_get_fpu,
+    .put_fpu       = hvmemul_put_fpu,
+    .invlpg        = hvmemul_invlpg
+};
+
+static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
+    const struct x86_emulate_ops *ops)
 {
     struct cpu_user_regs *regs = hvmemul_ctxt->ctxt.regs;
     struct vcpu *curr = current;
@@ -1193,7 +1306,7 @@ int hvm_emulate_one(
     vio->mmio_retrying = vio->mmio_retry;
     vio->mmio_retry = 0;
 
-    rc = x86_emulate(&hvmemul_ctxt->ctxt, &hvm_emulate_ops);
+    rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
 
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;
@@ -1241,6 +1354,62 @@ int hvm_emulate_one(
     return X86EMUL_OKAY;
 }
 
+int hvm_emulate_one(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops);
+}
+
+int hvm_emulate_one_no_write(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_no_write);
+}
+
+void hvm_mem_event_emulate_one(bool_t nowrite, unsigned int trapnr,
+    unsigned int errcode)
+{
+    struct hvm_emulate_ctxt ctx = {{ 0 }};
+    int rc;
+
+    hvm_emulate_prepare(&ctx, guest_cpu_user_regs());
+
+    if ( nowrite )
+        rc = hvm_emulate_one_no_write(&ctx);
+    else
+        rc = hvm_emulate_one(&ctx);
+
+    switch ( rc )
+    {
+    case X86EMUL_RETRY:
+        /*
+         * This function is called when handling an EPT-related mem_event
+         * reply. As such, nothing else needs to be done here, since simply
+         * returning makes the current instruction cause a page fault again,
+         * consistent with X86EMUL_RETRY.
+         */
+        return;
+    case X86EMUL_UNHANDLEABLE:
+        gdprintk(XENLOG_DEBUG, "Emulation failed @ %04x:%lx: "
+               "%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x\n",
+               hvmemul_get_seg_reg(x86_seg_cs, &ctx)->sel,
+               ctx.insn_buf_eip,
+               ctx.insn_buf[0], ctx.insn_buf[1],
+               ctx.insn_buf[2], ctx.insn_buf[3],
+               ctx.insn_buf[4], ctx.insn_buf[5],
+               ctx.insn_buf[6], ctx.insn_buf[7],
+               ctx.insn_buf[8], ctx.insn_buf[9]);
+        hvm_inject_hw_exception(trapnr, errcode);
+        break;
+    case X86EMUL_EXCEPTION:
+        if ( ctx.exn_pending )
+            hvm_inject_hw_exception(ctx.exn_vector, ctx.exn_error_code);
+        break;
+    }
+
+    hvm_emulate_writeback(&ctx);
+}
+
 void hvm_emulate_prepare(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs)
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 00a06cc..efff97e 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -37,6 +37,11 @@ struct hvm_emulate_ctxt {
 
 int hvm_emulate_one(
     struct hvm_emulate_ctxt *hvmemul_ctxt);
+int hvm_emulate_one_no_write(
+    struct hvm_emulate_ctxt *hvmemul_ctxt);
+void hvm_mem_event_emulate_one(bool_t nowrite,
+    unsigned int trapnr,
+    unsigned int errcode);
 void hvm_emulate_prepare(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs);
-- 
1.7.9.5

* [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state
From: Razvan Cojocaru @ 2014-09-02 15:37 UTC
  To: xen-devel
  Cc: kevin.tian, ian.campbell, Razvan Cojocaru, stefano.stabellini,
	jun.nakajima, andrew.cooper3, eddie.dong, tim, JBeulich,
	ian.jackson

Speed optimization for introspection purposes: a handful of registers
are sent along with each mem_event. This requires enlarging the
mem_event_request / mem_event_response structures, and additional code
to fill in the relevant values. Since the EPT event processing code
needs more data than the CR3 or MSR event processors,
hvm_mem_event_fill_regs() fills in less data than
p2m_mem_event_fill_regs(), in order to avoid overhead. Using struct
hvm_hw_cpu instead of the custom struct mem_event_regs_st was
considered, but its size would cause the mem_event ring buffer to fill
up too quickly.
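
(Illustration only, not part of the patch: a sketch of how a userspace
monitor might consume the new register block from a request taken off
the ring; the ring-consumption plumbing, as in tools/tests/xen-access,
is elided.)

    #include <stdio.h>
    #include <xenctrl.h>
    #include <xen/mem_event.h>

    /* The registers now arrive with the event itself, avoiding a
     * separate getcontext-style hypercall round trip per event. */
    static void dump_event_regs(const mem_event_request_t *req)
    {
        printf("vcpu%u rip=%#llx cr3=%#llx rax=%#llx\n",
               req->vcpu_id,
               (unsigned long long)req->x86_regs.rip,
               (unsigned long long)req->x86_regs.cr3,
               (unsigned long long)req->x86_regs.rax);
    }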

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Changes since V5:
 - Const-correctness.

Changes since V4:
 - Renamed "x86_mem_event_regs" to "mem_event_regs_x86" and
   removed a typedef.

Changes since V3:
 - Changed the name of VCPU variables referring to the current
   VCPU to "curr".
 - Renamed "mem_event_regs" to "x86_mem_event_regs" to make it
   explicit.

Changes since V2:
 - Removed inline function compiler suggestions.
 - Removed superfluous memset(&ctxt, 0, sizeof(struct hvm_hw_cpu)).
 - Padded struct mem_event_regs_st.
 - Renamed mem_event_regs_t to mem_event_regs.
 - Readability changes.

Changes since V1:
 - Replaced guest_x86_mode with cs_arbytes in the mem_event_regs_st
   structure.
 - Removed superfluous preprocessor check for __x86_64__ in
   p2m_mem_event_fill_regs().
---
 tools/libxc/xc_mem_access.c        |   10 +++++++++-
 tools/libxc/xc_mem_event.c         |    7 +++++--
 tools/libxc/xc_private.h           |    2 +-
 tools/libxc/xenctrl.h              |    2 ++
 xen/arch/x86/hvm/vmx/vmcs.c        |   25 +++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |   13 +++++++++++++
 xen/arch/x86/mm/mem_event.c        |   11 +++++++++++
 xen/include/asm-x86/hvm/domain.h   |    1 +
 xen/include/asm-x86/hvm/hvm.h      |    2 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |    7 +++++++
 xen/include/public/domctl.h        |    7 ++++---
 11 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_mem_access.c b/tools/libxc/xc_mem_access.c
index 461f0e9..55d0e9f 100644
--- a/tools/libxc/xc_mem_access.c
+++ b/tools/libxc/xc_mem_access.c
@@ -26,7 +26,15 @@
 
 void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port)
 {
-    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN, port);
+    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
+                               port, 0);
+}
+
+void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
+                                         uint32_t *port)
+{
+    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
+                               port, 1);
 }
 
 int xc_mem_access_disable(xc_interface *xch, domid_t domain_id)
diff --git a/tools/libxc/xc_mem_event.c b/tools/libxc/xc_mem_event.c
index faf1cc6..8c0be4e 100644
--- a/tools/libxc/xc_mem_event.c
+++ b/tools/libxc/xc_mem_event.c
@@ -57,7 +57,7 @@ int xc_mem_event_memop(xc_interface *xch, domid_t domain_id,
 }
 
 void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
-                          uint32_t *port)
+                          uint32_t *port, int enable_introspection)
 {
     void *ring_page = NULL;
     uint64_t pfn;
@@ -120,7 +120,10 @@ void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
         break;
 
     case HVM_PARAM_ACCESS_RING_PFN:
-        op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE;
+        if ( enable_introspection )
+            op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION;
+        else
+            op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE;
         mode = XEN_DOMCTL_MEM_EVENT_OP_ACCESS;
         break;
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index c50a7c9..94df688 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -381,6 +381,6 @@ int xc_mem_event_memop(xc_interface *xch, domid_t domain_id,
  * param can be HVM_PARAM_PAGING/ACCESS/SHARING_RING_PFN
  */
 void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
-                          uint32_t *port);
+                          uint32_t *port, int enable_introspection);
 
 #endif /* __XC_PRIVATE_H__ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1c5d0db..28b5562 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2263,6 +2263,8 @@ int xc_mem_paging_load(xc_interface *xch, domid_t domain_id,
  * Caller has to unmap this page when done.
  */
 void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port);
+void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
+                                         uint32_t *port);
 int xc_mem_access_disable(xc_interface *xch, domid_t domain_id);
 int xc_mem_access_resume(xc_interface *xch, domid_t domain_id);
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4a4f4e1..fc1f882 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <xen/keyhandler.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/mem_event.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -71,6 +72,18 @@ u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
 
+const u32 vmx_introspection_force_enabled_msrs[] = {
+    MSR_IA32_SYSENTER_EIP,
+    MSR_IA32_SYSENTER_ESP,
+    MSR_IA32_SYSENTER_CS,
+    MSR_IA32_MC0_CTL,
+    MSR_STAR,
+    MSR_LSTAR
+};
+
+const unsigned int vmx_introspection_force_enabled_msrs_size =
+    ARRAY_SIZE(vmx_introspection_force_enabled_msrs);
+
 static DEFINE_PER_CPU_READ_MOSTLY(struct vmcs_struct *, vmxon_region);
 static DEFINE_PER_CPU(struct vmcs_struct *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, active_vmcs_list);
@@ -695,11 +708,23 @@ static void vmx_set_host_env(struct vcpu *v)
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type)
 {
     unsigned long *msr_bitmap = v->arch.hvm_vmx.msr_bitmap;
+    struct domain *d = v->domain;
 
     /* VMX MSR bitmap supported? */
     if ( msr_bitmap == NULL )
         return;
 
+    if ( unlikely(d->arch.hvm_domain.introspection_enabled) &&
+         mem_event_check_ring(&d->mem_event->access) )
+    {
+        unsigned int i;
+
+        /* Filter out MSR-s needed for memory introspection */
+        for ( i = 0; i < vmx_introspection_force_enabled_msrs_size; i++ )
+            if ( msr == vmx_introspection_force_enabled_msrs[i] )
+                return;
+    }
+
     /*
      * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
      * have the write-low and read-high bitmap offsets the wrong way round.
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 26b1ad5..b08664e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1683,6 +1683,18 @@ void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
         *eax |= XEN_HVM_CPUID_X2APIC_VIRT;
 }
 
+static void vmx_enable_msr_exit_interception(struct domain *d)
+{
+    struct vcpu *v;
+    unsigned int i;
+
+    /* Enable interception for MSRs needed for memory introspection. */
+    for_each_vcpu ( d, v )
+        for ( i = 0; i < vmx_introspection_force_enabled_msrs_size; i++ )
+            vmx_enable_intercept_for_msr(v, vmx_introspection_force_enabled_msrs[i],
+                                         MSR_TYPE_W);
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1741,6 +1753,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
+    .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/mm/mem_event.c b/xen/arch/x86/mm/mem_event.c
index ba7e71e..fdd5ff6 100644
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -587,6 +587,7 @@ int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
         switch( mec->op )
         {
         case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE:
+        case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION:
         {
             rc = -ENODEV;
             /* Only HAP is supported */
@@ -600,13 +601,23 @@ int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
             rc = mem_event_enable(d, mec, med, _VPF_mem_access, 
                                     HVM_PARAM_ACCESS_RING_PFN,
                                     mem_access_notification);
+
+            if ( mec->op != XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE &&
+                 rc == 0 && hvm_funcs.enable_msr_exit_interception )
+            {
+                d->arch.hvm_domain.introspection_enabled = 1;
+                hvm_funcs.enable_msr_exit_interception(d);
+            }
         }
         break;
 
         case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE:
         {
             if ( med->ring_page )
+            {
                 rc = mem_event_disable(d, med);
+                d->arch.hvm_domain.introspection_enabled = 0;
+            }
         }
         break;
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 291a2e0..30d4aa3 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -134,6 +134,7 @@ struct hvm_domain {
     bool_t                 mem_sharing_enabled;
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
+    bool_t                 introspection_enabled;
 
     /*
      * TSC value that VCPUs use to calculate their tsc_offset value.
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1123857..121d053 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -205,6 +205,8 @@ struct hvm_function_table {
     void (*hypervisor_cpuid_leaf)(uint32_t sub_idx,
                                   uint32_t *eax, uint32_t *ebx,
                                   uint32_t *ecx, uint32_t *edx);
+
+    void (*enable_msr_exit_interception)(struct domain *d);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 215d93c..6a99dca 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -471,6 +471,13 @@ enum vmcs_field {
     HOST_RIP                        = 0x00006c16,
 };
 
+/*
+ * A set of MSR-s that need to be enabled for memory introspection
+ * to work.
+ */
+extern const u32 vmx_introspection_force_enabled_msrs[];
+extern const unsigned int vmx_introspection_force_enabled_msrs_size;
+
 #define VMCS_VPID_WIDTH 16
 
 #define MSR_TYPE_R 1
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..a3431ec 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -772,10 +772,11 @@ struct xen_domctl_gdbsx_domstatus {
  * ENODEV - host lacks HAP support (EPT/NPT) or HAP is disabled in guest
  * EBUSY  - guest has or had access enabled, ring buffer still active
  */
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS            2
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS                        2
 
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE     0
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE    1
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE                 0
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE                1
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION   2
 
 /*
  * Sharing ENOMEM helper.
-- 
1.7.9.5

* [PATCH RFC V11 3/5] xen, libxc: Force-enable relevant MSR events
From: Razvan Cojocaru @ 2014-09-02 15:37 UTC
  To: xen-devel
  Cc: kevin.tian, ian.campbell, Razvan Cojocaru, stefano.stabellini,
	jun.nakajima, andrew.cooper3, eddie.dong, tim, JBeulich,
	ian.jackson

vmx_disable_intercept_for_msr() will now refuse to disable interception
of MSRs needed for memory introspection. It is not possible to gate this
on mem_access being active for the domain, since by the time mem_access
becomes active the interception for the interesting MSRs has already
been disabled (vmx_disable_intercept_for_msr() runs very early on).
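
(Illustration only: a minimal toolstack-side sketch, using the libxc
entry point added in patch 2/5, of enabling the access ring with
introspection semantics; this is the path that sets
introspection_enabled and re-enables interception of the force-enabled
MSRs. Error handling is deliberately minimal.)

    #include <stdio.h>
    #include <xenctrl.h>

    static void *example_enable(xc_interface *xch, domid_t domid,
                                uint32_t *port)
    {
        /* Maps and returns the ring page; the caller must unmap it
         * when done, and bind 'port' to an event channel. */
        void *ring_page =
            xc_mem_access_enable_introspection(xch, domid, port);

        if ( !ring_page )
            perror("xc_mem_access_enable_introspection");
        return ring_page;
    }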

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Changes since V8:
 - Renamed vmx_msrs_exit_array to vmx_introspection_force_enabled_msrs.

Changes since V7:
 - Reversed if conditions (cheapest one first).
 - Combined two if statements.
 - Moved "bool_t introspection_enabled;" to the bool_t group above.
 - Renamed msrs_exit_array to vmx_msrs_exit_array and made it
   "extern" in the header.

Changes since V6:
 - Moved the array of interesting MSRs to common header.
 - Minor code cleanup.

Changes since V5:
 - Added xc_mem_access_enable_introspection() to libxc, which has
   the same parameters and semantics as xc_mem_access_enable(),
   but it additionally sets d->arch.hvm_domain.introspection_enabled
   and enables relevant MSR interception.
 - Renamed vmx_enable_intro_msr_interception() to
   vmx_enable_msr_exit_interception().
 - Simplified vmx_enable_msr_exit_interception() (const array of MSRs).

Changes since V3:
 - Removed log line stating that MSR interception cannot be disabled.
 - Removed superfluous #include <asm/hvm/vmx/vmcs.h>.
 - Moved VMX-specific code to vmx.c (as a new hvm_funcs member).

Changes since V2:
 - Split a log line differently to keep it grepable.
 - Interception for relevant MSRs will be disabled only if
   mem_access is not enabled.
 - Since they end up being disabled early on (when mem_access
   is not enabled yet), re-enable interception when mem_access
   becomes active.

Changes since V1:
 - Replaced printk() with gdprintk(XENLOG_DEBUG, ...).
---
 xen/arch/x86/hvm/hvm.c         |   33 +++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c          |   57 ++++++++++++++++++++++++++++++++++++++++
 xen/include/public/mem_event.h |   39 +++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 83e6fae..5761ff9 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6149,6 +6149,38 @@ int hvm_debug_op(struct vcpu *v, int32_t op)
     return rc;
 }
 
+static void hvm_mem_event_fill_regs(mem_event_request_t *req)
+{
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const struct vcpu *curr = current;
+
+    req->x86_regs.rax = regs->eax;
+    req->x86_regs.rcx = regs->ecx;
+    req->x86_regs.rdx = regs->edx;
+    req->x86_regs.rbx = regs->ebx;
+    req->x86_regs.rsp = regs->esp;
+    req->x86_regs.rbp = regs->ebp;
+    req->x86_regs.rsi = regs->esi;
+    req->x86_regs.rdi = regs->edi;
+
+    req->x86_regs.r8  = regs->r8;
+    req->x86_regs.r9  = regs->r9;
+    req->x86_regs.r10 = regs->r10;
+    req->x86_regs.r11 = regs->r11;
+    req->x86_regs.r12 = regs->r12;
+    req->x86_regs.r13 = regs->r13;
+    req->x86_regs.r14 = regs->r14;
+    req->x86_regs.r15 = regs->r15;
+
+    req->x86_regs.rflags = regs->eflags;
+    req->x86_regs.rip    = regs->eip;
+
+    req->x86_regs.msr_efer = curr->arch.hvm_vcpu.guest_efer;
+    req->x86_regs.cr0 = curr->arch.hvm_vcpu.guest_cr[0];
+    req->x86_regs.cr3 = curr->arch.hvm_vcpu.guest_cr[3];
+    req->x86_regs.cr4 = curr->arch.hvm_vcpu.guest_cr[4];
+}
+
 static int hvm_memory_event_traps(long p, uint32_t reason,
                                   unsigned long value, unsigned long old, 
                                   bool_t gla_valid, unsigned long gla) 
@@ -6193,6 +6225,7 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
         req.gla = old;
     }
     
+    hvm_mem_event_fill_regs(&req);
     mem_event_put_request(d, &d->mem_event->access, &req);
     
     return 1;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c2e89e1..6ed4109 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1323,6 +1323,61 @@ void p2m_mem_paging_resume(struct domain *d)
     }
 }
 
+static void p2m_mem_event_fill_regs(mem_event_request_t *req)
+{
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct segment_register seg;
+    struct hvm_hw_cpu ctxt;
+    struct vcpu *curr = current;
+
+    /* Architecture-specific vmcs/vmcb bits */
+    hvm_funcs.save_cpu_ctxt(curr, &ctxt);
+
+    req->x86_regs.rax = regs->eax;
+    req->x86_regs.rcx = regs->ecx;
+    req->x86_regs.rdx = regs->edx;
+    req->x86_regs.rbx = regs->ebx;
+    req->x86_regs.rsp = regs->esp;
+    req->x86_regs.rbp = regs->ebp;
+    req->x86_regs.rsi = regs->esi;
+    req->x86_regs.rdi = regs->edi;
+
+    req->x86_regs.r8  = regs->r8;
+    req->x86_regs.r9  = regs->r9;
+    req->x86_regs.r10 = regs->r10;
+    req->x86_regs.r11 = regs->r11;
+    req->x86_regs.r12 = regs->r12;
+    req->x86_regs.r13 = regs->r13;
+    req->x86_regs.r14 = regs->r14;
+    req->x86_regs.r15 = regs->r15;
+
+    req->x86_regs.rflags = regs->eflags;
+    req->x86_regs.rip    = regs->eip;
+
+    req->x86_regs.dr7 = curr->arch.debugreg[7];
+    req->x86_regs.cr0 = ctxt.cr0;
+    req->x86_regs.cr2 = ctxt.cr2;
+    req->x86_regs.cr3 = ctxt.cr3;
+    req->x86_regs.cr4 = ctxt.cr4;
+
+    req->x86_regs.sysenter_cs = ctxt.sysenter_cs;
+    req->x86_regs.sysenter_esp = ctxt.sysenter_esp;
+    req->x86_regs.sysenter_eip = ctxt.sysenter_eip;
+
+    req->x86_regs.msr_efer = ctxt.msr_efer;
+    req->x86_regs.msr_star = ctxt.msr_star;
+    req->x86_regs.msr_lstar = ctxt.msr_lstar;
+
+    hvm_get_segment_register(curr, x86_seg_fs, &seg);
+    req->x86_regs.fs_base = seg.base;
+
+    hvm_get_segment_register(curr, x86_seg_gs, &seg);
+    req->x86_regs.gs_base = seg.base;
+
+    hvm_get_segment_register(curr, x86_seg_cs, &seg);
+    req->x86_regs.cs_arbytes = seg.attr.bytes;
+}
+
 bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
                             struct npfec npfec,
                             mem_event_request_t **req_ptr)
@@ -1413,6 +1468,8 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         req->access_w = npfec.write_access;
         req->access_x = npfec.insn_fetch;
         req->vcpu_id = v->vcpu_id;
+
+        p2m_mem_event_fill_regs(req);
     }
 
     /* Pause the current VCPU */
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index fc12697..d3dd9c6 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -48,6 +48,44 @@
 #define MEM_EVENT_REASON_MSR         7    /* MSR was hit: gfn is MSR value, gla is MSR address;
                                              does NOT honour HVMPME_onchangeonly */
 
+/* Using a custom struct (not hvm_hw_cpu) so as to not fill
+ * the mem_event ring buffer too quickly. */
+struct mem_event_regs_x86 {
+    uint64_t rax;
+    uint64_t rcx;
+    uint64_t rdx;
+    uint64_t rbx;
+    uint64_t rsp;
+    uint64_t rbp;
+    uint64_t rsi;
+    uint64_t rdi;
+    uint64_t r8;
+    uint64_t r9;
+    uint64_t r10;
+    uint64_t r11;
+    uint64_t r12;
+    uint64_t r13;
+    uint64_t r14;
+    uint64_t r15;
+    uint64_t rflags;
+    uint64_t dr7;
+    uint64_t rip;
+    uint64_t cr0;
+    uint64_t cr2;
+    uint64_t cr3;
+    uint64_t cr4;
+    uint64_t sysenter_cs;
+    uint64_t sysenter_esp;
+    uint64_t sysenter_eip;
+    uint64_t msr_efer;
+    uint64_t msr_star;
+    uint64_t msr_lstar;
+    uint64_t fs_base;
+    uint64_t gs_base;
+    uint32_t cs_arbytes;
+    uint32_t _pad;
+};
+
 typedef struct mem_event_st {
     uint32_t flags;
     uint32_t vcpu_id;
@@ -67,6 +105,7 @@ typedef struct mem_event_st {
     uint16_t available:10;
 
     uint16_t reason;
+    struct mem_event_regs_x86 x86_regs;
 } mem_event_request_t, mem_event_response_t;
 
 DEFINE_RING_TYPES(mem_event, mem_event_request_t, mem_event_response_t);
-- 
1.7.9.5

* [PATCH RFC V11 4/5] xen, libxc: Request page fault injection via libxc
From: Razvan Cojocaru @ 2014-09-02 15:37 UTC
  To: xen-devel
  Cc: kevin.tian, ian.campbell, Razvan Cojocaru, stefano.stabellini,
	jun.nakajima, andrew.cooper3, eddie.dong, tim, JBeulich,
	ian.jackson

Extended HVMOP_inject_trap to allow requesting that the trap be
injected by the first available VCPU that is running in user mode and
whose CR3 matches that of an interesting application inside the guest.
This mechanism allows bringing in swapped-out pages for inspection.
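
(Illustration only: a hedged sketch of requesting a page-in through the
extended hypercall; 'gva' and 'target_cr3' are hypothetical values the
introspection tool would supply, and the #PF error code is just an
example.)

    #include <xenctrl.h>
    #include <xen/hvm/hvm_op.h>

    static int example_page_in(xc_interface *xch, domid_t domid,
                               uint64_t gva, uint64_t target_cr3)
    {
        /* vcpu == -1: inject on the first VCPU found running in user
         * mode with a CR3 matching target_cr3. Vector 14 is #PF;
         * error code 0x4 means a user-mode read of a not-present
         * page, so the guest OS will page the address back in. */
        return xc_hvm_inject_trap(xch, domid, -1 /* any vcpu */,
                                  14 /* TRAP_page_fault */,
                                  HVMOP_TRAP_hw_exc, 0x4,
                                  0 /* insn_len */, gva, target_cr3);
    }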

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
---
 tools/libxc/xc_misc.c            |    5 +-
 tools/libxc/xenctrl.h            |    3 +-
 xen/arch/x86/hvm/hvm.c           |  103 +++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/domain.h |    3 ++
 xen/include/asm-x86/hvm/hvm.h    |    1 +
 xen/include/public/hvm/hvm_op.h  |    2 +
 6 files changed, 101 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e253a58..6773446 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -597,7 +597,7 @@ int xc_hvm_set_mem_type(
 int xc_hvm_inject_trap(
     xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
     uint32_t type, uint32_t error_code, uint32_t insn_len,
-    uint64_t cr2)
+    uint64_t cr2, uint64_t cr3)
 {
     DECLARE_HYPERCALL;
     DECLARE_HYPERCALL_BUFFER(struct xen_hvm_inject_trap, arg);
@@ -611,12 +611,13 @@ int xc_hvm_inject_trap(
     }
 
     arg->domid       = dom;
-    arg->vcpuid      = vcpu;
+    arg->vcpuid      = (vcpu == -1 ? (uint32_t)~0 : vcpu);
     arg->vector      = vector;
     arg->type        = type;
     arg->error_code  = error_code;
     arg->insn_len    = insn_len;
     arg->cr2         = cr2;
+    arg->cr3         = cr3;
 
     hypercall.op     = __HYPERVISOR_hvm_op;
     hypercall.arg[0] = HVMOP_inject_trap;
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 28b5562..5bf0173 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1816,11 +1816,12 @@ int xc_hvm_set_mem_type(
 /*
  * Injects a hardware/software CPU trap, to take effect the next time the HVM 
  * resumes. 
+ * Cr3 is only taken into account if vcpu == -1 (wildcard for "any vcpu").
  */
 int xc_hvm_inject_trap(
     xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
     uint32_t type, uint32_t error_code, uint32_t insn_len,
-    uint64_t cr2);
+    uint64_t cr2, uint64_t cr3);
 
 /*
  *  LOGGING AND ERROR REPORTING
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5761ff9..bfd047b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -420,6 +420,52 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     return 1;
 }
 
+static bool_t hvm_is_pf_requested(struct vcpu *v)
+{
+    const struct domain *d = v->domain;
+    struct segment_register seg;
+//    unsigned long ev;
+//    uint32_t pending_event = 0;
+    uint64_t mask;
+
+/*
+    if ( !is_hvm_domain(currd) ||
+         likely(!currd->arch.hvm_domain.request_pagefault_info.valid) )
+        return 0;
+*/
+
+    hvm_get_segment_register(v, x86_seg_ss, &seg);
+
+    if ( seg.attr.fields.dpl != 3 ) /* Guest is not in user mode */
+        return 0;
+
+    if ( hvm_long_mode_enabled(v) )
+        mask = PADDR_MASK & PAGE_MASK; /* Bits 51:12. */
+    else if ( hvm_pae_enabled(v) )
+        mask = 0x00000000ffffffe0; /* Bits 31:5. */
+    else
+        mask = (uint32_t)PAGE_MASK; /* Bits 31:12. */
+
+    if ( (v->arch.hvm_vcpu.guest_cr[3] & mask) !=
+         (d->arch.hvm_domain.inject_trap.cr3 & mask) )
+        return 0;
+
+    return 1;
+
+/*
+    vmx_vmcs_enter(curr);
+    __vmread(VM_ENTRY_INTR_INFO, &ev);
+    vmx_vmcs_exit(curr);
+
+    if ( (ev & INTR_INFO_VALID_MASK) &&
+         hvm_event_needs_reinjection(MASK_EXTR(ev, INTR_INFO_INTR_TYPE_MASK),
+                                     ev & INTR_INFO_VECTOR_MASK) )
+        pending_event = ev;
+
+    return pending_event == 0;
+*/
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -451,6 +497,16 @@ void hvm_do_resume(struct vcpu *v)
     }
 
     /* Inject pending hw/sw trap */
+    if ( d->arch.hvm_domain.inject_trap.vector != -1 &&
+         v->arch.hvm_vcpu.inject_trap.vector == -1 &&
+         hvm_is_pf_requested(v) )
+    {
+        printk("Injected\n");
+        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
+        d->arch.hvm_domain.inject_trap.vector = -1;
+    }
+
+    /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
         hvm_inject_trap(&v->arch.hvm_vcpu.inject_trap);
@@ -1473,9 +1529,10 @@ int hvm_domain_initialise(struct domain *d)
             printk(XENLOG_G_INFO "PVH guest must have HAP on\n");
             return -EINVAL;
         }
-
     }
 
+    d->arch.hvm_domain.inject_trap.vector = -1;
+
     spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
     INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
@@ -6086,19 +6143,39 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             goto param_fail8;
 
         rc = -ENOENT;
-        if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
-            goto param_fail8;
-        
-        if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
-            rc = -EBUSY;
-        else 
+
+        if ( tr.vcpuid == (uint32_t)~0 ) /* Any VCPU. */
         {
-            v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
-            v->arch.hvm_vcpu.inject_trap.type = tr.type;
-            v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
-            v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
-            v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
-            rc = 0;
+            if ( d->arch.hvm_domain.inject_trap.vector != -1 )
+                rc = -EBUSY;
+            else
+            {
+                d->arch.hvm_domain.inject_trap.vector = tr.vector;
+                d->arch.hvm_domain.inject_trap.type = tr.type;
+                d->arch.hvm_domain.inject_trap.error_code = tr.error_code;
+                d->arch.hvm_domain.inject_trap.insn_len = tr.insn_len;
+                d->arch.hvm_domain.inject_trap.cr2 = tr.cr2;
+                d->arch.hvm_domain.inject_trap.cr3 = tr.cr3;
+                rc = 0;
+            }
+        }
+        else
+        {
+            if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
+                goto param_fail8;
+
+            if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
+                rc = -EBUSY;
+            else
+            {
+                v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
+                v->arch.hvm_vcpu.inject_trap.type = tr.type;
+                v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
+                v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
+                v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
+                v->arch.hvm_vcpu.inject_trap.cr3 = tr.cr3;
+                rc = 0;
+            }
         }
 
     param_fail8:
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 30d4aa3..b432874 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -146,6 +146,9 @@ struct hvm_domain {
         struct vmx_domain vmx;
         struct svm_domain svm;
     };
+
+    /* Pending hw/sw interrupt (.vector = -1 means nothing pending). */
+    struct hvm_trap     inject_trap;
 };
 
 #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 121d053..3b0bde9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -78,6 +78,7 @@ struct hvm_trap {
     int           error_code;   /* HVM_DELIVER_NO_ERROR_CODE if n/a */
     int           insn_len;     /* Instruction length */ 
     unsigned long cr2;          /* Only for TRAP_page_fault h/w exception */
+    unsigned long cr3;          /* Only for TRAP_page_fault h/w exception */
 };
 
 /*
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index eeb0a60..5c229b7 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -197,6 +197,8 @@ struct xen_hvm_inject_trap {
     uint32_t insn_len;
     /* CR2 for page faults */
     uint64_aligned_t cr2;
+    /* If vcpuid == -1, any CPU with a matching CR3 will inject. */
+    uint64_aligned_t cr3;
 };
 typedef struct xen_hvm_inject_trap xen_hvm_inject_trap_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_trap_t);
-- 
1.7.9.5

* [PATCH RFC V11 5/5] xen: Handle resumed instruction based on previous mem_event reply
From: Razvan Cojocaru @ 2014-09-02 15:37 UTC
  To: xen-devel
  Cc: kevin.tian, ian.campbell, Razvan Cojocaru, stefano.stabellini,
	jun.nakajima, andrew.cooper3, eddie.dong, tim, JBeulich,
	ian.jackson

In a scenario where a page fault has triggered a mem_event,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it without allowing it to perform
any writes.
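
(Illustration only: a sketch of the monitor-side reply that triggers
this path; the ring put/resume plumbing is elided and the helper name
is hypothetical.)

    #include <string.h>
    #include <xen/mem_event.h>

    /* Ask the hypervisor to emulate the faulting instruction on
     * resume, discarding its writes. rsp->gfn is re-checked against
     * the current p2m access rights before the flags are honoured. */
    static void reply_emulate_nowrite(const mem_event_request_t *req,
                                      mem_event_response_t *rsp)
    {
        memset(rsp, 0, sizeof(*rsp));
        rsp->vcpu_id = req->vcpu_id;
        rsp->gfn     = req->gfn;
        rsp->flags   = req->flags | MEM_EVENT_FLAG_EMULATE |
                       MEM_EVENT_FLAG_EMULATE_NOWRITE;
    }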

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

---
Changes since V10:
 - Removed #ifndef NDEBUG.

Changes since V9:
 - Changed the EXIT_REASON_EPT_VIOLATION check into an ASSERT().

Changes since V8:
 - Make sure that the exit reason is EXIT_REASON_EPT_VIOLATION
   before checking for EPT_GLA_FAULT.

Changes since V7:
 - Now calling hvm_mem_event_emulate_one() instead of
   hvm_emulate_one_full() (renamed in the first patch).

Changes since V5:
 - Fixed indentation.
 - Re-ordered a set of if() conditions.

Changes since V4:
 - Renamed "exited_by_pagefault" to "exited_by_nested_pagefault".
 - Folded two long lines.
 - Combined two "if" statements.
 - Moved "violation = 0;" to the XENMEM_access_rwx case.

Changes since V3:
 - Collapsed verbose lines into a single "else if()".
 - Changed an int to bool_t.
 - Fixed a minor coding style issue.
 - Now computing the first parameter to hvm_emulate_one_full()
   (replaced an "if" with a single call).
 - Added code comments about eip and gla reset (clarity issue).
 - Removed duplicate code by Andrew Cooper (introduced in V2,
   since committed).

Changes since V2:
 - Moved the __vmread(EXIT_QUALIFICATION, &exit_qualification); code
   in vmx.c, accessible via hvm_funcs.
 - Incorporated changes by Andrew Cooper ("[PATCH 1/2] Xen/mem_event:
   Validate the response vcpu_id before acting on it.")

Changes since V1:
 - Removed the 'skip' code which required computing the current
   instruction length.
 - Removed the set_ad_bits() code that attempted to modify the
   'accessed' and 'dirty' bits for instructions that the emulator
   can't handle at the moment.
---
 xen/arch/x86/domain.c          |    3 ++
 xen/arch/x86/hvm/vmx/vmx.c     |   16 ++++++++
 xen/arch/x86/mm/p2m.c          |   84 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h   |    9 +++++
 xen/include/asm-x86/hvm/hvm.h  |    2 +
 xen/include/public/mem_event.h |   12 +++---
 6 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7e0e78..7b1dfe6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -415,6 +415,9 @@ int vcpu_initialise(struct vcpu *v)
 
     v->arch.flags = TF_kernel_mode;
 
+    /* By default, do not emulate */
+    v->arch.mem_event.emulate_flags = 0;
+
     rc = mapcache_vcpu_init(v);
     if ( rc )
         return rc;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b08664e..cc317a4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1695,6 +1695,21 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
                                          MSR_TYPE_W);
 }
 
+static bool_t vmx_exited_by_nested_pagefault(void)
+{
+    unsigned long exit_qualification, exit_reason;
+
+    __vmread(VM_EXIT_REASON, &exit_reason);
+    ASSERT(exit_reason == EXIT_REASON_EPT_VIOLATION);
+
+    __vmread(EXIT_QUALIFICATION, &exit_qualification);
+
+    if ( (exit_qualification & EPT_GLA_FAULT) == 0 )
+        return 0;
+
+    return 1;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1754,6 +1769,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
+    .exited_by_nested_pagefault  = vmx_exited_by_nested_pagefault,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6ed4109..3f35832 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1391,6 +1391,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     p2m_access_t p2ma;
     mem_event_request_t *req;
     int rc;
+    unsigned long eip = guest_cpu_user_regs()->eip;
 
     /* First, handle rx2rw conversion automatically.
      * These calls to p2m->set_entry() must succeed: we have the gfn
@@ -1443,6 +1444,36 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
             return 1;
         }
     }
+    else if ( v->arch.mem_event.emulate_flags == 0 &&
+              hvm_funcs.exited_by_nested_pagefault &&
+              !hvm_funcs.exited_by_nested_pagefault() ) /* don't send a mem_event */
+    {
+        v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
+        v->arch.mem_event.gpa = gpa;
+        v->arch.mem_event.eip = eip;
+    }
+
+    /* The previous mem_event reply does not match the current state. */
+    if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
+    {
+        /* Don't emulate the current instruction, send a new mem_event. */
+        v->arch.mem_event.emulate_flags = 0;
+
+        /* Make sure to mark the current state to match it again against
+         * the new mem_event about to be sent. */
+        v->arch.mem_event.gpa = gpa;
+        v->arch.mem_event.eip = eip;
+    }
+
+    if ( v->arch.mem_event.emulate_flags )
+    {
+        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
+                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
+                                  TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+        v->arch.mem_event.emulate_flags = 0;
+        return 1;
+    }
 
     *req_ptr = NULL;
     req = xzalloc(mem_event_request_t);
@@ -1498,6 +1529,59 @@ void p2m_mem_access_resume(struct domain *d)
 
         v = d->vcpu[rsp.vcpu_id];
 
+        /* Mark vcpu for skipping one instruction upon rescheduling */
+        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
+        {
+            xenmem_access_t access;
+            bool_t violation = 1;
+
+            v->arch.mem_event.emulate_flags = 0;
+
+            if ( p2m_get_mem_access(d, rsp.gfn, &access) == 0 )
+            {
+                switch ( access )
+                {
+                case XENMEM_access_n:
+                case XENMEM_access_n2rwx:
+                default:
+                    violation = rsp.access_r || rsp.access_w || rsp.access_x;
+                    break;
+
+                case XENMEM_access_r:
+                    violation = rsp.access_w || rsp.access_x;
+                    break;
+
+                case XENMEM_access_w:
+                    violation = rsp.access_r || rsp.access_x;
+                    break;
+
+                case XENMEM_access_x:
+                    violation = rsp.access_r || rsp.access_w;
+                    break;
+
+                case XENMEM_access_rx:
+                case XENMEM_access_rx2rw:
+                    violation = rsp.access_w;
+                    break;
+
+                case XENMEM_access_wx:
+                    violation = rsp.access_r;
+                    break;
+
+                case XENMEM_access_rw:
+                    violation = rsp.access_x;
+                    break;
+
+                case XENMEM_access_rwx:
+                    violation = 0;
+                    break;
+                }
+            }
+
+            if ( violation )
+                v->arch.mem_event.emulate_flags = rsp.flags;
+        }
+
         /* Unpause domain */
         if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
             mem_event_vcpu_unpause(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 83329ed..440aa81 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -458,6 +458,15 @@ struct arch_vcpu
 
     /* A secondary copy of the vcpu time info. */
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+
+    /* Should we emulate the next matching instruction on VCPU resume
+     * after a mem_event? */
+    struct {
+        uint32_t emulate_flags;
+        unsigned long gpa;
+        unsigned long eip;
+    } mem_event;
+
 } __cacheline_aligned;
 
 smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 3b0bde9..9ab126c 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -208,6 +208,8 @@ struct hvm_function_table {
                                   uint32_t *ecx, uint32_t *edx);
 
     void (*enable_msr_exit_interception)(struct domain *d);
+
+    bool_t (*exited_by_nested_pagefault)(void);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index d3dd9c6..92c063c 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -31,11 +31,13 @@
 #include "io/ring.h"
 
 /* Memory event flags */
-#define MEM_EVENT_FLAG_VCPU_PAUSED  (1 << 0)
-#define MEM_EVENT_FLAG_DROP_PAGE    (1 << 1)
-#define MEM_EVENT_FLAG_EVICT_FAIL   (1 << 2)
-#define MEM_EVENT_FLAG_FOREIGN      (1 << 3)
-#define MEM_EVENT_FLAG_DUMMY        (1 << 4)
+#define MEM_EVENT_FLAG_VCPU_PAUSED     (1 << 0)
+#define MEM_EVENT_FLAG_DROP_PAGE       (1 << 1)
+#define MEM_EVENT_FLAG_EVICT_FAIL      (1 << 2)
+#define MEM_EVENT_FLAG_FOREIGN         (1 << 3)
+#define MEM_EVENT_FLAG_DUMMY           (1 << 4)
+#define MEM_EVENT_FLAG_EMULATE         (1 << 5)
+#define MEM_EVENT_FLAG_EMULATE_NOWRITE (1 << 6)
 
 /* Reasons for the memory event request */
 #define MEM_EVENT_REASON_UNKNOWN     0    /* typical reason */
-- 
1.7.9.5

* Re: [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state
From: Ian Campbell @ 2014-09-03 13:06 UTC
  To: Razvan Cojocaru
  Cc: kevin.tian, JBeulich, stefano.stabellini, andrew.cooper3,
	eddie.dong, tim, jun.nakajima, xen-devel, ian.jackson

On Tue, 2014-09-02 at 18:37 +0300, Razvan Cojocaru wrote:
>  
>  void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port)
>  {
> -    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN, port);
> +    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
> +                               port, 0);
> +}
> +
> +void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
> +                                         uint32_t *port)

I'm not really sure the _introspection suffix is the best description of
what is actually going on here. It looks more like an "extended context
in event" type thing or something AFAICT.

But none the less the code seems correct even if the naming isn't ideal,
so for the toolstack side:

Acked-by: Ian Campbell <ian.campbell@citrix.com>


Ian.

* Re: [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state
From: Razvan Cojocaru @ 2014-09-03 13:10 UTC
  To: Ian Campbell
  Cc: kevin.tian, eddie.dong, stefano.stabellini, jun.nakajima,
	andrew.cooper3, tim, JBeulich, xen-devel, ian.jackson

On 09/03/2014 04:06 PM, Ian Campbell wrote:
> On Tue, 2014-09-02 at 18:37 +0300, Razvan Cojocaru wrote:
>>  
>>  void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port)
>>  {
>> -    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN, port);
>> +    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
>> +                               port, 0);
>> +}
>> +
>> +void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
>> +                                         uint32_t *port)
> 
> I'm not really sure the _introspection suffix is the best description of
> what is actually going on here. It looks more like an "extended context
> in event" type thing or something AFAICT.
> 
> But none the less the code seems correct even if the naming isn't ideal,
> so for the toolstack side:
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks! I'll make sure to add the Acked-by tag to the next iteration of
the series (in the meantime it's become [PATCH V2]).


Thanks,
Razvan Cojocaru

* Re: [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state
From: Ian Campbell @ 2014-09-03 13:38 UTC
  To: Razvan Cojocaru
  Cc: kevin.tian, eddie.dong, stefano.stabellini, jun.nakajima,
	andrew.cooper3, tim, JBeulich, xen-devel, ian.jackson

On Wed, 2014-09-03 at 16:10 +0300, Razvan Cojocaru wrote:
> (in the meantime it's become [PATCH V2]).

I saw, it was rather confusing to go from RFC v11 back to v1, v2 etc.
Normally we would consider the RFC tags etc to be independent from the
version (i.e. the version continues counting when tags are
added/removed).

Ian.

* Re: [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state
From: Razvan Cojocaru @ 2014-09-03 13:41 UTC
  To: Ian Campbell
  Cc: kevin.tian, eddie.dong, stefano.stabellini, andrew.cooper3, tim,
	JBeulich, jun.nakajima, xen-devel, ian.jackson

On 09/03/2014 04:39 PM, Razvan Cojocaru wrote:
> On 09/03/2014 04:38 PM, Ian Campbell wrote:
>> On Wed, 2014-09-03 at 16:10 +0300, Razvan Cojocaru wrote:
>>> (in the meantime it's become [PATCH V2]).
>>
>> I saw, it was rather confusing to go from RFC v11 back to v1, v2 etc.
>> Normally we would consider the RFC tags etc to be independent from the
>> version (i.e. the version continues counting when tags are
>> added/removed).

I didn't know that. Would you like me to move to a higher number (which
would that be now? 14?), or would that just add to the confusion at this
point?


Thanks,
Razvan Cojocaru

* Re: [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state
From: Ian Campbell @ 2014-09-03 13:54 UTC
  To: Razvan Cojocaru
  Cc: kevin.tian, eddie.dong, stefano.stabellini, andrew.cooper3, tim,
	JBeulich, jun.nakajima, xen-devel, ian.jackson

On Wed, 2014-09-03 at 16:41 +0300, Razvan Cojocaru wrote:
> On 09/03/2014 04:39 PM, Razvan Cojocaru wrote:
> > On 09/03/2014 04:38 PM, Ian Campbell wrote:
> >> On Wed, 2014-09-03 at 16:10 +0300, Razvan Cojocaru wrote:
> >>> (in the meantime it's become [PATCH V2]).
> >>
> >> I saw, it was rather confusing to go from RFC v11 back to v1, v2 etc.
> >> Normally we would consider the RFC tags etc to be independent from the
> >> version (i.e. the version continues counting when tags are
> >> added/removed).
> 
> I didn't know that. Would you like me to move to a higher number (which
> would that be now? 14?), or would that just add to the confusion at this
> point?

I think I'm done ack-ing the (mostly trivial) toolstack bits for this
series so unless things change in that area I guess I won't be affected
either way.

Probably what's done is done, so you may as well continue on with v3
etc.

Ian.

Thread overview:
2014-09-02 15:37 [PATCH RFC V11 1/5] xen: Emulate with no writes Razvan Cojocaru
2014-09-02 15:37 ` [PATCH RFC V11 2/5] xen: Optimize introspection access to guest state Razvan Cojocaru
2014-09-03 13:06   ` Ian Campbell
2014-09-03 13:10     ` Razvan Cojocaru
2014-09-03 13:38       ` Ian Campbell
     [not found]         ` <54071A17.9040904@bitdefender.com>
2014-09-03 13:41           ` Razvan Cojocaru
2014-09-03 13:54             ` Ian Campbell
2014-09-02 15:37 ` [PATCH RFC V11 3/5] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
2014-09-02 15:37 ` [PATCH RFC V11 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
2014-09-02 15:37 ` [PATCH RFC V11 5/5] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
