All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 0/5] Basic guest memory introspection support
@ 2014-09-05 10:01 Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 1/5] xen: Emulate with no writes Razvan Cojocaru
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-05 10:01 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, eddie.dong,
	ian.jackson, tim, jbeulich, jun.nakajima, andrew.cooper3

We need to be able to detect rootkits in HVM guests, in a way that
allows an application that runs in dom0 (or a similarly privileged
domain) to control what the guest is allowed to do once a threat is
detected. This has been done over the mem_event mechanism.

Here is a summary of the series:

a  1/5 xen: Emulate with no writes
a  2/5 xen: Optimize introspection access to guest state
A  3/5 xen, libxc: Force-enable relevant MSR events
a* 4/5 xen, libxc: Request page fault injection via libxc
a  5/5 xen: Handle resumed instruction based on previous mem_event reply

Key to symbols:
 *   Updated in this version of the series.
 +   New patch in this version.
 /   Updated but only to remove changes into a separate patch.
 -   Updated with style changes only.
 a   Acked/reviwed by one reviewer.
 A   Acked/reviwed by more than one reviewer.

This new version has modified the policy of the HVMOP_inject_trap
hypercall (patch 4/5) to return -EBUSY if a trap is pending on the
domain or any of its VCPUs. Also, added an Ack for patch 5/5.

We needed to:

1. Be able to execute the current instruction without allowing it to
write to memory. This is done based on new mem_event response
fields sent from the controlling application.

2. Have the guest as responsive as possible amid all the processing. So
we had to cache some information with each mem_event sent.

3. Not allow the hypervisor to disable sending information about
interesting MSR events.

4. Add an additional mem_event type, namely a VMCALL event - in order to
do its work, our application occasionally triggers VMCALLs in the guest
(not included in the current series, but included in the initial RFC
series).

5. Extend libxc to allow triggering page faults in the guest (in order
to be able to bring back in pages swapped out by the guest OS).


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V4 1/5] xen: Emulate with no writes
  2014-09-05 10:01 [PATCH V4 0/5] Basic guest memory introspection support Razvan Cojocaru
@ 2014-09-05 10:01 ` Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 2/5] xen: Optimize introspection access to guest state Razvan Cojocaru
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-05 10:01 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Added support for emulating an instruction with no memory writes.
Additionally, introduced hvm_emulate_one_full(), which inspects
possible return values from the hvm_emulate_one() functions
(EXCEPTION, UNHANDLEABLE) and acts on them.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/emulate.c        |  175 ++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/emulate.h |    5 ++
 2 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 86cf432..6ab06e0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -690,6 +690,94 @@ static int hvmemul_write(
     return X86EMUL_OKAY;
 }
 
+static int hvmemul_write_discard(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    /* Discarding the write. */
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_ins_discard(
+    uint16_t src_port,
+    enum x86_segment dst_seg,
+    unsigned long dst_offset,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_movs_discard(
+   enum x86_segment src_seg,
+   unsigned long src_offset,
+   enum x86_segment dst_seg,
+   unsigned long dst_offset,
+   unsigned int bytes_per_rep,
+   unsigned long *reps,
+   struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_outs_discard(
+    enum x86_segment src_seg,
+    unsigned long src_offset,
+    uint16_t dst_port,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_cmpxchg_discard(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_read_io_discard(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long *val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_write_io_discard(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_write_msr_discard(
+    unsigned long reg,
+    uint64_t val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_wbinvd_discard(
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_cmpxchg(
     enum x86_segment seg,
     unsigned long offset,
@@ -1140,8 +1228,33 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
     .invlpg        = hvmemul_invlpg
 };
 
-int hvm_emulate_one(
-    struct hvm_emulate_ctxt *hvmemul_ctxt)
+static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
+    .read          = hvmemul_read,
+    .insn_fetch    = hvmemul_insn_fetch,
+    .write         = hvmemul_write_discard,
+    .cmpxchg       = hvmemul_cmpxchg_discard,
+    .rep_ins       = hvmemul_rep_ins_discard,
+    .rep_outs      = hvmemul_rep_outs_discard,
+    .rep_movs      = hvmemul_rep_movs_discard,
+    .read_segment  = hvmemul_read_segment,
+    .write_segment = hvmemul_write_segment,
+    .read_io       = hvmemul_read_io_discard,
+    .write_io      = hvmemul_write_io_discard,
+    .read_cr       = hvmemul_read_cr,
+    .write_cr      = hvmemul_write_cr,
+    .read_msr      = hvmemul_read_msr,
+    .write_msr     = hvmemul_write_msr_discard,
+    .wbinvd        = hvmemul_wbinvd_discard,
+    .cpuid         = hvmemul_cpuid,
+    .inject_hw_exception = hvmemul_inject_hw_exception,
+    .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
+    .get_fpu       = hvmemul_get_fpu,
+    .put_fpu       = hvmemul_put_fpu,
+    .invlpg        = hvmemul_invlpg
+};
+
+static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
+    const struct x86_emulate_ops *ops)
 {
     struct cpu_user_regs *regs = hvmemul_ctxt->ctxt.regs;
     struct vcpu *curr = current;
@@ -1193,7 +1306,7 @@ int hvm_emulate_one(
     vio->mmio_retrying = vio->mmio_retry;
     vio->mmio_retry = 0;
 
-    rc = x86_emulate(&hvmemul_ctxt->ctxt, &hvm_emulate_ops);
+    rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
 
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;
@@ -1241,6 +1354,62 @@ int hvm_emulate_one(
     return X86EMUL_OKAY;
 }
 
+int hvm_emulate_one(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops);
+}
+
+int hvm_emulate_one_no_write(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_no_write);
+}
+
+void hvm_mem_event_emulate_one(bool_t nowrite, unsigned int trapnr,
+    unsigned int errcode)
+{
+    struct hvm_emulate_ctxt ctx = {{ 0 }};
+    int rc;
+
+    hvm_emulate_prepare(&ctx, guest_cpu_user_regs());
+
+    if ( nowrite )
+        rc = hvm_emulate_one_no_write(&ctx);
+    else
+        rc = hvm_emulate_one(&ctx);
+
+    switch ( rc )
+    {
+    case X86EMUL_RETRY:
+        /*
+         * This function is called when handling an EPT-related mem_event
+         * reply. As such, nothing else needs to be done here, since simply
+         * returning makes the current instruction cause a page fault again,
+         * consistent with X86EMUL_RETRY.
+         */
+        return;
+    case X86EMUL_UNHANDLEABLE:
+        gdprintk(XENLOG_DEBUG, "Emulation failed @ %04x:%lx: "
+               "%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x\n",
+               hvmemul_get_seg_reg(x86_seg_cs, &ctx)->sel,
+               ctx.insn_buf_eip,
+               ctx.insn_buf[0], ctx.insn_buf[1],
+               ctx.insn_buf[2], ctx.insn_buf[3],
+               ctx.insn_buf[4], ctx.insn_buf[5],
+               ctx.insn_buf[6], ctx.insn_buf[7],
+               ctx.insn_buf[8], ctx.insn_buf[9]);
+        hvm_inject_hw_exception(trapnr, errcode);
+        break;
+    case X86EMUL_EXCEPTION:
+        if ( ctx.exn_pending )
+            hvm_inject_hw_exception(ctx.exn_vector, ctx.exn_error_code);
+        break;
+    }
+
+    hvm_emulate_writeback(&ctx);
+}
+
 void hvm_emulate_prepare(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs)
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 00a06cc..efff97e 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -37,6 +37,11 @@ struct hvm_emulate_ctxt {
 
 int hvm_emulate_one(
     struct hvm_emulate_ctxt *hvmemul_ctxt);
+int hvm_emulate_one_no_write(
+    struct hvm_emulate_ctxt *hvmemul_ctxt);
+void hvm_mem_event_emulate_one(bool_t nowrite,
+    unsigned int trapnr,
+    unsigned int errcode);
 void hvm_emulate_prepare(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V4 2/5] xen: Optimize introspection access to guest state
  2014-09-05 10:01 [PATCH V4 0/5] Basic guest memory introspection support Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 1/5] xen: Emulate with no writes Razvan Cojocaru
@ 2014-09-05 10:01 ` Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 3/5] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-05 10:01 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Speed optimization for introspection purposes: a handful of registers
are sent along with each mem_event. This requires enlargement of the
mem_event_request / mem_event_response stuctures, and additional code
to fill in relevant values. Since the EPT event processing code needs
more data than CR3 or MSR event processors, hvm_mem_event_fill_regs()
fills in less data than p2m_mem_event_fill_regs(), in order to avoid
overhead. Struct hvm_hw_cpu has been considered instead of the custom
struct mem_event_regs_st, but its size would cause quick filling up
of the mem_event ring buffer.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c         |   33 +++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c          |   57 ++++++++++++++++++++++++++++++++++++++++
 xen/include/public/mem_event.h |   39 +++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8d905d3..bb45593 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6149,6 +6149,38 @@ int hvm_debug_op(struct vcpu *v, int32_t op)
     return rc;
 }
 
+static void hvm_mem_event_fill_regs(mem_event_request_t *req)
+{
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const struct vcpu *curr = current;
+
+    req->x86_regs.rax = regs->eax;
+    req->x86_regs.rcx = regs->ecx;
+    req->x86_regs.rdx = regs->edx;
+    req->x86_regs.rbx = regs->ebx;
+    req->x86_regs.rsp = regs->esp;
+    req->x86_regs.rbp = regs->ebp;
+    req->x86_regs.rsi = regs->esi;
+    req->x86_regs.rdi = regs->edi;
+
+    req->x86_regs.r8  = regs->r8;
+    req->x86_regs.r9  = regs->r9;
+    req->x86_regs.r10 = regs->r10;
+    req->x86_regs.r11 = regs->r11;
+    req->x86_regs.r12 = regs->r12;
+    req->x86_regs.r13 = regs->r13;
+    req->x86_regs.r14 = regs->r14;
+    req->x86_regs.r15 = regs->r15;
+
+    req->x86_regs.rflags = regs->eflags;
+    req->x86_regs.rip    = regs->eip;
+
+    req->x86_regs.msr_efer = curr->arch.hvm_vcpu.guest_efer;
+    req->x86_regs.cr0 = curr->arch.hvm_vcpu.guest_cr[0];
+    req->x86_regs.cr3 = curr->arch.hvm_vcpu.guest_cr[3];
+    req->x86_regs.cr4 = curr->arch.hvm_vcpu.guest_cr[4];
+}
+
 static int hvm_memory_event_traps(long p, uint32_t reason,
                                   unsigned long value, unsigned long old, 
                                   bool_t gla_valid, unsigned long gla) 
@@ -6193,6 +6225,7 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
         req.gla = old;
     }
     
+    hvm_mem_event_fill_regs(&req);
     mem_event_put_request(d, &d->mem_event->access, &req);
     
     return 1;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 32776c3..d0962aa 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1327,6 +1327,61 @@ void p2m_mem_paging_resume(struct domain *d)
     }
 }
 
+static void p2m_mem_event_fill_regs(mem_event_request_t *req)
+{
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct segment_register seg;
+    struct hvm_hw_cpu ctxt;
+    struct vcpu *curr = current;
+
+    /* Architecture-specific vmcs/vmcb bits */
+    hvm_funcs.save_cpu_ctxt(curr, &ctxt);
+
+    req->x86_regs.rax = regs->eax;
+    req->x86_regs.rcx = regs->ecx;
+    req->x86_regs.rdx = regs->edx;
+    req->x86_regs.rbx = regs->ebx;
+    req->x86_regs.rsp = regs->esp;
+    req->x86_regs.rbp = regs->ebp;
+    req->x86_regs.rsi = regs->esi;
+    req->x86_regs.rdi = regs->edi;
+
+    req->x86_regs.r8  = regs->r8;
+    req->x86_regs.r9  = regs->r9;
+    req->x86_regs.r10 = regs->r10;
+    req->x86_regs.r11 = regs->r11;
+    req->x86_regs.r12 = regs->r12;
+    req->x86_regs.r13 = regs->r13;
+    req->x86_regs.r14 = regs->r14;
+    req->x86_regs.r15 = regs->r15;
+
+    req->x86_regs.rflags = regs->eflags;
+    req->x86_regs.rip    = regs->eip;
+
+    req->x86_regs.dr7 = curr->arch.debugreg[7];
+    req->x86_regs.cr0 = ctxt.cr0;
+    req->x86_regs.cr2 = ctxt.cr2;
+    req->x86_regs.cr3 = ctxt.cr3;
+    req->x86_regs.cr4 = ctxt.cr4;
+
+    req->x86_regs.sysenter_cs = ctxt.sysenter_cs;
+    req->x86_regs.sysenter_esp = ctxt.sysenter_esp;
+    req->x86_regs.sysenter_eip = ctxt.sysenter_eip;
+
+    req->x86_regs.msr_efer = ctxt.msr_efer;
+    req->x86_regs.msr_star = ctxt.msr_star;
+    req->x86_regs.msr_lstar = ctxt.msr_lstar;
+
+    hvm_get_segment_register(curr, x86_seg_fs, &seg);
+    req->x86_regs.fs_base = seg.base;
+
+    hvm_get_segment_register(curr, x86_seg_gs, &seg);
+    req->x86_regs.gs_base = seg.base;
+
+    hvm_get_segment_register(curr, x86_seg_cs, &seg);
+    req->x86_regs.cs_arbytes = seg.attr.bytes;
+}
+
 bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
                             struct npfec npfec,
                             mem_event_request_t **req_ptr)
@@ -1417,6 +1472,8 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         req->access_w = npfec.write_access;
         req->access_x = npfec.insn_fetch;
         req->vcpu_id = v->vcpu_id;
+
+        p2m_mem_event_fill_regs(req);
     }
 
     /* Pause the current VCPU */
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index fc12697..d3dd9c6 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -48,6 +48,44 @@
 #define MEM_EVENT_REASON_MSR         7    /* MSR was hit: gfn is MSR value, gla is MSR address;
                                              does NOT honour HVMPME_onchangeonly */
 
+/* Using a custom struct (not hvm_hw_cpu) so as to not fill
+ * the mem_event ring buffer too quickly. */
+struct mem_event_regs_x86 {
+    uint64_t rax;
+    uint64_t rcx;
+    uint64_t rdx;
+    uint64_t rbx;
+    uint64_t rsp;
+    uint64_t rbp;
+    uint64_t rsi;
+    uint64_t rdi;
+    uint64_t r8;
+    uint64_t r9;
+    uint64_t r10;
+    uint64_t r11;
+    uint64_t r12;
+    uint64_t r13;
+    uint64_t r14;
+    uint64_t r15;
+    uint64_t rflags;
+    uint64_t dr7;
+    uint64_t rip;
+    uint64_t cr0;
+    uint64_t cr2;
+    uint64_t cr3;
+    uint64_t cr4;
+    uint64_t sysenter_cs;
+    uint64_t sysenter_esp;
+    uint64_t sysenter_eip;
+    uint64_t msr_efer;
+    uint64_t msr_star;
+    uint64_t msr_lstar;
+    uint64_t fs_base;
+    uint64_t gs_base;
+    uint32_t cs_arbytes;
+    uint32_t _pad;
+};
+
 typedef struct mem_event_st {
     uint32_t flags;
     uint32_t vcpu_id;
@@ -67,6 +105,7 @@ typedef struct mem_event_st {
     uint16_t available:10;
 
     uint16_t reason;
+    struct mem_event_regs_x86 x86_regs;
 } mem_event_request_t, mem_event_response_t;
 
 DEFINE_RING_TYPES(mem_event, mem_event_request_t, mem_event_response_t);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V4 3/5] xen, libxc: Force-enable relevant MSR events
  2014-09-05 10:01 [PATCH V4 0/5] Basic guest memory introspection support Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 1/5] xen: Emulate with no writes Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 2/5] xen: Optimize introspection access to guest state Razvan Cojocaru
@ 2014-09-05 10:01 ` Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
  2014-09-05 10:01 ` [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
  4 siblings, 0 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-05 10:01 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Vmx_disable_intercept_for_msr() will now refuse to disable interception of
MSRs needed for memory introspection. It is not possible to gate this on
mem_access being active for the domain, since by the time mem_access does
become active the interception for the interesting MSRs has already been
disabled (vmx_disable_intercept_for_msr() runs very early on).

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 tools/libxc/xc_mem_access.c        |   10 +++++++++-
 tools/libxc/xc_mem_event.c         |    7 +++++--
 tools/libxc/xc_private.h           |    2 +-
 tools/libxc/xenctrl.h              |    2 ++
 xen/arch/x86/hvm/vmx/vmcs.c        |   25 +++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |   13 +++++++++++++
 xen/arch/x86/mm/mem_event.c        |   11 +++++++++++
 xen/include/asm-x86/hvm/domain.h   |    1 +
 xen/include/asm-x86/hvm/hvm.h      |    2 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |    7 +++++++
 xen/include/public/domctl.h        |    7 ++++---
 11 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_mem_access.c b/tools/libxc/xc_mem_access.c
index 461f0e9..55d0e9f 100644
--- a/tools/libxc/xc_mem_access.c
+++ b/tools/libxc/xc_mem_access.c
@@ -26,7 +26,15 @@
 
 void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port)
 {
-    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN, port);
+    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
+                               port, 0);
+}
+
+void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
+                                         uint32_t *port)
+{
+    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
+                               port, 1);
 }
 
 int xc_mem_access_disable(xc_interface *xch, domid_t domain_id)
diff --git a/tools/libxc/xc_mem_event.c b/tools/libxc/xc_mem_event.c
index faf1cc6..8c0be4e 100644
--- a/tools/libxc/xc_mem_event.c
+++ b/tools/libxc/xc_mem_event.c
@@ -57,7 +57,7 @@ int xc_mem_event_memop(xc_interface *xch, domid_t domain_id,
 }
 
 void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
-                          uint32_t *port)
+                          uint32_t *port, int enable_introspection)
 {
     void *ring_page = NULL;
     uint64_t pfn;
@@ -120,7 +120,10 @@ void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
         break;
 
     case HVM_PARAM_ACCESS_RING_PFN:
-        op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE;
+        if ( enable_introspection )
+            op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION;
+        else
+            op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE;
         mode = XEN_DOMCTL_MEM_EVENT_OP_ACCESS;
         break;
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index c50a7c9..94df688 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -381,6 +381,6 @@ int xc_mem_event_memop(xc_interface *xch, domid_t domain_id,
  * param can be HVM_PARAM_PAGING/ACCESS/SHARING_RING_PFN
  */
 void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
-                          uint32_t *port);
+                          uint32_t *port, int enable_introspection);
 
 #endif /* __XC_PRIVATE_H__ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1c5d0db..28b5562 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2263,6 +2263,8 @@ int xc_mem_paging_load(xc_interface *xch, domid_t domain_id,
  * Caller has to unmap this page when done.
  */
 void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port);
+void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
+                                         uint32_t *port);
 int xc_mem_access_disable(xc_interface *xch, domid_t domain_id);
 int xc_mem_access_resume(xc_interface *xch, domid_t domain_id);
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4a4f4e1..fc1f882 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <xen/keyhandler.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/mem_event.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -71,6 +72,18 @@ u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
 
+const u32 vmx_introspection_force_enabled_msrs[] = {
+    MSR_IA32_SYSENTER_EIP,
+    MSR_IA32_SYSENTER_ESP,
+    MSR_IA32_SYSENTER_CS,
+    MSR_IA32_MC0_CTL,
+    MSR_STAR,
+    MSR_LSTAR
+};
+
+const unsigned int vmx_introspection_force_enabled_msrs_size =
+    ARRAY_SIZE(vmx_introspection_force_enabled_msrs);
+
 static DEFINE_PER_CPU_READ_MOSTLY(struct vmcs_struct *, vmxon_region);
 static DEFINE_PER_CPU(struct vmcs_struct *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, active_vmcs_list);
@@ -695,11 +708,23 @@ static void vmx_set_host_env(struct vcpu *v)
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type)
 {
     unsigned long *msr_bitmap = v->arch.hvm_vmx.msr_bitmap;
+    struct domain *d = v->domain;
 
     /* VMX MSR bitmap supported? */
     if ( msr_bitmap == NULL )
         return;
 
+    if ( unlikely(d->arch.hvm_domain.introspection_enabled) &&
+         mem_event_check_ring(&d->mem_event->access) )
+    {
+        unsigned int i;
+
+        /* Filter out MSR-s needed for memory introspection */
+        for ( i = 0; i < vmx_introspection_force_enabled_msrs_size; i++ )
+            if ( msr == vmx_introspection_force_enabled_msrs[i] )
+                return;
+    }
+
     /*
      * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
      * have the write-low and read-high bitmap offsets the wrong way round.
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 26b1ad5..b08664e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1683,6 +1683,18 @@ void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
         *eax |= XEN_HVM_CPUID_X2APIC_VIRT;
 }
 
+static void vmx_enable_msr_exit_interception(struct domain *d)
+{
+    struct vcpu *v;
+    unsigned int i;
+
+    /* Enable interception for MSRs needed for memory introspection. */
+    for_each_vcpu ( d, v )
+        for ( i = 0; i < vmx_introspection_force_enabled_msrs_size; i++ )
+            vmx_enable_intercept_for_msr(v, vmx_introspection_force_enabled_msrs[i],
+                                         MSR_TYPE_W);
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1741,6 +1753,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
+    .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/mm/mem_event.c b/xen/arch/x86/mm/mem_event.c
index ba7e71e..fdd5ff6 100644
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -587,6 +587,7 @@ int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
         switch( mec->op )
         {
         case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE:
+        case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION:
         {
             rc = -ENODEV;
             /* Only HAP is supported */
@@ -600,13 +601,23 @@ int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
             rc = mem_event_enable(d, mec, med, _VPF_mem_access, 
                                     HVM_PARAM_ACCESS_RING_PFN,
                                     mem_access_notification);
+
+            if ( mec->op != XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE &&
+                 rc == 0 && hvm_funcs.enable_msr_exit_interception )
+            {
+                d->arch.hvm_domain.introspection_enabled = 1;
+                hvm_funcs.enable_msr_exit_interception(d);
+            }
         }
         break;
 
         case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE:
         {
             if ( med->ring_page )
+            {
                 rc = mem_event_disable(d, med);
+                d->arch.hvm_domain.introspection_enabled = 0;
+            }
         }
         break;
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 291a2e0..30d4aa3 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -134,6 +134,7 @@ struct hvm_domain {
     bool_t                 mem_sharing_enabled;
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
+    bool_t                 introspection_enabled;
 
     /*
      * TSC value that VCPUs use to calculate their tsc_offset value.
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1123857..121d053 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -205,6 +205,8 @@ struct hvm_function_table {
     void (*hypervisor_cpuid_leaf)(uint32_t sub_idx,
                                   uint32_t *eax, uint32_t *ebx,
                                   uint32_t *ecx, uint32_t *edx);
+
+    void (*enable_msr_exit_interception)(struct domain *d);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 215d93c..6a99dca 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -471,6 +471,13 @@ enum vmcs_field {
     HOST_RIP                        = 0x00006c16,
 };
 
+/*
+ * A set of MSR-s that need to be enabled for memory introspection
+ * to work.
+ */
+extern const u32 vmx_introspection_force_enabled_msrs[];
+extern const unsigned int vmx_introspection_force_enabled_msrs_size;
+
 #define VMCS_VPID_WIDTH 16
 
 #define MSR_TYPE_R 1
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..a3431ec 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -772,10 +772,11 @@ struct xen_domctl_gdbsx_domstatus {
  * ENODEV - host lacks HAP support (EPT/NPT) or HAP is disabled in guest
  * EBUSY  - guest has or had access enabled, ring buffer still active
  */
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS            2
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS                        2
 
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE     0
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE    1
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE                 0
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE                1
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION   2
 
 /*
  * Sharing ENOMEM helper.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc
  2014-09-05 10:01 [PATCH V4 0/5] Basic guest memory introspection support Razvan Cojocaru
                   ` (2 preceding siblings ...)
  2014-09-05 10:01 ` [PATCH V4 3/5] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
@ 2014-09-05 10:01 ` Razvan Cojocaru
  2014-09-05 16:46   ` Tian, Kevin
  2014-09-08 15:47   ` Jan Beulich
  2014-09-05 10:01 ` [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
  4 siblings, 2 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-05 10:01 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Extended HVMOP_inject_trap to allow asking for trap injection done
by the first available CPU, when it's in user mode and its CR3
matches the one for an interesting application inside the guest.
This mechanism allows bringing in swapped-out pages for inspection.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

---
Changes since V3:
 - Now returning -EBUSY if the hypercall tries to set per-domain
   injection data when there's a pending injection on any of the
   domain's CPUs.
 - Minor code cleanup.

Changes since V2:
 - Renamed hvm_is_pf_requested() to hvm_can_inject_domain_pf().
 - Modified comments to better reflect the introspection-related
   purpose of added code.
---
 tools/libxc/xc_misc.c               |    5 +-
 tools/libxc/xenctrl.h               |    3 +-
 tools/tests/xen-access/xen-access.c |    2 +-
 xen/arch/x86/hvm/hvm.c              |   97 ++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/domain.h    |    3 ++
 xen/include/asm-x86/hvm/hvm.h       |    1 +
 xen/include/public/hvm/hvm_op.h     |    7 +++
 7 files changed, 100 insertions(+), 18 deletions(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e253a58..6773446 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -597,7 +597,7 @@ int xc_hvm_set_mem_type(
 int xc_hvm_inject_trap(
     xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
     uint32_t type, uint32_t error_code, uint32_t insn_len,
-    uint64_t cr2)
+    uint64_t cr2, uint64_t cr3)
 {
     DECLARE_HYPERCALL;
     DECLARE_HYPERCALL_BUFFER(struct xen_hvm_inject_trap, arg);
@@ -611,12 +611,13 @@ int xc_hvm_inject_trap(
     }
 
     arg->domid       = dom;
-    arg->vcpuid      = vcpu;
+    arg->vcpuid      = (vcpu == -1 ? (uint32_t)~0 : vcpu);
     arg->vector      = vector;
     arg->type        = type;
     arg->error_code  = error_code;
     arg->insn_len    = insn_len;
     arg->cr2         = cr2;
+    arg->cr3         = cr3;
 
     hypercall.op     = __HYPERVISOR_hvm_op;
     hypercall.arg[0] = HVMOP_inject_trap;
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 28b5562..5bf0173 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1816,11 +1816,12 @@ int xc_hvm_set_mem_type(
 /*
  * Injects a hardware/software CPU trap, to take effect the next time the HVM 
  * resumes. 
+ * Cr3 is only taken into account if vcpu == -1 (wildcard for "any vcpu").
  */
 int xc_hvm_inject_trap(
     xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
     uint32_t type, uint32_t error_code, uint32_t insn_len,
-    uint64_t cr2);
+    uint64_t cr2, uint64_t cr3);
 
 /*
  *  LOGGING AND ERROR REPORTING
diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c
index 090df5f..34c53d2 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -601,7 +601,7 @@ int main(int argc, char *argv[])
                 /* Reinject */
                 rc = xc_hvm_inject_trap(
                     xch, domain_id, req.vcpu_id, 3,
-                    HVMOP_TRAP_sw_exc, -1, 0, 0);
+                    HVMOP_TRAP_sw_exc, -1, 0, 0, 0);
                 if (rc < 0)
                 {
                     ERROR("Error %d injecting int3\n", rc);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bb45593..9b4904c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -420,6 +420,31 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     return 1;
 }
 
+static bool_t hvm_can_inject_domain_pf(struct vcpu *v)
+{
+    const struct domain *d = v->domain;
+    struct segment_register seg;
+    uint64_t mask;
+
+    hvm_get_segment_register(v, x86_seg_ss, &seg);
+
+    if ( seg.attr.fields.dpl != 3 ) /* Guest is not in user mode */
+        return 0;
+
+    if ( hvm_long_mode_enabled(v) )
+        mask = PADDR_MASK & PAGE_MASK; /* Bits 51:12. */
+    else if ( hvm_pae_enabled(v) )
+        mask = 0x00000000ffffffe0; /* Bits 31:5. */
+    else
+        mask = (uint32_t)PAGE_MASK; /* Bits 31:12. */
+
+    if ( (v->arch.hvm_vcpu.guest_cr[3] & mask) !=
+         (d->arch.hvm_domain.inject_trap.cr3 & mask) )
+        return 0;
+
+    return 1;
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -451,11 +476,22 @@ void hvm_do_resume(struct vcpu *v)
     }
 
     /* Inject pending hw/sw trap */
-    if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
+    if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
     {
         hvm_inject_trap(&v->arch.hvm_vcpu.inject_trap);
         v->arch.hvm_vcpu.inject_trap.vector = -1;
     }
+    /*
+     * Inject per-domain pending hw/sw trap (this will most likely
+     * be a page fault injected by memory introspection code).
+     */
+    else if ( d->arch.hvm_domain.inject_trap.vector != -1 &&
+              hvm_can_inject_domain_pf(v) )
+    {
+        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
+        d->arch.hvm_domain.inject_trap.vector = -1;
+    }
+
 }
 
 static int hvm_alloc_ioreq_gmfn(struct domain *d, unsigned long *gmfn)
@@ -1473,9 +1509,10 @@ int hvm_domain_initialise(struct domain *d)
             printk(XENLOG_G_INFO "PVH guest must have HAP on\n");
             return -EINVAL;
         }
-
     }
 
+    d->arch.hvm_domain.inject_trap.vector = -1;
+
     spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
     INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
@@ -6086,19 +6123,51 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             goto param_fail8;
 
         rc = -ENOENT;
-        if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
-            goto param_fail8;
-        
-        if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
-            rc = -EBUSY;
-        else 
+
+        if ( tr.vcpuid == (uint32_t)~0 ) /* Any VCPU. */
         {
-            v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
-            v->arch.hvm_vcpu.inject_trap.type = tr.type;
-            v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
-            v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
-            v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
-            rc = 0;
+            unsigned int i;
+
+            for ( i = 0; i < d->max_vcpus; i++ )
+                if ( (v = d->vcpu[i]) != NULL &&
+                     v->arch.hvm_vcpu.inject_trap.vector != -1 )
+                {
+                    rc = -EBUSY;
+                    break;
+                }
+
+            if ( d->arch.hvm_domain.inject_trap.vector != -1 )
+                rc = -EBUSY;
+
+            if ( rc != -EBUSY )
+            {
+                d->arch.hvm_domain.inject_trap.vector = tr.vector;
+                d->arch.hvm_domain.inject_trap.type = tr.type;
+                d->arch.hvm_domain.inject_trap.error_code = tr.error_code;
+                d->arch.hvm_domain.inject_trap.insn_len = tr.insn_len;
+                d->arch.hvm_domain.inject_trap.cr2 = tr.cr2;
+                d->arch.hvm_domain.inject_trap.cr3 = tr.cr3;
+                rc = 0;
+            }
+        }
+        else /* Specific VCPU. */
+        {
+            if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
+                goto param_fail8;
+
+            if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ||
+                 d->arch.hvm_domain.inject_trap.vector != -1 )
+                rc = -EBUSY;
+            else
+            {
+                v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
+                v->arch.hvm_vcpu.inject_trap.type = tr.type;
+                v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
+                v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
+                v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
+                v->arch.hvm_vcpu.inject_trap.cr3 = tr.cr3;
+                rc = 0;
+            }
         }
 
     param_fail8:
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 30d4aa3..b432874 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -146,6 +146,9 @@ struct hvm_domain {
         struct vmx_domain vmx;
         struct svm_domain svm;
     };
+
+    /* Pending hw/sw interrupt (.vector = -1 means nothing pending). */
+    struct hvm_trap     inject_trap;
 };
 
 #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 121d053..3b0bde9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -78,6 +78,7 @@ struct hvm_trap {
     int           error_code;   /* HVM_DELIVER_NO_ERROR_CODE if n/a */
     int           insn_len;     /* Instruction length */ 
     unsigned long cr2;          /* Only for TRAP_page_fault h/w exception */
+    unsigned long cr3;          /* Only for TRAP_page_fault h/w exception */
 };
 
 /*
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index eeb0a60..399a942 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -197,6 +197,13 @@ struct xen_hvm_inject_trap {
     uint32_t insn_len;
     /* CR2 for page faults */
     uint64_aligned_t cr2;
+    /*
+     * Only used if vcpuid == ~0 (wildcard for any VCPU).
+     * In that case, injection data is set per-domain, and any VCPU
+     * running a process with matching CR3 in user mode will inject
+     * the trap.
+     */
+    uint64_aligned_t cr3;
 };
 typedef struct xen_hvm_inject_trap xen_hvm_inject_trap_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_trap_t);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-05 10:01 [PATCH V4 0/5] Basic guest memory introspection support Razvan Cojocaru
                   ` (3 preceding siblings ...)
  2014-09-05 10:01 ` [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
@ 2014-09-05 10:01 ` Razvan Cojocaru
  2014-09-08 15:50   ` Jan Beulich
  4 siblings, 1 reply; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-05 10:01 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

In a scenario where a page fault that triggered a mem_event occured,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it, but don't allow it to perform
any writes.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/domain.c          |    3 ++
 xen/arch/x86/hvm/vmx/vmx.c     |   16 ++++++++
 xen/arch/x86/mm/p2m.c          |   84 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h   |    9 +++++
 xen/include/asm-x86/hvm/hvm.h  |    2 +
 xen/include/public/mem_event.h |   12 +++---
 6 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7e0e78..7b1dfe6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -415,6 +415,9 @@ int vcpu_initialise(struct vcpu *v)
 
     v->arch.flags = TF_kernel_mode;
 
+    /* By default, do not emulate */
+    v->arch.mem_event.emulate_flags = 0;
+
     rc = mapcache_vcpu_init(v);
     if ( rc )
         return rc;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b08664e..cc317a4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1695,6 +1695,21 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
                                          MSR_TYPE_W);
 }
 
+static bool_t vmx_exited_by_nested_pagefault(void)
+{
+    unsigned long exit_qualification, exit_reason;
+
+    __vmread(VM_EXIT_REASON, &exit_reason);
+    ASSERT(exit_reason == EXIT_REASON_EPT_VIOLATION);
+
+    __vmread(EXIT_QUALIFICATION, &exit_qualification);
+
+    if ( (exit_qualification & EPT_GLA_FAULT) == 0 )
+        return 0;
+
+    return 1;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1754,6 +1769,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
+    .exited_by_nested_pagefault  = vmx_exited_by_nested_pagefault,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d0962aa..616ebb4 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1395,6 +1395,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     p2m_access_t p2ma;
     mem_event_request_t *req;
     int rc;
+    unsigned long eip = guest_cpu_user_regs()->eip;
 
     /* First, handle rx2rw conversion automatically.
      * These calls to p2m->set_entry() must succeed: we have the gfn
@@ -1447,6 +1448,36 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
             return 1;
         }
     }
+    else if ( v->arch.mem_event.emulate_flags == 0 &&
+              hvm_funcs.exited_by_nested_pagefault &&
+              !hvm_funcs.exited_by_nested_pagefault() ) /* don't send a mem_event */
+    {
+        v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
+        v->arch.mem_event.gpa = gpa;
+        v->arch.mem_event.eip = eip;
+    }
+
+    /* The previous mem_event reply does not match the current state. */
+    if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
+    {
+        /* Don't emulate the current instruction, send a new mem_event. */
+        v->arch.mem_event.emulate_flags = 0;
+
+        /* Make sure to mark the current state to match it again against
+         * the new mem_event about to be sent. */
+        v->arch.mem_event.gpa = gpa;
+        v->arch.mem_event.eip = eip;
+    }
+
+    if ( v->arch.mem_event.emulate_flags )
+    {
+        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
+                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
+                                  TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+        v->arch.mem_event.emulate_flags = 0;
+        return 1;
+    }
 
     *req_ptr = NULL;
     req = xzalloc(mem_event_request_t);
@@ -1502,6 +1533,59 @@ void p2m_mem_access_resume(struct domain *d)
 
         v = d->vcpu[rsp.vcpu_id];
 
+        /* Mark vcpu for skipping one instruction upon rescheduling */
+        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
+        {
+            xenmem_access_t access;
+            bool_t violation = 1;
+
+            v->arch.mem_event.emulate_flags = 0;
+
+            if ( p2m_get_mem_access(d, rsp.gfn, &access) == 0 )
+            {
+                switch ( access )
+                {
+                case XENMEM_access_n:
+                case XENMEM_access_n2rwx:
+                default:
+                    violation = rsp.access_r || rsp.access_w || rsp.access_x;
+                    break;
+
+                case XENMEM_access_r:
+                    violation = rsp.access_w || rsp.access_x;
+                    break;
+
+                case XENMEM_access_w:
+                    violation = rsp.access_r || rsp.access_x;
+                    break;
+
+                case XENMEM_access_x:
+                    violation = rsp.access_r || rsp.access_w;
+                    break;
+
+                case XENMEM_access_rx:
+                case XENMEM_access_rx2rw:
+                    violation = rsp.access_w;
+                    break;
+
+                case XENMEM_access_wx:
+                    violation = rsp.access_r;
+                    break;
+
+                case XENMEM_access_rw:
+                    violation = rsp.access_x;
+                    break;
+
+                case XENMEM_access_rwx:
+                    violation = 0;
+                    break;
+                }
+            }
+
+            if ( violation )
+                v->arch.mem_event.emulate_flags = rsp.flags;
+        }
+
         /* Unpause domain */
         if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
             mem_event_vcpu_unpause(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 83329ed..440aa81 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -458,6 +458,15 @@ struct arch_vcpu
 
     /* A secondary copy of the vcpu time info. */
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+
+    /* Should we emulate the next matching instruction on VCPU resume
+     * after a mem_event? */
+    struct {
+        uint32_t emulate_flags;
+        unsigned long gpa;
+        unsigned long eip;
+    } mem_event;
+
 } __cacheline_aligned;
 
 smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 3b0bde9..9ab126c 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -208,6 +208,8 @@ struct hvm_function_table {
                                   uint32_t *ecx, uint32_t *edx);
 
     void (*enable_msr_exit_interception)(struct domain *d);
+
+    bool_t (*exited_by_nested_pagefault)(void);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index d3dd9c6..92c063c 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -31,11 +31,13 @@
 #include "io/ring.h"
 
 /* Memory event flags */
-#define MEM_EVENT_FLAG_VCPU_PAUSED  (1 << 0)
-#define MEM_EVENT_FLAG_DROP_PAGE    (1 << 1)
-#define MEM_EVENT_FLAG_EVICT_FAIL   (1 << 2)
-#define MEM_EVENT_FLAG_FOREIGN      (1 << 3)
-#define MEM_EVENT_FLAG_DUMMY        (1 << 4)
+#define MEM_EVENT_FLAG_VCPU_PAUSED     (1 << 0)
+#define MEM_EVENT_FLAG_DROP_PAGE       (1 << 1)
+#define MEM_EVENT_FLAG_EVICT_FAIL      (1 << 2)
+#define MEM_EVENT_FLAG_FOREIGN         (1 << 3)
+#define MEM_EVENT_FLAG_DUMMY           (1 << 4)
+#define MEM_EVENT_FLAG_EMULATE         (1 << 5)
+#define MEM_EVENT_FLAG_EMULATE_NOWRITE (1 << 6)
 
 /* Reasons for the memory event request */
 #define MEM_EVENT_REASON_UNKNOWN     0    /* typical reason */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc
  2014-09-05 10:01 ` [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
@ 2014-09-05 16:46   ` Tian, Kevin
  2014-09-08 15:47   ` Jan Beulich
  1 sibling, 0 replies; 13+ messages in thread
From: Tian, Kevin @ 2014-09-05 16:46 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, Dong, Eddie, ian.jackson,
	tim, jbeulich, Nakajima, Jun, andrew.cooper3

> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> Sent: Friday, September 05, 2014 3:02 AM
> 
> Extended HVMOP_inject_trap to allow asking for trap injection done
> by the first available CPU, when it's in user mode and its CR3
> matches the one for an interesting application inside the guest.
> This mechanism allows bringing in swapped-out pages for inspection.
> 
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> 
> ---
> Changes since V3:
>  - Now returning -EBUSY if the hypercall tries to set per-domain
>    injection data when there's a pending injection on any of the
>    domain's CPUs.
>  - Minor code cleanup.
> 
> Changes since V2:
>  - Renamed hvm_is_pf_requested() to hvm_can_inject_domain_pf().
>  - Modified comments to better reflect the introspection-related
>    purpose of added code.
> ---
>  tools/libxc/xc_misc.c               |    5 +-
>  tools/libxc/xenctrl.h               |    3 +-
>  tools/tests/xen-access/xen-access.c |    2 +-
>  xen/arch/x86/hvm/hvm.c              |   97
> ++++++++++++++++++++++++++++++-----
>  xen/include/asm-x86/hvm/domain.h    |    3 ++
>  xen/include/asm-x86/hvm/hvm.h       |    1 +
>  xen/include/public/hvm/hvm_op.h     |    7 +++
>  7 files changed, 100 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> index e253a58..6773446 100644
> --- a/tools/libxc/xc_misc.c
> +++ b/tools/libxc/xc_misc.c
> @@ -597,7 +597,7 @@ int xc_hvm_set_mem_type(
>  int xc_hvm_inject_trap(
>      xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
>      uint32_t type, uint32_t error_code, uint32_t insn_len,
> -    uint64_t cr2)
> +    uint64_t cr2, uint64_t cr3)
>  {
>      DECLARE_HYPERCALL;
>      DECLARE_HYPERCALL_BUFFER(struct xen_hvm_inject_trap, arg);
> @@ -611,12 +611,13 @@ int xc_hvm_inject_trap(
>      }
> 
>      arg->domid       = dom;
> -    arg->vcpuid      = vcpu;
> +    arg->vcpuid      = (vcpu == -1 ? (uint32_t)~0 : vcpu);
>      arg->vector      = vector;
>      arg->type        = type;
>      arg->error_code  = error_code;
>      arg->insn_len    = insn_len;
>      arg->cr2         = cr2;
> +    arg->cr3         = cr3;
> 
>      hypercall.op     = __HYPERVISOR_hvm_op;
>      hypercall.arg[0] = HVMOP_inject_trap;
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 28b5562..5bf0173 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1816,11 +1816,12 @@ int xc_hvm_set_mem_type(
>  /*
>   * Injects a hardware/software CPU trap, to take effect the next time the
> HVM
>   * resumes.
> + * Cr3 is only taken into account if vcpu == -1 (wildcard for "any vcpu").
>   */
>  int xc_hvm_inject_trap(
>      xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
>      uint32_t type, uint32_t error_code, uint32_t insn_len,
> -    uint64_t cr2);
> +    uint64_t cr2, uint64_t cr3);
> 
>  /*
>   *  LOGGING AND ERROR REPORTING
> diff --git a/tools/tests/xen-access/xen-access.c
> b/tools/tests/xen-access/xen-access.c
> index 090df5f..34c53d2 100644
> --- a/tools/tests/xen-access/xen-access.c
> +++ b/tools/tests/xen-access/xen-access.c
> @@ -601,7 +601,7 @@ int main(int argc, char *argv[])
>                  /* Reinject */
>                  rc = xc_hvm_inject_trap(
>                      xch, domain_id, req.vcpu_id, 3,
> -                    HVMOP_TRAP_sw_exc, -1, 0, 0);
> +                    HVMOP_TRAP_sw_exc, -1, 0, 0, 0);
>                  if (rc < 0)
>                  {
>                      ERROR("Error %d injecting int3\n", rc);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index bb45593..9b4904c 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -420,6 +420,31 @@ static bool_t hvm_wait_for_io(struct
> hvm_ioreq_vcpu *sv, ioreq_t *p)
>      return 1;
>  }
> 
> +static bool_t hvm_can_inject_domain_pf(struct vcpu *v)
> +{
> +    const struct domain *d = v->domain;
> +    struct segment_register seg;
> +    uint64_t mask;
> +
> +    hvm_get_segment_register(v, x86_seg_ss, &seg);
> +
> +    if ( seg.attr.fields.dpl != 3 ) /* Guest is not in user mode */
> +        return 0;
> +
> +    if ( hvm_long_mode_enabled(v) )
> +        mask = PADDR_MASK & PAGE_MASK; /* Bits 51:12. */
> +    else if ( hvm_pae_enabled(v) )
> +        mask = 0x00000000ffffffe0; /* Bits 31:5. */
> +    else
> +        mask = (uint32_t)PAGE_MASK; /* Bits 31:12. */
> +
> +    if ( (v->arch.hvm_vcpu.guest_cr[3] & mask) !=
> +         (d->arch.hvm_domain.inject_trap.cr3 & mask) )
> +        return 0;
> +
> +    return 1;
> +}
> +
>  void hvm_do_resume(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> @@ -451,11 +476,22 @@ void hvm_do_resume(struct vcpu *v)
>      }
> 
>      /* Inject pending hw/sw trap */
> -    if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> +    if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
>      {
>          hvm_inject_trap(&v->arch.hvm_vcpu.inject_trap);
>          v->arch.hvm_vcpu.inject_trap.vector = -1;
>      }
> +    /*
> +     * Inject per-domain pending hw/sw trap (this will most likely
> +     * be a page fault injected by memory introspection code).
> +     */
> +    else if ( d->arch.hvm_domain.inject_trap.vector != -1 &&
> +              hvm_can_inject_domain_pf(v) )
> +    {
> +        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
> +        d->arch.hvm_domain.inject_trap.vector = -1;
> +    }
> +
>  }
> 
>  static int hvm_alloc_ioreq_gmfn(struct domain *d, unsigned long *gmfn)
> @@ -1473,9 +1509,10 @@ int hvm_domain_initialise(struct domain *d)
>              printk(XENLOG_G_INFO "PVH guest must have HAP on\n");
>              return -EINVAL;
>          }
> -
>      }
> 
> +    d->arch.hvm_domain.inject_trap.vector = -1;
> +
>      spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
>      INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
> @@ -6086,19 +6123,51 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
>              goto param_fail8;
> 
>          rc = -ENOENT;
> -        if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
> -            goto param_fail8;
> -
> -        if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> -            rc = -EBUSY;
> -        else
> +
> +        if ( tr.vcpuid == (uint32_t)~0 ) /* Any VCPU. */
>          {
> -            v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
> -            v->arch.hvm_vcpu.inject_trap.type = tr.type;
> -            v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
> -            v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
> -            v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
> -            rc = 0;
> +            unsigned int i;
> +
> +            for ( i = 0; i < d->max_vcpus; i++ )
> +                if ( (v = d->vcpu[i]) != NULL &&
> +                     v->arch.hvm_vcpu.inject_trap.vector != -1 )
> +                {
> +                    rc = -EBUSY;
> +                    break;
> +                }
> +
> +            if ( d->arch.hvm_domain.inject_trap.vector != -1 )
> +                rc = -EBUSY;
> +
> +            if ( rc != -EBUSY )
> +            {
> +                d->arch.hvm_domain.inject_trap.vector = tr.vector;
> +                d->arch.hvm_domain.inject_trap.type = tr.type;
> +                d->arch.hvm_domain.inject_trap.error_code =
> tr.error_code;
> +                d->arch.hvm_domain.inject_trap.insn_len = tr.insn_len;
> +                d->arch.hvm_domain.inject_trap.cr2 = tr.cr2;
> +                d->arch.hvm_domain.inject_trap.cr3 = tr.cr3;
> +                rc = 0;
> +            }
> +        }
> +        else /* Specific VCPU. */
> +        {
> +            if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) ==
> NULL )
> +                goto param_fail8;
> +
> +            if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ||
> +                 d->arch.hvm_domain.inject_trap.vector != -1 )
> +                rc = -EBUSY;
> +            else
> +            {
> +                v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
> +                v->arch.hvm_vcpu.inject_trap.type = tr.type;
> +                v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
> +                v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
> +                v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
> +                v->arch.hvm_vcpu.inject_trap.cr3 = tr.cr3;
> +                rc = 0;
> +            }
>          }
> 
>      param_fail8:
> diff --git a/xen/include/asm-x86/hvm/domain.h
> b/xen/include/asm-x86/hvm/domain.h
> index 30d4aa3..b432874 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -146,6 +146,9 @@ struct hvm_domain {
>          struct vmx_domain vmx;
>          struct svm_domain svm;
>      };
> +
> +    /* Pending hw/sw interrupt (.vector = -1 means nothing pending). */
> +    struct hvm_trap     inject_trap;
>  };
> 
>  #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
> diff --git a/xen/include/asm-x86/hvm/hvm.h
> b/xen/include/asm-x86/hvm/hvm.h
> index 121d053..3b0bde9 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -78,6 +78,7 @@ struct hvm_trap {
>      int           error_code;   /* HVM_DELIVER_NO_ERROR_CODE if
> n/a */
>      int           insn_len;     /* Instruction length */
>      unsigned long cr2;          /* Only for TRAP_page_fault h/w
> exception */
> +    unsigned long cr3;          /* Only for TRAP_page_fault h/w
> exception */
>  };
> 
>  /*
> diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> index eeb0a60..399a942 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -197,6 +197,13 @@ struct xen_hvm_inject_trap {
>      uint32_t insn_len;
>      /* CR2 for page faults */
>      uint64_aligned_t cr2;
> +    /*
> +     * Only used if vcpuid == ~0 (wildcard for any VCPU).
> +     * In that case, injection data is set per-domain, and any VCPU
> +     * running a process with matching CR3 in user mode will inject
> +     * the trap.
> +     */
> +    uint64_aligned_t cr3;
>  };
>  typedef struct xen_hvm_inject_trap xen_hvm_inject_trap_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_trap_t);
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc
  2014-09-05 10:01 ` [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
  2014-09-05 16:46   ` Tian, Kevin
@ 2014-09-08 15:47   ` Jan Beulich
  2014-09-08 16:21     ` Razvan Cojocaru
  1 sibling, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2014-09-08 15:47 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 05.09.14 at 12:01, <rcojocaru@bitdefender.com> wrote:
> +    /*
> +     * Inject per-domain pending hw/sw trap (this will most likely
> +     * be a page fault injected by memory introspection code).
> +     */
> +    else if ( d->arch.hvm_domain.inject_trap.vector != -1 &&
> +              hvm_can_inject_domain_pf(v) )

Now these two don't fit together: The first check for _any_
vector, while the latter considers just #PF.

> +    {
> +        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
> +        d->arch.hvm_domain.inject_trap.vector = -1;
> +    }

And this is clearly lacking serialization (or a comment saying why
serialization isn't needed here).

> @@ -6086,19 +6123,51 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              goto param_fail8;
>  
>          rc = -ENOENT;
> -        if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
> -            goto param_fail8;
> -        
> -        if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> -            rc = -EBUSY;
> -        else 
> +
> +        if ( tr.vcpuid == (uint32_t)~0 ) /* Any VCPU. */
>          {
> -            v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
> -            v->arch.hvm_vcpu.inject_trap.type = tr.type;
> -            v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
> -            v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
> -            v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
> -            rc = 0;
> +            unsigned int i;
> +
> +            for ( i = 0; i < d->max_vcpus; i++ )
> +                if ( (v = d->vcpu[i]) != NULL &&
> +                     v->arch.hvm_vcpu.inject_trap.vector != -1 )
> +                {
> +                    rc = -EBUSY;
> +                    break;
> +                }
> +
> +            if ( d->arch.hvm_domain.inject_trap.vector != -1 )
> +                rc = -EBUSY;
> +
> +            if ( rc != -EBUSY )
> +            {
> +                d->arch.hvm_domain.inject_trap.vector = tr.vector;
> +                d->arch.hvm_domain.inject_trap.type = tr.type;
> +                d->arch.hvm_domain.inject_trap.error_code = tr.error_code;
> +                d->arch.hvm_domain.inject_trap.insn_len = tr.insn_len;
> +                d->arch.hvm_domain.inject_trap.cr2 = tr.cr2;
> +                d->arch.hvm_domain.inject_trap.cr3 = tr.cr3;
> +                rc = 0;
> +            }
> +        }
> +        else /* Specific VCPU. */
> +        {
> +            if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
> +                goto param_fail8;
> +
> +            if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ||
> +                 d->arch.hvm_domain.inject_trap.vector != -1 )
> +                rc = -EBUSY;
> +            else
> +            {
> +                v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
> +                v->arch.hvm_vcpu.inject_trap.type = tr.type;
> +                v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
> +                v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
> +                v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
> +                v->arch.hvm_vcpu.inject_trap.cr3 = tr.cr3;
> +                rc = 0;
> +            }
>          }

As does - afaict - this code.

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -197,6 +197,13 @@ struct xen_hvm_inject_trap {
>      uint32_t insn_len;
>      /* CR2 for page faults */
>      uint64_aligned_t cr2;
> +    /*
> +     * Only used if vcpuid == ~0 (wildcard for any VCPU).
> +     * In that case, injection data is set per-domain, and any VCPU
> +     * running a process with matching CR3 in user mode will inject
> +     * the trap.
> +     */
> +    uint64_aligned_t cr3;

The comment should say "Currently only used ...", and the code
should then check this (returning the usual -EOPNOTSUPP). Or
alternatively implement it right away (which may be the better
route taking into consideration the first of the comments above).

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-05 10:01 ` [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
@ 2014-09-08 15:50   ` Jan Beulich
  2014-09-08 15:53     ` Razvan Cojocaru
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2014-09-08 15:50 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 05.09.14 at 12:01, <rcojocaru@bitdefender.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1695,6 +1695,21 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
>                                           MSR_TYPE_W);
>  }
>  
> +static bool_t vmx_exited_by_nested_pagefault(void)
> +{
> +    unsigned long exit_qualification, exit_reason;
> +
> +    __vmread(VM_EXIT_REASON, &exit_reason);
> +    ASSERT(exit_reason == EXIT_REASON_EPT_VIOLATION);
> +
> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
> +
> +    if ( (exit_qualification & EPT_GLA_FAULT) == 0 )
> +        return 0;
> +
> +    return 1;
> +}

So I said a number of times that I think this new function can go
away with the recent changes introducing struct npfec, making
the interesting bit of information already available to
p2m_mem_access_check() if I'm not mistaken. And the previous
excuse of that code not being committed yet is no longer valid.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-08 15:50   ` Jan Beulich
@ 2014-09-08 15:53     ` Razvan Cojocaru
  0 siblings, 0 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-08 15:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

On 09/08/14 18:50, Jan Beulich wrote:
>>>> On 05.09.14 at 12:01, <rcojocaru@bitdefender.com> wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1695,6 +1695,21 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
>>                                           MSR_TYPE_W);
>>  }
>>  
>> +static bool_t vmx_exited_by_nested_pagefault(void)
>> +{
>> +    unsigned long exit_qualification, exit_reason;
>> +
>> +    __vmread(VM_EXIT_REASON, &exit_reason);
>> +    ASSERT(exit_reason == EXIT_REASON_EPT_VIOLATION);
>> +
>> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
>> +
>> +    if ( (exit_qualification & EPT_GLA_FAULT) == 0 )
>> +        return 0;
>> +
>> +    return 1;
>> +}
> 
> So I said a number of times that I think this new function can go
> away with the recent changes introducing struct npfec, making
> the interesting bit of information already available to
> p2m_mem_access_check() if I'm not mistaken. And the previous
> excuse of that code not being committed yet is no longer valid.

I'll change that and test the code first thing tomorrow.


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc
  2014-09-08 15:47   ` Jan Beulich
@ 2014-09-08 16:21     ` Razvan Cojocaru
  2014-09-09  9:08       ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-08 16:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

On 09/08/14 18:47, Jan Beulich wrote:
>>>> On 05.09.14 at 12:01, <rcojocaru@bitdefender.com> wrote:
>> +    {
>> +        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
>> +        d->arch.hvm_domain.inject_trap.vector = -1;
>> +    }
> 
> And this is clearly lacking serialization (or a comment saying why
> serialization isn't needed here).

If I understood this correctly, this is what Kevin has recommended: that
only one HVMOP_trap_request should be pending at a time. Or have I
misunderstood your comment?

>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -197,6 +197,13 @@ struct xen_hvm_inject_trap {
>>      uint32_t insn_len;
>>      /* CR2 for page faults */
>>      uint64_aligned_t cr2;
>> +    /*
>> +     * Only used if vcpuid == ~0 (wildcard for any VCPU).
>> +     * In that case, injection data is set per-domain, and any VCPU
>> +     * running a process with matching CR3 in user mode will inject
>> +     * the trap.
>> +     */
>> +    uint64_aligned_t cr3;
> 
> The comment should say "Currently only used ...", and the code
> should then check this (returning the usual -EOPNOTSUPP). Or
> alternatively implement it right away (which may be the better
> route taking into consideration the first of the comments above).

I will gladly change the comment, but I'm not sure for what CR3 value I
should return -EOPNOTSUPP. Should I enforce that cr3 == 0 for the
VCPU-specific case and for traps that are not page faults? Could you
please elaborate on what the alternative is?


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc
  2014-09-08 16:21     ` Razvan Cojocaru
@ 2014-09-09  9:08       ` Jan Beulich
  2014-09-09 10:31         ` Razvan Cojocaru
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2014-09-09  9:08 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 08.09.14 at 18:21, <rcojocaru@bitdefender.com> wrote:
> On 09/08/14 18:47, Jan Beulich wrote:
>>>>> On 05.09.14 at 12:01, <rcojocaru@bitdefender.com> wrote:
>>> +    {
>>> +        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
>>> +        d->arch.hvm_domain.inject_trap.vector = -1;
>>> +    }
>> 
>> And this is clearly lacking serialization (or a comment saying why
>> serialization isn't needed here).
> 
> If I understood this correctly, this is what Kevin has recommended: that
> only one HVMOP_trap_request should be pending at a time. Or have I
> misunderstood your comment?

Indeed you have - the comment is about dealing with races of e.g.
two CPUs processing a request for trap injection at the same time.

>>> --- a/xen/include/public/hvm/hvm_op.h
>>> +++ b/xen/include/public/hvm/hvm_op.h
>>> @@ -197,6 +197,13 @@ struct xen_hvm_inject_trap {
>>>      uint32_t insn_len;
>>>      /* CR2 for page faults */
>>>      uint64_aligned_t cr2;
>>> +    /*
>>> +     * Only used if vcpuid == ~0 (wildcard for any VCPU).
>>> +     * In that case, injection data is set per-domain, and any VCPU
>>> +     * running a process with matching CR3 in user mode will inject
>>> +     * the trap.
>>> +     */
>>> +    uint64_aligned_t cr3;
>> 
>> The comment should say "Currently only used ...", and the code
>> should then check this (returning the usual -EOPNOTSUPP). Or
>> alternatively implement it right away (which may be the better
>> route taking into consideration the first of the comments above).
> 
> I will gladly change the comment, but I'm not sure for what CR3 value I
> should return -EOPNOTSUPP. Should I enforce that cr3 == 0 for the
> VCPU-specific case and for traps that are not page faults? Could you
> please elaborate on what the alternative is?

First of all you need a value (or separate flag) indicating that the
CR3 check is to not be done. The current lack thereof shows (once
again) that you continue to think just in terms of your use case, not
in terms of a generic injection model. Once such a flag or special
value is in place, the answer to the above is - I think - obvious: For
other than #PF, it may be desirable (but not strictly necessary) to
enforce the value to mean "don't care"; just do whatever currently
is done for CR2. For #PF, the special value (or flag) used to
suppress CR3 checking in the domain case should (for now) get
enforced for the vCPU case.

As to flag or value: Since e.g. ~0 is never valid, this would seem
preferable over 0. And as long as there is one definitely invalid
value, not introducing a separate qualifying flag would seem the
preferred route to me.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc
  2014-09-09  9:08       ` Jan Beulich
@ 2014-09-09 10:31         ` Razvan Cojocaru
  0 siblings, 0 replies; 13+ messages in thread
From: Razvan Cojocaru @ 2014-09-09 10:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

On 09/09/2014 12:08 PM, Jan Beulich wrote:
>>>> On 08.09.14 at 18:21, <rcojocaru@bitdefender.com> wrote:
>> On 09/08/14 18:47, Jan Beulich wrote:
>>>>>> On 05.09.14 at 12:01, <rcojocaru@bitdefender.com> wrote:
>>>> +    {
>>>> +        hvm_inject_trap(&d->arch.hvm_domain.inject_trap);
>>>> +        d->arch.hvm_domain.inject_trap.vector = -1;
>>>> +    }
>>>
>>> And this is clearly lacking serialization (or a comment saying why
>>> serialization isn't needed here).
>>
>> If I understood this correctly, this is what Kevin has recommended: that
>> only one HVMOP_trap_request should be pending at a time. Or have I
>> misunderstood your comment?
> 
> Indeed you have - the comment is about dealing with races of e.g.
> two CPUs processing a request for trap injection at the same time.

Thanks for clearing that up. I hope I've addressed all these issues in
V6 (just posted), and hopefully patch 5/5 can also be Acked now (I'm now
using struct npfec as discussed).


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-09-09 10:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-05 10:01 [PATCH V4 0/5] Basic guest memory introspection support Razvan Cojocaru
2014-09-05 10:01 ` [PATCH V4 1/5] xen: Emulate with no writes Razvan Cojocaru
2014-09-05 10:01 ` [PATCH V4 2/5] xen: Optimize introspection access to guest state Razvan Cojocaru
2014-09-05 10:01 ` [PATCH V4 3/5] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
2014-09-05 10:01 ` [PATCH V4 4/5] xen, libxc: Request page fault injection via libxc Razvan Cojocaru
2014-09-05 16:46   ` Tian, Kevin
2014-09-08 15:47   ` Jan Beulich
2014-09-08 16:21     ` Razvan Cojocaru
2014-09-09  9:08       ` Jan Beulich
2014-09-09 10:31         ` Razvan Cojocaru
2014-09-05 10:01 ` [PATCH V4 5/5] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
2014-09-08 15:50   ` Jan Beulich
2014-09-08 15:53     ` Razvan Cojocaru

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.