* [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix
@ 2015-06-23 10:39 Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 01/18] x86/hvm: simplify hvmemul_do_io() Paul Durrant
                   ` (18 more replies)
  0 siblings, 19 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch series re-works much of the code involved in emulation of port
and memory-mapped I/O for HVM guests.

The code has become very convoluted and, at least by inspection, certain
emulations appear to malfunction.

The series is broken down into 18 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
on the emulation25 branch) as follows:

0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0003-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
0004-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
0005-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
0006-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
0007-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
0008-x86-hvm-add-length-to-mmio-check-op.patch
0009-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
0010-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
0011-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
0012-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
0013-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
0014-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
0015-x86-hvm-remove-hvm_io_state-enumeration.patch
0016-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
0017-x86-hvm-always-re-emulate-I-O-from-a-buffer.patch
0018-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch

v2:
 - Removed bogus assertion from patch 16
 - Re-worked patch 17 after basic testing of back-port onto XenServer

v3:
 - Addressed comments from Jan
 - Re-ordered the series to bring a couple of the more trivial patches
   to the front
 - Backport to XenServer (4.5) now passing automated tests
 - Tested on unstable with QEMU upstream and trad, with and without
   HAP (to force shadow emulation)


* [PATCH v3 01/18] x86/hvm: simplify hvmemul_do_io()
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 02/18] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io() Paul Durrant
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

Currently hvmemul_do_io() handles paging for I/O to/from a guest address
inline, which forces every exit point to execute:

if ( ram_page )
    put_page(ram_page);

This patch introduces wrapper hvmemul_do_io_addr() and
hvmemul_do_io_buffer() functions. The latter is used for I/O to/from a Xen
buffer, so the complexity of paging can be restricted to the former,
making the common hvmemul_do_io() function less convoluted.

This patch also tightens up some types and introduces pio/mmio wrappers
for the above functions with comments to document their semantics.
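
For illustration, the intended split looks roughly like the sketch below
(simplified from the diff that follows; the BUG_ON and the read-all-ones
fallback in the buffer variant are omitted):

    /* I/O to/from a Xen buffer: no guest page to acquire or release. */
    static int hvmemul_do_io_buffer(
        bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
        uint8_t dir, bool_t df, void *buffer)
    {
        return hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
                             (uintptr_t)buffer);
    }

    /* I/O to/from guest RAM: paging is handled once, around the call. */
    static int hvmemul_do_io_addr(
        bool_t is_mmio, paddr_t addr, unsigned long *reps,
        unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
    {
        struct page_info *ram_page;
        int rc;

        rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
        if ( rc != X86EMUL_OKAY )
            return rc;

        rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1, ram_gpa);

        hvmemul_release_page(ram_page);

        return rc;
    }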

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |  278 ++++++++++++++++++++++++-------------
 xen/arch/x86/hvm/io.c             |    4 +-
 xen/include/asm-x86/hvm/emulate.h |   17 ++-
 3 files changed, 198 insertions(+), 101 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index ac9c9d6..9d7af0c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,41 +51,23 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
 }
 
 static int hvmemul_do_io(
-    int is_mmio, paddr_t addr, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data)
+    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+    uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
-    struct hvm_vcpu_io *vio;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
         .dir = dir,
         .df = df,
-        .data = ram_gpa,
-        .data_is_ptr = (p_data == NULL),
+        .data = data,
+        .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
     };
-    unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
-    p2m_type_t p2mt;
-    struct page_info *ram_page;
+    void *p_data = (void *)data;
     int rc;
 
-    /* Check for paged out page */
-    ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
-    if ( p2m_is_paging(p2mt) )
-    {
-        if ( ram_page )
-            put_page(ram_page);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_RETRY;
-    }
-
     /*
      * Weird-sized accesses have undefined behaviour: we discard writes
      * and read all-ones.
@@ -93,23 +75,10 @@ static int hvmemul_do_io(
     if ( unlikely((size > sizeof(long)) || (size & (size - 1))) )
     {
         gdprintk(XENLOG_WARNING, "bad mmio size %d\n", size);
-        ASSERT(p_data != NULL); /* cannot happen with a REP prefix */
-        if ( dir == IOREQ_READ )
-            memset(p_data, ~0, size);
-        if ( ram_page )
-            put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( !p.data_is_ptr && (dir == IOREQ_WRITE) )
-    {
-        memcpy(&p.data, p_data, size);
-        p_data = NULL;
-    }
-
-    vio = &curr->arch.hvm_vcpu.hvm_io;
-
-    if ( is_mmio && !p.data_is_ptr )
+    if ( is_mmio && !data_is_addr )
     {
         /* Part of a multi-cycle read or write? */
         if ( dir == IOREQ_WRITE )
@@ -117,11 +86,7 @@ static int hvmemul_do_io(
             paddr_t pa = vio->mmio_large_write_pa;
             unsigned int bytes = vio->mmio_large_write_bytes;
             if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-            {
-                if ( ram_page )
-                    put_page(ram_page);
                 return X86EMUL_OKAY;
-            }
         }
         else
         {
@@ -131,8 +96,6 @@ static int hvmemul_do_io(
             {
                 memcpy(p_data, &vio->mmio_large_read[addr - pa],
                        size);
-                if ( ram_page )
-                    put_page(ram_page);
                 return X86EMUL_OKAY;
             }
         }
@@ -144,40 +107,28 @@ static int hvmemul_do_io(
         break;
     case HVMIO_completed:
         vio->io_state = HVMIO_none;
-        if ( p_data == NULL )
-        {
-            if ( ram_page )
-                put_page(ram_page);
+        if ( data_is_addr || dir == IOREQ_WRITE )
             return X86EMUL_UNHANDLEABLE;
-        }
         goto finish_access;
     case HVMIO_dispatched:
         /* May have to wait for previous cycle of a multi-write to complete. */
-        if ( is_mmio && !p.data_is_ptr && (dir == IOREQ_WRITE) &&
+        if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
              (addr == (vio->mmio_large_write_pa +
                        vio->mmio_large_write_bytes)) )
-        {
-            if ( ram_page )
-                put_page(ram_page);
             return X86EMUL_RETRY;
-        }
         /* fallthrough */
     default:
-        if ( ram_page )
-            put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
     }
 
     if ( hvm_io_pending(curr) )
     {
         gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
-        if ( ram_page )
-            put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state =
-        (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
+    vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
+        HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
 
     /*
@@ -190,7 +141,12 @@ static int hvmemul_do_io(
     p.count = *reps;
 
     if ( dir == IOREQ_WRITE )
+    {
+        if ( !data_is_addr )
+            memcpy(&p.data, p_data, size);
+
         hvmtrace_io_assist(is_mmio, &p);
+    }
 
     if ( is_mmio )
     {
@@ -235,7 +191,7 @@ static int hvmemul_do_io(
             rc = X86EMUL_RETRY;
             if ( !hvm_send_assist_req(s, &p) )
                 vio->io_state = HVMIO_none;
-            else if ( p_data == NULL )
+            else if ( data_is_addr || dir == IOREQ_WRITE )
                 rc = X86EMUL_OKAY;
         }
         break;
@@ -245,20 +201,18 @@ static int hvmemul_do_io(
     }
 
     if ( rc != X86EMUL_OKAY )
-    {
-        if ( ram_page )
-            put_page(ram_page);
         return rc;
-    }
 
  finish_access:
     if ( dir == IOREQ_READ )
+    {
         hvmtrace_io_assist(is_mmio, &p);
 
-    if ( p_data != NULL )
-        memcpy(p_data, &vio->io_data, size);
+        if ( !data_is_addr )
+            memcpy(p_data, &vio->io_data, size);
+    }
 
-    if ( is_mmio && !p.data_is_ptr )
+    if ( is_mmio && !data_is_addr )
     {
         /* Part of a multi-cycle read or write? */
         if ( dir == IOREQ_WRITE )
@@ -285,23 +239,153 @@ static int hvmemul_do_io(
         }
     }
 
-    if ( ram_page )
-        put_page(ram_page);
     return X86EMUL_OKAY;
 }
 
-int hvmemul_do_pio(
-    unsigned long port, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data)
+static int hvmemul_do_io_buffer(
+    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+    uint8_t dir, bool_t df, void *buffer)
+{
+    int rc;
+
+    BUG_ON(buffer == NULL);
+
+    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
+                       (uintptr_t)buffer);
+    if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
+        memset(buffer, 0xff, size);
+
+    return rc;
+}
+
+static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
+{
+    struct domain *curr_d = current->domain;
+    p2m_type_t p2mt;
+
+    *page = get_page_from_gfn(curr_d, gmfn, &p2mt, P2M_UNSHARE);
+
+    if ( *page == NULL )
+        return X86EMUL_UNHANDLEABLE;
+
+    if ( p2m_is_paging(p2mt) )
+    {
+        put_page(*page);
+        p2m_mem_paging_populate(curr_d, gmfn);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_shared(p2mt) )
+    {
+        put_page(*page);
+        return X86EMUL_RETRY;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static inline void hvmemul_release_page(struct page_info *page)
 {
-    return hvmemul_do_io(0, port, reps, size, ram_gpa, dir, df, p_data);
+    put_page(page);
 }
 
-static int hvmemul_do_mmio(
-    paddr_t gpa, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data)
+static int hvmemul_do_io_addr(
+    bool_t is_mmio, paddr_t addr, unsigned long *reps,
+    unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
+{
+    struct page_info *ram_page;
+    int rc;
+
+    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+                       ram_gpa);
+
+    hvmemul_release_page(ram_page);
+
+    return rc;
+}
+
+/*
+ * Perform I/O between <port> and <buffer>. <dir> indicates the
+ * direction: IOREQ_READ means a read from <port> to <buffer> and
+ * IOREQ_WRITE means a write from <buffer> to <port>. Each access has
+ * width <size>.
+ */
+int hvmemul_do_pio_buffer(uint16_t port,
+                          unsigned int size,
+                          uint8_t dir,
+                          void *buffer)
+{
+    unsigned long one_rep = 1;
+
+    return hvmemul_do_io_buffer(0, port, &one_rep, size, dir, 0, buffer);
+}
+
+/*
+ * Perform I/O between <port> and guest RAM starting at <ram_addr>.
+ * <dir> indicates the direction: IOREQ_READ means a read from <port> to
+ * RAM and IOREQ_WRITE means a write from RAM to <port>. Each access has
+ * width <size> and up to *<reps> accesses will be performed. If
+ * X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ * Each access will be done to/from successive RAM addresses, increasing
+ * if <df> is 0 or decreasing if <df> is 1.
+ */
+static int hvmemul_do_pio_addr(uint16_t port,
+                               unsigned long *reps,
+                               unsigned int size,
+                               uint8_t dir,
+                               bool_t df,
+                               paddr_t ram_addr)
+{
+    return hvmemul_do_io_addr(0, port, reps, size, dir, df, ram_addr);
+}
+
+/*
+ * Perform I/O between MMIO space starting at <mmio_gpa> and <buffer>.
+ * <dir> indicates the direction: IOREQ_READ means a read from MMIO to
+ * <buffer> and IOREQ_WRITE means a write from <buffer> to MMIO. Each
+ * access has width <size> and up to *<reps> accesses will be performed.
+ * If X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ * Each access will be done to/from successive MMIO addresses, increasing
+ * if <df> is 0 or decreasing if <df> is 1.
+ *
+ * NOTE: If *<reps> is greater than 1, each access will use the
+ *       <buffer> pointer; there is no implicit iteration over a
+ *       block of memory starting at <buffer>.
+ */
+static int hvmemul_do_mmio_buffer(paddr_t mmio_gpa,
+                                  unsigned long *reps,
+                                  unsigned int size,
+                                  uint8_t dir,
+                                  bool_t df,
+                                  void *buffer)
+{
+    return hvmemul_do_io_buffer(1, mmio_gpa, reps, size, dir, df, buffer);
+}
+
+/*
+ * Perform I/O between MMIO space starting at <mmio_gpa> and guest RAM
+ * starting at <ram_gpa>. <dir> indicates the direction: IOREQ_READ
+ * means a read from MMIO to RAM and IOREQ_WRITE means a write from RAM
+ * to MMIO. Each access has width <size> and up to *<reps> accesses will
+ * be performed. If X86EMUL_OKAY is returned then <reps> will be updated
+ * with the number of accesses actually performed.
+ * Each access will be done to/from successive RAM *and* MMIO addresses,
+ * increasing if <df> is 0 or decreasing if <df> is 1.
+ */
+static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
+                                unsigned long *reps,
+                                unsigned int size,
+                                uint8_t dir,
+                                bool_t df,
+                                paddr_t ram_gpa)
 {
-    return hvmemul_do_io(1, gpa, reps, size, ram_gpa, dir, df, p_data);
+    return hvmemul_do_io_addr(1, mmio_gpa, reps, size, dir, df, ram_gpa);
 }
 
 /*
@@ -503,7 +587,8 @@ static int __hvmemul_read(
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         while ( (off + chunk) <= PAGE_SIZE )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_READ, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
             addr += chunk;
@@ -537,7 +622,8 @@ static int __hvmemul_read(
                                     hvmemul_ctxt);
         while ( rc == X86EMUL_OKAY )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_READ, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 break;
             addr += chunk;
@@ -645,7 +731,8 @@ static int hvmemul_write(
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         while ( (off + chunk) <= PAGE_SIZE )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_WRITE, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
             addr += chunk;
@@ -675,7 +762,8 @@ static int hvmemul_write(
                                     hvmemul_ctxt);
         while ( rc == X86EMUL_OKAY )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_WRITE, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 break;
             addr += chunk;
@@ -849,8 +937,8 @@ static int hvmemul_rep_ins(
     if ( p2mt == p2m_mmio_direct || p2mt == p2m_mmio_dm )
         return X86EMUL_UNHANDLEABLE;
 
-    return hvmemul_do_pio(src_port, reps, bytes_per_rep, gpa, IOREQ_READ,
-                          !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
+    return hvmemul_do_pio_addr(src_port, reps, bytes_per_rep, IOREQ_READ,
+                               !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
 static int hvmemul_rep_outs(
@@ -887,8 +975,8 @@ static int hvmemul_rep_outs(
     if ( p2mt == p2m_mmio_direct || p2mt == p2m_mmio_dm )
         return X86EMUL_UNHANDLEABLE;
 
-    return hvmemul_do_pio(dst_port, reps, bytes_per_rep, gpa, IOREQ_WRITE,
-                          !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
+    return hvmemul_do_pio_addr(dst_port, reps, bytes_per_rep, IOREQ_WRITE,
+                               !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
 static int hvmemul_rep_movs(
@@ -944,12 +1032,12 @@ static int hvmemul_rep_movs(
         return X86EMUL_UNHANDLEABLE;
 
     if ( sp2mt == p2m_mmio_dm )
-        return hvmemul_do_mmio(
-            sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
+        return hvmemul_do_mmio_addr(
+            sgpa, reps, bytes_per_rep, IOREQ_READ, df, dgpa);
 
     if ( dp2mt == p2m_mmio_dm )
-        return hvmemul_do_mmio(
-            dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
+        return hvmemul_do_mmio_addr(
+            dgpa, reps, bytes_per_rep, IOREQ_WRITE, df, sgpa);
 
     /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
     bytes = *reps * bytes_per_rep;
@@ -1102,8 +1190,8 @@ static int hvmemul_rep_stos(
         return X86EMUL_UNHANDLEABLE;
 
     case p2m_mmio_dm:
-        return hvmemul_do_mmio(gpa, reps, bytes_per_rep, 0, IOREQ_WRITE, df,
-                               p_data);
+        return hvmemul_do_mmio_buffer(gpa, reps, bytes_per_rep, IOREQ_WRITE, df,
+                                      p_data);
     }
 }
 
@@ -1140,9 +1228,8 @@ static int hvmemul_read_io(
     unsigned long *val,
     struct x86_emulate_ctxt *ctxt)
 {
-    unsigned long reps = 1;
     *val = 0;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
+    return hvmemul_do_pio_buffer(port, bytes, IOREQ_READ, val);
 }
 
 static int hvmemul_write_io(
@@ -1151,8 +1238,7 @@ static int hvmemul_write_io(
     unsigned long val,
     struct x86_emulate_ctxt *ctxt)
 {
-    unsigned long reps = 1;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
+    return hvmemul_do_pio_buffer(port, bytes, IOREQ_WRITE, &val);
 }
 
 static int hvmemul_read_cr(
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 68fb890..c0964ec 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -132,7 +132,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
 {
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    unsigned long data, reps = 1;
+    unsigned long data;
     int rc;
 
     ASSERT((size - 1) < 4 && size != 3);
@@ -140,7 +140,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
     if ( dir == IOREQ_WRITE )
         data = guest_cpu_user_regs()->eax;
 
-    rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
+    rc = hvmemul_do_pio_buffer(port, size, dir, &data);
 
     switch ( rc )
     {
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index b3971c8..594be38 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -50,11 +50,22 @@ struct segment_register *hvmemul_get_seg_reg(
     enum x86_segment seg,
     struct hvm_emulate_ctxt *hvmemul_ctxt);
 
-int hvmemul_do_pio(
-    unsigned long port, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data);
+int hvmemul_do_pio_buffer(uint16_t port,
+                          unsigned int size,
+                          uint8_t dir,
+                          void *buffer);
 
 void hvm_dump_emulation_state(const char *prefix,
                               struct hvm_emulate_ctxt *hvmemul_ctxt);
 
 #endif /* __ASM_X86_HVM_EMULATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v3 02/18] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io()
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 01/18] x86/hvm: simplify hvmemul_do_io() Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 03/18] x86/hvm: remove extraneous parameter from hvmtrace_io_assist() Paul Durrant
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The check is done at the wrong point (it is irrelevant when the I/O is
handled within the hypervisor) and its functionality can be covered by
returning X86EMUL_UNHANDLEABLE from hvm_send_assist_req() instead.

This patch also removes the domain_crash() call from
hvm_send_assist_req(). Returning X86EMUL_UNHANDLEABLE allows the
higher layers of emulation to decide what to do instead.
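
In outline, the resulting caller logic (taken from the hvmemul_do_io()
hunk below) consumes the X86EMUL_* return value directly:

    rc = hvm_send_assist_req(s, &p);
    if ( rc != X86EMUL_RETRY )
        vio->io_state = HVMIO_none;        /* nothing is now in flight */
    else if ( data_is_addr || dir == IOREQ_WRITE )
        rc = X86EMUL_OKAY;                 /* no completion data is needed */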

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c    |   10 ++--------
 xen/arch/x86/hvm/hvm.c        |   16 ++++++----------
 xen/include/asm-x86/hvm/hvm.h |    2 +-
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9d7af0c..b412302 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -121,12 +121,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( hvm_io_pending(curr) )
-    {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
-        return X86EMUL_UNHANDLEABLE;
-    }
-
     vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
         HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
@@ -188,8 +182,8 @@ static int hvmemul_do_io(
         }
         else
         {
-            rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(s, &p) )
+            rc = hvm_send_assist_req(s, &p);
+            if ( rc != X86EMUL_RETRY )
                 vio->io_state = HVMIO_none;
             else if ( data_is_addr || dir == IOREQ_WRITE )
                 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d5e5242..535d622 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2605,7 +2605,7 @@ int hvm_buffered_io_send(ioreq_t *p)
     return 1;
 }
 
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
 {
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
@@ -2613,7 +2613,7 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
 
     ASSERT(s);
     if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return 0; /* implicitly bins the i/o operation */
+        return X86EMUL_OKAY;
 
     list_for_each_entry ( sv,
                           &s->ioreq_vcpu_list,
@@ -2628,14 +2628,14 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
             {
                 gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
                         p->state);
-                goto crash;
+                break;
             }
 
             if ( unlikely(p->vp_eport != port) )
             {
                 gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
                         p->vp_eport);
-                goto crash;
+                break;
             }
 
             proto_p->state = STATE_IOREQ_NONE;
@@ -2651,15 +2651,11 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
              */
             p->state = STATE_IOREQ_READY;
             notify_via_xen_event_channel(d, port);
-            break;
+            return X86EMUL_RETRY;
         }
     }
 
-    return 1;
-
- crash:
-    domain_crash(d);
-    return 0;
+    return X86EMUL_UNHANDLEABLE;
 }
 
 void hvm_complete_assist_req(ioreq_t *p)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 77eeac5..57f9605 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -230,7 +230,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip);
 
 struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
                                                  ioreq_t *p);
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
 void hvm_broadcast_assist_req(ioreq_t *p);
 void hvm_complete_assist_req(ioreq_t *p);
 
-- 
1.7.10.4


* [PATCH v3 03/18] x86/hvm: remove extraneous parameter from hvmtrace_io_assist()
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 01/18] x86/hvm: simplify hvmemul_do_io() Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 02/18] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io() Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page Paul Durrant
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The is_mmio parameter can be inferred from the ioreq type.
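
In outline (as the hunk below does), the flag is simply derived at the top
of hvmtrace_io_assist():

    bool_t is_mmio = (p->type == IOREQ_TYPE_COPY);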

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index b412302..935eab3 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -23,8 +23,9 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/svm/svm.h>
 
-static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
+static void hvmtrace_io_assist(ioreq_t *p)
 {
+    bool_t is_mmio = (p->type == IOREQ_TYPE_COPY);
     unsigned int size, event;
     unsigned char buffer[12];
 
@@ -139,7 +140,7 @@ static int hvmemul_do_io(
         if ( !data_is_addr )
             memcpy(&p.data, p_data, size);
 
-        hvmtrace_io_assist(is_mmio, &p);
+        hvmtrace_io_assist(&p);
     }
 
     if ( is_mmio )
@@ -200,7 +201,7 @@ static int hvmemul_do_io(
  finish_access:
     if ( dir == IOREQ_READ )
     {
-        hvmtrace_io_assist(is_mmio, &p);
+        hvmtrace_io_assist(&p);
 
         if ( !data_is_addr )
             memcpy(p_data, &vio->io_data, size);
-- 
1.7.10.4


* [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (2 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 03/18] x86/hvm: remove extraneous parameter from hvmtrace_io_assist() Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 15:52   ` Jan Beulich
  2015-06-23 10:39 ` [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops Paul Durrant
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

...otherwise they will simply carry on to the next page using a normal
linear-to-phys translation.
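
In outline, the effect on the translated fast path is the sketch below
(abbreviated from the hunks that follow; the body of the per-chunk loop is
elided):

    while ( (off + chunk) <= PAGE_SIZE )
    {
        /* issue one power-of-2 chunk of the translated MMIO access ... */
    }

    /* The access ran off the end of the page: fail it rather than
     * falling through to a fresh linear-to-phys translation. */
    return X86EMUL_UNHANDLEABLE;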

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 935eab3..f3372fc 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -586,7 +586,6 @@ static int __hvmemul_read(
                                         p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
-            addr += chunk;
             off += chunk;
             gpa += chunk;
             p_data += chunk;
@@ -594,6 +593,8 @@ static int __hvmemul_read(
             if ( bytes < chunk )
                 chunk = bytes;
         }
+
+        return X86EMUL_UNHANDLEABLE;
     }
 
     if ( (seg != x86_seg_none) &&
@@ -730,7 +731,6 @@ static int hvmemul_write(
                                         p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
-            addr += chunk;
             off += chunk;
             gpa += chunk;
             p_data += chunk;
@@ -738,6 +738,8 @@ static int hvmemul_write(
             if ( bytes < chunk )
                 chunk = bytes;
         }
+
+        return X86EMUL_UNHANDLEABLE;
     }
 
     if ( (seg != x86_seg_none) &&
-- 
1.7.10.4


* [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (3 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-24  9:38   ` Jan Beulich
  2015-06-23 10:39 ` [PATCH v3 06/18] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops Paul Durrant
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

...in hvmemul_read/write()

Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
to reduce code duplication.
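
The common chunking logic, condensed from hvmemul_phys_mmio_access() in
the diff below:

    /* Largest power of 2 not exceeding size, capped at sizeof(long). */
    chunk = 1 << (fls(size) - 1);
    if ( chunk > sizeof(long) )
        chunk = sizeof(long);

    while ( size != 0 )
    {
        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
                                    &buffer[*off]);
        if ( rc != X86EMUL_OKAY )
            break;

        gpa += chunk;
        *off += chunk;
        size -= chunk;

        /* Next lower power of 2 that still fits the remainder. */
        while ( chunk > size )
            chunk >>= 1;
    }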

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |  232 ++++++++++++++++++++++++--------------------
 1 file changed, 126 insertions(+), 106 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f3372fc..02796d0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -540,6 +540,115 @@ static int hvmemul_virtual_to_linear(
     return X86EMUL_EXCEPTION;
 }
 
+static int hvmemul_phys_mmio_access(
+    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
+    unsigned int *off)
+{
+    unsigned long one_rep = 1;
+    unsigned int chunk;
+    int rc = 0;
+
+    /* Accesses must fall within a page */
+    if ( (gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE )
+        return X86EMUL_UNHANDLEABLE;
+
+    /*
+     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
+     * accesses larger than sizeof(long), so choose the highest power
+     * of 2 not exceeding sizeof(long) as the 'chunk' size.
+     */
+    chunk = 1 << (fls(size) - 1);
+    if ( chunk > sizeof (long) )
+        chunk = sizeof (long);
+
+    while ( size != 0 )
+    {
+        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+                                    &buffer[*off]);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        /* Advance to the next chunk */
+        gpa += chunk;
+        *off += chunk;
+        size -= chunk;
+
+        /*
+         * If the chunk now exceeds the remaining size, choose the next
+         * lowest power of 2 that will fit.
+         */
+        while ( chunk > size )
+            chunk >>= 1;
+    }
+
+    return rc;
+}
+
+static inline int hvmemul_phys_mmio_read(
+    paddr_t gpa, unsigned int size, uint8_t *buffer, unsigned int *off)
+{
+    return hvmemul_phys_mmio_access(gpa, size, IOREQ_READ, buffer,
+                                    off);
+}
+
+static inline int hvmemul_phys_mmio_write(
+    paddr_t gpa, unsigned int size, uint8_t *buffer, unsigned int *off)
+{
+    return hvmemul_phys_mmio_access(gpa, size, IOREQ_WRITE, buffer,
+                                    off);
+}
+
+static int hvmemul_linear_mmio_access(
+    unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    unsigned long page_off = gla & (PAGE_SIZE - 1);
+    unsigned int chunk, buffer_off = 0;
+    paddr_t gpa;
+    unsigned long one_rep = 1;
+    int rc;
+
+    chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
+    rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+                                hvmemul_ctxt);
+    while ( rc == X86EMUL_OKAY )
+    {
+        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer,
+                                      &buffer_off);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        gla += chunk;
+        size -= chunk;
+
+        if ( size == 0 )
+            break;
+
+        ASSERT((gla & (PAGE_SIZE - 1)) == 0);
+        chunk = min_t(unsigned int, size, PAGE_SIZE);
+        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+                                    hvmemul_ctxt);
+    }
+
+    return rc;
+}
+
+static inline int hvmemul_linear_mmio_read(
+    unsigned long gla, unsigned int size, void *buffer,
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return hvmemul_linear_mmio_access(gla, size, IOREQ_READ, buffer,
+                                      pfec, hvmemul_ctxt);
+}
+
+static inline int hvmemul_linear_mmio_write(
+    unsigned long gla, unsigned int size, void *buffer,
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return hvmemul_linear_mmio_access(gla, size, IOREQ_WRITE, buffer,
+                                      pfec, hvmemul_ctxt);
+}
+
 static int __hvmemul_read(
     enum x86_segment seg,
     unsigned long offset,
@@ -549,52 +658,26 @@ static int __hvmemul_read(
     struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
     struct vcpu *curr = current;
-    unsigned long addr, reps = 1;
-    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
+    unsigned long addr, one_rep = 1;
     uint32_t pfec = PFEC_page_present;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    paddr_t gpa;
     int rc;
 
     rc = hvmemul_virtual_to_linear(
-        seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
+        seg, offset, bytes, &one_rep, access_type, hvmemul_ctxt, &addr);
     if ( rc != X86EMUL_OKAY )
         return rc;
-    off = addr & (PAGE_SIZE - 1);
-    /*
-     * We only need to handle sizes actual instruction operands can have. All
-     * such sizes are either powers of 2 or the sum of two powers of 2. Thus
-     * picking as initial chunk size the largest power of 2 not greater than
-     * the total size will always result in only power-of-2 size requests
-     * issued to hvmemul_do_mmio() (hvmemul_do_io() rejects non-powers-of-2).
-     */
-    while ( chunk & (chunk - 1) )
-        chunk &= chunk - 1;
-    if ( off + bytes > PAGE_SIZE )
-        while ( off & (chunk - 1) )
-            chunk >>= 1;
 
     if ( ((access_type != hvm_access_insn_fetch
            ? vio->mmio_access.read_access
            : vio->mmio_access.insn_fetch)) &&
          (vio->mmio_gva == (addr & PAGE_MASK)) )
     {
-        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
-        while ( (off + chunk) <= PAGE_SIZE )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                return rc;
-            off += chunk;
-            gpa += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-        }
+        unsigned int off = 0;
+        paddr_t gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |
+                       (addr & (PAGE_SIZE - 1)));
 
-        return X86EMUL_UNHANDLEABLE;
+        return hvmemul_phys_mmio_read(gpa, bytes, p_data, &off);
     }
 
     if ( (seg != x86_seg_none) &&
@@ -614,30 +697,9 @@ static int __hvmemul_read(
     case HVMCOPY_bad_gfn_to_mfn:
         if ( access_type == hvm_access_insn_fetch )
             return X86EMUL_UNHANDLEABLE;
-        rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                    hvmemul_ctxt);
-        while ( rc == X86EMUL_OKAY )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                break;
-            addr += chunk;
-            off += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-            if ( off < PAGE_SIZE )
-                gpa += chunk;
-            else
-            {
-                rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                            hvmemul_ctxt);
-                off = 0;
-            }
-        }
-        return rc;
+
+        return hvmemul_linear_mmio_read(addr, bytes, p_data,
+                                        pfec, hvmemul_ctxt);
     case HVMCOPY_gfn_paged_out:
     case HVMCOPY_gfn_shared:
         return X86EMUL_RETRY;
@@ -702,44 +764,24 @@ static int hvmemul_write(
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
     struct vcpu *curr = current;
-    unsigned long addr, reps = 1;
-    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
+    unsigned long addr, one_rep = 1;
     uint32_t pfec = PFEC_page_present | PFEC_write_access;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    paddr_t gpa;
     int rc;
 
     rc = hvmemul_virtual_to_linear(
-        seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
+        seg, offset, bytes, &one_rep, hvm_access_write, hvmemul_ctxt, &addr);
     if ( rc != X86EMUL_OKAY )
         return rc;
-    off = addr & (PAGE_SIZE - 1);
-    /* See the respective comment in __hvmemul_read(). */
-    while ( chunk & (chunk - 1) )
-        chunk &= chunk - 1;
-    if ( off + bytes > PAGE_SIZE )
-        while ( off & (chunk - 1) )
-            chunk >>= 1;
 
     if ( vio->mmio_access.write_access &&
          (vio->mmio_gva == (addr & PAGE_MASK)) )
     {
-        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
-        while ( (off + chunk) <= PAGE_SIZE )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                return rc;
-            off += chunk;
-            gpa += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-        }
+        unsigned int off = 0;
+        paddr_t gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |
+                       (addr & (PAGE_SIZE - 1)));
 
-        return X86EMUL_UNHANDLEABLE;
+        return hvmemul_phys_mmio_write(gpa, bytes, p_data, &off);
     }
 
     if ( (seg != x86_seg_none) &&
@@ -755,30 +797,8 @@ static int hvmemul_write(
     case HVMCOPY_bad_gva_to_gfn:
         return X86EMUL_EXCEPTION;
     case HVMCOPY_bad_gfn_to_mfn:
-        rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                    hvmemul_ctxt);
-        while ( rc == X86EMUL_OKAY )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                break;
-            addr += chunk;
-            off += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-            if ( off < PAGE_SIZE )
-                gpa += chunk;
-            else
-            {
-                rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                            hvmemul_ctxt);
-                off = 0;
-            }
-        }
-        return rc;
+        return hvmemul_linear_mmio_write(addr, bytes, p_data,
+                                         pfec, hvmemul_ctxt);
     case HVMCOPY_gfn_paged_out:
     case HVMCOPY_gfn_shared:
         return X86EMUL_RETRY;
-- 
1.7.10.4


* [PATCH v3 06/18] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (4 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 07/18] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser

The struct just contains three methods and no data, so the name
hvm_mmio_ops more accurately reflects its content. A subsequent patch
introduces a new structure that better warrants the name hvm_mmio_handler,
so doing the rename in this purely cosmetic patch avoids conflating
functional and cosmetic changes in a single patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/tests/vhpet/emul.h                  |    8 +++---
 tools/tests/vhpet/main.c                  |    6 ++---
 xen/arch/x86/hvm/hpet.c                   |    8 +++---
 xen/arch/x86/hvm/intercept.c              |   41 ++++++++++++++---------------
 xen/arch/x86/hvm/vioapic.c                |    8 +++---
 xen/arch/x86/hvm/vlapic.c                 |    8 +++---
 xen/arch/x86/hvm/vmsi.c                   |    8 +++---
 xen/drivers/passthrough/amd/iommu_guest.c |    8 +++---
 xen/include/asm-x86/hvm/io.h              |   18 ++++++-------
 9 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/tools/tests/vhpet/emul.h b/tools/tests/vhpet/emul.h
index 09e4611..383acff 100644
--- a/tools/tests/vhpet/emul.h
+++ b/tools/tests/vhpet/emul.h
@@ -237,11 +237,11 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
 typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
 
 
-struct hvm_mmio_handler
+struct hvm_mmio_ops
 {
-    hvm_mmio_check_t check_handler;
-    hvm_mmio_read_t read_handler;
-    hvm_mmio_write_t write_handler;
+    hvm_mmio_check_t check;
+    hvm_mmio_read_t  read;
+    hvm_mmio_write_t write;
 };
 
 /* Marshalling and unmarshalling uses a buffer with size and cursor. */
diff --git a/tools/tests/vhpet/main.c b/tools/tests/vhpet/main.c
index fbd7510..6fe65ea 100644
--- a/tools/tests/vhpet/main.c
+++ b/tools/tests/vhpet/main.c
@@ -70,7 +70,7 @@ static int skip_error_on_load;
 
 static char *global_thousep;
 
-extern const struct hvm_mmio_handler hpet_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;
 
 struct domain dom1;
 struct vcpu vcpu0;
@@ -297,13 +297,13 @@ void udelay(int w)
 unsigned int hpet_readl(unsigned long a)
 {
     unsigned long ret = 0;
-    hpet_mmio_handler.read_handler(current, a, 4, &ret);
+    hpet_mmio_ops.read(current, a, 4, &ret);
     return ret;
 }
 
 void hpet_writel(unsigned long d, unsigned long a)
 {
-    hpet_mmio_handler.write_handler(current, a, 4, d);
+    hpet_mmio_ops.write(current, a, 4, d);
     return;
 }
 
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index d898169..9585ca8 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,10 +504,10 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
              (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
 }
 
-const struct hvm_mmio_handler hpet_mmio_handler = {
-    .check_handler = hpet_range,
-    .read_handler  = hpet_read,
-    .write_handler = hpet_write
+const struct hvm_mmio_ops hpet_mmio_ops = {
+    .check = hpet_range,
+    .read  = hpet_read,
+    .write = hpet_write
 };
 
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index d52a48c..cc44733 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,20 +32,20 @@
 #include <xen/event.h>
 #include <xen/iommu.h>
 
-static const struct hvm_mmio_handler *const
+static const struct hvm_mmio_ops *const
 hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
 {
-    &hpet_mmio_handler,
-    &vlapic_mmio_handler,
-    &vioapic_mmio_handler,
-    &msixtbl_mmio_handler,
-    &iommu_mmio_handler
+    &hpet_mmio_ops,
+    &vlapic_mmio_ops,
+    &vioapic_mmio_ops,
+    &msixtbl_mmio_ops,
+    &iommu_mmio_ops
 };
 
 static int hvm_mmio_access(struct vcpu *v,
                            ioreq_t *p,
-                           hvm_mmio_read_t read_handler,
-                           hvm_mmio_write_t write_handler)
+                           hvm_mmio_read_t read,
+                           hvm_mmio_write_t write)
 {
     struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     unsigned long data;
@@ -64,11 +64,11 @@ static int hvm_mmio_access(struct vcpu *v,
                 vio->mmio_retrying = 0;
             }
             else
-                rc = read_handler(v, p->addr, p->size, &data);
+                rc = read(v, p->addr, p->size, &data);
             p->data = data;
         }
         else /* p->dir == IOREQ_WRITE */
-            rc = write_handler(v, p->addr, p->size, p->data);
+            rc = write(v, p->addr, p->size, p->data);
         return rc;
     }
 
@@ -86,7 +86,7 @@ static int hvm_mmio_access(struct vcpu *v,
             }
             else
             {
-                rc = read_handler(v, p->addr + step * i, p->size, &data);
+                rc = read(v, p->addr + step * i, p->size, &data);
                 if ( rc != X86EMUL_OKAY )
                     break;
             }
@@ -145,7 +145,7 @@ static int hvm_mmio_access(struct vcpu *v,
             }
             if ( rc != X86EMUL_OKAY )
                 break;
-            rc = write_handler(v, p->addr + step * i, p->size, data);
+            rc = write(v, p->addr + step * i, p->size, data);
             if ( rc != X86EMUL_OKAY )
                 break;
         }
@@ -169,7 +169,7 @@ bool_t hvm_mmio_internal(paddr_t gpa)
     unsigned int i;
 
     for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
-        if ( hvm_mmio_handlers[i]->check_handler(curr, gpa) )
+        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
             return 1;
 
     return 0;
@@ -182,21 +182,20 @@ int hvm_mmio_intercept(ioreq_t *p)
 
     for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
     {
-        hvm_mmio_check_t check_handler =
-            hvm_mmio_handlers[i]->check_handler;
+        hvm_mmio_check_t check = hvm_mmio_handlers[i]->check;
 
-        if ( check_handler(v, p->addr) )
+        if ( check(v, p->addr) )
         {
             if ( unlikely(p->count > 1) &&
-                 !check_handler(v, unlikely(p->df)
-                                   ? p->addr - (p->count - 1L) * p->size
-                                   : p->addr + (p->count - 1L) * p->size) )
+                 !check(v, unlikely(p->df)
+                        ? p->addr - (p->count - 1L) * p->size
+                        : p->addr + (p->count - 1L) * p->size) )
                 p->count = 1;
 
             return hvm_mmio_access(
                 v, p,
-                hvm_mmio_handlers[i]->read_handler,
-                hvm_mmio_handlers[i]->write_handler);
+                hvm_mmio_handlers[i]->read,
+                hvm_mmio_handlers[i]->write);
         }
     }
 
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index f903420..cbbef9f 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -250,10 +250,10 @@ static int vioapic_range(struct vcpu *v, unsigned long addr)
              (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
 }
 
-const struct hvm_mmio_handler vioapic_mmio_handler = {
-    .check_handler = vioapic_range,
-    .read_handler = vioapic_read,
-    .write_handler = vioapic_write
+const struct hvm_mmio_ops vioapic_mmio_ops = {
+    .check = vioapic_range,
+    .read = vioapic_read,
+    .write = vioapic_write
 };
 
 static void ioapic_inj_irq(
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index fbc51d1..d5855cc 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -996,10 +996,10 @@ static int vlapic_range(struct vcpu *v, unsigned long addr)
            (offset < PAGE_SIZE);
 }
 
-const struct hvm_mmio_handler vlapic_mmio_handler = {
-    .check_handler = vlapic_range,
-    .read_handler = vlapic_read,
-    .write_handler = vlapic_write
+const struct hvm_mmio_ops vlapic_mmio_ops = {
+    .check = vlapic_range,
+    .read = vlapic_read,
+    .write = vlapic_write
 };
 
 static void set_x2apic_id(struct vlapic *vlapic)
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 3dbbe37..f89233d 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -344,10 +344,10 @@ static int msixtbl_range(struct vcpu *v, unsigned long addr)
     return !!desc;
 }
 
-const struct hvm_mmio_handler msixtbl_mmio_handler = {
-    .check_handler = msixtbl_range,
-    .read_handler = msixtbl_read,
-    .write_handler = msixtbl_write
+const struct hvm_mmio_ops msixtbl_mmio_ops = {
+    .check = msixtbl_range,
+    .read = msixtbl_read,
+    .write = msixtbl_write
 };
 
 static void add_msixtbl_entry(struct domain *d,
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 98e7b38..7b0c102 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -919,8 +919,8 @@ static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
            addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
 }
 
-const struct hvm_mmio_handler iommu_mmio_handler = {
-    .check_handler = guest_iommu_mmio_range,
-    .read_handler = guest_iommu_mmio_read,
-    .write_handler = guest_iommu_mmio_write
+const struct hvm_mmio_ops iommu_mmio_ops = {
+    .check = guest_iommu_mmio_range,
+    .read = guest_iommu_mmio_read,
+    .write = guest_iommu_mmio_write
 };
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 886a9d6..f2aaec5 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -59,17 +59,17 @@ struct hvm_io_handler {
     struct  io_handler hdl_list[MAX_IO_HANDLER];
 };
 
-struct hvm_mmio_handler {
-    hvm_mmio_check_t check_handler;
-    hvm_mmio_read_t read_handler;
-    hvm_mmio_write_t write_handler;
+struct hvm_mmio_ops {
+    hvm_mmio_check_t check;
+    hvm_mmio_read_t  read;
+    hvm_mmio_write_t write;
 };
 
-extern const struct hvm_mmio_handler hpet_mmio_handler;
-extern const struct hvm_mmio_handler vlapic_mmio_handler;
-extern const struct hvm_mmio_handler vioapic_mmio_handler;
-extern const struct hvm_mmio_handler msixtbl_mmio_handler;
-extern const struct hvm_mmio_handler iommu_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;
+extern const struct hvm_mmio_ops vlapic_mmio_ops;
+extern const struct hvm_mmio_ops vioapic_mmio_ops;
+extern const struct hvm_mmio_ops msixtbl_mmio_ops;
+extern const struct hvm_mmio_ops iommu_mmio_ops;
 
 #define HVM_MMIO_HANDLER_NR 5
 
-- 
1.7.10.4


* [PATCH v3 07/18] x86/hvm: unify internal portio and mmio intercepts
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (5 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 06/18] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-24 12:07   ` Jan Beulich
  2015-06-23 10:39 ` [PATCH v3 08/18] x86/hvm: add length to mmio check op Paul Durrant
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The implementations of the mmio and portio intercepts are unnecessarily
different, which leads to much code duplication. This patch unifies much
of the intercept handling, leaving only distinct handlers for stdvga mmio
and dpci portio; subsequent patches will unify those handlers too.
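
As a rough sketch of the unified registration (the field names are taken
from the hunks below, but the body of register_mmio_handler() itself is
not visible in this excerpt, so its exact shape here is an assumption):

    /* Hypothetical outline: claim the next slot in the per-domain array. */
    void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
    {
        struct hvm_io_handler *handler =
            &d->arch.hvm_domain.io_handler[d->arch.hvm_domain.io_handler_count++];

        handler->type = IOREQ_TYPE_COPY; /* portio handlers use IOREQ_TYPE_PIO */
        handler->u.mmio.ops = ops;       /* e.g. &hpet_mmio_ops from hpet_init() */
    }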

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c                |   11 +-
 xen/arch/x86/hvm/hpet.c                   |    4 +-
 xen/arch/x86/hvm/hvm.c                    |    7 +-
 xen/arch/x86/hvm/intercept.c              |  502 +++++++++++++----------------
 xen/arch/x86/hvm/stdvga.c                 |   30 +-
 xen/arch/x86/hvm/vioapic.c                |    4 +-
 xen/arch/x86/hvm/vlapic.c                 |    5 +-
 xen/arch/x86/hvm/vmsi.c                   |    7 +-
 xen/drivers/passthrough/amd/iommu_guest.c |   30 +-
 xen/include/asm-x86/hvm/domain.h          |    1 +
 xen/include/asm-x86/hvm/io.h              |  119 +++----
 11 files changed, 350 insertions(+), 370 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 02796d0..91e57b0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -143,16 +143,7 @@ static int hvmemul_do_io(
         hvmtrace_io_assist(&p);
     }
 
-    if ( is_mmio )
-    {
-        rc = hvm_mmio_intercept(&p);
-        if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(&p);
-    }
-    else
-    {
-        rc = hvm_portio_intercept(&p);
-    }
+    rc = hvm_io_intercept(&p);
 
     switch ( rc )
     {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 9585ca8..8958873 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,7 +504,7 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
              (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
 }
 
-const struct hvm_mmio_ops hpet_mmio_ops = {
+static const struct hvm_mmio_ops hpet_mmio_ops = {
     .check = hpet_range,
     .read  = hpet_read,
     .write = hpet_write
@@ -659,6 +659,8 @@ void hpet_init(struct domain *d)
         h->hpet.comparator64[i] = ~0ULL;
         h->pt[i].source = PTSRC_isa;
     }
+
+    register_mmio_handler(d, &hpet_mmio_ops);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..c10db78 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1465,11 +1465,12 @@ int hvm_domain_initialise(struct domain *d)
         goto fail0;
 
     d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
-    d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
+    d->arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler,
+                                                  NR_IO_HANDLERS);
     rc = -ENOMEM;
     if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
         goto fail1;
-    d->arch.hvm_domain.io_handler->num_slot = 0;
+    d->arch.hvm_domain.io_handler_count = 0;
 
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
@@ -1506,6 +1507,8 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
+    msixtbl_init(d);
+
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cc44733..4db024e 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,205 +32,97 @@
 #include <xen/event.h>
 #include <xen/iommu.h>
 
-static const struct hvm_mmio_ops *const
-hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
+static bool_t hvm_mmio_accept(struct hvm_io_handler *handler,
+                              uint64_t addr,
+                              uint64_t size)
 {
-    &hpet_mmio_ops,
-    &vlapic_mmio_ops,
-    &vioapic_mmio_ops,
-    &msixtbl_mmio_ops,
-    &iommu_mmio_ops
-};
+    BUG_ON(handler->type != IOREQ_TYPE_COPY);
+
+    return handler->u.mmio.ops->check(current, addr);
+}
 
-static int hvm_mmio_access(struct vcpu *v,
-                           ioreq_t *p,
-                           hvm_mmio_read_t read,
-                           hvm_mmio_write_t write)
+static int hvm_mmio_read(struct hvm_io_handler *handler,
+                         uint64_t addr,
+                         uint64_t size,
+                         uint64_t *data)
 {
-    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
-    unsigned long data;
-    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
+    BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
-    if ( !p->data_is_ptr )
-    {
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-                rc = read(v, p->addr, p->size, &data);
-            p->data = data;
-        }
-        else /* p->dir == IOREQ_WRITE */
-            rc = write(v, p->addr, p->size, p->data);
-        return rc;
-    }
+    return handler->u.mmio.ops->read(current, addr, size, data);
+}
 
-    if ( p->dir == IOREQ_READ )
-    {
-        for ( i = 0; i < p->count; i++ )
-        {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-            {
-                rc = read(v, p->addr + step * i, p->size, &data);
-                if ( rc != X86EMUL_OKAY )
-                    break;
-            }
-            switch ( hvm_copy_to_guest_phys(p->data + step * i,
-                                            &data, p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                /* Drop the write as real hardware would. */
-                continue;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY)
-                break;
-        }
+static int hvm_mmio_write(struct hvm_io_handler *handler,
+                          uint64_t addr,
+                          uint64_t size,
+                          uint64_t data)
+{
+    BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
-        if ( rc == X86EMUL_RETRY )
-        {
-            vio->mmio_retry = 1;
-            vio->mmio_large_read_bytes = p->size;
-            memcpy(vio->mmio_large_read, &data, p->size);
-        }
-    }
-    else
-    {
-        for ( i = 0; i < p->count; i++ )
-        {
-            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
-                                              p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                data = ~0;
-                break;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY )
-                break;
-            rc = write(v, p->addr + step * i, p->size, data);
-            if ( rc != X86EMUL_OKAY )
-                break;
-        }
+    return handler->u.mmio.ops->write(current, addr, size, data);
+}
 
-        if ( rc == X86EMUL_RETRY )
-            vio->mmio_retry = 1;
-    }
+static const struct hvm_io_ops mmio_ops = {
+    .accept = hvm_mmio_accept,
+    .read = hvm_mmio_read,
+    .write = hvm_mmio_write
+};
 
-    if ( i != 0 )
-    {
-        p->count = i;
-        rc = X86EMUL_OKAY;
-    }
+static bool_t hvm_portio_accept(struct hvm_io_handler *handler,
+                                uint64_t addr,
+                                uint64_t size)
+{
+    BUG_ON(handler->type != IOREQ_TYPE_PIO);
 
-    return rc;
+    return (addr >= handler->u.portio.start) &&
+           ((addr + size) <= handler->u.portio.end);
 }
 
-bool_t hvm_mmio_internal(paddr_t gpa)
+static int hvm_portio_read(struct hvm_io_handler *handler,
+                           uint64_t addr,
+                           uint64_t size,
+                           uint64_t *data)
 {
-    struct vcpu *curr = current;
-    unsigned int i;
+    uint32_t val = 0;
+    int rc;
 
-    for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
-        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
-            return 1;
+    BUG_ON(handler->type != IOREQ_TYPE_PIO);
 
-    return 0;
+    rc = handler->u.portio.action(IOREQ_READ, addr, size, &val);
+    if ( rc == X86EMUL_OKAY )
+        *data = val;
+
+    return rc;
 }
 
-int hvm_mmio_intercept(ioreq_t *p)
+static int hvm_portio_write(struct hvm_io_handler *handler,
+                            uint64_t addr,
+                            uint64_t size,
+                            uint64_t data)
 {
-    struct vcpu *v = current;
-    int i;
+    uint32_t val = data;
 
-    for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
-    {
-        hvm_mmio_check_t check = hvm_mmio_handlers[i]->check;
-
-        if ( check(v, p->addr) )
-        {
-            if ( unlikely(p->count > 1) &&
-                 !check(v, unlikely(p->df)
-                        ? p->addr - (p->count - 1L) * p->size
-                        : p->addr + (p->count - 1L) * p->size) )
-                p->count = 1;
-
-            return hvm_mmio_access(
-                v, p,
-                hvm_mmio_handlers[i]->read,
-                hvm_mmio_handlers[i]->write);
-        }
-    }
+    BUG_ON(handler->type != IOREQ_TYPE_PIO);
 
-    return X86EMUL_UNHANDLEABLE;
+    return handler->u.portio.action(IOREQ_WRITE, addr, size, &val);
 }
 
-static int process_portio_intercept(portio_action_t action, ioreq_t *p)
+static const struct hvm_io_ops portio_ops = {
+    .accept = hvm_portio_accept,
+    .read = hvm_portio_read,
+    .write = hvm_portio_write
+};
+
+static int hvm_process_io_intercept(struct hvm_io_handler *handler,
+                                    ioreq_t *p)
 {
     struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    const struct hvm_io_ops *ops =
+        (p->type == IOREQ_TYPE_COPY) ?
+        &mmio_ops :
+        &portio_ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-    uint32_t data;
-
-    if ( !p->data_is_ptr )
-    {
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-                rc = action(IOREQ_READ, p->addr, p->size, &data);
-            p->data = data;
-        }
-        else
-        {
-            data = p->data;
-            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
-        }
-        return rc;
-    }
+    uint64_t data;
+    uint64_t addr;
 
     if ( p->dir == IOREQ_READ )
     {
@@ -246,31 +138,40 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
             }
             else
             {
-                rc = action(IOREQ_READ, p->addr, p->size, &data);
+                addr = (p->type == IOREQ_TYPE_COPY) ?
+                    p->addr + step * i :
+                    p->addr;
+                rc = ops->read(handler, addr, p->size, &data);
                 if ( rc != X86EMUL_OKAY )
                     break;
             }
-            switch ( hvm_copy_to_guest_phys(p->data + step * i,
-                                            &data, p->size) )
+
+            if ( p->data_is_ptr )
             {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                /* Drop the write as real hardware would. */
-                continue;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                switch ( hvm_copy_to_guest_phys(p->data + step * i,
+                                                &data, p->size) )
+                {
+                case HVMCOPY_okay:
+                    break;
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
+                    rc = X86EMUL_RETRY;
+                    break;
+                case HVMCOPY_bad_gfn_to_mfn:
+                    /* Drop the write as real hardware would. */
+                    continue;
+                case HVMCOPY_bad_gva_to_gfn:
+                    ASSERT_UNREACHABLE();
+                    /* fall through */
+                default:
+                    rc = X86EMUL_UNHANDLEABLE;
+                    break;
+                }
+                if ( rc != X86EMUL_OKAY )
+                    break;
             }
-            if ( rc != X86EMUL_OKAY)
-                break;
+            else
+                p->data = data;
         }
 
         if ( rc == X86EMUL_RETRY )
@@ -284,29 +185,37 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
     {
         for ( i = 0; i < p->count; i++ )
         {
-            data = 0;
-            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
-                                              p->size) )
+            if ( p->data_is_ptr )
             {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                data = ~0;
-                break;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
+                                                  p->size) )
+                {
+                case HVMCOPY_okay:
+                    break;
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
+                    rc = X86EMUL_RETRY;
+                    break;
+                case HVMCOPY_bad_gfn_to_mfn:
+                    data = ~0;
+                    break;
+                case HVMCOPY_bad_gva_to_gfn:
+                    ASSERT_UNREACHABLE();
+                    /* fall through */
+                default:
+                    rc = X86EMUL_UNHANDLEABLE;
+                    break;
+                }
+                if ( rc != X86EMUL_OKAY )
+                    break;
             }
-            if ( rc != X86EMUL_OKAY )
-                break;
-            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
+            else
+                data = p->data;
+
+            addr = (p->type == IOREQ_TYPE_COPY) ?
+                p->addr + step * i :
+                p->addr;
+            rc = ops->write(handler, addr, p->size, data);
             if ( rc != X86EMUL_OKAY )
                 break;
         }
@@ -324,78 +233,133 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
     return rc;
 }
 
-/*
- * Check if the request is handled inside xen
- * return value: 0 --not handled; 1 --handled
- */
-int hvm_io_intercept(ioreq_t *p, int type)
+static struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
+{
+    struct vcpu *curr = current;
+    struct domain *curr_d = curr->domain;
+    const struct hvm_io_ops *ops =
+        (p->type == IOREQ_TYPE_COPY) ?
+        &mmio_ops :
+        &portio_ops;
+    unsigned int i;
+
+    for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
+    {
+        struct hvm_io_handler *handler =
+            &curr_d->arch.hvm_domain.io_handler[i];
+        uint64_t start, end, count = p->count, size = p->size;
+
+        if ( handler->type != p->type )
+            continue;
+
+        switch ( handler->type )
+        {
+        case IOREQ_TYPE_PIO:
+            start = p->addr;
+            end = p->addr + size;
+            break;
+        case IOREQ_TYPE_COPY:
+            if ( p->df )
+            {
+                start = (p->addr - (count - 1) * size);
+                end = p->addr + size;
+            }
+            else
+            {
+                start = p->addr;
+                end = p->addr + count * size;
+            }
+            break;
+        default:
+            BUG();
+        }
+
+        if ( ops->accept(handler, start, end - start) )
+            return handler;
+    }
+
+    return NULL;
+}
+
+int hvm_io_intercept(ioreq_t *p)
 {
-    struct vcpu *v = current;
-    struct hvm_io_handler *handler = v->domain->arch.hvm_domain.io_handler;
-    int i;
-    unsigned long addr, size;
+    struct hvm_io_handler *handler;
 
-    if ( type == HVM_PORTIO )
+    if ( p->type == IOREQ_TYPE_PIO )
     {
         int rc = dpci_ioport_intercept(p);
         if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
             return rc;
     }
-
-    for ( i = 0; i < handler->num_slot; i++ )
+    else if ( p->type == IOREQ_TYPE_COPY )
     {
-        if ( type != handler->hdl_list[i].type )
-            continue;
-        addr = handler->hdl_list[i].addr;
-        size = handler->hdl_list[i].size;
-        if ( (p->addr >= addr) &&
-             ((p->addr + p->size) <= (addr + size)) )
-        {
-            if ( type == HVM_PORTIO )
-                return process_portio_intercept(
-                    handler->hdl_list[i].action.portio, p);
+        int rc = stdvga_intercept_mmio(p);
+        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
+            return rc;
+    }
 
-            if ( unlikely(p->count > 1) &&
-                 (unlikely(p->df)
-                  ? p->addr - (p->count - 1L) * p->size < addr
-                  : p->addr + p->count * 1L * p->size - 1 >= addr + size) )
-                p->count = 1;
+    handler = hvm_find_io_handler(p);
 
-            return handler->hdl_list[i].action.mmio(p);
-        }
-    }
+    if ( handler == NULL )
+        return X86EMUL_UNHANDLEABLE;
 
-    return X86EMUL_UNHANDLEABLE;
+    return hvm_process_io_intercept(handler, p);
 }
 
-void register_io_handler(
-    struct domain *d, unsigned long addr, unsigned long size,
-    void *action, int type)
+static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
 {
-    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
-    int num = handler->num_slot;
+    unsigned int i = d->arch.hvm_domain.io_handler_count++;
 
-    BUG_ON(num >= MAX_IO_HANDLER);
+    if ( i == NR_IO_HANDLERS )
+        domain_crash(d);
 
-    handler->hdl_list[num].addr = addr;
-    handler->hdl_list[num].size = size;
-    handler->hdl_list[num].action.ptr = action;
-    handler->hdl_list[num].type = type;
-    handler->num_slot++;
+    return &d->arch.hvm_domain.io_handler[i];
 }
 
-void relocate_io_handler(
-    struct domain *d, unsigned long old_addr, unsigned long new_addr,
-    unsigned long size, int type)
+void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
 {
-    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
-    int i;
-
-    for ( i = 0; i < handler->num_slot; i++ )
-        if ( (handler->hdl_list[i].addr == old_addr) &&
-             (handler->hdl_list[i].size == size) &&
-             (handler->hdl_list[i].type == type) )
-            handler->hdl_list[i].addr = new_addr;
+    struct hvm_io_handler *handler = hvm_next_io_handler(d);
+
+    handler->type = IOREQ_TYPE_COPY;
+    handler->u.mmio.ops = ops;
+}
+
+void register_portio_handler(struct domain *d, uint32_t addr,
+                             uint32_t size, portio_action_t action)
+{
+    struct hvm_io_handler *handler = hvm_next_io_handler(d);
+
+    handler->type = IOREQ_TYPE_PIO;
+    handler->u.portio.start = addr;
+    handler->u.portio.end = addr + size;
+    handler->u.portio.action = action;
+}
+
+void relocate_portio_handler(struct domain *d, uint32_t old_addr,
+                             uint32_t new_addr, uint32_t size)
+{
+    ioreq_t p = {
+        .type = IOREQ_TYPE_PIO,
+        .addr = old_addr,
+        .size = size
+    };
+    struct hvm_io_handler *handler = hvm_find_io_handler(&p);
+
+    if ( handler == NULL )
+        return;
+
+    handler->u.portio.start = new_addr;
+    handler->u.portio.end = new_addr + size;
+}
+
+bool_t hvm_mmio_internal(paddr_t gpa)
+{
+    ioreq_t p = {
+        .type = IOREQ_TYPE_COPY,
+        .addr = gpa
+    };
+
+    return (hvm_find_io_handler(&p) != NULL);
 }
 
 /*
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index 13d1029..dcd532a 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -547,12 +547,27 @@ static int mmio_move(struct hvm_hw_stdvga *s, ioreq_t *p)
     return 1;
 }
 
-static int stdvga_intercept_mmio(ioreq_t *p)
+int stdvga_intercept_mmio(ioreq_t *p)
 {
     struct domain *d = current->domain;
     struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
+    uint64_t start, end, count = p->count, size = p->size;
     int buf = 0, rc;
 
+    if ( p->df )
+    {
+        start = (p->addr - (count - 1) * size);
+        end = p->addr + size;
+    }
+    else
+    {
+        start = p->addr;
+        end = p->addr + count * size;
+    }
+
+    if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+        return X86EMUL_UNHANDLEABLE;
+
     if ( p->size > 8 )
     {
         gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
@@ -619,9 +634,6 @@ void stdvga_init(struct domain *d)
         register_portio_handler(d, 0x3c4, 2, stdvga_intercept_pio);
         /* Graphics registers. */
         register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
-        /* MMIO. */
-        register_buffered_io_handler(
-            d, VGA_MEM_BASE, VGA_MEM_SIZE, stdvga_intercept_mmio);
     }
 }
 
@@ -638,3 +650,13 @@ void stdvga_deinit(struct domain *d)
         s->vram_page[i] = NULL;
     }
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index cbbef9f..9ad909b 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -250,7 +250,7 @@ static int vioapic_range(struct vcpu *v, unsigned long addr)
              (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
 }
 
-const struct hvm_mmio_ops vioapic_mmio_ops = {
+static const struct hvm_mmio_ops vioapic_mmio_ops = {
     .check = vioapic_range,
     .read = vioapic_read,
     .write = vioapic_write
@@ -456,6 +456,8 @@ int vioapic_init(struct domain *d)
     d->arch.hvm_domain.vioapic->domain = d;
     vioapic_reset(d);
 
+    register_mmio_handler(d, &vioapic_mmio_ops);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index d5855cc..f2052cf 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -996,7 +996,7 @@ static int vlapic_range(struct vcpu *v, unsigned long addr)
            (offset < PAGE_SIZE);
 }
 
-const struct hvm_mmio_ops vlapic_mmio_ops = {
+static const struct hvm_mmio_ops vlapic_mmio_ops = {
     .check = vlapic_range,
     .read = vlapic_read,
     .write = vlapic_write
@@ -1462,6 +1462,9 @@ int vlapic_init(struct vcpu *v)
                  vlapic_init_sipi_action,
                  (unsigned long)v);
 
+    if ( v->vcpu_id == 0 )
+        register_mmio_handler(v->domain, &vlapic_mmio_ops);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index f89233d..09ea301 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -344,7 +344,7 @@ static int msixtbl_range(struct vcpu *v, unsigned long addr)
     return !!desc;
 }
 
-const struct hvm_mmio_ops msixtbl_mmio_ops = {
+static const struct hvm_mmio_ops msixtbl_mmio_ops = {
     .check = msixtbl_range,
     .read = msixtbl_read,
     .write = msixtbl_write
@@ -481,6 +481,11 @@ found:
     spin_unlock_irq(&irq_desc->lock);
 }
 
+void msixtbl_init(struct domain *d)
+{
+    register_mmio_handler(d, &msixtbl_mmio_ops);
+}
+
 void msixtbl_pt_cleanup(struct domain *d)
 {
     struct msixtbl_entry *entry, *temp;
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 7b0c102..9c3c488 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -868,6 +868,20 @@ static void guest_iommu_reg_init(struct guest_iommu *iommu)
     iommu->reg_ext_feature.hi = upper;
 }
 
+static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
+{
+    struct guest_iommu *iommu = vcpu_iommu(v);
+
+    return iommu && addr >= iommu->mmio_base &&
+           addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
+}
+
+static const struct hvm_mmio_ops iommu_mmio_ops = {
+    .check = guest_iommu_mmio_range,
+    .read = guest_iommu_mmio_read,
+    .write = guest_iommu_mmio_write
+};
+
 /* Domain specific initialization */
 int guest_iommu_init(struct domain* d)
 {
@@ -894,6 +908,8 @@ int guest_iommu_init(struct domain* d)
 
     spin_lock_init(&iommu->lock);
 
+    register_mmio_handler(d, &iommu_mmio_ops);
+
     return 0;
 }
 
@@ -910,17 +926,3 @@ void guest_iommu_destroy(struct domain *d)
 
     domain_hvm_iommu(d)->arch.g_iommu = NULL;
 }
-
-static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
-{
-    struct guest_iommu *iommu = vcpu_iommu(v);
-
-    return iommu && addr >= iommu->mmio_base &&
-           addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
-}
-
-const struct hvm_mmio_ops iommu_mmio_ops = {
-    .check = guest_iommu_mmio_range,
-    .read = guest_iommu_mmio_read,
-    .write = guest_iommu_mmio_write
-};
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index ad68fcf..b612755 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -94,6 +94,7 @@ struct hvm_domain {
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
+    unsigned int          io_handler_count;
 
     /* Lock protects access to irq, vpic and vioapic. */
     spinlock_t             irq_lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index f2aaec5..fecd02d 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,10 +25,7 @@
 #include <public/hvm/ioreq.h>
 #include <public/event_channel.h>
 
-#define MAX_IO_HANDLER             16
-
-#define HVM_PORTIO                  0
-#define HVM_BUFFERED_IO             2
+#define NR_IO_HANDLERS 32
 
 typedef int (*hvm_mmio_read_t)(struct vcpu *v,
                                unsigned long addr,
@@ -40,81 +37,57 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
                                 unsigned long val);
 typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
 
-typedef int (*portio_action_t)(
-    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
-typedef int (*mmio_action_t)(ioreq_t *);
-struct io_handler {
-    int                 type;
-    unsigned long       addr;
-    unsigned long       size;
-    union {
-        portio_action_t portio;
-        mmio_action_t   mmio;
-        void           *ptr;
-    } action;
-};
-
-struct hvm_io_handler {
-    int     num_slot;
-    struct  io_handler hdl_list[MAX_IO_HANDLER];
-};
-
 struct hvm_mmio_ops {
     hvm_mmio_check_t check;
     hvm_mmio_read_t  read;
     hvm_mmio_write_t write;
 };
 
-extern const struct hvm_mmio_ops hpet_mmio_ops;
-extern const struct hvm_mmio_ops vlapic_mmio_ops;
-extern const struct hvm_mmio_ops vioapic_mmio_ops;
-extern const struct hvm_mmio_ops msixtbl_mmio_ops;
-extern const struct hvm_mmio_ops iommu_mmio_ops;
-
-#define HVM_MMIO_HANDLER_NR 5
+typedef int (*portio_action_t)(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
 
-int hvm_io_intercept(ioreq_t *p, int type);
-void register_io_handler(
-    struct domain *d, unsigned long addr, unsigned long size,
-    void *action, int type);
-void relocate_io_handler(
-    struct domain *d, unsigned long old_addr, unsigned long new_addr,
-    unsigned long size, int type);
+struct hvm_io_handler {
+    union {
+        struct {
+            const struct hvm_mmio_ops *ops;
+        } mmio;
+        struct {
+            uint32_t        start, end;
+            portio_action_t action;
+        } portio;
+    } u;
+    uint8_t type;
+};
 
-static inline int hvm_portio_intercept(ioreq_t *p)
-{
-    return hvm_io_intercept(p, HVM_PORTIO);
-}
+typedef int (*hvm_io_read_t)(struct hvm_io_handler *,
+                             uint64_t addr,
+                             uint64_t size,
+                             uint64_t *data);
+typedef int (*hvm_io_write_t)(struct hvm_io_handler *,
+                              uint64_t addr,
+                              uint64_t size,
+                              uint64_t data);
+typedef bool_t (*hvm_io_accept_t)(struct hvm_io_handler *,
+                                  uint64_t addr,
+                                  uint64_t size);
+
+struct hvm_io_ops {
+    hvm_io_accept_t accept;
+    hvm_io_read_t   read;
+    hvm_io_write_t  write;
+};
 
-static inline int hvm_buffered_io_intercept(ioreq_t *p)
-{
-    return hvm_io_intercept(p, HVM_BUFFERED_IO);
-}
+int hvm_io_intercept(ioreq_t *p);
 
+void register_mmio_handler(struct domain *d,
+                           const struct hvm_mmio_ops *ops);
+void register_portio_handler(struct domain *d, uint32_t addr,
+                             uint32_t size, portio_action_t action);
+void relocate_portio_handler(struct domain *d, uint32_t old_addr,
+                             uint32_t new_addr, uint32_t size);
 bool_t hvm_mmio_internal(paddr_t gpa);
-int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(ioreq_t *p);
 
-static inline void register_portio_handler(
-    struct domain *d, unsigned long addr,
-    unsigned long size, portio_action_t action)
-{
-    register_io_handler(d, addr, size, action, HVM_PORTIO);
-}
-
-static inline void relocate_portio_handler(
-    struct domain *d, unsigned long old_addr, unsigned long new_addr,
-    unsigned long size)
-{
-    relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
-}
-
-static inline void register_buffered_io_handler(
-    struct domain *d, unsigned long addr,
-    unsigned long size, mmio_action_t action)
-{
-    register_io_handler(d, addr, size, action, HVM_BUFFERED_IO);
-}
+int hvm_buffered_io_send(ioreq_t *p);
 
 void send_timeoffset_req(unsigned long timeoff);
 void send_invalidate_req(void);
@@ -127,6 +100,7 @@ void hvm_io_assist(ioreq_t *p);
 void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
                   const union vioapic_redir_entry *ent);
 void msix_write_completion(struct vcpu *);
+void msixtbl_init(struct domain *d);
 
 struct hvm_hw_stdvga {
     uint8_t sr_index;
@@ -141,8 +115,19 @@ struct hvm_hw_stdvga {
 };
 
 void stdvga_init(struct domain *d);
+int stdvga_intercept_mmio(ioreq_t *p);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
 #endif /* __ASM_X86_HVM_IO_H__ */
 
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 08/18] x86/hvm: add length to mmio check op
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (6 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 07/18] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 09/18] x86/hvm: unify dpci portio intercept with standard portio intercept Paul Durrant
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

When memory-mapped I/O is range-checked by internal handlers, the length
of the access should be taken into account.
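
As an illustration (a sketch with made-up values; the REGION_* region
below is not a real one in this series), a check that only considers the
first byte will accept an access that straddles the end of the range,
whereas a length-aware check will not:

    /* Sketch only: the region below is invented for the example. */
    #define REGION_BASE 0xfed00000UL
    #define REGION_SIZE 0x400UL

    /* Old-style check: only the starting address is considered. */
    static int range_first_byte_only(unsigned long addr)
    {
        return (addr >= REGION_BASE) && (addr < (REGION_BASE + REGION_SIZE));
    }

    /* Length-aware check: the whole access must fit inside the region. */
    static int range_whole_access(unsigned long addr, unsigned long length)
    {
        return (addr >= REGION_BASE) &&
               ((addr + length) <= (REGION_BASE + REGION_SIZE));
    }

For addr = REGION_BASE + 0x3fc and length = 8, the first form accepts the
access even though it runs four bytes past the region, while the second
rejects it.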

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hpet.c                   |    7 ++++---
 xen/arch/x86/hvm/intercept.c              |    2 +-
 xen/arch/x86/hvm/vioapic.c                |   17 ++++++++++++++---
 xen/arch/x86/hvm/vlapic.c                 |    8 +++++---
 xen/arch/x86/hvm/vmsi.c                   |   27 ++++++++++++++++++++-------
 xen/drivers/passthrough/amd/iommu_guest.c |   18 +++++++++++++++---
 xen/include/asm-x86/hvm/io.h              |    4 +++-
 7 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 8958873..1a1f239 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -498,10 +498,11 @@ static int hpet_write(
     return X86EMUL_OKAY;
 }
 
-static int hpet_range(struct vcpu *v, unsigned long addr)
+static int hpet_range(struct vcpu *v, unsigned long addr,
+                      unsigned long length)
 {
-    return ( (addr >= HPET_BASE_ADDRESS) &&
-             (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
+    return (addr >= HPET_BASE_ADDRESS) &&
+           ((addr + length) <= (HPET_BASE_ADDRESS + HPET_MMAP_SIZE));
 }
 
 static const struct hvm_mmio_ops hpet_mmio_ops = {
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 4db024e..5e8d8b2 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -38,7 +38,7 @@ static bool_t hvm_mmio_accept(struct hvm_io_handler *handler,
 {
     BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
-    return handler->u.mmio.ops->check(current, addr);
+    return handler->u.mmio.ops->check(current, addr, size);
 }
 
 static int hvm_mmio_read(struct hvm_io_handler *handler,
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 9ad909b..4a9b33e 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -242,12 +242,13 @@ static int vioapic_write(
     return X86EMUL_OKAY;
 }
 
-static int vioapic_range(struct vcpu *v, unsigned long addr)
+static int vioapic_range(struct vcpu *v, unsigned long addr,
+                         unsigned long length)
 {
     struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
 
-    return ((addr >= vioapic->base_address &&
-             (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
+    return (addr >= vioapic->base_address) &&
+           ((addr + length) <= (vioapic->base_address + VIOAPIC_MEM_LENGTH));
 }
 
 static const struct hvm_mmio_ops vioapic_mmio_ops = {
@@ -466,3 +467,13 @@ void vioapic_deinit(struct domain *d)
     xfree(d->arch.hvm_domain.vioapic);
     d->arch.hvm_domain.vioapic = NULL;
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index f2052cf..7421fc5 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -986,14 +986,16 @@ int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t msr_content)
     return vlapic_reg_write(v, offset, (uint32_t)msr_content);
 }
 
-static int vlapic_range(struct vcpu *v, unsigned long addr)
+static int vlapic_range(struct vcpu *v, unsigned long address,
+                        unsigned long len)
 {
     struct vlapic *vlapic = vcpu_vlapic(v);
-    unsigned long offset  = addr - vlapic_base_address(vlapic);
+    unsigned long offset  = address - vlapic_base_address(vlapic);
 
     return !vlapic_hw_disabled(vlapic) &&
            !vlapic_x2apic_mode(vlapic) &&
-           (offset < PAGE_SIZE);
+           (address >= vlapic_base_address(vlapic)) &&
+           ((offset + len) <= PAGE_SIZE);
 }
 
 static const struct hvm_mmio_ops vlapic_mmio_ops = {
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 09ea301..61fe391 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -168,14 +168,14 @@ struct msixtbl_entry
 static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
 
 static struct msixtbl_entry *msixtbl_find_entry(
-    struct vcpu *v, unsigned long addr)
+    struct vcpu *v, unsigned long address, unsigned long len)
 {
     struct msixtbl_entry *entry;
     struct domain *d = v->domain;
 
     list_for_each_entry( entry, &d->arch.hvm_domain.msixtbl_list, list )
-        if ( addr >= entry->gtable &&
-             addr < entry->gtable + entry->table_len )
+        if ( (address >= entry->gtable) &&
+             ((address + len) <= (entry->gtable + entry->table_len)) )
             return entry;
 
     return NULL;
@@ -214,7 +214,7 @@ static int msixtbl_read(
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
-    entry = msixtbl_find_entry(v, address);
+    entry = msixtbl_find_entry(v, address, len);
     if ( !entry )
         goto out;
     offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
@@ -273,7 +273,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
-    entry = msixtbl_find_entry(v, address);
+    entry = msixtbl_find_entry(v, address, len);
     if ( !entry )
         goto out;
     nr_entry = (address - entry->gtable) / PCI_MSIX_ENTRY_SIZE;
@@ -333,12 +333,15 @@ out:
     return r;
 }
 
-static int msixtbl_range(struct vcpu *v, unsigned long addr)
+static int msixtbl_range(struct vcpu *v, unsigned long address,
+                         unsigned long len)
 {
+    struct msixtbl_entry *entry;
     const struct msi_desc *desc;
 
     rcu_read_lock(&msixtbl_rcu_lock);
-    desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, addr), addr);
+    entry = msixtbl_find_entry(v, address, len);
+    desc = msixtbl_addr_to_desc(entry, address);
     rcu_read_unlock(&msixtbl_rcu_lock);
 
     return !!desc;
@@ -514,3 +517,13 @@ void msix_write_completion(struct vcpu *v)
     if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 9c3c488..dd84281 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -868,12 +868,14 @@ static void guest_iommu_reg_init(struct guest_iommu *iommu)
     iommu->reg_ext_feature.hi = upper;
 }
 
-static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
+static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr,
+                                  unsigned long length)
 {
     struct guest_iommu *iommu = vcpu_iommu(v);
 
-    return iommu && addr >= iommu->mmio_base &&
-           addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
+    return iommu &&
+           (addr >= iommu->mmio_base) &&
+           ((addr + length) <= (iommu->mmio_base + IOMMU_MMIO_SIZE));
 }
 
 static const struct hvm_mmio_ops iommu_mmio_ops = {
@@ -926,3 +928,13 @@ void guest_iommu_destroy(struct domain *d)
 
     domain_hvm_iommu(d)->arch.g_iommu = NULL;
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index fecd02d..b4596fc 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -35,7 +35,9 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
                                 unsigned long addr,
                                 unsigned long length,
                                 unsigned long val);
-typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
+typedef int (*hvm_mmio_check_t)(struct vcpu *v,
+                                unsigned long addr,
+                                unsigned long length);
 
 struct hvm_mmio_ops {
     hvm_mmio_check_t check;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 09/18] x86/hvm: unify dpci portio intercept with standard portio intercept
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (7 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 08/18] x86/hvm: add length to mmio check op Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 10/18] x86/hvm: unify stdvga mmio intercept with standard mmio intercept Paul Durrant
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

This patch re-works the dpci portio intercepts so that they can be unified
with standard portio handling, thereby removing a substantial amount of
code duplication.
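
For comparison (a sketch reusing registrations that appear elsewhere in
this series), an ordinary portio handler records a fixed port range,
while the dpci handler records none and instead decides per access in
its accept op by walking the domain's g2m ioport list:

    /* Fixed range, checked by hvm_portio_accept(). */
    register_portio_handler(d, 0xe9, 1, hvm_print_line);

    /* No fixed range; dpci_portio_accept() consults g2m_ioport_list. */
    register_dpci_portio_handler(d);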

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c         |    2 +
 xen/arch/x86/hvm/intercept.c   |   22 ++--
 xen/arch/x86/hvm/io.c          |  225 +++++++++++++---------------------------
 xen/include/asm-x86/hvm/io.h   |    8 ++
 xen/include/asm-x86/hvm/vcpu.h |    2 +
 xen/include/xen/iommu.h        |    1 -
 6 files changed, 88 insertions(+), 172 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c10db78..f8486f4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1486,6 +1486,8 @@ int hvm_domain_initialise(struct domain *d)
     else
         d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
 
+    register_dpci_portio_handler(d);
+
     if ( is_pvh_domain(d) )
     {
         register_portio_handler(d, 0, 0x10003, handle_pvh_io);
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 5e8d8b2..5633959 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -116,10 +116,7 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
                                     ioreq_t *p)
 {
     struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
-    const struct hvm_io_ops *ops =
-        (p->type == IOREQ_TYPE_COPY) ?
-        &mmio_ops :
-        &portio_ops;
+    const struct hvm_io_ops *ops = handler->ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
     uint64_t data;
     uint64_t addr;
@@ -237,16 +234,13 @@ static struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
 {
     struct vcpu *curr = current;
     struct domain *curr_d = curr->domain;
-    const struct hvm_io_ops *ops =
-        (p->type == IOREQ_TYPE_COPY) ?
-        &mmio_ops :
-        &portio_ops;
     unsigned int i;
 
     for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
     {
         struct hvm_io_handler *handler =
             &curr_d->arch.hvm_domain.io_handler[i];
+        const struct hvm_io_ops *ops = handler->ops;
         uint64_t start, end, count = p->count, size = p->size;
 
         if ( handler->type != p->type )
@@ -285,13 +279,7 @@ int hvm_io_intercept(ioreq_t *p)
 {
     struct hvm_io_handler *handler;
 
-    if ( p->type == IOREQ_TYPE_PIO )
-    {
-        int rc = dpci_ioport_intercept(p);
-        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
-            return rc;
-    }
-    else if ( p->type == IOREQ_TYPE_COPY )
+    if ( p->type == IOREQ_TYPE_COPY )
     {
         int rc = stdvga_intercept_mmio(p);
         if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
@@ -306,7 +294,7 @@ int hvm_io_intercept(ioreq_t *p)
     return hvm_process_io_intercept(handler, p);
 }
 
-static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
+struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
 {
     unsigned int i = d->arch.hvm_domain.io_handler_count++;
 
@@ -321,6 +309,7 @@ void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
     struct hvm_io_handler *handler = hvm_next_io_handler(d);
 
     handler->type = IOREQ_TYPE_COPY;
+    handler->ops = &mmio_ops;
     handler->u.mmio.ops = ops;
 }
 
@@ -330,6 +319,7 @@ void register_portio_handler(struct domain *d, uint32_t addr,
     struct hvm_io_handler *handler = hvm_next_io_handler(d);
 
     handler->type = IOREQ_TYPE_PIO;
+    handler->ops = &portio_ops;
     handler->u.portio.start = addr;
     handler->u.portio.end = addr + size;
     handler->u.portio.action = action;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index c0964ec..51ef19a 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -208,185 +208,100 @@ void hvm_io_assist(ioreq_t *p)
     }
 }
 
-static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
+static bool_t dpci_portio_accept(struct hvm_io_handler *handler,
+                                 uint64_t addr,
+                                 uint64_t size)
 {
-    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
-    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-    uint32_t data = 0;
+    struct vcpu *curr = current;
+    struct hvm_iommu *hd = domain_hvm_iommu(curr->domain);
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+    struct g2m_ioport *g2m_ioport;
+    uint32_t start, end;
+    uint32_t gport = addr, mport;
 
-    for ( i = 0; i < p->count; i++ )
+    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
     {
-        if ( vio->mmio_retrying )
-        {
-            if ( vio->mmio_large_read_bytes != p->size )
-                return X86EMUL_UNHANDLEABLE;
-            memcpy(&data, vio->mmio_large_read, p->size);
-            vio->mmio_large_read_bytes = 0;
-            vio->mmio_retrying = 0;
-        }
-        else switch ( p->size )
-        {
-        case 1:
-            data = inb(mport);
-            break;
-        case 2:
-            data = inw(mport);
-            break;
-        case 4:
-            data = inl(mport);
-            break;
-        default:
-            BUG();
-        }
-
-        if ( p->data_is_ptr )
-        {
-            switch ( hvm_copy_to_guest_phys(p->data + step * i,
-                                            &data, p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                /* Drop the write as real hardware would. */
-                continue;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY)
-                break;
-        }
-        else
-            p->data = data;
+        start = g2m_ioport->gport;
+        end = start + g2m_ioport->np;
+        if ( (gport >= start) && (gport + size <= end) )
+            goto found;
     }
 
-    if ( rc == X86EMUL_RETRY )
-    {
-        vio->mmio_retry = 1;
-        vio->mmio_large_read_bytes = p->size;
-        memcpy(vio->mmio_large_read, &data, p->size);
-    }
+    return 0;
 
-    if ( i != 0 )
-    {
-        p->count = i;
-        rc = X86EMUL_OKAY;
-    }
+ found:
+    mport = (gport - start) + g2m_ioport->mport;
 
-    return rc;
+    vio->g2m_ioport = g2m_ioport;
+    return 1;
 }
 
-static int dpci_ioport_write(uint32_t mport, ioreq_t *p)
+static int dpci_portio_read(struct hvm_io_handler *handler,
+                            uint64_t addr,
+                            uint64_t size,
+                            uint64_t *data)
 {
-    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-    uint32_t data;
-
-    for ( i = 0; i < p->count; i++ )
-    {
-        data = p->data;
-        if ( p->data_is_ptr )
-        {
-            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
-                                              p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                data = ~0;
-                break;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY)
-                break;
-        }
-
-        switch ( p->size )
-        {
-        case 1:
-            outb(data, mport);
-            break;
-        case 2:
-            outw(data, mport);
-            break;
-        case 4:
-            outl(data, mport);
-            break;
-        default:
-            BUG();
-        }
-    }
-
-    if ( rc == X86EMUL_RETRY )
-        current->arch.hvm_vcpu.hvm_io.mmio_retry = 1;
+    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    struct g2m_ioport *g2m_ioport = vio->g2m_ioport;
+    uint32_t mport = (addr - g2m_ioport->gport) + g2m_ioport->mport;
 
-    if ( i != 0 )
+    switch ( size )
     {
-        p->count = i;
-        rc = X86EMUL_OKAY;
+    case 1:
+        *data = inb(mport);
+        break;
+    case 2:
+        *data = inw(mport);
+        break;
+    case 4:
+        *data = inl(mport);
+        break;
+    default:
+        BUG();
     }
 
-    return rc;
+    return X86EMUL_OKAY;
 }
 
-int dpci_ioport_intercept(ioreq_t *p)
+static int dpci_portio_write(struct hvm_io_handler *handler,
+                             uint64_t addr,
+                             uint64_t size,
+                             uint64_t data)
 {
-    struct domain *d = current->domain;
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    struct g2m_ioport *g2m_ioport;
-    unsigned int mport, gport = p->addr;
-    unsigned int s = 0, e = 0;
-    int rc;
-
-    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
-    {
-        s = g2m_ioport->gport;
-        e = s + g2m_ioport->np;
-        if ( (gport >= s) && (gport < e) )
-            goto found;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-
- found:
-    mport = (gport - s) + g2m_ioport->mport;
-
-    if ( !ioports_access_permitted(d, mport, mport + p->size - 1) ) 
-    {
-        gdprintk(XENLOG_ERR, "Error: access to gport=%#x denied!\n",
-                 (uint32_t)p->addr);
-        return X86EMUL_UNHANDLEABLE;
-    }
+    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    struct g2m_ioport *g2m_ioport = vio->g2m_ioport;
+    uint32_t mport = (addr - g2m_ioport->gport) + g2m_ioport->mport;
 
-    switch ( p->dir )
+    switch ( size )
     {
-    case IOREQ_READ:
-        rc = dpci_ioport_read(mport, p);
+    case 1:
+        outb(data, mport);
         break;
-    case IOREQ_WRITE:
-        rc = dpci_ioport_write(mport, p);
+    case 2:
+        outw(data, mport);
+        break;
+    case 4:
+        outl(data, mport);
         break;
     default:
-        gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p->dir);
-        rc = X86EMUL_UNHANDLEABLE;
+        BUG();
     }
 
-    return rc;
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops dpci_portio_ops = {
+    .accept = dpci_portio_accept,
+    .read = dpci_portio_read,
+    .write = dpci_portio_write
+};
+
+void register_dpci_portio_handler(struct domain *d)
+{
+    struct hvm_io_handler *handler = hvm_next_io_handler(d);
+
+    handler->type = IOREQ_TYPE_PIO;
+    handler->ops = &dpci_portio_ops;
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index b4596fc..afa27bf 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -48,6 +48,8 @@ struct hvm_mmio_ops {
 typedef int (*portio_action_t)(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val);
 
+struct hvm_io_ops;
+
 struct hvm_io_handler {
     union {
         struct {
@@ -58,6 +60,7 @@ struct hvm_io_handler {
             portio_action_t action;
         } portio;
     } u;
+    const struct hvm_io_ops *ops;
     uint8_t type;
 };
 
@@ -81,6 +84,8 @@ struct hvm_io_ops {
 
 int hvm_io_intercept(ioreq_t *p);
 
+struct hvm_io_handler *hvm_next_io_handler(struct domain *d);
+
 void register_mmio_handler(struct domain *d,
                            const struct hvm_mmio_ops *ops);
 void register_portio_handler(struct domain *d, uint32_t addr,
@@ -121,6 +126,9 @@ int stdvga_intercept_mmio(ioreq_t *p);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+void register_dpci_portio_handler(struct domain *d);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 3d8f4dc..dd08416 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -77,6 +77,8 @@ struct hvm_vcpu_io {
     bool_t mmio_retry, mmio_retrying;
 
     unsigned long msix_unmask_address;
+
+    struct g2m_ioport *g2m_ioport;
 };
 
 #define VMCX_EADDR    (~0ULL)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..1d00696 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -93,7 +93,6 @@ void pt_pci_init(void);
 
 struct pirq;
 int hvm_do_IRQ_dpci(struct domain *, struct pirq *);
-int dpci_ioport_intercept(ioreq_t *p);
 int pt_irq_create_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
 int pt_irq_destroy_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 10/18] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (8 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 09/18] x86/hvm: unify dpci portio intercept with standard portio intercept Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 11/18] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry Paul Durrant
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

It's clear from the following check in hvmemul_rep_movs:

    if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
         (sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
        return X86EMUL_UNHANDLEABLE;

that mmio <-> mmio copy is not handled. This means the code in the
stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
hvm_copy_to/from_guest_phys() fails will never be executed.

This patch therefore adds a check in hvmemul_do_io_addr() to make sure
mmio <-> mmio is disallowed and then registers standard mmio intercept ops
in stdvga_init().

With this patch, all mmio and portio handling within Xen now goes through
hvm_process_io_intercept().

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c   |    9 +++
 xen/arch/x86/hvm/intercept.c |    7 --
 xen/arch/x86/hvm/stdvga.c    |  173 +++++++++---------------------------------
 xen/include/asm-x86/hvm/io.h |    1 -
 4 files changed, 44 insertions(+), 146 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 91e57b0..7d6c0eb 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -267,6 +267,15 @@ static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
         return X86EMUL_RETRY;
     }
 
+    /* This code should not be reached if the gmfn is not RAM */
+    if ( p2m_is_mmio(p2mt) )
+    {
+        domain_crash(curr_d);
+
+        put_page(*page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
     return X86EMUL_OKAY;
 }
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 5633959..625e585 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -279,13 +279,6 @@ int hvm_io_intercept(ioreq_t *p)
 {
     struct hvm_io_handler *handler;
 
-    if ( p->type == IOREQ_TYPE_COPY )
-    {
-        int rc = stdvga_intercept_mmio(p);
-        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
-            return rc;
-    }
-
     handler = hvm_find_io_handler(p);
 
     if ( handler == NULL )
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index dcd532a..639da6a 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -275,9 +275,10 @@ static uint8_t stdvga_mem_readb(uint64_t addr)
     return ret;
 }
 
-static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
+static int stdvga_mem_read(struct vcpu *v, unsigned long addr,
+                           unsigned long size, unsigned long *p_data)
 {
-    uint64_t data = 0;
+    unsigned long data = 0;
 
     switch ( size )
     {
@@ -313,7 +314,9 @@ static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
         break;
     }
 
-    return data;
+    *p_data = data;
+
+    return X86EMUL_OKAY;
 }
 
 static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
@@ -424,8 +427,17 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
     }
 }
 
-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(struct vcpu *v, unsigned long addr,
+                            unsigned long size, unsigned long data)
 {
+    ioreq_t p = { .type = IOREQ_TYPE_COPY,
+                  .addr = addr,
+                  .size = size,
+                  .count = 1,
+                  .dir = IOREQ_WRITE,
+                  .data = data,
+    };
+
     /* Intercept mmio write */
     switch ( size )
     {
@@ -460,153 +472,36 @@ static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
         gdprintk(XENLOG_WARNING, "invalid io size: %"PRId64"\n", size);
         break;
     }
-}
-
-static uint32_t read_data;
-
-static int mmio_move(struct hvm_hw_stdvga *s, ioreq_t *p)
-{
-    int i;
-    uint64_t addr = p->addr;
-    p2m_type_t p2mt;
-    struct domain *d = current->domain;
-    int step = p->df ? -p->size : p->size;
 
-    if ( p->data_is_ptr )
-    {
-        uint64_t data = p->data, tmp;
-
-        if ( p->dir == IOREQ_READ )
-        {
-            for ( i = 0; i < p->count; i++ ) 
-            {
-                tmp = stdvga_mem_read(addr, p->size);
-                if ( hvm_copy_to_guest_phys(data, &tmp, p->size) !=
-                     HVMCOPY_okay )
-                {
-                    struct page_info *dp = get_page_from_gfn(
-                            d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
-                    /*
-                     * The only case we handle is vga_mem <-> vga_mem.
-                     * Anything else disables caching and leaves it to qemu-dm.
-                     */
-                    if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
-                         ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-                    {
-                        if ( dp )
-                            put_page(dp);
-                        return 0;
-                    }
-                    ASSERT(!dp);
-                    stdvga_mem_write(data, tmp, p->size);
-                }
-                data += step;
-                addr += step;
-            }
-        }
-        else
-        {
-            for ( i = 0; i < p->count; i++ )
-            {
-                if ( hvm_copy_from_guest_phys(&tmp, data, p->size) !=
-                     HVMCOPY_okay )
-                {
-                    struct page_info *dp = get_page_from_gfn(
-                        d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
-                    if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
-                         ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-                    {
-                        if ( dp )
-                            put_page(dp);
-                        return 0;
-                    }
-                    ASSERT(!dp);
-                    tmp = stdvga_mem_read(data, p->size);
-                }
-                stdvga_mem_write(addr, tmp, p->size);
-                data += step;
-                addr += step;
-            }
-        }
-    }
-    else if ( p->dir == IOREQ_WRITE )
-    {
-        for ( i = 0; i < p->count; i++ )
-        {
-            stdvga_mem_write(addr, p->data, p->size);
-            addr += step;
-        }
-    }
-    else
-    {
-        ASSERT(p->count == 1);
-        p->data = stdvga_mem_read(addr, p->size);
-    }
+    if ( !hvm_buffered_io_send(&p) )
+        return X86EMUL_UNHANDLEABLE;
 
-    read_data = p->data;
-    return 1;
+    return X86EMUL_OKAY;
 }
 
-int stdvga_intercept_mmio(ioreq_t *p)
+static int stdvga_mem_check(struct vcpu *v, unsigned long addr,
+                            unsigned long length)
 {
-    struct domain *d = current->domain;
-    struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
-    uint64_t start, end, count = p->count, size = p->size;
-    int buf = 0, rc;
-
-    if ( p->df )
-    {
-        start = (p->addr - (count - 1) * size);
-        end = p->addr + size;
-    }
-    else
-    {
-        start = p->addr;
-        end = p->addr + count * size;
-    }
-
-    if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-        return X86EMUL_UNHANDLEABLE;
-
-    if ( p->size > 8 )
-    {
-        gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
-        return X86EMUL_UNHANDLEABLE;
-    }
+    struct hvm_hw_stdvga *s = &v->domain->arch.hvm_domain.stdvga;
+    int rc;
 
     spin_lock(&s->lock);
 
-    if ( s->stdvga && s->cache )
-    {
-        switch ( p->type )
-        {
-        case IOREQ_TYPE_COPY:
-            buf = mmio_move(s, p);
-            if ( !buf )
-                s->cache = 0;
-            break;
-        default:
-            gdprintk(XENLOG_WARNING, "unsupported mmio request type:%d "
-                     "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
-                     "isptr:%d dir:%d df:%d\n",
-                     p->type, (int)p->addr, (int)p->data, (int)p->size,
-                     (int)p->count, p->state,
-                     p->data_is_ptr, p->dir, p->df);
-            s->cache = 0;
-        }
-    }
-    else
-    {
-        buf = (p->dir == IOREQ_WRITE);
-    }
-
-    rc = (buf && hvm_buffered_io_send(p));
+    rc = s->stdvga && s->cache &&
+        (addr >= VGA_MEM_BASE) &&
+        ((addr + length) < (VGA_MEM_BASE + VGA_MEM_SIZE));
 
     spin_unlock(&s->lock);
 
-    return rc ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+    return rc;
 }
 
+static const struct hvm_mmio_ops stdvga_mem_ops = {
+    .check = stdvga_mem_check,
+    .read = stdvga_mem_read,
+    .write = stdvga_mem_write
+};
+
 void stdvga_init(struct domain *d)
 {
     struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
@@ -634,6 +529,8 @@ void stdvga_init(struct domain *d)
         register_portio_handler(d, 0x3c4, 2, stdvga_intercept_pio);
         /* Graphics registers. */
         register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
+        /* VGA memory */
+        register_mmio_handler(d, &stdvga_mem_ops);
     }
 }
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index afa27bf..00b4a89 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -122,7 +122,6 @@ struct hvm_hw_stdvga {
 };
 
 void stdvga_init(struct domain *d);
-int stdvga_intercept_mmio(ioreq_t *p);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 11/18] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry...
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (9 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 10/18] x86/hvm: unify stdvga mmio intercept with standard mmio intercept Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 12/18] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io() Paul Durrant
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

...and error handling"

NOTE: A straight reversion was not possible because of subsequent changes
      in the code, so this is at least partially a manual reversion.

By limiting hvmemul_do_io_addr() to reps falling within the pages on which
a reference has already been taken, we can guarantee that calls to
hvm_copy_to/from_guest_phys() will not hit the HVMCOPY_gfn_paged_out
or HVMCOPY_gfn_shared cases. Thus we can remove the retry logic from
the intercept code and simplify it significantly.

Normally hvmemul_do_io_addr() will only reference a single page at a time.
It will, however, take an extra page reference for I/O spanning a page
boundary.
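
As a worked example (illustrative numbers only): with size = 4, *reps = 8
and ram_gpa = 0x10ff8, the clamping loop below stops at count = 2 because
the third rep would end at 0x11003, i.e. on the next page; the remaining
reps are left for a later re-emulation pass, signalled via mmio_retry.
With ram_gpa = 0x10ffe even the first rep crosses the page boundary, so a
reference to the adjacent page is taken as well and a single rep is
issued.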

It is still important to know, upon returning from x86_emulate(), whether
the number of reps was reduced, so the mmio_retry flag is retained for that
purpose.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c     |   86 +++++++++++++++++++++++++++-------------
 xen/arch/x86/hvm/hvm.c         |    4 ++
 xen/arch/x86/hvm/intercept.c   |   52 +++++-------------------
 xen/include/asm-x86/hvm/vcpu.h |    2 +-
 4 files changed, 74 insertions(+), 70 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 7d6c0eb..e132d7c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -52,7 +52,7 @@ static void hvmtrace_io_assist(ioreq_t *p)
 }
 
 static int hvmemul_do_io(
-    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+    bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
@@ -61,6 +61,7 @@ static int hvmemul_do_io(
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
+        .count = reps,
         .dir = dir,
         .df = df,
         .data = data,
@@ -126,15 +127,6 @@ static int hvmemul_do_io(
         HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
 
-    /*
-     * When retrying a repeated string instruction, force exit to guest after
-     * completion of the retried iteration to allow handling of interrupts.
-     */
-    if ( vio->mmio_retrying )
-        *reps = 1;
-
-    p.count = *reps;
-
     if ( dir == IOREQ_WRITE )
     {
         if ( !data_is_addr )
@@ -148,17 +140,9 @@ static int hvmemul_do_io(
     switch ( rc )
     {
     case X86EMUL_OKAY:
-    case X86EMUL_RETRY:
-        *reps = p.count;
         p.state = STATE_IORESP_READY;
-        if ( !vio->mmio_retry )
-        {
-            hvm_io_assist(&p);
-            vio->io_state = HVMIO_none;
-        }
-        else
-            /* Defer hvm_io_assist() invocation to hvm_do_resume(). */
-            vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+        hvm_io_assist(&p);
+        vio->io_state = HVMIO_none;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -236,7 +220,7 @@ static int hvmemul_do_io_buffer(
 
     BUG_ON(buffer == NULL);
 
-    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
+    rc = hvmemul_do_io(is_mmio, addr, *reps, size, dir, df, 0,
                        (uintptr_t)buffer);
     if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
         memset(buffer, 0xff, size);
@@ -288,17 +272,66 @@ static int hvmemul_do_io_addr(
     bool_t is_mmio, paddr_t addr, unsigned long *reps,
     unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
 {
-    struct page_info *ram_page;
+    struct vcpu *v = current;
+    unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
+    struct page_info *ram_page[2];
+    int nr_pages = 0;
+    unsigned long count;
     int rc;
 
-    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+    rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
     if ( rc != X86EMUL_OKAY )
-        return rc;
+        goto out;
 
-    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+    nr_pages++;
+
+    /* Determine how many reps will fit within this page */
+    for ( count = 0; count < *reps; count++ )
+    {
+        paddr_t start, end;
+
+        if ( df )
+        {
+            start = ram_gpa - count * size;
+            end = ram_gpa + size - 1;
+        }
+        else
+        {
+            start = ram_gpa;
+            end = ram_gpa + (count + 1) * size - 1;
+        }
+
+        if ( paddr_to_pfn(start) != ram_gmfn ||
+             paddr_to_pfn(end) != ram_gmfn )
+            break;
+    }
+
+    if ( count == 0 )
+    {
+        /*
+         * This access must span two pages, so grab a reference to
+         * the next page and do a single rep.
+         */
+        rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+                                  &ram_page[nr_pages]);
+        if ( rc != X86EMUL_OKAY )
+            goto out;
+
+        nr_pages++;
+        count = 1;
+    }
+
+    rc = hvmemul_do_io(is_mmio, addr, count, size, dir, df, 1,
                        ram_gpa);
+    if ( rc == X86EMUL_OKAY )
+    {
+        v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
+        *reps = count;
+    }
 
-    hvmemul_release_page(ram_page);
+ out:
+    while ( --nr_pages >= 0 )
+        hvmemul_release_page(ram_page[nr_pages]);
 
     return rc;
 }
@@ -1561,7 +1594,6 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     }
 
     hvmemul_ctxt->exn_pending = 0;
-    vio->mmio_retrying = vio->mmio_retry;
     vio->mmio_retry = 0;
 
     if ( cpu_has_vmx )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f8486f4..626c431 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -440,6 +440,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
 
 void hvm_do_resume(struct vcpu *v)
 {
+    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s;
 
@@ -468,6 +469,9 @@ void hvm_do_resume(struct vcpu *v)
         }
     }
 
+    if ( vio->mmio_retry )
+        (void)handle_mmio();
+
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 625e585..02d7408 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -115,7 +115,6 @@ static const struct hvm_io_ops portio_ops = {
 static int hvm_process_io_intercept(struct hvm_io_handler *handler,
                                     ioreq_t *p)
 {
-    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
     const struct hvm_io_ops *ops = handler->ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
     uint64_t data;
@@ -125,23 +124,12 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
     {
         for ( i = 0; i < p->count; i++ )
         {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-            {
-                addr = (p->type == IOREQ_TYPE_COPY) ?
-                    p->addr + step * i :
-                    p->addr;
-                rc = ops->read(handler, addr, p->size, &data);
-                if ( rc != X86EMUL_OKAY )
-                    break;
-            }
+            addr = (p->type == IOREQ_TYPE_COPY) ?
+                p->addr + step * i :
+                p->addr;
+            rc = ops->read(handler, addr, p->size, &data);
+            if ( rc != X86EMUL_OKAY )
+                break;
 
             if ( p->data_is_ptr )
             {
@@ -150,14 +138,12 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
                 {
                 case HVMCOPY_okay:
                     break;
-                case HVMCOPY_gfn_paged_out:
-                case HVMCOPY_gfn_shared:
-                    rc = X86EMUL_RETRY;
-                    break;
                 case HVMCOPY_bad_gfn_to_mfn:
                     /* Drop the write as real hardware would. */
                     continue;
                 case HVMCOPY_bad_gva_to_gfn:
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
@@ -170,13 +156,6 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
             else
                 p->data = data;
         }
-
-        if ( rc == X86EMUL_RETRY )
-        {
-            vio->mmio_retry = 1;
-            vio->mmio_large_read_bytes = p->size;
-            memcpy(vio->mmio_large_read, &data, p->size);
-        }
     }
     else /* p->dir == IOREQ_WRITE */
     {
@@ -189,14 +168,12 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
                 {
                 case HVMCOPY_okay:
                     break;
-                case HVMCOPY_gfn_paged_out:
-                case HVMCOPY_gfn_shared:
-                    rc = X86EMUL_RETRY;
-                    break;
                 case HVMCOPY_bad_gfn_to_mfn:
                     data = ~0;
                     break;
                 case HVMCOPY_bad_gva_to_gfn:
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
@@ -216,15 +193,6 @@ static int hvm_process_io_intercept(struct hvm_io_handler *handler,
             if ( rc != X86EMUL_OKAY )
                 break;
         }
-
-        if ( rc == X86EMUL_RETRY )
-            vio->mmio_retry = 1;
-    }
-
-    if ( i != 0 )
-    {
-        p->count = i;
-        rc = X86EMUL_OKAY;
     }
 
     return rc;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index dd08416..97d78bd 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -74,7 +74,7 @@ struct hvm_vcpu_io {
      * For string instruction emulation we need to be able to signal a
      * necessary retry through other than function return codes.
      */
-    bool_t mmio_retry, mmio_retrying;
+    bool_t mmio_retry;
 
     unsigned long msix_unmask_address;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 12/18] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io()
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (10 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 11/18] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 13/18] x86/hvm: split I/O completion handling from state model Paul Durrant
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

By removing the calls in hvmemul_do_io() (where the call is replaced by a
single assignment) and in hvm_complete_assist_req() (which is itself
replaced by a call to hvm_process_io_intercept() with a suitable set of
ops), hvm_io_assist() can be moved into hvm.c and made static (and hence
become a candidate for inlining).

This patch also fixes the I/O state test at the end of hvm_io_assist()
to check the correct value. Since the ioreq server patch series was
integrated, the current ioreq state is no longer an indicator of in-flight
I/O state, since an I/O scheduled by re-emulation may be targeted at a
different ioreq server.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c   |   34 +++++++++++++++++---
 xen/arch/x86/hvm/hvm.c       |   70 +++++++++++++++++++++++-------------------
 xen/arch/x86/hvm/intercept.c |    4 +--
 xen/arch/x86/hvm/io.c        |   39 -----------------------
 xen/include/asm-x86/hvm/io.h |    3 +-
 5 files changed, 73 insertions(+), 77 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index e132d7c..eca90b5 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,6 +51,32 @@ static void hvmtrace_io_assist(ioreq_t *p)
     trace_var(event, 0/*!cycles*/, size, buffer);
 }
 
+static int null_read(struct hvm_io_handler *io_handler,
+                     uint64_t addr,
+                     uint64_t size,
+                     uint64_t *data)
+{
+    *data = ~0ul;
+    return X86EMUL_OKAY;
+}
+
+static int null_write(struct hvm_io_handler *handler,
+                      uint64_t addr,
+                      uint64_t size,
+                      uint64_t data)
+{
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops null_ops = {
+    .read = null_read,
+    .write = null_write
+};
+
+static struct hvm_io_handler null_handler = {
+    .ops = &null_ops
+};
+
 static int hvmemul_do_io(
     bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
@@ -140,8 +166,7 @@ static int hvmemul_do_io(
     switch ( rc )
     {
     case X86EMUL_OKAY:
-        p.state = STATE_IORESP_READY;
-        hvm_io_assist(&p);
+        vio->io_data = p.data;
         vio->io_state = HVMIO_none;
         break;
     case X86EMUL_UNHANDLEABLE:
@@ -152,8 +177,9 @@ static int hvmemul_do_io(
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            hvm_complete_assist_req(&p);
-            rc = X86EMUL_OKAY;
+            rc = hvm_process_io_intercept(&null_handler, &p);
+            if ( rc == X86EMUL_OKAY )
+                vio->io_data = p.data;
             vio->io_state = HVMIO_none;
         }
         else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 626c431..3365abb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -411,6 +411,45 @@ bool_t hvm_io_pending(struct vcpu *v)
     return 0;
 }
 
+static void hvm_io_assist(ioreq_t *p)
+{
+    struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+    enum hvm_io_state io_state;
+
+    p->state = STATE_IOREQ_NONE;
+
+    io_state = vio->io_state;
+    vio->io_state = HVMIO_none;
+
+    switch ( io_state )
+    {
+    case HVMIO_awaiting_completion:
+        vio->io_state = HVMIO_completed;
+        vio->io_data = p->data;
+        break;
+    case HVMIO_handle_mmio_awaiting_completion:
+        vio->io_state = HVMIO_completed;
+        vio->io_data = p->data;
+        (void)handle_mmio();
+        break;
+    case HVMIO_handle_pio_awaiting_completion:
+        if ( vio->io_size == 4 ) /* Needs zero extension. */
+            guest_cpu_user_regs()->rax = (uint32_t)p->data;
+        else
+            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
+        break;
+    default:
+        break;
+    }
+
+    if ( p->state == STATE_IOREQ_NONE )
+    {
+        msix_write_completion(curr);
+        vcpu_end_shutdown_deferral(curr);
+    }
+}
+
 static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
 {
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
@@ -2667,37 +2706,6 @@ int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
     return X86EMUL_UNHANDLEABLE;
 }
 
-void hvm_complete_assist_req(ioreq_t *p)
-{
-    switch ( p->type )
-    {
-    case IOREQ_TYPE_PCI_CONFIG:
-        ASSERT_UNREACHABLE();
-        break;
-    case IOREQ_TYPE_COPY:
-    case IOREQ_TYPE_PIO:
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( !p->data_is_ptr )
-                p->data = ~0ul;
-            else
-            {
-                int i, step = p->df ? -p->size : p->size;
-                uint32_t data = ~0;
-
-                for ( i = 0; i < p->count; i++ )
-                    hvm_copy_to_guest_phys(p->data + step * i, &data,
-                                           p->size);
-            }
-        }
-        /* FALLTHRU */
-    default:
-        p->state = STATE_IORESP_READY;
-        hvm_io_assist(p);
-        break;
-    }
-}
-
 void hvm_broadcast_assist_req(ioreq_t *p)
 {
     struct domain *d = current->domain;
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 02d7408..d6f1966 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -112,8 +112,8 @@ static const struct hvm_io_ops portio_ops = {
     .write = hvm_portio_write
 };
 
-static int hvm_process_io_intercept(struct hvm_io_handler *handler,
-                                    ioreq_t *p)
+int hvm_process_io_intercept(struct hvm_io_handler *handler,
+                             ioreq_t *p)
 {
     const struct hvm_io_ops *ops = handler->ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 51ef19a..61df6dd 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -169,45 +169,6 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
     return 1;
 }
 
-void hvm_io_assist(ioreq_t *p)
-{
-    struct vcpu *curr = current;
-    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    enum hvm_io_state io_state;
-
-    p->state = STATE_IOREQ_NONE;
-
-    io_state = vio->io_state;
-    vio->io_state = HVMIO_none;
-
-    switch ( io_state )
-    {
-    case HVMIO_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        break;
-    case HVMIO_handle_mmio_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        (void)handle_mmio();
-        break;
-    case HVMIO_handle_pio_awaiting_completion:
-        if ( vio->io_size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)p->data;
-        else
-            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-        break;
-    default:
-        break;
-    }
-
-    if ( p->state == STATE_IOREQ_NONE )
-    {
-        msix_write_completion(curr);
-        vcpu_end_shutdown_deferral(curr);
-    }
-}
-
 static bool_t dpci_portio_accept(struct hvm_io_handler *handler,
                                  uint64_t addr,
                                  uint64_t size)
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 00b4a89..f913c3b 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -82,6 +82,8 @@ struct hvm_io_ops {
     hvm_io_write_t  write;
 };
 
+int hvm_process_io_intercept(struct hvm_io_handler *handler,
+                             ioreq_t *p);
 int hvm_io_intercept(ioreq_t *p);
 
 struct hvm_io_handler *hvm_next_io_handler(struct domain *d);
@@ -103,7 +105,6 @@ int handle_mmio_with_translation(unsigned long gva, unsigned long gpfn,
                                  struct npfec);
 int handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_io_assist(ioreq_t *p);
 void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
                   const union vioapic_redir_entry *ent);
 void msix_write_completion(struct vcpu *);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 13/18] x86/hvm: split I/O completion handling from state model
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (11 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 12/18] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io() Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 14/18] x86/hvm: remove HVMIO_dispatched I/O state Paul Durrant
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The state of in-flight I/O and how its completion will be handled are
logically separate, and conflating the two makes the code unnecessarily
confusing.
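
Roughly, the old combined states map onto the new (io_state, io_completion)
pair as follows (summary only; see the hunks below for the exact logic):

    HVMIO_handle_mmio_awaiting_completion
        -> HVMIO_awaiting_completion + HVMIO_mmio_completion
    HVMIO_handle_pio_awaiting_completion
        -> HVMIO_awaiting_completion + HVMIO_pio_completion
    realmode re-emulation
        -> HVMIO_awaiting_completion + HVMIO_realmode_completion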

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c            |   50 ++++++++++++++++++++++++-------------
 xen/arch/x86/hvm/io.c             |    6 ++---
 xen/arch/x86/hvm/vmx/realmode.c   |   27 ++++++++++++++------
 xen/include/asm-x86/hvm/vcpu.h    |   16 ++++++++----
 xen/include/asm-x86/hvm/vmx/vmx.h |    1 +
 5 files changed, 68 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3365abb..39f40ad 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -59,6 +59,7 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/event.h>
+#include <asm/hvm/vmx/vmx.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <public/sched.h>
@@ -428,26 +429,12 @@ static void hvm_io_assist(ioreq_t *p)
         vio->io_state = HVMIO_completed;
         vio->io_data = p->data;
         break;
-    case HVMIO_handle_mmio_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        (void)handle_mmio();
-        break;
-    case HVMIO_handle_pio_awaiting_completion:
-        if ( vio->io_size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)p->data;
-        else
-            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-        break;
     default:
         break;
     }
 
-    if ( p->state == STATE_IOREQ_NONE )
-    {
-        msix_write_completion(curr);
-        vcpu_end_shutdown_deferral(curr);
-    }
+    msix_write_completion(curr);
+    vcpu_end_shutdown_deferral(curr);
 }
 
 static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
@@ -482,6 +469,7 @@ void hvm_do_resume(struct vcpu *v)
     struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s;
+    enum hvm_io_completion io_completion;
 
     check_wakeup_from_wait();
 
@@ -508,8 +496,36 @@ void hvm_do_resume(struct vcpu *v)
         }
     }
 
-    if ( vio->mmio_retry )
+    io_completion = vio->io_completion;
+    vio->io_completion = HVMIO_no_completion;
+
+    switch ( io_completion )
+    {
+    case HVMIO_no_completion:
+        break;
+    case HVMIO_mmio_completion:
         (void)handle_mmio();
+        break;
+    case HVMIO_pio_completion:
+        if ( vio->io_size == 4 ) /* Needs zero extension. */
+            guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+        else
+            memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
+        vio->io_state = HVMIO_none;
+        break;
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
+
+        hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
+
+        break;
+    }
+    default:
+        BUG();
+    }
 
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 61df6dd..27150e9 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -92,8 +92,8 @@ int handle_mmio(void)
 
     if ( rc != X86EMUL_RETRY )
         vio->io_state = HVMIO_none;
-    if ( vio->io_state == HVMIO_awaiting_completion )
-        vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+        vio->io_completion = HVMIO_mmio_completion;
     else
         vio->mmio_access = (struct npfec){};
 
@@ -158,7 +158,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
             return 0;
         /* Completion in hvm_io_assist() with no re-emulation required. */
         ASSERT(dir == IOREQ_READ);
-        vio->io_state = HVMIO_handle_pio_awaiting_completion;
+        vio->io_completion = HVMIO_pio_completion;
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index fe8b4a0..76ff9a5 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -101,15 +101,19 @@ static void realmode_deliver_exception(
     }
 }
 
-static void realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
+void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
     struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     int rc;
 
     perfc_incr(realmode_emulations);
 
     rc = hvm_emulate_one(hvmemul_ctxt);
 
+    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+        vio->io_completion = HVMIO_realmode_completion;
+
     if ( rc == X86EMUL_UNHANDLEABLE )
     {
         gdprintk(XENLOG_ERR, "Failed to emulate insn.\n");
@@ -177,9 +181,6 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
     hvm_emulate_prepare(&hvmemul_ctxt, regs);
 
-    if ( vio->io_state == HVMIO_completed )
-        realmode_emulate_one(&hvmemul_ctxt);
-
     /* Only deliver interrupts into emulated real mode. */
     if ( !(curr->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) &&
          (intr_info & INTR_INFO_VALID_MASK) )
@@ -190,8 +191,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
     curr->arch.hvm_vmx.vmx_emulate = 1;
     while ( curr->arch.hvm_vmx.vmx_emulate &&
-            !softirq_pending(smp_processor_id()) &&
-            (vio->io_state == HVMIO_none) )
+            !softirq_pending(smp_processor_id()) )
     {
         /*
          * Check for pending interrupts only every 16 instructions, because
@@ -203,7 +203,10 @@ void vmx_realmode(struct cpu_user_regs *regs)
              hvm_local_events_need_delivery(curr) )
             break;
 
-        realmode_emulate_one(&hvmemul_ctxt);
+        vmx_realmode_emulate_one(&hvmemul_ctxt);
+
+        if ( vio->io_state != HVMIO_none || vio->mmio_retry )
+            break;
 
         /* Stop emulating unless our segment state is not safe */
         if ( curr->arch.hvm_vmx.vmx_realmode )
@@ -245,3 +248,13 @@ void vmx_realmode(struct cpu_user_regs *regs)
     if ( intr_info & INTR_INFO_VALID_MASK )
         __vmwrite(VM_ENTRY_INTR_INFO, intr_info);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 97d78bd..c4d96a8 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -34,11 +34,16 @@ enum hvm_io_state {
     HVMIO_none = 0,
     HVMIO_dispatched,
     HVMIO_awaiting_completion,
-    HVMIO_handle_mmio_awaiting_completion,
-    HVMIO_handle_pio_awaiting_completion,
     HVMIO_completed
 };
 
+enum hvm_io_completion {
+    HVMIO_no_completion = 0,
+    HVMIO_mmio_completion,
+    HVMIO_pio_completion,
+    HVMIO_realmode_completion
+};
+
 struct hvm_vcpu_asid {
     uint64_t generation;
     uint32_t asid;
@@ -46,9 +51,10 @@ struct hvm_vcpu_asid {
 
 struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
-    enum hvm_io_state   io_state;
-    unsigned long       io_data;
-    int                 io_size;
+    enum hvm_io_state      io_state;
+    unsigned long          io_data;
+    int                    io_size;
+    enum hvm_io_completion io_completion;
 
     /*
      * HVM emulation:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 35f804a..c5f3d24 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -94,6 +94,7 @@ void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
 void noreturn vmx_do_resume(struct vcpu *);
 void vmx_vlapic_msr_changed(struct vcpu *v);
+void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);
 void vmx_realmode(struct cpu_user_regs *regs);
 void vmx_update_debug_state(struct vcpu *v);
 void vmx_update_exception_bitmap(struct vcpu *v);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 14/18] x86/hvm: remove HVMIO_dispatched I/O state
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (12 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 13/18] x86/hvm: split I/O completion handling from state model Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 15/18] x86/hvm: remove hvm_io_state enumeration Paul Durrant
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

By removing the HVMIO_dispatched state and making all pending emulations
(i.e. all those not handled by the hypervisor) use HVMIO_awaiting_completion,
various code-paths can be simplified.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c      |   12 +++---------
 xen/arch/x86/hvm/hvm.c          |   12 +++---------
 xen/arch/x86/hvm/io.c           |   12 ++++--------
 xen/arch/x86/hvm/vmx/realmode.c |    2 +-
 xen/include/asm-x86/hvm/vcpu.h  |    8 +++++++-
 5 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index eca90b5..7edd90c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -138,20 +138,14 @@ static int hvmemul_do_io(
         if ( data_is_addr || dir == IOREQ_WRITE )
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
-    case HVMIO_dispatched:
-        /* May have to wait for previous cycle of a multi-write to complete. */
-        if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
-             (addr == (vio->mmio_large_write_pa +
-                       vio->mmio_large_write_bytes)) )
-            return X86EMUL_RETRY;
-        /* fallthrough */
     default:
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
-        HVMIO_dispatched : HVMIO_awaiting_completion;
+    vio->io_state = HVMIO_awaiting_completion;
     vio->io_size = size;
+    vio->io_dir = dir;
+    vio->io_data_is_addr = data_is_addr;
 
     if ( dir == IOREQ_WRITE )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 39f40ad..4458fa4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -416,22 +416,16 @@ static void hvm_io_assist(ioreq_t *p)
 {
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    enum hvm_io_state io_state;
 
     p->state = STATE_IOREQ_NONE;
 
-    io_state = vio->io_state;
-    vio->io_state = HVMIO_none;
-
-    switch ( io_state )
+    if ( HVMIO_NEED_COMPLETION(vio) )
     {
-    case HVMIO_awaiting_completion:
         vio->io_state = HVMIO_completed;
         vio->io_data = p->data;
-        break;
-    default:
-        break;
     }
+    else
+        vio->io_state = HVMIO_none;
 
     msix_write_completion(curr);
     vcpu_end_shutdown_deferral(curr);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 27150e9..12fc923 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -90,9 +90,7 @@ int handle_mmio(void)
 
     rc = hvm_emulate_one(&ctxt);
 
-    if ( rc != X86EMUL_RETRY )
-        vio->io_state = HVMIO_none;
-    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+    if ( HVMIO_NEED_COMPLETION(vio) || vio->mmio_retry )
         vio->io_completion = HVMIO_mmio_completion;
     else
         vio->mmio_access = (struct npfec){};
@@ -142,6 +140,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
 
     rc = hvmemul_do_pio_buffer(port, size, dir, &data);
 
+    if ( HVMIO_NEED_COMPLETION(vio) )
+        vio->io_completion = HVMIO_pio_completion;
+
     switch ( rc )
     {
     case X86EMUL_OKAY:
@@ -154,11 +155,6 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
         }
         break;
     case X86EMUL_RETRY:
-        if ( vio->io_state != HVMIO_awaiting_completion )
-            return 0;
-        /* Completion in hvm_io_assist() with no re-emulation required. */
-        ASSERT(dir == IOREQ_READ);
-        vio->io_completion = HVMIO_pio_completion;
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 76ff9a5..5e56a1f 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -111,7 +111,7 @@ void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
 
     rc = hvm_emulate_one(hvmemul_ctxt);
 
-    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+    if ( HVMIO_NEED_COMPLETION(vio) || vio->mmio_retry )
         vio->io_completion = HVMIO_realmode_completion;
 
     if ( rc == X86EMUL_UNHANDLEABLE )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index c4d96a8..2830057 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -32,7 +32,6 @@
 
 enum hvm_io_state {
     HVMIO_none = 0,
-    HVMIO_dispatched,
     HVMIO_awaiting_completion,
     HVMIO_completed
 };
@@ -55,6 +54,13 @@ struct hvm_vcpu_io {
     unsigned long          io_data;
     int                    io_size;
     enum hvm_io_completion io_completion;
+    uint8_t                io_dir;
+    uint8_t                io_data_is_addr;
+
+#define HVMIO_NEED_COMPLETION(_vio) \
+    ( ((_vio)->io_state == HVMIO_awaiting_completion) &&    \
+      !(_vio)->io_data_is_addr && \
+      ((_vio)->io_dir == IOREQ_READ) )
 
     /*
      * HVM emulation:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 15/18] x86/hvm: remove hvm_io_state enumeration
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (13 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 14/18] x86/hvm: remove HVMIO_dispatched I/O state Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 16/18] x86/hvm: use ioreq_t to track in-flight state Paul Durrant
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

Emulation request status is already covered by STATE_IOREQ_XXX values, so
just use those. The mapping is:

HVMIO_none                -> STATE_IOREQ_NONE
HVMIO_awaiting_completion -> STATE_IOREQ_READY
HVMIO_completed           -> STATE_IORESP_READY

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c       |   14 +++++++-------
 xen/arch/x86/hvm/hvm.c           |    6 +++---
 xen/arch/x86/hvm/svm/nestedsvm.c |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c  |    4 ++--
 xen/include/asm-x86/hvm/vcpu.h   |   10 ++--------
 5 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 7edd90c..23582af 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -131,10 +131,10 @@ static int hvmemul_do_io(
 
     switch ( vio->io_state )
     {
-    case HVMIO_none:
+    case STATE_IOREQ_NONE:
         break;
-    case HVMIO_completed:
-        vio->io_state = HVMIO_none;
+    case STATE_IORESP_READY:
+        vio->io_state = STATE_IOREQ_NONE;
         if ( data_is_addr || dir == IOREQ_WRITE )
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
@@ -142,7 +142,7 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = HVMIO_awaiting_completion;
+    vio->io_state = STATE_IOREQ_READY;
     vio->io_size = size;
     vio->io_dir = dir;
     vio->io_data_is_addr = data_is_addr;
@@ -161,7 +161,7 @@ static int hvmemul_do_io(
     {
     case X86EMUL_OKAY:
         vio->io_data = p.data;
-        vio->io_state = HVMIO_none;
+        vio->io_state = STATE_IOREQ_NONE;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -174,13 +174,13 @@ static int hvmemul_do_io(
             rc = hvm_process_io_intercept(&null_handler, &p);
             if ( rc == X86EMUL_OKAY )
                 vio->io_data = p.data;
-            vio->io_state = HVMIO_none;
+            vio->io_state = STATE_IOREQ_NONE;
         }
         else
         {
             rc = hvm_send_assist_req(s, &p);
             if ( rc != X86EMUL_RETRY )
-                vio->io_state = HVMIO_none;
+                vio->io_state = STATE_IOREQ_NONE;
             else if ( data_is_addr || dir == IOREQ_WRITE )
                 rc = X86EMUL_OKAY;
         }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4458fa4..7411287 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p)
 
     if ( HVMIO_NEED_COMPLETION(vio) )
     {
-        vio->io_state = HVMIO_completed;
+        vio->io_state = STATE_IORESP_READY;
         vio->io_data = p->data;
     }
     else
-        vio->io_state = HVMIO_none;
+        vio->io_state = STATE_IOREQ_NONE;
 
     msix_write_completion(curr);
     vcpu_end_shutdown_deferral(curr);
@@ -505,7 +505,7 @@ void hvm_do_resume(struct vcpu *v)
             guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
         else
             memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
-        vio->io_state = HVMIO_none;
+        vio->io_state = STATE_IOREQ_NONE;
         break;
     case HVMIO_realmode_completion:
     {
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index be5797a..8b165c6 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1231,7 +1231,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
          * Delay the injection because this would result in delivering
          * an interrupt *within* the execution of an instruction.
          */
-        if ( v->arch.hvm_vcpu.hvm_io.io_state != HVMIO_none )
+        if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
             return hvm_intblk_shadow;
 
         if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 5e56a1f..4135ad4 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -205,7 +205,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
         vmx_realmode_emulate_one(&hvmemul_ctxt);
 
-        if ( vio->io_state != HVMIO_none || vio->mmio_retry )
+        if ( vio->io_state != STATE_IOREQ_NONE || vio->mmio_retry )
             break;
 
         /* Stop emulating unless our segment state is not safe */
@@ -219,7 +219,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     }
 
     /* Need to emulate next time if we've started an IO operation */
-    if ( vio->io_state != HVMIO_none )
+    if ( vio->io_state != STATE_IOREQ_NONE )
         curr->arch.hvm_vmx.vmx_emulate = 1;
 
     if ( !curr->arch.hvm_vmx.vmx_emulate && !curr->arch.hvm_vmx.vmx_realmode )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 2830057..f797518 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -30,12 +30,6 @@
 #include <asm/hvm/svm/nestedsvm.h>
 #include <asm/mtrr.h>
 
-enum hvm_io_state {
-    HVMIO_none = 0,
-    HVMIO_awaiting_completion,
-    HVMIO_completed
-};
-
 enum hvm_io_completion {
     HVMIO_no_completion = 0,
     HVMIO_mmio_completion,
@@ -50,7 +44,7 @@ struct hvm_vcpu_asid {
 
 struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
-    enum hvm_io_state      io_state;
+    uint8_t                io_state;
     unsigned long          io_data;
     int                    io_size;
     enum hvm_io_completion io_completion;
@@ -58,7 +52,7 @@ struct hvm_vcpu_io {
     uint8_t                io_data_is_addr;
 
 #define HVMIO_NEED_COMPLETION(_vio) \
-    ( ((_vio)->io_state == HVMIO_awaiting_completion) &&    \
+    ( ((_vio)->io_state == STATE_IOREQ_READY) &&    \
       !(_vio)->io_data_is_addr && \
       ((_vio)->io_dir == IOREQ_READ) )
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 16/18] x86/hvm: use ioreq_t to track in-flight state
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (14 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 15/18] x86/hvm: remove hvm_io_state enumeration Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 17/18] x86/hvm: always re-emulate I/O from a buffer Paul Durrant
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

Use an ioreq_t rather than open-coded state, size, dir and data fields
in struct hvm_vcpu_io. This also allows PIO completion to be handled
similarly to MMIO completion by re-issuing the handle_pio() call.
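
As a rough sketch of the intent (not verbatim code): when emulation is
re-entered with a response ready, the freshly supplied parameters are
cross-checked against the saved vio->io_req before the result is
consumed, e.g.

    if ( p.addr != addr || p.size != size || p.dir != dir /* ... */ )
        domain_crash(curr->domain);

so that a re-issue which no longer matches the original request (for
instance because the instruction under emulation changed) cannot
complete the wrong access.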

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c       |   35 +++++++++++++++++++++--------------
 xen/arch/x86/hvm/hvm.c           |   15 ++++++++-------
 xen/arch/x86/hvm/svm/nestedsvm.c |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c  |    4 ++--
 xen/include/asm-x86/hvm/vcpu.h   |   12 ++++--------
 5 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 23582af..57a93f2 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -92,6 +92,7 @@ static int hvmemul_do_io(
         .df = df,
         .data = data,
         .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
+        .state = STATE_IOREQ_READY,
     };
     void *p_data = (void *)data;
     int rc;
@@ -129,12 +130,24 @@ static int hvmemul_do_io(
         }
     }
 
-    switch ( vio->io_state )
+    switch ( vio->io_req.state )
     {
     case STATE_IOREQ_NONE:
         break;
     case STATE_IORESP_READY:
-        vio->io_state = STATE_IOREQ_NONE;
+        vio->io_req.state = STATE_IOREQ_NONE;
+        p = vio->io_req;
+
+        /* Verify the emulation request has been correctly re-issued */
+        if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||
+             (p.addr != addr) ||
+             (p.size != size) ||
+             (p.count != reps) ||
+             (p.dir != dir) ||
+             (p.df != df) ||
+             (p.data_is_ptr != data_is_addr) )
+            domain_crash(curr->domain);
+
         if ( data_is_addr || dir == IOREQ_WRITE )
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
@@ -142,11 +155,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = STATE_IOREQ_READY;
-    vio->io_size = size;
-    vio->io_dir = dir;
-    vio->io_data_is_addr = data_is_addr;
-
     if ( dir == IOREQ_WRITE )
     {
         if ( !data_is_addr )
@@ -155,13 +163,14 @@ static int hvmemul_do_io(
         hvmtrace_io_assist(&p);
     }
 
+    vio->io_req = p;
+
     rc = hvm_io_intercept(&p);
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
-        vio->io_data = p.data;
-        vio->io_state = STATE_IOREQ_NONE;
+        vio->io_req.state = STATE_IOREQ_NONE;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -172,15 +181,13 @@ static int hvmemul_do_io(
         if ( !s )
         {
             rc = hvm_process_io_intercept(&null_handler, &p);
-            if ( rc == X86EMUL_OKAY )
-                vio->io_data = p.data;
-            vio->io_state = STATE_IOREQ_NONE;
+            vio->io_req.state = STATE_IOREQ_NONE;
         }
         else
         {
             rc = hvm_send_assist_req(s, &p);
             if ( rc != X86EMUL_RETRY )
-                vio->io_state = STATE_IOREQ_NONE;
+                vio->io_req.state = STATE_IOREQ_NONE;
             else if ( data_is_addr || dir == IOREQ_WRITE )
                 rc = X86EMUL_OKAY;
         }
@@ -199,7 +206,7 @@ static int hvmemul_do_io(
         hvmtrace_io_assist(&p);
 
         if ( !data_is_addr )
-            memcpy(p_data, &vio->io_data, size);
+            memcpy(p_data, &p.data, size);
     }
 
     if ( is_mmio && !data_is_addr )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7411287..8abf29b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p)
 
     if ( HVMIO_NEED_COMPLETION(vio) )
     {
-        vio->io_state = STATE_IORESP_READY;
-        vio->io_data = p->data;
+        vio->io_req.state = STATE_IORESP_READY;
+        vio->io_req.data = p->data;
     }
     else
-        vio->io_state = STATE_IOREQ_NONE;
+        vio->io_req.state = STATE_IOREQ_NONE;
 
     msix_write_completion(curr);
     vcpu_end_shutdown_deferral(curr);
@@ -501,11 +501,12 @@ void hvm_do_resume(struct vcpu *v)
         (void)handle_mmio();
         break;
     case HVMIO_pio_completion:
-        if ( vio->io_size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+        if ( vio->io_req.size == 4 ) /* Needs zero extension. */
+            guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
         else
-            memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
-        vio->io_state = STATE_IOREQ_NONE;
+            memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
+                   vio->io_req.size);
+        vio->io_req.state = STATE_IOREQ_NONE;
         break;
     case HVMIO_realmode_completion:
     {
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 8b165c6..78667a2 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1231,7 +1231,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
          * Delay the injection because this would result in delivering
          * an interrupt *within* the execution of an instruction.
          */
-        if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
+        if ( v->arch.hvm_vcpu.hvm_io.io_req.state != STATE_IOREQ_NONE )
             return hvm_intblk_shadow;
 
         if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 4135ad4..0f3124d 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -205,7 +205,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
         vmx_realmode_emulate_one(&hvmemul_ctxt);
 
-        if ( vio->io_state != STATE_IOREQ_NONE || vio->mmio_retry )
+        if ( vio->io_req.state != STATE_IOREQ_NONE || vio->mmio_retry )
             break;
 
         /* Stop emulating unless our segment state is not safe */
@@ -219,7 +219,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     }
 
     /* Need to emulate next time if we've started an IO operation */
-    if ( vio->io_state != STATE_IOREQ_NONE )
+    if ( vio->io_req.state != STATE_IOREQ_NONE )
         curr->arch.hvm_vmx.vmx_emulate = 1;
 
     if ( !curr->arch.hvm_vmx.vmx_emulate && !curr->arch.hvm_vmx.vmx_realmode )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index f797518..7338638 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -44,17 +44,13 @@ struct hvm_vcpu_asid {
 
 struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
-    uint8_t                io_state;
-    unsigned long          io_data;
-    int                    io_size;
+    ioreq_t                io_req;
     enum hvm_io_completion io_completion;
-    uint8_t                io_dir;
-    uint8_t                io_data_is_addr;
 
 #define HVMIO_NEED_COMPLETION(_vio) \
-    ( ((_vio)->io_state == STATE_IOREQ_READY) &&    \
-      !(_vio)->io_data_is_addr && \
-      ((_vio)->io_dir == IOREQ_READ) )
+    ( ((_vio)->io_req.state == STATE_IOREQ_READY) &&    \
+      !(_vio)->io_req.data_is_ptr && \
+      ((_vio)->io_req.dir == IOREQ_READ) )
 
     /*
      * HVM emulation:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 17/18] x86/hvm: always re-emulate I/O from a buffer
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (15 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 16/18] x86/hvm: use ioreq_t to track in-flight state Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 10:39 ` [PATCH v3 18/18] x86/hvm: track large memory mapped accesses by buffer offset Paul Durrant
  2015-06-23 14:47 ` [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Fabio Fantoni
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

If memory mapped I/O is 'chunked' then the I/O must be re-emulated;
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.
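
As a simplified sketch of the failure mode (issue_chunk() here is an
illustrative stand-in for hvmemul_do_mmio_buffer(), not the real interface;
bytes is assumed to be a multiple of chunk for brevity):

int emulate_buffered_mmio(paddr_t gpa, uint8_t *buf, unsigned int bytes,
                          unsigned int chunk, uint8_t dir)
{
    unsigned int done;

    for ( done = 0; done < bytes; done += chunk )
    {
        int rc = issue_chunk(gpa + done, buf + done, chunk, dir);

        if ( rc == X86EMUL_RETRY )
            /*
             * This chunk went out to the device model. The instruction must
             * be re-emulated so that the remaining chunks are issued too,
             * and that is true for writes just as much as for reads.
             */
            return X86EMUL_RETRY;
        if ( rc != X86EMUL_OKAY )
            return rc;
    }

    return X86EMUL_OKAY;
}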

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c     |    4 ++--
 xen/arch/x86/hvm/hvm.c         |   13 ++++++++-----
 xen/include/asm-x86/hvm/vcpu.h |    3 +--
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 57a93f2..d953cdb 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -148,7 +148,7 @@ static int hvmemul_do_io(
              (p.data_is_ptr != data_is_addr) )
             domain_crash(curr->domain);
 
-        if ( data_is_addr || dir == IOREQ_WRITE )
+        if ( data_is_addr )
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
     default:
@@ -188,7 +188,7 @@ static int hvmemul_do_io(
             rc = hvm_send_assist_req(s, &p);
             if ( rc != X86EMUL_RETRY )
                 vio->io_req.state = STATE_IOREQ_NONE;
-            else if ( data_is_addr || dir == IOREQ_WRITE )
+            else if ( data_is_addr )
                 rc = X86EMUL_OKAY;
         }
         break;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8abf29b..c062c9f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -501,11 +501,14 @@ void hvm_do_resume(struct vcpu *v)
         (void)handle_mmio();
         break;
     case HVMIO_pio_completion:
-        if ( vio->io_req.size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
-        else
-            memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
-                   vio->io_req.size);
+        if ( vio->io_req.dir == IOREQ_READ )
+        {
+            if ( vio->io_req.size == 4 ) /* Needs zero extension. */
+                guest_cpu_user_regs()->rax = (uint32_t)vio->io_req.data;
+            else
+                memcpy(&guest_cpu_user_regs()->rax, &vio->io_req.data,
+                       vio->io_req.size);
+        }
         vio->io_req.state = STATE_IOREQ_NONE;
         break;
     case HVMIO_realmode_completion:
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 7338638..008c8fa 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -49,8 +49,7 @@ struct hvm_vcpu_io {
 
 #define HVMIO_NEED_COMPLETION(_vio) \
     ( ((_vio)->io_req.state == STATE_IOREQ_READY) &&    \
-      !(_vio)->io_req.data_is_ptr && \
-      ((_vio)->io_req.dir == IOREQ_READ) )
+      !(_vio)->io_req.data_is_ptr )
 
     /*
      * HVM emulation:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 18/18] x86/hvm: track large memory mapped accesses by buffer offset
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (16 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 17/18] x86/hvm: always re-emulate I/O from a buffer Paul Durrant
@ 2015-06-23 10:39 ` Paul Durrant
  2015-06-23 14:47 ` [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Fabio Fantoni
  18 siblings, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 10:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The code in hvmemul_do_io() that tracks large reads or writes, to avoid
re-issue of component I/O, is defeated by accesses across a page boundary
because it uses the physical address. The code is also only relevant to
memory mapped I/O to or from a buffer.

This patch re-factors the code and moves it into hvmemul_phys_mmio_access(),
where it is relevant, and tracks progress by buffer offset rather than by
address.
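
A minimal sketch of the new tracking scheme (the struct mirrors the
mmio_cache field added below; chunk_done() is purely illustrative):

struct {
    unsigned long size;   /* buffer bytes already transferred */
    uint8_t buffer[32];   /* the data belonging to those bytes */
} mmio_cache[2];          /* indexed by IOREQ_READ / IOREQ_WRITE */

bool chunk_done(unsigned int dir, unsigned int off, unsigned int chunk)
{
    /* Offsets are relative to the instruction's buffer, so the test keeps
     * working even when consecutive chunks land on different pages. */
    return off + chunk <= mmio_cache[dir].size;
}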

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c     |   89 +++++++++++++++-------------------------
 xen/include/asm-x86/hvm/vcpu.h |   16 ++++----
 2 files changed, 43 insertions(+), 62 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index d953cdb..d132321 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -107,29 +107,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-                return X86EMUL_OKAY;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-            {
-                memcpy(p_data, &vio->mmio_large_read[addr - pa],
-                       size);
-                return X86EMUL_OKAY;
-            }
-        }
-    }
-
     switch ( vio->io_req.state )
     {
     case STATE_IOREQ_NONE:
@@ -209,33 +186,6 @@ static int hvmemul_do_io(
             memcpy(p_data, &p.data, size);
     }
 
-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_write_pa = addr;
-            if ( addr == (pa + bytes) )
-                vio->mmio_large_write_bytes += size;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_read_pa = addr;
-            if ( (addr == (pa + bytes)) &&
-                 ((bytes + size) <= sizeof(vio->mmio_large_read)) )
-            {
-                memcpy(&vio->mmio_large_read[bytes], p_data, size);
-                vio->mmio_large_read_bytes += size;
-            }
-        }
-    }
-
     return X86EMUL_OKAY;
 }
 
@@ -604,6 +554,8 @@ static int hvmemul_phys_mmio_access(
     paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
     unsigned int *off)
 {
+    struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     unsigned long one_rep = 1;
     unsigned int chunk;
     int rc = 0;
@@ -623,10 +575,37 @@ static int hvmemul_phys_mmio_access(
 
     while ( size != 0 )
     {
-        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
-                                    &buffer[*off]);
-        if ( rc != X86EMUL_OKAY )
-            break;
+        /* Have we already done this chunk? */
+        if ( (*off + chunk) <= vio->mmio_cache[dir].size )
+        {
+            ASSERT(*off + chunk <= vio->mmio_cache[dir].size);
+
+            if ( dir == IOREQ_READ )
+                memcpy(&buffer[*off],
+                       &vio->mmio_cache[IOREQ_READ].buffer[*off],
+                       chunk);
+            else
+            {
+                if ( memcmp(&buffer[*off],
+                            &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
+                            chunk) != 0 )
+                    domain_crash(curr->domain);
+            }
+        }
+        else
+        {
+            ASSERT(*off == vio->mmio_cache[dir].size);
+
+            rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+                                        &buffer[*off]);
+            if ( rc != X86EMUL_OKAY )
+                break;
+
+            /* Note that we have now done this chunk */
+            memcpy(&vio->mmio_cache[dir].buffer[*off],
+                   &buffer[*off], chunk);
+            vio->mmio_cache[dir].size += chunk;
+        }
 
         /* Advance to the next chunk */
         gpa += chunk;
@@ -1636,7 +1615,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
         rc = X86EMUL_RETRY;
     if ( rc != X86EMUL_RETRY )
     {
-        vio->mmio_large_read_bytes = vio->mmio_large_write_bytes = 0;
+        memset(&vio->mmio_cache, 0, sizeof(vio->mmio_cache));
         vio->mmio_insn_bytes = 0;
     }
     else
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 008c8fa..4f41c83 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -61,13 +61,15 @@ struct hvm_vcpu_io {
     unsigned long       mmio_gva;
     unsigned long       mmio_gpfn;
 
-    /* We may read up to m256 as a number of device-model transactions. */
-    paddr_t mmio_large_read_pa;
-    uint8_t mmio_large_read[32];
-    unsigned int mmio_large_read_bytes;
-    /* We may write up to m256 as a number of device-model transactions. */
-    unsigned int mmio_large_write_bytes;
-    paddr_t mmio_large_write_pa;
+    /*
+     * We may read or write up to m256 as a number of device-model
+     * transactions.
+     */
+    struct {
+        unsigned long size;
+        uint8_t buffer[32];
+    } mmio_cache[2]; /* Indexed by ioreq type */
+
     /* For retries we shouldn't re-fetch the instruction. */
     unsigned int mmio_insn_bytes;
     unsigned char mmio_insn[16];
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix
  2015-06-23 10:39 [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (17 preceding siblings ...)
  2015-06-23 10:39 ` [PATCH v3 18/18] x86/hvm: track large memory mapped accesses by buffer offset Paul Durrant
@ 2015-06-23 14:47 ` Fabio Fantoni
  18 siblings, 0 replies; 29+ messages in thread
From: Fabio Fantoni @ 2015-06-23 14:47 UTC (permalink / raw)
  To: Paul Durrant, xen-devel

On 23/06/2015 12:39, Paul Durrant wrote:
> This patch series re-works much of the code involved in emulation of port
> and memory mapped I/O for HVM guests.
>
> The code has become very convoluted and, at least by inspection, certain
> emulations will apparently malfunction.
>
> The series is broken down into 18 patches (which are also available in
> my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
> on the emulation25 branch) as follows:
>
> 0001-x86-hvm-simplify-hvmemul_do_io.patch
> 0002-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
> 0003-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
> 0004-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
> 0005-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
> 0006-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
> 0007-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
> 0008-x86-hvm-add-length-to-mmio-check-op.patch
> 0009-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
> 0010-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
> 0011-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
> 0012-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
> 0013-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
> 0014-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
> 0015-x86-hvm-remove-hvm_io_state-enumeration.patch
> 0016-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
> 0017-x86-hvm-always-re-emulate-I-O-from-a-buffer.patch
> 0018-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch
>
> v2:
>   - Removed bogus assertion from patch 16
>   - Re-worked patch 17 after basic testing of back-port onto XenServer
>
> v3:
>   - Addressed comments from Jan
>   - Re-ordered series to bring a couple of more trivial patches to the
>     front
>   - Backport to XenServer (4.5) now passing automated tests
>   - Tested on unstable with QEMU upstream and trad, with and without
>     HAP (to force shadow emulation)
>
>

Thanks, I tested this series from the emulation25 branch of your git.
I have not found any regressions so far. However, with qxl Linux domUs that
previously caused qemu to crash on SSE2 operations, qemu no longer crashes,
but the screen shows nothing and the domU's qemu process sits at 100% CPU,
and I am unable to find any details about the problem in dom0; it seems
similar to the behaviour of xen-unstable a year ago.
Can someone tell me how to check whether SSE2 instructions are now fully
working (including operations on the videoram), and/or how to debug this
problem with gdb when nothing actually crashes?

Thanks for any reply, and sorry for my bad English.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-23 10:39 ` [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page Paul Durrant
@ 2015-06-23 15:52   ` Jan Beulich
  2015-06-23 16:13     ` Paul Durrant
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2015-06-23 15:52 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
> ...otherwise they will simply carry on to the next page using a normal
> linear-to-phys translation.

And what's wrong about this?

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -586,7 +586,6 @@ static int __hvmemul_read(
>                                          p_data);
>              if ( rc != X86EMUL_OKAY || bytes == chunk )
>                  return rc;
> -            addr += chunk;
>              off += chunk;
>              gpa += chunk;
>              p_data += chunk;
> @@ -594,6 +593,8 @@ static int __hvmemul_read(
>              if ( bytes < chunk )
>                  chunk = bytes;
>          }
> +
> +        return X86EMUL_UNHANDLEABLE;
>      }

All the loop above does is leverage that we don't need to do a
translation, as we already know the physical address. Continuing
if the access crosses a page boundary is not wrong at all afaict.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-23 15:52   ` Jan Beulich
@ 2015-06-23 16:13     ` Paul Durrant
  2015-06-23 16:21       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 16:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 23 June 2015 16:53
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or
> writes fall within a page
> 
> >>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
> > ...otherwise they will simply carry on to the next page using a normal
> > linear-to-phys translation.
> 
> And what's wrong about this?
> 

Is it the right thing to do? It seems wrong.

> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -586,7 +586,6 @@ static int __hvmemul_read(
> >                                          p_data);
> >              if ( rc != X86EMUL_OKAY || bytes == chunk )
> >                  return rc;
> > -            addr += chunk;
> >              off += chunk;
> >              gpa += chunk;
> >              p_data += chunk;
> > @@ -594,6 +593,8 @@ static int __hvmemul_read(
> >              if ( bytes < chunk )
> >                  chunk = bytes;
> >          }
> > +
> > +        return X86EMUL_UNHANDLEABLE;
> >      }
> 
> All the loop above does is leverage that we don't need to do a
> translation, as we already know the physical address. Continuing
> if the access crosses a page boundary is not wrong at all afaict.
> 

In what circumstances would you expect this to happen. My cscope showed up handle_mmio_with_translation() being called by the shadow code and nested page fault code.
The nested code does not look like it expects the I/O to cross a page boundary. Particularly it appears to do checks on the gpa without considering the possibility that the I/O may spill on to a subsequent page. Similarly the shadow code appears to do a va to gpa lookup assuming again that the translation is valid for the whole I/O.

I'm willing to admit that I only have a very basic knowledge of the code in this area but, on the face of it, just walking onto the next linear address after translation does not seem like it's the right thing to do.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-23 16:13     ` Paul Durrant
@ 2015-06-23 16:21       ` Jan Beulich
  2015-06-23 16:32         ` Paul Durrant
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2015-06-23 16:21 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel

>>> On 23.06.15 at 18:13, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 23 June 2015 16:53
>> To: Paul Durrant
>> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
>> Subject: Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or
>> writes fall within a page
>> 
>> >>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
>> > ...otherwise they will simply carry on to the next page using a normal
>> > linear-to-phys translation.
>> 
>> And what's wrong about this?
> 
> Is it the right thing to do? It seems wrong.

But why? Which way we obtain a linear-to-phys translation doesn't
matter.

>> > --- a/xen/arch/x86/hvm/emulate.c
>> > +++ b/xen/arch/x86/hvm/emulate.c
>> > @@ -586,7 +586,6 @@ static int __hvmemul_read(
>> >                                          p_data);
>> >              if ( rc != X86EMUL_OKAY || bytes == chunk )
>> >                  return rc;
>> > -            addr += chunk;
>> >              off += chunk;
>> >              gpa += chunk;
>> >              p_data += chunk;
>> > @@ -594,6 +593,8 @@ static int __hvmemul_read(
>> >              if ( bytes < chunk )
>> >                  chunk = bytes;
>> >          }
>> > +
>> > +        return X86EMUL_UNHANDLEABLE;
>> >      }
>> 
>> All the loop above does is leverage that we don't need to do a
>> translation, as we already know the physical address. Continuing
>> if the access crosses a page boundary is not wrong at all afaict.
>> 
> 
> In what circumstances would you expect this to happen. My cscope showed up 
> handle_mmio_with_translation() being called by the shadow code and nested 
> page fault code.
> The nested code does not look like it expects the I/O to cross a page 
> boundary. Particularly it appears to do checks on the gpa without considering 
> the possibility that the I/O may spill on to a subsequent page. Similarly the 
> shadow code appears to do a va to gpa lookup assuming again that the 
> translation is valid for the whole I/O.

Taking specific examples of current callers is not ideal. The code
should cope with any (valid) caller, i.e. with any memory operand
a valid instruction can have. And indeed I'd expect the calls from
hvm_hap_nested_page_fault() to handle_mmio() to also have
the potential to reach here (for arbitrary instructions accessing
MMIO memory).

> I'm willing to admit that I only have a very basic knowledge of the code in 
> this area but, on the face of it, just walking onto the next linear address 
> after translation does not seem like it's the right thing to do.

That's how instructions work - they simply do another page
translation if a memory access crosses a page boundary.
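
A rough sketch of that behaviour (translate() and access() stand in for the
linear-to-phys lookup and the actual memory access):

void split_access(unsigned long addr, unsigned int bytes)
{
    unsigned int first = PAGE_SIZE - (addr & (PAGE_SIZE - 1));

    if ( first > bytes )
        first = bytes;
    access(translate(addr), first);                     /* first page */
    if ( bytes > first )
        access(translate(addr + first), bytes - first); /* fresh translation */
}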

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-23 16:21       ` Jan Beulich
@ 2015-06-23 16:32         ` Paul Durrant
  2015-06-24  7:38           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2015-06-23 16:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 23 June 2015 17:22
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: RE: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or
> writes fall within a page
> 
> >>> On 23.06.15 at 18:13, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 23 June 2015 16:53
> >> To: Paul Durrant
> >> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> >> Subject: Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO
> reads or
> >> writes fall within a page
> >>
> >> >>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
> >> > ...otherwise they will simply carry on to the next page using a normal
> >> > linear-to-phys translation.
> >>
> >> And what's wrong about this?
> >
> > Is it the right thing to do? It seems wrong.
> 
> But why? Which way we obtain a linear-to-phys translation doesn't
> matter.
> 
> >> > --- a/xen/arch/x86/hvm/emulate.c
> >> > +++ b/xen/arch/x86/hvm/emulate.c
> >> > @@ -586,7 +586,6 @@ static int __hvmemul_read(
> >> >                                          p_data);
> >> >              if ( rc != X86EMUL_OKAY || bytes == chunk )
> >> >                  return rc;
> >> > -            addr += chunk;
> >> >              off += chunk;
> >> >              gpa += chunk;
> >> >              p_data += chunk;
> >> > @@ -594,6 +593,8 @@ static int __hvmemul_read(
> >> >              if ( bytes < chunk )
> >> >                  chunk = bytes;
> >> >          }
> >> > +
> >> > +        return X86EMUL_UNHANDLEABLE;
> >> >      }
> >>
> >> All the loop above does is leverage that we don't need to do a
> >> translation, as we already know the physical address. Continuing
> >> if the access crosses a page boundary is not wrong at all afaict.
> >>
> >
> > In what circumstances would you expect this to happen. My cscope
> showed up
> > handle_mmio_with_translation() being called by the shadow code and
> nested
> > page fault code.
> > The nested code does not look like it expects the I/O to cross a page
> > boundary. Particularly it appears to do checks on the gpa without
> considering
> > the possibility that the I/O may spill on to a subsequent page. Similarly the
> > shadow code appears to do a va to gpa lookup assuming again that the
> > translation is valid for the whole I/O.
> 
> Taking specific examples of current callers is not ideal. The code
> should cope with any (valid) caller, i.e. with any memory operand
> a valid instruction can have. And indeed I'd expect the calls from
> hvm_hap_nested_page_fault() to handle_mmio() to also have
> the potential to reach here (for arbitrary instructions accessing
> MMIO memory).
> 
> > I'm willing to admit that I only have a very basic knowledge of the code in
> > this area but, on the face of it, just walking onto the next linear address
> > after translation does not seem like it's the right thing to do.
> 
> That's how instructions work - they simply do another page
> translation if a memory access crosses a page boundary.
> 

Ok. If you believe it's the right thing to do, I'm happy to drop this patch out of the series. I'll send v4 tomorrow.

  Paul

> Jan
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-23 16:32         ` Paul Durrant
@ 2015-06-24  7:38           ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2015-06-24  7:38 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel

>>> On 23.06.15 at 18:32, <Paul.Durrant@citrix.com> wrote:
> Ok. If you believe it's the right thing to do, I'm happy to drop this patch 
> out of the series. I'll send v4 tomorrow.

Perhaps worth waiting a little for further review comments (fwiw I
didn't get to look at 5 and onwards so far)? But of course if you
don't mind doing an extra round...

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops
  2015-06-23 10:39 ` [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops Paul Durrant
@ 2015-06-24  9:38   ` Jan Beulich
  2015-06-24 10:10     ` Paul Durrant
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2015-06-24  9:38 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -540,6 +540,115 @@ static int hvmemul_virtual_to_linear(
>      return X86EMUL_EXCEPTION;
>  }
>  
> +static int hvmemul_phys_mmio_access(
> +    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
> +    unsigned int *off)

Why this (buffer, off) pair? The caller can easily adjust buffer as
necessary, avoiding the other parameter altogether. And buffer
itself can be void * just like it is in some of the callers (and the
others should follow suit).

> +{
> +    unsigned long one_rep = 1;
> +    unsigned int chunk;
> +    int rc = 0;
> +
> +    /* Accesses must fall within a page */
> +    if ( (gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE )
> +        return X86EMUL_UNHANDLEABLE;

As for patch 4 - this imposes a restriction that real hardware doesn't
have, and hence this needs to be replaced by adjusting the one caller
not currently guaranteeing this such that it caps the size.
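
A sketch of what that capping might look like in the caller (illustrative
only, reusing the names from the hunk above):

/* Never let a single call cross the page that gpa lives in... */
unsigned int chunk = PAGE_SIZE - (gpa & (PAGE_SIZE - 1));

if ( chunk > size )
    chunk = size;
rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer, &off);
/* ...and translate/issue any remainder separately for the next page. */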

> +    /*
> +     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
> +     * accesses larger than sizeof(long), so choose the highest power
> +     * of 2 not exceeding sizeof(long) as the 'chunk' size.
> +     */
> +    chunk = 1 << (fls(size) - 1);
> +    if ( chunk > sizeof (long) )
> +        chunk = sizeof (long);

I suppose you intentionally generalize this; if so this should be
mentioned in the commit message. This is particularly because it
results in changed behavior (which isn't to say that I'm sure the
previous way was any better in the sense of being closer to what
real hardware does): Right now, an 8 byte access at the last
byte of a page would get carried out as 8 1-byte accesses. Your
change makes it a 1-, 4-, 2-, and 1-byte access in that order.

Also, considering instruction characteristics (as explained in the
original comment) I think the old way of determining the chunk
size may have been cheaper than yours using fls().
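
For reference, both ways of picking the chunk size reduce to "largest power
of two not exceeding the size, capped at sizeof(long)", roughly:

/* Old: strip low set bits until a power of two remains. */
chunk = min(bytes, 1U << LONG_BYTEORDER);
while ( chunk & (chunk - 1) )
    chunk &= chunk - 1;

/* New: take the highest set bit via fls() (BSR on x86), then cap. */
chunk = 1 << (fls(size) - 1);
if ( chunk > sizeof(long) )
    chunk = sizeof(long);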

> +
> +    while ( size != 0 )
> +    {
> +        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> +                                    &buffer[*off]);
> +        if ( rc != X86EMUL_OKAY )
> +            break;
> +
> +        /* Advance to the next chunk */
> +        gpa += chunk;
> +        *off += chunk;
> +        size -= chunk;
> +
> +        /*
> +         * If the chunk now exceeds the remaining size, choose the next
> +         * lowest power of 2 that will fit.
> +         */
> +        while ( chunk > size )
> +            chunk >>= 1;

Please avoid this loop when size == 0. Since the function won't be
called with size being zero, I think the loop should be a for ( ; ; )
one with the loop exit condition put in the middle.
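
One reading of that suggestion, using the code from the hunk above:

for ( ;; )
{
    rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
                                &buffer[*off]);
    if ( rc != X86EMUL_OKAY )
        break;

    /* Advance to the next chunk */
    gpa += chunk;
    *off += chunk;
    size -= chunk;

    if ( size == 0 )
        break;

    /* Shrink to the next power of 2 that still fits the remainder. */
    while ( chunk > size )
        chunk >>= 1;
}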

> @@ -549,52 +658,26 @@ static int __hvmemul_read(
>      struct hvm_emulate_ctxt *hvmemul_ctxt)
>  {
>      struct vcpu *curr = current;
> -    unsigned long addr, reps = 1;
> -    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
> +    unsigned long addr, one_rep = 1;
>      uint32_t pfec = PFEC_page_present;
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> -    paddr_t gpa;
>      int rc;
>  
>      rc = hvmemul_virtual_to_linear(
> -        seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
> +        seg, offset, bytes, &one_rep, access_type, hvmemul_ctxt, &addr);
>      if ( rc != X86EMUL_OKAY )
>          return rc;
> -    off = addr & (PAGE_SIZE - 1);
> -    /*
> -     * We only need to handle sizes actual instruction operands can have. All
> -     * such sizes are either powers of 2 or the sum of two powers of 2. Thus
> -     * picking as initial chunk size the largest power of 2 not greater than
> -     * the total size will always result in only power-of-2 size requests
> -     * issued to hvmemul_do_mmio() (hvmemul_do_io() rejects non-powers-of-2).
> -     */
> -    while ( chunk & (chunk - 1) )
> -        chunk &= chunk - 1;
> -    if ( off + bytes > PAGE_SIZE )
> -        while ( off & (chunk - 1) )
> -            chunk >>= 1;
>  
>      if ( ((access_type != hvm_access_insn_fetch
>             ? vio->mmio_access.read_access
>             : vio->mmio_access.insn_fetch)) &&
>           (vio->mmio_gva == (addr & PAGE_MASK)) )
>      {
> -        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> -        while ( (off + chunk) <= PAGE_SIZE )
> -        {
> -            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
> -                                        p_data);
> -            if ( rc != X86EMUL_OKAY || bytes == chunk )
> -                return rc;
> -            off += chunk;
> -            gpa += chunk;
> -            p_data += chunk;
> -            bytes -= chunk;
> -            if ( bytes < chunk )
> -                chunk = bytes;
> -        }
> +        unsigned int off = 0;
> +        paddr_t gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |

If you already touch this, pfn_to_paddr() please.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops
  2015-06-24  9:38   ` Jan Beulich
@ 2015-06-24 10:10     ` Paul Durrant
  2015-06-24 10:27       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2015-06-24 10:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 24 June 2015 10:38
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v3 05/18] x86/hvm: remove multiple open coded
> 'chunking' loops
> 
> >>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -540,6 +540,115 @@ static int hvmemul_virtual_to_linear(
> >      return X86EMUL_EXCEPTION;
> >  }
> >
> > +static int hvmemul_phys_mmio_access(
> > +    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
> > +    unsigned int *off)
> 
> Why this (buffer, off) pair? The caller can easily adjust buffer as
> necessary, avoiding the other parameter altogether. And buffer
> itself can be void * just like it is in some of the callers (and the
> others should follow suit).
> 

It actually becomes necessary in a later patch, but I'll make the change there instead. As for incrementing a void *, I know that MSVC disallows this. I believe it is a gcc-ism which I guess clang must tolerate, but I don't think it is standard C.

> > +{
> > +    unsigned long one_rep = 1;
> > +    unsigned int chunk;
> > +    int rc = 0;
> > +
> > +    /* Accesses must fall within a page */
> > +    if ( (gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE )
> > +        return X86EMUL_UNHANDLEABLE;
> 
> As for patch 4 - this imposes a restriction that real hardware doesn't
> have, and hence this needs to be replaced by adjusting the one caller
> not currently guaranteeing this such that it caps the size.
> 

Ok.

> > +    /*
> > +     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
> > +     * accesses larger than sizeof(long), so choose the highest power
> > +     * of 2 not exceeding sizeof(long) as the 'chunk' size.
> > +     */
> > +    chunk = 1 << (fls(size) - 1);
> > +    if ( chunk > sizeof (long) )
> > +        chunk = sizeof (long);
> 
> I suppose you intentionally generalize this; if so this should be
> mentioned in the commit message. This is particularly because it
> results in changed behavior (which isn't to say that I'm sure the
> previous way was any better in the sense of being closer to what
> real hardware does): Right now, an 8 byte access at the last
> byte of a page would get carried out as 8 1-byte accesses. Your
> change makes it a 1-, 4-, 2-, and 1-byte access in that order.
> 

...which is certainly going to be quicker since it's only 4 round-trips to an emulator rather than 8.

> Also, considering instruction characteristics (as explained in the
> original comment) I think the old way of determining the chunk
> size may have been cheaper than yours using fls().
> 

I thought fls() was generally implemented using inline assembler and was pretty fast. I didn't actually check how Xen implements it; I just assumed it would be optimal.

> > +
> > +    while ( size != 0 )
> > +    {
> > +        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> > +                                    &buffer[*off]);
> > +        if ( rc != X86EMUL_OKAY )
> > +            break;
> > +
> > +        /* Advance to the next chunk */
> > +        gpa += chunk;
> > +        *off += chunk;
> > +        size -= chunk;
> > +
> > +        /*
> > +         * If the chunk now exceeds the remaining size, choose the next
> > +         * lowest power of 2 that will fit.
> > +         */
> > +        while ( chunk > size )
> > +            chunk >>= 1;
> 
> Please avoid this loop when size == 0. Since the function won't be
> called with size being zero, I think the loop should be a for ( ; ; )
> one with the loop exit condition put in the middle.

Sure.

> 
> > @@ -549,52 +658,26 @@ static int __hvmemul_read(
> >      struct hvm_emulate_ctxt *hvmemul_ctxt)
> >  {
> >      struct vcpu *curr = current;
> > -    unsigned long addr, reps = 1;
> > -    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
> > +    unsigned long addr, one_rep = 1;
> >      uint32_t pfec = PFEC_page_present;
> >      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> > -    paddr_t gpa;
> >      int rc;
> >
> >      rc = hvmemul_virtual_to_linear(
> > -        seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
> > +        seg, offset, bytes, &one_rep, access_type, hvmemul_ctxt, &addr);
> >      if ( rc != X86EMUL_OKAY )
> >          return rc;
> > -    off = addr & (PAGE_SIZE - 1);
> > -    /*
> > -     * We only need to handle sizes actual instruction operands can have. All
> > -     * such sizes are either powers of 2 or the sum of two powers of 2. Thus
> > -     * picking as initial chunk size the largest power of 2 not greater than
> > -     * the total size will always result in only power-of-2 size requests
> > -     * issued to hvmemul_do_mmio() (hvmemul_do_io() rejects non-
> powers-of-2).
> > -     */
> > -    while ( chunk & (chunk - 1) )
> > -        chunk &= chunk - 1;
> > -    if ( off + bytes > PAGE_SIZE )
> > -        while ( off & (chunk - 1) )
> > -            chunk >>= 1;
> >
> >      if ( ((access_type != hvm_access_insn_fetch
> >             ? vio->mmio_access.read_access
> >             : vio->mmio_access.insn_fetch)) &&
> >           (vio->mmio_gva == (addr & PAGE_MASK)) )
> >      {
> > -        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> > -        while ( (off + chunk) <= PAGE_SIZE )
> > -        {
> > -            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
> > -                                        p_data);
> > -            if ( rc != X86EMUL_OKAY || bytes == chunk )
> > -                return rc;
> > -            off += chunk;
> > -            gpa += chunk;
> > -            p_data += chunk;
> > -            bytes -= chunk;
> > -            if ( bytes < chunk )
> > -                chunk = bytes;
> > -        }
> > +        unsigned int off = 0;
> > +        paddr_t gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |
> 
> If you already touch this, pfn_to_paddr() please.
> 

Indeed. Will do.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops
  2015-06-24 10:10     ` Paul Durrant
@ 2015-06-24 10:27       ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2015-06-24 10:27 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel

>>> On 24.06.15 at 12:10, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 24 June 2015 10:38
>> >>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
>> > --- a/xen/arch/x86/hvm/emulate.c
>> > +++ b/xen/arch/x86/hvm/emulate.c
>> > @@ -540,6 +540,115 @@ static int hvmemul_virtual_to_linear(
>> >      return X86EMUL_EXCEPTION;
>> >  }
>> >
>> > +static int hvmemul_phys_mmio_access(
>> > +    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
>> > +    unsigned int *off)
>> 
>> Why this (buffer, off) pair? The caller can easily adjust buffer as
>> necessary, avoiding the other parameter altogether. And buffer
>> itself can be void * just like it is in some of the callers (and the
>> others should follow suit).
>> 
> 
> It actually becomes necessary in a later patch, but I'll make the change 
> there instead. As for incrementing a void *, I know that MSVC disallows this. 
> I believe it is a gcc-ism which I guess clang must tolerate, but I don't think 
> it is standard C.

That's correct.
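
For the record, a two-line illustration of the difference:

void *p = buffer;

p += chunk;               /* GNU C extension: sizeof(void) is taken as 1 */
p = (uint8_t *)p + chunk; /* portable ISO C spelling of the same step */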

>> Also, considering instruction characteristics (as explained in the
>> original comment) I think the old way of determining the chunk
>> size may have been cheaper than yours using fls().
>> 
> 
> I thought fls() was generally implemented using inline assembler and was 
> pretty fast. I didn't actually check how Xen implements it; I just assumed it 
> would be optimal.

It uses BSR, which I remember to generally be (perhaps much) slower
than "ordinary" logical operations. But maybe that's not the case
anymore on recent CPUs...
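
For reference, fls() can also be written without inline assembly; on x86 a
compiler will typically lower the builtin below to BSR (or LZCNT). This is
only an illustrative definition, not Xen's actual one:

static inline int generic_fls(unsigned int x)
{
    /* 1-based index of the highest set bit, or 0 if no bit is set. */
    return x ? 32 - __builtin_clz(x) : 0;
}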

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 07/18] x86/hvm: unify internal portio and mmio intercepts
  2015-06-23 10:39 ` [PATCH v3 07/18] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
@ 2015-06-24 12:07   ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2015-06-24 12:07 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 23.06.15 at 12:39, <paul.durrant@citrix.com> wrote:
> The implementation of mmio and portio intercepts is unnecessarily different.
> This leads to much code duplication. This patch unifies much of the
> intercept handling, leaving only distinct handlers for stdvga mmio and dpci
> portio. Subsequent patches will unify those handlers.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---

Please help reviewers of previous versions of your patches by
summarizing changes in the current version here.

> +static int hvm_portio_read(struct hvm_io_handler *handler,
> +                           uint64_t addr,
> +                           uint64_t size,
> +                           uint64_t *data)
>  {
> -    struct vcpu *curr = current;
> -    unsigned int i;
> +    uint32_t val = 0;
> +    int rc;
>  
> -    for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
> -        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
> -            return 1;
> +    BUG_ON(handler->type != IOREQ_TYPE_PIO);
>  
> -    return 0;
> +    rc = handler->u.portio.action(IOREQ_READ, addr, size, &val);
> +    if ( rc == X86EMUL_OKAY )
> +        *data = val;

I think there would be no harm doing this unconditionally, and it
would eliminate the potential of the caller using uninitialized data.
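
I.e. roughly:

rc = handler->u.portio.action(IOREQ_READ, addr, size, &val);
*data = val; /* val starts out zeroed, so the caller never sees garbage */

return rc;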

> @@ -284,29 +185,37 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
>      {
>          for ( i = 0; i < p->count; i++ )
>          {
> -            data = 0;
> -            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
> -                                              p->size) )
> +            if ( p->data_is_ptr )
>              {
> -            case HVMCOPY_okay:
> -                break;
> -            case HVMCOPY_gfn_paged_out:
> -            case HVMCOPY_gfn_shared:
> -                rc = X86EMUL_RETRY;
> -                break;
> -            case HVMCOPY_bad_gfn_to_mfn:
> -                data = ~0;
> -                break;
> -            case HVMCOPY_bad_gva_to_gfn:
> -                ASSERT(0);
> -                /* fall through */
> -            default:
> -                rc = X86EMUL_UNHANDLEABLE;
> -                break;
> +                switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
> +                                                  p->size) )
> +                {
> +                case HVMCOPY_okay:
> +                    break;
> +                case HVMCOPY_gfn_paged_out:
> +                case HVMCOPY_gfn_shared:
> +                    rc = X86EMUL_RETRY;
> +                    break;
> +                case HVMCOPY_bad_gfn_to_mfn:
> +                    data = ~0;
> +                    break;
> +                case HVMCOPY_bad_gva_to_gfn:
> +                    ASSERT_UNREACHABLE();
> +                    /* fall through */
> +                default:
> +                    rc = X86EMUL_UNHANDLEABLE;
> +                    break;
> +                }
> +                if ( rc != X86EMUL_OKAY )
> +                    break;
>              }
> -            if ( rc != X86EMUL_OKAY )
> -                break;
> -            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
> +            else
> +                data = p->data;
> +
> +            addr = (p->type == IOREQ_TYPE_COPY) ?
> +                p->addr + step * i :
> +                p->addr;

Indentation.

> @@ -324,78 +233,133 @@ static int process_portio_intercept(portio_action_t action, ioreq_t *p)
>      return rc;
>  }
>  
> -/*
> - * Check if the request is handled inside xen
> - * return value: 0 --not handled; 1 --handled
> - */
> -int hvm_io_intercept(ioreq_t *p, int type)
> +static struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
> +{
> +    struct vcpu *curr = current;
> +    struct domain *curr_d = curr->domain;

"curr" is used only once (here) and hence pointless as a local variable.

> +    const struct hvm_io_ops *ops =
> +        (p->type == IOREQ_TYPE_COPY) ?
> +        &mmio_ops :
> +        &portio_ops;
> +    unsigned int i;
> +
> +    for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
> +    {
> +        struct hvm_io_handler *handler =
> +            &curr_d->arch.hvm_domain.io_handler[i];
> +        uint64_t start, end, count = p->count, size = p->size;

I'm not really happy with all these 64-bit local variables, but I
guess they are the result of you not wanting to do things the
way they're currently done everywhere... From a logical pov,
start and end would better be paddr_t, count and size
unsigned long.

> +        if ( handler->type != p->type )
> +            continue;
> +
> +        switch ( handler->type )
> +        {
> +        case IOREQ_TYPE_PIO:
> +            start = p->addr;
> +            end = p->addr + size;
> +            break;
> +        case IOREQ_TYPE_COPY:
> +            if ( p->df )
> +            {
> +                start = (p->addr - (count - 1) * size);
> +                end = p->addr + size;
> +            }
> +            else
> +            {
> +                start = p->addr;
> +                end = p->addr + count * size;
> +            }
> +            break;
> +        default:
> +            BUG();
> +        }

Doing this over and over again during each iteration looks kind of
wasteful.
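
One way to avoid that, sketched below: compute the request's range once from
the ioreq before walking the handler list (types per the earlier remark), so
only the per-handler comparison stays inside the loop.

unsigned long count = p->count, size = p->size;
paddr_t start, end;
unsigned int i;

switch ( p->type )
{
case IOREQ_TYPE_PIO:
    start = p->addr;
    end = p->addr + size;
    break;
case IOREQ_TYPE_COPY:
    start = p->df ? p->addr - (count - 1) * size : p->addr;
    end = p->df ? p->addr + size : p->addr + count * size;
    break;
default:
    BUG();
}

for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
{
    /* only the handler type and [start, end) range checks remain here */
}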

> +static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
>  {
> -    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
> -    int num = handler->num_slot;
> +    unsigned int i = d->arch.hvm_domain.io_handler_count++;
>  
> -    BUG_ON(num >= MAX_IO_HANDLER);
> +    if (i == NR_IO_HANDLERS)

Coding style.

> +        domain_crash(d);

return NULL;

(or else you, despite having checked, return a pointer to an out of
bounds slot to your caller)
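
I.e. roughly (callers then have to cope with a NULL return):

static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
{
    unsigned int i = d->arch.hvm_domain.io_handler_count++;

    if ( i == NR_IO_HANDLERS )
    {
        domain_crash(d);
        return NULL;
    }

    return &d->arch.hvm_domain.io_handler[i];
}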

> +void relocate_portio_handler(struct domain *d, uint32_t old_addr,
> +                             uint32_t new_addr, uint32_t size)
> +{
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_PIO,
> +        .addr = old_addr,
> +        .size = size
> +    };
> +    struct hvm_io_handler *handler = hvm_find_io_handler(&p);

This isn't quite the same as the original checking - there was an
exact match of size required.
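
A sketch of how the exact-size requirement could be reinstated on top of the
new lookup (field names as in the struct quoted below):

if ( handler == NULL ||
     handler->u.portio.start != old_addr ||
     handler->u.portio.end != old_addr + size )
    return;

handler->u.portio.start = new_addr;
handler->u.portio.end = new_addr + size;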

> +struct hvm_io_handler {
> +    union {
> +        struct {
> +            const struct hvm_mmio_ops *ops;
> +        } mmio;
> +        struct {
> +            uint32_t        start, end;
> +            portio_action_t action;
> +        } portio;
> +    } u;

No need to name this union, this is not a public interface.
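
I.e. a sketch of the same fragment with the union left anonymous, so the
fields are reached as handler->mmio.ops / handler->portio.action:

struct hvm_io_handler {
    union {
        struct {
            const struct hvm_mmio_ops *ops;
        } mmio;
        struct {
            uint32_t        start, end;
            portio_action_t action;
        } portio;
    };
    /* other members unchanged */
};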

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread
