* [PATCH v7 0/9] Support for running secondary emulators
@ 2014-05-09  8:39 Paul Durrant
  2014-05-09  8:39 ` [PATCH v7 1/9] ioreq-server: pre-series tidy up Paul Durrant
                   ` (8 more replies)
  0 siblings, 9 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:39 UTC (permalink / raw)
  To: xen-devel

This patch series adds the ioreq server interface which I mentioned in
my talk at the Xen developer summit in Edinburgh at the end of last year.
The code is based on work originally done by Julien Grall but has been
re-written to allow existing versions of QEMU to work unmodified.

The code is available in my xen.git [1] repo on xenbits, under the 'savannah7'
branch, and I have also written a demo emulator to test the code, which can
be found in my demu.git [2] repo.


The series has been re-worked since v5. The modifications are now broken
down as follows:

Patch #1 is a pre-series tidy-up. No semantic change.

Patch #2 moves some code around to centralize use of the ioreq_t data
structure.

Patch #3 introduces the new hvm_ioreq_server structure.

Patch #4 defers creation of the ioreq server until something actually
reads one of the HVM parameters concerned with emulation.

Patch #5 adds an implementation of asprintf() for Xen.

Patch #6 adds an extra function, rangeset_limit(), to set a limit on the
number of ranges in a set (a brief usage sketch follows this list).

Patch #7 makes the single ioreq server of previous patches into the
default ioreq server and introduces an API for creating secondary servers.

Patch #8 adds an enable/disable operation to the API for secondary servers
which makes sure that they cannot be active whilst their shared pages are
present in the guest's P2M.

Patch #9 makes handling of buffered ioreqs optional for secondary servers.
This saves a page of memory per server.
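
By way of illustration, the new helper from patch #6 would be used along the
lines of the sketch below. The rangeset name and the limit of 256 are
arbitrary examples, and the exact prototype should be checked against that
patch; 'd' is assumed to be the target domain:

/* Create a rangeset to track the I/O ranges claimed by an ioreq server
 * and cap how many ranges it may contain (illustrative values only).
 */
struct rangeset *r = rangeset_new(d, "ioreq server ranges", 0);

if ( r != NULL )
    rangeset_limit(r, 256);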

The hotplug controller patch of previous versions of this series has
been dropped to give me more time for testing, and possible re-factoring.
This means that, for now, bus rescans will need to be invoked from within
the guest (e.g. by writing to /sys/bus/pci/rescan in a Linux guest) to notice
devices provided by secondary emulators coming and going.

The demo emulator can simply be invoked from a shell. The emulated
device is not an awful lot of use at this stage - it appears as a SCSI
controller with one IO BAR and one MEM BAR and has no intrinsic
functionality... but then it is only supposed to be a demo :-)
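
For reference, a secondary emulator (such as the demo) drives the new
libxenctrl API roughly as in the sketch below. The function names and
signatures are paraphrased from the series and should be checked against
patches #7 to #9 rather than taken as definitive; the I/O port range is just
an example:

/* Illustrative sketch of a secondary emulator's startup. */
#include <xenctrl.h>

static int start_emulator(xc_interface *xch, domid_t domid)
{
    ioservid_t id;
    xen_pfn_t ioreq_pfn, bufioreq_pfn;
    evtchn_port_t bufioreq_port;
    int rc;

    /* Register a new (secondary) ioreq server for the target domain. */
    rc = xc_hvm_create_ioreq_server(xch, domid, 1 /* buffered ioreqs */, &id);
    if ( rc )
        return rc;

    /* Discover which pages and event channel to use for ioreq handling. */
    rc = xc_hvm_get_ioreq_server_info(xch, domid, id, &ioreq_pfn,
                                      &bufioreq_pfn, &bufioreq_port);
    if ( rc )
        goto fail;

    /* Claim the port I/O range this emulator implements (example range). */
    rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, id, 0 /* port I/O */,
                                             0xc000, 0xc0ff);
    if ( rc )
        goto fail;

    /* Enable the server (see patch #8); the emulation loop then maps the
     * ioreq pages, binds the event channels and services requests.
     */
    rc = xc_hvm_set_ioreq_server_state(xch, domid, id, 1);
    if ( rc )
        goto fail;

    return 0;

 fail:
    xc_hvm_destroy_ioreq_server(xch, domid, id);
    return rc;
}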

  Paul

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
[2] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git

v2:
 - First non-RFC posting

v3:
 - Addressed comments from Jan Beulich

v4:
 - Addressed comments from Ian Campbell and George Dunlap
 - Series heavily re-worked, 2 patches added

v5:
 - One more patch added to separate out a bug-fix, as requested by
   Jan Beulich
 - Switched to using rangesets as suggested by Jan Beulich
 - Changed domain restore path as requested by Ian Campbell
 - Added documentation for new hypercalls and libxenctrl API as requested
   by Ian Campbell
 - Added handling of multi-byte GPE I/O as suggested by Jan Beulich

v6:
 - Bugfix patch dropped as it has already been taken
 - Addressed various comments from Jan Beulich and Ian Campbell
 - Split out the addition of asprintf() into a separate patch
 - Dropped hotplug controller patch for now, to allow more time for testing
 - Split out rangeset patch and make range limit optional parameter
 - Added missing XSM checks

v7:
 - Addressed more comments from Jan Beulich
 - Backed out argument change to rangeset_new() in favour of new
   rangeset_limit() function


* [PATCH v7 1/9] ioreq-server: pre-series tidy up
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
@ 2014-05-09  8:39 ` Paul Durrant
  2014-05-09  8:39 ` [PATCH v7 2/9] ioreq-server: centralize access to ioreq structures Paul Durrant
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser

This patch tidies up various parts of the code that the following patches
move around. If these modifications were combined with the code motion, it
would be easy to miss them.

There's also some function renaming to reflect purpose and a single
whitespace fix.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/emulate.c        |    2 +-
 xen/arch/x86/hvm/hvm.c            |   60 ++++++++++++++++++-------------------
 xen/arch/x86/hvm/io.c             |   46 +++++++++++++---------------
 xen/include/asm-x86/hvm/hvm.h     |    2 +-
 xen/include/asm-x86/hvm/support.h |    2 ++
 5 files changed, 54 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 868aa1d..6d3522a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -241,7 +241,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr) )
+            if ( !hvm_send_assist_req() )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index da220bf..6b53ddc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -365,7 +365,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p;
+    ioreq_t *p = get_ioreq(v);
 
     check_wakeup_from_wait();
 
@@ -373,30 +373,29 @@ void hvm_do_resume(struct vcpu *v)
         pt_restore_timer(v);
 
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !(p = get_ioreq(v)) )
-        goto check_inject_trap;
-
-    while ( p->state != STATE_IOREQ_NONE )
+    if ( p )
     {
-        switch ( p->state )
+        while ( p->state != STATE_IOREQ_NONE )
         {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            hvm_io_assist(p);
-            break;
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
-                                      (p->state != STATE_IOREQ_READY) &&
-                                      (p->state != STATE_IOREQ_INPROCESS));
-            break;
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-            domain_crash(v->domain);
-            return; /* bail */
+            switch ( p->state )
+            {
+            case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+                hvm_io_assist(p);
+                break;
+            case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+            case STATE_IOREQ_INPROCESS:
+                wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+                                          (p->state != STATE_IOREQ_READY) &&
+                                          (p->state != STATE_IOREQ_INPROCESS));
+                break;
+            default:
+                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
+                domain_crash(v->domain);
+                return; /* bail */
+            }
         }
     }
 
- check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -426,7 +425,7 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_destroy_ioreq_page(
+static void hvm_unmap_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp)
 {
     spin_lock(&iorp->lock);
@@ -482,7 +481,7 @@ int prepare_ring_for_helper(
     return 0;
 }
 
-static int hvm_set_ioreq_page(
+static int hvm_map_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
 {
     struct page_info *page;
@@ -652,8 +651,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.ioreq);
+    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
 
     msixtbl_pt_cleanup(d);
 
@@ -1425,14 +1424,15 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v)
+bool_t hvm_send_assist_req(void)
 {
-    ioreq_t *p;
+    struct vcpu *v = current;
+    ioreq_t *p = get_ioreq(v);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !(p = get_ioreq(v)) )
+    if ( !p )
         return 0;
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
@@ -4129,7 +4129,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             {
             case HVM_PARAM_IOREQ_PFN:
                 iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
+                if ( (rc = hvm_map_ioreq_page(d, iorp, a.value)) != 0 )
                     break;
                 spin_lock(&iorp->lock);
                 if ( iorp->va != NULL )
@@ -4138,9 +4138,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                         get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
                 spin_unlock(&iorp->lock);
                 break;
-            case HVM_PARAM_BUFIOREQ_PFN: 
+            case HVM_PARAM_BUFIOREQ_PFN:
                 iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_set_ioreq_page(d, iorp, a.value);
+                rc = hvm_map_ioreq_page(d, iorp, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..44b4e20 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -49,9 +49,13 @@
 int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
     buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp;
+    buf_ioreq_t bp = { .data = p->data,
+                       .addr = p->addr,
+                       .type = p->type,
+                       .dir  = p->dir };
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
 
@@ -69,8 +73,6 @@ int hvm_buffered_io_send(ioreq_t *p)
     if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
         return 0;
 
-    bp.type = p->type;
-    bp.dir  = p->dir;
     switch ( p->size )
     {
     case 1:
@@ -91,9 +93,6 @@ int hvm_buffered_io_send(ioreq_t *p)
         return 0;
     }
     
-    bp.data = p->data;
-    bp.addr = p->addr;
-    
     spin_lock(&iorp->lock);
 
     if ( (pg->write_pointer - pg->read_pointer) >=
@@ -104,22 +103,20 @@ int hvm_buffered_io_send(ioreq_t *p)
         return 0;
     }
     
-    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
-           &bp, sizeof(bp));
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
     
     if ( qw )
     {
         bp.data = p->data >> 32;
-        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
-               &bp, sizeof(bp));
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
     }
 
     /* Make the ioreq_t visible /before/ write_pointer. */
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    notify_via_xen_event_channel(d,
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
     spin_unlock(&iorp->lock);
     
     return 1;
@@ -127,22 +124,19 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 void send_timeoffset_req(unsigned long timeoff)
 {
-    ioreq_t p[1];
+    ioreq_t p = {
+        .type = IOREQ_TYPE_TIMEOFFSET,
+        .size = 8,
+        .count = 1,
+        .dir = IOREQ_WRITE,
+        .data = timeoff,
+        .state = STATE_IOREQ_READY,
+    };
 
     if ( timeoff == 0 )
         return;
 
-    memset(p, 0, sizeof(*p));
-
-    p->type = IOREQ_TYPE_TIMEOFFSET;
-    p->size = 8;
-    p->count = 1;
-    p->dir = IOREQ_WRITE;
-    p->data = timeoff;
-
-    p->state = STATE_IOREQ_READY;
-
-    if ( !hvm_buffered_io_send(p) )
+    if ( !hvm_buffered_io_send(&p) )
         printk("Unsuccessful timeoffset update\n");
 }
 
@@ -168,7 +162,7 @@ void send_invalidate_req(void)
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
 
-    (void)hvm_send_assist_req(v);
+    (void)hvm_send_assist_req();
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index b1c340e..4fb7e22 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -231,7 +231,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v);
+bool_t hvm_send_assist_req(void);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 3529499..1dc2f2d 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -31,7 +31,9 @@ static inline ioreq_t *get_ioreq(struct vcpu *v)
 {
     struct domain *d = v->domain;
     shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
     ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
     return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
 }
 
-- 
1.7.10.4


* [PATCH v7 2/9] ioreq-server: centralize access to ioreq structures
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
  2014-05-09  8:39 ` [PATCH v7 1/9] ioreq-server: pre-series tidy up Paul Durrant
@ 2014-05-09  8:39 ` Paul Durrant
  2014-05-09  8:39 ` [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Eddie Dong, Paul Durrant, Keir Fraser, Jun Nakajima

To simplify creation of the ioreq server abstraction in a subsequent patch,
this patch centralizes all use of the shared ioreq structure and the
buffered ioreq ring in the source module xen/arch/x86/hvm/hvm.c.

The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume(),
because the former may now be passed a data structure on the stack, in which
case the barrier is unnecessary.
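
In outline, the ordering requirement on the shared-page path is as below
(a sketch mirroring the hunk added to hvm_do_resume(), not an extra change):

/* Consumer (Xen) side when the ioreq lives in the shared page: the emulator
 * writes the response fields first and sets STATE_IORESP_READY last, so the
 * reads of those fields must not be re-ordered ahead of the state check.
 */
if ( p->state == STATE_IORESP_READY )
{
    rmb();             /* see IORESP_READY /then/ read contents of ioreq */
    hvm_io_assist(p);  /* consumes p->data etc. */
}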

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Eddie Dong <eddie.dong@intel.com>
---
 xen/arch/x86/hvm/emulate.c        |   66 +++++++++++-----------
 xen/arch/x86/hvm/hvm.c            |  111 ++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/hvm/io.c             |  104 +++-------------------------------
 xen/arch/x86/hvm/vmx/vvmx.c       |   13 ++++-
 xen/include/asm-x86/hvm/hvm.h     |   15 ++++-
 xen/include/asm-x86/hvm/support.h |   21 ++++---
 6 files changed, 181 insertions(+), 149 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 6d3522a..904c71a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,24 +57,11 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t *p = get_ioreq(curr);
-    ioreq_t _ioreq;
+    ioreq_t p;
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
-    bool_t has_dm = 1;
-
-    /*
-     * Domains without a backing DM, don't have an ioreq page.  Just
-     * point to a struct on the stack, initialising the state as needed.
-     */
-    if ( !p )
-    {
-        has_dm = 0;
-        p = &_ioreq;
-        p->state = STATE_IOREQ_NONE;
-    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -173,10 +160,9 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( p->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(curr) )
     {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
-                 p->state);
+        gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
         if ( ram_page )
             put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
@@ -193,38 +179,38 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
-    p->dir = dir;
-    p->data_is_ptr = value_is_ptr;
-    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
-    p->size = size;
-    p->addr = addr;
-    p->count = *reps;
-    p->df = df;
-    p->data = value;
+    p.dir = dir;
+    p.data_is_ptr = value_is_ptr;
+    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
+    p.size = size;
+    p.addr = addr;
+    p.count = *reps;
+    p.df = df;
+    p.data = value;
 
     if ( dir == IOREQ_WRITE )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( is_mmio )
     {
-        rc = hvm_mmio_intercept(p);
+        rc = hvm_mmio_intercept(&p);
         if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(p);
+            rc = hvm_buffered_io_intercept(&p);
     }
     else
     {
-        rc = hvm_portio_intercept(p);
+        rc = hvm_portio_intercept(&p);
     }
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
     case X86EMUL_RETRY:
-        *reps = p->count;
-        p->state = STATE_IORESP_READY;
+        *reps = p.count;
+        p.state = STATE_IORESP_READY;
         if ( !vio->mmio_retry )
         {
-            hvm_io_assist(p);
+            hvm_io_assist(&p);
             vio->io_state = HVMIO_none;
         }
         else
@@ -233,7 +219,7 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
         /* If there is no backing DM, just ignore accesses */
-        if ( !has_dm )
+        if ( !hvm_has_dm(curr->domain) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
@@ -241,7 +227,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req() )
+            if ( !hvm_send_assist_req(&p) )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
@@ -260,7 +246,7 @@ static int hvmemul_do_io(
 
  finish_access:
     if ( dir == IOREQ_READ )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( p_data != NULL )
         memcpy(p_data, &vio->io_data, size);
@@ -1292,3 +1278,13 @@ struct segment_register *hvmemul_get_seg_reg(
         hvm_get_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
     return &hvmemul_ctxt->seg_reg[seg];
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6b53ddc..d371639 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
+static ioreq_t *get_ioreq(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
+    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+}
+
+bool_t hvm_io_pending(struct vcpu *v)
+{
+    ioreq_t *p = get_ioreq(v);
+
+    if ( !p )
+        return 0;
+
+    return p->state != STATE_IOREQ_NONE;
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p = get_ioreq(v);
@@ -380,6 +400,7 @@ void hvm_do_resume(struct vcpu *v)
             switch ( p->state )
             {
             case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+                rmb(); /* see IORESP_READY /then/ read contents of ioreq */
                 hvm_io_assist(p);
                 break;
             case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
@@ -1424,7 +1445,87 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(void)
+int hvm_buffered_io_send(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
+    buffered_iopage_t *pg = iorp->va;
+    buf_ioreq_t bp = { .data = p->data,
+                       .addr = p->addr,
+                       .type = p->type,
+                       .dir = p->dir };
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return 0;
+    }
+
+    spin_lock(&iorp->lock);
+
+    if ( (pg->write_pointer - pg->read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&iorp->lock);
+        return 0;
+    }
+
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    wmb();
+    pg->write_pointer += qw ? 2 : 1;
+
+    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    spin_unlock(&iorp->lock);
+
+    return 1;
+}
+
+bool_t hvm_has_dm(struct domain *d)
+{
+    return !!d->arch.hvm_domain.ioreq.va;
+}
+
+bool_t hvm_send_assist_req(ioreq_t *proto_p)
 {
     struct vcpu *v = current;
     ioreq_t *p = get_ioreq(v);
@@ -1443,14 +1544,18 @@ bool_t hvm_send_assist_req(void)
         return 0;
     }
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    proto_p->state = STATE_IOREQ_NONE;
+    proto_p->vp_eport = p->vp_eport;
+    *p = *proto_p;
+
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+    notify_via_xen_event_channel(v->domain, p->vp_eport);
 
     return 1;
 }
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 44b4e20..f5ad9be 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,82 +46,6 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp = { .data = p->data,
-                       .addr = p->addr,
-                       .type = p->type,
-                       .dir  = p->dir };
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return 0;
-    }
-    
-    spin_lock(&iorp->lock);
-
-    if ( (pg->write_pointer - pg->read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
-        return 0;
-    }
-    
-    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-    
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    wmb();
-    pg->write_pointer += qw ? 2 : 1;
-
-    notify_via_xen_event_channel(d,
-            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
-    
-    return 1;
-}
-
 void send_timeoffset_req(unsigned long timeoff)
 {
     ioreq_t p = {
@@ -143,26 +67,14 @@ void send_timeoffset_req(unsigned long timeoff)
 /* Ask ioemu mapcache to invalidate mappings. */
 void send_invalidate_req(void)
 {
-    struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
-
-    if ( !p )
-        return;
-
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
-                 "already pending (%d)?\n", p->state);
-        domain_crash(v->domain);
-        return;
-    }
-
-    p->type = IOREQ_TYPE_INVALIDATE;
-    p->size = 4;
-    p->dir = IOREQ_WRITE;
-    p->data = ~0UL; /* flush all */
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
 
-    (void)hvm_send_assist_req();
+    (void)hvm_send_assist_req(&p);
 }
 
 int handle_mmio(void)
@@ -265,8 +177,6 @@ void hvm_io_assist(ioreq_t *p)
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     enum hvm_io_state io_state;
 
-    rmb(); /* see IORESP_READY /then/ read contents of ioreq */
-
     p->state = STATE_IOREQ_NONE;
 
     io_state = vio->io_state;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index e263376..9ccc03f 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1394,7 +1394,6 @@ void nvmx_switch_guest(void)
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    const ioreq_t *ioreq = get_ioreq(v);
 
     /*
      * A pending IO emulation may still be not finished. In this case, no
@@ -1404,7 +1403,7 @@ void nvmx_switch_guest(void)
      * don't want to continue as this setup is not implemented nor supported
      * as of right now.
      */
-    if ( !ioreq || ioreq->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(v) )
         return;
     /*
      * a softirq may interrupt us between a virtual vmentry is
@@ -2522,3 +2521,13 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     /* nvcpu.guest_cr is what L2 write to cr actually. */
     __vmwrite(read_shadow_field, v->arch.hvm_vcpu.nvcpu.guest_cr[cr]);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 4fb7e22..251625d 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -26,6 +26,7 @@
 #include <asm/hvm/asid.h>
 #include <public/domctl.h>
 #include <public/hvm/save.h>
+#include <public/hvm/ioreq.h>
 #include <asm/mm.h>
 
 /* Interrupt acknowledgement sources. */
@@ -231,7 +232,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(void);
+bool_t hvm_send_assist_req(ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
@@ -349,6 +350,8 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx);
 void hvm_migrate_timers(struct vcpu *v);
+bool_t hvm_has_dm(struct domain *d);
+bool_t hvm_io_pending(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
 void hvm_migrate_pirqs(struct vcpu *v);
 
@@ -545,3 +548,13 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_HVM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 1dc2f2d..05ef5c5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -22,21 +22,10 @@
 #define __ASM_X86_HVM_SUPPORT_H__
 
 #include <xen/types.h>
-#include <public/hvm/ioreq.h>
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-static inline ioreq_t *get_ioreq(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
-}
-
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
@@ -144,3 +133,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
 int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
 
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
  2014-05-09  8:39 ` [PATCH v7 1/9] ioreq-server: pre-series tidy up Paul Durrant
  2014-05-09  8:39 ` [PATCH v7 2/9] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-05-09  8:39 ` Paul Durrant
  2014-05-09 13:09   ` Jan Beulich
  2014-05-09  8:39 ` [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

Collect the data structures concerning device emulation together into a new
struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions. The
lock in the hvm_ioreq_page served two different purposes (serializing
toolstack modifications and serializing access to the buffered ioreq ring)
and has been replaced by two separate locks in the hvm_ioreq_server
structure.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |  414 ++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |   36 +++-
 xen/include/asm-x86/hvm/vcpu.h   |   12 +-
 3 files changed, 328 insertions(+), 134 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d371639..62d25a9 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -363,38 +363,43 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-static ioreq_t *get_ioreq(struct vcpu *v)
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    shared_iopage_t *p = s->ioreq.va;
 
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
 
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+    return &p->vcpu_ioreq[v->vcpu_id];
 }
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
-    if ( !p )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v);
     return p->state != STATE_IOREQ_NONE;
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( p )
+    if ( s )
     {
+        ioreq_t *p = get_ioreq(s, v);
+
+        /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
         while ( p->state != STATE_IOREQ_NONE )
         {
             switch ( p->state )
@@ -405,13 +410,13 @@ void hvm_do_resume(struct vcpu *v)
                 break;
             case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
             case STATE_IOREQ_INPROCESS:
-                wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+                wait_on_xen_event_channel(p->vp_eport,
                                           (p->state != STATE_IOREQ_READY) &&
                                           (p->state != STATE_IOREQ_INPROCESS));
                 break;
             default:
                 gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-                domain_crash(v->domain);
+                domain_crash(d);
                 return; /* bail */
             }
         }
@@ -425,14 +430,6 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
-static void hvm_init_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
-{
-    memset(iorp, 0, sizeof(*iorp));
-    spin_lock_init(&iorp->lock);
-    domain_pause(d);
-}
-
 void destroy_ring_for_helper(
     void **_va, struct page_info *page)
 {
@@ -446,16 +443,11 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_unmap_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_unmap_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
 {
-    spin_lock(&iorp->lock);
-
-    ASSERT(d->is_dying);
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
     destroy_ring_for_helper(&iorp->va, iorp->page);
-
-    spin_unlock(&iorp->lock);
 }
 
 int prepare_ring_for_helper(
@@ -503,8 +495,10 @@ int prepare_ring_for_helper(
 }
 
 static int hvm_map_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
+    struct hvm_ioreq_server *s, bool_t buf, unsigned long gmfn)
 {
+    struct domain *d = s->domain;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     struct page_info *page;
     void *va;
     int rc;
@@ -512,22 +506,15 @@ static int hvm_map_ioreq_page(
     if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
         return rc;
 
-    spin_lock(&iorp->lock);
-
     if ( (iorp->va != NULL) || d->is_dying )
     {
         destroy_ring_for_helper(&va, page);
-        spin_unlock(&iorp->lock);
         return -EINVAL;
     }
 
     iorp->va = va;
     iorp->page = page;
 
-    spin_unlock(&iorp->lock);
-
-    domain_unpause(d);
-
     return 0;
 }
 
@@ -571,8 +558,230 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    spin_lock(&s->lock);
+
+    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 )
+    {
+        struct domain *d = s->domain;
+
+        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+            s->bufioreq_evtchn;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v, sv->ioreq_evtchn);
+    
+ fail2:
+    spin_unlock(&s->lock);
+    xfree(sv);
+
+ fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 )
+            free_xen_event_channel(v, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    s->domain = d;
+    s->domid = domid;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    /*
+     * The domain needs to wait until HVM_PARAM_IOREQ_PFN and
+     * HVM_PARAM_BUFIOREQ_PFN are both set.
+     */
+    domain_pause(d);
+    domain_pause(d);
+
+    d->arch.hvm_domain.ioreq_server = s;
+    return 0;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_unmap_ioreq_page(s, 1);
+    hvm_unmap_ioreq_page(s, 0);
+
+    xfree(s);
+}
+
+static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
+                             unsigned long pfn)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    int rc;
+
+    spin_lock(&s->lock);
+
+    rc = hvm_map_ioreq_page(s, buf, pfn);
+    if ( rc )
+        goto fail;
+
+    if ( !buf )
+    {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+            hvm_update_ioreq_evtchn(s, sv);
+    }
+
+    spin_unlock(&s->lock);
+    domain_unpause(d); /* domain_pause() in hvm_create_ioreq_server() */
+
+    return 0;
+
+ fail:
+    spin_unlock(&s->lock);
+    return rc;
+}
+
+static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
+                                     evtchn_port_t *p_port)
+{
+    int old_port, new_port;
+
+    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
+    if ( new_port < 0 )
+        return new_port;
+
+    /* xchg() ensures that only we call free_xen_event_channel(). */
+    old_port = xchg(p_port, new_port);
+    free_xen_event_channel(v, old_port);
+    return 0;
+}
+
+static int hvm_set_dm_domain(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    int rc = 0;
+
+    domain_pause(d);
+    spin_lock(&s->lock);
+
+    if ( s->domid != domid ) {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            struct vcpu *v = sv->vcpu;
+
+            if ( v->vcpu_id == 0 )
+            {
+                rc = hvm_replace_event_channel(v, domid,
+                                               &s->bufioreq_evtchn);
+                if ( rc )
+                    break;
+
+                d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+                    s->bufioreq_evtchn;
+            }
+
+            rc = hvm_replace_event_channel(v, domid, &sv->ioreq_evtchn);
+            if ( rc )
+                break;
+
+            hvm_update_ioreq_evtchn(s, sv);
+        }
+
+        s->domid = domid;
+    }
+
+    spin_unlock(&s->lock);
+    domain_unpause(d);
+
+    return rc;
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
+    domid_t domid;
     int rc;
 
     if ( !hvm_enabled )
@@ -638,17 +847,21 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    rc = hvm_create_ioreq_server(d, domid);
+    if ( rc != 0 )
+        goto fail2;
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -672,8 +885,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_destroy_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1306,7 +1518,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1349,30 +1561,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-
-    /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-    if ( rc < 0 )
+    rc = hvm_ioreq_server_add_vcpu(s, v);
+    if ( rc != 0 )
         goto fail6;
 
-    /* Register ioreq event channel. */
-    v->arch.hvm_vcpu.xen_port = rc;
-
-    if ( v->vcpu_id == 0 )
-    {
-        /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-        if ( rc < 0 )
-            goto fail6;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
-    }
-
-    spin_lock(&d->arch.hvm_domain.ioreq.lock);
-    if ( d->arch.hvm_domain.ioreq.va != NULL )
-        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1405,6 +1597,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
 void hvm_vcpu_destroy(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_ioreq_server_remove_vcpu(s, v);
+
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -1416,9 +1613,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
         vlapic_destroy(v);
 
     hvm_funcs.vcpu_destroy(v);
-
-    /* Event channel is already freed by evtchn_destroy(). */
-    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -1449,8 +1643,9 @@ int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp = { .data = p->data,
                        .addr = p->addr,
                        .type = p->type,
@@ -1461,6 +1656,12 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    if ( !s )
+        return 0;
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -1492,13 +1693,13 @@ int hvm_buffered_io_send(ioreq_t *p)
         return 0;
     }
 
-    spin_lock(&iorp->lock);
+    spin_lock(&s->bufioreq_lock);
 
     if ( (pg->write_pointer - pg->read_pointer) >=
          (IOREQ_BUFFER_SLOT_NUM - qw) )
     {
         /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
+        spin_unlock(&s->bufioreq_lock);
         return 0;
     }
 
@@ -1514,33 +1715,37 @@ int hvm_buffered_io_send(ioreq_t *p)
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
 
     return 1;
 }
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq.va;
+    return !!d->arch.hvm_domain.ioreq_server;
 }
 
 bool_t hvm_send_assist_req(ioreq_t *proto_p)
 {
     struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !p )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v);
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model. Crash the domain. */
         gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
-        domain_crash(v->domain);
+        domain_crash(d);
         return 0;
     }
 
@@ -1555,7 +1760,7 @@ bool_t hvm_send_assist_req(ioreq_t *proto_p)
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, p->vp_eport);
+    notify_via_xen_event_channel(d, p->vp_eport);
 
     return 1;
 }
@@ -4170,21 +4375,6 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
-static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
-                                     int *p_port)
-{
-    int old_port, new_port;
-
-    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
-    if ( new_port < 0 )
-        return new_port;
-
-    /* xchg() ensures that only we call free_xen_event_channel(). */
-    old_port = xchg(p_port, new_port);
-    free_xen_event_channel(v, old_port);
-    return 0;
-}
-
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -4200,7 +4390,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_page *iorp;
         struct domain *d;
         struct vcpu *v;
 
@@ -4233,19 +4422,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             switch ( a.index )
             {
             case HVM_PARAM_IOREQ_PFN:
-                iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_map_ioreq_page(d, iorp, a.value)) != 0 )
-                    break;
-                spin_lock(&iorp->lock);
-                if ( iorp->va != NULL )
-                    /* Initialise evtchn port info if VCPUs already created. */
-                    for_each_vcpu ( d, v )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                spin_unlock(&iorp->lock);
+                rc = hvm_set_ioreq_pfn(d, 0, a.value);
                 break;
             case HVM_PARAM_BUFIOREQ_PFN:
-                iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_map_ioreq_page(d, iorp, a.value);
+                rc = hvm_set_ioreq_pfn(d, 1, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
@@ -4300,31 +4480,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = 0;
-                domain_pause(d); /* safe to change per-vcpu xen_port */
-                if ( d->vcpu[0] )
-                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
-                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
-                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
-                if ( rc )
-                {
-                    domain_unpause(d);
-                    break;
-                }
-                iorp = &d->arch.hvm_domain.ioreq;
-                for_each_vcpu ( d, v )
-                {
-                    rc = hvm_replace_event_channel(v, a.value,
-                                                   &v->arch.hvm_vcpu.xen_port);
-                    if ( rc )
-                        break;
-
-                    spin_lock(&iorp->lock);
-                    if ( iorp->va != NULL )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                    spin_unlock(&iorp->lock);
-                }
-                domain_unpause(d);
+                rc = hvm_set_dm_domain(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 460dd94..92dc5fb 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -36,14 +36,35 @@
 #include <public/hvm/save.h>
 
 struct hvm_ioreq_page {
-    spinlock_t lock;
     struct page_info *page;
     void *va;
 };
 
-struct hvm_domain {
+struct hvm_ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+};
+
+struct hvm_ioreq_server {
+    struct domain          *domain;
+
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+
+    /* Domain id of emulating domain */
+    domid_t                domid;
     struct hvm_ioreq_page  ioreq;
-    struct hvm_ioreq_page  buf_ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct hvm_ioreq_page  bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+};
+
+struct hvm_domain {
+    struct hvm_ioreq_server *ioreq_server;
 
     struct pl_time         pl_time;
 
@@ -106,3 +127,12 @@ struct hvm_domain {
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
 
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index f34fa91..db37232 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,8 +138,6 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    int                 xen_port;
-
     u8                  flag_dr_dirty;
     bool_t              debug_state_latch;
     bool_t              single_step;
@@ -186,3 +184,13 @@ struct hvm_vcpu {
 };
 
 #endif /* __ASM_X86_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
                   ` (2 preceding siblings ...)
  2014-05-09  8:39 ` [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-05-09  8:39 ` Paul Durrant
  2014-05-09 13:12   ` Jan Beulich
  2014-05-09  8:40 ` [PATCH v7 5/9] Add an implementation of asprintf() for xen Paul Durrant
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

This patch defers creation of the ioreq server until the legacy HVM
parameters are read (by an emulator).

A lock is introduced to protect access to the ioreq server against
concurrent emulator/tool invocations, should such an eventuality arise. The
guest itself is protected by the fact that the ioreq server is only created
whilst the domain is paused.
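
In outline, the intended pattern is as sketched below; this is an
illustration rather than a hunk from the patch, and the exact call site (the
HVMOP_get_param handler) is not shown in full in this excerpt:

/* On the first read of a legacy ioreq HVM parameter, create the ioreq
 * server on demand. hvm_create_ioreq_server() takes ioreq_server_lock
 * itself and returns -EEXIST if it loses a race with another
 * emulator/tool invocation, which is harmless here.
 */
if ( d->arch.hvm_domain.ioreq_server == NULL )
{
    rc = hvm_create_ioreq_server(d,
             d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN]);
    if ( rc != 0 && rc != -EEXIST )
        return rc;
}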

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |  226 ++++++++++++++++++++++++++++----------
 xen/include/asm-x86/hvm/domain.h |    1 +
 2 files changed, 168 insertions(+), 59 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 62d25a9..cdcb25b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -652,13 +652,66 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
     spin_unlock(&s->lock);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 {
-    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_vcpu *sv, *next;
 
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
+    spin_lock(&s->lock);
+
+    list_for_each_entry_safe ( sv,
+                               next,
+                               &s->ioreq_vcpu_list,
+                               list_entry )
+    {
+        struct vcpu *v = sv->vcpu;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 )
+            free_xen_event_channel(v, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+    unsigned long pfn;
+    int rc;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    rc = hvm_map_ioreq_page(s, 0, pfn);
+    if ( rc )
+        return rc;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    rc = hvm_map_ioreq_page(s, 1, pfn);
+    if ( rc )
+        goto fail;
+
+    return 0;
+
+fail:
+    hvm_unmap_ioreq_page(s, 0);
+    return rc;
+}
+
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+{
+    hvm_unmap_ioreq_page(s, 1);
+    hvm_unmap_ioreq_page(s, 0);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
+                                 domid_t domid)
+{
+    struct vcpu *v;
+    int rc;
 
     s->domain = d;
     s->domid = domid;
@@ -667,59 +720,89 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    /*
-     * The domain needs to wait until HVM_PARAM_IOREQ_PFN and
-     * HVM_PARAM_BUFIOREQ_PFN are both set.
-     */
-    domain_pause(d);
-    domain_pause(d);
+    rc = hvm_ioreq_server_map_pages(s);
+    if ( rc )
+        return rc;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
 
-    d->arch.hvm_domain.ioreq_server = s;
     return 0;
-}
 
-static void hvm_destroy_ioreq_server(struct domain *d)
-{
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+ fail:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
 
-    hvm_unmap_ioreq_page(s, 1);
-    hvm_unmap_ioreq_page(s, 0);
+    return rc;
+}
 
-    xfree(s);
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+{
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
 }
 
-static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
-                             unsigned long pfn)
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     int rc;
 
-    spin_lock(&s->lock);
+    rc = -ENOMEM;
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        goto fail1;
 
-    rc = hvm_map_ioreq_page(s, buf, pfn);
-    if ( rc )
-        goto fail;
+    domain_pause(d);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    if ( !buf )
-    {
-        struct hvm_ioreq_vcpu *sv;
+    rc = -EEXIST;
+    if ( d->arch.hvm_domain.ioreq_server != NULL )
+        goto fail2;
 
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-            hvm_update_ioreq_evtchn(s, sv);
-    }
+    rc = hvm_ioreq_server_init(s, d, domid);
+    if ( rc )
+        goto fail2;
 
-    spin_unlock(&s->lock);
-    domain_unpause(d); /* domain_pause() in hvm_create_ioreq_server() */
+    d->arch.hvm_domain.ioreq_server = s;
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
 
     return 0;
 
- fail:
-    spin_unlock(&s->lock);
+ fail2:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
+
+    xfree(s);
+ fail1:
     return rc;
 }
 
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+
+    domain_pause(d);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+    {
+        d->arch.hvm_domain.ioreq_server = NULL;
+        hvm_ioreq_server_deinit(s);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
+
+    xfree(s);
+}
+
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                      evtchn_port_t *p_port)
 {
@@ -737,9 +820,20 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
 
 static int hvm_set_dm_domain(struct domain *d, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     int rc = 0;
 
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    /*
+     * Lack of ioreq server is not a failure. HVM_PARAM_DM_DOMAIN will
+     * still be set and thus, when the server is created, it will have
+     * the correct domid.
+     */
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
     domain_pause(d);
     spin_lock(&s->lock);
 
@@ -776,12 +870,13 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
     spin_unlock(&s->lock);
     domain_unpause(d);
 
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
 int hvm_domain_initialise(struct domain *d)
 {
-    domid_t domid;
     int rc;
 
     if ( !hvm_enabled )
@@ -807,6 +902,7 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -847,21 +943,14 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-    rc = hvm_create_ioreq_server(d, domid);
-    if ( rc != 0 )
-        goto fail2;
-
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail3;
+        goto fail2;
 
     return 0;
 
- fail3:
-    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1518,7 +1607,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1561,7 +1650,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    rc = hvm_ioreq_server_add_vcpu(s, v);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
     if ( rc != 0 )
         goto fail6;
 
@@ -1598,9 +1694,15 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+        hvm_ioreq_server_remove_vcpu(s, v);
 
-    hvm_ioreq_server_remove_vcpu(s, v);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -4421,12 +4523,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
             switch ( a.index )
             {
-            case HVM_PARAM_IOREQ_PFN:
-                rc = hvm_set_ioreq_pfn(d, 0, a.value);
-                break;
-            case HVM_PARAM_BUFIOREQ_PFN:
-                rc = hvm_set_ioreq_pfn(d, 1, a.value);
-                break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
                 hvm_latch_shinfo_size(d);
@@ -4472,7 +4568,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 domctl_lock_release();
                 break;
             case HVM_PARAM_DM_DOMAIN:
-                /* Not reflexive, as we must domain_pause(). */
+                /* Not reflexive, as we may need to domain_pause(). */
                 rc = -EPERM;
                 if ( curr_d == d )
                     break;
@@ -4578,6 +4674,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_EVTCHN: {
+                domid_t domid;
+                
+                /* May need to create server */
+                domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+                rc = hvm_create_ioreq_server(d, domid);
+                if ( rc != 0 && rc != -EEXIST )
+                    goto param_fail;
+                /*FALLTHRU*/
+            }
             default:
                 a.value = d->arch.hvm_domain.params[a.index];
                 break;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 92dc5fb..1b0514c 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -64,6 +64,7 @@ struct hvm_ioreq_server {
 };
 
 struct hvm_domain {
+    spinlock_t              ioreq_server_lock;
     struct hvm_ioreq_server *ioreq_server;
 
     struct pl_time         pl_time;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 5/9] Add an implementation of asprintf() for Xen
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
                   ` (3 preceding siblings ...)
  2014-05-09  8:39 ` [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-05-09  8:40 ` Paul Durrant
  2014-05-09 13:06   ` Jan Beulich
  2014-05-09  8:40 ` [PATCH v7 6/9] Add the facility to limit ranges per rangeset Paul Durrant
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Paul Durrant,
	Jan Beulich

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/common/vsprintf.c    |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/lib.h    |    4 ++++
 xen/include/xen/stdarg.h |    1 +
 3 files changed, 59 insertions(+)

diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 8c43282..c4a3962 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -631,6 +631,60 @@ int scnprintf(char * buf, size_t size, const char *fmt, ...)
 }
 EXPORT_SYMBOL(scnprintf);
 
+/**
+ * vasprintf - Format a string and allocate a buffer to place it in
+ *
+ * @bufp: Pointer to a pointer to receive the allocated buffer
+ * @fmt: The format string to use
+ * @args: Arguments for the format string
+ *
+ * -ENOMEM is returned on failure and @bufp is not touched.
+ * On success, 0 is returned. The buffer passed back is
+ * guaranteed to be null terminated. The memory is allocated
+ * from xenheap, so the buffer should be freed with xfree().
+ */
+int vasprintf(char **bufp, const char *fmt, va_list args)
+{
+    va_list args_copy;
+    size_t size;
+    char *buf, dummy[1];
+
+    va_copy(args_copy, args);
+    size = vsnprintf(dummy, 0, fmt, args_copy);
+    va_end(args_copy);
+
+    buf = xmalloc_array(char, ++size);
+    if ( !buf )
+        return -ENOMEM;
+
+    (void) vsnprintf(buf, size, fmt, args);
+
+    *bufp = buf;
+    return 0;
+}
+
+/**
+ * asprintf - Format a string and place it in a buffer
+ * @bufp: Pointer to a pointer to receive the allocated buffer
+ * @fmt: The format string to use
+ * @...: Arguments for the format string
+ *
+ * -ENOMEM is returned on failure and @bufp is not touched.
+ * On success, 0 is returned. The buffer passed back is
+ * guaranteed to be null terminated. The memory is allocated
+ * from xenheap, so the buffer should be freed with xfree().
+ */
+int asprintf(char **bufp, const char *fmt, ...)
+{
+    va_list args;
+    int i;
+
+    va_start(args, fmt);
+    i = vasprintf(bufp, fmt, args);
+    va_end(args);
+    return i;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index 1369b2b..e81b80e 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -104,6 +104,10 @@ extern int scnprintf(char * buf, size_t size, const char * fmt, ...)
     __attribute__ ((format (printf, 3, 4)));
 extern int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
     __attribute__ ((format (printf, 3, 0)));
+extern int asprintf(char ** bufp, const char * fmt, ...)
+    __attribute__ ((format (printf, 2, 3)));
+extern int vasprintf(char ** bufp, const char * fmt, va_list args)
+    __attribute__ ((format (printf, 2, 0)));
 
 long simple_strtol(
     const char *cp,const char **endp, unsigned int base);
diff --git a/xen/include/xen/stdarg.h b/xen/include/xen/stdarg.h
index 216fe6d..29249a1 100644
--- a/xen/include/xen/stdarg.h
+++ b/xen/include/xen/stdarg.h
@@ -2,6 +2,7 @@
 #define __XEN_STDARG_H__
 
 typedef __builtin_va_list va_list;
+#define va_copy(dest, src)    __builtin_va_copy((dest), (src))
 #define va_start(ap, last)    __builtin_va_start((ap), (last))
 #define va_end(ap)            __builtin_va_end(ap)
 #define va_arg                __builtin_va_arg
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 6/9] Add the facility to limit ranges per rangeset
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
                   ` (4 preceding siblings ...)
  2014-05-09  8:40 ` [PATCH v7 5/9] Add an implementation of asprintf() for Xen Paul Durrant
@ 2014-05-09  8:40 ` Paul Durrant
  2014-05-09 13:22   ` Jan Beulich
  2014-05-09  8:40 ` [PATCH v7 7/9] ioreq-server: add support for multiple servers Paul Durrant
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

A subsequent patch exposes rangesets to secondary emulators. To allow a
limit to be placed on the amount of xenheap that an emulator can cause to
be consumed, the new function rangeset_limit() sets the allowed number of
ranges in a rangeset. By default, there is no limit.
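
As a rough hypervisor-side usage sketch (the helper name, rangeset name
and the limit of 16 are illustrative only; the rangeset calls are those
added or already present in common/rangeset.c):

    #include <xen/sched.h>
    #include <xen/errno.h>
    #include <xen/rangeset.h>

    /* Sketch: create a rangeset and cap the number of discrete ranges
     * it may hold, as a later patch does for secondary emulators. */
    static int create_limited_rangeset(struct domain *d,
                                       struct rangeset **pr)
    {
        struct rangeset *r;

        r = rangeset_new(d, "example ranges", RANGESETF_prettyprint_hex);
        if ( r == NULL )
            return -ENOMEM;

        /* Without this call the set remains unlimited, as before. */
        rangeset_limit(r, 16);

        *pr = r;
        return 0;
    }

    /* Subsequent rangeset_add_range() calls fail with -ENOMEM once 16
     * discrete ranges exist in the set. */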

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/rangeset.c      |   56 +++++++++++++++++++++++++++++++++++++-------
 xen/include/xen/rangeset.h |   17 ++++++++++++++
 2 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 2b986fb..93a3ef2 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -25,6 +25,9 @@ struct rangeset {
 
     /* Ordered list of ranges contained in this set, and protecting lock. */
     struct list_head range_list;
+
+    /* Number of ranges that can still be allocated (-1 => no limit) */
+    long             nr_ranges;
     spinlock_t       lock;
 
     /* Pretty-printing name. */
@@ -81,12 +84,30 @@ static void insert_range(
 
 /* Remove a range from its list and free it. */
 static void destroy_range(
-    struct range *x)
+    struct rangeset *r, struct range *x)
 {
+    r->nr_ranges++;
+
     list_del(&x->list);
     xfree(x);
 }
 
+/* Allocate a new range */
+static struct range *alloc_range(
+    struct rangeset *r)
+{
+    struct range *x;
+
+    if ( r->nr_ranges == 0 )
+        return NULL;
+
+    x = xmalloc(struct range);
+    if ( x )
+        --r->nr_ranges;
+
+    return x;
+}
+
 /*****************************
  * Core public functions
  */
@@ -108,7 +129,7 @@ int rangeset_add_range(
     {
         if ( (x == NULL) || ((x->e < s) && ((x->e + 1) != s)) )
         {
-            x = xmalloc(struct range);
+            x = alloc_range(r);
             if ( x == NULL )
             {
                 rc = -ENOMEM;
@@ -143,7 +164,7 @@ int rangeset_add_range(
             y = next_range(r, x);
             if ( (y == NULL) || (y->e > x->e) )
                 break;
-            destroy_range(y);
+            destroy_range(r, y);
         }
     }
 
@@ -151,7 +172,7 @@ int rangeset_add_range(
     if ( (y != NULL) && ((x->e + 1) == y->s) )
     {
         x->e = y->e;
-        destroy_range(y);
+        destroy_range(r, y);
     }
 
  out:
@@ -179,7 +200,7 @@ int rangeset_remove_range(
 
         if ( (x->s < s) && (x->e > e) )
         {
-            y = xmalloc(struct range);
+            y = alloc_range(r);
             if ( y == NULL )
             {
                 rc = -ENOMEM;
@@ -193,7 +214,7 @@ int rangeset_remove_range(
             insert_range(r, x, y);
         }
         else if ( (x->s == s) && (x->e <= e) )
-            destroy_range(x);
+            destroy_range(r, x);
         else if ( x->s == s )
             x->s = e + 1;
         else if ( x->e <= e )
@@ -214,12 +235,12 @@ int rangeset_remove_range(
         {
             t = x;
             x = next_range(r, x);
-            destroy_range(t);
+            destroy_range(r, t);
         }
 
         x->s = e + 1;
         if ( x->s > x->e )
-            destroy_range(x);
+            destroy_range(r, x);
     }
 
  out:
@@ -312,6 +333,7 @@ struct rangeset *rangeset_new(
 
     spin_lock_init(&r->lock);
     INIT_LIST_HEAD(&r->range_list);
+    r->nr_ranges = -1;
 
     BUG_ON(flags & ~RANGESETF_prettyprint_hex);
     r->flags = flags;
@@ -351,11 +373,17 @@ void rangeset_destroy(
     }
 
     while ( (x = first_range(r)) != NULL )
-        destroy_range(x);
+        destroy_range(r, x);
 
     xfree(r);
 }
 
+void rangeset_limit(
+    struct rangeset *r, unsigned int limit)
+{
+    r->nr_ranges = limit;
+}
+
 void rangeset_domain_initialise(
     struct domain *d)
 {
@@ -461,3 +489,13 @@ void rangeset_domain_printk(
 
     spin_unlock(&d->rangesets_lock);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 2c122c1..5ed6817 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -38,6 +38,13 @@ struct rangeset *rangeset_new(
 void rangeset_destroy(
     struct rangeset *r);
 
+/*
+ * Set a limit on the number of ranges that may exist in set @r.
+ * NOTE: This must be called while @r is empty.
+ */
+void rangeset_limit(
+    struct rangeset *r, unsigned int limit);
+
 /* Flags for passing to rangeset_new(). */
  /* Pretty-print range limits in hexadecimal. */
 #define _RANGESETF_prettyprint_hex 0
@@ -77,3 +84,13 @@ void rangeset_domain_printk(
     struct domain *d);
 
 #endif /* __XEN_RANGESET_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
                   ` (5 preceding siblings ...)
  2014-05-09  8:40 ` [PATCH v7 6/9] Add the facility to limit ranges per rangeset Paul Durrant
@ 2014-05-09  8:40 ` Paul Durrant
  2014-05-09  9:34   ` Ian Campbell
  2014-05-12 11:26   ` Jan Beulich
  2014-05-09  8:40 ` [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
  2014-05-09  8:40 ` [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
  8 siblings, 2 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich, Stefano Stabellini

The previous single ioreq server that was created on demand now becomes
the default server, and an API is created to allow secondary servers,
which handle specific IO ranges or PCI devices, to be added.

When the guest issues an IO, the list of secondary servers is checked
for a matching IO range or PCI device. If none is found then the IO
is passed to the default server.

Secondary servers use guest pages to communicate with emulators, in
the same way as the default server. These pages need to be in the
guest physmap, otherwise there is no suitable reference that an
emulator can query in order to map them. Therefore a pool of pages in
the current E820 reserved region, just below the special pages, is
used. Secondary servers allocate from and free to this pool as they
are created and destroyed.

The size of the pool is currently hardcoded in the domain build at a
value of 8. This should be sufficient for now, and both the location
and size of the pool can be modified in future without any need to
change the API.
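
For illustration, a rough sketch of how a secondary emulator might drive
the new libxc calls added by this patch (xch and domid are assumed to be
an open handle and the target domain; the port range and the PCI 02:00.0
co-ordinates are arbitrary; error handling is minimal):

    #include <xenctrl.h>

    static int register_secondary(xc_interface *xch, domid_t domid)
    {
        ioservid_t id;
        xen_pfn_t ioreq_pfn, bufioreq_pfn;
        evtchn_port_t bufioreq_port;

        if ( xc_hvm_create_ioreq_server(xch, domid, &id) )
            return -1;

        if ( xc_hvm_get_ioreq_server_info(xch, domid, id, &ioreq_pfn,
                                          &bufioreq_pfn,
                                          &bufioreq_port) ||
             xc_hvm_map_io_range_to_ioreq_server(xch, domid, id,
                                                 0 /* ports */,
                                                 0xc000, 0xc0ff) ||
             xc_hvm_map_pcidev_to_ioreq_server(xch, domid, id, 0, 2, 0, 0) )
        {
            xc_hvm_destroy_ioreq_server(xch, domid, id);
            return -1;
        }

        /* ... foreign-map ioreq_pfn/bufioreq_pfn, bind bufioreq_port and
         *     start servicing requests; call
         *     xc_hvm_destroy_ioreq_server() on shutdown ... */
        return 0;
    }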

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c          |  224 ++++++++++
 tools/libxc/xc_domain_restore.c  |   48 ++
 tools/libxc/xc_domain_save.c     |   24 +
 tools/libxc/xc_hvm_build_x86.c   |   50 ++-
 tools/libxc/xc_private.c         |   16 +-
 tools/libxc/xenctrl.h            |  143 +++++-
 tools/libxc/xg_save_restore.h    |    3 +
 xen/arch/x86/hvm/hvm.c           |  889 ++++++++++++++++++++++++++++++++++----
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/include/asm-x86/hvm/domain.h |   27 +-
 xen/include/asm-x86/hvm/hvm.h    |    1 +
 xen/include/public/hvm/hvm_op.h  |  117 +++++
 xen/include/public/hvm/ioreq.h   |    9 +-
 xen/include/public/hvm/params.h  |    5 +-
 xen/include/xsm/dummy.h          |    6 +
 xen/include/xsm/xsm.h            |    6 +
 xen/xsm/flask/hooks.c            |    6 +
 17 files changed, 1467 insertions(+), 109 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..a8c9f81 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1284,6 +1284,230 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
     return rc;
 }
 
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_create_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    *id = arg->id;
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *ioreq_pfn,
+                                 xen_pfn_t *bufioreq_pfn,
+                                 evtchn_port_t *bufioreq_port)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( rc != 0 )
+        goto done;
+
+    if ( ioreq_pfn )
+        *ioreq_pfn = arg->ioreq_pfn;
+
+    if ( bufioreq_pfn )
+        *bufioreq_pfn = arg->bufioreq_pfn;
+
+    if ( bufioreq_port )
+        *bufioreq_port = arg->bufioreq_port;
+
+done:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = is_mmio ? HVMOP_IO_RANGE_MEMORY : HVMOP_IO_RANGE_PORT;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = is_mmio ? HVMOP_IO_RANGE_MEMORY : HVMOP_IO_RANGE_PORT;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                      ioservid_t id, uint16_t segment,
+                                      uint8_t bus, uint8_t device,
+                                      uint8_t function)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    if (device > 0x1f || function > 0x7) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_PCI;
+
+    /*
+     * The underlying hypercall will deal with ranges of PCI SBDF
+     * but, for simplicity, the API only uses singletons.
+     */
+    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
+                                           (uint64_t)bus,
+                                           (uint64_t)device,
+                                           (uint64_t)function);
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                          ioservid_t id, uint16_t segment,
+                                          uint8_t bus, uint8_t device,
+                                          uint8_t function)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    if (device > 0x1f || function > 0x7) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_PCI;
+    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
+                                           (uint64_t)bus,
+                                           (uint64_t)device,
+                                           (uint64_t)function);
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index bcb0ae0..0d97555 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -38,6 +38,7 @@
 
 #include <stdlib.h>
 #include <unistd.h>
+#include <inttypes.h>
 
 #include "xg_private.h"
 #include "xg_save_restore.h"
@@ -740,6 +741,8 @@ typedef struct {
     uint64_t acpi_ioport_location;
     uint64_t viridian;
     uint64_t vm_generationid_addr;
+    uint64_t ioreq_server_pfn;
+    uint64_t nr_ioreq_server_pages;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -990,6 +993,26 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         DPRINTF("read generation id buffer address");
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_IOREQ_SERVER_PFN:
+        /* Skip padding 4 bytes then read the ioreq server gmfn base. */
+        if ( RDEXACT(fd, &buf->ioreq_server_pfn, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->ioreq_server_pfn, sizeof(uint64_t)) )
+        {
+            PERROR("error read the ioreq server gmfn base");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
+    case XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES:
+        /* Skip padding 4 bytes then read the ioreq server gmfn count. */
+        if ( RDEXACT(fd, &buf->nr_ioreq_server_pages, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->nr_ioreq_server_pages, sizeof(uint64_t)) )
+        {
+            PERROR("error read the ioreq server gmfn count");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1748,6 +1771,31 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if (pagebuf.viridian != 0)
         xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
 
+    /*
+     * If we are migrating in from a host that does not support
+     * secondary emulators then nr_ioreq_server_pages will be 0, since
+     * there will be no XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES chunk in
+     * the image.
+     * If we are migrating from a host that does support secondary
+     * emulators then the XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES chunk
+     * will exist and is guaranteed to have a non-zero value. The
+     * existence of that chunk also implies the existence of the
+     * XC_SAVE_ID_HVM_IOREQ_SERVER_PFN chunk, which is also guaranteed
+     * to have a non-zero value.
+     */
+    if (!pagebuf.nr_ioreq_server_pages ^ !pagebuf.ioreq_server_pfn) {
+        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES = %"PRIx64" XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
+              pagebuf.nr_ioreq_server_pages, pagebuf.ioreq_server_pfn);
+    } else {
+        if (pagebuf.nr_ioreq_server_pages != 0 &&
+            pagebuf.ioreq_server_pfn != 0) {
+            xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES, 
+                             pagebuf.nr_ioreq_server_pages);
+            xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                             pagebuf.ioreq_server_pfn);
+        }
+    }
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 71f9b59..acf3685 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1737,6 +1737,30 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the viridian flag");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_IOREQ_SERVER_PFN;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the ioreq server gmfn base");
+            goto out;
+        }
+
+        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the ioreq server gmfn count");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index dd3b522..eb8d539 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -49,6 +49,9 @@
 #define NR_SPECIAL_PAGES     8
 #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
 
+#define NR_IOREQ_SERVER_PAGES 8
+#define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
+
 #define VGA_HOLE_SIZE (0x20)
 
 static int modules_init(struct xc_hvm_build_args *args,
@@ -114,7 +117,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -257,6 +260,8 @@ static int setup_guest(xc_interface *xch,
         stat_1gb_pages = 0;
     int pod_mode = 0;
     int claim_enabled = args->claim_enabled;
+    xen_pfn_t special_array[NR_SPECIAL_PAGES];
+    xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
 
     if ( nr_pages > target_pages )
         pod_mode = XENMEMF_populate_on_demand;
@@ -474,18 +479,19 @@ static int setup_guest(xc_interface *xch,
 
     /* Allocate and clear special pages. */
     for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+        special_array[i] = special_pfn(i);
+
+    rc = xc_domain_populate_physmap_exact(xch, dom, NR_SPECIAL_PAGES, 0, 0,
+                                          special_array);
+    if ( rc != 0 )
     {
-        xen_pfn_t pfn = special_pfn(i);
-        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
-        if ( rc != 0 )
-        {
-            PERROR("Could not allocate %d'th special page.", i);
-            goto error_out;
-        }
-        if ( xc_clear_domain_page(xch, dom, special_pfn(i)) )
-            goto error_out;
+        PERROR("Could not allocate special pages.");
+        goto error_out;
     }
 
+    if ( xc_clear_domain_pages(xch, dom, special_pfn(0), NR_SPECIAL_PAGES) )
+        goto error_out;
+
     xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
     xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
@@ -502,6 +508,30 @@ static int setup_guest(xc_interface *xch,
                      special_pfn(SPECIALPAGE_SHARING));
 
     /*
+     * Allocate and clear additional ioreq server pages. The default
+     * server will use the IOREQ and BUFIOREQ special pages above.
+     */
+    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
+        ioreq_server_array[i] = ioreq_server_pfn(i);
+
+    rc = xc_domain_populate_physmap_exact(xch, dom, NR_IOREQ_SERVER_PAGES, 0, 0,
+                                          ioreq_server_array);
+    if ( rc != 0 )
+    {
+        PERROR("Could not allocate ioreq server pages.");
+        goto error_out;
+    }
+
+    if ( xc_clear_domain_pages(xch, dom, ioreq_server_pfn(0), NR_IOREQ_SERVER_PAGES) )
+        goto error_out;
+
+    /* Tell the domain where the pages are and how many there are */
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     ioreq_server_pfn(0));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     NR_IOREQ_SERVER_PAGES);
+
+    /*
      * Identity-map page table is required for running with CR0.PG=0 when
      * using Intel EPT. Create a 32-bit non-PAE page directory of superpages.
      */
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 0e18892..9aeee08 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -628,17 +628,19 @@ int xc_copy_to_domain_page(xc_interface *xch,
     return 0;
 }
 
-int xc_clear_domain_page(xc_interface *xch,
-                         uint32_t domid,
-                         unsigned long dst_pfn)
+int xc_clear_domain_pages(xc_interface *xch,
+                          uint32_t domid,
+                          unsigned long dst_pfn,
+                          int num)
 {
+    size_t size = num * PAGE_SIZE;
     void *vaddr = xc_map_foreign_range(
-        xch, domid, PAGE_SIZE, PROT_WRITE, dst_pfn);
+        xch, domid, size, PROT_WRITE, dst_pfn);
     if ( vaddr == NULL )
         return -1;
-    memset(vaddr, 0, PAGE_SIZE);
-    munmap(vaddr, PAGE_SIZE);
-    xc_domain_cacheflush(xch, domid, dst_pfn, 1);
+    memset(vaddr, 0, size);
+    munmap(vaddr, size);
+    xc_domain_cacheflush(xch, domid, dst_pfn, num);
     return 0;
 }
 
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 02129f7..3260a56 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1393,8 +1393,14 @@ int xc_get_pfn_list(xc_interface *xch, uint32_t domid, uint64_t *pfn_buf,
 int xc_copy_to_domain_page(xc_interface *xch, uint32_t domid,
                            unsigned long dst_pfn, const char *src_page);
 
-int xc_clear_domain_page(xc_interface *xch, uint32_t domid,
-                         unsigned long dst_pfn);
+int xc_clear_domain_pages(xc_interface *xch, uint32_t domid,
+                          unsigned long dst_pfn, int num);
+
+static inline int xc_clear_domain_page(xc_interface *xch, uint32_t domid,
+                                       unsigned long dst_pfn)
+{
+    return xc_clear_domain_pages(xch, domid, dst_pfn, 1);
+}
 
 int xc_mmuext_op(xc_interface *xch, struct mmuext_op *op, unsigned int nr_ops,
                  domid_t dom);
@@ -1787,6 +1793,129 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/*
+ * IOREQ Server API. (See section on IOREQ Servers in public/hvm_op.h).
+ */
+
+/**
+ * This function instantiates an IOREQ Server.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id pointer to an ioservid_t to receive the IOREQ Server id.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id);
+
+/**
+ * This function retrieves the necessary information to allow an
+ * emulator to use an IOREQ Server.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm ioreq_pfn pointer to a xen_pfn_t to receive the synchronous ioreq gmfn
+ * @parm bufioreq_pfn pointer to a xen_pfn_t to receive the buffered ioreq gmfn
+ * @parm bufioreq_port pointer to a evtchn_port_t to receive the buffered ioreq event channel
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *ioreq_pfn,
+                                 xen_pfn_t *bufioreq_pfn,
+                                 evtchn_port_t *bufioreq_port);
+
+/**
+ * This function registers a range of memory or I/O ports for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm is_mmio is this a range of ports or memory
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
+                                        domid_t domid,
+                                        ioservid_t id,
+                                        int is_mmio,
+                                        uint64_t start,
+                                        uint64_t end);
+
+/**
+ * This function deregisters a range of memory or I/O ports for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm is_mmio is this a range of ports or memory
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            int is_mmio,
+                                            uint64_t start,
+                                            uint64_t end);
+
+/**
+ * This function registers a PCI device for config space emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm segment the PCI segment of the device
+ * @parm bus the PCI bus of the device
+ * @parm device the 'slot' number of the device
+ * @parm function the function number of the device
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
+                                      domid_t domid,
+                                      ioservid_t id,
+                                      uint16_t segment,
+                                      uint8_t bus,
+                                      uint8_t device,
+                                      uint8_t function);
+
+/**
+ * This function deregisters a PCI device for config space emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm segment the PCI segment of the device
+ * @parm bus the PCI bus of the device
+ * @parm device the 'slot' number of the device
+ * @parm function the function number of the device
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t segment,
+                                          uint8_t bus,
+                                          uint8_t device,
+                                          uint8_t function);
+
+/**
+ * This function destroys an IOREQ Server.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
@@ -2425,3 +2554,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
 int xc_kexec_unload(xc_interface *xch, int type);
 
 #endif /* XENCTRL_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index f859621..69ea64e 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -259,6 +259,9 @@
 #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
 #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
 #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
+/* These are a pair; it is an error for one to exist without the other */
+#define XC_SAVE_ID_HVM_IOREQ_SERVER_PFN -19
+#define XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES -20
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index cdcb25b..58cc478 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -66,6 +66,7 @@
 #include <asm/mem_event.h>
 #include <asm/mem_access.h>
 #include <public/mem_event.h>
+#include <xen/rangeset.h>
 
 bool_t __read_mostly hvm_enabled;
 
@@ -375,27 +376,35 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    if ( !s )
-        return 0;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+ 
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        ioreq_t *p = get_ioreq(s, v);
+ 
+        if ( p->state != STATE_IOREQ_NONE )
+            return 1;
+    }
 
-    p = get_ioreq(s, v);
-    return p->state != STATE_IOREQ_NONE;
+    return 0;
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( s )
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
     {
         ioreq_t *p = get_ioreq(s, v);
 
@@ -430,6 +439,32 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
+static int hvm_alloc_ioreq_gmfn(struct domain *d, unsigned long *gmfn)
+{
+    unsigned int i;
+    int rc;
+
+    rc = -ENOMEM;
+    for ( i = 0; i < d->arch.hvm_domain.ioreq_gmfn.count; i++ )
+    {
+        if ( !test_and_set_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask) )
+        {
+            *gmfn = d->arch.hvm_domain.ioreq_gmfn.base + i;
+            rc = 0;
+            break;
+        }
+    }
+
+    return rc;
+}
+
+static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
+{
+    unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;
+
+    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
+}
+
 void destroy_ring_for_helper(
     void **_va, struct page_info *page)
 {
@@ -514,6 +549,7 @@ static int hvm_map_ioreq_page(
 
     iorp->va = va;
     iorp->page = page;
+    iorp->gmfn = gmfn;
 
     return 0;
 }
@@ -544,6 +580,31 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int hvm_access_cf8(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct domain *d = current->domain;
+
+    if ( bytes != 4 )
+        return X86EMUL_UNHANDLEABLE;
+
+    BUG_ON(port < 0xcf8);
+    port -= 0xcf8;
+
+    if ( dir == IOREQ_WRITE )
+    {
+        d->arch.hvm_domain.pci_cf8 = *val;
+
+        /* We always need to fall through to the catch all emulator */
+        return X86EMUL_UNHANDLEABLE;
+    }
+    else
+    {
+        *val = d->arch.hvm_domain.pci_cf8;
+        return X86EMUL_OKAY;
+    }
+}
+
 static int handle_pvh_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -572,7 +633,7 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
 }
 
 static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
+                                     bool_t is_default, struct vcpu *v)
 {
     struct hvm_ioreq_vcpu *sv;
     int rc;
@@ -600,8 +661,9 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
             goto fail3;
 
         s->bufioreq_evtchn = rc;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
-            s->bufioreq_evtchn;
+        if ( is_default )
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+                s->bufioreq_evtchn;
     }
 
     sv->vcpu = v;
@@ -678,41 +740,127 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
+                                      bool_t is_default)
 {
     struct domain *d = s->domain;
-    unsigned long pfn;
+    unsigned long ioreq_pfn, bufioreq_pfn;
     int rc;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
-    rc = hvm_map_ioreq_page(s, 0, pfn);
+    if ( is_default ) {
+        ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+        bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    } else {
+        rc = hvm_alloc_ioreq_gmfn(d, &ioreq_pfn);
+        if ( rc )
+            goto fail1;
+
+        rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
+        if ( rc )
+            goto fail2;
+    }
+
+    rc = hvm_map_ioreq_page(s, 0, ioreq_pfn);
     if ( rc )
-        return rc;
+        goto fail3;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
-    rc = hvm_map_ioreq_page(s, 1, pfn);
+    rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
     if ( rc )
-        goto fail;
+        goto fail4;
 
     return 0;
 
-fail:
+fail4:
     hvm_unmap_ioreq_page(s, 0);
+
+fail3:
+    if ( !is_default )
+        hvm_free_ioreq_gmfn(d, bufioreq_pfn);
+
+fail2:
+    if ( !is_default )
+        hvm_free_ioreq_gmfn(d, ioreq_pfn);
+
+fail1:
     return rc;
 }
 
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s, 
+                                         bool_t is_default)
 {
+    struct domain *d = s->domain;
+
     hvm_unmap_ioreq_page(s, 1);
     hvm_unmap_ioreq_page(s, 0);
+
+    if ( !is_default ) {
+        hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+        hvm_free_ioreq_gmfn(d, s->ioreq.gmfn);
+    }
+}
+
+static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
+                                            bool_t is_default)
+{
+    int i;
+    int rc;
+
+    if ( is_default )
+        goto done;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) {
+        char *name;
+
+        rc = asprintf(&name, "ioreq_server %d %s", s->id,
+                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
+                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
+                      "");
+        if ( rc )
+            goto fail;
+
+        s->range[i] = rangeset_new(s->domain, name,
+                                   RANGESETF_prettyprint_hex);
+
+        xfree(name);
+
+        rc = -ENOMEM;
+        if ( !s->range[i] )
+            goto fail;
+
+        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+    }
+
+ done:
+    return 0;
+
+ fail:
+    while ( --i >= 0 )
+        rangeset_destroy(s->range[i]);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s, 
+                                            bool_t is_default)
+{
+    unsigned int i;
+
+    if ( is_default )
+        return;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+        rangeset_destroy(s->range[i]);
 }
 
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
-                                 domid_t domid)
+                                 domid_t domid, bool_t is_default,
+                                 ioservid_t id)
 {
     struct vcpu *v;
     int rc;
 
+    s->id = id;
     s->domain = d;
     s->domid = domid;
 
@@ -720,33 +868,70 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    rc = hvm_ioreq_server_map_pages(s);
+    rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
     if ( rc )
-        return rc;
+        goto fail1;
+
+    rc = hvm_ioreq_server_map_pages(s, is_default);
+    if ( rc )
+        goto fail2;
 
     for_each_vcpu ( d, v )
     {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
+        rc = hvm_ioreq_server_add_vcpu(s, is_default, v);
         if ( rc )
-            goto fail;
+            goto fail3;
     }
 
     return 0;
 
- fail:
+ fail3:
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_unmap_pages(s, is_default);
+
+ fail2:
+    hvm_ioreq_server_free_rangesets(s, is_default);
 
+ fail1:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
     return rc;
 }
 
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
+                                    bool_t is_default)
 {
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_unmap_pages(s, is_default);
+    hvm_ioreq_server_free_rangesets(s, is_default);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static ioservid_t next_ioservid(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    static ioservid_t id;
+
+    ASSERT(spin_is_locked(&d->arch.hvm_domain.ioreq_server.lock));
+
+    id++;
+
+ again:
+    /* Check for uniqueness */
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( id == s->id )
+        {
+            id++;
+            goto again;
+        }
+    }
+
+    return id;
+}
+
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
+                                   bool_t is_default, ioservid_t *id)
 {
     struct hvm_ioreq_server *s;
     int rc;
@@ -757,25 +942,35 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
         goto fail1;
 
     domain_pause(d);
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
 
     rc = -EEXIST;
-    if ( d->arch.hvm_domain.ioreq_server != NULL )
+    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
         goto fail2;
 
-    rc = hvm_ioreq_server_init(s, d, domid);
+    rc = hvm_ioreq_server_init(s, d, domid, is_default,
+                               next_ioservid(d));
     if ( rc )
-        goto fail2;
+        goto fail3;
+
+    list_add(&s->list_entry,
+             &d->arch.hvm_domain.ioreq_server.list);
+    d->arch.hvm_domain.ioreq_server.count++;
 
-    d->arch.hvm_domain.ioreq_server = s;
+    if ( is_default )
+        d->arch.hvm_domain.default_ioreq_server = s;
 
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    if ( id != NULL )
+        *id = s->id;
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
     domain_unpause(d);
 
     return 0;
 
+ fail3:
  fail2:
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
     domain_unpause(d);
 
     xfree(s);
@@ -783,24 +978,248 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
     struct hvm_ioreq_server *s;
+    int rc;
 
-    domain_pause(d);
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
     {
-        d->arch.hvm_domain.ioreq_server = NULL;
-        hvm_ioreq_server_deinit(s);
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id != id )
+            continue;
+
+        domain_pause(d);
+
+        --d->arch.hvm_domain.ioreq_server.count;
+        list_del(&s->list_entry);
+        
+        hvm_ioreq_server_deinit(s, 0);
+
+        domain_unpause(d);
+
+        xfree(s);
+
+        rc = 0;
+        break;
     }
 
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-    domain_unpause(d);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
 
-    xfree(s);
+    return rc;
+}
+
+static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                                     unsigned long *ioreq_pfn,
+                                     unsigned long *bufioreq_pfn,
+                                     evtchn_port_t *bufioreq_port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id != id )
+            continue;
+
+        *ioreq_pfn = s->ioreq.gmfn;
+        *bufioreq_pfn = s->bufioreq.gmfn;
+        *bufioreq_port = s->bufioreq_evtchn;
+
+        rc = 0;
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
+static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                            uint32_t type, uint64_t start, uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            struct rangeset *r;
+
+            switch ( type )
+            {
+            case HVMOP_IO_RANGE_PORT:
+            case HVMOP_IO_RANGE_MEMORY:
+            case HVMOP_IO_RANGE_PCI:
+                r = s->range[type];
+                break;
+
+            default:
+                r = NULL;
+                break;
+            }
+
+            rc = -EINVAL;
+            if ( !r )
+                break;
+
+            rc = rangeset_add_range(r, start, end);
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
+static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                                uint32_t type, uint64_t start, uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            struct rangeset *r;
+
+            switch ( type )
+            {
+            case HVMOP_IO_RANGE_PORT:
+            case HVMOP_IO_RANGE_MEMORY:
+            case HVMOP_IO_RANGE_PCI:
+                r = s->range[type];
+                break;
+
+            default:
+                r = NULL;
+                break;
+            }
+
+            rc = -EINVAL;
+            if ( !r )
+                break;
+
+            rc = rangeset_remove_range(r, start, end);
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
+static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        rc = hvm_ioreq_server_add_vcpu(s, is_default, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return 0;
+
+ fail:
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
+static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+}
+
+static void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct hvm_ioreq_server *s, *next;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    list_for_each_entry_safe ( s,
+                               next,
+                               &d->arch.hvm_domain.ioreq_server.list,
+                               list_entry )
+    {
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        domain_pause(d);
+
+        if ( is_default )
+            d->arch.hvm_domain.default_ioreq_server = NULL;
+
+        --d->arch.hvm_domain.ioreq_server.count;
+        list_del(&s->list_entry);
+        
+        hvm_ioreq_server_deinit(s, is_default);
+
+        domain_unpause(d);
+
+        xfree(s);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
 }
 
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
@@ -823,14 +1242,14 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
     struct hvm_ioreq_server *s;
     int rc = 0;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
 
     /*
      * Lack of ioreq server is not a failure. HVM_PARAM_DM_DOMAIN will
      * still be set and thus, when the server is created, it will have
      * the correct domid.
      */
-    s = d->arch.hvm_domain.ioreq_server;
+    s = d->arch.hvm_domain.default_ioreq_server;
     if ( !s )
         goto done;
 
@@ -871,7 +1290,7 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
     domain_unpause(d);
 
  done:
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
     return rc;
 }
 
@@ -902,7 +1321,8 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
-    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -944,6 +1364,7 @@ int hvm_domain_initialise(struct domain *d)
     rtc_init(d);
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
@@ -974,7 +1395,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_server(d);
+    hvm_destroy_all_ioreq_servers(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1607,7 +2028,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1650,14 +2070,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-
+    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
     if ( rc != 0 )
         goto fail6;
 
@@ -1694,15 +2107,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
-        hvm_ioreq_server_remove_vcpu(s, v);
-
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    hvm_all_ioreq_servers_remove_vcpu(d, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1741,11 +2147,95 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
+static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                        ioreq_t *p)
+{
+#define CF8_BDF(cf8)     (((cf8) & 0x00ffff00) >> 8)
+#define CF8_ADDR_LO(cf8) ((cf8) & 0x000000fc)
+#define CF8_ADDR_HI(cf8) (((cf8) & 0x0f000000) >> 16)
+#define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
+
+    struct hvm_ioreq_server *s;
+    uint32_t cf8;
+    uint8_t type;
+    uint64_t addr;
+
+    if ( d->arch.hvm_domain.ioreq_server.count <= 1 ||
+         (p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO) )
+        return d->arch.hvm_domain.default_ioreq_server;
+
+    cf8 = d->arch.hvm_domain.pci_cf8;
+        
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc &&
+         CF8_ENABLED(cf8) )
+    {
+        uint32_t sbdf;
+
+        /* PCI config data cycle */
+
+        sbdf = HVMOP_PCI_SBDF(0,
+                              PCI_BUS(CF8_BDF(cf8)),
+                              PCI_SLOT(CF8_BDF(cf8)),
+                              PCI_FUNC(CF8_BDF(cf8)));
+
+        type = IOREQ_TYPE_PCI_CONFIG;
+        addr = ((uint64_t)sbdf << 32) |
+               CF8_ADDR_HI(cf8) |
+               CF8_ADDR_LO(cf8) |
+               (p->addr & 3);
+    }
+    else
+    {
+        type = p->type;
+        addr = p->addr;
+    }
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        struct rangeset *r;
+
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
+        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
+        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
+        r = s->range[type];
+
+        switch ( type )
+        {
+        case IOREQ_TYPE_PIO:
+        case IOREQ_TYPE_COPY:
+            if ( rangeset_contains_singleton(r, addr) )
+                return s;
+
+            break;
+        case IOREQ_TYPE_PCI_CONFIG:
+            if ( rangeset_contains_singleton(r, addr >> 32) ) {
+                p->type = type;
+                p->addr = addr;
+                return s;
+            }
+
+            break;
+        }
+    }
+
+    return d->arch.hvm_domain.default_ioreq_server;
+
+#undef CF8_ENABLED
+#undef CF8_ADDR_HI
+#undef CF8_ADDR_LO
+#undef CF8_BDF
+}
+
 int hvm_buffered_io_send(ioreq_t *p)
 {
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s = hvm_select_ioreq_server(d, p);
     struct hvm_ioreq_page *iorp;
     buffered_iopage_t *pg;
     buf_ioreq_t bp = { .data = p->data,
@@ -1825,23 +2315,20 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq_server;
+    return !list_empty(&d->arch.hvm_domain.ioreq_server.list);
 }
 
-bool_t hvm_send_assist_req(ioreq_t *proto_p)
+bool_t hvm_send_assist_req_to_ioreq_server(struct hvm_ioreq_server *s,
+                                           ioreq_t *proto_p)
 {
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
     ioreq_t *p;
 
-    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
+    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !s )
-        return 0;
-
-    p = get_ioreq(s, v);
+    p = get_ioreq(s, curr);
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
@@ -1867,6 +2354,29 @@ bool_t hvm_send_assist_req(ioreq_t *proto_p)
     return 1;
 }
 
+bool_t hvm_send_assist_req(ioreq_t *p)
+{
+    struct hvm_ioreq_server *s = hvm_select_ioreq_server(current->domain, p);
+
+    if ( !s )
+        return 0;
+
+    return hvm_send_assist_req_to_ioreq_server(s, p);
+}
+
+void hvm_broadcast_assist_req(ioreq_t *p)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s;
+
+    ASSERT(p->type == IOREQ_TYPE_INVALIDATE);
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+        (void) hvm_send_assist_req_to_ioreq_server(s, p);
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4477,6 +4987,165 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static int hvmop_create_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
+{
+    struct domain *curr_d = current->domain;
+    xen_hvm_create_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
+    if ( rc != 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_get_ioreq_server_info(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
+{
+    xen_hvm_get_ioreq_server_info_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_get_ioreq_server_info(d, op.id,
+                                   &op.ioreq_pfn,
+                                   &op.bufioreq_pfn, 
+                                   &op.bufioreq_port);
+    if ( rc != 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_io_range_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_io_range_t) uop)
+{
+    xen_hvm_io_range_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.type,
+                                          op.start, op.end);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_io_range_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_io_range_t) uop)
+{
+    xen_hvm_io_range_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.type,
+                                              op.start, op.end);
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_destroy_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
+{
+    xen_hvm_destroy_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_destroy_ioreq_server(d, op.id);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -4488,6 +5157,31 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( op &= HVMOP_op_mask )
     {
+    case HVMOP_create_ioreq_server:
+        rc = hvmop_create_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
+        break;
+    
+    case HVMOP_get_ioreq_server_info:
+        rc = hvmop_get_ioreq_server_info(
+            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
+        break;
+    
+    case HVMOP_map_io_range_to_ioreq_server:
+        rc = hvmop_map_io_range_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_io_range_t));
+        break;
+    
+    case HVMOP_unmap_io_range_from_ioreq_server:
+        rc = hvmop_unmap_io_range_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_io_range_t));
+        break;
+    
+    case HVMOP_destroy_ioreq_server:
+        rc = hvmop_destroy_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
+        break;
+    
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -4641,6 +5335,25 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value > SHUTDOWN_MAX )
                     rc = -EINVAL;
                 break;
+            case HVM_PARAM_IOREQ_SERVER_PFN:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
+                d->arch.hvm_domain.ioreq_gmfn.base = a.value;
+                break;
+            case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
+                if ( a.value == 0 ||
+                     a.value > sizeof(unsigned long) * 8 ) {
+                    rc = -EINVAL;
+                    break;
+                }
+                d->arch.hvm_domain.ioreq_gmfn.count = a.value;
+                break;
             }
 
             if ( rc == 0 ) 
@@ -4674,6 +5387,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_SERVER_PFN:
+            case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
             case HVM_PARAM_IOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN: {
@@ -4681,7 +5400,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 
                 /* May need to create server */
                 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-                rc = hvm_create_ioreq_server(d, domid);
+                rc = hvm_create_ioreq_server(d, domid, 1, NULL);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
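
As a side note on the two new parameters handled above: they are only accepted from outside the target domain (the guest itself gets -EPERM). Below is a minimal sketch, not part of the patch, of how a domain builder might reserve the gmfn range for secondary servers, assuming the existing xc_set_hvm_param() helper and the parameter names introduced by this series; the actual libxc domain-build changes are not shown in this excerpt.

#include <xenctrl.h>
#include <xen/hvm/params.h>

/*
 * Sketch only: reserve a run of gmfns, starting at a builder-chosen pfn
 * in the guest's special-page area, for use by secondary ioreq servers.
 * Must be issued by the toolstack domain, not by the guest itself.
 */
static int demo_reserve_ioreq_server_pages(xc_interface *xch, domid_t domid,
                                           unsigned long base_pfn,
                                           unsigned long nr_pages)
{
    int rc;

    rc = xc_set_hvm_param(xch, domid, HVM_PARAM_IOREQ_SERVER_PFN, base_pfn);
    if ( rc < 0 )
        return rc;

    /*
     * nr_pages must be non-zero and no more than sizeof(unsigned long) * 8,
     * since Xen tracks the pages with a one-bit-per-page allocation mask.
     */
    return xc_set_hvm_param(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                            nr_pages);
}
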
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index f5ad9be..bf68837 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -74,7 +74,7 @@ void send_invalidate_req(void)
         .data = ~0UL, /* flush all */
     };
 
-    (void)hvm_send_assist_req(&p);
+    hvm_broadcast_assist_req(&p);
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 1b0514c..a3e2727 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -34,8 +34,10 @@
 #include <public/grant_table.h>
 #include <public/hvm/params.h>
 #include <public/hvm/save.h>
+#include <public/hvm/hvm_op.h>
 
 struct hvm_ioreq_page {
+    unsigned long gmfn;
     struct page_info *page;
     void *va;
 };
@@ -46,7 +48,11 @@ struct hvm_ioreq_vcpu {
     evtchn_port_t    ioreq_evtchn;
 };
 
+#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
+#define MAX_NR_IO_RANGES  256
+
 struct hvm_ioreq_server {
+    struct list_head       list_entry;
     struct domain          *domain;
 
     /* Lock to serialize toolstack modifications */
@@ -54,6 +60,7 @@ struct hvm_ioreq_server {
 
     /* Domain id of emulating domain */
     domid_t                domid;
+    ioservid_t             id;
     struct hvm_ioreq_page  ioreq;
     struct list_head       ioreq_vcpu_list;
     struct hvm_ioreq_page  bufioreq;
@@ -61,11 +68,27 @@ struct hvm_ioreq_server {
     /* Lock to serialize access to buffered ioreq ring */
     spinlock_t             bufioreq_lock;
     evtchn_port_t          bufioreq_evtchn;
+    struct rangeset        *range[NR_IO_RANGE_TYPES];
 };
 
 struct hvm_domain {
-    spinlock_t              ioreq_server_lock;
-    struct hvm_ioreq_server *ioreq_server;
+    /* Guest page range used for non-default ioreq servers */
+    struct {
+        unsigned long base;
+        unsigned long mask;
+        unsigned int  count;
+    } ioreq_gmfn;
+
+    /* Lock protects all other values in the sub-struct and the default */
+    struct {
+        spinlock_t       lock;
+        unsigned int     count;
+        struct list_head list;
+    } ioreq_server;
+    struct hvm_ioreq_server *default_ioreq_server;
+
+    /* Cached CF8 for guest PCI config cycles */
+    uint32_t                pci_cf8;
 
     struct pl_time         pl_time;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 251625d..f97e594 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -233,6 +233,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
 bool_t hvm_send_assist_req(ioreq_t *p);
+void hvm_broadcast_assist_req(ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index f00f6d2..a335235 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,7 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param           0
@@ -232,6 +233,122 @@ struct xen_hvm_inject_msi {
 typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
 
+/*
+ * IOREQ Servers
+ *
+ * The interface between an I/O emulator and Xen is called an IOREQ Server.
+ * A domain supports a single 'legacy' IOREQ Server which is instantiated if
+ * parameter...
+ *
+ * HVM_PARAM_IOREQ_PFN is read (to get the gmfn containing the synchronous
+ * ioreq structures), or...
+ * HVM_PARAM_BUFIOREQ_PFN is read (to get the gmfn containing the buffered
+ * ioreq ring), or...
+ * HVM_PARAM_BUFIOREQ_EVTCHN is read (to get the event channel that Xen uses
+ * to request buffered I/O emulation).
+ * 
+ * The following hypercalls facilitate the creation of IOREQ Servers for
+ * 'secondary' emulators which are invoked to implement port I/O, memory, or
+ * PCI config space ranges which they explicitly register.
+ */
+
+typedef uint16_t ioservid_t;
+
+/*
+ * HVMOP_create_ioreq_server: Instantiate a new IOREQ Server for a secondary
+ *                            emulator servicing domain <domid>.
+ *
+ * The <id> handed back is unique for <domid>.
+ */
+#define HVMOP_create_ioreq_server 17
+struct xen_hvm_create_ioreq_server {
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* OUT - server id */
+};
+typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
+
+/*
+ * HVMOP_get_ioreq_server_info: Get all the information necessary to access
+ *                              IOREQ Server <id>. 
+ *
+ * The emulator needs to map the synchronous ioreq structures and buffered
+ * ioreq ring that Xen uses to request emulation. These are hosted in domain
+ * <domid>'s gmfns <ioreq_pfn> and <bufioreq_pfn> respectively. In addition the
+ * emulator needs to bind to event channel <bufioreq_port> to listen for
+ * buffered emulation requests. (The event channels used for synchronous
+ * emulation requests are specified in the per-CPU ioreq structures in
+ * <ioreq_pfn>).
+ */
+#define HVMOP_get_ioreq_server_info 18
+struct xen_hvm_get_ioreq_server_info {
+    domid_t domid;               /* IN - domain to be serviced */
+    ioservid_t id;               /* IN - server id */
+    evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
+    xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
+    xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
+};
+typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
+
+/*
+ * HVMOP_map_io_range_to_ioreq_server: Register an I/O range of domain <domid>
+ *                                     for emulation by the client of IOREQ
+ *                                     Server <id>
+ * HVMOP_unmap_io_range_from_ioreq_server: Deregister an I/O range of <domid>
+ *                                         for emulation by the client of IOREQ
+ *                                         Server <id>
+ *
+ * There are three types of I/O that can be emulated: port I/O, memory accesses
+ * and PCI config space accesses. The <type> field denotes which type of range
+ * the <start> and <end> (inclusive) fields are specifying.
+ * PCI config space ranges are specified by segment/bus/device/function values
+ * which should be encoded using the HVMOP_PCI_SBDF helper macro below.
+ */
+#define HVMOP_map_io_range_to_ioreq_server 19
+#define HVMOP_unmap_io_range_from_ioreq_server 20
+struct xen_hvm_io_range {
+    domid_t domid;               /* IN - domain to be serviced */
+    ioservid_t id;               /* IN - server id */
+    uint32_t type;               /* IN - type of range */
+# define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
+# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
+# define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+    uint64_aligned_t start, end; /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_io_range xen_hvm_io_range_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_io_range_t);
+
+#define HVMOP_PCI_SBDF(s,b,d,f)                 \
+	((((s) & 0xffff) << 16) |                   \
+	 (((b) & 0xff) << 8) |                      \
+	 (((d) & 0x1f) << 3) |                      \
+	 ((f) & 0x07))
+
+/*
+ * HVMOP_destroy_ioreq_server: Destroy the IOREQ Server <id> servicing domain
+ *                             <domid>.
+ *
+ * Any registered I/O ranges will be automatically deregistered.
+ */
+#define HVMOP_destroy_ioreq_server 21
+struct xen_hvm_destroy_ioreq_server {
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* IN - server id */
+};
+typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
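
To make the intended flow concrete, here is a rough sketch of a secondary emulator attaching to a guest through the hypercalls documented above, via their libxenctrl wrappers. It is illustrative only: the port range and PCI device used are arbitrary, and the prototype assumed for the range-mapping wrapper simply mirrors xen_hvm_io_range_t; the authoritative signatures live in the tools/libxc half of this patch, which is not repeated here.

#include <xenctrl.h>
#include <xen/hvm/hvm_op.h>

static int demo_attach(domid_t domid)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    ioservid_t id;
    xen_pfn_t ioreq_pfn, bufioreq_pfn;
    evtchn_port_t bufioreq_port;
    int rc;

    if ( !xch )
        return -1;

    /* Instantiate a new (secondary) IOREQ Server for the guest. */
    rc = xc_hvm_create_ioreq_server(xch, domid, &id);
    if ( rc < 0 )
        goto out;

    /* Learn which gmfns to map and which event channel carries
     * buffered requests. */
    rc = xc_hvm_get_ioreq_server_info(xch, domid, id, &ioreq_pfn,
                                      &bufioreq_pfn, &bufioreq_port);
    if ( rc < 0 )
        goto out;

    /* Claim a 16-byte I/O port range (example values)... */
    rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, id,
                                             HVMOP_IO_RANGE_PORT,
                                             0xc100, 0xc10f);
    if ( rc < 0 )
        goto out;

    /* ...and PCI config space for a single device, 0000:00:03.0.
     * (Wrapper prototype assumed to mirror xen_hvm_io_range_t.) */
    rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, id,
                                             HVMOP_IO_RANGE_PCI,
                                             HVMOP_PCI_SBDF(0, 0, 3, 0),
                                             HVMOP_PCI_SBDF(0, 0, 3, 0));

 out:
    xc_interface_close(xch);
    return rc;
}

Teardown is the reverse: xc_hvm_destroy_ioreq_server() drops the server and, as documented above, deregisters any remaining ranges automatically.
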
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index f05d130..5b5fedf 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -34,13 +34,20 @@
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
+#define IOREQ_TYPE_PCI_CONFIG   2
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
 /*
  * VMExit dispatcher should cooperate with instruction decoder to
  * prepare this structure and notify service OS and DM by sending
- * virq
+ * virq.
+ *
+ * For I/O type IOREQ_TYPE_PCI_CONFIG, the physical address is formatted
+ * as follows:
+ * 
+ * 63....48|47..40|39..35|34..32|31........0
+ * SEGMENT |BUS   |DEV   |FN    |OFFSET
  */
 struct ioreq {
     uint64_t addr;          /* physical address */
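
For reference, this encoding is exactly what hvm_select_ioreq_server() assembles from the guest's most recent 0xcf8 write when it forwards a 0xcfc data access. A self-contained illustration, reusing the same masks as that function (segment is always 0 for CF8-originated cycles, so the BDF bits land directly in bits 47..32):

#include <stdint.h>

#define CF8_BDF(cf8)     (((cf8) & 0x00ffff00) >> 8)   /* bus/dev/fn */
#define CF8_ADDR_LO(cf8) ((cf8) & 0x000000fc)          /* dword-aligned reg */
#define CF8_ADDR_HI(cf8) (((cf8) & 0x0f000000) >> 16)  /* extended reg bits 11..8 */

/* Build the IOREQ_TYPE_PCI_CONFIG address from the latched CF8 value and
 * the guest's data-port address (0xcfc..0xcff). */
static uint64_t pci_config_addr(uint32_t cf8, uint32_t data_port)
{
    return ((uint64_t)CF8_BDF(cf8) << 32) |
           CF8_ADDR_HI(cf8) |
           CF8_ADDR_LO(cf8) |
           (data_port & 3);
}

For example, a CF8 value of 0x80001808 followed by a byte access to 0xcfc yields 0x0000001800000008: segment 0, bus 0, device 3, function 0, offset 8. hvm_select_ioreq_server() then checks the top 32 bits (the SBDF) against each server's HVMOP_IO_RANGE_PCI rangeset.
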
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 517a184..f830bdd 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -145,6 +145,9 @@
 /* SHUTDOWN_* action in case of a triple fault */
 #define HVM_PARAM_TRIPLE_FAULT_REASON 31
 
-#define HVM_NR_PARAMS          32
+#define HVM_PARAM_IOREQ_SERVER_PFN 32
+#define HVM_PARAM_NR_IOREQ_SERVER_PAGES 33
+
+#define HVM_NR_PARAMS          34
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 5de4ad4..510be41 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -538,6 +538,12 @@ static XSM_INLINE int xsm_hvm_inject_msi(XSM_DEFAULT_ARG struct domain *d)
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_hvm_ioreq_server(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_DM_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+
 static XSM_INLINE int xsm_mem_event_control(XSM_DEFAULT_ARG struct domain *d, int mode, int op)
 {
     XSM_ASSERT_ACTION(XSM_PRIV);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 0c85ca6..9b2ccef 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -146,6 +146,7 @@ struct xsm_operations {
     int (*hvm_set_isa_irq_level) (struct domain *d);
     int (*hvm_set_pci_link_route) (struct domain *d);
     int (*hvm_inject_msi) (struct domain *d);
+    int (*hvm_ioreq_server) (struct domain *d);
     int (*mem_event_control) (struct domain *d, int mode, int op);
     int (*mem_event_op) (struct domain *d, int op);
     int (*mem_sharing_op) (struct domain *d, struct domain *cd, int op);
@@ -558,6 +559,11 @@ static inline int xsm_hvm_inject_msi (xsm_default_t def, struct domain *d)
     return xsm_ops->hvm_inject_msi(d);
 }
 
+static inline int xsm_hvm_ioreq_server (xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->hvm_ioreq_server(d);
+}
+
 static inline int xsm_mem_event_control (xsm_default_t def, struct domain *d, int mode, int op)
 {
     return xsm_ops->mem_event_control(d, mode, op);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 3eb6c1e..9203e98 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1286,6 +1286,11 @@ static int flask_hvm_inject_msi(struct domain *d)
     return current_has_perm(d, SECCLASS_HVM, HVM__SEND_IRQ);
 }
 
+static int flask_hvm_ioreq_server(struct domain *d)
+{
+    return current_has_perm(d, SECCLASS_HVM, HVM__HVMCTL);
+}
+
 static int flask_mem_event_control(struct domain *d, int mode, int op)
 {
     return current_has_perm(d, SECCLASS_HVM, HVM__MEM_EVENT);
@@ -1568,6 +1573,7 @@ static struct xsm_operations flask_ops = {
     .hvm_set_isa_irq_level = flask_hvm_set_isa_irq_level,
     .hvm_set_pci_link_route = flask_hvm_set_pci_link_route,
     .hvm_inject_msi = flask_hvm_inject_msi,
+    .hvm_ioreq_server = flask_hvm_ioreq_server,
     .mem_event_control = flask_mem_event_control,
     .mem_event_op = flask_mem_event_op,
     .mem_sharing_op = flask_mem_sharing_op,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
                   ` (6 preceding siblings ...)
  2014-05-09  8:40 ` [PATCH v7 7/9] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-05-09  8:40 ` Paul Durrant
  2014-05-09  9:36   ` Ian Campbell
  2014-05-12 11:34   ` Jan Beulich
  2014-05-09  8:40 ` [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
  8 siblings, 2 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich, Stefano Stabellini

For secondary servers, add a hvm op to enable/disable the server. The
server will not accept IO until it is enabled and the act of enabling
the server removes its pages from the guest p2m, thus preventing the guest
from directly mapping the pages and synthesizing ioreqs.
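
Concretely, this means an emulator must map the gmfns reported by HVMOP_get_ioreq_server_info before flipping the server on, because enabling removes them from the p2m. A minimal sketch of the client side (not part of the patch), using the xc_hvm_set_ioreq_server_state() wrapper added below:

#include <xenctrl.h>

static int demo_go_live(xc_interface *xch, domid_t domid, ioservid_t id)
{
    /*
     * By this point the ioreq/bufioreq gmfns must already have been
     * foreign-mapped (e.g. with xc_map_foreign_bulk()); once the server
     * is enabled the pages are no longer reachable through the p2m.
     */
    return xc_hvm_set_ioreq_server_state(xch, domid, id, 1 /* enable */);
}
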

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c          |   27 +++++++
 tools/libxc/xenctrl.h            |   18 +++++
 xen/arch/x86/hvm/hvm.c           |  152 +++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/domain.h |    1 +
 xen/include/public/hvm/hvm_op.h  |   19 +++++
 5 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index a8c9f81..17a8417 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1508,6 +1508,33 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_set_ioreq_server_state(xc_interface *xch,
+                                  domid_t domid,
+                                  ioservid_t id,
+                                  int enabled)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->enabled = !!enabled;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 3260a56..2045084 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1829,6 +1829,24 @@ int xc_hvm_get_ioreq_server_info(xc_interface *xch,
                                  evtchn_port_t *bufioreq_port);
 
 /**
+ * This function sets IOREQ Server state. An IOREQ Server
+ * will not be passed emulation requests until it is in
+ * the enabled state.
+ * Note that the contents of the ioreq_pfn and bufioreq_pfn are
+ * not meaningful until the IOREQ Server is in the enabled state.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm enabled the state.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_set_ioreq_server_state(xc_interface *xch,
+                                  domid_t domid,
+                                  ioservid_t id,
+                                  int enabled);
+
+/**
  * This function registers a range of memory or I/O ports for emulation.
  *
  * @parm xch a handle to an open hypervisor interface.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 58cc478..86daecd 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -554,6 +554,22 @@ static int hvm_map_ioreq_page(
     return 0;
 }
 
+static void hvm_remove_ioreq_gmfn(
+    struct domain *d, struct hvm_ioreq_page *iorp)
+{
+    guest_physmap_remove_page(d, iorp->gmfn, 
+                              page_to_mfn(iorp->page), 0);
+    clear_page(iorp->va);
+}
+
+static int hvm_add_ioreq_gmfn(
+    struct domain *d, struct hvm_ioreq_page *iorp)
+{
+    clear_page(iorp->va);
+    return guest_physmap_add_page(d, iorp->gmfn,
+                                  page_to_mfn(iorp->page), 0);
+}
+
 static int hvm_print_line(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -670,7 +686,8 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
 
     list_add(&sv->list_entry, &s->ioreq_vcpu_list);
 
-    hvm_update_ioreq_evtchn(s, sv);
+    if ( s->enabled )
+        hvm_update_ioreq_evtchn(s, sv);
 
     spin_unlock(&s->lock);
     return 0;
@@ -853,6 +870,56 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
         rangeset_destroy(s->range[i]);
 }
 
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s,
+                                    bool_t is_default)
+{
+    struct domain *d = s->domain;
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    if ( s->enabled )
+        goto done;
+
+    if ( !is_default )
+    {
+        hvm_remove_ioreq_gmfn(d, &s->ioreq);
+        hvm_remove_ioreq_gmfn(d, &s->bufioreq);
+    }
+
+    s->enabled = 1;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+        hvm_update_ioreq_evtchn(s, sv);
+
+  done:
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
+                                    bool_t is_default)
+{
+    struct domain *d = s->domain;
+
+    spin_lock(&s->lock);
+
+    if ( !s->enabled )
+        goto done;
+
+    if ( !is_default )
+    {
+        hvm_add_ioreq_gmfn(d, &s->bufioreq);
+        hvm_add_ioreq_gmfn(d, &s->ioreq);
+    }
+
+    s->enabled = 0;
+
+ done:
+    spin_unlock(&s->lock);
+}
+
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
                                  domid_t domid, bool_t is_default,
                                  ioservid_t id)
@@ -958,7 +1025,10 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
     d->arch.hvm_domain.ioreq_server.count++;
 
     if ( is_default )
+    {
         d->arch.hvm_domain.default_ioreq_server = s;
+        hvm_ioreq_server_enable(s, 1);
+    }
 
     if (id != NULL)
         *id = s->id;
@@ -1144,6 +1214,45 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                                      bool_t enabled)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server.list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id != id )
+            continue;
+
+        domain_pause(d);
+
+        if ( enabled )
+            hvm_ioreq_server_enable(s, 0);
+        else
+            hvm_ioreq_server_disable(s, 0);
+
+        domain_unpause(d);
+
+        rc = 0;
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+    return rc;
+}
+
 static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 {
     struct hvm_ioreq_server *s;
@@ -1206,8 +1315,10 @@ static void hvm_destroy_all_ioreq_servers(struct domain *d)
 
         domain_pause(d);
 
-        if ( is_default )
+        if ( is_default ) {
+            hvm_ioreq_server_disable(s, 1);
             d->arch.hvm_domain.default_ioreq_server = NULL;
+        }
 
         --d->arch.hvm_domain.ioreq_server.count;
         list_del(&s->list_entry);
@@ -2200,6 +2311,9 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         if ( s == d->arch.hvm_domain.default_ioreq_server )
             continue;
 
+        if ( !s->enabled )
+            continue;
+
         BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
         BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
         BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
@@ -5117,6 +5231,35 @@ static int hvmop_unmap_io_range_from_ioreq_server(
     return rc;
 }
 
+static int hvmop_set_ioreq_server_state(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t) uop)
+{
+    xen_hvm_set_ioreq_server_state_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_set_ioreq_server_state(d, op.id, !!op.enabled);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 static int hvmop_destroy_ioreq_server(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
 {
@@ -5176,6 +5319,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = hvmop_unmap_io_range_from_ioreq_server(
             guest_handle_cast(arg, xen_hvm_io_range_t));
         break;
+
+    case HVMOP_set_ioreq_server_state:
+        rc = hvmop_set_ioreq_server_state(
+            guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
+        break;
     
     case HVMOP_destroy_ioreq_server:
         rc = hvmop_destroy_ioreq_server(
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index a3e2727..9caeb41 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -69,6 +69,7 @@ struct hvm_ioreq_server {
     spinlock_t             bufioreq_lock;
     evtchn_port_t          bufioreq_evtchn;
     struct rangeset        *range[NR_IO_RANGE_TYPES];
+    bool_t                 enabled;
 };
 
 struct hvm_domain {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a335235..6cf1844 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -339,6 +339,25 @@ struct xen_hvm_destroy_ioreq_server {
 typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
 
+/*
+ * HVMOP_set_ioreq_server_state: Enable or disable the IOREQ Server <id> servicing
+ *                               domain <domid>.
+ *
+ * The IOREQ Server will not be passed any emulation requests until it is in the
+ * enabled state.
+ * Note that the contents of the ioreq_pfn and bufioreq_pfn (see
+ * HVMOP_get_ioreq_server_info) are not meaningful until the IOREQ Server is in
+ * the enabled state.
+ */
+#define HVMOP_set_ioreq_server_state 22
+struct xen_hvm_set_ioreq_server_state {
+    domid_t domid;   /* IN - domain to be serviced */
+    ioservid_t id;   /* IN - server id */
+    uint8_t enabled; /* IN - enabled? */    
+};
+typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional
  2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
                   ` (7 preceding siblings ...)
  2014-05-09  8:40 ` [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
@ 2014-05-09  8:40 ` Paul Durrant
  2014-05-12 11:36   ` Jan Beulich
  8 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  8:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Ian Jackson, Jan Beulich, Stefano Stabellini

Some emulators will only register regions that require non-buffered
access. (In practice the only region that a guest uses buffered access
for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
makes allocation of the buffered ioreq page and event channel optional for
secondary ioreq servers.

If a guest attempts buffered access to an ioreq server that does not
support it, the access will be handled via the normal synchronous path.
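
From the emulator's point of view the change is just one extra argument when creating the server; a sketch using the updated wrapper below:

#include <xenctrl.h>

static int demo_create_unbuffered(xc_interface *xch, domid_t domid,
                                  ioservid_t *id)
{
    /*
     * handle_bufioreq == 0: no buffered ring and no buffered event
     * channel are allocated; xc_hvm_get_ioreq_server_info() will report
     * bufioreq_pfn and bufioreq_port as 0 for this server.
     */
    return xc_hvm_create_ioreq_server(xch, domid, 0, id);
}
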

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c         |    2 ++
 tools/libxc/xenctrl.h           |    2 ++
 xen/arch/x86/hvm/hvm.c          |   75 +++++++++++++++++++++++++++------------
 xen/include/public/hvm/hvm_op.h |   24 ++++++++-----
 4 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 17a8417..37ed141 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1286,6 +1286,7 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
 
 int xc_hvm_create_ioreq_server(xc_interface *xch,
                                domid_t domid,
+                               int handle_bufioreq,
                                ioservid_t *id)
 {
     DECLARE_HYPERCALL;
@@ -1301,6 +1302,7 @@ int xc_hvm_create_ioreq_server(xc_interface *xch,
     hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
 
     arg->domid = domid;
+    arg->handle_bufioreq = !!handle_bufioreq;
 
     rc = do_xen_hypercall(xch, &hypercall);
 
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 2045084..400f0df 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1802,11 +1802,13 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
  *
  * @parm xch a handle to an open hypervisor interface.
  * @parm domid the domain id to be serviced
+ * @parm handle_bufioreq should the IOREQ Server handle buffered requests?
  * @parm id pointer to an ioservid_t to receive the IOREQ Server id.
  * @return 0 on success, -1 on failure.
  */
 int xc_hvm_create_ioreq_server(xc_interface *xch,
                                domid_t domid,
+                               int handle_bufioreq,
                                ioservid_t *id);
 
 /**
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 86daecd..67f6f47 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -668,7 +668,7 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
 
     sv->ioreq_evtchn = rc;
 
-    if ( v->vcpu_id == 0 )
+    if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
     {
         struct domain *d = s->domain;
 
@@ -719,7 +719,7 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
 
         list_del(&sv->list_entry);
 
-        if ( v->vcpu_id == 0 )
+        if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
             free_xen_event_channel(v, s->bufioreq_evtchn);
 
         free_xen_event_channel(v, sv->ioreq_evtchn);
@@ -746,7 +746,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 
         list_del(&sv->list_entry);
 
-        if ( v->vcpu_id == 0 )
+        if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
             free_xen_event_channel(v, s->bufioreq_evtchn);
 
         free_xen_event_channel(v, sv->ioreq_evtchn);
@@ -758,7 +758,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 }
 
 static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
-                                      bool_t is_default)
+                                      bool_t is_default, bool_t handle_bufioreq)
 {
     struct domain *d = s->domain;
     unsigned long ioreq_pfn, bufioreq_pfn;
@@ -766,24 +766,34 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
 
     if ( is_default ) {
         ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+        /*
+         * The default ioreq server must handle buffered ioreqs, for
+         * backwards compatibility.
+         */
+        ASSERT(handle_bufioreq);
         bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
     } else {
         rc = hvm_alloc_ioreq_gmfn(d, &ioreq_pfn);
         if ( rc )
             goto fail1;
 
-        rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
-        if ( rc )
-            goto fail2;
+        if ( handle_bufioreq ) {
+            rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
+            if ( rc )
+                goto fail2;
+        }
     }
 
     rc = hvm_map_ioreq_page(s, 0, ioreq_pfn);
     if ( rc )
         goto fail3;
 
-    rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
-    if ( rc )
-        goto fail4;
+    if ( handle_bufioreq ) {
+        rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
+        if ( rc )
+            goto fail4;
+    }
 
     return 0;
 
@@ -791,7 +801,7 @@ fail4:
     hvm_unmap_ioreq_page(s, 0);
 
 fail3:
-    if ( !is_default )
+    if ( !is_default && handle_bufioreq )
         hvm_free_ioreq_gmfn(d, bufioreq_pfn);
 
 fail2:
@@ -806,12 +816,17 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s,
                                          bool_t is_default)
 {
     struct domain *d = s->domain;
+    bool_t handle_bufioreq = ( s->bufioreq.va != NULL );
+
+    if ( handle_bufioreq )
+        hvm_unmap_ioreq_page(s, 1);
 
-    hvm_unmap_ioreq_page(s, 1);
     hvm_unmap_ioreq_page(s, 0);
 
     if ( !is_default ) {
-        hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+        if ( handle_bufioreq )
+            hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+
         hvm_free_ioreq_gmfn(d, s->ioreq.gmfn);
     }
 }
@@ -875,6 +890,7 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s,
 {
     struct domain *d = s->domain;
     struct hvm_ioreq_vcpu *sv;
+    bool_t handle_bufioreq = ( s->bufioreq.va != NULL );
 
     spin_lock(&s->lock);
 
@@ -884,7 +900,9 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s,
     if ( !is_default )
     {
         hvm_remove_ioreq_gmfn(d, &s->ioreq);
-        hvm_remove_ioreq_gmfn(d, &s->bufioreq);
+
+        if ( handle_bufioreq )
+            hvm_remove_ioreq_gmfn(d, &s->bufioreq);
     }
 
     s->enabled = 1;
@@ -902,6 +920,7 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
                                     bool_t is_default)
 {
     struct domain *d = s->domain;
+    bool_t handle_bufioreq = ( s->bufioreq.va != NULL );
 
     spin_lock(&s->lock);
 
@@ -910,7 +929,9 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
 
     if ( !is_default )
     {
-        hvm_add_ioreq_gmfn(d, &s->bufioreq);
+        if ( handle_bufioreq )
+            hvm_add_ioreq_gmfn(d, &s->bufioreq);
+
         hvm_add_ioreq_gmfn(d, &s->ioreq);
     }
 
@@ -922,7 +943,7 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
 
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
                                  domid_t domid, bool_t is_default,
-                                 ioservid_t id)
+                                 bool_t handle_bufioreq, ioservid_t id)
 {
     struct vcpu *v;
     int rc;
@@ -939,7 +960,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     if ( rc )
         goto fail1;
 
-    rc = hvm_ioreq_server_map_pages(s, is_default);
+    rc = hvm_ioreq_server_map_pages(s, is_default, handle_bufioreq);
     if ( rc )
         goto fail2;
 
@@ -998,7 +1019,8 @@ static ioservid_t next_ioservid(struct domain *d)
 }
 
 static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
-                                   bool_t is_default, ioservid_t *id)
+                                   bool_t is_default, bool_t handle_bufioreq,
+                                   ioservid_t *id)
 {
     struct hvm_ioreq_server *s;
     int rc;
@@ -1015,7 +1037,7 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
     if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
         goto fail2;
 
-    rc = hvm_ioreq_server_init(s, d, domid, is_default,
+    rc = hvm_ioreq_server_init(s, d, domid, is_default, handle_bufioreq,
                                next_ioservid(d));
     if ( rc )
         goto fail3;
@@ -1108,8 +1130,11 @@ static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
             continue;
 
         *ioreq_pfn = s->ioreq.gmfn;
-        *bufioreq_pfn = s->bufioreq.gmfn;
-        *bufioreq_port = s->bufioreq_evtchn;
+
+        if ( s->bufioreq.va != NULL ) {
+            *bufioreq_pfn = s->bufioreq.gmfn;
+            *bufioreq_port = s->bufioreq_evtchn;
+        }
 
         rc = 0;
         break;
@@ -2368,6 +2393,9 @@ int hvm_buffered_io_send(ioreq_t *p)
     iorp = &s->bufioreq;
     pg = iorp->va;
 
+    if ( !pg )
+        return 0;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -5124,7 +5152,8 @@ static int hvmop_create_ioreq_server(
     if ( rc != 0 )
         goto out;
 
-    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
+    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0,
+                                 !!op.handle_bufioreq, &op.id);
     if ( rc != 0 )
         goto out;
 
@@ -5548,7 +5577,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 
                 /* May need to create server */
                 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-                rc = hvm_create_ioreq_server(d, domid, 1, NULL);
+                rc = hvm_create_ioreq_server(d, domid, 1, 1, NULL);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 6cf1844..6d3e559 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -258,12 +258,15 @@ typedef uint16_t ioservid_t;
  * HVMOP_create_ioreq_server: Instantiate a new IOREQ Server for a secondary
  *                            emulator servicing domain <domid>.
  *
- * The <id> handed back is unique for <domid>.
+ * The <id> handed back is unique for <domid>. If <handle_bufioreq> is zero
+ * the buffered ioreq ring will not be allocated and hence all emulation
+ * requests to this server will be synchronous.
  */
 #define HVMOP_create_ioreq_server 17
 struct xen_hvm_create_ioreq_server {
-    domid_t domid; /* IN - domain to be serviced */
-    ioservid_t id; /* OUT - server id */
+    domid_t domid;           /* IN - domain to be serviced */
+    uint8_t handle_bufioreq; /* IN - should server handle buffered ioreqs */
+    ioservid_t id;           /* OUT - server id */
 };
 typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
@@ -273,12 +276,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
  *                              IOREQ Server <id>. 
  *
  * The emulator needs to map the synchronous ioreq structures and buffered
- * ioreq ring that Xen uses to request emulation. These are hosted in domain
- * <domid>'s gmfns <ioreq_pfn> and <bufioreq_pfn> respectively. In addition the
- * emulator needs to bind to event channel <bufioreq_port> to listen for
- * buffered emulation requests. (The event channels used for synchronous
- * emulation requests are specified in the per-CPU ioreq structures in
- * <ioreq_pfn>).
+ * ioreq ring (if it exists) that Xen uses to request emulation. These are
+ * hosted in domain <domid>'s gmfns <ioreq_pfn> and <bufioreq_pfn>
+ * respectively. In addition, if the IOREQ Server is handling buffered
+ * emulation requests, the emulator needs to bind to event channel
+ * <bufioreq_port> to listen for them. (The event channels used for
+ * synchronous emulation requests are specified in the per-CPU ioreq
+ * structures in <ioreq_pfn>).
+ * If the IOREQ Server is not handling buffered emulation requests then the
+ * values handed back in <bufioreq_pfn> and <bufioreq_port> will both be 0.
  */
 #define HVMOP_get_ioreq_server_info 18
 struct xen_hvm_get_ioreq_server_info {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread
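
As an illustration of the interface documented in the hunk above, here is a
minimal sketch of how a secondary emulator might create an ioreq server with
no buffered ring and then query its details. The xc_hvm_* wrapper names and
signatures are assumed to mirror the hypercall structures; the libxc patch in
this series is the authoritative API.

    #include <assert.h>
    #include <xenctrl.h>

    static int setup_ioreq_server(xc_interface *xch, domid_t domid)
    {
        ioservid_t id;
        xen_pfn_t ioreq_pfn, bufioreq_pfn;
        evtchn_port_t bufioreq_port;
        int rc;

        /* handle_bufioreq == 0: no buffered ring, all requests synchronous */
        rc = xc_hvm_create_ioreq_server(xch, domid, 0, &id);
        if ( rc < 0 )
            return rc;

        rc = xc_hvm_get_ioreq_server_info(xch, domid, id, &ioreq_pfn,
                                          &bufioreq_pfn, &bufioreq_port);
        if ( rc < 0 )
            return rc;

        /* With handle_bufioreq == 0 both of these are handed back as 0. */
        assert(bufioreq_pfn == 0 && bufioreq_port == 0);

        /* Map <ioreq_pfn> and bind the per-vCPU event channels here. */
        return 0;
    }
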

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09  8:40 ` [PATCH v7 7/9] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-05-09  9:34   ` Ian Campbell
  2014-05-09  9:38     ` Ian Campbell
  2014-05-09  9:50     ` Paul Durrant
  2014-05-12 11:26   ` Jan Beulich
  1 sibling, 2 replies; 39+ messages in thread
From: Ian Campbell @ 2014-05-09  9:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Fri, 2014-05-09 at 09:40 +0100, Paul Durrant wrote:
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---

An interseries changelog would have been appreciated.

Assuming the hypervisor guys are happy with the hypercall API:
Acked-by: Ian Campbell <ian.campbell@citrix.com>

One minor nit:

> +    if (!pagebuf.nr_ioreq_server_pages ^ !pagebuf.ioreq_server_pfn) {
> +        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES = %"PRIx64" XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
> +              pagebuf.nr_ioreq_server_pages, pagebuf.ioreq_server_pfn);

This is a very long line, and I fear the resulting message will be a bit
meaningless. How about:
        ERROR("Inconsistent IOREQ server settings (pfn=%PRIx64",nr=%"PRIx64")",
               pagebuf....)
?

Ian.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-09  8:40 ` [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
@ 2014-05-09  9:36   ` Ian Campbell
  2014-05-12 11:34   ` Jan Beulich
  1 sibling, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2014-05-09  9:36 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Fri, 2014-05-09 at 09:40 +0100, Paul Durrant wrote:
> For secondary servers, add a hvm op to enable/disable the server. The
> server will not accept IO until it is enabled and the act of enabling
> the server removes its pages from the guest p2m, thus preventing the guest
> from directly mapping the pages and synthesizing ioreqs.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Tools side:
        Acked-by: Ian Campbell <ian.campbell@citrix.com>
(with the usual caveat about the h/v side guys being happy with the
hcall interface)

Ian.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09  9:34   ` Ian Campbell
@ 2014-05-09  9:38     ` Ian Campbell
  2014-05-09  9:50     ` Paul Durrant
  1 sibling, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2014-05-09  9:38 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Jan Beulich, Stefano Stabellini

On Fri, 2014-05-09 at 10:34 +0100, Ian Campbell wrote:
> On Fri, 2014-05-09 at 09:40 +0100, Paul Durrant wrote:
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> 
> An interseries changelog would have been appreciated.
> 
> Assuming the hypervisor guys are happy with the hypercall API:

I should also note that I'm expecting someone to shovel these in along
with the hypervisor side changes when they are all ready.

> Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09  9:34   ` Ian Campbell
  2014-05-09  9:38     ` Ian Campbell
@ 2014-05-09  9:50     ` Paul Durrant
  2014-05-09 13:25       ` Ian Campbell
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09  9:50 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 09 May 2014 10:34
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
> On Fri, 2014-05-09 at 09:40 +0100, Paul Durrant wrote:
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> 
> An interseries changelog would have been appreciated.

Sorry, I should have noted exactly which patches had changed.
 
> Assuming the hypervisor guys are happy with the hypercall API:
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 

Thanks.

> One minor nit:
> 
> > +    if (!pagebuf.nr_ioreq_server_pages ^ !pagebuf.ioreq_server_pfn) {
> > +        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES = %"PRIx64"
> XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
> > +              pagebuf.nr_ioreq_server_pages, pagebuf.ioreq_server_pfn);
> 
> This is a very long line, and I fear the resulting message will be a bit
> meaningless. How about:
>         ERROR("Inconsistent IOREQ server settings
> (pfn=%PRIx64",nr=%"PRIx64")",
>                pagebuf....)
> ?
> 

Agreed. I won't rework the series for this. I'll make a note and fix it afterwards.

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
  2014-05-09  8:40 ` [PATCH v7 5/9] Add an implentation of asprintf() for xen Paul Durrant
@ 2014-05-09 13:06   ` Jan Beulich
  2014-05-09 13:08     ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 13:06 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Ian Campbell, xen-devel

>>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> +int vasprintf(char **bufp, const char *fmt, va_list args)
> +{
> +    va_list args_copy;
> +    size_t size;
> +    char *buf, dummy[1];
> +
> +    va_copy(args_copy, args);
> +    size = vsnprintf(dummy, 0, fmt, args_copy);

Hmm, this is still dummy rather than NULL.

Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
  2014-05-09 13:06   ` Jan Beulich
@ 2014-05-09 13:08     ` Paul Durrant
  2014-05-09 13:15       ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 13:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Tim (Xen.org), Keir (Xen.org), Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:07
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; xen-devel@lists.xen.org; Keir (Xen.org); Tim
> (Xen.org)
> Subject: Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
> 
> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> > +int vasprintf(char **bufp, const char *fmt, va_list args)
> > +{
> > +    va_list args_copy;
> > +    size_t size;
> > +    char *buf, dummy[1];
> > +
> > +    va_copy(args_copy, args);
> > +    size = vsnprintf(dummy, 0, fmt, args_copy);
> 
> Hmm, this is still dummy rather than NULL.
> 

Yes, I explained why - I thought you were ok with that.

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-09  8:39 ` [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-05-09 13:09   ` Jan Beulich
  2014-05-09 15:22     ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 13:09 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 09.05.14 at 10:39, <paul.durrant@citrix.com> wrote:
> Collect together data structures concerning device emulation together into
> a new struct hvm_ioreq_server.
> 
> Code that deals with the shared and buffered ioreq pages is extracted from
> functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
> and consolidated into a set of hvm_ioreq_server manipulation functions. The
> lock in the hvm_ioreq_page served two different purposes and has been
> replaced by separate locks in the hvm_ioreq_server structure.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server
  2014-05-09  8:39 ` [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-05-09 13:12   ` Jan Beulich
  2014-05-09 15:22     ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 13:12 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 09.05.14 at 10:39, <paul.durrant@citrix.com> wrote:
> This patch only creates the ioreq server when the legacy HVM parameters
> are read (by an emulator).
> 
> A lock is introduced to protect access to the ioreq server by multiple
> emulator/tool invocations should such an eventuality arise. The guest is
> protected by creation of the ioreq server only being done whilst the
> domain is paused.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
(hesitantly, due to the various not really warranted uses of goto)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
  2014-05-09 13:08     ` Paul Durrant
@ 2014-05-09 13:15       ` Jan Beulich
  2014-05-09 13:19         ` Paul Durrant
  2014-05-09 14:15         ` Paul Durrant
  0 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 13:15 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Tim (Xen.org), Keir (Xen.org), Ian Campbell, xen-devel

>>> On 09.05.14 at 15:08, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 09 May 2014 14:07
>> To: Paul Durrant
>> Cc: Ian Campbell; Ian Jackson; xen-devel@lists.xen.org; Keir (Xen.org); Tim
>> (Xen.org)
>> Subject: Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
>> 
>> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
>> > +int vasprintf(char **bufp, const char *fmt, va_list args)
>> > +{
>> > +    va_list args_copy;
>> > +    size_t size;
>> > +    char *buf, dummy[1];
>> > +
>> > +    va_copy(args_copy, args);
>> > +    size = vsnprintf(dummy, 0, fmt, args_copy);
>> 
>> Hmm, this is still dummy rather than NULL.
> 
> Yes, I explained why - I thought you were ok with that.

No, I wasn't - the argument was rather weak: If we had a problem
with NULL in vsnprintf() then we ought to fix it rather than working
around.

Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
  2014-05-09 13:15       ` Jan Beulich
@ 2014-05-09 13:19         ` Paul Durrant
  2014-05-09 14:15         ` Paul Durrant
  1 sibling, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 13:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Tim (Xen.org), Keir (Xen.org), Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:16
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; xen-devel@lists.xen.org; Keir (Xen.org); Tim
> (Xen.org)
> Subject: RE: [PATCH v7 5/9] Add an implentation of asprintf() for xen
> 
> >>> On 09.05.14 at 15:08, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 09 May 2014 14:07
> >> To: Paul Durrant
> >> Cc: Ian Campbell; Ian Jackson; xen-devel@lists.xen.org; Keir (Xen.org);
> Tim
> >> (Xen.org)
> >> Subject: Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
> >>
> >> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> >> > +int vasprintf(char **bufp, const char *fmt, va_list args)
> >> > +{
> >> > +    va_list args_copy;
> >> > +    size_t size;
> >> > +    char *buf, dummy[1];
> >> > +
> >> > +    va_copy(args_copy, args);
> >> > +    size = vsnprintf(dummy, 0, fmt, args_copy);
> >>
> >> Hmm, this is still dummy rather than NULL.
> >
> > Yes, I explained why - I thought you were ok with that.
> 
> No, I wasn't - the argument was rather weak: If we had a problem
> with NULL in vsnprintf() then we ought to fix it rather than working
> around.
> 

Ok. I'll give it a try. V8 coming up...

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 6/9] Add the facility to limit ranges per rangeset
  2014-05-09  8:40 ` [PATCH v7 6/9] Add the facility to limit ranges per rangeset Paul Durrant
@ 2014-05-09 13:22   ` Jan Beulich
  2014-05-09 15:23     ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 13:22 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> A subsequent patch exposes rangesets to secondary emulators, so to allow a
> limit to be placed on the amount of xenheap that an emulator can cause to be
> consumed, the function rangeset_limit() has been created to set the allowed
> number of ranges in a rangeset. By default, there is no limit.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

albeit ...

> --- a/xen/common/rangeset.c
> +++ b/xen/common/rangeset.c
> @@ -25,6 +25,9 @@ struct rangeset {
>  
>      /* Ordered list of ranges contained in this set, and protecting lock. 
> */
>      struct list_head range_list;
> +
> +    /* Number of ranges that can be allocated */
> +    long             nr_ranges;

I would have preferred this to be unsigned int (and to have the no-
limit case more explicitly handled by never decrementing the count
when it's e.g. UINT_MAX).
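
A self-contained sketch of that alternative (the names here are illustrative,
not taken from the patch):

    #include <limits.h>
    #include <stdbool.h>

    struct range_budget {
        unsigned int nr_ranges; /* remaining head-room; UINT_MAX == no limit */
    };

    /* Returns false once the limit has been reached. */
    static bool consume_range(struct range_budget *b)
    {
        if ( b->nr_ranges == 0 )
            return false;
        if ( b->nr_ranges != UINT_MAX ) /* unlimited sets never count down */
            b->nr_ranges--;
        return true;
    }

    static void release_range(struct range_budget *b)
    {
        if ( b->nr_ranges != UINT_MAX )
            b->nr_ranges++;
    }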

Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09  9:50     ` Paul Durrant
@ 2014-05-09 13:25       ` Ian Campbell
  2014-05-09 13:29         ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Ian Campbell @ 2014-05-09 13:25 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

On Fri, 2014-05-09 at 10:50 +0100, Paul Durrant wrote:
> > One minor nit:
> > 
> > > +    if (!pagebuf.nr_ioreq_server_pages ^ !pagebuf.ioreq_server_pfn) {
> > > +        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES = %"PRIx64"
> > XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
> > > +              pagebuf.nr_ioreq_server_pages, pagebuf.ioreq_server_pfn);
> > 
> > This is a very long line, and I fear the resulting message will be a bit
> > meaningless. How about:
> >         ERROR("Inconsistent IOREQ server settings
> > (pfn=%PRIx64",nr=%"PRIx64")",
> >                pagebuf....)
> > ?
> > 
> 
> Agreed. I won't rework the series for this. I'll make a note and fix it afterwards.

Fine by me, unless you are respinning for some other reason of course.

Ian.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09 13:25       ` Ian Campbell
@ 2014-05-09 13:29         ` Paul Durrant
  2014-05-09 13:38           ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 13:29 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 09 May 2014 14:26
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
> On Fri, 2014-05-09 at 10:50 +0100, Paul Durrant wrote:
> > > One minor nit:
> > >
> > > > +    if (!pagebuf.nr_ioreq_server_pages ^ !pagebuf.ioreq_server_pfn) {
> > > > +        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES =
> %"PRIx64"
> > > XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
> > > > +              pagebuf.nr_ioreq_server_pages, pagebuf.ioreq_server_pfn);
> > >
> > > This is a very long line, and I fear the resulting message will be a bit
> > > meaningless. How about:
> > >         ERROR("Inconsistent IOREQ server settings
> > > (pfn=%PRIx64",nr=%"PRIx64")",
> > >                pagebuf....)
> > > ?
> > >
> >
> > Agreed. I won't rework the series for this. I'll make a note and fix it
> afterwards.
> 
> Fine by me, unless you are respinning for some other reason of course.
> 

Ok. Looks like I am so I'll fix.

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09 13:29         ` Paul Durrant
@ 2014-05-09 13:38           ` Jan Beulich
  2014-05-09 13:40             ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 13:38 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 09.05.14 at 15:29, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Ian Campbell
>> Sent: 09 May 2014 14:26
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
>> Subject: Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
>> 
>> On Fri, 2014-05-09 at 10:50 +0100, Paul Durrant wrote:
>> > > One minor nit:
>> > >
>> > > > +    if (!pagebuf.nr_ioreq_server_pages ^ !pagebuf.ioreq_server_pfn) {
>> > > > +        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES =
>> %"PRIx64"
>> > > XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
>> > > > +              pagebuf.nr_ioreq_server_pages, pagebuf.ioreq_server_pfn);
>> > >
>> > > This is a very long line, and I fear the resulting message will be a bit
>> > > meaningless. How about:
>> > >         ERROR("Inconsistent IOREQ server settings
>> > > (pfn=%PRIx64",nr=%"PRIx64")",
>> > >                pagebuf....)
>> > > ?
>> > >
>> >
>> > Agreed. I won't rework the series for this. I'll make a note and fix it
>> afterwards.
>> 
>> Fine by me, unless you are respinning for some other reason of course.
>> 
> 
> Ok. Looks like I am so I'll fix.

If that's just because of patch 5 (didn't get to 7...9 yet), I'd be fine
with just that one patch resent.

Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09 13:38           ` Jan Beulich
@ 2014-05-09 13:40             ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 13:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:38
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
> >>> On 09.05.14 at 15:29, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Ian Campbell
> >> Sent: 09 May 2014 14:26
> >> To: Paul Durrant
> >> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> >> Subject: Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> >>
> >> On Fri, 2014-05-09 at 10:50 +0100, Paul Durrant wrote:
> >> > > One minor nit:
> >> > >
> >> > > > +    if (!pagebuf.nr_ioreq_server_pages ^
> !pagebuf.ioreq_server_pfn) {
> >> > > > +        ERROR("XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES =
> >> %"PRIx64"
> >> > > XC_SAVE_ID_HVM_IOREQ_SERVER_PFN = %"PRIx64,
> >> > > > +              pagebuf.nr_ioreq_server_pages,
> pagebuf.ioreq_server_pfn);
> >> > >
> >> > > This is a very long line, and I fear the resulting message will be a bit
> >> > > meaningless. How about:
> >> > >         ERROR("Inconsistent IOREQ server settings
> >> > > (pfn=%PRIx64",nr=%"PRIx64")",
> >> > >                pagebuf....)
> >> > > ?
> >> > >
> >> >
> >> > Agreed. I won't rework the series for this. I'll make a note and fix it
> >> afterwards.
> >>
> >> Fine by me, unless you are respinning for some other reason of course.
> >>
> >
> > Ok. Looks like I am so I'll fix.
> 
> If that's just because of patch 5 (didn't get to 7...9 yet), I'd be fine
> with just that one patch resent.
>

Ok - I'll just send a v8 version of patch 5 alone then and leave patch 7 alone (unless you find something I need to change).

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
  2014-05-09 13:15       ` Jan Beulich
  2014-05-09 13:19         ` Paul Durrant
@ 2014-05-09 14:15         ` Paul Durrant
  2014-05-09 15:47           ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 14:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Tim (Xen.org), Keir (Xen.org), Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:16
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; xen-devel@lists.xen.org; Keir (Xen.org); Tim
> (Xen.org)
> Subject: RE: [PATCH v7 5/9] Add an implentation of asprintf() for xen
> 
> >>> On 09.05.14 at 15:08, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 09 May 2014 14:07
> >> To: Paul Durrant
> >> Cc: Ian Campbell; Ian Jackson; xen-devel@lists.xen.org; Keir (Xen.org);
> Tim
> >> (Xen.org)
> >> Subject: Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
> >>
> >> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> >> > +int vasprintf(char **bufp, const char *fmt, va_list args)
> >> > +{
> >> > +    va_list args_copy;
> >> > +    size_t size;
> >> > +    char *buf, dummy[1];
> >> > +
> >> > +    va_copy(args_copy, args);
> >> > +    size = vsnprintf(dummy, 0, fmt, args_copy);
> >>
> >> Hmm, this is still dummy rather than NULL.
> >
> > Yes, I explained why - I thought you were ok with that.
> 
> No, I wasn't - the argument was rather weak: If we had a problem
> with NULL in vsnprintf() then we ought to fix it rather than working
> around.
> 

We do have a problem. Crashed immediately.

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-09 13:09   ` Jan Beulich
@ 2014-05-09 15:22     ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 15:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:09
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v7 3/9] ioreq-server: create basic ioreq server
> abstraction.
> 
> >>> On 09.05.14 at 10:39, <paul.durrant@citrix.com> wrote:
> > Collect together data structures concerning device emulation together into
> > a new struct hvm_ioreq_server.
> >
> > Code that deals with the shared and buffered ioreq pages is extracted from
> > functions such as hvm_domain_initialise, hvm_vcpu_initialise and
> do_hvm_op
> > and consolidated into a set of hvm_ioreq_server manipulation functions.
> The
> > lock in the hvm_ioreq_page served two different purposes and has been
> > replaced by separate locks in the hvm_ioreq_server structure.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 

Thanks.

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server
  2014-05-09 13:12   ` Jan Beulich
@ 2014-05-09 15:22     ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 15:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:13
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq
> server
> 
> >>> On 09.05.14 at 10:39, <paul.durrant@citrix.com> wrote:
> > This patch only creates the ioreq server when the legacy HVM parameters
> > are read (by an emulator).
> >
> > A lock is introduced to protect access to the ioreq server by multiple
> > emulator/tool invocations should such an eventuality arise. The guest is
> > protected by creation of the ioreq server only being done whilst the
> > domain is paused.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> (hesitantly, due to the various not really warranted uses of goto)

Thanks,

  Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 6/9] Add the facility to limit ranges per rangeset
  2014-05-09 13:22   ` Jan Beulich
@ 2014-05-09 15:23     ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-09 15:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 May 2014 14:23
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v7 6/9] Add the facility to limit ranges per rangeset
> 
> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> > A subsequent patch exposes rangesets to secondary emulators, so to allow
> a
> > limit to be placed on the amount of xenheap that an emulator can cause to
> be
> > consumed, the function rangeset_limit() has been created to set the
> allowed
> > number of ranges in a rangeset. By default, there is no limit.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 

Thanks,

  Paul

> albeit ...
> 
> > --- a/xen/common/rangeset.c
> > +++ b/xen/common/rangeset.c
> > @@ -25,6 +25,9 @@ struct rangeset {
> >
> >      /* Ordered list of ranges contained in this set, and protecting lock.
> > */
> >      struct list_head range_list;
> > +
> > +    /* Number of ranges that can be allocated */
> > +    long             nr_ranges;
> 
> I would have preferred this to be unsigned int (and to have the no-
> limit case more explicitly handled by never decrementing the count
> when it's e.g. UINT_MAX).
> 
> Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 5/9] Add an implentation of asprintf() for xen
  2014-05-09 14:15         ` Paul Durrant
@ 2014-05-09 15:47           ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2014-05-09 15:47 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Tim (Xen.org), Keir (Xen.org), Ian Campbell, xen-devel

>>> On 09.05.14 at 16:15, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 09.05.14 at 15:08, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
>> >> > +int vasprintf(char **bufp, const char *fmt, va_list args)
>> >> > +{
>> >> > +    va_list args_copy;
>> >> > +    size_t size;
>> >> > +    char *buf, dummy[1];
>> >> > +
>> >> > +    va_copy(args_copy, args);
>> >> > +    size = vsnprintf(dummy, 0, fmt, args_copy);
>> >>
>> >> Hmm, this is still dummy rather than NULL.
>> >
>> > Yes, I explained why - I thought you were ok with that.
>> 
>> No, I wasn't - the argument was rather weak: If we had a problem
>> with NULL in vsnprintf() then we ought to fix it rather than working
>> around.
> 
> We do have a problem. Crashed immediately.

Urgh.

Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread
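
For reference, a minimal libc-style sketch of the size-probe pattern discussed
in this sub-thread. It assumes a C99-conformant vsnprintf() that accepts a
NULL buffer with size 0 - which, as noted above, Xen's internal vsnprintf()
did not at the time, hence the dummy[1] buffer in the patch:

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int vasprintf_sketch(char **bufp, const char *fmt, va_list args)
    {
        va_list args_copy;
        int size;
        char *buf;

        va_copy(args_copy, args);
        size = vsnprintf(NULL, 0, fmt, args_copy); /* length excluding NUL */
        va_end(args_copy);

        if ( size < 0 )
            return -1;

        buf = malloc(size + 1);
        if ( !buf )
            return -1;

        vsnprintf(buf, size + 1, fmt, args);
        *bufp = buf;

        return size;
    }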

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-09  8:40 ` [PATCH v7 7/9] ioreq-server: add support for multiple servers Paul Durrant
  2014-05-09  9:34   ` Ian Campbell
@ 2014-05-12 11:26   ` Jan Beulich
  2014-05-12 12:19     ` Paul Durrant
  2014-05-19 12:55     ` Paul Durrant
  1 sibling, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2014-05-12 11:26 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> +static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
> +{
> +    unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;
> +
> +    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);

Perhaps worth an ASSERT() on i not being out of range, to avoid
hard to debug memory corruption?
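
For illustration, one possible shape (the bound shown assumes the mask is a
single word, as in the hunk above):

    static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
    {
        unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;

        /* Catch callers handing back a gmfn outside the reserved range */
        ASSERT(i < sizeof(d->arch.hvm_domain.ioreq_gmfn.mask) * 8);
        clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
    }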

> +static int hvm_access_cf8(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct domain *d = current->domain;
> +
> +    if ( bytes != 4 )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    BUG_ON(port < 0xcf8);

If at all, that would better be an ASSERT() imo.

> +    port -= 0xcf8;

Stale statement - port isn't being used anymore below.

> +
> +    if ( dir == IOREQ_WRITE )
> +    {
> +        d->arch.hvm_domain.pci_cf8 = *val;
> +
> +        /* We always need to fall through to the catch all emulator */
> +        return X86EMUL_UNHANDLEABLE;
> +    }
> +    else
> +    {
> +        *val = d->arch.hvm_domain.pci_cf8;
> +        return X86EMUL_OKAY;

Is this correct irrespective of the value of the high bit? You don't
really know what non-config-space accesses mean (including whether
that would have any side effects), so I guess you'd be better off
forwarding that case too.

And in the end I wonder whether it wouldn't be better to forward
all access, with this function solely serving the purpose of latching
the last written value.
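
For illustration, a sketch of that latch-and-forward variant (not the code in
the patch):

    static int hvm_access_cf8(
        int dir, uint32_t port, uint32_t bytes, uint32_t *val)
    {
        struct domain *d = current->domain;

        if ( dir == IOREQ_WRITE && bytes == 4 )
            d->arch.hvm_domain.pci_cf8 = *val;

        /* Reads and writes both fall through to the catch-all emulator. */
        return X86EMUL_UNHANDLEABLE;
    }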

> -static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
> +static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
> +                                      bool_t is_default)
>  {
>      struct domain *d = s->domain;
> -    unsigned long pfn;
> +    unsigned long ioreq_pfn, bufioreq_pfn;
>      int rc;
>  
> -    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> -    rc = hvm_map_ioreq_page(s, 0, pfn);
> +    if ( is_default ) {
> +        ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> +        bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> +    } else {

I hate to say that, but: Coding style. And still more of this further
down.

> +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
> +                                            bool_t is_default)
> +{
> +    int i;

And for variables like this I also already asked that they be unsigned
(as properly done e.g. in hvm_ioreq_server_free_rangesets()).

> +    int rc;
> +
> +    if ( is_default )
> +        goto done;
> +
> +    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) {
> +        char *name;
> +
> +        rc = asprintf(&name, "ioreq_server %d %s", s->id,
> +                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
> +                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
> +                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
> +                      "");
> +        if ( rc )
> +            goto fail;
> +
> +        s->range[i] = rangeset_new(s->domain, name,
> +                                   RANGESETF_prettyprint_hex);
> +
> +        xfree(name);
> +
> +        rc = -ENOMEM;
> +        if ( !s->range[i] )
> +            goto fail;
> +
> +        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);

Just as a remark/question: Perhaps ultimately this limit should
become a configurable one, at once allowing the limits for the three
kinds to be different?

> +static ioservid_t next_ioservid(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s;
> +    static ioservid_t id;
> +
> +    ASSERT(spin_is_locked(&d->arch.hvm_domain.ioreq_server.lock));
> +
> +    id++;
> +
> + again:
> +    /* Check for uniqueness */
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if (id == s->id)
> +        {
> +            id++;
> +            goto again;

Please put the label ahead of the pre-loop increment, avoiding the
duplication.

And what's worse - without latching the value of "id" into a local
variable (storing back only the final result), your uniqueness check
doesn't work, because the lock only guards against concurrent
accesses for a single domain. Once fixed, perhaps also add a brief
comment explaining why the remaining not fully serialized access to
"id" is not a problem?

And of course, if you're going to stay with this system-wide static
variable, you'd need to clarify why you think this is not a - however
small - information leak (of the side channel kind).
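
For illustration, a per-domain variant along those lines (the ioreq_server.id
field is assumed here, not taken from the patch):

    static ioservid_t next_ioservid(struct domain *d)
    {
        struct hvm_ioreq_server *s;
        ioservid_t id;

        ASSERT(spin_is_locked(&d->arch.hvm_domain.ioreq_server.lock));

        /* Latch into a local; the per-domain lock serializes allocations. */
        id = d->arch.hvm_domain.ioreq_server.id;

     again:
        id++;

        /* Check for uniqueness against servers still in the list */
        list_for_each_entry ( s,
                              &d->arch.hvm_domain.ioreq_server.list,
                              list_entry )
            if ( id == s->id )
                goto again;

        d->arch.hvm_domain.ioreq_server.id = id;

        return id;
    }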

> @@ -757,25 +942,35 @@ static int hvm_create_ioreq_server(struct domain *d, 
> domid_t domid)
>          goto fail1;
>  
>      domain_pause(d);
> -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
>  
>      rc = -EEXIST;
> -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> +    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
>          goto fail2;
>  
> -    rc = hvm_ioreq_server_init(s, d, domid);
> +    rc = hvm_ioreq_server_init(s, d, domid, is_default,
> +                               next_ioservid(d));
>      if ( rc )
> -        goto fail2;
> +        goto fail3;
> +
> +    list_add(&s->list_entry,
> +             &d->arch.hvm_domain.ioreq_server.list);
> +    d->arch.hvm_domain.ioreq_server.count++;

Do you really need that field? The only time it's being read seems to be
to determine whether there's exactly one of them. That surely can also
be determined by looking at .list.

> +static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                            uint32_t type, uint64_t start, uint64_t end)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        if ( s->id == id )
> +        {
> +            struct rangeset *r;
> +
> +            switch ( type )
> +            {
> +            case HVMOP_IO_RANGE_PORT:
> +            case HVMOP_IO_RANGE_MEMORY:
> +            case HVMOP_IO_RANGE_PCI:
> +                r = s->range[type];
> +                break;
> +
> +            default:
> +                r = NULL;
> +                break;
> +            }
> +
> +            rc = -EINVAL;
> +            if ( !r )
> +                break;
> +
> +            rc = rangeset_add_range(r, start, end);

Is it not an error for the range to perhaps already be there ...

> +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> +                                                uint32_t type, uint64_t start, uint64_t end)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        if ( s->id == id )
> +        {
> +            struct rangeset *r;
> +
> +            switch ( type )
> +            {
> +            case HVMOP_IO_RANGE_PORT:
> +            case HVMOP_IO_RANGE_MEMORY:
> +            case HVMOP_IO_RANGE_PCI:
> +                r = s->range[type];
> +                break;
> +
> +            default:
> +                r = NULL;
> +                break;
> +            }
> +
> +            rc = -EINVAL;
> +            if ( !r )
> +                break;
> +
> +            rc = rangeset_remove_range(r, start, end);

... or to not be there here?

> +static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);

Stray blank after (, also again further down.

> +static void hvm_destroy_all_ioreq_servers(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s, *next;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    list_for_each_entry_safe ( s,
> +                               next,
> +                               &d->arch.hvm_domain.ioreq_server.list,
> +                               list_entry )
> +    {
> +        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
> +
> +        domain_pause(d);

This gets called only during domain teardown - what's the point of
the domain_pause()?

> +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> +                                                        ioreq_t *p)
> +{
> +#define CF8_BDF(cf8)     (((cf8) & 0x00ffff00) >> 8)
> +#define CF8_ADDR_LO(cf8) ((cf8) & 0x000000fc)
> +#define CF8_ADDR_HI(cf8) (((cf8) & 0x0f000000) >> 16)

Any reason you exclude the low two bits here, despite separately
or-ing them back in below?

> +#define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
> +
> +    struct hvm_ioreq_server *s;
> +    uint32_t cf8;
> +    uint8_t type;
> +    uint64_t addr;
> +
> +    if ( d->arch.hvm_domain.ioreq_server.count <= 1 ||
> +         (p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO) )
> +        return d->arch.hvm_domain.default_ioreq_server;
> +
> +    cf8 = d->arch.hvm_domain.pci_cf8;
> +        
> +    if ( p->type == IOREQ_TYPE_PIO &&
> +         (p->addr & ~3) == 0xcfc &&
> +         CF8_ENABLED(cf8) )
> +    {
> +        uint32_t sbdf;
> +
> +        /* PCI config data cycle */
> +
> +        sbdf = HVMOP_PCI_SBDF(0,
> +                              PCI_BUS(CF8_BDF(cf8)),
> +                              PCI_SLOT(CF8_BDF(cf8)),
> +                              PCI_FUNC(CF8_BDF(cf8)));
> +
> +        type = IOREQ_TYPE_PCI_CONFIG;
> +        addr = ((uint64_t)sbdf << 32) |
> +               CF8_ADDR_HI(cf8) |
> +               CF8_ADDR_LO(cf8) |
> +               (p->addr & 3);
> +    }
> +    else
> +    {
> +        type = p->type;

I know I asked this before, and iirc your answer was "no": Is
IOREQ_TYPE_PCI_CONFIG coming in here valid?

> +        addr = p->addr;
> +    }
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        struct rangeset *r;
> +
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
> +        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
> +        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> +        r = s->range[type];
> +
> +        switch ( type )
> +        {
> +        case IOREQ_TYPE_PIO:
> +        case IOREQ_TYPE_COPY:
> +            if ( rangeset_contains_singleton(r, addr) )

While the PCI check below certainly can be a singleton one, I don't
think you can do so here: In order for a request to be forwarded,
the full spanned range should be owned by the respective server.
And yes, consideration how to deal with split accesses is going to be
needed I'm afraid.
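
For illustration, a whole-range check in that switch might read as follows
(rangeset_contains_range() is an existing rangeset primitive; the length
calculation assumes p->size and p->count describe the access and ignores the
backwards-rep case):

        case IOREQ_TYPE_PIO:
        case IOREQ_TYPE_COPY:
        {
            uint64_t last = addr + (uint64_t)p->size * p->count - 1;

            if ( rangeset_contains_range(r, addr, last) )
                return s;

            break;
        }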

> +static int hvmop_create_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
> +{
> +    struct domain *curr_d = current->domain;
> +    xen_hvm_create_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);

This call is indistinguishable from the four other ones further down.
You ought to let XSM know what operation it really is that you're
about to perform.

Jan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-09  8:40 ` [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
  2014-05-09  9:36   ` Ian Campbell
@ 2014-05-12 11:34   ` Jan Beulich
  2014-05-12 12:21     ` Paul Durrant
  1 sibling, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-12 11:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> For secondary servers, add a hvm op to enable/disable the server. The
> server will not accept IO until it is enabled and the act of enabling
> the server removes its pages from the guest p2m, thus preventing the guest
> from directly mapping the pages and synthesizing ioreqs.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
(hypervisor side, and provided the at least one remaining coding style
issue gets cleaned up)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional
  2014-05-09  8:40 ` [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
@ 2014-05-12 11:36   ` Jan Beulich
  2014-05-12 12:20     ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-12 11:36 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Stefano Stabellini

>>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> Some emulators will only register regions that require non-buffered
> access. (In practice the only region that a guest uses buffered access
> for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
> makes allocation of the buffered ioreq page and event channel optional for
> secondary ioreq servers.
> 
> If a guest attempts buffered access to an ioreq server that does not
> support it, the access will be handled via the normal synchronous path.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
(hypervisor side, and provided the many remaining coding style
issues get cleaned up)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-12 11:26   ` Jan Beulich
@ 2014-05-12 12:19     ` Paul Durrant
  2014-05-19 12:55     ` Paul Durrant
  1 sibling, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-12 12:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 12 May 2014 04:26
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> > +static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
> > +{
> > +    unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn.base;
> > +
> > +    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
> 
> Perhaps worth an ASSERT() on i not being out of range, to avoid
> hard to debug memory corruption?
>

Ok.
 
> > +static int hvm_access_cf8(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> > +{
> > +    struct domain *d = current->domain;
> > +
> > +    if ( bytes != 4 )
> > +        return X86EMUL_UNHANDLEABLE;
> > +
> > +    BUG_ON(port < 0xcf8);
> 
> If at all, that would better be an ASSERT() imo.
> 

Ok.

> > +    port -= 0xcf8;
> 
> Stale statement - port isn't being used anymore below.
> 

Ok - thanks for spotting that.
> > +
> > +    if ( dir == IOREQ_WRITE )
> > +    {
> > +        d->arch.hvm_domain.pci_cf8 = *val;
> > +
> > +        /* We always need to fall through to the catch all emulator */
> > +        return X86EMUL_UNHANDLEABLE;
> > +    }
> > +    else
> > +    {
> > +        *val = d->arch.hvm_domain.pci_cf8;
> > +        return X86EMUL_OKAY;
> 
> Is this correct irrespective of the value of the high bit? You don't
> really know what non-config-space accesses mean (including whether
> that would have any side effects), so I guess you'd be better off
> forwarding that case too.
> 

True - there could be side effects. I'll forward if the top bit is set.

> And in the end I wonder whether it wouldn't be better to forward
> all access, with this function solely serving the purpose of latching
> the last written value.
> 

Yes, I guess I could do that since the write is being forwarded.

> > -static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
> > +static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
> > +                                      bool_t is_default)
> >  {
> >      struct domain *d = s->domain;
> > -    unsigned long pfn;
> > +    unsigned long ioreq_pfn, bufioreq_pfn;
> >      int rc;
> >
> > -    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> > -    rc = hvm_map_ioreq_page(s, 0, pfn);
> > +    if ( is_default ) {
> > +        ioreq_pfn = d-
> >arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> > +        bufioreq_pfn = d-
> >arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> > +    } else {
> 
> I hate to say that, but: Coding style. And still more of this further
> down.
> 

Aaarggh. I keep checking for these but I'm just not seeing them.

> > +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
> > +                                            bool_t is_default)
> > +{
> > +    int i;
> 
> And for variables like this I also already asked that they be unsigned
> (as properly done e.g. in hvm_ioreq_server_free_rangesets()).
> 

Ok.

> > +    int rc;
> > +
> > +    if ( is_default )
> > +        goto done;
> > +
> > +    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) {
> > +        char *name;
> > +
> > +        rc = asprintf(&name, "ioreq_server %d %s", s->id,
> > +                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
> > +                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
> > +                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
> > +                      "");
> > +        if ( rc )
> > +            goto fail;
> > +
> > +        s->range[i] = rangeset_new(s->domain, name,
> > +                                   RANGESETF_prettyprint_hex);
> > +
> > +        xfree(name);
> > +
> > +        rc = -ENOMEM;
> > +        if ( !s->range[i] )
> > +            goto fail;
> > +
> > +        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
> 
> Just as a remark/question: Perhaps ultimately this limit should
> become a configurable one, at once allowing the limits for the three
> kinds to be different?
> 

Yes, I guess it's likely the limit on PCI ranges could be lower than the others. It's really just a safety net to stop a malicious emulator from stealing all of Xen's memory though.

> > +static ioservid_t next_ioservid(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    static ioservid_t id;
> > +
> > +    ASSERT(spin_is_locked(&d->arch.hvm_domain.ioreq_server.lock));
> > +
> > +    id++;
> > +
> > + again:
> > +    /* Check for uniqueness */
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if (id == s->id)
> > +        {
> > +            id++;
> > +            goto again;
> 
> Please put the label ahead of the pre-loop increment, avoiding the
> duplication.
> 
> And what's worse - without latching the value of "id" into a local
> variable (storing back only the final result), your uniqueness check
> doesn't work, because the lock only guards against concurrent
> accesses for a single domain. Once fixed, perhaps also add a brief
> comment explaining why the remaining not fully serialized access to
> "id" is not a problem?
> 

Good point. Multiple emulating domains would fight here.

> And of course, if you're going to stay with this system-wide static
> variable, you'd need to clarify why you think this is not a - however
> small - information leak (of the side channel kind).
> 

Yes - if I move to a per-domain value then the lock also does its job so I'll do that.

> > @@ -757,25 +942,35 @@ static int hvm_create_ioreq_server(struct domain
> *d,
> > domid_t domid)
> >          goto fail1;
> >
> >      domain_pause(d);
> > -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> >
> >      rc = -EEXIST;
> > -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > +    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
> >          goto fail2;
> >
> > -    rc = hvm_ioreq_server_init(s, d, domid);
> > +    rc = hvm_ioreq_server_init(s, d, domid, is_default,
> > +                               next_ioservid(d));
> >      if ( rc )
> > -        goto fail2;
> > +        goto fail3;
> > +
> > +    list_add(&s->list_entry,
> > +             &d->arch.hvm_domain.ioreq_server.list);
> > +    d->arch.hvm_domain.ioreq_server.count++;
> 
> Do you really need that field? The only time it's being read seems to be
> to determine whether there's exactly one of them. That surely can also
> be determined by looking at .list.
>

Yes - I think I must have used it for something else at some point. I'll get rid of it.
 
> > +static int hvm_map_io_range_to_ioreq_server(struct domain *d,
> ioservid_t id,
> > +                                            uint32_t type, uint64_t start, uint64_t end)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        if ( s->id == id )
> > +        {
> > +            struct rangeset *r;
> > +
> > +            switch ( type )
> > +            {
> > +            case HVMOP_IO_RANGE_PORT:
> > +            case HVMOP_IO_RANGE_MEMORY:
> > +            case HVMOP_IO_RANGE_PCI:
> > +                r = s->range[type];
> > +                break;
> > +
> > +            default:
> > +                r = NULL;
> > +                break;
> > +            }
> > +
> > +            rc = -EINVAL;
> > +            if ( !r )
> > +                break;
> > +
> > +            rc = rangeset_add_range(r, start, end);
> 
> Is it not an error for the range to perhaps already be there ...
> 

Hmm. I was trusting the rangeset code to fail in that case; I'll check what it actually does.

> > +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d,
> ioservid_t id,
> > +                                                uint32_t type, uint64_t start, uint64_t end)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        if ( s->id == id )
> > +        {
> > +            struct rangeset *r;
> > +
> > +            switch ( type )
> > +            {
> > +            case HVMOP_IO_RANGE_PORT:
> > +            case HVMOP_IO_RANGE_MEMORY:
> > +            case HVMOP_IO_RANGE_PCI:
> > +                r = s->range[type];
> > +                break;
> > +
> > +            default:
> > +                r = NULL;
> > +                break;
> > +            }
> > +
> > +            rc = -EINVAL;
> > +            if ( !r )
> > +                break;
> > +
> > +            rc = rangeset_remove_range(r, start, end);
> 
> ... or to not be there here?
> 

Again. I was expecting the rangeset code to fail. I'll check.

> > +static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu
> *v)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        bool_t is_default = ( s == d-
> >arch.hvm_domain.default_ioreq_server);
> 
> Stray blank after (, also again further down.
> 

Ok. Must have coded as an if and then changed my mind.

> > +static void hvm_destroy_all_ioreq_servers(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s, *next;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    list_for_each_entry_safe ( s,
> > +                               next,
> > +                               &d->arch.hvm_domain.ioreq_server.list,
> > +                               list_entry )
> > +    {
> > +        bool_t is_default = ( s == d-
> >arch.hvm_domain.default_ioreq_server);
> > +
> > +        domain_pause(d);
> 
> This gets called only during domain teardown - what's the point of
> the domain_pause()?
> 

True. Too much cut'n'paste from the single server case.

> > +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain
> *d,
> > +                                                        ioreq_t *p)
> > +{
> > +#define CF8_BDF(cf8)     (((cf8) & 0x00ffff00) >> 8)
> > +#define CF8_ADDR_LO(cf8) ((cf8) & 0x000000fc)
> > +#define CF8_ADDR_HI(cf8) (((cf8) & 0x0f000000) >> 16)
> 
> Any reason you exclude the low two bits here, despite separately
> or-ing them back in below?
> 

A couple of documents I've read suggest that the 2 low order bits of cf8 are reserved, so I mask them here. The bits I OR in lower down are from the data cycle rather than the address cycle.

> > +#define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
> > +
> > +    struct hvm_ioreq_server *s;
> > +    uint32_t cf8;
> > +    uint8_t type;
> > +    uint64_t addr;
> > +
> > +    if ( d->arch.hvm_domain.ioreq_server.count <= 1 ||
> > +         (p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO) )
> > +        return d->arch.hvm_domain.default_ioreq_server;
> > +
> > +    cf8 = d->arch.hvm_domain.pci_cf8;
> > +
> > +    if ( p->type == IOREQ_TYPE_PIO &&
> > +         (p->addr & ~3) == 0xcfc &&
> > +         CF8_ENABLED(cf8) )
> > +    {
> > +        uint32_t sbdf;
> > +
> > +        /* PCI config data cycle */
> > +
> > +        sbdf = HVMOP_PCI_SBDF(0,
> > +                              PCI_BUS(CF8_BDF(cf8)),
> > +                              PCI_SLOT(CF8_BDF(cf8)),
> > +                              PCI_FUNC(CF8_BDF(cf8)));
> > +
> > +        type = IOREQ_TYPE_PCI_CONFIG;
> > +        addr = ((uint64_t)sbdf << 32) |
> > +               CF8_ADDR_HI(cf8) |
> > +               CF8_ADDR_LO(cf8) |
> > +               (p->addr & 3);
> > +    }
> > +    else
> > +    {
> > +        type = p->type;
> 
> I know I asked this before, and iirc your answer was "no": Is
> IOREQ_TYPE_PCI_CONFIG coming in here valid?
> 

No. Callers of this function should never pass that type in - it should only come back out.

> > +        addr = p->addr;
> > +    }
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        struct rangeset *r;
> > +
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
> > +        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
> > +        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> > +        r = s->range[type];
> > +
> > +        switch ( type )
> > +        {
> > +        case IOREQ_TYPE_PIO:
> > +        case IOREQ_TYPE_COPY:
> > +            if ( rangeset_contains_singleton(r, addr) )
> 
> While the PCI check below certainly can be a singleton one, I don't
> think you can do so here: In order for a request to be forwarded,
> the full spanned range should be owned by the respective server.
> And yes, consideration how to deal with split accesses is going to be
> needed I'm afraid.

Yes - that must have been lost with the change to using rangesets. I'm going to punt on split ranges though, as it's a whole other layer of complexity. I'll add a comment somewhere to say that unless a cycle falls fully within the range of a secondary emulator it will not be passed to it. That's certainly good enough for what I'm using this for at the moment, and support for split ranges could be added later if needed.

> 
> > +static int hvmop_create_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
> > +{
> > +    struct domain *curr_d = current->domain;
> > +    xen_hvm_create_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d);
> 
> This call is indistinguishable from the four other ones further down.
> You ought to let XSM know what operation it really is that you're
> about to perform.
> 

Ok, I'll differentiate the types of operation if you think a finer grained level of access control is required.
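
E.g. by passing the sub-op down to the hook, along the lines of the sketch below (the extra argument is hypothetical until the XSM side is actually extended):

    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_create_ioreq_server);

with the other call sites passing HVMOP_get_ioreq_server_info, HVMOP_map_io_range_to_ioreq_server, and so on.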

  Paul

> Jan


* Re: [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional
  2014-05-12 11:36   ` Jan Beulich
@ 2014-05-12 12:20     ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-12 12:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 12 May 2014 04:37
> To: Paul Durrant
> Cc: Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v7 9/9] ioreq-server: make buffered ioreq handling
> optional
> 
> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> > Some emulators will only register regions that require non-buffered
> > access. (In practice the only region that a guest uses buffered access
> > for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
> > makes allocation of the buffered ioreq page and event channel optional for
> > secondary ioreq servers.
> >
> > If a guest attempts buffered access to an ioreq server that does not
> > support it, the access will be handled via the normal synchronous path.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> (hypervisor side, and provided the many remaining coding style
> issues get cleaned up)
> 

Thanks. I'll fix any coding style issues I can spot.
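
For reference, the fallback described in the commit message amounts to little more than this at the dispatch point, where 'buffered' is the caller's flag selecting the buffered path (a sketch using this series' field layout, not the literal hunk):

    /* A server created without buffered ioreq support has no bufioreq page
     * mapped, so route the request via the ordinary synchronous path. */
    if ( buffered && s->bufioreq.va == NULL )
        buffered = 0;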

  Paul


* Re: [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-12 11:34   ` Jan Beulich
@ 2014-05-12 12:21     ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-12 12:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 12 May 2014 04:35
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v7 8/9] ioreq-server: remove p2m entries when server is
> enabled
> 
> >>> On 09.05.14 at 10:40, <paul.durrant@citrix.com> wrote:
> > For secondary servers, add a hvm op to enable/disable the server. The
> > server will not accept IO until it is enabled and the act of enabling
> > the server removes its pages from the guest p2m, thus preventing the
> guest
> > from directly mapping the pages and synthesizing ioreqs.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> (hypervisor side, and provided the at least one remaining coding style
> issue gets cleaned up)

Thanks.
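
For reference, the enable side boils down to pulling the server's ioreq pages out of the guest p2m, roughly as below (a sketch, not the literal hunk from the patch):

    /* Once enabled, the guest must not be able to map the ioreq pages and
     * synthesize requests, so drop them from the p2m. */
    if ( s->ioreq.va != NULL )
        guest_physmap_remove_page(d, s->ioreq.gmfn,
                                  page_to_mfn(s->ioreq.page), 0);
    if ( s->bufioreq.va != NULL )
        guest_physmap_remove_page(d, s->bufioreq.gmfn,
                                  page_to_mfn(s->bufioreq.page), 0);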

  Paul


* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-12 11:26   ` Jan Beulich
  2014-05-12 12:19     ` Paul Durrant
@ 2014-05-19 12:55     ` Paul Durrant
  2014-05-19 13:12       ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2014-05-19 12:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Paul Durrant
> Sent: 12 May 2014 13:20
> To: 'Jan Beulich'
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
[big snip]
> > > +    list_for_each_entry ( s,
> > > +                          &d->arch.hvm_domain.ioreq_server.list,
> > > +                          list_entry )
> > > +    {
> > > +        struct rangeset *r;
> > > +
> > > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > > +            continue;
> > > +
> > > +        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
> > > +        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
> > > +        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> > > +        r = s->range[type];
> > > +
> > > +        switch ( type )
> > > +        {
> > > +        case IOREQ_TYPE_PIO:
> > > +        case IOREQ_TYPE_COPY:
> > > +            if ( rangeset_contains_singleton(r, addr) )
> >
> > While the PCI check below certainly can be a singleton one, I don't
> > think you can do so here: In order for a request to be forwarded,
> > the full spanned range should be owned by the respective server.
> > And yes, consideration how to deal with split accesses is going to be
> > needed I'm afraid.
> 
> Yes - that must have been lost with the change to using rangesets. I'm going
> to punt on split ranges though, as it's a whole other layer of complexity. I'll
> add a comment somewhere to say that unless a cycle falls fully within the
> range of a secondary emulator it will not be passed to it. That's certainly good
> enough for what I'm using this for at the moment, and support for split
> ranges could be added later if needed.
> 

I've been looking at this and I actually think it's correct to only steer IO based on the address, rather than a span of the entire range for port IO at least; I'm pretty sure h/w bases its decision to accept a cycle only on a decode of the base address and the cycle size - so it's possible for two different pieces of h/w to have overlapping ranges.
I also notice that Xen only passes the base address to hvm_mmio_handler.check_handler() in hvm_mmio_intercept() even though a fully matching span seems reasonable for MMIO transactions.

  Paul


* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-19 12:55     ` Paul Durrant
@ 2014-05-19 13:12       ` Jan Beulich
  2014-05-19 13:15         ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-05-19 13:12 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 19.05.14 at 14:55, <Paul.Durrant@citrix.com> wrote:
>> From: Paul Durrant
>> > > +    list_for_each_entry ( s,
>> > > +                          &d->arch.hvm_domain.ioreq_server.list,
>> > > +                          list_entry )
>> > > +    {
>> > > +        struct rangeset *r;
>> > > +
>> > > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
>> > > +            continue;
>> > > +
>> > > +        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
>> > > +        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
>> > > +        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
>> > > +        r = s->range[type];
>> > > +
>> > > +        switch ( type )
>> > > +        {
>> > > +        case IOREQ_TYPE_PIO:
>> > > +        case IOREQ_TYPE_COPY:
>> > > +            if ( rangeset_contains_singleton(r, addr) )
>> >
>> > While the PCI check below certainly can be a singleton one, I don't
>> > think you can do so here: In order for a request to be forwarded,
>> > the full spanned range should be owned by the respective server.
>> > And yes, consideration how to deal with split accesses is going to be
>> > needed I'm afraid.
>> 
>> Yes - that must have been lost with the change to using rangesets. I'm going
>> to punt on split ranges though, as it's a whole other layer of complexity. I'll
>> add a comment somewhere to say that unless a cycle falls fully within the
>> range of a secondary emulator it will not be passed to it. That's certainly good
>> enough for what I'm using this for at the moment, and support for split
>> ranges could be added later if needed.
>> 
> 
> I've been looking at this and I actually think it's correct to only steer IO 
> based on the address, rather than a span of the entire range for port IO at 
> least; I'm pretty sure h/w bases its decision to accept a cycle only on 
> a decode of the base address and the cycle size - so it's possible for two 
> different pieces of h/w to have overlapping ranges.

I don't think this is even fully defined for I/O ports - I could see
things done the way you say for e.g. fields falling inside a 16-byte
range, but I could equally see accesses being split when they cross a
16-byte boundary (where 16 is just an arbitrary example) - much like
the special MMIO behavior that results for accesses crossing cache
line or page boundaries.

Yet I think to be on the safe side here, accesses crossing server
boundaries should (at least initially, i.e. until we learn that certain
things need weakening to work in practice) be implemented as
restrictively as possible - either splitting them into parts or, for
simplicity's sake, by dropping writes and returning all ones on reads.
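
In code terms the restrictive variant would be no more than something like this (a sketch, not a requirement for this exact shape; ioreq_t field names as in the public interface):

    /* Access straddles a server boundary: drop the write, or read all ones. */
    if ( p->dir == IOREQ_READ )
        p->data = ~(uint64_t)0;
    return X86EMUL_OKAY; /* complete the access locally */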

> I also notice that Xen only passes the base address to 
> hvm_mmio_handler.check_handler() in hvm_mmio_intercept() even though a fully 
> matching span seems reasonable for MMIO transactions.

ISTR having been puzzled by this too, but there were enough other,
more important things to do that I never got around to this one.

Jan


* Re: [PATCH v7 7/9] ioreq-server: add support for multiple servers
  2014-05-19 13:12       ` Jan Beulich
@ 2014-05-19 13:15         ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2014-05-19 13:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 19 May 2014 14:12
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v7 7/9] ioreq-server: add support for multiple servers
> 
> >>> On 19.05.14 at 14:55, <Paul.Durrant@citrix.com> wrote:
> >> From: Paul Durrant
> >> > > +    list_for_each_entry ( s,
> >> > > +                          &d->arch.hvm_domain.ioreq_server.list,
> >> > > +                          list_entry )
> >> > > +    {
> >> > > +        struct rangeset *r;
> >> > > +
> >> > > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> >> > > +            continue;
> >> > > +
> >> > > +        BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
> >> > > +        BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
> >> > > +        BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> >> > > +        r = s->range[type];
> >> > > +
> >> > > +        switch ( type )
> >> > > +        {
> >> > > +        case IOREQ_TYPE_PIO:
> >> > > +        case IOREQ_TYPE_COPY:
> >> > > +            if ( rangeset_contains_singleton(r, addr) )
> >> >
> >> > While the PCI check below certainly can be a singleton one, I don't
> >> > think you can do so here: In order for a request to be forwarded,
> >> > the full spanned range should be owned by the respective server.
> >> > And yes, consideration how to deal with split accesses is going to be
> >> > needed I'm afraid.
> >>
> >> Yes - that must have been lost with the change to using rangesets. I'm going
> >> to punt on split ranges though, as it's a whole other layer of complexity. I'll
> >> add a comment somewhere to say that unless a cycle falls fully within the
> >> range of a secondary emulator it will not be passed to it. That's certainly good
> >> enough for what I'm using this for at the moment, and support for split
> >> ranges could be added later if needed.
> >>
> >
> > I've been looking at this and I actually think it's correct to only steer IO
> > based on the address, rather than a span of the entire range for port IO at
> > least; I'm pretty sure h/w bases its decision to accept a cycle only on
> > a decode of the base address and the cycle size - so it's possible for two
> > different pieces of h/w to have overlapping ranges.
> 
> I don't think this is even fully defined for I/O ports - I could see
> things done the way you say for e.g. fields falling inside a 16-byte
> range, but I could equally see accesses being split when they cross a
> 16-byte boundary (where 16 is just an arbitrary example) - much like
> the special MMIO behavior that results for accesses crossing cache
> line or page boundaries.
> 
> Yet I think to be on the safe side here, accesses crossing server
> boundaries should (at least initially, i.e. until we learn that certain
> things need weakening to work in practice) be implemented as
> restrictively as possible - either splitting them into parts or, for
> simplicity's sake, by dropping writes and returning all ones on reads.

Ok - I'm not going to deal with splits for now; I'll insist on a fully mapped span.
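
Concretely that means switching the port/memory cases in hvm_select_ioreq_server() from the singleton test to a span test, roughly as below (ignoring rep counts for the moment):

        case IOREQ_TYPE_PIO:
        case IOREQ_TYPE_COPY:
            /* Only claim the cycle if the server owns every byte it touches. */
            if ( rangeset_contains_range(r, addr, addr + p->size - 1) )
                return s;
            break;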

  Paul

> 
> > I also notice that Xen only passes the base address to
> > hvm_mmio_handler.check_handler() in hvm_mmio_intercept() even though a fully
> > matching span seems reasonable for MMIO transactions.
> 
> ISTR having got puzzled by this too, but having had enough other,
> more important things to do to also deal with this one.
> 
> Jan



Thread overview: 39+ messages
2014-05-09  8:39 [PATCH v7 0/9] Support for running secondary emulators Paul Durrant
2014-05-09  8:39 ` [PATCH v7 1/9] ioreq-server: pre-series tidy up Paul Durrant
2014-05-09  8:39 ` [PATCH v7 2/9] ioreq-server: centralize access to ioreq structures Paul Durrant
2014-05-09  8:39 ` [PATCH v7 3/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
2014-05-09 13:09   ` Jan Beulich
2014-05-09 15:22     ` Paul Durrant
2014-05-09  8:39 ` [PATCH v7 4/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
2014-05-09 13:12   ` Jan Beulich
2014-05-09 15:22     ` Paul Durrant
2014-05-09  8:40 ` [PATCH v7 5/9] Add an implentation of asprintf() for xen Paul Durrant
2014-05-09 13:06   ` Jan Beulich
2014-05-09 13:08     ` Paul Durrant
2014-05-09 13:15       ` Jan Beulich
2014-05-09 13:19         ` Paul Durrant
2014-05-09 14:15         ` Paul Durrant
2014-05-09 15:47           ` Jan Beulich
2014-05-09  8:40 ` [PATCH v7 6/9] Add the facility to limit ranges per rangeset Paul Durrant
2014-05-09 13:22   ` Jan Beulich
2014-05-09 15:23     ` Paul Durrant
2014-05-09  8:40 ` [PATCH v7 7/9] ioreq-server: add support for multiple servers Paul Durrant
2014-05-09  9:34   ` Ian Campbell
2014-05-09  9:38     ` Ian Campbell
2014-05-09  9:50     ` Paul Durrant
2014-05-09 13:25       ` Ian Campbell
2014-05-09 13:29         ` Paul Durrant
2014-05-09 13:38           ` Jan Beulich
2014-05-09 13:40             ` Paul Durrant
2014-05-12 11:26   ` Jan Beulich
2014-05-12 12:19     ` Paul Durrant
2014-05-19 12:55     ` Paul Durrant
2014-05-19 13:12       ` Jan Beulich
2014-05-19 13:15         ` Paul Durrant
2014-05-09  8:40 ` [PATCH v7 8/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
2014-05-09  9:36   ` Ian Campbell
2014-05-12 11:34   ` Jan Beulich
2014-05-12 12:21     ` Paul Durrant
2014-05-09  8:40 ` [PATCH v7 9/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
2014-05-12 11:36   ` Jan Beulich
2014-05-12 12:20     ` Paul Durrant
