* [PATCH v5 0/9] Support for running secondary emulators
@ 2014-05-01 12:08 Paul Durrant
  2014-05-01 12:08 ` [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path Paul Durrant
                   ` (9 more replies)
  0 siblings, 10 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel

This patch series adds the ioreq server interface which I mentioned in
my talk at the Xen developer summit in Edinburgh at the end of last year.
The code is based on work originally done by Julien Grall but has been
re-written to allow existing versions of QEMU to work unmodified.

The code is available in my xen.git [1] repo on xenbits, under the 'savannah5'
branch, and I have also written a demo emulator to test the code, which can
be found in my demu.git [2] repo.


The series has been re-worked since v4. The modifications are now broken
down as follows:

Patch #1 is a bug-fix in code touched later on in the series.

Patch #2 is a pre-series tidy-up. No semantic change.

Patch #3 moves some code around to centralize use of the ioreq_t data
structure.

Patch #4 introduces the new hvm_ioreq_server structure.

Patch #5 defers creation of the ioreq server until something actually
reads one of the HVM parameters concerned with emulation.

Patch #6 makes the single ioreq server of previous patches into the
default ioreq server and introduces an API for creating secondary servers.

Patch #7 adds an enable/disable operation to the API for secondary servers
which makes sure that they cannot be active whilst their shared pages are
present in the guest's P2M.

Patch #8 makes handling buffered ioreqs optional for secondary servers.
This saves a page of memory per server.

Patch #9 pulls the PCI hotplug controller emulation into Xen. This is
necessary to allow a secondary emulator to hotplug a PCI device into the VM.
The code implements the controller in the same way as upstream QEMU and thus
the variant of the DSDT ASL used for upstream QEMU is retained.


The demo emulator can simply be invoked from a shell and will hotplug its
device onto the PCI bus (and remove it again when it's killed). The emulated
device is not an awful lot of use at this stage - it appears as a SCSI
controller with one IO BAR and one MEM BAR and has no intrinsic
functionality... but then it is only supposed to be a demo :-)
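
For anyone wanting a feel for the flow before reading the patches, the rough
lifecycle of a secondary emulator is sketched below. This is an illustrative
sketch only: the function names and signatures approximate the libxenctrl
wrappers added in patches #6-#8 and may not match them exactly; the demu.git
repo above is the authoritative example.

    /*
     * Sketch only: names/signatures are approximations of the new
     * libxenctrl API, not verbatim from this series.
     */
    #include <xenctrl.h>

    static int run_secondary_emulator(xc_interface *xch, domid_t dom)
    {
        ioservid_t id;
        xen_pfn_t ioreq_pfn, bufioreq_pfn;
        evtchn_port_t bufioreq_port;
        int rc;

        /* Register a secondary ioreq server with the hypervisor. */
        rc = xc_hvm_create_ioreq_server(xch, dom, 1 /* handle_bufioreq */, &id);
        if ( rc < 0 )
            return rc;

        /* Find out which gfns/event channel the server has been given. */
        rc = xc_hvm_get_ioreq_server_info(xch, dom, id, &ioreq_pfn,
                                          &bufioreq_pfn, &bufioreq_port);
        if ( rc < 0 )
            goto out;

        /* Claim the port I/O range of the emulated device's IO BAR. */
        rc = xc_hvm_map_io_range_to_ioreq_server(xch, dom, id, 0 /* portio */,
                                                 0xc000, 0xc0ff);
        if ( rc < 0 )
            goto out;

        /* ... map the pages, bind the event channels, service ioreqs ... */

     out:
        xc_hvm_destroy_ioreq_server(xch, dom, id);
        return rc;
    }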

  Paul

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
[2] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git

v2:
 - First non-RFC posting

v3:
 - Addressed comments from Jan Beulich

v4:
 - Addressed comments from Ian Campbell and George Dunlap
 - Series heavily re-worked, 2 patches added

v5:
 - One more patch added to separate out a bug-fix, as requested by
   Jan Beulich
 - Switched to using rangesets as suggested by Jan Beulich
 - Changed domain restore path as requested by Ian Campbell
 - Added documentation for new hypercalls and libxenctrl API as requested
   by Ian Campbell
 - Added handling of multi-byte GPE I/O as suggested by Jan Beulich


* [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-01 12:48   ` Andrew Cooper
  2014-05-01 12:08 ` [PATCH v5 2/9] ioreq-server: pre-series tidy up Paul Durrant
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn and then checks (under lock) whether the ioreq page is already
set up. If it is, the error path releases the in-use ioreq page mapping
rather than the one just acquired. This patch fixes that bug.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ac05160..3dec1eb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -496,7 +496,7 @@ static int hvm_set_ioreq_page(
 
     if ( (iorp->va != NULL) || d->is_dying )
     {
-        destroy_ring_for_helper(&iorp->va, iorp->page);
+        destroy_ring_for_helper(&va, page);
         spin_unlock(&iorp->lock);
         return -EINVAL;
     }
-- 
1.7.10.4


* [PATCH v5 2/9] ioreq-server: pre-series tidy up
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
  2014-05-01 12:08 ` [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 12:25   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures Paul Durrant
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

This patch tidies up various parts of the code that subsequent patches move
around. If these modifications were combined with the code motion, they
would be easy to miss.

There's also some function renaming to reflect purpose and a single
whitespace fix.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/emulate.c        |    2 +-
 xen/arch/x86/hvm/hvm.c            |   25 +++++++++++++------------
 xen/arch/x86/hvm/io.c             |   36 ++++++++++++++++--------------------
 xen/include/asm-x86/hvm/hvm.h     |    2 +-
 xen/include/asm-x86/hvm/support.h |    2 ++
 5 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 868aa1d..6d3522a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -241,7 +241,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr) )
+            if ( !hvm_send_assist_req() )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3dec1eb..e12d9fe 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -365,7 +365,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p;
+    ioreq_t *p = get_ioreq(v);
 
     check_wakeup_from_wait();
 
@@ -373,7 +373,7 @@ void hvm_do_resume(struct vcpu *v)
         pt_restore_timer(v);
 
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !(p = get_ioreq(v)) )
+    if ( !p )
         goto check_inject_trap;
 
     while ( p->state != STATE_IOREQ_NONE )
@@ -426,7 +426,7 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_destroy_ioreq_page(
+static void hvm_unmap_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp)
 {
     spin_lock(&iorp->lock);
@@ -482,7 +482,7 @@ int prepare_ring_for_helper(
     return 0;
 }
 
-static int hvm_set_ioreq_page(
+static int hvm_map_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
 {
     struct page_info *page;
@@ -652,8 +652,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.ioreq);
+    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
 
     msixtbl_pt_cleanup(d);
 
@@ -1425,14 +1425,15 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v)
+bool_t hvm_send_assist_req(void)
 {
-    ioreq_t *p;
+    struct vcpu *v = current;
+    ioreq_t *p = get_ioreq(v);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !(p = get_ioreq(v)) )
+    if ( !p )
         return 0;
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
@@ -4120,7 +4121,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             {
             case HVM_PARAM_IOREQ_PFN:
                 iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
+                if ( (rc = hvm_map_ioreq_page(d, iorp, a.value)) != 0 )
                     break;
                 spin_lock(&iorp->lock);
                 if ( iorp->va != NULL )
@@ -4129,9 +4130,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                         get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
                 spin_unlock(&iorp->lock);
                 break;
-            case HVM_PARAM_BUFIOREQ_PFN: 
+            case HVM_PARAM_BUFIOREQ_PFN:
                 iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_set_ioreq_page(d, iorp, a.value);
+                rc = hvm_map_ioreq_page(d, iorp, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..87bb3f6 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -49,7 +49,8 @@
 int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
     buffered_iopage_t *pg = iorp->va;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
@@ -104,22 +105,20 @@ int hvm_buffered_io_send(ioreq_t *p)
         return 0;
     }
     
-    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
-           &bp, sizeof(bp));
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
     
     if ( qw )
     {
         bp.data = p->data >> 32;
-        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
-               &bp, sizeof(bp));
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
     }
 
     /* Make the ioreq_t visible /before/ write_pointer. */
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    notify_via_xen_event_channel(d,
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
     spin_unlock(&iorp->lock);
     
     return 1;
@@ -127,22 +126,19 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 void send_timeoffset_req(unsigned long timeoff)
 {
-    ioreq_t p[1];
+    ioreq_t p = {
+        .type = IOREQ_TYPE_TIMEOFFSET,
+        .size = 8,
+        .count = 1,
+        .dir = IOREQ_WRITE,
+        .data = timeoff,
+        .state = STATE_IOREQ_READY,
+    };
 
     if ( timeoff == 0 )
         return;
 
-    memset(p, 0, sizeof(*p));
-
-    p->type = IOREQ_TYPE_TIMEOFFSET;
-    p->size = 8;
-    p->count = 1;
-    p->dir = IOREQ_WRITE;
-    p->data = timeoff;
-
-    p->state = STATE_IOREQ_READY;
-
-    if ( !hvm_buffered_io_send(p) )
+    if ( !hvm_buffered_io_send(&p) )
         printk("Unsuccessful timeoffset update\n");
 }
 
@@ -168,7 +164,7 @@ void send_invalidate_req(void)
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
 
-    (void)hvm_send_assist_req(v);
+    (void)hvm_send_assist_req();
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index c373930..9d84f4a 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -227,7 +227,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v);
+bool_t hvm_send_assist_req(void);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 3529499..1dc2f2d 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -31,7 +31,9 @@ static inline ioreq_t *get_ioreq(struct vcpu *v)
 {
     struct domain *d = v->domain;
     shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
     ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
     return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
 }
 
-- 
1.7.10.4


* [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
  2014-05-01 12:08 ` [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path Paul Durrant
  2014-05-01 12:08 ` [PATCH v5 2/9] ioreq-server: pre-series tidy up Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 12:35   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Eddie Dong, Paul Durrant,
	Jun Nakajima

To simplify creation of the ioreq server abstraction in a subsequent patch,
this patch centralizes all use of the shared ioreq structure and the
buffered ioreq ring in the source module xen/arch/x86/hvm/hvm.c.

The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume(),
because the former may now be passed a data structure on the stack, in
which case the barrier is unnecessary.
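
To illustrate (a sketch, not code from the patch): the barrier is only
needed where the ioreq contents are read from the shared page after
observing a state transition made by the emulator on another CPU, which
after this patch only happens in hvm_do_resume():

    /* Consumer side of the shared ioreq page (sketch). */
    ioreq_t *p = get_ioreq(v);      /* points into the shared page */

    if ( p->state == STATE_IORESP_READY )
    {
        rmb();      /* see IORESP_READY /then/ read contents of ioreq */
        hvm_io_assist(p);           /* reads data the emulator wrote */
    }

By contrast, hvmemul_do_io() now builds its ioreq_t on the stack and passes
that to hvm_io_assist(); nothing else writes it, so no barrier is needed
there.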

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Eddie Dong <eddie.dong@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/emulate.c        |   66 ++++++++++-----------
 xen/arch/x86/hvm/hvm.c            |  115 +++++++++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/io.c             |  106 +++-------------------------------
 xen/arch/x86/hvm/vmx/vvmx.c       |   13 ++++-
 xen/include/asm-x86/hvm/hvm.h     |   15 ++++-
 xen/include/asm-x86/hvm/support.h |   21 ++++---
 6 files changed, 184 insertions(+), 152 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 6d3522a..904c71a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,24 +57,11 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t *p = get_ioreq(curr);
-    ioreq_t _ioreq;
+    ioreq_t p;
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
-    bool_t has_dm = 1;
-
-    /*
-     * Domains without a backing DM, don't have an ioreq page.  Just
-     * point to a struct on the stack, initialising the state as needed.
-     */
-    if ( !p )
-    {
-        has_dm = 0;
-        p = &_ioreq;
-        p->state = STATE_IOREQ_NONE;
-    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -173,10 +160,9 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( p->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(curr) )
     {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
-                 p->state);
+        gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
         if ( ram_page )
             put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
@@ -193,38 +179,38 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
-    p->dir = dir;
-    p->data_is_ptr = value_is_ptr;
-    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
-    p->size = size;
-    p->addr = addr;
-    p->count = *reps;
-    p->df = df;
-    p->data = value;
+    p.dir = dir;
+    p.data_is_ptr = value_is_ptr;
+    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
+    p.size = size;
+    p.addr = addr;
+    p.count = *reps;
+    p.df = df;
+    p.data = value;
 
     if ( dir == IOREQ_WRITE )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( is_mmio )
     {
-        rc = hvm_mmio_intercept(p);
+        rc = hvm_mmio_intercept(&p);
         if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(p);
+            rc = hvm_buffered_io_intercept(&p);
     }
     else
     {
-        rc = hvm_portio_intercept(p);
+        rc = hvm_portio_intercept(&p);
     }
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
     case X86EMUL_RETRY:
-        *reps = p->count;
-        p->state = STATE_IORESP_READY;
+        *reps = p.count;
+        p.state = STATE_IORESP_READY;
         if ( !vio->mmio_retry )
         {
-            hvm_io_assist(p);
+            hvm_io_assist(&p);
             vio->io_state = HVMIO_none;
         }
         else
@@ -233,7 +219,7 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
         /* If there is no backing DM, just ignore accesses */
-        if ( !has_dm )
+        if ( !hvm_has_dm(curr->domain) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
@@ -241,7 +227,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req() )
+            if ( !hvm_send_assist_req(&p) )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
@@ -260,7 +246,7 @@ static int hvmemul_do_io(
 
  finish_access:
     if ( dir == IOREQ_READ )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( p_data != NULL )
         memcpy(p_data, &vio->io_data, size);
@@ -1292,3 +1278,13 @@ struct segment_register *hvmemul_get_seg_reg(
         hvm_get_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
     return &hvmemul_ctxt->seg_reg[seg];
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e12d9fe..16c120d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
+static ioreq_t *get_ioreq(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
+    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+}
+
+bool_t hvm_io_pending(struct vcpu *v)
+{
+    ioreq_t *p = get_ioreq(v);
+
+    if ( !p )
+        return 0;
+
+    return p->state != STATE_IOREQ_NONE;
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p = get_ioreq(v);
@@ -381,11 +401,12 @@ void hvm_do_resume(struct vcpu *v)
         switch ( p->state )
         {
         case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
             hvm_io_assist(p);
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+            wait_on_xen_event_channel(p->vp_eport,
                                       (p->state != STATE_IOREQ_READY) &&
                                       (p->state != STATE_IOREQ_INPROCESS));
             break;
@@ -1425,7 +1446,89 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(void)
+int hvm_buffered_io_send(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
+    buffered_iopage_t *pg = iorp->va;
+    buf_ioreq_t bp;
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    bp.type = p->type;
+    bp.dir  = p->dir;
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return 0;
+    }
+
+    bp.data = p->data;
+    bp.addr = p->addr;
+
+    spin_lock(&iorp->lock);
+
+    if ( (pg->write_pointer - pg->read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&iorp->lock);
+        return 0;
+    }
+
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    wmb();
+    pg->write_pointer += qw ? 2 : 1;
+
+    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    spin_unlock(&iorp->lock);
+
+    return 1;
+}
+
+bool_t hvm_has_dm(struct domain *d)
+{
+    return !!d->arch.hvm_domain.ioreq.va;
+}
+
+bool_t hvm_send_assist_req(ioreq_t *proto_p)
 {
     struct vcpu *v = current;
     ioreq_t *p = get_ioreq(v);
@@ -1444,14 +1547,18 @@ bool_t hvm_send_assist_req(void)
         return 0;
     }
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    proto_p->state = STATE_IOREQ_NONE;
+    proto_p->vp_eport = p->vp_eport;
+    *p = *proto_p;
+
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+    notify_via_xen_event_channel(v->domain, p->vp_eport);
 
     return 1;
 }
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 87bb3f6..f5ad9be 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,84 +46,6 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp;
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    bp.type = p->type;
-    bp.dir  = p->dir;
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return 0;
-    }
-    
-    bp.data = p->data;
-    bp.addr = p->addr;
-    
-    spin_lock(&iorp->lock);
-
-    if ( (pg->write_pointer - pg->read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
-        return 0;
-    }
-    
-    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-    
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    wmb();
-    pg->write_pointer += qw ? 2 : 1;
-
-    notify_via_xen_event_channel(d,
-            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
-    
-    return 1;
-}
-
 void send_timeoffset_req(unsigned long timeoff)
 {
     ioreq_t p = {
@@ -145,26 +67,14 @@ void send_timeoffset_req(unsigned long timeoff)
 /* Ask ioemu mapcache to invalidate mappings. */
 void send_invalidate_req(void)
 {
-    struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
-
-    if ( !p )
-        return;
-
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
-                 "already pending (%d)?\n", p->state);
-        domain_crash(v->domain);
-        return;
-    }
-
-    p->type = IOREQ_TYPE_INVALIDATE;
-    p->size = 4;
-    p->dir = IOREQ_WRITE;
-    p->data = ~0UL; /* flush all */
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
 
-    (void)hvm_send_assist_req();
+    (void)hvm_send_assist_req(&p);
 }
 
 int handle_mmio(void)
@@ -267,8 +177,6 @@ void hvm_io_assist(ioreq_t *p)
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     enum hvm_io_state io_state;
 
-    rmb(); /* see IORESP_READY /then/ read contents of ioreq */
-
     p->state = STATE_IOREQ_NONE;
 
     io_state = vio->io_state;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index e263376..9ccc03f 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1394,7 +1394,6 @@ void nvmx_switch_guest(void)
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    const ioreq_t *ioreq = get_ioreq(v);
 
     /*
      * A pending IO emulation may still be not finished. In this case, no
@@ -1404,7 +1403,7 @@ void nvmx_switch_guest(void)
      * don't want to continue as this setup is not implemented nor supported
      * as of right now.
      */
-    if ( !ioreq || ioreq->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(v) )
         return;
     /*
      * a softirq may interrupt us between a virtual vmentry is
@@ -2522,3 +2521,13 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     /* nvcpu.guest_cr is what L2 write to cr actually. */
     __vmwrite(read_shadow_field, v->arch.hvm_vcpu.nvcpu.guest_cr[cr]);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 9d84f4a..b0f7be5 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -26,6 +26,7 @@
 #include <asm/hvm/asid.h>
 #include <public/domctl.h>
 #include <public/hvm/save.h>
+#include <public/hvm/ioreq.h>
 #include <asm/mm.h>
 
 /* Interrupt acknowledgement sources. */
@@ -227,7 +228,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(void);
+bool_t hvm_send_assist_req(ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
@@ -342,6 +343,8 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx);
 void hvm_migrate_timers(struct vcpu *v);
+bool_t hvm_has_dm(struct domain *d);
+bool_t hvm_io_pending(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
 void hvm_migrate_pirqs(struct vcpu *v);
 
@@ -538,3 +541,13 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_HVM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 1dc2f2d..05ef5c5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -22,21 +22,10 @@
 #define __ASM_X86_HVM_SUPPORT_H__
 
 #include <xen/types.h>
-#include <public/hvm/ioreq.h>
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-static inline ioreq_t *get_ioreq(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
-}
-
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
@@ -144,3 +133,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
 int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
 
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (2 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 12:55   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

Collect the data structures concerning device emulation together into a new
struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions. The
lock in the hvm_ioreq_page served two different purposes and has been
replaced by separate locks in the hvm_ioreq_server structure.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |  397 ++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |   36 +++-
 xen/include/asm-x86/hvm/vcpu.h   |   12 +-
 3 files changed, 314 insertions(+), 131 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 16c120d..1684705 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -363,39 +363,44 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-static ioreq_t *get_ioreq(struct vcpu *v)
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    shared_iopage_t *p = s->ioreq.va;
 
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
 
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+    return &p->vcpu_ioreq[v->vcpu_id];
 }
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
-    if ( !p )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v);
     return p->state != STATE_IOREQ_NONE;
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !p )
+    if ( !s )
         goto check_inject_trap;
 
+    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
+    p = get_ioreq(s, v);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
-static void hvm_init_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
-{
-    memset(iorp, 0, sizeof(*iorp));
-    spin_lock_init(&iorp->lock);
-    domain_pause(d);
-}
-
 void destroy_ring_for_helper(
     void **_va, struct page_info *page)
 {
@@ -447,16 +444,11 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_unmap_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_unmap_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
 {
-    spin_lock(&iorp->lock);
-
-    ASSERT(d->is_dying);
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
     destroy_ring_for_helper(&iorp->va, iorp->page);
-
-    spin_unlock(&iorp->lock);
 }
 
 int prepare_ring_for_helper(
@@ -504,8 +496,10 @@ int prepare_ring_for_helper(
 }
 
 static int hvm_map_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
+    struct hvm_ioreq_server *s, bool_t buf, unsigned long gmfn)
 {
+    struct domain *d = s->domain;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     struct page_info *page;
     void *va;
     int rc;
@@ -513,22 +507,15 @@ static int hvm_map_ioreq_page(
     if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
         return rc;
 
-    spin_lock(&iorp->lock);
-
     if ( (iorp->va != NULL) || d->is_dying )
     {
         destroy_ring_for_helper(&va, page);
-        spin_unlock(&iorp->lock);
         return -EINVAL;
     }
 
     iorp->va = va;
     iorp->page = page;
 
-    spin_unlock(&iorp->lock);
-
-    domain_unpause(d);
-
     return 0;
 }
 
@@ -572,8 +559,219 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    spin_lock(&s->lock);
+
+    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 )
+    {
+        struct domain *d = s->domain;
+
+        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+            s->bufioreq_evtchn;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v, sv->ioreq_evtchn);
+    
+ fail2:
+    spin_unlock(&s->lock);
+    xfree(sv);
+
+ fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del_init(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 )
+            free_xen_event_channel(v, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    s->domain = d;
+    s->domid = domid;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    d->arch.hvm_domain.ioreq_server = s;
+    return 0;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_unmap_ioreq_page(s, 1);
+    hvm_unmap_ioreq_page(s, 0);
+
+    xfree(s);
+}
+
+static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
+                             unsigned long pfn)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    int rc;
+
+    spin_lock(&s->lock);
+
+    rc = hvm_map_ioreq_page(s, buf, pfn);
+    if ( rc )
+        goto fail;
+
+    if (!buf) {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+            hvm_update_ioreq_evtchn(s, sv);
+    }
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail:
+    spin_unlock(&s->lock);
+    return rc;
+}
+
+static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
+                                     evtchn_port_t *p_port)
+{
+    evtchn_port_t old_port, new_port;
+
+    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
+    if ( new_port < 0 )
+        return new_port;
+
+    /* xchg() ensures that only we call free_xen_event_channel(). */
+    old_port = xchg(p_port, new_port);
+    free_xen_event_channel(v, old_port);
+    return 0;
+}
+
+static int hvm_set_dm_domain(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    int rc = 0;
+
+    domain_pause(d);
+    spin_lock(&s->lock);
+
+    if ( s->domid != domid ) {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            struct vcpu *v = sv->vcpu;
+
+            if ( v->vcpu_id == 0 ) {
+                rc = hvm_replace_event_channel(v, domid,
+                                               &s->bufioreq_evtchn);
+                if ( rc )
+                    break;
+
+                d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+                    s->bufioreq_evtchn;
+            }
+
+            rc = hvm_replace_event_channel(v, domid, &sv->ioreq_evtchn);
+            if ( rc )
+                break;
+
+            hvm_update_ioreq_evtchn(s, sv);
+        }
+
+        s->domid = domid;
+    }
+
+    spin_unlock(&s->lock);
+    domain_unpause(d);
+
+    return rc;
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
+    domid_t domid;
     int rc;
 
     if ( !hvm_enabled )
@@ -639,17 +837,21 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    rc = hvm_create_ioreq_server(d, domid);
+    if ( rc != 0 )
+        goto fail2;
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -673,8 +875,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_destroy_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1307,7 +1508,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1350,30 +1551,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-
-    /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-    if ( rc < 0 )
+    rc = hvm_ioreq_server_add_vcpu(s, v);
+    if ( rc != 0 )
         goto fail6;
 
-    /* Register ioreq event channel. */
-    v->arch.hvm_vcpu.xen_port = rc;
-
-    if ( v->vcpu_id == 0 )
-    {
-        /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-        if ( rc < 0 )
-            goto fail6;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
-    }
-
-    spin_lock(&d->arch.hvm_domain.ioreq.lock);
-    if ( d->arch.hvm_domain.ioreq.va != NULL )
-        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1406,6 +1587,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
 void hvm_vcpu_destroy(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_ioreq_server_remove_vcpu(s, v);
+
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -1417,9 +1603,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
         vlapic_destroy(v);
 
     hvm_funcs.vcpu_destroy(v);
-
-    /* Event channel is already freed by evtchn_destroy(). */
-    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -1450,8 +1633,9 @@ int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
@@ -1459,6 +1643,12 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    if ( !s )
+        return 0;
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -1495,13 +1685,13 @@ int hvm_buffered_io_send(ioreq_t *p)
     bp.data = p->data;
     bp.addr = p->addr;
 
-    spin_lock(&iorp->lock);
+    spin_lock(&s->bufioreq_lock);
 
     if ( (pg->write_pointer - pg->read_pointer) >=
          (IOREQ_BUFFER_SLOT_NUM - qw) )
     {
         /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
+        spin_unlock(&s->bufioreq_lock);
         return 0;
     }
 
@@ -1517,33 +1707,37 @@ int hvm_buffered_io_send(ioreq_t *p)
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
 
     return 1;
 }
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq.va;
+    return !!d->arch.hvm_domain.ioreq_server;
 }
 
 bool_t hvm_send_assist_req(ioreq_t *proto_p)
 {
     struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !p )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v);
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model. Crash the domain. */
         gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
-        domain_crash(v->domain);
+        domain_crash(d);
         return 0;
     }
 
@@ -4164,21 +4358,6 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
-static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
-                                     int *p_port)
-{
-    int old_port, new_port;
-
-    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
-    if ( new_port < 0 )
-        return new_port;
-
-    /* xchg() ensures that only we call free_xen_event_channel(). */
-    old_port = xchg(p_port, new_port);
-    free_xen_event_channel(v, old_port);
-    return 0;
-}
-
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -4194,7 +4373,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_page *iorp;
         struct domain *d;
         struct vcpu *v;
 
@@ -4227,19 +4405,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             switch ( a.index )
             {
             case HVM_PARAM_IOREQ_PFN:
-                iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_map_ioreq_page(d, iorp, a.value)) != 0 )
-                    break;
-                spin_lock(&iorp->lock);
-                if ( iorp->va != NULL )
-                    /* Initialise evtchn port info if VCPUs already created. */
-                    for_each_vcpu ( d, v )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                spin_unlock(&iorp->lock);
+                rc = hvm_set_ioreq_pfn(d, 0, a.value);
                 break;
             case HVM_PARAM_BUFIOREQ_PFN:
-                iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_map_ioreq_page(d, iorp, a.value);
+                rc = hvm_set_ioreq_pfn(d, 1, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
@@ -4294,31 +4463,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = 0;
-                domain_pause(d); /* safe to change per-vcpu xen_port */
-                if ( d->vcpu[0] )
-                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
-                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
-                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
-                if ( rc )
-                {
-                    domain_unpause(d);
-                    break;
-                }
-                iorp = &d->arch.hvm_domain.ioreq;
-                for_each_vcpu ( d, v )
-                {
-                    rc = hvm_replace_event_channel(v, a.value,
-                                                   &v->arch.hvm_vcpu.xen_port);
-                    if ( rc )
-                        break;
-
-                    spin_lock(&iorp->lock);
-                    if ( iorp->va != NULL )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                    spin_unlock(&iorp->lock);
-                }
-                domain_unpause(d);
+                rc = hvm_set_dm_domain(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 460dd94..92dc5fb 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -36,14 +36,35 @@
 #include <public/hvm/save.h>
 
 struct hvm_ioreq_page {
-    spinlock_t lock;
     struct page_info *page;
     void *va;
 };
 
-struct hvm_domain {
+struct hvm_ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+};
+
+struct hvm_ioreq_server {
+    struct domain          *domain;
+
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+
+    /* Domain id of emulating domain */
+    domid_t                domid;
     struct hvm_ioreq_page  ioreq;
-    struct hvm_ioreq_page  buf_ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct hvm_ioreq_page  bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+};
+
+struct hvm_domain {
+    struct hvm_ioreq_server *ioreq_server;
 
     struct pl_time         pl_time;
 
@@ -106,3 +127,12 @@ struct hvm_domain {
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
 
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index f34fa91..db37232 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,8 +138,6 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    int                 xen_port;
-
     u8                  flag_dr_dirty;
     bool_t              debug_state_latch;
     bool_t              single_step;
@@ -186,3 +184,13 @@ struct hvm_vcpu {
 };
 
 #endif /* __ASM_X86_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (3 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 14:18   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 6/9] ioreq-server: add support for multiple servers Paul Durrant
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

This patch defers creation of the ioreq server until the legacy HVM
parameters are read (by an emulator).

A lock is introduced to protect access to the ioreq server should multiple
emulator/toolstack invocations arise. The guest is protected because the
ioreq server is only created whilst the domain is paused.
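
For context, a sketch of the consumer of these parameters (not part of the
patch): an emulator discovers the shared ioreq page by reading the legacy
parameter, so creating the server at that point means it comes into
existence exactly when a device model first attaches. The calls below are
the long-standing libxenctrl interfaces an emulator such as QEMU uses for
this; treat the exact usage as illustrative.

    #include <sys/mman.h>
    #include <xenctrl.h>
    #include <xen/hvm/params.h>
    #include <xen/hvm/ioreq.h>

    static shared_iopage_t *map_ioreq_page(xc_interface *xch, domid_t dom)
    {
        unsigned long ioreq_pfn;

        /* Reading this parameter is what now triggers creation of the
         * ioreq server on the hypervisor side. */
        if ( xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN, &ioreq_pfn) )
            return NULL;

        return xc_map_foreign_range(xch, dom, XC_PAGE_SIZE,
                                    PROT_READ | PROT_WRITE, ioreq_pfn);
    }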

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |  265 +++++++++++++++++++++++++++-----------
 xen/include/asm-x86/hvm/domain.h |    1 +
 2 files changed, 194 insertions(+), 72 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1684705..bc073b5 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -389,40 +389,38 @@ void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( !s )
-        goto check_inject_trap;
-
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(s, v);
-    while ( p->state != STATE_IOREQ_NONE )
+    if ( s )
     {
-        switch ( p->state )
+        ioreq_t *p = get_ioreq(s, v);
+
+        while ( p->state != STATE_IOREQ_NONE )
         {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
-            hvm_io_assist(p);
-            break;
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(p->vp_eport,
-                                      (p->state != STATE_IOREQ_READY) &&
-                                      (p->state != STATE_IOREQ_INPROCESS));
-            break;
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-            domain_crash(v->domain);
-            return; /* bail */
+            switch ( p->state )
+            {
+            case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+                rmb(); /* see IORESP_READY /then/ read contents of ioreq */
+                hvm_io_assist(p);
+                break;
+            case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+            case STATE_IOREQ_INPROCESS:
+                wait_on_xen_event_channel(p->vp_eport,
+                                          (p->state != STATE_IOREQ_READY) &&
+                                          (p->state != STATE_IOREQ_INPROCESS));
+                break;
+            default:
+                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
+                domain_crash(d);
+                return; /* bail */
+            }
         }
     }
 
- check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -653,13 +651,70 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
     spin_unlock(&s->lock);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 {
-    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_vcpu *sv, *next;
 
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
+    spin_lock(&s->lock);
+
+    list_for_each_entry_safe ( sv,
+                               next,
+                               &s->ioreq_vcpu_list,
+                               list_entry )
+    {
+        struct vcpu *v = sv->vcpu;
+
+        list_del_init(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 )
+            free_xen_event_channel(v, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+    unsigned long pfn;
+    int rc;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    rc = hvm_map_ioreq_page(s, 0, pfn);
+    if ( rc )
+        goto fail1;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    rc = hvm_map_ioreq_page(s, 1, pfn);
+    if ( rc )
+        goto fail2;
+
+    return 0;
+
+fail2:
+    hvm_unmap_ioreq_page(s, 0);
+
+fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+{
+    hvm_unmap_ioreq_page(s, 1);
+    hvm_unmap_ioreq_page(s, 0);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
+                                 domid_t domid)
+{
+    struct vcpu *v;
+    int rc;
+
+    gdprintk(XENLOG_DEBUG, "%s %d\n", __func__, domid);
 
     s->domain = d;
     s->domid = domid;
@@ -668,49 +723,95 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    d->arch.hvm_domain.ioreq_server = s;
+    rc = hvm_ioreq_server_map_pages(s);
+    if ( rc )
+        return rc;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
     return 0;
+
+ fail:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
+
+    return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    gdprintk(XENLOG_DEBUG, "%s %d\n", __func__, s->domid);
 
-    hvm_unmap_ioreq_page(s, 1);
-    hvm_unmap_ioreq_page(s, 0);
-
-    xfree(s);
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
 }
 
-static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
-                             unsigned long pfn)
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     int rc;
 
-    spin_lock(&s->lock);
+    rc = -ENOMEM;
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        goto fail1;
+
+    domain_pause(d);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -EEXIST;
+    if ( d->arch.hvm_domain.ioreq_server != NULL )
+        goto fail2;
 
-    rc = hvm_map_ioreq_page(s, buf, pfn);
+    rc = hvm_ioreq_server_init(s, d, domid);
     if ( rc )
-        goto fail;
+        goto fail3;
 
-    if (!buf) {
-        struct hvm_ioreq_vcpu *sv;
+    d->arch.hvm_domain.ioreq_server = s;
 
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-            hvm_update_ioreq_evtchn(s, sv);
-    }
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
 
-    spin_unlock(&s->lock);
     return 0;
 
- fail:
-    spin_unlock(&s->lock);
+ fail3:
+ fail2:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
+
+    xfree(s);
+ fail1:
     return rc;
 }
 
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+
+    domain_pause(d);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
+    d->arch.hvm_domain.ioreq_server = NULL;
+
+    hvm_ioreq_server_deinit(s);
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
+
+    if ( s )
+        xfree(s);
+}
+
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                      evtchn_port_t *p_port)
 {
@@ -728,9 +829,15 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
 
 static int hvm_set_dm_domain(struct domain *d, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     int rc = 0;
 
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
     domain_pause(d);
     spin_lock(&s->lock);
 
@@ -766,12 +873,13 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
     spin_unlock(&s->lock);
     domain_unpause(d);
 
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
 int hvm_domain_initialise(struct domain *d)
 {
-    domid_t domid;
     int rc;
 
     if ( !hvm_enabled )
@@ -797,6 +905,7 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -837,21 +946,14 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-    rc = hvm_create_ioreq_server(d, domid);
-    if ( rc != 0 )
-        goto fail2;
-
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail3;
+        goto fail2;
 
     return 0;
 
- fail3:
-    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1508,7 +1610,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1551,7 +1653,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    rc = hvm_ioreq_server_add_vcpu(s, v);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
     if ( rc != 0 )
         goto fail6;
 
@@ -1588,9 +1697,15 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    hvm_ioreq_server_remove_vcpu(s, v);
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -4404,12 +4519,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
             switch ( a.index )
             {
-            case HVM_PARAM_IOREQ_PFN:
-                rc = hvm_set_ioreq_pfn(d, 0, a.value);
-                break;
-            case HVM_PARAM_BUFIOREQ_PFN:
-                rc = hvm_set_ioreq_pfn(d, 1, a.value);
-                break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
                 hvm_latch_shinfo_size(d);
@@ -4455,7 +4564,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 domctl_lock_release();
                 break;
             case HVM_PARAM_DM_DOMAIN:
-                /* Not reflexive, as we must domain_pause(). */
+                /* Not reflexive, as we may need to domain_pause(). */
                 rc = -EPERM;
                 if ( curr_d == d )
                     break;
@@ -4561,6 +4670,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_EVTCHN: {
+                domid_t domid;
+                
+                /* May need to create server */
+                domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+                rc = hvm_create_ioreq_server(d, domid);
+                if ( rc != 0 && rc != -EEXIST )
+                    goto param_fail;
+                /*FALLTHRU*/
+            }
             default:
                 a.value = d->arch.hvm_domain.params[a.index];
                 break;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 92dc5fb..cd885de 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -65,6 +65,7 @@ struct hvm_ioreq_server {
 
 struct hvm_domain {
     struct hvm_ioreq_server *ioreq_server;
+    spinlock_t              ioreq_server_lock;
 
     struct pl_time         pl_time;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (4 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 10:46   ` Ian Campbell
  2014-05-07 11:13   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich, Stefano Stabellini

The single ioreq server that was previously created on demand now
becomes the default server, and an API is created to allow secondary
servers, which handle specific IO ranges or PCI devices, to be added.

When the guest issues an IO, the list of secondary servers is checked
for a matching IO range or PCI device. If none is found, the IO is
passed to the default server.
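
For example, a secondary server that has registered the MMIO range
0xe0000000-0xe000ffff receives any guest access that falls within that
range; accesses that no secondary server has claimed are still handled
by the default server (typically QEMU).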

NOTE: To prevent emulators running in non-privileged guests from
      potentially allocating very large amounts of xen heap, the core
      rangeset code has been modified to introduce a hard limit of 256
      ranges per set.
      This patch also introduces an implementation of asprintf() for
      Xen to allow meaningful names to be supplied for the secondary
      server rangesets.

Secondary servers use guest pages to communicate with emulators, in
the same way as the default server. These pages need to be in the
guest physmap, otherwise there is no suitable reference that an
emulator can query in order to map them. Therefore a pool of pages in
the current E820 reserved region, just below the special pages, is
used. Secondary servers allocate from and free to this pool as they
are created and destroyed.

The size of the pool is currently hardcoded in the domain build at a
value of 8. This should be sufficient for now, and both the location
and size of the pool can be modified in future without any need to
change the API.
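
For illustration only, a minimal sketch of how a secondary emulator
might drive the new libxc calls added below. The MMIO range and PCI
BDF are made-up example values, and a real emulator would go on to map
the returned gmfns and bind the event channels:

  #include <xenctrl.h>

  static int register_secondary_emulator(xc_interface *xch, domid_t domid)
  {
      ioservid_t id;
      xen_pfn_t ioreq_pfn, bufioreq_pfn;
      evtchn_port_t bufioreq_port;

      /* Create the secondary server for the target domain. */
      if ( xc_hvm_create_ioreq_server(xch, domid, &id) )
          return -1;

      /* Discover the gmfns and event channel the emulator must use. */
      if ( xc_hvm_get_ioreq_server_info(xch, domid, id, &ioreq_pfn,
                                        &bufioreq_pfn, &bufioreq_port) )
          goto fail;

      /* Claim an MMIO range and a PCI device (illustrative values). */
      if ( xc_hvm_map_io_range_to_ioreq_server(xch, domid, id,
                                               1 /* mmio */,
                                               0xe0000000, 0xe000ffff) ||
           xc_hvm_map_pcidev_to_ioreq_server(xch, domid, id, 0, 0, 3, 0) )
          goto fail;

      return 0;

   fail:
      xc_hvm_destroy_ioreq_server(xch, domid, id);
      return -1;
  }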

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c          |  219 +++++++++
 tools/libxc/xc_domain_restore.c  |   45 ++
 tools/libxc/xc_domain_save.c     |   24 +
 tools/libxc/xc_hvm_build_x86.c   |   30 +-
 tools/libxc/xenctrl.h            |  133 ++++++
 tools/libxc/xg_save_restore.h    |    2 +
 xen/arch/x86/hvm/hvm.c           |  933 ++++++++++++++++++++++++++++++++++----
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/common/rangeset.c            |   52 ++-
 xen/common/vsprintf.c            |   56 +++
 xen/include/asm-x86/hvm/domain.h |   22 +-
 xen/include/asm-x86/hvm/hvm.h    |    1 +
 xen/include/public/hvm/hvm_op.h  |  118 +++++
 xen/include/public/hvm/ioreq.h   |   15 +-
 xen/include/public/hvm/params.h  |    5 +-
 xen/include/xen/lib.h            |    4 +
 xen/include/xen/stdarg.h         |    1 +
 17 files changed, 1557 insertions(+), 105 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..b3ed029 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1284,6 +1284,225 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
     return rc;
 }
 
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_create_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    *id = arg->id;
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *ioreq_pfn,
+                                 xen_pfn_t *bufioreq_pfn,
+                                 evtchn_port_t *bufioreq_port)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( rc != 0 )
+        goto done;
+
+    if ( ioreq_pfn )
+        *ioreq_pfn = arg->ioreq_pfn;
+
+    if ( bufioreq_pfn )
+        *bufioreq_pfn = arg->bufioreq_pfn;
+
+    if ( bufioreq_port )
+        *bufioreq_port = arg->bufioreq_port;
+
+done:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = is_mmio ? HVMOP_IO_RANGE_MEMORY : HVMOP_IO_RANGE_PORT;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = is_mmio ? HVMOP_IO_RANGE_MEMORY : HVMOP_IO_RANGE_PORT;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                      ioservid_t id, uint16_t segment,
+                                      uint8_t bus, uint8_t device,
+                                      uint8_t function)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    if (device > 0x1f || function > 0x7) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_PCI;
+    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
+                                           (uint64_t)bus,
+                                           (uint64_t)device,
+                                           (uint64_t)function);
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                          ioservid_t id, uint16_t segment,
+                                          uint8_t bus, uint8_t device,
+                                          uint8_t function)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    if (device > 0x1f || function > 0x7) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_PCI;
+    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
+                                           (uint64_t)bus,
+                                           (uint64_t)device,
+                                           (uint64_t)function);
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index bcb0ae0..af2bf3a 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -740,6 +740,8 @@ typedef struct {
     uint64_t acpi_ioport_location;
     uint64_t viridian;
     uint64_t vm_generationid_addr;
+    uint64_t ioreq_server_pfn;
+    uint64_t nr_ioreq_server_pages;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -990,6 +992,26 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         DPRINTF("read generation id buffer address");
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_IOREQ_SERVER_PFN:
+        /* Skip padding 4 bytes then read the ioreq server gmfn base. */
+        if ( RDEXACT(fd, &buf->ioreq_server_pfn, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->ioreq_server_pfn, sizeof(uint64_t)) )
+        {
+            PERROR("error read the ioreq server gmfn base");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
+    case XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES:
+        /* Skip padding 4 bytes then read the ioreq server gmfn count. */
+        if ( RDEXACT(fd, &buf->nr_ioreq_server_pages, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->nr_ioreq_server_pages, sizeof(uint64_t)) )
+        {
+            PERROR("error read the ioreq server gmfn count");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1748,6 +1770,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if (pagebuf.viridian != 0)
         xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
 
+    /*
+     * If we are migrating in from a host that does not support
+     * secondary emulators then nr_ioreq_server_pages will be 0, since
+     * there will be no XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES chunk in
+     * the image.
+     * If we are migrating from a host that does support secondary
+     * emulators then the XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES chunk
+     * will exist and is guaranteed to have a non-zero value. The
+     * existence of that chunk also implies the existence of the
+     * XC_SAVE_ID_HVM_IOREQ_SERVER_PFN chunk, which is also guaranteed
+     * to have a non-zero value.
+     */
+    if (pagebuf.nr_ioreq_server_pages != 0) {
+        if (pagebuf.ioreq_server_pfn != 0) {
+            xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES, 
+                             pagebuf.nr_ioreq_server_pages);
+            xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                             pagebuf.ioreq_server_pfn);
+        } else {
+            ERROR("ioreq_server_pfn is invalid");
+        }
+    }
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 71f9b59..acf3685 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1737,6 +1737,30 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the viridian flag");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_IOREQ_SERVER_PFN;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the ioreq server gmfn base");
+            goto out;
+        }
+
+        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the ioreq server gmfn count");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index dd3b522..3564e8b 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -49,6 +49,9 @@
 #define NR_SPECIAL_PAGES     8
 #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
 
+#define NR_IOREQ_SERVER_PAGES 8
+#define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
+
 #define VGA_HOLE_SIZE (0x20)
 
 static int modules_init(struct xc_hvm_build_args *args,
@@ -114,7 +117,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -502,6 +505,31 @@ static int setup_guest(xc_interface *xch,
                      special_pfn(SPECIALPAGE_SHARING));
 
     /*
+     * Allocate and clear additional ioreq server pages. The default
+     * server will use the IOREQ and BUFIOREQ special pages above.
+     */
+    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
+    {
+        xen_pfn_t pfn = ioreq_server_pfn(i);
+
+        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
+        if ( rc != 0 )
+        {
+            PERROR("Could not allocate %d'th ioreq server page.", i);
+            goto error_out;
+        }
+
+        if ( xc_clear_domain_page(xch, dom, pfn) )
+            goto error_out;
+    }
+
+    /* Tell the domain where the pages are and how many there are */
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     ioreq_server_pfn(0));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     NR_IOREQ_SERVER_PAGES);
+
+    /*
      * Identity-map page table is required for running with CR0.PG=0 when
      * using Intel EPT. Create a 32-bit non-PAE page directory of superpages.
      */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 02129f7..74fb738 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1787,6 +1787,129 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/*
+ * IOREQ Server API. (See section on IOREQ Servers in public/hvm_op.h).
+ */
+
+/**
+ * This function instantiates an IOREQ Server.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id pointer to an ioservid_t to receive the IOREQ Server id.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id);
+
+/**
+ * This function retrieves the necessary information to allow an
+ * emulator to use an IOREQ Server.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm ioreq_pfn pointer to a xen_pfn_t to receive the synchronous ioreq gmfn
+ * @parm bufioreq_pfn pointer to a xen_pfn_t to receive the buffered ioreq gmfn
+ * @parm bufioreq_port pointer to a evtchn_port_t to receive the buffered ioreq event channel
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *ioreq_pfn,
+                                 xen_pfn_t *bufioreq_pfn,
+                                 evtchn_port_t *bufioreq_port);
+
+/**
+ * This function registers a range of memory or I/O ports for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm is_mmio is this a range of ports or memory
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
+                                        domid_t domid,
+                                        ioservid_t id,
+                                        int is_mmio,
+                                        uint64_t start,
+                                        uint64_t end);
+
+/**
+ * This function deregisters a range of memory or I/O ports for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm is_mmio is this a range of ports or memory
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            int is_mmio,
+                                            uint64_t start,
+                                            uint64_t end);
+
+/**
+ * This function registers a PCI device for config space emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm segment the PCI segment of the device
+ * @parm bus the PCI bus of the device
+ * @parm device the 'slot' number of the device
+ * @parm function the function number of the device
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
+                                      domid_t domid,
+                                      ioservid_t id,
+                                      uint16_t segment,
+                                      uint8_t bus,
+                                      uint8_t device,
+                                      uint8_t function);
+
+/**
+ * This function deregisters a PCI device for config space emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm segment the PCI segment of the device
+ * @parm bus the PCI bus of the device
+ * @parm device the 'slot' number of the device
+ * @parm function the function number of the device
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t segment,
+                                          uint8_t bus,
+                                          uint8_t device,
+                                          uint8_t function);
+
+/**
+ * This function destroys an IOREQ Server.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
@@ -2425,3 +2548,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
 int xc_kexec_unload(xc_interface *xch, int type);
 
 #endif /* XENCTRL_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index f859621..f1ec7f5 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -259,6 +259,8 @@
 #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
 #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
 #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
+#define XC_SAVE_ID_HVM_IOREQ_SERVER_PFN -19
+#define XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES -20
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bc073b5..5ac2d93 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -66,6 +66,7 @@
 #include <asm/mem_event.h>
 #include <asm/mem_access.h>
 #include <public/mem_event.h>
+#include <xen/rangeset.h>
 
 bool_t __read_mostly hvm_enabled;
 
@@ -375,27 +376,36 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    if ( !s )
-        return 0;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+ 
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        ioreq_t *p;
+
+        p = get_ioreq(s, v);
+        if ( p->state != STATE_IOREQ_NONE )
+            return 1;
+    }
 
-    p = get_ioreq(s, v);
-    return p->state != STATE_IOREQ_NONE;
+    return 0;
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( s )
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
     {
         ioreq_t *p = get_ioreq(s, v);
 
@@ -429,6 +439,31 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
+static int hvm_alloc_ioreq_gmfn(struct domain *d, unsigned long *gmfn)
+{
+    unsigned int i;
+    int rc;
+
+    rc = -ENOMEM;
+    for ( i = 0; i < d->arch.hvm_domain.ioreq_gmfn_count; i++ )
+    {
+        if ( !test_and_set_bit(i, &d->arch.hvm_domain.ioreq_gmfn_mask) ) {
+            *gmfn = d->arch.hvm_domain.ioreq_gmfn_base + i;
+            rc = 0;
+            break;
+        }
+    }
+
+    return rc;
+}
+
+static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
+{
+    unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn_base;
+
+    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn_mask);
+}
+
 void destroy_ring_for_helper(
     void **_va, struct page_info *page)
 {
@@ -513,6 +548,7 @@ static int hvm_map_ioreq_page(
 
     iorp->va = va;
     iorp->page = page;
+    iorp->gmfn = gmfn;
 
     return 0;
 }
@@ -543,6 +579,127 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int hvm_access_cf8(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *curr = current;
+    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
+    int rc;
+
+    BUG_ON(port < 0xcf8);
+    port -= 0xcf8;
+
+    spin_lock(&hd->pci_lock);
+
+    if ( dir == IOREQ_WRITE )
+    {
+        switch ( bytes )
+        {
+        case 4:
+            hd->pci_cf8 = *val;
+            break;
+
+        case 2:
+        {
+            uint32_t mask = 0xffff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+            
+        case 1:
+        {
+            uint32_t mask = 0xff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+
+        default:
+            break;
+        }
+
+        /* We always need to fall through to the catch all emulator */
+        rc = X86EMUL_UNHANDLEABLE;
+    }
+    else
+    {
+        switch ( bytes )
+        {
+        case 4:
+            *val = hd->pci_cf8;
+            break;
+
+        case 2:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
+            break;
+            
+        case 1:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
+            break;
+
+        default:
+            break;
+        }
+
+        rc = X86EMUL_OKAY;
+    }
+
+    spin_unlock(&hd->pci_lock);
+
+    return rc;
+}
+
+static int hvm_access_cfc(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *curr = current;
+    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
+    int rc;
+
+    BUG_ON(port < 0xcfc);
+    port -= 0xcfc;
+
+    spin_lock(&hd->pci_lock);
+
+    if ( hd->pci_cf8 & (1 << 31) ) {
+        /* Fall through to an emulator */
+        rc = X86EMUL_UNHANDLEABLE;
+    } else {
+        /* Config access disabled */
+        if ( dir == IOREQ_READ )
+        {
+            switch ( bytes )
+            {
+            case 4:
+                *val = 0xffffffff;
+                break;
+
+            case 2:
+                *val = 0xffff;
+                break;
+            
+            case 1:
+                *val = 0xff;
+                break;
+
+            default:
+                break;
+            }
+        }
+
+        rc = X86EMUL_OKAY;
+    }
+
+    spin_unlock(&hd->pci_lock);
+
+    return rc;
+}
+
 static int handle_pvh_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -571,7 +728,7 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
 }
 
 static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
+                                     bool_t is_default, struct vcpu *v)
 {
     struct hvm_ioreq_vcpu *sv;
     int rc;
@@ -599,8 +756,9 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
             goto fail3;
 
         s->bufioreq_evtchn = rc;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
-            s->bufioreq_evtchn;
+        if ( is_default )
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+                s->bufioreq_evtchn;
     }
 
     sv->vcpu = v;
@@ -677,45 +835,128 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
+                                      bool_t is_default)
 {
     struct domain *d = s->domain;
-    unsigned long pfn;
+    unsigned long ioreq_pfn, bufioreq_pfn;
     int rc;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
-    rc = hvm_map_ioreq_page(s, 0, pfn);
+    if ( is_default ) {
+        ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+        bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    } else {
+        rc = hvm_alloc_ioreq_gmfn(d, &ioreq_pfn);
+        if ( rc )
+            goto fail1;
+
+        rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
+        if ( rc )
+            goto fail2;
+    }
+
+    rc = hvm_map_ioreq_page(s, 0, ioreq_pfn);
     if ( rc )
-        goto fail1;
+        goto fail3;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
-    rc = hvm_map_ioreq_page(s, 1, pfn);
+    rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
     if ( rc )
-        goto fail2;
+        goto fail4;
 
     return 0;
 
-fail2:
+fail4:
     hvm_unmap_ioreq_page(s, 0);
 
+fail3:
+    if ( !is_default )
+        hvm_free_ioreq_gmfn(d, bufioreq_pfn);
+
+fail2:
+    if ( !is_default )
+        hvm_free_ioreq_gmfn(d, ioreq_pfn);
+
 fail1:
     return rc;
 }
 
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s, 
+                                         bool_t is_default)
 {
+    struct domain *d = s->domain;
+
     hvm_unmap_ioreq_page(s, 1);
     hvm_unmap_ioreq_page(s, 0);
+
+    if ( !is_default ) {
+        hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+        hvm_free_ioreq_gmfn(d, s->ioreq.gmfn);
+    }
+}
+
+static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
+                                            bool_t is_default)
+{
+    int i;
+    int rc;
+
+    if ( is_default )
+        goto done;
+
+    for ( i = 0; i < MAX_IO_RANGE_TYPE; i++ ) {
+        char *name;
+
+        rc = asprintf(&name, "ioreq_server %d:%d %s", s->domid, s->id,
+                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
+                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
+                      "");
+        if ( rc )
+            goto fail;
+
+        s->range[i] = rangeset_new(s->domain, name,
+                                   RANGESETF_prettyprint_hex);
+
+        xfree(name);
+
+        rc = -ENOMEM;
+        if ( !s->range[i] )
+            goto fail;
+    }
+
+ done:
+    return 0;
+
+ fail:
+    while ( --i >= 0 )
+        rangeset_destroy(s->range[i]);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s, 
+                                            bool_t is_default)
+{
+    int i;
+
+    if ( is_default )
+        return;
+
+    for ( i = 0; i < MAX_IO_RANGE_TYPE; i++ )
+        rangeset_destroy(s->range[i]);
 }
 
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
-                                 domid_t domid)
+                                 domid_t domid, bool_t is_default,
+                                 ioservid_t id)
 {
     struct vcpu *v;
     int rc;
 
-    gdprintk(XENLOG_DEBUG, "%s %d\n", __func__, domid);
+    gdprintk(XENLOG_DEBUG, "%s %d:%d %s\n", __func__, domid, id,
+             is_default ? "[DEFAULT]" : "");
 
+    s->id = id;
     s->domain = d;
     s->domid = domid;
 
@@ -723,35 +964,49 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    rc = hvm_ioreq_server_map_pages(s);
+    rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
     if ( rc )
-        return rc;
+        goto fail1;
+
+    rc = hvm_ioreq_server_map_pages(s, is_default);
+    if ( rc )
+        goto fail2;
 
     for_each_vcpu ( d, v )
     {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
+        rc = hvm_ioreq_server_add_vcpu(s, is_default, v);
         if ( rc )
-            goto fail;
+            goto fail3;
     }
 
     return 0;
 
- fail:
+ fail3:
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_unmap_pages(s, is_default);
 
+ fail2:
+    hvm_ioreq_server_free_rangesets(s, is_default);
+
+ fail1:
+
     return rc;
 }
 
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
+                                    bool_t is_default)
 {
-    gdprintk(XENLOG_DEBUG, "%s %d\n", __func__, s->domid);
+
+    gdprintk(XENLOG_DEBUG, "%s %d:%d %s\n", __func__, s->domid, s->id,
+             is_default ? "[DEFAULT]" : "");
 
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_unmap_pages(s, is_default);
+    hvm_ioreq_server_free_rangesets(s, is_default);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
+                                   bool_t is_default, ioservid_t *id)
 {
     struct hvm_ioreq_server *s;
     int rc;
@@ -764,52 +1019,270 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     domain_pause(d);
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    rc = -EEXIST;
-    if ( d->arch.hvm_domain.ioreq_server != NULL )
-        goto fail2;
+    rc = -EEXIST;
+    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
+        goto fail2;
+
+    rc = hvm_ioreq_server_init(s, d, domid, is_default,
+                               d->arch.hvm_domain.ioreq_server_id++);
+    if ( rc )
+        goto fail3;
+
+    list_add(&s->list_entry,
+             &d->arch.hvm_domain.ioreq_server_list);
+    d->arch.hvm_domain.ioreq_server_count++;
+
+    if ( is_default )
+        d->arch.hvm_domain.default_ioreq_server = s;
+
+    if (id != NULL)
+        *id = s->id;
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
+
+    return 0;
+
+ fail3:
+ fail2:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    domain_unpause(d);
+
+    xfree(s);
+ fail1:
+    return rc;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+{
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        if ( s->id != id )
+            continue;
+
+        domain_pause(d);
+
+        if ( is_default )
+            d->arch.hvm_domain.default_ioreq_server = NULL;
+
+        --d->arch.hvm_domain.ioreq_server_count;
+        list_del_init(&s->list_entry);
+        
+        hvm_ioreq_server_deinit(s, is_default);
+
+        domain_unpause(d);
+
+        xfree(s);
+
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                                     unsigned long *ioreq_pfn,
+                                     unsigned long *bufioreq_pfn,
+                                     evtchn_port_t *bufioreq_port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id != id )
+            continue;
+
+        *ioreq_pfn = s->ioreq.gmfn;
+        *bufioreq_pfn = s->bufioreq.gmfn;
+        *bufioreq_port = s->bufioreq_evtchn;
+
+        rc = 0;
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                            uint32_t type, uint64_t start, uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+        {
+            struct rangeset *r;
+
+            switch ( type )
+            {
+            case HVMOP_IO_RANGE_PORT:
+            case HVMOP_IO_RANGE_MEMORY:
+            case HVMOP_IO_RANGE_PCI:
+                r = s->range[type];
+                break;
+
+            default:
+                r = NULL;
+                break;
+            }
+
+            rc = -EINVAL;
+            if ( !r )
+                break;
+
+            rc = rangeset_add_range(r, start, end);
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                                uint32_t type, uint64_t start, uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+        {
+            struct rangeset *r;
+
+            switch ( type )
+            {
+            case HVMOP_IO_RANGE_PORT:
+            case HVMOP_IO_RANGE_MEMORY:
+            case HVMOP_IO_RANGE_PCI:
+                r = s->range[type];
+                break;
+
+            default:
+                r = NULL;
+                break;
+            }
+
+            rc = -EINVAL;
+            if ( !r )
+                break;
+
+            rc = rangeset_remove_range(r, start, end);
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    rc = hvm_ioreq_server_init(s, d, domid);
-    if ( rc )
-        goto fail3;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
 
-    d->arch.hvm_domain.ioreq_server = s;
+        rc = hvm_ioreq_server_add_vcpu(s, is_default, v);
+        if ( rc )
+            goto fail;
+    }
 
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-    domain_unpause(d);
 
     return 0;
 
- fail3:
- fail2:
+ fail:
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-    domain_unpause(d);
 
-    xfree(s);
- fail1:
     return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
 {
     struct hvm_ioreq_server *s;
 
-    domain_pause(d);
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( !s )
-        goto done;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+        hvm_ioreq_server_remove_vcpu(s, v);
 
-    d->arch.hvm_domain.ioreq_server = NULL;
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
 
-    hvm_ioreq_server_deinit(s);
+static void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct hvm_ioreq_server *s, *next;
 
- done:
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-    domain_unpause(d);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each_entry_safe ( s,
+                               next,
+                               &d->arch.hvm_domain.ioreq_server_list,
+                               list_entry )
+    {
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        domain_pause(d);
+
+        if ( is_default )
+            d->arch.hvm_domain.default_ioreq_server = NULL;
+
+        --d->arch.hvm_domain.ioreq_server_count;
+        list_del_init(&s->list_entry);
+        
+        hvm_ioreq_server_deinit(s, is_default);
+
+        domain_unpause(d);
 
-    if ( s )
         xfree(s);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 }
 
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
@@ -834,7 +1307,7 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
 
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
+    s = d->arch.hvm_domain.default_ioreq_server;
     if ( !s )
         goto done;
 
@@ -906,6 +1379,8 @@ int hvm_domain_initialise(struct domain *d)
     }
 
     spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
+    spin_lock_init(&d->arch.hvm_domain.pci_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -947,6 +1422,8 @@ int hvm_domain_initialise(struct domain *d)
     rtc_init(d);
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+    register_portio_handler(d, 0xcfc, 4, hvm_access_cfc);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
@@ -977,7 +1454,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_server(d);
+    hvm_destroy_all_ioreq_servers(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1610,7 +2087,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1653,14 +2129,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-
+    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
     if ( rc != 0 )
         goto fail6;
 
@@ -1697,15 +2166,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
-        hvm_ioreq_server_remove_vcpu(s, v);
 
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    hvm_all_ioreq_servers_remove_vcpu(d, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1744,11 +2206,94 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
+static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                        ioreq_t *p)
+{
+#define CF8_BDF(cf8)  (((cf8) & 0x00ffff00) >> 8)
+#define CF8_ADDR(cf8) ((cf8) & 0x000000ff)
+
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+
+    if ( d->arch.hvm_domain.ioreq_server_count == 1 &&
+         d->arch.hvm_domain.default_ioreq_server )
+        goto done;
+
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc )
+    {
+        uint32_t cf8;
+        uint32_t sbdf;
+
+        /* PCI config data cycle */
+        type = IOREQ_TYPE_PCI_CONFIG;
+
+        spin_lock(&d->arch.hvm_domain.pci_lock);
+        cf8 = d->arch.hvm_domain.pci_cf8;
+        sbdf = HVMOP_PCI_SBDF(0,
+                              PCI_BUS(CF8_BDF(cf8)),
+                              PCI_SLOT(CF8_BDF(cf8)),
+                              PCI_FUNC(CF8_BDF(cf8)));
+        addr = ((uint64_t)sbdf << 32) | (CF8_ADDR(cf8) + (p->addr & 3));
+        spin_unlock(&d->arch.hvm_domain.pci_lock);
+    }
+    else
+    {
+        type = p->type;
+        addr = p->addr;
+    }
+
+    switch ( type )
+    {
+    case IOREQ_TYPE_COPY:
+    case IOREQ_TYPE_PIO:
+    case IOREQ_TYPE_PCI_CONFIG:
+        break;
+    default:
+        goto done;
+    }
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        switch ( type )
+        {
+        case IOREQ_TYPE_COPY:
+        case IOREQ_TYPE_PIO:
+            if ( rangeset_contains_singleton(s->range[type], addr) )
+                goto found;
+
+            break;
+        case IOREQ_TYPE_PCI_CONFIG:
+            if ( rangeset_contains_singleton(s->range[type], addr >> 32) ) {
+                p->type = type;
+                p->addr = addr;
+                goto found;
+            }
+
+            break;
+        }
+    }
+
+ done:
+    s = d->arch.hvm_domain.default_ioreq_server;
+
+ found:
+    return s;
+
+#undef CF8_ADDR
+#undef CF8_BDF
+}
+
 int hvm_buffered_io_send(ioreq_t *p)
 {
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s = hvm_select_ioreq_server(d, p);
     struct hvm_ioreq_page *iorp;
     buffered_iopage_t *pg;
     buf_ioreq_t bp;
@@ -1830,22 +2375,19 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq_server;
+    return !list_empty(&d->arch.hvm_domain.ioreq_server_list);
 }
 
-bool_t hvm_send_assist_req(ioreq_t *proto_p)
+bool_t hvm_send_assist_req_to_ioreq_server(struct hvm_ioreq_server *s,
+                                           struct vcpu *v,
+                                           ioreq_t *proto_p)
 {
-    struct vcpu *v = current;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !s )
-        return 0;
-
     p = get_ioreq(s, v);
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
@@ -1867,11 +2409,35 @@ bool_t hvm_send_assist_req(ioreq_t *proto_p)
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, p->vp_eport);
+    notify_via_xen_event_channel(d, p->vp_eport);
 
     return 1;
 }
 
+bool_t hvm_send_assist_req(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = hvm_select_ioreq_server(d, p);
+
+    if ( !s )
+        return 0;
+
+    return hvm_send_assist_req_to_ioreq_server(s, v, p);
+}
+
+void hvm_broadcast_assist_req(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4473,6 +5039,145 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static int hvmop_create_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
+{
+    struct domain *curr_d = current->domain;
+    xen_hvm_create_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
+    if ( rc != 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_get_ioreq_server_info(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
+{
+    xen_hvm_get_ioreq_server_info_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_info(d, op.id,
+                                         &op.ioreq_pfn,
+                                         &op.bufioreq_pfn, 
+                                         &op.bufioreq_port)) < 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_io_range_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_io_range_t) uop)
+{
+    xen_hvm_io_range_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.type,
+                                          op.start, op.end);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_io_range_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_io_range_t) uop)
+{
+    xen_hvm_io_range_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.type,
+                                              op.start, op.end);
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_destroy_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
+{
+    xen_hvm_destroy_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    hvm_destroy_ioreq_server(d, op.id);
+    rc = 0;
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -4484,6 +5189,31 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( op &= HVMOP_op_mask )
     {
+    case HVMOP_create_ioreq_server:
+        rc = hvmop_create_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
+        break;
+    
+    case HVMOP_get_ioreq_server_info:
+        rc = hvmop_get_ioreq_server_info(
+            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
+        break;
+    
+    case HVMOP_map_io_range_to_ioreq_server:
+        rc = hvmop_map_io_range_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_io_range_t));
+        break;
+    
+    case HVMOP_unmap_io_range_from_ioreq_server:
+        rc = hvmop_unmap_io_range_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_io_range_t));
+        break;
+    
+    case HVMOP_destroy_ioreq_server:
+        rc = hvmop_destroy_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
+        break;
+    
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -4637,6 +5367,25 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value > SHUTDOWN_MAX )
                     rc = -EINVAL;
                 break;
+            case HVM_PARAM_IOREQ_SERVER_PFN:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
+                d->arch.hvm_domain.ioreq_gmfn_base = a.value;
+                break;
+            case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
+                if ( a.value == 0 ||
+                     a.value > sizeof(unsigned long) * 8 ) {
+                    rc = -EINVAL;
+                    break;
+                }
+                d->arch.hvm_domain.ioreq_gmfn_count = a.value;
+                break;
             }
 
             if ( rc == 0 ) 
@@ -4670,6 +5419,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_SERVER_PFN:
+            case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
             case HVM_PARAM_IOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN: {
@@ -4677,7 +5432,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 
                 /* May need to create server */
                 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-                rc = hvm_create_ioreq_server(d, domid);
+                rc = hvm_create_ioreq_server(d, domid, 1, NULL);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index f5ad9be..bf68837 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -74,7 +74,7 @@ void send_invalidate_req(void)
         .data = ~0UL, /* flush all */
     };
 
-    (void)hvm_send_assist_req(&p);
+    hvm_broadcast_assist_req(&p);
 }
 
 int handle_mmio(void)
diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 2b986fb..ccda2f4 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -25,6 +25,9 @@ struct rangeset {
 
     /* Ordered list of ranges contained in this set, and protecting lock. */
     struct list_head range_list;
+    unsigned int     range_count;
+# define MAX_RANGE_COUNT 256
+
     spinlock_t       lock;
 
     /* Pretty-printing name. */
@@ -81,12 +84,32 @@ static void insert_range(
 
 /* Remove a range from its list and free it. */
 static void destroy_range(
-    struct range *x)
+    struct rangeset *r, struct range *x)
 {
+    ASSERT(r->range_count != 0);
+    r->range_count--;
+
     list_del(&x->list);
     xfree(x);
 }
 
+/* Allocate a new range */
+static struct range *alloc_range(
+    struct rangeset *r)
+{
+    struct range *x;
+
+    ASSERT(r->range_count <= MAX_RANGE_COUNT);
+    if ( r->range_count == MAX_RANGE_COUNT )
+        return NULL;
+
+    x = xmalloc(struct range);
+    if ( x )
+        r->range_count++;
+
+    return x;
+}
+
 /*****************************
  * Core public functions
  */
@@ -108,7 +131,7 @@ int rangeset_add_range(
     {
         if ( (x == NULL) || ((x->e < s) && ((x->e + 1) != s)) )
         {
-            x = xmalloc(struct range);
+            x = alloc_range(r);
             if ( x == NULL )
             {
                 rc = -ENOMEM;
@@ -143,7 +166,7 @@ int rangeset_add_range(
             y = next_range(r, x);
             if ( (y == NULL) || (y->e > x->e) )
                 break;
-            destroy_range(y);
+            destroy_range(r, y);
         }
     }
 
@@ -151,7 +174,7 @@ int rangeset_add_range(
     if ( (y != NULL) && ((x->e + 1) == y->s) )
     {
         x->e = y->e;
-        destroy_range(y);
+        destroy_range(r, y);
     }
 
  out:
@@ -179,7 +202,7 @@ int rangeset_remove_range(
 
         if ( (x->s < s) && (x->e > e) )
         {
-            y = xmalloc(struct range);
+            y = alloc_range(r);
             if ( y == NULL )
             {
                 rc = -ENOMEM;
@@ -193,7 +216,7 @@ int rangeset_remove_range(
             insert_range(r, x, y);
         }
         else if ( (x->s == s) && (x->e <= e) )
-            destroy_range(x);
+            destroy_range(r, x);
         else if ( x->s == s )
             x->s = e + 1;
         else if ( x->e <= e )
@@ -214,12 +237,12 @@ int rangeset_remove_range(
         {
             t = x;
             x = next_range(r, x);
-            destroy_range(t);
+            destroy_range(r, t);
         }
 
         x->s = e + 1;
         if ( x->s > x->e )
-            destroy_range(x);
+            destroy_range(r, x);
     }
 
  out:
@@ -312,6 +335,7 @@ struct rangeset *rangeset_new(
 
     spin_lock_init(&r->lock);
     INIT_LIST_HEAD(&r->range_list);
+    r->range_count = 0;
 
     BUG_ON(flags & ~RANGESETF_prettyprint_hex);
     r->flags = flags;
@@ -351,7 +375,7 @@ void rangeset_destroy(
     }
 
     while ( (x = first_range(r)) != NULL )
-        destroy_range(x);
+        destroy_range(r, x);
 
     xfree(r);
 }
@@ -461,3 +485,13 @@ void rangeset_domain_printk(
 
     spin_unlock(&d->rangesets_lock);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 8c43282..5e5816d 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -631,6 +631,62 @@ int scnprintf(char * buf, size_t size, const char *fmt, ...)
 }
 EXPORT_SYMBOL(scnprintf);
 
+/**
+ * vasprintf - Format a string and allocate a buffer to place it in
+ *
+ * @bufp: Pointer to a pointer to receive the allocated buffer
+ * @fmt: The format string to use
+ * @args: Arguments for the format string
+ *
+ * -ENOMEM is returned on failure and @bufp is not touched.
+ * On success, 0 is returned. The buffer passed back is
+ * guaranteed to be null terminated. The memory is allocated
+ * from xenheap, so the buffer should be freed with xfree().
+ */
+int vasprintf(char **bufp, const char *fmt, va_list args)
+{
+    va_list args_copy;
+    size_t size;
+    char *buf, dummy[1];
+
+    va_copy(args_copy, args);
+    size = vsnprintf(dummy, 0, fmt, args_copy);
+    va_end(args_copy);
+
+    buf = _xmalloc(++size, 1);
+    if ( !buf )
+        return -ENOMEM;
+
+    (void) vsnprintf(buf, size, fmt, args);
+
+    *bufp = buf;
+    return 0;
+}
+EXPORT_SYMBOL(vasprintf);
+
+/**
+ * asprintf - Format a string and place it in a buffer
+ * @bufp: Pointer to a pointer to receive the allocated buffer
+ * @fmt: The format string to use
+ * @...: Arguments for the format string
+ *
+ * -ENOMEM is returned on failure and @bufp is not touched.
+ * On success, 0 is returned. The buffer passed back is
+ * guaranteed to be null terminated. The memory is allocated
+ * from xenheap, so the buffer should be freed with xfree().
+ */
+int asprintf(char **bufp, const char *fmt, ...)
+{
+    va_list args;
+    int i;
+
+    va_start(args, fmt);
+    i = vasprintf(bufp, fmt, args);
+    va_end(args);
+    return i;
+}
+EXPORT_SYMBOL(asprintf);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index cd885de..522e2c7 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -34,8 +34,10 @@
 #include <public/grant_table.h>
 #include <public/hvm/params.h>
 #include <public/hvm/save.h>
+#include <public/hvm/hvm_op.h>
 
 struct hvm_ioreq_page {
+    unsigned long gmfn;
     struct page_info *page;
     void *va;
 };
@@ -46,7 +48,10 @@ struct hvm_ioreq_vcpu {
     evtchn_port_t    ioreq_evtchn;
 };
 
+#define MAX_IO_RANGE_TYPE (HVMOP_IO_RANGE_PCI + 1)
+
 struct hvm_ioreq_server {
+    struct list_head       list_entry;
     struct domain          *domain;
 
     /* Lock to serialize toolstack modifications */
@@ -54,6 +59,7 @@ struct hvm_ioreq_server {
 
     /* Domain id of emulating domain */
     domid_t                domid;
+    ioservid_t             id;
     struct hvm_ioreq_page  ioreq;
     struct list_head       ioreq_vcpu_list;
     struct hvm_ioreq_page  bufioreq;
@@ -61,11 +67,25 @@ struct hvm_ioreq_server {
     /* Lock to serialize access to buffered ioreq ring */
     spinlock_t             bufioreq_lock;
     evtchn_port_t          bufioreq_evtchn;
+    struct rangeset        *range[MAX_IO_RANGE_TYPE];
 };
 
 struct hvm_domain {
-    struct hvm_ioreq_server *ioreq_server;
+    /* Guest page range used for non-default ioreq servers */
+    unsigned long           ioreq_gmfn_base;
+    unsigned long           ioreq_gmfn_mask;
+    unsigned int            ioreq_gmfn_count;
+
+    /* Lock protects all other values in the following block */
     spinlock_t              ioreq_server_lock;
+    ioservid_t              ioreq_server_id;
+    unsigned int            ioreq_server_count;
+    struct list_head        ioreq_server_list;
+    struct hvm_ioreq_server *default_ioreq_server;
+
+    /* Cached CF8 for guest PCI config cycles */
+    uint32_t                pci_cf8;
+    spinlock_t              pci_lock;
 
     struct pl_time         pl_time;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index b0f7be5..28c2bd9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -229,6 +229,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
 bool_t hvm_send_assist_req(ioreq_t *p);
+void hvm_broadcast_assist_req(ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index f00f6d2..f29f2b6 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,7 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param           0
@@ -232,6 +233,123 @@ struct xen_hvm_inject_msi {
 typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
 
+/*
+ * IOREQ Servers
+ *
+ * The interface between an I/O emulator and Xen is called an IOREQ Server.
+ * A domain supports a single 'legacy' IOREQ Server which is instantiated if
+ * parameter...
+ *
+ * HVM_PARAM_IOREQ_PFN is read (to get the gmfn containing the synchronous
+ * ioreq structures), or...
+ * HVM_PARAM_BUFIOREQ_PFN is read (to get the gmfn containing the buffered
+ * ioreq ring), or...
+ * HVM_PARAM_BUFIOREQ_EVTCHN is read (to get the event channel that Xen uses
+ * to request buffered I/O emulation).
+ * 
+ * The following hypercalls facilitate the creation of IOREQ Servers for
+ * 'secondary' emulators which are invoked to implement port I/O, memory, or
+ * PCI config space ranges which they explicitly register.
+ */
+
+typedef uint16_t ioservid_t;
+DEFINE_XEN_GUEST_HANDLE(ioservid_t);
+
+/*
+ * HVMOP_create_ioreq_server: Instantiate a new IOREQ Server for a secondary
+ *                            emulator servicing domain <domid>.
+ *
+ * The <id> handed back is unique for <domid>.
+ */
+#define HVMOP_create_ioreq_server 17
+struct xen_hvm_create_ioreq_server {
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* OUT - server id */
+};
+typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
+
+/*
+ * HVMOP_get_ioreq_server_info: Get all the information necessary to access
+ *                              IOREQ Server <id>. 
+ *
+ * The emulator needs to map the synchronous ioreq structures and buffered
+ * ioreq ring that Xen uses to request emulation. These are hosted in domain
+ * <domid>'s gmfns <ioreq_pfn> and <bufioreq_pfn> respectively. In addition the
+ * emulator needs to bind to event channel <bufioreq_port> to listen for
+ * buffered emulation requests. (The event channels used for synchronous
+ * emulation requests are specified in the per-CPU ioreq structures in
+ * <ioreq_pfn>).
+ */
+#define HVMOP_get_ioreq_server_info 18
+struct xen_hvm_get_ioreq_server_info {
+    domid_t domid;               /* IN - domain to be serviced */
+    ioservid_t id;               /* IN - server id */
+    xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
+    xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
+    evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
+};
+typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
+
+/*
+ * HVM_map_io_range_to_ioreq_server: Register an I/O range of domain <domid>
+ *                                   for emulation by the client of IOREQ
+ *                                   Server <id>
+ * HVM_unmap_io_range_from_ioreq_server: Deregister an I/O range of <domid>
+ *                                       for emulation by the client of IOREQ
+ *                                       Server <id>
+ *
+ * There are three types of I/O that can be emulated: port I/O, memory accesses
+ * and PCI config space accesses. The <type> field denotes which type of range
+ * the <start> and <end> (inclusive) fields are specifying.
+ * PCI config space ranges are specified by segment/bus/device/function values
+ * which should be encoded using the HVMOP_PCI_SBDF helper macro below.
+ */
+#define HVMOP_map_io_range_to_ioreq_server 19
+#define HVMOP_unmap_io_range_from_ioreq_server 20
+struct xen_hvm_io_range {
+    domid_t domid;               /* IN - domain to be serviced */
+    ioservid_t id;               /* IN - server id */
+    uint32_t type;               /* IN - type of range */
+# define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
+# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
+# define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+    uint64_aligned_t start, end; /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_io_range xen_hvm_io_range_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_io_range_t);
+
+#define HVMOP_PCI_SBDF(s,b,d,f)                 \
+	((((s) & 0xffff) << 16) |                   \
+	 (((b) & 0xff) << 8) |                      \
+	 (((d) & 0x1f) << 3) |                      \
+	 ((f) & 0x07))
+
+/*
+ * HVMOP_destroy_ioreq_server: Destroy the IOREQ Server <id> servicing domain
+ *                             <domid>.
+ *
+ * Any registered I/O ranges will be automatically deregistered.
+ */
+#define HVMOP_destroy_ioreq_server 21
+struct xen_hvm_destroy_ioreq_server {
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* IN - server id */
+};
+typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index f05d130..533089d 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -24,6 +24,8 @@
 #ifndef _IOREQ_H_
 #define _IOREQ_H_
 
+#include "hvm_op.h"
+
 #define IOREQ_READ      1
 #define IOREQ_WRITE     0
 
@@ -32,15 +34,22 @@
 #define STATE_IOREQ_INPROCESS   2
 #define STATE_IORESP_READY      3
 
-#define IOREQ_TYPE_PIO          0 /* pio */
-#define IOREQ_TYPE_COPY         1 /* mmio ops */
+#define IOREQ_TYPE_PIO          (HVMOP_IO_RANGE_PORT)
+#define IOREQ_TYPE_COPY         (HVMOP_IO_RANGE_MEMORY)
+#define IOREQ_TYPE_PCI_CONFIG   (HVMOP_IO_RANGE_PCI)
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
 /*
  * VMExit dispatcher should cooperate with instruction decoder to
  * prepare this structure and notify service OS and DM by sending
- * virq
+ * virq.
+ *
+ * For I/O type IOREQ_TYPE_PCI_CONFIG, the physical address is formatted
+ * as follows:
+ * 
+ * 63....48|47..40|39..35|34..32|31........0
+ * SEGMENT |BUS   |DEV   |FN    |OFFSET
  */
 struct ioreq {
     uint64_t addr;          /* physical address */
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 517a184..f830bdd 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -145,6 +145,9 @@
 /* SHUTDOWN_* action in case of a triple fault */
 #define HVM_PARAM_TRIPLE_FAULT_REASON 31
 
-#define HVM_NR_PARAMS          32
+#define HVM_PARAM_IOREQ_SERVER_PFN 32
+#define HVM_PARAM_NR_IOREQ_SERVER_PAGES 33
+
+#define HVM_NR_PARAMS          34
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index 1369b2b..e81b80e 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -104,6 +104,10 @@ extern int scnprintf(char * buf, size_t size, const char * fmt, ...)
     __attribute__ ((format (printf, 3, 4)));
 extern int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
     __attribute__ ((format (printf, 3, 0)));
+extern int asprintf(char ** bufp, const char * fmt, ...)
+    __attribute__ ((format (printf, 2, 3)));
+extern int vasprintf(char ** bufp, const char * fmt, va_list args)
+    __attribute__ ((format (printf, 2, 0)));
 
 long simple_strtol(
     const char *cp,const char **endp, unsigned int base);
diff --git a/xen/include/xen/stdarg.h b/xen/include/xen/stdarg.h
index 216fe6d..29249a1 100644
--- a/xen/include/xen/stdarg.h
+++ b/xen/include/xen/stdarg.h
@@ -2,6 +2,7 @@
 #define __XEN_STDARG_H__
 
 typedef __builtin_va_list va_list;
+#define va_copy(dest, src)    __builtin_va_copy((dest), (src))
 #define va_start(ap, last)    __builtin_va_start((ap), (last))
 #define va_end(ap)            __builtin_va_end(ap)
 #define va_arg                __builtin_va_arg
-- 
1.7.10.4
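
For reference, a minimal, hypothetical sketch (not part of the patch) of how an
emulator might unpack the addr field of an IOREQ_TYPE_PCI_CONFIG request,
following the bit layout documented in the ioreq.h comment above; the helper
name and its use are illustrative only.

#include <stdint.h>

/* addr layout (per the comment added to ioreq.h):
 *   bits 63..48 segment, 47..40 bus, 39..35 device, 34..32 function,
 *   bits 31..0  offset into config space. */
static void decode_pci_config_addr(uint64_t addr, uint16_t *seg,
                                   uint8_t *bus, uint8_t *dev,
                                   uint8_t *fn, uint32_t *offset)
{
    uint32_t sbdf = addr >> 32;

    *seg    = (sbdf >> 16) & 0xffff;
    *bus    = (sbdf >> 8) & 0xff;
    *dev    = (sbdf >> 3) & 0x1f;
    *fn     = sbdf & 0x07;
    *offset = addr & 0xffffffff;
}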

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (5 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 6/9] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 10:48   ` Ian Campbell
  2014-05-07 12:09   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich, Stefano Stabellini

For secondary servers, add a hvm op to enable/disable the server. The
server will not accept IO until it is enabled and the act of enabling
the server removes its pages from the guest p2m, thus preventing the guest
from directly mapping the pages and synthesizing ioreqs.
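
As a rough illustration (not part of the patch), an emulator or toolstack
built against this series would flip the server's state once its setup is
complete, using the libxenctrl wrapper added by this patch. The helper name
below is hypothetical, and xch/domid/id are assumed to come from earlier
setup calls:

#include <stdio.h>
#include <xenctrl.h>

/* Enable or disable an IOREQ Server previously created for 'domid'.
 * While disabled, the server's pages stay in the guest P2M and no IO
 * is routed to it; enabling removes the pages and starts routing. */
static int set_server_state(xc_interface *xch, domid_t domid,
                            ioservid_t id, int enabled)
{
    int rc = xc_hvm_set_ioreq_server_state(xch, domid, id, enabled);

    if ( rc < 0 )
        fprintf(stderr, "set_ioreq_server_state(%d) failed\n", enabled);

    return rc;
}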

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c          |   27 +++++++++
 tools/libxc/xenctrl.h            |   16 ++++++
 xen/arch/x86/hvm/hvm.c           |  113 +++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/domain.h |    1 +
 xen/include/public/hvm/hvm_op.h  |   16 ++++++
 5 files changed, 172 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index b3ed029..99101e7 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1503,6 +1503,33 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_set_ioreq_server_state(xc_interface *xch,
+                                  domid_t domid,
+                                  ioservid_t id,
+                                  int enabled)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->enabled = !!enabled;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 74fb738..74aa2bf 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1910,6 +1910,22 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
                                 domid_t domid,
                                 ioservid_t id);
 
+/**
+ * This function sets IOREQ Server state. An IOREQ Server
+ * will not be passed emulation requests until it is in
+ * the enabled state.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm enabled the state.
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_set_ioreq_server_state(xc_interface *xch,
+                                  domid_t domid,
+                                  ioservid_t id,
+                                  int enabled);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5ac2d93..5715ed4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -553,6 +553,20 @@ static int hvm_map_ioreq_page(
     return 0;
 }
 
+static void hvm_remove_ioreq_gmfn(
+    struct domain *d, struct hvm_ioreq_page *iorp)
+{
+    guest_physmap_remove_page(d, iorp->gmfn, 
+                              page_to_mfn(iorp->page), 0);
+}
+
+static int hvm_add_ioreq_gmfn(
+    struct domain *d, struct hvm_ioreq_page *iorp)
+{
+    return guest_physmap_add_page(d, iorp->gmfn,
+                                  page_to_mfn(iorp->page), 0);
+}
+
 static int hvm_print_line(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -946,6 +960,26 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
         rangeset_destroy(s->range[i]);
 }
 
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+
+    hvm_remove_ioreq_gmfn(d, &s->ioreq);
+    hvm_remove_ioreq_gmfn(d, &s->bufioreq);
+
+    s->enabled = 1;
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+
+    hvm_add_ioreq_gmfn(d, &s->bufioreq);
+    hvm_add_ioreq_gmfn(d, &s->ioreq);
+
+    s->enabled = 0;
+}
+
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
                                  domid_t domid, bool_t is_default,
                                  ioservid_t id)
@@ -1000,6 +1034,9 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
     gdprintk(XENLOG_DEBUG, "%s %d:%d %s\n", __func__, s->domid, s->id,
              is_default ? "[DEFAULT]" : "");
 
+    if ( !is_default && s->enabled )
+        hvm_ioreq_server_disable(s);
+
     hvm_ioreq_server_remove_all_vcpus(s);
     hvm_ioreq_server_unmap_pages(s, is_default);
     hvm_ioreq_server_free_rangesets(s, is_default);
@@ -1032,8 +1069,10 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
              &d->arch.hvm_domain.ioreq_server_list);
     d->arch.hvm_domain.ioreq_server_count++;
 
-    if ( is_default )
+    if ( is_default ) {
+        s->enabled = 1;
         d->arch.hvm_domain.default_ioreq_server = s;
+    }
 
     if (id != NULL)
         *id = s->id;
@@ -1207,6 +1246,44 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                                      bool_t enabled)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        if ( s->id != id )
+            continue;
+
+        rc = 0;
+        if ( s->enabled == enabled )
+            break;
+
+        domain_pause(d);
+
+        if ( enabled )
+            hvm_ioreq_server_enable(s);
+        else
+            hvm_ioreq_server_disable(s);
+
+        domain_unpause(d);
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
 static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 {
     struct hvm_ioreq_server *s;
@@ -2261,6 +2338,9 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         if ( s == d->arch.hvm_domain.default_ioreq_server )
             continue;
 
+        if ( !s->enabled )
+            continue;
+
         switch ( type )
         {
         case IOREQ_TYPE_COPY:
@@ -2284,6 +2364,7 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     s = d->arch.hvm_domain.default_ioreq_server;
 
  found:
+    ASSERT(!s || s->enabled);
     return s;
 
 #undef CF8_ADDR
@@ -5152,6 +5233,31 @@ static int hvmop_unmap_io_range_from_ioreq_server(
     return rc;
 }
 
+static int hvmop_set_ioreq_server_state(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t) uop)
+{
+    xen_hvm_set_ioreq_server_state_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_set_ioreq_server_state(d, op.id, !!op.enabled);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 static int hvmop_destroy_ioreq_server(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
 {
@@ -5208,6 +5314,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = hvmop_unmap_io_range_from_ioreq_server(
             guest_handle_cast(arg, xen_hvm_io_range_t));
         break;
+
+    case HVMOP_set_ioreq_server_state:
+        rc = hvmop_set_ioreq_server_state(
+            guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
+        break;
     
     case HVMOP_destroy_ioreq_server:
         rc = hvmop_destroy_ioreq_server(
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 522e2c7..9220e2f 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -68,6 +68,7 @@ struct hvm_ioreq_server {
     spinlock_t             bufioreq_lock;
     evtchn_port_t          bufioreq_evtchn;
     struct rangeset        *range[MAX_IO_RANGE_TYPE];
+    bool_t                 enabled;
 };
 
 struct hvm_domain {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index f29f2b6..5ca1cb2 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -340,6 +340,22 @@ struct xen_hvm_destroy_ioreq_server {
 typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
 
+/*
+ * HVMOP_set_ioreq_server_state: Enable or disable the IOREQ Server <id> servicing
+ *                               domain <domid>.
+ *
+ * The IOREQ Server will not be passed any emulation requests until it is in the
+ * enabled state.
+ */
+#define HVMOP_set_ioreq_server_state 22
+struct xen_hvm_set_ioreq_server_state {
+    domid_t domid;   /* IN - domain to be serviced */
+    ioservid_t id;   /* IN - server id */
+    uint8_t enabled; /* IN - enabled? */    
+};
+typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (6 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 10:52   ` Ian Campbell
  2014-05-07 12:13   ` Jan Beulich
  2014-05-01 12:08 ` [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
  2014-05-07 14:41 ` [PATCH v5 0/9] Support for running secondary emulators Jan Beulich
  9 siblings, 2 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich, Stefano Stabellini

Some emulators will only register regions that require non-buffered
access. (In practice the only region that a guest uses buffered access
for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
makes allocation of the buffered ioreq page and event channel optional for
secondary ioreq servers.

If a guest attempts buffered access to an ioreq server that does not
support it, the access will be handled via the normal synchronous path.
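
As a hedged sketch (not part of the patch), a secondary emulator that only
needs synchronous emulation would now create its server with handle_bufioreq
set to zero; the helper name is illustrative and xch/domid come from earlier
setup:

#include <xenctrl.h>

/* Create an IOREQ Server with no buffered ioreq page or event channel;
 * all emulation requests routed to it will use the synchronous path. */
static int create_sync_only_server(xc_interface *xch, domid_t domid,
                                   ioservid_t *id)
{
    return xc_hvm_create_ioreq_server(xch, domid, 0 /* handle_bufioreq */, id);
}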

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c         |    2 ++
 tools/libxc/xenctrl.h           |    1 +
 xen/arch/x86/hvm/hvm.c          |   65 ++++++++++++++++++++++++++-------------
 xen/include/public/hvm/hvm_op.h |   24 +++++++++------
 4 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 99101e7..67090bd 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1286,6 +1286,7 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
 
 int xc_hvm_create_ioreq_server(xc_interface *xch,
                                domid_t domid,
+                               int handle_bufioreq,
                                ioservid_t *id)
 {
     DECLARE_HYPERCALL;
@@ -1301,6 +1302,7 @@ int xc_hvm_create_ioreq_server(xc_interface *xch,
     hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
 
     arg->domid = domid;
+    arg->handle_bufioreq = !!handle_bufioreq;
 
     rc = do_xen_hypercall(xch, &hypercall);
 
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 74aa2bf..60f6abc 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1801,6 +1801,7 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
  */
 int xc_hvm_create_ioreq_server(xc_interface *xch,
                                domid_t domid,
+                               int handle_bufioreq,
                                ioservid_t *id);
 
 /**
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5715ed4..e734adb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -761,7 +761,7 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
 
     sv->ioreq_evtchn = rc;
 
-    if ( v->vcpu_id == 0 )
+    if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
     {
         struct domain *d = s->domain;
 
@@ -811,7 +811,7 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
 
         list_del_init(&sv->list_entry);
 
-        if ( v->vcpu_id == 0 )
+        if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
             free_xen_event_channel(v, s->bufioreq_evtchn);
 
         free_xen_event_channel(v, sv->ioreq_evtchn);
@@ -838,7 +838,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 
         list_del_init(&sv->list_entry);
 
-        if ( v->vcpu_id == 0 )
+        if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
             free_xen_event_channel(v, s->bufioreq_evtchn);
 
         free_xen_event_channel(v, sv->ioreq_evtchn);
@@ -850,7 +850,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 }
 
 static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
-                                      bool_t is_default)
+                                      bool_t is_default, bool_t handle_bufioreq)
 {
     struct domain *d = s->domain;
     unsigned long ioreq_pfn, bufioreq_pfn;
@@ -858,24 +858,34 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
 
     if ( is_default ) {
         ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+        /*
+         * The default ioreq server must handle buffered ioreqs, for
+         * backwards compatibility.
+         */
+        ASSERT(handle_bufioreq);
         bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
     } else {
         rc = hvm_alloc_ioreq_gmfn(d, &ioreq_pfn);
         if ( rc )
             goto fail1;
 
-        rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
-        if ( rc )
-            goto fail2;
+        if ( handle_bufioreq ) {
+            rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
+            if ( rc )
+                goto fail2;
+        }
     }
 
     rc = hvm_map_ioreq_page(s, 0, ioreq_pfn);
     if ( rc )
         goto fail3;
 
-    rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
-    if ( rc )
-        goto fail4;
+    if ( handle_bufioreq ) {
+        rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
+        if ( rc )
+            goto fail4;
+    }
 
     return 0;
 
@@ -883,7 +893,7 @@ fail4:
     hvm_unmap_ioreq_page(s, 0);
 
 fail3:
-    if ( !is_default )
+    if ( !is_default && handle_bufioreq )
         hvm_free_ioreq_gmfn(d, bufioreq_pfn);
 
 fail2:
@@ -898,12 +908,17 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s,
                                          bool_t is_default)
 {
     struct domain *d = s->domain;
+    bool_t handle_bufioreq = ( s->bufioreq.va != NULL );
+
+    if ( handle_bufioreq )
+        hvm_unmap_ioreq_page(s, 1);
 
-    hvm_unmap_ioreq_page(s, 1);
     hvm_unmap_ioreq_page(s, 0);
 
     if ( !is_default ) {
-        hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+        if ( handle_bufioreq )
+            hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+
         hvm_free_ioreq_gmfn(d, s->ioreq.gmfn);
     }
 }
@@ -982,7 +997,7 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
 
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
                                  domid_t domid, bool_t is_default,
-                                 ioservid_t id)
+                                 bool_t handle_bufioreq, ioservid_t id)
 {
     struct vcpu *v;
     int rc;
@@ -1002,7 +1017,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     if ( rc )
         goto fail1;
 
-    rc = hvm_ioreq_server_map_pages(s, is_default);
+    rc = hvm_ioreq_server_map_pages(s, is_default, handle_bufioreq);
     if ( rc )
         goto fail2;
 
@@ -1043,7 +1058,8 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
 }
 
 static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
-                                   bool_t is_default, ioservid_t *id)
+                                   bool_t is_default, bool_t handle_bufioreq,
+                                   ioservid_t *id)
 {
     struct hvm_ioreq_server *s;
     int rc;
@@ -1060,7 +1076,7 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
     if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
         goto fail2;
 
-    rc = hvm_ioreq_server_init(s, d, domid, is_default,
+    rc = hvm_ioreq_server_init(s, d, domid, is_default, handle_bufioreq,
                                d->arch.hvm_domain.ioreq_server_id++);
     if ( rc )
         goto fail3;
@@ -1146,8 +1162,11 @@ static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
             continue;
 
         *ioreq_pfn = s->ioreq.gmfn;
-        *bufioreq_pfn = s->bufioreq.gmfn;
-        *bufioreq_port = s->bufioreq_evtchn;
+
+        if ( s->bufioreq.va != NULL ) {
+            *bufioreq_pfn = s->bufioreq.gmfn;
+            *bufioreq_port = s->bufioreq_evtchn;
+        }
 
         rc = 0;
         break;
@@ -2390,6 +2409,9 @@ int hvm_buffered_io_send(ioreq_t *p)
     iorp = &s->bufioreq;
     pg = iorp->va;
 
+    if ( !pg )
+        return 0;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -5139,7 +5161,8 @@ static int hvmop_create_ioreq_server(
     if ( !is_hvm_domain(d) )
         goto out;
 
-    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
+    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0,
+                                 !!op.handle_bufioreq, &op.id);
     if ( rc != 0 )
         goto out;
 
@@ -5543,7 +5566,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 
                 /* May need to create server */
                 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-                rc = hvm_create_ioreq_server(d, domid, 1, NULL);
+                rc = hvm_create_ioreq_server(d, domid, 1, 1, NULL);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 5ca1cb2..ab4b583 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -259,12 +259,15 @@ DEFINE_XEN_GUEST_HANDLE(ioservid_t);
  * HVMOP_create_ioreq_server: Instantiate a new IOREQ Server for a secondary
  *                            emulator servicing domain <domid>.
  *
- * The <id> handed back is unique for <domid>.
+ * The <id> handed back is unique for <domid>. If <handle_bufioreq> is zero
+ * the buffered ioreq ring will not be allocated and hence all emulation
+ * requests to this server will be synchronous.
  */
 #define HVMOP_create_ioreq_server 17
 struct xen_hvm_create_ioreq_server {
-    domid_t domid; /* IN - domain to be serviced */
-    ioservid_t id; /* OUT - server id */
+    domid_t domid;           /* IN - domain to be serviced */
+    uint8_t handle_bufioreq; /* IN - should server handle buffered ioreqs */
+    ioservid_t id;           /* OUT - server id */
 };
 typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
@@ -274,12 +277,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
  *                              IOREQ Server <id>. 
  *
  * The emulator needs to map the synchronous ioreq structures and buffered
- * ioreq ring that Xen uses to request emulation. These are hosted in domain
- * <domid>'s gmfns <ioreq_pfn> and <bufioreq_pfn> respectively. In addition the
- * emulator needs to bind to event channel <bufioreq_port> to listen for
- * buffered emulation requests. (The event channels used for synchronous
- * emulation requests are specified in the per-CPU ioreq structures in
- * <ioreq_pfn>).
+ * ioreq ring (if it exists) that Xen uses to request emulation. These are
+ * hosted in domain <domid>'s gmfns <ioreq_pfn> and <bufioreq_pfn>
+ * respectively. In addition, if the IOREQ Server is handling buffered
+ * emulation requests, the emulator needs to bind to event channel
+ * <bufioreq_port> to listen for them. (The event channels used for
+ * synchronous emulation requests are specified in the per-CPU ioreq
+ * structures in <ioreq_pfn>).
+ * If the IOREQ Server is not handling buffered emulation requests then the
+ * values handed back in <bufioreq_pfn> and <bufioreq_port> will both be 0.
  */
 #define HVMOP_get_ioreq_server_info 18
 struct xen_hvm_get_ioreq_server_info {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (7 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
@ 2014-05-01 12:08 ` Paul Durrant
  2014-05-06 11:24   ` Ian Campbell
  2014-05-07 14:41 ` [PATCH v5 0/9] Support for running secondary emulators Jan Beulich
  9 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-01 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich, Stefano Stabellini

Because we may now have more than one emulator, the implementation of the
PCI hotplug controller needs to be done by Xen. Happily, the code is very
short and simple, and it also removes the need for a different ACPI DSDT
when using different variants of QEMU.

As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
and hotplug controller implementations to avoid the possibility of it
raising an SCI which will never be cleared.

VMs started on an older host and then migrated in will not use the in-Xen
controller as the AML may still point at QEMU traditional's hotplug
controller implementation. This means xc_hvm_pci_hotplug() will fail
with EOPNOTSUPP and it is up to the caller to decide whether this is a
problem or not. libxl will ignore EOPNOTSUPP as it is always hotplugging
via QEMU so it does not matter whether it is Xen or QEMU providing the
implementation.
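
For illustration (not part of the patch), a caller other than libxl that does
care could mimic libxl's tolerance like this; the helper name is hypothetical
and xch/domid/slot are placeholders:

#include <errno.h>
#include <xenctrl.h>

/* Ask Xen's hotplug controller to (un)plug a slot. If the VM was
 * migrated from an older host the AML still points at QEMU's
 * controller, Xen fails with EOPNOTSUPP and QEMU handles the hotplug. */
static int hotplug_slot(xc_interface *xch, domid_t domid,
                        uint32_t slot, int enable)
{
    int rc = xc_hvm_pci_hotplug(xch, domid, slot, enable);

    if ( rc < 0 && errno == EOPNOTSUPP )
        rc = 0; /* in-Xen controller not in use; not a failure here */

    return rc;
}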

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/firmware/hvmloader/acpi/Makefile  |    2 +-
 tools/firmware/hvmloader/acpi/mk_dsdt.c |  193 +++++---------------------
 tools/libxc/xc_domain.c                 |   27 ++++
 tools/libxc/xc_domain_restore.c         |    1 +
 tools/libxc/xc_hvm_build_x86.c          |    1 +
 tools/libxc/xenctrl.h                   |   24 ++++
 tools/libxl/libxl_pci.c                 |   14 ++
 xen/arch/x86/hvm/Makefile               |    1 +
 xen/arch/x86/hvm/hotplug.c              |  224 +++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c                  |   40 ++++++
 xen/include/asm-x86/hvm/domain.h        |   11 ++
 xen/include/asm-x86/hvm/io.h            |    8 +-
 xen/include/public/hvm/hvm_op.h         |   12 ++
 xen/include/public/hvm/ioreq.h          |    4 +
 14 files changed, 402 insertions(+), 160 deletions(-)
 create mode 100644 xen/arch/x86/hvm/hotplug.c

diff --git a/tools/firmware/hvmloader/acpi/Makefile b/tools/firmware/hvmloader/acpi/Makefile
index 2c50851..5fc4ebd 100644
--- a/tools/firmware/hvmloader/acpi/Makefile
+++ b/tools/firmware/hvmloader/acpi/Makefile
@@ -36,7 +36,7 @@ mk_dsdt: mk_dsdt.c
 
 dsdt_anycpu_qemu_xen.asl: dsdt.asl mk_dsdt
 	awk 'NR > 1 {print s} {s=$$0}' $< > $@
-	./mk_dsdt --dm-version qemu-xen >> $@
+	./mk_dsdt >> $@
 
 # NB. awk invocation is a portable alternative to 'head -n -1'
 dsdt_%cpu.asl: dsdt.asl mk_dsdt
diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
index a4b693b..1de88ac 100644
--- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
+++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
@@ -8,11 +8,6 @@
 
 static unsigned int indent_level;
 
-typedef enum dm_version {
-    QEMU_XEN_TRADITIONAL,
-    QEMU_XEN,
-} dm_version;
-
 static void indent(void)
 {
     unsigned int i;
@@ -58,38 +53,14 @@ static void pop_block(void)
     printf("}\n");
 }
 
-static void pci_hotplug_notify(unsigned int slt)
-{
-    stmt("Notify", "\\_SB.PCI0.S%02X, EVT", slt);
-}
-
-static void decision_tree(
-    unsigned int s, unsigned int e, char *var, void (*leaf)(unsigned int))
-{
-    if ( s == (e-1) )
-    {
-        (*leaf)(s);
-        return;
-    }
-
-    push_block("If", "And(%s, 0x%02x)", var, (e-s)/2);
-    decision_tree((s+e)/2, e, var, leaf);
-    pop_block();
-    push_block("Else", NULL);
-    decision_tree(s, (s+e)/2, var, leaf);
-    pop_block();
-}
-
 static struct option options[] = {
     { "maxcpu", 1, 0, 'c' },
-    { "dm-version", 1, 0, 'q' },
     { 0, 0, 0, 0 }
 };
 
 int main(int argc, char **argv)
 {
     unsigned int slot, dev, intx, link, cpu, max_cpus = HVM_MAX_VCPUS;
-    dm_version dm_version = QEMU_XEN_TRADITIONAL;
 
     for ( ; ; )
     {
@@ -115,16 +86,6 @@ int main(int argc, char **argv)
             }
             break;
         }
-        case 'q':
-            if (strcmp(optarg, "qemu-xen") == 0) {
-                dm_version = QEMU_XEN;
-            } else if (strcmp(optarg, "qemu-xen-traditional") == 0) {
-                dm_version = QEMU_XEN_TRADITIONAL;
-            } else {
-                fprintf(stderr, "Unknown device model version `%s'.\n", optarg);
-                return -1;
-            }
-            break;
         default:
             return -1;
         }
@@ -222,11 +183,8 @@ int main(int argc, char **argv)
 
     /* Define GPE control method. */
     push_block("Scope", "\\_GPE");
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L02");
-    } else {
-        push_block("Method", "_E02");
-    }
+    push_block("Method", "_L02");
+
     stmt("Return", "\\_SB.PRSC()");
     pop_block();
     pop_block();
@@ -237,23 +195,17 @@ int main(int argc, char **argv)
     push_block("Scope", "\\_SB.PCI0");
 
     /*
-     * Reserve the IO port ranges [0x10c0, 0x1101] and [0xb044, 0xb047].
-     * Or else, for a hotplugged-in device, the port IO BAR assigned
-     * by guest OS may conflict with the ranges here.
+     * Reserve the IO port ranges used by PCI hotplug controller or else,
+     * for a hotplugged-in device, the port IO BAR assigned by guest OS may
+     * conflict with the ranges here.
      */
     push_block("Device", "HP0"); {
         stmt("Name", "_HID, EISAID(\"PNP0C02\")");
-        if (dm_version == QEMU_XEN_TRADITIONAL) {
-            stmt("Name", "_CRS, ResourceTemplate() {"
-                 "  IO (Decode16, 0x10c0, 0x10c0, 0x00, 0x82)"
-                 "  IO (Decode16, 0xb044, 0xb044, 0x00, 0x04)"
-                 "}");
-        } else {
-            stmt("Name", "_CRS, ResourceTemplate() {"
-                 "  IO (Decode16, 0xae00, 0xae00, 0x00, 0x10)"
-                 "  IO (Decode16, 0xb044, 0xb044, 0x00, 0x04)"
-                 "}");
-        }
+        stmt("Name", "_CRS, ResourceTemplate() {"
+             "  IO (Decode16, 0x10c0, 0x10c0, 0x00, 0x82)"
+             "  IO (Decode16, 0xb044, 0xb044, 0x00, 0x04)"
+             "  IO (Decode16, 0xae00, 0xae00, 0x00, 0x10)"
+             "}");
     } pop_block();
 
     /*** PCI-ISA link definitions ***/
@@ -322,64 +274,21 @@ int main(int argc, char **argv)
                    dev, intx, ((dev*4+dev/8+intx)&31)+16);
     printf("})\n");
 
-    /*
-     * Each PCI hotplug slot needs at least two methods to handle
-     * the ACPI event:
-     *  _EJ0: eject a device
-     *  _STA: return a device's status, e.g. enabled or removed
-     * 
-     * Eject button would generate a general-purpose event, then the
-     * control method for this event uses Notify() to inform OSPM which
-     * action happened and on which device.
-     *
-     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
-     * in ACPI spec 3.0b for details.
-     *
-     * QEMU provides a simple hotplug controller with some I/O to handle
-     * the hotplug action and status, which is beyond the ACPI scope.
-     */
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        for ( slot = 0; slot < 0x100; slot++ )
-        {
-            push_block("Device", "S%02X", slot);
-            /* _ADR == dev:fn (16:16) */
-            stmt("Name", "_ADR, 0x%08x", ((slot & ~7) << 13) | (slot & 7));
-            /* _SUN == dev */
-            stmt("Name", "_SUN, 0x%08x", slot >> 3);
-            push_block("Method", "_EJ0, 1");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x88, \\_GPE.DPT2");
-            stmt("Store", "0x%02x, \\_GPE.PH%02X", /* eject */
-                 (slot & 1) ? 0x10 : 0x01, slot & ~1);
-            pop_block();
-            push_block("Method", "_STA, 0");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x89, \\_GPE.DPT2");
-            if ( slot & 1 )
-                stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
-            else
-                stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
-            stmt("Return", "Local1"); /* IN status as the _STA */
-            pop_block();
-            pop_block();
-        }
-    } else {
-        stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
-        push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("B0EJ, 32,\n");
-        pop_block();
+    stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
+    push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("B0EJ, 32,\n");
+    pop_block();
 
-        /* hotplug_slot */
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("Device", "S%i", slot); {
-                stmt("Name", "_ADR, %#06x0000", slot);
-                push_block("Method", "_EJ0,1"); {
-                    stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
-                    stmt("Return", "0x0");
-                } pop_block();
-                stmt("Name", "_SUN, %i", slot);
+    /* hotplug_slot */
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("Device", "S%i", slot); {
+            stmt("Name", "_ADR, %#06x0000", slot);
+            push_block("Method", "_EJ0,1"); {
+                stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
+                stmt("Return", "0x0");
             } pop_block();
-        }
+            stmt("Name", "_SUN, %i", slot);
+        } pop_block();
     }
 
     pop_block();
@@ -389,26 +298,11 @@ int main(int argc, char **argv)
     /**** GPE start ****/
     push_block("Scope", "\\_GPE");
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        stmt("OperationRegion", "PHP, SystemIO, 0x10c0, 0x82");
-
-        push_block("Field", "PHP, ByteAcc, NoLock, Preserve");
-        indent(); printf("PSTA, 8,\n"); /* hotplug controller event reg */
-        indent(); printf("PSTB, 8,\n"); /* hotplug controller slot reg */
-        for ( slot = 0; slot < 0x100; slot += 2 )
-        {
-            indent();
-            /* Each hotplug control register manages a pair of pci functions. */
-            printf("PH%02X, 8,\n", slot);
-        }
-        pop_block();
-    } else {
-        stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
-        push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("PCIU, 32,\n");
-        indent(); printf("PCID, 32,\n");
-        pop_block();
-    }
+    stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
+    push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("PCIU, 32,\n");
+    indent(); printf("PCID, 32,\n");
+    pop_block();
 
     stmt("OperationRegion", "DG1, SystemIO, 0xb044, 0x04");
 
@@ -416,33 +310,16 @@ int main(int argc, char **argv)
     indent(); printf("DPT1, 8, DPT2, 8\n");
     pop_block();
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L03, 0, Serialized");
-        /* Detect slot and event (remove/add). */
-        stmt("Name", "SLT, 0x0");
-        stmt("Name", "EVT, 0x0");
-        stmt("Store", "PSTA, Local1");
-        stmt("And", "Local1, 0xf, EVT");
-        stmt("Store", "PSTB, Local1"); /* XXX: Store (PSTB, SLT) ? */
-        stmt("And", "Local1, 0xff, SLT");
-        /* Debug */
-        stmt("Store", "SLT, DPT1");
-        stmt("Store", "EVT, DPT2");
-        /* Decision tree */
-        decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify);
+    push_block("Method", "_E01");
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
         pop_block();
-    } else {
-        push_block("Method", "_E01");
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
-            pop_block();
-            push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
-            pop_block();
-        }
+        push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
         pop_block();
     }
+    pop_block();
 
     pop_block();
     /**** GPE end ****/
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 67090bd..f9b3c07 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1532,6 +1532,33 @@ int xc_hvm_set_ioreq_server_state(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_pci_hotplug(xc_interface *xch,
+                       domid_t domid,
+                       uint32_t slot,
+                       int enable)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->slot = slot;
+    arg->enable = !!enable;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index af2bf3a..9b49509 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1784,6 +1784,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
      */
     if (pagebuf.nr_ioreq_server_pages != 0) {
         if (pagebuf.ioreq_server_pfn != 0) {
+            /* Setting this parameter also enables the hotplug controller */
             xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES, 
                              pagebuf.nr_ioreq_server_pages);
             xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 3564e8b..31d288f 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -526,6 +526,7 @@ static int setup_guest(xc_interface *xch,
     /* Tell the domain where the pages are and how many there are */
     xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
                      ioreq_server_pfn(0));
+    /* Setting this parameter also enables the hotplug controller */
     xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                      NR_IOREQ_SERVER_PAGES);
 
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 60f6abc..fad66cb 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1927,6 +1927,30 @@ int xc_hvm_set_ioreq_server_state(xc_interface *xch,
                                   ioservid_t id,
                                   int enabled);
 
+/*
+ * Hotplug controller API
+ */
+
+/**
+ * This function either enables or disables a hotplug PCI slot
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm slot the slot number
+ * @parm enable enable/disable the slot
+ * @return 0 on success, -1 on failure.
+ *
+ * VMs started on an old version of Xen may not have a hotplug
+ * controller, in which case this function will fail with errno
+ * set to EOPNOTSUPP. Such a failure can be safely ignored if
+ * the device in question is being emulated by QEMU since it
+ * will be providing the hotplug controller implementation.
+ */
+int xc_hvm_pci_hotplug(xc_interface *xch,
+                       domid_t domid,
+                       uint32_t slot,
+                       int enable);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 44d0453..55cb8a2 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
         }
         if ( rc )
             return ERROR_FAIL;
+
+        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
+        if (rc < 0 && errno != EOPNOTSUPP) {
+            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
+            return ERROR_FAIL;
+        }
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
@@ -1188,6 +1195,13 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
                                          NULL, NULL, NULL) < 0)
             goto out_fail;
 
+        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 0);
+        if (rc < 0 && errno != EOPNOTSUPP) {
+            LOGE(ERROR, "Error: xc_hvm_pci_hotplug disable failed");
+            rc = ERROR_FAIL;
+            goto out_fail;
+        }
+
         switch (libxl__device_model_version_running(gc, domid)) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
             rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..48efddb 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -3,6 +3,7 @@ subdir-y += vmx
 
 obj-y += asid.o
 obj-y += emulate.o
+obj-y += hotplug.o
 obj-y += hpet.o
 obj-y += hvm.o
 obj-y += i8254.o
diff --git a/xen/arch/x86/hvm/hotplug.c b/xen/arch/x86/hvm/hotplug.c
new file mode 100644
index 0000000..0b139ad
--- /dev/null
+++ b/xen/arch/x86/hvm/hotplug.c
@@ -0,0 +1,224 @@
+/*
+ * hvm/hotplug.c
+ *
+ * Copyright (c) 2014, Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/xmalloc.h>
+#include <asm/hvm/io.h>
+#include <asm/hvm/support.h>
+
+#define SCI_IRQ 9
+
+/* pci status bit: \_GPE._L02 - i.e. level sensitive, bit 2 */
+#define GPE_PCI_HOTPLUG_STATUS  2
+
+#define PCI_UP      0
+#define PCI_DOWN    4
+#define PCI_EJECT   8
+
+static void gpe_update_sci(struct hvm_hotplug *hp)
+{
+    struct domain *d;
+
+    d = container_of(
+            container_of(
+                container_of(hp, struct hvm_domain, hotplug),
+                struct arch_domain, hvm_domain),
+            struct domain, arch);
+
+    if ( (hp->gpe_sts_en[0] & hp->gpe_sts_en[ACPI_GPE0_BLK_LEN_V1 / 2]) &
+         GPE_PCI_HOTPLUG_STATUS )
+        hvm_isa_irq_assert(d, SCI_IRQ);
+    else
+        hvm_isa_irq_deassert(d, SCI_IRQ);
+}
+
+static uint8_t gpe_read_port(struct hvm_hotplug *hp, uint32_t port)
+{
+    return hp->gpe_sts_en[port];
+}
+
+static void gpe_write_port(struct hvm_hotplug *hp, uint32_t port, uint8_t val)
+{
+    if ( port < ACPI_GPE0_BLK_LEN_V1 / 2 )
+        hp->gpe_sts_en[port] &= ~val;
+    else
+        hp->gpe_sts_en[port] = val;
+}
+
+static int handle_gpe_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug *hp = &d->arch.hvm_domain.hotplug;
+
+    port -= ACPI_GPE0_BLK_ADDRESS_V1;
+
+    if ( dir == IOREQ_READ )
+    {
+        unsigned int i;
+
+        *val = 0;
+        for ( i = 0; i < bytes; i++ )
+            *val |= gpe_read_port(hp, port + i) << (i * 8);
+    }
+    else
+    {
+        unsigned int i;
+
+        for ( i = 0; i < bytes; i++ )
+            gpe_write_port(hp, port + i, (*val) >> (i * 8));
+
+        gpe_update_sci(hp);
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static void pci_hotplug_eject(struct hvm_hotplug *hp, uint32_t mask)
+{
+    int slot = ffs(mask) - 1;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, slot);
+
+    hp->slot_down &= ~(1u << slot);
+    hp->slot_up &= ~(1u << slot);
+}
+
+static int handle_pci_hotplug_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    /* ASL specifies DWordAcc */
+    if ( bytes != 4 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= ACPI_PCI_HOTPLUG_ADDRESS_V1;
+
+    if ( dir == IOREQ_READ )
+    {
+        switch ( port )
+        {
+        case PCI_UP:
+            *val = hp->slot_up;
+            break;
+        case PCI_DOWN:
+            *val = hp->slot_down;
+            break;
+        default:
+            break;
+        }
+    }
+    else
+    {   
+        switch ( port )
+        {
+        case PCI_EJECT:
+            pci_hotplug_eject(hp, *val);
+            break;
+        default:
+            break;
+        }
+    }
+
+ done:
+    return X86EMUL_OKAY;
+}
+
+static int null_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    /* Make it look like this IO range is non-existent */
+    if ( dir == IOREQ_READ )
+        *val = ~0u;
+
+    return X86EMUL_OKAY;
+}
+
+int pci_hotplug(struct domain *d, int slot, bool_t enable)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    gdprintk(XENLOG_DEBUG, "%s: %d:%d %s\n", __func__,
+             d->domain_id, slot, enable ? "enable" : "disable");
+
+    if ( !hp->gpe_sts_en )
+        return -EOPNOTSUPP;
+
+    if ( enable )
+        hp->slot_up |= (1u << slot);
+    else
+        hp->slot_down |= (1u << slot);
+
+    hp->gpe_sts_en[0] |= GPE_PCI_HOTPLUG_STATUS;
+    gpe_update_sci(hp);
+
+    return 0;
+}
+
+int gpe_init(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    hp->gpe_sts_en = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1);
+    if ( hp->gpe_sts_en == NULL )
+        return -ENOMEM;
+
+    register_portio_handler(d, ACPI_GPE0_BLK_ADDRESS_V1,
+                            ACPI_GPE0_BLK_LEN_V1, handle_gpe_io);
+    register_portio_handler(d, ACPI_PCI_HOTPLUG_ADDRESS_V1,
+                            ACPI_PCI_HOTPLUG_LEN_V1, handle_pci_hotplug_io);
+
+    /*
+     * We should make sure that the old GPE and hotplug controller ranges
+     * used by qemu traditional are obscured to avoid confusion.
+     */
+    register_portio_handler(d, ACPI_GPE0_BLK_ADDRESS_V0,
+                            ACPI_GPE0_BLK_LEN_V0, null_io);
+    register_portio_handler(d, ACPI_PCI_HOTPLUG_ADDRESS_V0,
+                            ACPI_PCI_HOTPLUG_LEN_V0, null_io);
+
+
+    return 0;
+}
+
+void gpe_deinit(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    xfree(hp->gpe_sts_en);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * c-tab-always-indent: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e734adb..517a789 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1571,6 +1571,7 @@ void hvm_domain_destroy(struct domain *d)
         return;
 
     hvm_funcs.domain_destroy(d);
+    gpe_deinit(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
@@ -5307,6 +5308,31 @@ static int hvmop_destroy_ioreq_server(
     return rc;
 }
 
+static int hvmop_pci_hotplug(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_pci_hotplug_t) uop)
+{
+    xen_hvm_pci_hotplug_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = pci_hotplug(d, op.slot, !!op.enable);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -5348,6 +5374,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
         break;
     
+    case HVMOP_pci_hotplug:
+        rc = hvmop_pci_hotplug(
+            guest_handle_cast(arg, xen_hvm_pci_hotplug_t));
+        break;
+
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -5519,6 +5550,15 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                     break;
                 }
                 d->arch.hvm_domain.ioreq_gmfn_count = a.value;
+
+                /*
+                 * Since secondary emulators are now possible, enable
+                 * the hotplug controller.
+                 */
+                rc = gpe_init(d);
+                if ( rc == -EEXIST )
+                    rc = 0;
+
                 break;
             }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 9220e2f..b80cfaf 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -71,6 +71,15 @@ struct hvm_ioreq_server {
     bool_t                 enabled;
 };
 
+struct hvm_hotplug {
+    /* Lower half of block are status bits */
+    uint8_t         *gpe_sts_en;
+
+    /* PCI hotplug */
+    uint32_t        slot_up;
+    uint32_t        slot_down;
+};
+
 struct hvm_domain {
     /* Guest page range used for non-default ioreq servers */
     unsigned long           ioreq_gmfn_base;
@@ -88,6 +97,8 @@ struct hvm_domain {
     uint32_t                pci_cf8;
     spinlock_t              pci_lock;
 
+    struct hvm_hotplug      hotplug;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 86db58d..7f88a62 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,7 +25,7 @@
 #include <public/hvm/ioreq.h>
 #include <public/event_channel.h>
 
-#define MAX_IO_HANDLER             16
+#define MAX_IO_HANDLER             32
 
 #define HVM_PORTIO                  0
 #define HVM_BUFFERED_IO             2
@@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+int gpe_init(struct domain *d);
+void gpe_deinit(struct domain *d);
+
+int pci_hotplug(struct domain *d, int slot, bool_t enable);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index ab4b583..c08aeda 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -362,6 +362,18 @@ struct xen_hvm_set_ioreq_server_state {
 typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
 
+/*
+ * HVMOP_pci_hotplug: enable/disable <slot> of domain <domid>.
+ */
+#define HVMOP_pci_hotplug 23
+struct xen_hvm_pci_hotplug {
+    domid_t domid;  /* IN - domain to be serviced */
+    uint8_t enable; /* IN - enable or disable? */
+    uint32_t slot;  /* IN - slot to enable/disable */
+};
+typedef struct xen_hvm_pci_hotplug xen_hvm_pci_hotplug_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_pci_hotplug_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index 533089d..3c3a8c1 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -102,6 +102,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V0   (ACPI_PM1A_EVT_BLK_ADDRESS_V0 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V0     (ACPI_PM_TMR_BLK_ADDRESS_V0 + 0x20)
 #define ACPI_GPE0_BLK_LEN_V0         0x08
+#define ACPI_PCI_HOTPLUG_ADDRESS_V0  0x10c0
+#define ACPI_PCI_HOTPLUG_LEN_V0      0x82 /* NR_PHP_SLOT_REG in piix4acpi.c */
 
 /* Version 1: Locations preferred by modern Qemu. */
 #define ACPI_PM1A_EVT_BLK_ADDRESS_V1 0xb000
@@ -109,6 +111,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
 #define ACPI_GPE0_BLK_LEN_V1         0x04
+#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
+#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
 
 /* Compatibility definitions for the default location (version 0). */
 #define ACPI_PM1A_EVT_BLK_ADDRESS    ACPI_PM1A_EVT_BLK_ADDRESS_V0
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path
  2014-05-01 12:08 ` [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path Paul Durrant
@ 2014-05-01 12:48   ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2014-05-01 12:48 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, Jan Beulich, xen-devel

On 01/05/14 13:08, Paul Durrant wrote:
> The function calls prepare_ring_for_helper() to acquire a mapping for the
> given gmfn, then checks (under lock) to see if the ioreq page is already
> set up but, if it is, the function then releases the in-use ioreq page
> mapping on the error path rather than the one it just acquired. This patch
> fixes this bug.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
>  xen/arch/x86/hvm/hvm.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index ac05160..3dec1eb 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -496,7 +496,7 @@ static int hvm_set_ioreq_page(
>  
>      if ( (iorp->va != NULL) || d->is_dying )
>      {
> -        destroy_ring_for_helper(&iorp->va, iorp->page);
> +        destroy_ring_for_helper(&va, page);
>          spin_unlock(&iorp->lock);
>          return -EINVAL;
>      }

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-01 12:08 ` [PATCH v5 6/9] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-05-06 10:46   ` Ian Campbell
  2014-05-06 13:28     ` Paul Durrant
  2014-05-07 11:13   ` Jan Beulich
  1 sibling, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-06 10:46 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> NOTE: To prevent emulators running in non-privileged guests from
>       potentially allocating very large amounts of xen heap, the core
>       rangeset code has been modified to introduce a hard limit of 256
>       ranges per set.

OOI how much RAM does that correspond to?

(Arguably this and the asprintf change should/could have been separate)

I've only really looks at tools/ and xen/include/public here.

> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 369c3f3..b3ed029 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1284,6 +1284,225 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
>      return rc;
>  }
>  
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +                               domid_t domid,
> +                               ioservid_t *id)
> +{
[...]
> +}
> +
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +                                 domid_t domid,
> +                                 ioservid_t id,
> +                                 xen_pfn_t *ioreq_pfn,
> +                                 xen_pfn_t *bufioreq_pfn,
> +                                 evtchn_port_t *bufioreq_port)
> +{
[...]
> +}
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                        ioservid_t id, int is_mmio,
> +                                        uint64_t start, uint64_t end)
> +{
[...]
> +}
> +
> +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                            ioservid_t id, int is_mmio,
> +                                            uint64_t start, uint64_t end)
> +{
[...]
> +}

Those all look like reasonable layers over an underlying hypercall, so
if the hypervisor guys are happy with the hypervisor interface then I'm
happy with this.

> +
> +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                      ioservid_t id, uint16_t segment,
> +                                      uint8_t bus, uint8_t device,
> +                                      uint8_t function)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
> +    int rc;
> +
> +    if (device > 0x1f || function > 0x7) {
> +        errno = EINVAL;
> +        return -1;

I suppose without this HVMOP_PCI_SBDF will produce nonsense, which the
hypervisor may or may not reject. Hopefully we aren't relying on this
check here for any security properties.

> +    }
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->type = HVMOP_IO_RANGE_PCI;
> +    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,

Since you have HVMOP_IO_RANGE_PCI do you not want to expose that via
this interface?

> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index bcb0ae0..af2bf3a 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1748,6 +1770,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>      if (pagebuf.viridian != 0)
>          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
>  
> +    /*
> +     * If we are migrating in from a host that does not support
> +     * secondary emulators then nr_ioreq_server_pages will be 0, since
> +     * there will be no XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES chunk in
> +     * the image.
> +     * If we are migrating from a host that does support secondary
> +     * emulators then the XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES chunk
> +     * will exist and is guaranteed to have a non-zero value. The
> +     * existence of that chunk also implies the existence of the
> +     * XC_SAVE_ID_HVM_IOREQ_SERVER_PFN chunk, which is also guaranteed
> +     * to have a non-zero value.

Please can you also note this both or neither behaviour in
xg_save_restore.h

> +     */
> +    if (pagebuf.nr_ioreq_server_pages != 0) {
> +        if (pagebuf.ioreq_server_pfn != 0) {
> +            xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES, 
> +                             pagebuf.nr_ioreq_server_pages);
> +            xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> +                             pagebuf.ioreq_server_pfn);
> +        } else {
> +            ERROR("ioreq_server_pfn is invalid");

Or ioreq_server_pages was. Perhaps say they are inconsistent? Perhaps
log their values?

> +        }
> +    }
       else if (..server_pfn != 0)
	     Also an error I think

(there might be better ways to structure things to catch both error
cases...)
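
Something along those lines maybe (an untested sketch, keeping the
original behaviour of only logging on error):

    /* The two chunks are both-or-neither, so reject any mixture. */
    if ( (pagebuf.nr_ioreq_server_pages != 0) !=
         (pagebuf.ioreq_server_pfn != 0) )
    {
        ERROR("inconsistent ioreq server pfn/page count in image");
    }
    else if ( pagebuf.nr_ioreq_server_pages != 0 )
    {
        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                         pagebuf.nr_ioreq_server_pages);
        xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
                         pagebuf.ioreq_server_pfn);
    }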


> +
>      if (pagebuf.acpi_ioport_location == 1) {
>          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
>          xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 71f9b59..acf3685 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -1737,6 +1737,30 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>              PERROR("Error when writing the viridian flag");
>              goto out;
>          }
> +
> +        chunk.id = XC_SAVE_ID_HVM_IOREQ_SERVER_PFN;
> +        chunk.data = 0;
> +        xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> +                         (unsigned long *)&chunk.data);
> +
> +        if ( (chunk.data != 0) &&
> +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> +        {
> +            PERROR("Error when writing the ioreq server gmfn base");
> +            goto out;
> +        }
> +
> +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES;
> +        chunk.data = 0;
> +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
> +                         (unsigned long *)&chunk.data);
> +
> +        if ( (chunk.data != 0) &&
> +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> +        {
> +            PERROR("Error when writing the ioreq server gmfn count");
> +            goto out;
> +        }

Probably arranging to assert that both of these are either zero or
non-zero is too much faff.

> @@ -502,6 +505,31 @@ static int setup_guest(xc_interface *xch,
>                       special_pfn(SPECIALPAGE_SHARING));
>  
>      /*
> +     * Allocate and clear additional ioreq server pages. The default
> +     * server will use the IOREQ and BUFIOREQ special pages above.
> +     */
> +    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
> +    {
> +        xen_pfn_t pfn = ioreq_server_pfn(i);
> +
> +        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> +        if ( rc != 0 )
> +        {
> +            PERROR("Could not allocate %d'th ioreq server page.", i);

This will say things like "1'th". "Could not allocate ioreq server page
%d" avoids that.

You could do this allocation all in one go if pfn was an array of
[NR_IOREQ_SERVER_PAGES], you can still use order-0 allocations.
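
Something like this perhaps (untested sketch; it assumes the existing
error_out label and the current xc_domain_populate_physmap_exact()
signature):

    xen_pfn_t ioreq_pfns[NR_IOREQ_SERVER_PAGES];

    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
        ioreq_pfns[i] = ioreq_server_pfn(i);

    /* One call populates all of the (order-0) ioreq server pages. */
    rc = xc_domain_populate_physmap_exact(xch, dom, NR_IOREQ_SERVER_PAGES,
                                          0, 0, ioreq_pfns);
    if ( rc != 0 )
    {
        PERROR("Could not allocate ioreq server pages");
        goto error_out;
    }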

Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-01 12:08 ` [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
@ 2014-05-06 10:48   ` Ian Campbell
  2014-05-06 16:57     ` Paul Durrant
  2014-05-07 12:09   ` Jan Beulich
  1 sibling, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-06 10:48 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> For secondary servers, add a hvm op to enable/disable the server. The
> server will not accept IO until it is enabled and the act of enabling
> the server removes its pages from the guest p2m, thus preventing the guest
> from directly mapping the pages and synthesizing ioreqs.

Does the ring get reset when the server is enabled? Otherwise what
prevents the guest from preloading something before the ring is
activated?

In terms of the tools side binding of the interface this looks fine to me.

Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional
  2014-05-01 12:08 ` [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
@ 2014-05-06 10:52   ` Ian Campbell
  2014-05-06 13:17     ` Paul Durrant
  2014-05-07 12:13   ` Jan Beulich
  1 sibling, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-06 10:52 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> Some emulators will only register regions that require non-buffered
> access. (In practice the only region that a guest uses buffered access
> for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
> makes allocation of the buffered ioreq page and event channel optional for
> secondary ioreq servers.
> 
> If a guest attempts buffered access to an ioreq server that does not
> support it, the access will be handled via the normal synchronous path.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Tools portion: Acked-by: Ian Campbell <ian.campbell@citrix.com> with one
small nit:

> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 74aa2bf..60f6abc 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1801,6 +1801,7 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
>   */

IIRC there is a doc comment just above here which ought to list this new
parameter.

(those comments do fall rather into the trap of simply repeating the C
don't they, but that does appear to be the prevailing style of many of
them...)

Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-01 12:08 ` [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-05-06 11:24   ` Ian Campbell
  2014-05-06 13:02     ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-06 11:24 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> Because we may now have more than one emulator, the implementation of the
> PCI hotplug controller needs to be done by Xen.

Does that imply that this patch must come sooner in the series?

>  Happily the code is very
> short and simple and it also removes the need for a different ACPI DSDT
> when using different variants of QEMU.
> 
> As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
> and hotplug controller implementations to avoid the possibility of it
> raising an SCI which will never be cleared.
> 
> VMs started on an older host and then migrated in will not use the in-Xen
> controller as the AML may still point at QEMU traditional's hotplug
> controller implementation.

"... will not ... may ...". I think perhaps one of those should be the
other?

>  This means xc_hvm_pci_hotplug() will fail
> with EOPNOTSUPP and it is up to the caller to decide whether this is a
> problem or not. libxl will ignore EOPNOTSUPP as it is always hotplugging
> via QEMU so it does not matter whether it is Xen or QEMU providing the
> implementation.

Just to clarify: there is no qemu patch involved here because this
change will mean that PCI HP controller accesses will never be passed to
qemu?

What does "hotplugging via qemu" mean here if qemu isn't patched to call
this new call?

What is the condition which causes the EOPNOTSUPP? I think it is when
HVM_PARAM_IOREQ_SERVER_PFN has not been set. Is that correct? Can you
write it down here please.

Is there any save/restore state associated with the hotplug controller?
If yes then how is that handled when migrating a qemu-xen (new qemu)
guest from a system which uses PCIHP in qemu to one which does it in
Xen?


> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
>  tools/firmware/hvmloader/acpi/Makefile  |    2 +-
>  tools/firmware/hvmloader/acpi/mk_dsdt.c |  193 +++++---------------------
>  tools/libxc/xc_domain.c                 |   27 ++++
>  tools/libxc/xc_domain_restore.c         |    1 +
>  tools/libxc/xc_hvm_build_x86.c          |    1 +
>  tools/libxc/xenctrl.h                   |   24 ++++
>  tools/libxl/libxl_pci.c                 |   14 ++
>  xen/arch/x86/hvm/Makefile               |    1 +
>  xen/arch/x86/hvm/hotplug.c              |  224 +++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/hvm.c                  |   40 ++++++
>  xen/include/asm-x86/hvm/domain.h        |   11 ++
>  xen/include/asm-x86/hvm/io.h            |    8 +-
>  xen/include/public/hvm/hvm_op.h         |   12 ++
>  xen/include/public/hvm/ioreq.h          |    4 +
>  14 files changed, 402 insertions(+), 160 deletions(-)
>  create mode 100644 xen/arch/x86/hvm/hotplug.c
> 
> diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
> index a4b693b..1de88ac 100644
> --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
> +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
> @@ -222,11 +183,8 @@ int main(int argc, char **argv)
>  
>      /* Define GPE control method. */
>      push_block("Scope", "\\_GPE");
> -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> -        push_block("Method", "_L02");
> -    } else {
> -        push_block("Method", "_E02");
> -    }
> +    push_block("Method", "_L02");

Aren't you leaving the wrong case behind here?
 
>      /*
> -     * Reserve the IO port ranges [0x10c0, 0x1101] and [0xb044, 0xb047].
> -     * Or else, for a hotplugged-in device, the port IO BAR assigned
> -     * by guest OS may conflict with the ranges here.
> +     * Reserve the IO port ranges used by PCI hotplug controller or else,
> +     * for a hotplugged-in device, the port IO BAR assigned by guest OS may
> +     * conflict with the ranges here.

AIUI you are also reserving the ranges used by qemu-trad to avoid
accidental conflicts?

> @@ -322,64 +274,21 @@ int main(int argc, char **argv)
>                     dev, intx, ((dev*4+dev/8+intx)&31)+16);
>      printf("})\n");
>  
> -    /*
> -     * Each PCI hotplug slot needs at least two methods to handle
> -     * the ACPI event:
> -     *  _EJ0: eject a device
> -     *  _STA: return a device's status, e.g. enabled or removed
> -     * 
> -     * Eject button would generate a general-purpose event, then the
> -     * control method for this event uses Notify() to inform OSPM which
> -     * action happened and on which device.
> -     *
> -     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
> -     * in ACPI spec 3.0b for details.
> -     *
> -     * QEMU provides a simple hotplug controller with some I/O to handle
> -     * the hotplug action and status, which is beyond the ACPI scope.

Is this comment not still relevant with s/QEMU/Xen/?

> [...]
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index af2bf3a..9b49509 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1784,6 +1784,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>       */
>      if (pagebuf.nr_ioreq_server_pages != 0) {
>          if (pagebuf.ioreq_server_pfn != 0) {
> +            /* Setting this parameter also enables the hotplug controller */

"Xen's hotplug controller" ? (maybe I only think that because I'm aware
of the qemu alternative).

> +/**
> + * This function either enables or disables a hotplug PCI slot
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm slot the slot number
> + * @parm enable enable/disable the slot
> + * @return 0 on success, -1 on failure.
> + *
> + * VMs started on an old version of Xen may not have a hotplug
> + * controller, in which case this function will fail with errno
> + * set to EOPNOTSUPP. Such a failure can be safely ignored if
> + * the device in question is being emulated by QEMU since it
> + * will be providing the hotplug controller implementation.

Is this more strictly required to be the QEMU providing the fallback
ioreq, as opposed to some potentially disaggregated QEMU functionality
in a secondary thing?

> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 44d0453..55cb8a2 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>          }
>          if ( rc )
>              return ERROR_FAIL;
> +
> +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
> +        if (rc < 0 && errno != EOPNOTSUPP) {
> +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
> +            return ERROR_FAIL;
> +        }

I initially thought you needed to also reset rc to indicate success in
the errno==EOPNOTSUPP case, but actually the error handling in this
function is just confusing...

But, have you tried hotpluging into a migrated guest?

This might be a location worth reiterating (or at least referring to)
the comment about EOPNOTSUPP and where the HP controller lives

What would it take for the toolstack to know if it needed to do this
call or not? Reading the IOREQ PFNs HVM param perhaps? I wonder if that
is worth it.
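
If it were wanted, a minimal check in libxl might look like this
(sketch only; the helper name is made up, and it keys off the ioreq
server PFN param, which is set together with the page count that
enables the in-Xen controller):

    static int xen_has_pci_hotplug(libxl__gc *gc, uint32_t domid)
    {
        unsigned long pfn = 0;

        /* Non-zero only once Xen's hotplug controller has been enabled. */
        if (xc_get_hvm_param(CTX->xch, domid,
                             HVM_PARAM_IOREQ_SERVER_PFN, &pfn) < 0)
            return 0;

        return pfn != 0;
    }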

> +
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
>      {
> @@ -1188,6 +1195,13 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
>                                           NULL, NULL, NULL) < 0)
>              goto out_fail;
>  
> +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 0);
> +        if (rc < 0 && errno != EOPNOTSUPP) {
> +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug disable failed");

Similar comments to above.

> +            rc = ERROR_FAIL;
> +            goto out_fail;
> +        }
> +
>          switch (libxl__device_model_version_running(gc, domid)) {
>          case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
>              rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);

Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 2/9] ioreq-server: pre-series tidy up
  2014-05-01 12:08 ` [PATCH v5 2/9] ioreq-server: pre-series tidy up Paul Durrant
@ 2014-05-06 12:25   ` Jan Beulich
  2014-05-06 12:37     ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 12:25 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> This patch tidies up various parts of the code that following patches move
> around. If these modifications were combined with the code motion it would
> be easy to miss them.
> 
> There's also some function renaming to reflect purpose and a single
> whitespace fix.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

with one comment:

> @@ -1425,14 +1425,15 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>  
> -bool_t hvm_send_assist_req(struct vcpu *v)
> +bool_t hvm_send_assist_req(void)
>  {
> -    ioreq_t *p;
> +    struct vcpu *v = current;
> +    ioreq_t *p = get_ioreq(v);

Would the patch grow significantly bigger if you renamed "v" to "curr"
here, as we generally try to do to document that it's not an arbitrary
vCPU that is being referred to (the ack stands regardless of whether
you do the rename)?

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-01 12:08 ` [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-05-06 12:35   ` Jan Beulich
  2014-05-06 12:41     ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 12:35 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, Kevin Tian, Eddie Dong, Jun Nakajima, xen-devel

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> To simplify creation of the ioreq server abstraction in a subsequent patch,
> this patch centralizes all use of the shared ioreq structure and the
> buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.
> 
> The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume()
> because the former may now be passed a data structure on stack, in which
> case the barrier is unnecessary.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

with these minor comments:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
>      spin_unlock(&d->event_lock);
>  }
>  
> +static ioreq_t *get_ioreq(struct vcpu *v)

const struct vcpu *? Or was it that this conflicts with a subsequent
patch?

> +{
> +    struct domain *d = v->domain;
> +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> +
> +    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> +
> +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> +}
> +
> +bool_t hvm_io_pending(struct vcpu *v)
> +{
> +    ioreq_t *p = get_ioreq(v);

It would be particularly desirable since this one would logically want
its parameter to be const-qualified.

> +bool_t hvm_has_dm(struct domain *d)
> +{
> +    return !!d->arch.hvm_domain.ioreq.va;
> +}

Pretty certainly const here.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 2/9] ioreq-server: pre-series tidy up
  2014-05-06 12:25   ` Jan Beulich
@ 2014-05-06 12:37     ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 12:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 13:26
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v5 2/9] ioreq-server: pre-series tidy up
> 
> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > This patch tidies up various parts of the code that following patches move
> > around. If these modifications were combined with the code motion it
> would
> > be easy to miss them.
> >
> > There's also some function renaming to reflect purpose and a single
> > whitespace fix.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 

Thanks.

> with one comment:
> 
> > @@ -1425,14 +1425,15 @@ void hvm_vcpu_down(struct vcpu *v)
> >      }
> >  }
> >
> > -bool_t hvm_send_assist_req(struct vcpu *v)
> > +bool_t hvm_send_assist_req(void)
> >  {
> > -    ioreq_t *p;
> > +    struct vcpu *v = current;
> > +    ioreq_t *p = get_ioreq(v);
> 
> Would the patch grow significantly bigger if you renamed "v" to "curr"
> here, as we generally try to do to document that it's not an arbitrary
> vCPU that is being referred to (the ack stands regardless of whether
> you do the rename)?
> 

I'll take a look.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-06 12:35   ` Jan Beulich
@ 2014-05-06 12:41     ` Paul Durrant
  2014-05-06 14:13       ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 12:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir (Xen.org), Kevin Tian, Eddie Dong, Jun Nakajima, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 13:35
> To: Paul Durrant
> Cc: Eddie Dong; Jun Nakajima; Kevin Tian; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
> 
> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > To simplify creation of the ioreq server abstraction in a subsequent patch,
> > this patch centralizes all use of the shared ioreq structure and the
> > buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.
> >
> > The patch moves an rmb() from inside hvm_io_assist() to
> hvm_do_resume()
> > because the former may now be passed a data structure on stack, in which
> > case the barrier is unnecessary.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 

Thanks.

> with these minor comments:
> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >      spin_unlock(&d->event_lock);
> >  }
> >
> > +static ioreq_t *get_ioreq(struct vcpu *v)
> 
> const struct vcpu *? Or was it that this conflicts with a subsequent
> patch?
> 

I guess that could be done, although it wasn't const before it was moved.

  Paul

> > +{
> > +    struct domain *d = v->domain;
> > +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > +
> > +    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > +
> > +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > +}
> > +
> > +bool_t hvm_io_pending(struct vcpu *v)
> > +{
> > +    ioreq_t *p = get_ioreq(v);
> 
> It would be particularly desirable since this one would logically want
> its parameter to be const-qualified.
> 
> > +bool_t hvm_has_dm(struct domain *d)
> > +{
> > +    return !!d->arch.hvm_domain.ioreq.va;
> > +}
> 
> Pretty certainly const here.
> 
> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-01 12:08 ` [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-05-06 12:55   ` Jan Beulich
  2014-05-06 13:12     ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 12:55 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> Collect together data structures concerning device emulation together into
> a new struct hvm_ioreq_server.
> 
> Code that deals with the shared and buffered ioreq pages is extracted from
> functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
> and consolidated into a set of hvm_ioreq_server manipulation functions. The
> lock in the hvm_ioreq_page served two different purposes and has been
> replaced by separate locks in the hvm_ioreq_server structure.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

> Cc: Jan Beulich <jbeulich@suse.com>

>  bool_t hvm_io_pending(struct vcpu *v)
>  {
> -    ioreq_t *p = get_ioreq(v);
> +    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
> +    ioreq_t *p;
>  
> -    if ( !p )
> +    if ( !s )
>          return 0;
>  
> +    p = get_ioreq(s, v);
>      return p->state != STATE_IOREQ_NONE;

I don't think you need the variable "p" here anymore.
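
i.e. something like (sketch; it relies, as the quoted code already does,
on get_ioreq() not returning NULL once the server exists):

    bool_t hvm_io_pending(struct vcpu *v)
    {
        struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;

        if ( !s )
            return 0;

        return get_ioreq(s, v)->state != STATE_IOREQ_NONE;
    }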

> @@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
>      }
>  }
>  
> -static void hvm_init_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp)
> -{
> -    memset(iorp, 0, sizeof(*iorp));
> -    spin_lock_init(&iorp->lock);
> -    domain_pause(d);

So where is this ...

> @@ -513,22 +507,15 @@ static int hvm_map_ioreq_page(
>      if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
>          return rc;
>  
> -    spin_lock(&iorp->lock);
> -
>      if ( (iorp->va != NULL) || d->is_dying )
>      {
>          destroy_ring_for_helper(&va, page);
> -        spin_unlock(&iorp->lock);
>          return -EINVAL;
>      }
>  
>      iorp->va = va;
>      iorp->page = page;
>  
> -    spin_unlock(&iorp->lock);
> -
> -    domain_unpause(d);

... and this going? Or is the pausing no longer needed for a non-
obvious reason?

> +static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
> +                             unsigned long pfn)
> +{
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    int rc;
> +
> +    spin_lock(&s->lock);
> +
> +    rc = hvm_map_ioreq_page(s, buf, pfn);
> +    if ( rc )
> +        goto fail;
> +
> +    if (!buf) {

Coding style. I'm afraid this isn't the first time I have to make such a
remark on this series. Please check style before submitting.

> +        struct hvm_ioreq_vcpu *sv;
> +
> +        list_for_each_entry ( sv,
> +                              &s->ioreq_vcpu_list,
> +                              list_entry )
> +            hvm_update_ioreq_evtchn(s, sv);
> +    }
> +
> +    spin_unlock(&s->lock);
> +    return 0;
> +
> + fail:
> +    spin_unlock(&s->lock);
> +    return rc;
> +}

If the function isn't going to change significantly in subsequent
patches, there's no real point here in having the "fail" label and
separate error exit path.
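
e.g. (sketch, folding the error path into the single unlock):

    static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
                                 unsigned long pfn)
    {
        struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
        int rc;

        spin_lock(&s->lock);

        rc = hvm_map_ioreq_page(s, buf, pfn);
        if ( rc == 0 && !buf )
        {
            struct hvm_ioreq_vcpu *sv;

            list_for_each_entry ( sv,
                                  &s->ioreq_vcpu_list,
                                  list_entry )
                hvm_update_ioreq_evtchn(s, sv);
        }

        spin_unlock(&s->lock);

        return rc;
    }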

> +static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
> +                                     evtchn_port_t *p_port)
> +{
> +    evtchn_port_t old_port, new_port;
> +
> +    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
> +    if ( new_port < 0 )
> +        return new_port;

I'm pretty sure I commented on this too in a previous version:
evtchn_port_t is an unsigned type, hence checking it to be negative
is pointless.
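
A signed local avoids the problem, e.g. (sketch; tear-down of the old
channel is omitted here but would stay as in the patch):

    static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                         evtchn_port_t *p_port)
    {
        int rc = alloc_unbound_xen_event_channel(v, remote_domid, NULL);

        if ( rc < 0 )
            return rc;

        *p_port = rc;

        return 0;
    }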

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-06 11:24   ` Ian Campbell
@ 2014-05-06 13:02     ` Paul Durrant
  2014-05-06 13:24       ` Ian Campbell
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:02 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 06 May 2014 12:25
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > Because we may now have more than one emulator, the implementation
> of the
> > PCI hotplug controller needs to be done by Xen.
> 
> Does that imply that this series must come sooner in the series?
> 

Well, it needs to be done before anyone expects to be able to hotplug a device implemented by a secondary emulator, but someone getting hold of a build of Xen that is only part way through this series is hopefully unlikely. I think it's ok to leave it here.

> >  Happily the code is very
> > short and simple and it also removes the need for a different ACPI DSDT
> > when using different variants of QEMU.
> >
> > As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
> > and hotplug controller implementations to avoid the possibility of it
> > raising an SCI which will never be cleared.
> >
> > VMs started on an older host and then migrated in will not use the in-Xen
> > controller as the AML may still point at QEMU traditional's hotplug
> > controller implementation.
> 
> "... will not ... may ...". I think perhaps one of those should be the
> other?

Ok - if they were started on an old Xen *and* using qemu trad then the AML *will* point to qemu trad's implementation.

> 
> >  This means xc_hvm_pci_hotplug() will fail
> > with EOPNOTSUPP and it is up to the caller to decide whether this is a
> > problem or not. libxl will ignore EOPNOTSUPP as it is always hotplugging
> > via QEMU so it does not matter whether it is Xen or QEMU providing the
> > implementation.
> 
> Just to clarify: there is no qemu patch involved here because this
> change will mean that PCI HP controller accesses will never be passed to
> qemu?
> 

If Xen is not implementing the PCIHP then EOPNOTSUPP will be returned. Libxl should not care because, if that is the case, QEMU will be implementing the PCIHP and libxl only ever deals with devices being emulated by QEMU (because that was the only possible emulator until now).

> What does "hotplugging via qemu" mean here if qemu isn't patched to call
> this new call?
> 

It means QEMU is emulating the device being hotplugged. So, if Xen is implementing the PCIHP, libxl will call the new function and it will succeed. If QEMU is implementing the PCIHP then libxl will call the new function and it will fail, but QEMU will implicitly do the hotplug. Either way, the guest sees a hotplug event.

> What is the condition which causes the EOPNOTSUPP? I think it is when
> HVM_PARAM_IOREQ_SERVER_PFN has not been set. Is that correct? Can
> you
> write it down here please.
> 

Ok.

> Is there any save/restore state associated with the hotplug controller?
> If yes then how is that handled when migrating a qemu-xen (new qemu)
> guest from a system which uses PCIHP in qemu to one which does it in
> Xen?
> 

No, there is no state. I don't believe there ever was.

> 
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> >  tools/firmware/hvmloader/acpi/Makefile  |    2 +-
> >  tools/firmware/hvmloader/acpi/mk_dsdt.c |  193 +++++---------------------
> >  tools/libxc/xc_domain.c                 |   27 ++++
> >  tools/libxc/xc_domain_restore.c         |    1 +
> >  tools/libxc/xc_hvm_build_x86.c          |    1 +
> >  tools/libxc/xenctrl.h                   |   24 ++++
> >  tools/libxl/libxl_pci.c                 |   14 ++
> >  xen/arch/x86/hvm/Makefile               |    1 +
> >  xen/arch/x86/hvm/hotplug.c              |  224
> +++++++++++++++++++++++++++++++
> >  xen/arch/x86/hvm/hvm.c                  |   40 ++++++
> >  xen/include/asm-x86/hvm/domain.h        |   11 ++
> >  xen/include/asm-x86/hvm/io.h            |    8 +-
> >  xen/include/public/hvm/hvm_op.h         |   12 ++
> >  xen/include/public/hvm/ioreq.h          |    4 +
> >  14 files changed, 402 insertions(+), 160 deletions(-)
> >  create mode 100644 xen/arch/x86/hvm/hotplug.c
> >
> > diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c
> b/tools/firmware/hvmloader/acpi/mk_dsdt.c
> > index a4b693b..1de88ac 100644
> > --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
> > +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
> > @@ -222,11 +183,8 @@ int main(int argc, char **argv)
> >
> >      /* Define GPE control method. */
> >      push_block("Scope", "\\_GPE");
> > -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> > -        push_block("Method", "_L02");
> > -    } else {
> > -        push_block("Method", "_E02");
> > -    }
> > +    push_block("Method", "_L02");
> 
> Aren't you leaving the wrong case behind here?
> 

No. AFAICT the pcihp code in upstream QEMU actually implemented level triggered semantics anyway - the AML was just wrong.

> >      /*
> > -     * Reserve the IO port ranges [0x10c0, 0x1101] and [0xb044, 0xb047].
> > -     * Or else, for a hotplugged-in device, the port IO BAR assigned
> > -     * by guest OS may conflict with the ranges here.
> > +     * Reserve the IO port ranges used by PCI hotplug controller or else,
> > +     * for a hotplugged-in device, the port IO BAR assigned by guest OS may
> > +     * conflict with the ranges here.
> 
> AIUI you are also reserving the ranges used by qemu-trad to avoid
> accidental conflicts?
> 

Yes, that was added later. I'll make sure that's correct.

> > @@ -322,64 +274,21 @@ int main(int argc, char **argv)
> >                     dev, intx, ((dev*4+dev/8+intx)&31)+16);
> >      printf("})\n");
> >
> > -    /*
> > -     * Each PCI hotplug slot needs at least two methods to handle
> > -     * the ACPI event:
> > -     *  _EJ0: eject a device
> > -     *  _STA: return a device's status, e.g. enabled or removed
> > -     *
> > -     * Eject button would generate a general-purpose event, then the
> > -     * control method for this event uses Notify() to inform OSPM which
> > -     * action happened and on which device.
> > -     *
> > -     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
> > -     * in ACPI spec 3.0b for details.
> > -     *
> > -     * QEMU provides a simple hotplug controller with some I/O to handle
> > -     * the hotplug action and status, which is beyond the ACPI scope.
> 
> Is this comment not still relevant with s/QEMU/Xen/?
> 

Yes, actually it is. I'll leave it in.

> > [...]
> > diff --git a/tools/libxc/xc_domain_restore.c
> b/tools/libxc/xc_domain_restore.c
> > index af2bf3a..9b49509 100644
> > --- a/tools/libxc/xc_domain_restore.c
> > +++ b/tools/libxc/xc_domain_restore.c
> > @@ -1784,6 +1784,7 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
> >       */
> >      if (pagebuf.nr_ioreq_server_pages != 0) {
> >          if (pagebuf.ioreq_server_pfn != 0) {
> > +            /* Setting this parameter also enables the hotplug controller */
> 
> "Xen's hotplug controller" ? (maybe I only think that because I'm aware
> of the qemu alternative).
> 

Yes, I think QEMU is probably out of scope at this point in the code.

> > +/**
> > + * This function either enables or disables a hotplug PCI slot
> > + *
> > + * @parm xch a handle to an open hypervisor interface.
> > + * @parm domid the domain id to be serviced
> > + * @parm slot the slot number
> > + * @parm enable enable/disable the slot
> > + * @return 0 on success, -1 on failure.
> > + *
> > + * VMs started on an old version of Xen may not have a hotplug
> > + * controller, in which case this function will fail with errno
> > + * set to EOPNOTSUPP. Such a failure can be safely ignored if
> > + * the device in question is being emulated by QEMU since it
> > + * will be providing the hotplug controller implementation.
> 
> Is this more strictly required to be the QEMU providing the fallback
> ioreq, as opposed to some potentially disaggregated QEMU functionality
> in a secondary thing?
> 

EOPNOTSUPP can only ever happen in the case where there is no secondary emulator, so the fallback is necessarily QEMU.

> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 44d0453..55cb8a2 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> domid, libxl_device_pci *pcidev, i
> >          }
> >          if ( rc )
> >              return ERROR_FAIL;
> > +
> > +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
> > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
> > +            return ERROR_FAIL;
> > +        }
> 
> I initially thought you needed to also reset rc to indicate success in
> the errno==EOPNOTSUPP, but actually the error handling in this function
> is just confusing...
> 
> But, have you tried hotpluging into a migrated guest?
> 

Not yet, but I can't see why that would be any more of a problem than hotplugging in general, and I've certainly tested that (although not via libxl).

> This might be a location worth reiterating (or at least referring to)
> the comment about EONOTSUPP and where the HP controller lives
> 
> What would it take for the toolstack to know if it needed to do this
> call or not? Reading the IOREQ PFNs HVM param perhaps? I wonder if that
> is worth it.
> 

That seems like unnecessary baggage given that EOPNOTSUPP is there purely to cover the case where Xen's PCIHP does not exist. A comment should be enough IMO. I'll add one.

> > +
> >          break;
> >      case LIBXL_DOMAIN_TYPE_PV:
> >      {
> > @@ -1188,6 +1195,13 @@ static int do_pci_remove(libxl__gc *gc, uint32_t
> domid,
> >                                           NULL, NULL, NULL) < 0)
> >              goto out_fail;
> >
> > +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 0);
> > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug disable failed");
> 
> Similar comments to above.
> 

Ok.

  Paul

> > +            rc = ERROR_FAIL;
> > +            goto out_fail;
> > +        }
> > +
> >          switch (libxl__device_model_version_running(gc, domid)) {
> >          case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> >              rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
> 
> Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 12:55   ` Jan Beulich
@ 2014-05-06 13:12     ` Paul Durrant
  2014-05-06 13:24       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 13:55
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server
> abstraction.
> 
> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > Collect together data structures concerning device emulation together into
> > a new struct hvm_ioreq_server.
> >
> > Code that deals with the shared and buffered ioreq pages is extracted from
> > functions such as hvm_domain_initialise, hvm_vcpu_initialise and
> do_hvm_op
> > and consolidated into a set of hvm_ioreq_server manipulation functions.
> The
> > lock in the hvm_ioreq_page served two different purposes and has been
> > replaced by separate locks in the hvm_ioreq_server structure.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> > Cc: Jan Beulich <jbeulich@suse.com>
> 
> >  bool_t hvm_io_pending(struct vcpu *v)
> >  {
> > -    ioreq_t *p = get_ioreq(v);
> > +    struct hvm_ioreq_server *s = v->domain-
> >arch.hvm_domain.ioreq_server;
> > +    ioreq_t *p;
> >
> > -    if ( !p )
> > +    if ( !s )
> >          return 0;
> >
> > +    p = get_ioreq(s, v);
> >      return p->state != STATE_IOREQ_NONE;
> 
> I don't think you need the variable "p" here anymore.
> 

I left it in because I generally dislike dereferencing function returns directly. That's probably just my sense of aesthetics, though.

> > @@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
> >      }
> >  }
> >
> > -static void hvm_init_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > -{
> > -    memset(iorp, 0, sizeof(*iorp));
> > -    spin_lock_init(&iorp->lock);
> > -    domain_pause(d);
> 
> So where is this ...

Nowhere. As I said in the checkin comment, the lock has gone and the domain_pause() and subsequent domain_unpause() were always unnecessary AFAICT. I think the intention was that the domain was not unpaused until both the IOREQ PFNs were set, but since the PFNs are set in the domain build code in the toolstack I can't see why this was needed.

> 
> > @@ -513,22 +507,15 @@ static int hvm_map_ioreq_page(
> >      if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
> >          return rc;
> >
> > -    spin_lock(&iorp->lock);
> > -
> >      if ( (iorp->va != NULL) || d->is_dying )
> >      {
> >          destroy_ring_for_helper(&va, page);
> > -        spin_unlock(&iorp->lock);
> >          return -EINVAL;
> >      }
> >
> >      iorp->va = va;
> >      iorp->page = page;
> >
> > -    spin_unlock(&iorp->lock);
> > -
> > -    domain_unpause(d);
> 
> ... and this going? Or is the pausing no longer needed for a non-
> obvious reason?

As I said above, I don't think it was actually ever needed.

> 
> > +static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
> > +                             unsigned long pfn)
> > +{
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    int rc;
> > +
> > +    spin_lock(&s->lock);
> > +
> > +    rc = hvm_map_ioreq_page(s, buf, pfn);
> > +    if ( rc )
> > +        goto fail;
> > +
> > +    if (!buf) {
> 
> Coding style. I'm afraid this isn't the first time I have to make such a
> remark on this series. Please check style before submitting.
> 

Oops. Sorry. It would be nice if there were a checkpatch script. 

> > +        struct hvm_ioreq_vcpu *sv;
> > +
> > +        list_for_each_entry ( sv,
> > +                              &s->ioreq_vcpu_list,
> > +                              list_entry )
> > +            hvm_update_ioreq_evtchn(s, sv);
> > +    }
> > +
> > +    spin_unlock(&s->lock);
> > +    return 0;
> > +
> > + fail:
> > +    spin_unlock(&s->lock);
> > +    return rc;
> > +}
> 
> If the function isn't going to change significantly in subsequent
> patches, there's no real point here in having the "fail" label and
> separate error exit path.
> 

Ok.

> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
> remote_domid,
> > +                                     evtchn_port_t *p_port)
> > +{
> > +    evtchn_port_t old_port, new_port;
> > +
> > +    new_port = alloc_unbound_xen_event_channel(v, remote_domid,
> NULL);
> > +    if ( new_port < 0 )
> > +        return new_port;
> 
> I'm pretty sure I commented on this too in a previous version:
> evtchn_port_t is an unsigned type, hence checking it to be negative
> is pointless.

Yes, but as I'm pretty sure I responded, alloc_unbound_xen_event_channel() doesn't return an evtchn_port_t!

  Paul

> 
> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional
  2014-05-06 10:52   ` Ian Campbell
@ 2014-05-06 13:17     ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:17 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 06 May 2014 11:53
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 8/9] ioreq-server: make buffered ioreq handling
> optional
> 
> On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > Some emulators will only register regions that require non-buffered
> > access. (In practice the only region that a guest uses buffered access
> > for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
> > makes allocation of the buffered ioreq page and event channel optional for
> > secondary ioreq servers.
> >
> > If a guest attempts buffered access to an ioreq server that does not
> > support it, the access will be handled via the normal synchronous path.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> Tools portion: Acked-by: Ian Campbell <ian.campbell@citrix.com> with one
> small nit:
> 

Thanks.

> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index 74aa2bf..60f6abc 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1801,6 +1801,7 @@ int xc_get_hvm_param(xc_interface *handle,
> domid_t dom, int param, unsigned long
> >   */
> 
> IIRC there is a doc comment just above here which ought to list this new
> parameter.
> 

Ok. I'll check.

  Paul

> (those comments do fall rather into the trap of simply repeating the C
> don't they, but that does appear to be the prevailing style of many of
> them...)
> 
> Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-06 13:02     ` Paul Durrant
@ 2014-05-06 13:24       ` Ian Campbell
  2014-05-06 13:35         ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-06 13:24 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

On Tue, 2014-05-06 at 14:02 +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 06 May 2014 12:25
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > Subject: Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller
> > implementation into Xen
> > 
> > On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > > Because we may now have more than one emulator, the implementation
> > of the
> > > PCI hotplug controller needs to be done by Xen.
> > 
> > Does that imply that this series must come sooner in the series?
> > 
> 
> Well, it needs to be done before anyone expects to be able to hotplug
> a device implemented by a secondary emulator but someone getting  hold
> of a build of Xen which is part way through this series is hopefully
> unlikely. I think it's ok to leave it here.

OK

> > >  Happily the code is very
> > > short and simple and it also removes the need for a different ACPI DSDT
> > > when using different variants of QEMU.
> > >
> > > As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
> > > and hotplug controller implementations to avoid the possibility of it
> > > raising an SCI which will never be cleared.
> > >
> > > VMs started on an older host and then migrated in will not use the in-Xen
> > > controller as the AML may still point at QEMU traditional's hotplug
> > > controller implementation.
> > 
> > "... will not ... may ...". I think perhaps one of those should be the
> > other?
> 
> Ok - if they were started on an old Xen *and* using qemu trad then the
> AML *will* point to qemu trad's implementation.

Thanks.

> > >  This means xc_hvm_pci_hotplug() will fail
> > > with EOPNOTSUPP and it is up to the caller to decide whether this is a
> > > problem or not. libxl will ignore EOPNOTSUPP as it is always hotplugging
> > > via QEMU so it does not matter whether it is Xen or QEMU providing the
> > > implementation.
> > 
> > Just to clarify: there is no qemu patch involved here because this
> > change will mean that PCI HP controller accesses will never be passed to
> > qemu?
> > 
> 
> If Xen is not implementing the PCIHP then EOPNOTSUPP will be returned.
> Libxl should not care because, if that is the case, QEMU will be
> implementing the PCIHP and libxl only ever deals with devices being
> emulated by QEMU (because that was the only possible emulator until
> now).

When libxl does the qmp call to tell qemu about a new device what stops
qemu from generating hotplug events (SCI or whatever the interrupt
mechanism is) even if Xen is emulating the HP controller? Because Qemu
will still have the controller emulation, it's just "shadowed", right?

> > What does "hotplugging via qemu" mean here if qemu isn't patched to call
> > this new call?
> > 
> 
> It means QEMU is implementing the hotplug device. So, if Xen is
> implementing the PCIHP libxl will call the new function and it will
> succeed. If QEMU is implementing the PCIHP then libxl will call the
> new function, it will fail, but QEMU will implicitly do the hotplug.
> Either way, the guest sees a hotplug event.

Even if Xen is implementing the PCIHP, qemu is still involved in plugging
the device, since it has to know about it, so in some sense hotplugging is
always (partially) via qemu (which is why I'm a bit confused, but also why
the lack of a qemu-side patch surprises me).

> > Is there any save/restore state associated with the hotplug controller?
> > If yes then how is that handled when migrating a qemu-xen (new qemu)
> > guest from a system which uses PCIHP in qemu to one which does it in
> > Xen?
> > 
> 
> No, there is no state. I don't believe there ever was.

Phew, that would have been tricky to deal with!

> > >      /* Define GPE control method. */
> > >      push_block("Scope", "\\_GPE");
> > > -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> > > -        push_block("Method", "_L02");
> > > -    } else {
> > > -        push_block("Method", "_E02");
> > > -    }
> > > +    push_block("Method", "_L02");
> > 
> > Aren't you leaving the wrong case behind here?
> > 
> 
> No. AFAICT the pcihp code in upstream QEMU actually implemented level
> triggered semantics anyway - the AML was just wrong.

I think this is worth noting in the commit message, if not splitting
into a separate patch

> > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > index 44d0453..55cb8a2 100644
> > > --- a/tools/libxl/libxl_pci.c
> > > +++ b/tools/libxl/libxl_pci.c
> > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > domid, libxl_device_pci *pcidev, i
> > >          }
> > >          if ( rc )
> > >              return ERROR_FAIL;
> > > +
> > > +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
> > > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > > +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
> > > +            return ERROR_FAIL;
> > > +        }
> > 
> > I initially thought you needed to also reset rc to indicate success in
> > the errno==EOPNOTSUPP, but actually the error handling in this function
> > is just confusing...
> > 
> > But, have you tried hotpluging into a migrated guest?
> > 
> 
> Not yet, but I can't see why that would be a problem over hotplugging
> at all,

It's the problems we can't imagine that I'm worried about. This stuff is
certainly subtle in places WRT migration.

Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 13:12     ` Paul Durrant
@ 2014-05-06 13:24       ` Jan Beulich
  2014-05-06 13:40         ` Paul Durrant
  2014-05-06 13:44         ` Paul Durrant
  0 siblings, 2 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 13:24 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir (Xen.org), xen-devel

>>> On 06.05.14 at 15:12, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> > @@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
>> >      }
>> >  }
>> >
>> > -static void hvm_init_ioreq_page(
>> > -    struct domain *d, struct hvm_ioreq_page *iorp)
>> > -{
>> > -    memset(iorp, 0, sizeof(*iorp));
>> > -    spin_lock_init(&iorp->lock);
>> > -    domain_pause(d);
>> 
>> So where is this ...
> 
> Nowhere. As I said in the checkin comment, the lock has gone and the 
> domain_pause() and subsequent domain_unpause() were always unnecessary 
> AFAICT. I think the intention was that the domain was not unpaused until both 
> the IOREQ PFNs were set, but since the PFNs are set in the domain build code 
> in the toolstack I can't see why this was needed.

So with a disaggregated, hostile tool stack this would still be
unnecessary? It can go away only if the answer to this is "yes".

>> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
>> remote_domid,
>> > +                                     evtchn_port_t *p_port)
>> > +{
>> > +    evtchn_port_t old_port, new_port;
>> > +
>> > +    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
>> > +    if ( new_port < 0 )
>> > +        return new_port;
>> 
>> I'm pretty sure I commented on this too in a previous version:
>> evtchn_port_t is an unsigned type, hence checking it to be negative
>> is pointless.
> 
> Yes, but as I'm pretty sure I responded, alloc_unbound_xen_event_channel() 
> doesn't return an evtchn_port_t!

Which doesn't matter here at all: Once you store the function result
in a variable of type evtchn_port_t, its original signedness is lost.
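
To make the point concrete (a minimal sketch, using the call as it appears
in the patch; the function itself returns int, i.e. a port number or -errno):

    evtchn_port_t new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);

    if ( new_port < 0 )   /* always false: evtchn_port_t is unsigned, so a
                             negative errno has already wrapped to a huge port */
        return new_port;

    /* Keeping the result in a plain int preserves the sign for the check: */
    int rc = alloc_unbound_xen_event_channel(v, remote_domid, NULL);

    if ( rc < 0 )
        return rc;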

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-06 10:46   ` Ian Campbell
@ 2014-05-06 13:28     ` Paul Durrant
  2014-05-07  9:44       ` Ian Campbell
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 06 May 2014 11:46
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> 
> On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > NOTE: To prevent emulators running in non-privileged guests from
> >       potentially allocating very large amounts of xen heap, the core
> >       rangeset code has been modified to introduce a hard limit of 256
> >       ranges per set.
> 
> OOI how much RAM does that correspond to?

Each range is two pointers (a list_head) and two unsigned longs (start and end), so that's 32 bytes per range on a 64-bit build - so 256 ranges is two pages' worth.
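
A minimal sketch of the arithmetic, assuming the layout in
xen/common/rangeset.c is roughly:

    struct range {
        struct list_head list;  /* two pointers: 16 bytes on 64-bit */
        unsigned long s, e;     /* start and end:  16 bytes         */
    };                          /* sizeof(struct range) == 32       */

    /* 256 ranges * 32 bytes = 8192 bytes = two 4K pages per rangeset */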

> 
> (Arguably this and the asprintf change should/could have been separate)
> 

I debated that with myself and decided to leave it in the same patch in case someone objected to me adding a function with no callers. I'll separate it for v6.

> I've only really looks at tools/ and xen/include/public here.
> 
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 369c3f3..b3ed029 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1284,6 +1284,225 @@ int xc_get_hvm_param(xc_interface *handle,
> domid_t dom, int param, unsigned long
> >      return rc;
> >  }
> >
> > +int xc_hvm_create_ioreq_server(xc_interface *xch,
> > +                               domid_t domid,
> > +                               ioservid_t *id)
> > +{
> [...]
> > +}
> > +
> > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > +                                 domid_t domid,
> > +                                 ioservid_t id,
> > +                                 xen_pfn_t *ioreq_pfn,
> > +                                 xen_pfn_t *bufioreq_pfn,
> > +                                 evtchn_port_t *bufioreq_port)
> > +{
> [...]
> > +}
> > +
> > +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                        ioservid_t id, int is_mmio,
> > +                                        uint64_t start, uint64_t end)
> > +{
> [...]
> > +}
> > +
> > +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> domid_t domid,
> > +                                            ioservid_t id, int is_mmio,
> > +                                            uint64_t start, uint64_t end)
> > +{
> [...]
> > +}
> 
> Those all look like reasonable layers over an underlying hypercall, so
> if the hypervisor guys are happy with the hypervisor interface then I'm
> happy with this.
> 

Ok.

> > +
> > +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                      ioservid_t id, uint16_t segment,
> > +                                      uint8_t bus, uint8_t device,
> > +                                      uint8_t function)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
> > +    int rc;
> > +
> > +    if (device > 0x1f || function > 0x7) {
> > +        errno = EINVAL;
> > +        return -1;
> 
> I suppose without this HVMOP_PCI_SBDF will produce nonsense, which the
> hypervisor may or may not reject. Hopefully we aren't relying on this
> check here for any security properties.
> 

I don't think there are any security implications. As you say, it's just to make sure the caller sees a failure if they pass in stupid values, as opposed to them just sitting there wondering why they are not seeing any ioreqs.
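
To illustrate (a sketch, assuming HVMOP_PCI_SBDF packs the fields along
these lines):

    #define HVMOP_PCI_SBDF(s, b, d, f)            \
        ((((s) & 0xffff) << 16) |                 \
         (((b) & 0xff)   <<  8) |                 \
         (((d) & 0x1f)   <<  3) |                 \
          ((f) & 0x07))

    /* device 0x20 or function 0x8 would simply be masked to 0, silently
     * registering the server for the wrong device - hence the explicit
     * EINVAL check in xc_hvm_map_pcidev_to_ioreq_server(). */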

> > +    }
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->type = HVMOP_IO_RANGE_PCI;
> > +    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
> 
> Since you have HVMOP_IO_RANGE_PCI do you not want to expose that via
> this interface?
> 

I could have crunched this into the map_range function. I left it separate because I thought it was more convenient for callers - who I think will most likely deal with one PCI device at a time.

> > diff --git a/tools/libxc/xc_domain_restore.c
> b/tools/libxc/xc_domain_restore.c
> > index bcb0ae0..af2bf3a 100644
> > --- a/tools/libxc/xc_domain_restore.c
> > +++ b/tools/libxc/xc_domain_restore.c
> > @@ -1748,6 +1770,29 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
> >      if (pagebuf.viridian != 0)
> >          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
> >
> > +    /*
> > +     * If we are migrating in from a host that does not support
> > +     * secondary emulators then nr_ioreq_server_pages will be 0, since
> > +     * there will be no XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES
> chunk in
> > +     * the image.
> > +     * If we are migrating from a host that does support secondary
> > +     * emulators then the XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES
> chunk
> > +     * will exist and is guaranteed to have a non-zero value. The
> > +     * existence of that chunk also implies the existence of the
> > +     * XC_SAVE_ID_HVM_IOREQ_SERVER_PFN chunk, which is also
> guaranteed
> > +     * to have a non-zero value.
> 
> Please can you also note this both or neither behaviour in
> xg_save_restore.h

Ok.

> 
> > +     */
> > +    if (pagebuf.nr_ioreq_server_pages != 0) {
> > +        if (pagebuf.ioreq_server_pfn != 0) {
> > +            xc_set_hvm_param(xch, dom,
> HVM_PARAM_NR_IOREQ_SERVER_PAGES,
> > +                             pagebuf.nr_ioreq_server_pages);
> > +            xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> > +                             pagebuf.ioreq_server_pfn);
> > +        } else {
> > +            ERROR("ioreq_server_pfn is invalid");
> 
> Or ioreq_server_pages was. Perhaps say they are inconsistent? Perhaps
> log their values?

Ok - I'll do that.

> 
> > +        }
> > +    }
>        else if (..server_pfn != 0)
> 	     Also an error I think
> 
> (there might be better ways to structure things to catch both error
> cases...)
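
One way to catch both inconsistent combinations might be something like
this (a sketch; the chunk field types and the error label are assumed from
the surrounding restore code):

    if ((pagebuf.nr_ioreq_server_pages != 0) !=
        (pagebuf.ioreq_server_pfn != 0)) {
        ERROR("inconsistent ioreq server chunks (nr_pages = %llu, pfn = %#llx)",
              (unsigned long long)pagebuf.nr_ioreq_server_pages,
              (unsigned long long)pagebuf.ioreq_server_pfn);
        goto out;
    }

    if (pagebuf.nr_ioreq_server_pages != 0) {
        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
                         pagebuf.nr_ioreq_server_pages);
        xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
                         pagebuf.ioreq_server_pfn);
    }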
> 
> 
> > +
> >      if (pagebuf.acpi_ioport_location == 1) {
> >          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
> >          xc_set_hvm_param(xch, dom,
> HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> > diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> > index 71f9b59..acf3685 100644
> > --- a/tools/libxc/xc_domain_save.c
> > +++ b/tools/libxc/xc_domain_save.c
> > @@ -1737,6 +1737,30 @@ int xc_domain_save(xc_interface *xch, int io_fd,
> uint32_t dom, uint32_t max_iter
> >              PERROR("Error when writing the viridian flag");
> >              goto out;
> >          }
> > +
> > +        chunk.id = XC_SAVE_ID_HVM_IOREQ_SERVER_PFN;
> > +        chunk.data = 0;
> > +        xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> > +                         (unsigned long *)&chunk.data);
> > +
> > +        if ( (chunk.data != 0) &&
> > +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> > +        {
> > +            PERROR("Error when writing the ioreq server gmfn base");
> > +            goto out;
> > +        }
> > +
> > +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES;
> > +        chunk.data = 0;
> > +        xc_get_hvm_param(xch, dom,
> HVM_PARAM_NR_IOREQ_SERVER_PAGES,
> > +                         (unsigned long *)&chunk.data);
> > +
> > +        if ( (chunk.data != 0) &&
> > +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> > +        {
> > +            PERROR("Error when writing the ioreq server gmfn count");
> > +            goto out;
> > +        }
> 
> Probably arranging to assert that both of these are either zero or
> non-zero is too much faff.
> 
> > @@ -502,6 +505,31 @@ static int setup_guest(xc_interface *xch,
> >                       special_pfn(SPECIALPAGE_SHARING));
> >
> >      /*
> > +     * Allocate and clear additional ioreq server pages. The default
> > +     * server will use the IOREQ and BUFIOREQ special pages above.
> > +     */
> > +    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
> > +    {
> > +        xen_pfn_t pfn = ioreq_server_pfn(i);
> > +
> > +        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> > +        if ( rc != 0 )
> > +        {
> > +            PERROR("Could not allocate %d'th ioreq server page.", i);
> 
> This will say things like "1'th". "Could not allocate ioreq server page
> %d" avoids that.

Ok. It was a cut'n'paste from the special_pfn code just above. I'll fix that while I'm in the neighbourhood.

> 
> You could do this allocation all in one go if pfn was an array of
> [NR_IOREQ_SERVER_PAGES], you can still use order-0 allocations.
> 

Ok.
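
A minimal sketch of that (keeping the existing
xc_domain_populate_physmap_exact() call; the error label is assumed from
the surrounding setup_guest() code):

    xen_pfn_t pfns[NR_IOREQ_SERVER_PAGES];

    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
        pfns[i] = ioreq_server_pfn(i);

    /* One call, NR_IOREQ_SERVER_PAGES order-0 extents. */
    rc = xc_domain_populate_physmap_exact(xch, dom, NR_IOREQ_SERVER_PAGES,
                                          0, 0, pfns);
    if ( rc != 0 )
    {
        PERROR("Could not allocate ioreq server pages");
        goto error_out;
    }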

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-06 13:24       ` Ian Campbell
@ 2014-05-06 13:35         ` Paul Durrant
  2014-05-07  9:48           ` Ian Campbell
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:35 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 06 May 2014 14:24
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> On Tue, 2014-05-06 at 14:02 +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 06 May 2014 12:25
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > > Subject: Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller
> > > implementation into Xen
> > >
> > > On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > > > Because we may now have more than one emulator, the
> implementation
> > > of the
> > > > PCI hotplug controller needs to be done by Xen.
> > >
> > > Does that imply that this series must come sooner in the series?
> > >
> >
> > Well, it needs to be done before anyone expects to be able to hotplug
> > a device implemented by a secondary emulator but someone getting  hold
> > of a build of Xen which is part way through this series is hopefully
> > unlikely. I think it's ok to leave it here.
> 
> OK
> 
> > > >  Happily the code is very
> > > > short and simple and it also removes the need for a different ACPI DSDT
> > > > when using different variants of QEMU.
> > > >
> > > > As a precaution, we obscure the IO ranges used by QEMU traditional's
> gpe
> > > > and hotplug controller implementations to avoid the possibility of it
> > > > raising an SCI which will never be cleared.
> > > >
> > > > VMs started on an older host and then migrated in will not use the in-
> Xen
> > > > controller as the AML may still point at QEMU traditional's hotplug
> > > > controller implementation.
> > >
> > > "... will not ... may ...". I think perhaps one of those should be the
> > > other?
> >
> > Ok - if they were started on an old Xen *and* using qemu trad then the
> > AML *will* point to qemu trad's implementation.
> 
> Thanks.
> 
> > > >  This means xc_hvm_pci_hotplug() will fail
> > > > with EOPNOTSUPP and it is up to the caller to decide whether this is a
> > > > problem or not. libxl will ignore EOPNOTSUPP as it is always hotplugging
> > > > via QEMU so it does not matter whether it is Xen or QEMU providing
> the
> > > > implementation.
> > >
> > > Just to clarify: there is no qemu patch involved here because this
> > > change will mean that PCI HP controller accesses will never be passed to
> > > qemu?
> > >
> >
> > If Xen is not implementing the PCIHP then EOPNOTSUPP will be returned.
> > Libxl should not care because, if that is the case, QEMU will be
> > implementing the PCIHP and libxl only ever deals with devices being
> > emulated by QEMU (because that was the only possible emulator until
> > now).
> 
> When libxl does the qmp call to tell qemu about a new device what stops
> qemu from generating hotplug events (SCI or whatever the interrupt
> mechanism is) even if Xen is emulating the HP controller? Because Qemu
> will still have the controller emulation, it's just "shadowed", right?
>

The controller emulation is there, but since the ports are handled by Xen the I/O to enable the hotplug events will never get through. So, yes, QEMU will go through the motions of hotplugging but it will never raise the SCI.
 
> > > What does "hotplugging via qemu" mean here if qemu isn't patched to
> call
> > > this new call?
> > >
> >
> > It means QEMU is implementing the hotplug device. So, if Xen is
> > implementing the PCIHP libxl will call the new function and it will
> > succeed. If QEMU is implementing the PCIHP then libxl will call the
> > new function, it will fail, but QEMU will implicitly do the hotplug.
> > Either way, the guest sees a hotplug event.
> 
> Even if Xen is implementing the PCIHP qemu is still involved in plugging
> the device, since it has to know about it though, so in some sense
> hotplugging is always (partially) via qemu (which is why I'm a bit
> confused, but also why the lack of a Qemu side patch surprises me)
> 

Yes, but because QEMU's PCIHP implementation never has any of its enable bits set, it remains silent.

> > > Is there any save/restore state associated with the hotplug controller?
> > > If yes then how is that handled when migrating a qemu-xen (new qemu)
> > > guest from a system which uses PCIHP in qemu to one which does it in
> > > Xen?
> > >
> >
> > No, there is no state. I don't believe there ever was.
> 
> Phew, that would have been tricky to deal with!
> 

No kidding.

> > > >      /* Define GPE control method. */
> > > >      push_block("Scope", "\\_GPE");
> > > > -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> > > > -        push_block("Method", "_L02");
> > > > -    } else {
> > > > -        push_block("Method", "_E02");
> > > > -    }
> > > > +    push_block("Method", "_L02");
> > >
> > > Aren't you leaving the wrong case behind here?
> > >
> >
> > No. AFAICT the pcihp code in upstream QEMU actually implemented level
> > triggered semantics anyway - the AML was just wrong.
> 
> I think this is worth noting in the commit message, if not splitting
> into a separate patch
> 

I'll add a comment to the commit message.

> > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > index 44d0453..55cb8a2 100644
> > > > --- a/tools/libxl/libxl_pci.c
> > > > +++ b/tools/libxl/libxl_pci.c
> > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > domid, libxl_device_pci *pcidev, i
> > > >          }
> > > >          if ( rc )
> > > >              return ERROR_FAIL;
> > > > +
> > > > +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
> > > > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > > > +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
> > > > +            return ERROR_FAIL;
> > > > +        }
> > >
> > > I initially thought you needed to also reset rc to indicate success in
> > > the errno==EOPNOTSUPP, but actually the error handling in this function
> > > is just confusing...
> > >
> > > But, have you tried hotpluging into a migrated guest?
> > >
> >
> > Not yet, but I can't see why that would be a problem over hotplugging
> > at all,
> 
> It's the problems we can't imagine that I'm worried about. This stuff is
> certainly subtle in places WRT migration.
> 

I'll give it a go on my dev box. I'm also trying to get this series backported onto Xen 4.4 so I can throw it at XenRT.

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 13:24       ` Jan Beulich
@ 2014-05-06 13:40         ` Paul Durrant
  2014-05-06 13:50           ` Jan Beulich
  2014-05-06 13:44         ` Paul Durrant
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 14:24
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [PATCH v5 4/9] ioreq-server: create basic ioreq server
> abstraction.
> 
> >>> On 06.05.14 at 15:12, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> >> > @@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
> >> >      }
> >> >  }
> >> >
> >> > -static void hvm_init_ioreq_page(
> >> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> >> > -{
> >> > -    memset(iorp, 0, sizeof(*iorp));
> >> > -    spin_lock_init(&iorp->lock);
> >> > -    domain_pause(d);
> >>
> >> So where is this ...
> >
> > Nowhere. As I said in the checkin comment, the lock has gone and the
> > domain_pause() and subsequent domain_unpause() were always
> unnecessary
> > AFAICT. I think the intention was that the domain was not unpaused until
> both
> > the IOREQ PFNs were set, but since the PFNs are set in the domain build
> code
> > in the toolstack I can't see why this was needed.
> 
> So with a disaggregated, hostile tool stack this would still be
> unnecessary? It can go away only if the answer to this is "yes".
>

Ok. I'll add some belt-and-braces code, but I was under the impression that a domain-building domain currently had to be trusted.
 
> >> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
> >> remote_domid,
> >> > +                                     evtchn_port_t *p_port)
> >> > +{
> >> > +    evtchn_port_t old_port, new_port;
> >> > +
> >> > +    new_port = alloc_unbound_xen_event_channel(v, remote_domid,
> NULL);
> >> > +    if ( new_port < 0 )
> >> > +        return new_port;
> >>
> >> I'm pretty sure I commented on this too in a previous version:
> >> evtchn_port_t is an unsigned type, hence checking it to be negative
> >> is pointless.
> >
> > Yes, but as I'm pretty sure I responded,
> alloc_unbound_xen_event_channel()
> > doesn't return an evtchn_port_t!
> 
> Which doesn't matter here at all: Once you store the function result
> in a variable of type evtchn_port_t, its original signedness is lost.
> 

...which is probably why I had these coded as unsigned longs originally. I'll change them back.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 13:24       ` Jan Beulich
  2014-05-06 13:40         ` Paul Durrant
@ 2014-05-06 13:44         ` Paul Durrant
  2014-05-06 13:51           ` Jan Beulich
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel



> -----Original Message-----
> From: Paul Durrant
> Sent: 06 May 2014 14:40
> To: 'Jan Beulich'
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [PATCH v5 4/9] ioreq-server: create basic ioreq server
> abstraction.
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: 06 May 2014 14:24
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> > Subject: RE: [PATCH v5 4/9] ioreq-server: create basic ioreq server
> > abstraction.
> >
> > >>> On 06.05.14 at 15:12, <Paul.Durrant@citrix.com> wrote:
> > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > >> > @@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
> > >> >      }
> > >> >  }
> > >> >
> > >> > -static void hvm_init_ioreq_page(
> > >> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > >> > -{
> > >> > -    memset(iorp, 0, sizeof(*iorp));
> > >> > -    spin_lock_init(&iorp->lock);
> > >> > -    domain_pause(d);
> > >>
> > >> So where is this ...
> > >
> > > Nowhere. As I said in the checkin comment, the lock has gone and the
> > > domain_pause() and subsequent domain_unpause() were always
> > unnecessary
> > > AFAICT. I think the intention was that the domain was not unpaused until
> > both
> > > the IOREQ PFNs were set, but since the PFNs are set in the domain build
> > code
> > > in the toolstack I can't see why this was needed.
> >
> > So with a disaggregated, hostile tool stack this would still be
> > unnecessary? It can go away only if the answer to this is "yes".
> >
> 
> Ok. I'll add some belt and braces code but I was under the impression that a
> domain building domain currently had to be trusted.
> 
> > >> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
> > >> remote_domid,
> > >> > +                                     evtchn_port_t *p_port)
> > >> > +{
> > >> > +    evtchn_port_t old_port, new_port;
> > >> > +
> > >> > +    new_port = alloc_unbound_xen_event_channel(v,
> remote_domid,
> > NULL);
> > >> > +    if ( new_port < 0 )
> > >> > +        return new_port;
> > >>
> > >> I'm pretty sure I commented on this too in a previous version:
> > >> evtchn_port_t is an unsigned type, hence checking it to be negative
> > >> is pointless.
> > >
> > > Yes, but as I'm pretty sure I responded,
> > alloc_unbound_xen_event_channel()
> > > doesn't return an evtchn_port_t!
> >
> > Which doesn't matter here at all: Once you store the function result
> > in a variable of type evtchn_port_t, its original signedness is lost.
> >
> 
> ...which is probably why I had these coded as unsigned longs originally. I'll
> change them back.

... and of course, I mean longs there and not unsigned longs!

> 
>   Paul
> 
> > Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 13:40         ` Paul Durrant
@ 2014-05-06 13:50           ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 13:50 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir (Xen.org), xen-devel

>>> On 06.05.14 at 15:40, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 06 May 2014 14:24
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
>> Subject: RE: [PATCH v5 4/9] ioreq-server: create basic ioreq server
>> abstraction.
>> 
>> >>> On 06.05.14 at 15:12, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> >> > @@ -426,14 +431,6 @@ void hvm_do_resume(struct vcpu *v)
>> >> >      }
>> >> >  }
>> >> >
>> >> > -static void hvm_init_ioreq_page(
>> >> > -    struct domain *d, struct hvm_ioreq_page *iorp)
>> >> > -{
>> >> > -    memset(iorp, 0, sizeof(*iorp));
>> >> > -    spin_lock_init(&iorp->lock);
>> >> > -    domain_pause(d);
>> >>
>> >> So where is this ...
>> >
>> > Nowhere. As I said in the checkin comment, the lock has gone and the
>> > domain_pause() and subsequent domain_unpause() were always
>> unnecessary
>> > AFAICT. I think the intention was that the domain was not unpaused until
>> both
>> > the IOREQ PFNs were set, but since the PFNs are set in the domain build
>> code
>> > in the toolstack I can't see why this was needed.
>> 
>> So with a disaggregated, hostile tool stack this would still be
>> unnecessary? It can go away only if the answer to this is "yes".
>>
> 
> Ok. I'll add some belt and braces code but I was under the impression that a 
> domain building domain currently had to be trusted.

That's the de facto state, but as we move towards disaggregation
we're not intending to grow the number of things that need fixing
for this purpose. See XSA-77.

>> >> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
>> >> remote_domid,
>> >> > +                                     evtchn_port_t *p_port)
>> >> > +{
>> >> > +    evtchn_port_t old_port, new_port;
>> >> > +
>> >> > +    new_port = alloc_unbound_xen_event_channel(v, remote_domid,
>> NULL);
>> >> > +    if ( new_port < 0 )
>> >> > +        return new_port;
>> >>
>> >> I'm pretty sure I commented on this too in a previous version:
>> >> evtchn_port_t is an unsigned type, hence checking it to be negative
>> >> is pointless.
>> >
>> > Yes, but as I'm pretty sure I responded,
>> alloc_unbound_xen_event_channel()
>> > doesn't return an evtchn_port_t!
>> 
>> Which doesn't matter here at all: Once you store the function result
>> in a variable of type evtchn_port_t, its original signedness is lost.
>> 
> 
> ...which is probably why I had these coded as unsigned longs originally. 
> I'll change them back.

No matter which unsigned type you pick, it won't help.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 13:44         ` Paul Durrant
@ 2014-05-06 13:51           ` Jan Beulich
  2014-05-06 13:53             ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 13:51 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir (Xen.org), xen-devel

>>> On 06.05.14 at 15:44, <Paul.Durrant@citrix.com> wrote:
>> From: Paul Durrant
>> > >> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
>> > >> remote_domid,
>> > >> > +                                     evtchn_port_t *p_port)
>> > >> > +{
>> > >> > +    evtchn_port_t old_port, new_port;
>> > >> > +
>> > >> > +    new_port = alloc_unbound_xen_event_channel(v,
>> remote_domid,
>> > NULL);
>> > >> > +    if ( new_port < 0 )
>> > >> > +        return new_port;
>> > >>
>> > >> I'm pretty sure I commented on this too in a previous version:
>> > >> evtchn_port_t is an unsigned type, hence checking it to be negative
>> > >> is pointless.
>> > >
>> > > Yes, but as I'm pretty sure I responded,
>> > alloc_unbound_xen_event_channel()
>> > > doesn't return an evtchn_port_t!
>> >
>> > Which doesn't matter here at all: Once you store the function result
>> > in a variable of type evtchn_port_t, its original signedness is lost.
>> >
>> 
>> ...which is probably why I had these coded as unsigned longs originally. I'll
>> change them back.
> 
> ... and of course, I mean longs there and not unsigned longs!

In which case you ought to mean int, as that's what the function
returns. No need for widening that value.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction.
  2014-05-06 13:51           ` Jan Beulich
@ 2014-05-06 13:53             ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 13:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 14:51
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [PATCH v5 4/9] ioreq-server: create basic ioreq server
> abstraction.
> 
> >>> On 06.05.14 at 15:44, <Paul.Durrant@citrix.com> wrote:
> >> From: Paul Durrant
> >> > >> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
> >> > >> remote_domid,
> >> > >> > +                                     evtchn_port_t *p_port)
> >> > >> > +{
> >> > >> > +    evtchn_port_t old_port, new_port;
> >> > >> > +
> >> > >> > +    new_port = alloc_unbound_xen_event_channel(v,
> >> remote_domid,
> >> > NULL);
> >> > >> > +    if ( new_port < 0 )
> >> > >> > +        return new_port;
> >> > >>
> >> > >> I'm pretty sure I commented on this too in a previous version:
> >> > >> evtchn_port_t is an unsigned type, hence checking it to be negative
> >> > >> is pointless.
> >> > >
> >> > > Yes, but as I'm pretty sure I responded,
> >> > alloc_unbound_xen_event_channel()
> >> > > doesn't return an evtchn_port_t!
> >> >
> >> > Which doesn't matter here at all: Once you store the function result
> >> > in a variable of type evtchn_port_t, its original signedness is lost.
> >> >
> >>
> >> ...which is probably why I had these coded as unsigned longs originally. I'll
> >> change them back.
> >
> > ... and of course, I mean longs there and not unsigned longs!
> 
> In which case you ought to mean int, as that's what the function
> returns. No need for widening that value.
>

Ok. Good point.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-06 12:41     ` Paul Durrant
@ 2014-05-06 14:13       ` Paul Durrant
  2014-05-06 14:21         ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 14:13 UTC (permalink / raw)
  To: Paul Durrant, Jan Beulich
  Cc: Eddie Dong, Kevin Tian, Keir (Xen.org), Jun Nakajima, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 06 May 2014 13:41
> To: Jan Beulich
> Cc: Keir (Xen.org); Kevin Tian; Eddie Dong; Jun Nakajima; xen-
> devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v5 3/9] ioreq-server: centralize access to
> ioreq structures
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: 06 May 2014 13:35
> > To: Paul Durrant
> > Cc: Eddie Dong; Jun Nakajima; Kevin Tian; xen-devel@lists.xen.org; Keir
> > (Xen.org)
> > Subject: Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq
> structures
> >
> > >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > > To simplify creation of the ioreq server abstraction in a subsequent patch,
> > > this patch centralizes all use of the shared ioreq structure and the
> > > buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.
> > >
> > > The patch moves an rmb() from inside hvm_io_assist() to
> > hvm_do_resume()
> > > because the former may now be passed a data structure on stack, in
> which
> > > case the barrier is unnecessary.
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >
> > Acked-by: Jan Beulich <jbeulich@suse.com>
> >
> 
> Thanks.
> 
> > with these minor comments:
> >
> > > --- a/xen/arch/x86/hvm/hvm.c
> > > +++ b/xen/arch/x86/hvm/hvm.c
> > > @@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
> > >      spin_unlock(&d->event_lock);
> > >  }
> > >
> > > +static ioreq_t *get_ioreq(struct vcpu *v)
> >
> > const struct vcpu *? Or was it that this conflicts with a subsequent
> > patch?
> >
> 
> I guess that could be done, although it wasn't const before it was moved.
> 

Attempting this causes issues when using vcpu_runnable() in a subsequent patch, so I won't do it.

  Paul

>   Paul
> 
> > > +{
> > > +    struct domain *d = v->domain;
> > > +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > > +
> > > +    ASSERT((v == current) || spin_is_locked(&d-
> > >arch.hvm_domain.ioreq.lock));
> > > +
> > > +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > > +}
> > > +
> > > +bool_t hvm_io_pending(struct vcpu *v)
> > > +{
> > > +    ioreq_t *p = get_ioreq(v);
> >
> > It would be particularly desirable since this one would logically want
> > its parameter to be const-qualified.
> >
> > > +bool_t hvm_has_dm(struct domain *d)
> > > +{
> > > +    return !!d->arch.hvm_domain.ioreq.va;
> > > +}
> >
> > Pretty certainly const here.
> >
> > Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server
  2014-05-01 12:08 ` [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-05-06 14:18   ` Jan Beulich
  2014-05-06 14:24     ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 14:18 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -389,40 +389,38 @@ void hvm_do_resume(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
>      struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -    ioreq_t *p;
>  
>      check_wakeup_from_wait();
>  
>      if ( is_hvm_vcpu(v) )
>          pt_restore_timer(v);
>  
> -    if ( !s )
> -        goto check_inject_trap;
> -
> -    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
> -    p = get_ioreq(s, v);
> -    while ( p->state != STATE_IOREQ_NONE )
> +    if ( s )
>      {
> -        switch ( p->state )
> +        ioreq_t *p = get_ioreq(s, v);
> +
> +        while ( p->state != STATE_IOREQ_NONE )
>          {
> -        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> -            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
> -            hvm_io_assist(p);
> -            break;
> -        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> -        case STATE_IOREQ_INPROCESS:
> -            wait_on_xen_event_channel(p->vp_eport,
> -                                      (p->state != STATE_IOREQ_READY) &&
> -                                      (p->state != STATE_IOREQ_INPROCESS));
> -            break;
> -        default:
> -            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
> -            domain_crash(v->domain);
> -            return; /* bail */
> +            switch ( p->state )
> +            {
> +            case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> +                rmb(); /* see IORESP_READY /then/ read contents of ioreq */
> +                hvm_io_assist(p);
> +                break;
> +            case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> +            case STATE_IOREQ_INPROCESS:
> +                wait_on_xen_event_channel(p->vp_eport,
> +                                          (p->state != STATE_IOREQ_READY) &&
> +                                          (p->state != STATE_IOREQ_INPROCESS));
> +                break;
> +            default:
> +                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
> +                domain_crash(d);
> +                return; /* bail */
> +            }
>          }
>      }
>  
> - check_inject_trap:
>      /* Inject pending hw/sw trap */
>      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
>      {

Isn't this entire hunk just a stylistic change from using goto to the
alternative if() representation? I would strongly suggest leaving out
such reformatting from an already large patch.

> -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>  {
> -    struct hvm_ioreq_server *s;
> +    struct hvm_ioreq_vcpu *sv, *next;
>  
> -    s = xzalloc(struct hvm_ioreq_server);
> -    if ( !s )
> -        return -ENOMEM;
> +    spin_lock(&s->lock);
> +
> +    list_for_each_entry_safe ( sv,
> +                               next,
> +                               &s->ioreq_vcpu_list,
> +                               list_entry )
> +    {
> +        struct vcpu *v = sv->vcpu;
> +
> +        list_del_init(&sv->list_entry);

list_del() - you're freeing the entry below anyway.
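
I.e. a minimal sketch of the simplification (event channel teardown elided):

    list_for_each_entry_safe ( sv, next, &s->ioreq_vcpu_list, list_entry )
    {
        /* Plain list_del() suffices: sv is freed straight away, so there is
         * no point re-initialising its list node. */
        list_del(&sv->list_entry);
        xfree(sv);
    }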

> +static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
> +                                 domid_t domid)
> +{
> +    struct vcpu *v;
> +    int rc;
> +
> +    gdprintk(XENLOG_DEBUG, "%s %d\n", __func__, domid);

Please don't in a non-RFC patch.

> +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
>  {
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_server *s;
>      int rc;
>  
> -    spin_lock(&s->lock);
> +    rc = -ENOMEM;
> +    s = xzalloc(struct hvm_ioreq_server);
> +    if ( !s )
> +        goto fail1;
> +
> +    domain_pause(d);
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -EEXIST;
> +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> +        goto fail2;
>  
> -    rc = hvm_map_ioreq_page(s, buf, pfn);
> +    rc = hvm_ioreq_server_init(s, d, domid);
>      if ( rc )
> -        goto fail;
> +        goto fail3;
>  
> -    if (!buf) {
> -        struct hvm_ioreq_vcpu *sv;
> +    d->arch.hvm_domain.ioreq_server = s;
>  
> -        list_for_each_entry ( sv,
> -                              &s->ioreq_vcpu_list,
> -                              list_entry )
> -            hvm_update_ioreq_evtchn(s, sv);
> -    }
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +    domain_unpause(d);
>  
> -    spin_unlock(&s->lock);
>      return 0;
>  
> - fail:
> -    spin_unlock(&s->lock);
> + fail3:
> + fail2:

Why two successive labels?

>  static int hvm_set_dm_domain(struct domain *d, domid_t domid)
>  {
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_server *s;
>      int rc = 0;
>  
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    s = d->arch.hvm_domain.ioreq_server;
> +    if ( !s )
> +        goto done;

You didn't do what you were asked for if there's no server - how can
this result in "success" being returned?

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-06 14:13       ` Paul Durrant
@ 2014-05-06 14:21         ` Jan Beulich
  2014-05-06 14:32           ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 14:21 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Keir (Xen.org), Kevin Tian, Eddie Dong, Jun Nakajima, xen-devel


>>> On 06.05.14 at 16:13, <Paul.Durrant@citrix.com> wrote:
>> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> > >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> > > --- a/xen/arch/x86/hvm/hvm.c
>> > > +++ b/xen/arch/x86/hvm/hvm.c
>> > > @@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
>> > >      spin_unlock(&d->event_lock);
>> > >  }
>> > >
>> > > +static ioreq_t *get_ioreq(struct vcpu *v)
>> >
>> > const struct vcpu *? Or was it that this conflicts with a subsequent
>> > patch?
>> >
>> 
>> I guess that could be done, although it wasn't const before it was moved.
>> 
> 
> Attempting this causes issues when using vcpu_runnable() in a subsequent 
> patch, so I won't do it.

Just make vcpu_runnable()'s parameter const too then - it easily can
be.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server
  2014-05-06 14:18   ` Jan Beulich
@ 2014-05-06 14:24     ` Paul Durrant
  2014-05-06 15:07       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 14:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 15:19
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq
> server
> 
> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -389,40 +389,38 @@ void hvm_do_resume(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> >      struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -    ioreq_t *p;
> >
> >      check_wakeup_from_wait();
> >
> >      if ( is_hvm_vcpu(v) )
> >          pt_restore_timer(v);
> >
> > -    if ( !s )
> > -        goto check_inject_trap;
> > -
> > -    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> > -    p = get_ioreq(s, v);
> > -    while ( p->state != STATE_IOREQ_NONE )
> > +    if ( s )
> >      {
> > -        switch ( p->state )
> > +        ioreq_t *p = get_ioreq(s, v);
> > +
> > +        while ( p->state != STATE_IOREQ_NONE )
> >          {
> > -        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> > -            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
> > -            hvm_io_assist(p);
> > -            break;
> > -        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} ->
> IORESP_READY */
> > -        case STATE_IOREQ_INPROCESS:
> > -            wait_on_xen_event_channel(p->vp_eport,
> > -                                      (p->state != STATE_IOREQ_READY) &&
> > -                                      (p->state != STATE_IOREQ_INPROCESS));
> > -            break;
> > -        default:
> > -            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
> >state);
> > -            domain_crash(v->domain);
> > -            return; /* bail */
> > +            switch ( p->state )
> > +            {
> > +            case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> > +                rmb(); /* see IORESP_READY /then/ read contents of ioreq */
> > +                hvm_io_assist(p);
> > +                break;
> > +            case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} ->
> IORESP_READY */
> > +            case STATE_IOREQ_INPROCESS:
> > +                wait_on_xen_event_channel(p->vp_eport,
> > +                                          (p->state != STATE_IOREQ_READY) &&
> > +                                          (p->state != STATE_IOREQ_INPROCESS));
> > +                break;
> > +            default:
> > +                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
> >state);
> > +                domain_crash(d);
> > +                return; /* bail */
> > +            }
> >          }
> >      }
> >
> > - check_inject_trap:
> >      /* Inject pending hw/sw trap */
> >      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> >      {
> 
> Isn't this entire hunk just a stylistic change from using goto to the
> alternative if() representation? I would strongly suggest leaving out
> such reformatting from an already large patch.
> 

Ok. I'll pull it into pre-series tidy up.

> > -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +static void hvm_ioreq_server_remove_all_vcpus(struct
> hvm_ioreq_server *s)
> >  {
> > -    struct hvm_ioreq_server *s;
> > +    struct hvm_ioreq_vcpu *sv, *next;
> >
> > -    s = xzalloc(struct hvm_ioreq_server);
> > -    if ( !s )
> > -        return -ENOMEM;
> > +    spin_lock(&s->lock);
> > +
> > +    list_for_each_entry_safe ( sv,
> > +                               next,
> > +                               &s->ioreq_vcpu_list,
> > +                               list_entry )
> > +    {
> > +        struct vcpu *v = sv->vcpu;
> > +
> > +        list_del_init(&sv->list_entry);
> 
> list_del() - you're freeing the entry below anyway.
> 

Ok.

> > +static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct
> domain *d,
> > +                                 domid_t domid)
> > +{
> > +    struct vcpu *v;
> > +    int rc;
> > +
> > +    gdprintk(XENLOG_DEBUG, "%s %d\n", __func__, domid);
> 
> Please don't in a non-RFC patch.
> 

Ok.

> > +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> >  {
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_server *s;
> >      int rc;
> >
> > -    spin_lock(&s->lock);
> > +    rc = -ENOMEM;
> > +    s = xzalloc(struct hvm_ioreq_server);
> > +    if ( !s )
> > +        goto fail1;
> > +
> > +    domain_pause(d);
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -EEXIST;
> > +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > +        goto fail2;
> >
> > -    rc = hvm_map_ioreq_page(s, buf, pfn);
> > +    rc = hvm_ioreq_server_init(s, d, domid);
> >      if ( rc )
> > -        goto fail;
> > +        goto fail3;
> >
> > -    if (!buf) {
> > -        struct hvm_ioreq_vcpu *sv;
> > +    d->arch.hvm_domain.ioreq_server = s;
> >
> > -        list_for_each_entry ( sv,
> > -                              &s->ioreq_vcpu_list,
> > -                              list_entry )
> > -            hvm_update_ioreq_evtchn(s, sv);
> > -    }
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +    domain_unpause(d);
> >
> > -    spin_unlock(&s->lock);
> >      return 0;
> >
> > - fail:
> > -    spin_unlock(&s->lock);
> > + fail3:
> > + fail2:
> 
> Why two successive labels?
> 

Two different jumping-off points - I know you have a hatred of labels and gotos, so I'll collapse them into one.

> >  static int hvm_set_dm_domain(struct domain *d, domid_t domid)
> >  {
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_server *s;
> >      int rc = 0;
> >
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +    if ( !s )
> > +        goto done;
> 
> You didn't do what you were asked for if there's no server - how can
> this result in "success" being returned?
> 

Because lack of a server is not a failure! If the HVM param for the emulating domain is set prior to server creation then the server is created with the correct domain. If it's done subsequently then the guest needs to be paused, the emulating domain updated and the event channels rebound.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-06 14:21         ` Jan Beulich
@ 2014-05-06 14:32           ` Paul Durrant
  2014-05-06 14:39             ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 14:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir (Xen.org), Kevin Tian, Eddie Dong, Jun Nakajima, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 06 May 2014 15:22
> To: Paul Durrant
> Cc: Eddie Dong; Jun Nakajima; Kevin Tian; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v5 3/9] ioreq-server: centralize access to
> ioreq structures
> 
> >>> On 06.05.14 at 16:13, <Paul.Durrant@citrix.com> wrote:
> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> > >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> >> > > --- a/xen/arch/x86/hvm/hvm.c
> >> > > +++ b/xen/arch/x86/hvm/hvm.c
> >> > > @@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >> > >      spin_unlock(&d->event_lock);
> >> > >  }
> >> > >
> >> > > +static ioreq_t *get_ioreq(struct vcpu *v)
> >> >
> >> > const struct vcpu *? Or was it that this conflicts with a subsequent
> >> > patch?
> >> >
> >>
> >> I guess that could be done, although it wasn't const before it was moved.
> >>
> >
> > Attempting this causes issues when using vcpu_runnable() in a subsequent
> > patch, so I won't do it.
> 
> Just make vcpu_runnable()'s parameter const too then - it easily can
> be.

That causes the build to fail because atomic_read() discards the const qualifier.

  Paul

> 
> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures
  2014-05-06 14:32           ` Paul Durrant
@ 2014-05-06 14:39             ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 14:39 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Keir (Xen.org), Kevin Tian, Eddie Dong, Jun Nakajima, xen-devel

>>> On 06.05.14 at 16:32, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 06.05.14 at 16:13, <Paul.Durrant@citrix.com> wrote:
>> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> > >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> >> > > --- a/xen/arch/x86/hvm/hvm.c
>> >> > > +++ b/xen/arch/x86/hvm/hvm.c
>> >> > > @@ -363,6 +363,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
>> >> > >      spin_unlock(&d->event_lock);
>> >> > >  }
>> >> > >
>> >> > > +static ioreq_t *get_ioreq(struct vcpu *v)
>> >> >
>> >> > const struct vcpu *? Or was it that this conflicts with a subsequent
>> >> > patch?
>> >> >
>> >>
>> >> I guess that could be done, although it wasn't const before it was moved.
>> >>
>> >
>> > Attempting this causes issues when using vcpu_runnable() in a subsequent
>> > patch, so I won't do it.
>> 
>> Just make vcpu_runnable()'s parameter const too then - it easily can
>> be.
> 
> That causes the build to fail because atomic_read() discards the const 
> qualifier.

I guess just leave it then, albeit I'm tempted to ask to also fix that
one in turn. I'm of the opinion that we really ought to make much
broader use of const, especially to document functions not intended
to alter state.
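
For reference, the combination being talked about would then be roughly (from
memory, so the details may be slightly off):

    static inline int vcpu_runnable(const struct vcpu *v)
    {
        return !(v->pause_flags |
                 atomic_read(&v->pause_count) |
                 atomic_read(&v->domain->pause_count));
    }

where atomic_read() currently takes a plain (non-const) atomic_t *, hence the
build failure Paul mentions.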

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server
  2014-05-06 14:24     ` Paul Durrant
@ 2014-05-06 15:07       ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-06 15:07 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir (Xen.org), xen-devel

>>> On 06.05.14 at 16:24, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> > --- a/xen/arch/x86/hvm/hvm.c
>> > +++ b/xen/arch/x86/hvm/hvm.c
>> > @@ -389,40 +389,38 @@ void hvm_do_resume(struct vcpu *v)
>> >  {
>> >      struct domain *d = v->domain;
>> >      struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
>> > -    ioreq_t *p;
>> >
>> >      check_wakeup_from_wait();
>> >
>> >      if ( is_hvm_vcpu(v) )
>> >          pt_restore_timer(v);
>> >
>> > -    if ( !s )
>> > -        goto check_inject_trap;
>> > -
>> > -    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
>> */
>> > -    p = get_ioreq(s, v);
>> > -    while ( p->state != STATE_IOREQ_NONE )
>> > +    if ( s )
>> >      {
>> > -        switch ( p->state )
>> > +        ioreq_t *p = get_ioreq(s, v);
>> > +
>> > +        while ( p->state != STATE_IOREQ_NONE )
>> >          {
>> > -        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
>> > -            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
>> > -            hvm_io_assist(p);
>> > -            break;
>> > -        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} ->
>> IORESP_READY */
>> > -        case STATE_IOREQ_INPROCESS:
>> > -            wait_on_xen_event_channel(p->vp_eport,
>> > -                                      (p->state != STATE_IOREQ_READY) &&
>> > -                                      (p->state != STATE_IOREQ_INPROCESS));
>> > -            break;
>> > -        default:
>> > -            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
>> >state);
>> > -            domain_crash(v->domain);
>> > -            return; /* bail */
>> > +            switch ( p->state )
>> > +            {
>> > +            case STATE_IORESP_READY: /* IORESP_READY -> NONE */
>> > +                rmb(); /* see IORESP_READY /then/ read contents of ioreq */
>> > +                hvm_io_assist(p);
>> > +                break;
>> > +            case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} ->
>> IORESP_READY */
>> > +            case STATE_IOREQ_INPROCESS:
>> > +                wait_on_xen_event_channel(p->vp_eport,
>> > +                                          (p->state != STATE_IOREQ_READY) &&
>> > +                                          (p->state != STATE_IOREQ_INPROCESS));
>> > +                break;
>> > +            default:
>> > +                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
>> >state);
>> > +                domain_crash(d);
>> > +                return; /* bail */
>> > +            }
>> >          }
>> >      }
>> >
>> > - check_inject_trap:
>> >      /* Inject pending hw/sw trap */
>> >      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
>> >      {
>> 
>> Isn't this entire hunk just a stylistic change from using goto to the
>> alternative if() representation? I would strongly suggest leaving out
>> such reformatting from an already large patch.
>> 
> 
> Ok. I'll pull it into pre-series tidy up.

If you really think it's worthwhile...

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-06 10:48   ` Ian Campbell
@ 2014-05-06 16:57     ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-06 16:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 06 May 2014 11:48
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 7/9] ioreq-server: remove p2m entries when server is
> enabled
> 
> On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > For secondary servers, add a hvm op to enable/disable the server. The
> > server will not accept IO until it is enabled and the act of enabling
> > the server removes its pages from the guest p2m, thus preventing the
> guest
> > from directly mapping the pages and synthesizing ioreqs.
> 
> Does the ring get reset when the server is enabled? Otherwise what
> prevents the guest from preloading something before the ring is
> activated?
> 

Good point. For secondary emulators the ioreq pages should be reset.

  Paul

> In terms of the tools side binding of the interface this looks fine to me.
> 
> Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-06 13:28     ` Paul Durrant
@ 2014-05-07  9:44       ` Ian Campbell
  2014-05-07  9:48         ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-07  9:44 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

On Tue, 2014-05-06 at 14:28 +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 06 May 2014 11:46
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > Subject: Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> > 
> > On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > > NOTE: To prevent emulators running in non-privileged guests from
> > >       potentially allocating very large amounts of xen heap, the core
> > >       rangeset code has been modified to introduce a hard limit of 256
> > >       ranges per set.
> > 
> > OOI how much RAM does that correspond to?
> 
> Each range is two pointers (list_head) and two unsigned longs (start and end), so that's 32 bytes - so 256 is two pages worth.

Seems reasonable.
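
Back of the envelope, for a 64-bit build (struct layout from memory):

    struct range {
        struct list_head list;   /* two pointers: 16 bytes */
        unsigned long s, e;      /* two longs:    16 bytes */
    };                           /* 32 bytes; 256 * 32 = 8192 = two 4k pages */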


> > > +    }
> > > +
> > > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > > +    if ( arg == NULL )
> > > +        return -1;
> > > +
> > > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > > +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> > > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > > +
> > > +    arg->domid = domid;
> > > +    arg->id = id;
> > > +    arg->type = HVMOP_IO_RANGE_PCI;
> > > +    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
> > 
> > Since you have HVMOP_IO_RANGE_PCI do you not want to expose that via
> > this interface?
> > 
> 
> I could have crunched this into the map_range function. I left it
> separate because I thought it was more convenient for callers - who I
> think will most likely deal with one PCI device at a time.

Oh, so the "ranginess" is an existing feature of the hypercall which you
are using but doesn't really apply to this use case? (I did think the
concept of a range of PCI devices wasn't likely to be very useful,
except perhaps in the "all functions of a device" case).

Does the actual hypercall deal with a range? Or does it insist that
start == end? Looks like the former, I'm happy with that if the
hypervisor side guys are. Not sure if it is worth a comment somewhere?

> > > @@ -502,6 +505,31 @@ static int setup_guest(xc_interface *xch,
> > >                       special_pfn(SPECIALPAGE_SHARING));
> > >
> > >      /*
> > > +     * Allocate and clear additional ioreq server pages. The default
> > > +     * server will use the IOREQ and BUFIOREQ special pages above.
> > > +     */
> > > +    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
> > > +    {
> > > +        xen_pfn_t pfn = ioreq_server_pfn(i);
> > > +
> > > +        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> > > +        if ( rc != 0 )
> > > +        {
> > > +            PERROR("Could not allocate %d'th ioreq server page.", i);
> > 
> > This will say things like "1'th". "Could not allocate ioreq server page
> > %d" avoids that.
> 
> Ok. It was a cut'n'paste from the special_pfn code just above. I'll
> fix that while I'm in the neighbourhood.

Thanks!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-06 13:35         ` Paul Durrant
@ 2014-05-07  9:48           ` Ian Campbell
  2014-05-07  9:51             ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Ian Campbell @ 2014-05-07  9:48 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

On Tue, 2014-05-06 at 14:35 +0100, Paul Durrant wrote:
> > When libxl does the qmp call to tell qemu about a new device what stops
> > qemu from generating hotplug events (SCI or whatever the interrupt
> > mechanism is) even if Xen is emulating the HP controller? Because Qemu
> > will still have the controller emulation, it's just "shadowed", right?
> >
> 
> The controller emulation is there but since the ports are handled by
> Xen the I/O to enable the hotplug events will never get through. So,
> yes QEMU will go through the motions of hotplugging but it will never
> raise the SCI.

OK. And just to make doubly sure this I/O to enable the hotplug events
is definitely not preserved over a migration somewhere? (meaning I don't
have to think about what happens if it is enabled in qemu and then
migrated to a system where Xen handles it)


>  
> > > > What does "hotplugging via qemu" mean here if qemu isn't patched to
> > call
> > > > this new call?
> > > >
> > >
> > > It means QEMU is implementing the hotplug device. So, if Xen is
> > > implementing the PCIHP libxl will call the new function and it will
> > > succeed. If QEMU is implementing the PCIHP then libxl will call the
> > > new function, it will fail, but QEMU will implicitly do the hotplug.
> > > Either way, the guest sees a hotplug event.
> > 
> > Even if Xen is implementing the PCIHP qemu is still involved in plugging
> > the device, since it has to know about it though, so in some sense
> > hotplugging is always (partially) via qemu (which is why I'm a bit
> > confused, but also why the lack of a Qemu side patch surprises me)
> > 
> 
> Yes, but because QEMU's PCIHP implementation never has any of its
> enabled bits set it remains silent.

Makes sense, might be worth mentioning this somewhere (comment, commit
log), since it is a bit subtle.


> > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > > index 44d0453..55cb8a2 100644
> > > > > --- a/tools/libxl/libxl_pci.c
> > > > > +++ b/tools/libxl/libxl_pci.c
> > > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > > domid, libxl_device_pci *pcidev, i
> > > > >          }
> > > > >          if ( rc )
> > > > >              return ERROR_FAIL;
> > > > > +
> > > > > +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
> > > > > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > > > > +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
> > > > > +            return ERROR_FAIL;
> > > > > +        }
> > > >
> > > > I initially thought you needed to also reset rc to indicate success in
> > > > the errno==EOPNOTSUPP, but actually the error handling in this function
> > > > is just confusing...
> > > >
> > > > But, have you tried hotpluging into a migrated guest?
> > > >
> > >
> > > Not yet, but I can't see why that would be a problem over hotplugging
> > > at all,
> > 
> > It's the problems we can't imagine that I'm worried about. This stuff is
> > certainly subtle in places WRT migration.
> > 
> 
> I'll give it a go on my dev box. I'm also trying to get this series
> backported onto Xen 4.4 so I can throw it at XenRT.

Thanks!

Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07  9:44       ` Ian Campbell
@ 2014-05-07  9:48         ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-07  9:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 07 May 2014 10:45
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> 
> On Tue, 2014-05-06 at 14:28 +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 06 May 2014 11:46
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > > Subject: Re: [PATCH v5 6/9] ioreq-server: add support for multiple
> servers
> > >
> > > On Thu, 2014-05-01 at 13:08 +0100, Paul Durrant wrote:
> > > > NOTE: To prevent emulators running in non-privileged guests from
> > > >       potentially allocating very large amounts of xen heap, the core
> > > >       rangeset code has been modified to introduce a hard limit of 256
> > > >       ranges per set.
> > >
> > > OOI how much RAM does that correspond to?
> >
> > Each range is two pointers (list_head) and two unsigned longs (start and
> end), so that's 32 bytes - so 256 is two pages worth.
> 
> Seems reasonable.
> 
> 
> > > > +    }
> > > > +
> > > > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > > > +    if ( arg == NULL )
> > > > +        return -1;
> > > > +
> > > > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > > > +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> > > > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > > > +
> > > > +    arg->domid = domid;
> > > > +    arg->id = id;
> > > > +    arg->type = HVMOP_IO_RANGE_PCI;
> > > > +    arg->start = arg->end = HVMOP_PCI_SBDF((uint64_t)segment,
> > >
> > > Since you have HVMOP_IO_RANGE_PCI do you not want to expose that
> via
> > > this interface?
> > >
> >
> > I could have crunched this into the map_range function. I left it
> > separate because I thought it was more convenient for callers - who I
> > think will most likely deal with one PCI device at a time.
> 
> Oh, so the "ranginess" is an existing feature of the hypercall which you
> are using but doesn't really apply to this use case? (I did think the
> concept of a range of PCI devices wasn't likely to be very useful,
> except perhaps in the "all functions of a device" case).
> 
> Does the actual hypercall deal with a range? Or does it insist that
> start == end? Looks like the former, I'm happy with that if the
> hypervisor side guys are. Not sure if it is worth a comment somewhere?
> 

It'll deal with a range. I debated whether an "all functions of a device" option was useful, but decided to stick with a single PCI function in the user API, as I think that's what most clients will want. I'll stick a comment in to note it's a singleton range.

  Paul

> > > > @@ -502,6 +505,31 @@ static int setup_guest(xc_interface *xch,
> > > >                       special_pfn(SPECIALPAGE_SHARING));
> > > >
> > > >      /*
> > > > +     * Allocate and clear additional ioreq server pages. The default
> > > > +     * server will use the IOREQ and BUFIOREQ special pages above.
> > > > +     */
> > > > +    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
> > > > +    {
> > > > +        xen_pfn_t pfn = ioreq_server_pfn(i);
> > > > +
> > > > +        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0,
> &pfn);
> > > > +        if ( rc != 0 )
> > > > +        {
> > > > +            PERROR("Could not allocate %d'th ioreq server page.", i);
> > >
> > > This will say things like "1'th". "Could not allocate ioreq server page
> > > %d" avoids that.
> >
> > Ok. It was a cut'n'paste from the special_pfn code just above. I'll
> > fix that while I'm in the neighbourhood.
> 
> Thanks!
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-05-07  9:48           ` Ian Campbell
@ 2014-05-07  9:51             ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-07  9:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Stefano Stabellini, Jan Beulich, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 07 May 2014 10:48
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> On Tue, 2014-05-06 at 14:35 +0100, Paul Durrant wrote:
> > > When libxl does the qmp call to tell qemu about a new device what stops
> > > qemu from generating hotplug events (SCI or whatever the interrupt
> > > mechanism is) even if Xen is emulating the HP controller? Because Qemu
> > > will still have the controller emulation, it's just "shadowed", right?
> > >
> >
> > The controller emulation is there but since the ports are handled by
> > Xen the I/O to enable the hotplug events will never get through. So,
> > yes QEMU will go through the motions of hotplugging but it will never
> > raise the SCI.
> 
> OK. And just to make doubly sure this I/O to enable the hotplug events
> is definitely not preserved over a migration somewhere? (meaning I don't
> have to think about what happens if it is enabled in qemu and then
> migrated to a system where Xen handles it)
> 

Damn... You have a good point there - it is necessary to preserve the enabled bits. I'm going to drop this patch from the series to allow more testing time. I'll post v6 of the rest shortly.

  Paul

> 
> >
> > > > > What does "hotplugging via qemu" mean here if qemu isn't patched
> to
> > > call
> > > > > this new call?
> > > > >
> > > >
> > > > It means QEMU is implementing the hotplug device. So, if Xen is
> > > > implementing the PCIHP libxl will call the new function and it will
> > > > succeed. If QEMU is implementing the PCIHP then libxl will call the
> > > > new function, it will fail, but QEMU will implicitly do the hotplug.
> > > > Either way, the guest sees a hotplug event.
> > >
> > > Even if Xen is implementing the PCIHP qemu is still involved in plugging
> > > the device, since it has to know about it though, so in some sense
> > > hotplugging is always (partially) via qemu (which is why I'm a bit
> > > confused, but also why the lack of a Qemu side patch surprises me)
> > >
> >
> > Yes, but because QEMU's PCIHP implementation never has any of its
> > enabled bits set it remains silent.
> 
> Makes sense, might be worth mentioning this somewhere (comment, commit
> log), since it is a bit subtle.
> 
> 
> > > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > > > index 44d0453..55cb8a2 100644
> > > > > > --- a/tools/libxl/libxl_pci.c
> > > > > > +++ b/tools/libxl/libxl_pci.c
> > > > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > > > domid, libxl_device_pci *pcidev, i
> > > > > >          }
> > > > > >          if ( rc )
> > > > > >              return ERROR_FAIL;
> > > > > > +
> > > > > > +        rc = xc_hvm_pci_hotplug(CTX->xch, domid, pcidev->dev, 1);
> > > > > > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > > > > > +            LOGE(ERROR, "Error: xc_hvm_pci_hotplug enable failed");
> > > > > > +            return ERROR_FAIL;
> > > > > > +        }
> > > > >
> > > > > I initially thought you needed to also reset rc to indicate success in
> > > > > the errno==EOPNOTSUPP, but actually the error handling in this
> function
> > > > > is just confusing...
> > > > >
> > > > > But, have you tried hotpluging into a migrated guest?
> > > > >
> > > >
> > > > Not yet, but I can't see why that would be a problem over hotplugging
> > > > at all,
> > >
> > > It's the problems we can't imagine that I'm worried about. This stuff is
> > > certainly subtle in places WRT migration.
> > >
> >
> > I'll give it a go on my dev box. I'm also trying to get this series
> > backported onto Xen 4.4 so I can throw it at XenRT.
> 
> Thanks!
> 
> Ian.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-01 12:08 ` [PATCH v5 6/9] ioreq-server: add support for multiple servers Paul Durrant
  2014-05-06 10:46   ` Ian Campbell
@ 2014-05-07 11:13   ` Jan Beulich
  2014-05-07 12:06     ` Paul Durrant
  1 sibling, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 11:13 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> NOTE: To prevent emulators running in non-privileged guests from
>       potentially allocating very large amounts of xen heap, the core
>       rangeset code has been modified to introduce a hard limit of 256
>       ranges per set.

This is definitely not acceptable - there's no reason to impose any
such limit on e.g. the hardware domain, and we're (or may in the
future want to be) using range sets also for non-domain purposes.
What I'd consider acceptable would be a new range set operation
setting a limit (with the default being unlimited).
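
E.g. (names purely illustrative):

    void rangeset_limit(struct rangeset *r, unsigned int limit);

    /* in the ioreq server code, after creating each set: */
    rangeset_limit(s->range[i], MAX_NR_IO_RANGES);

with no limit enforced for rangesets that never call it.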

>  bool_t hvm_io_pending(struct vcpu *v)
>  {
> -    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
> -    ioreq_t *p;
> -
> -    if ( !s )
> -        return 0;
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> + 
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        ioreq_t *p = get_ioreq(s, v);
> + 
> +        p = get_ioreq(s, v);

Why twice?

> +static int hvm_access_cf8(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *curr = current;
> +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> +    int rc;
> +
> +    BUG_ON(port < 0xcf8);
> +    port -= 0xcf8;
> +
> +    spin_lock(&hd->pci_lock);

Is there really any danger in not having this lock at all? On real
hardware, if the OS doesn't properly serialize accesses, the
result is going to be undefined too. All I think you need to make
sure is that ->pci_cf8 never gets updated non-atomically.

> +    if ( dir == IOREQ_WRITE )
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            hd->pci_cf8 = *val;
> +            break;
> +
> +        case 2:

I don't think handling other than 4-byte accesses at the precise
address 0xCF8 is necessary or even correct here. Port 0xCF9,
when accessed as a byte, commonly has another meaning for
example.
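
I.e. for the write side something along these lines (sketch, ignoring the
existing port adjustment):

    if ( dir == IOREQ_WRITE && bytes == 4 )
        hd->pci_cf8 = *val;  /* aligned 32-bit store - atomic on x86 anyway */

    /* anything else (e.g. a byte write to 0xCF9) falls through as before */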

> +static int hvm_access_cfc(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *curr = current;
> +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> +    int rc;
> +
> +    BUG_ON(port < 0xcfc);
> +    port -= 0xcfc;
> +
> +    spin_lock(&hd->pci_lock);
> +
> +    if ( hd->pci_cf8 & (1 << 31) ) {
> +        /* Fall through to an emulator */
> +        rc = X86EMUL_UNHANDLEABLE;
> +    } else {
> +        /* Config access disabled */

Why does this not also get passed through to an emulator?

> +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
> +                                            bool_t is_default)
> +{
> +    int i;

Please try to avoid using variables of signed types for things that
can never be negative.

> +    int rc;
> +
> +    if ( is_default )
> +        goto done;
> +
> +    for ( i = 0; i < MAX_IO_RANGE_TYPE; i++ ) {
> +        char *name;
> +
> +        rc = asprintf(&name, "ioreq_server %d:%d %s", s->domid, s->id,

Is it really useful to include the domain ID in that name?

> @@ -764,52 +1019,270 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
>      domain_pause(d);
>      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    rc = -EEXIST;
> -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> -        goto fail2;
> +    rc = -EEXIST;
> +    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
> +        goto fail2;
> +
> +    rc = hvm_ioreq_server_init(s, d, domid, is_default,
> +                               d->arch.hvm_domain.ioreq_server_id++);

What about wraparound here?

> +static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
> +{
> +    struct hvm_ioreq_server *s;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
> +
> +        if ( s->id != id )
> +            continue;
> +
> +        domain_pause(d);
> +
> +        if ( is_default )
> +            d->arch.hvm_domain.default_ioreq_server = NULL;
> +
> +        --d->arch.hvm_domain.ioreq_server_count;
> +        list_del_init(&s->list_entry);

list_del() again, as s gets freed below.

> @@ -1744,11 +2206,94 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>  
> +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> +                                                        ioreq_t *p)
> +{
> +#define CF8_BDF(cf8)  (((cf8) & 0x00ffff00) >> 8)
> +#define CF8_ADDR(cf8) ((cf8) & 0x000000ff)

Are you aware that AMD has this port-based extended config space
access mechanism with the high 4 address bits being (iirc) at bits
24-27 of the port 0xCF8 value? You shouldn't be unconditionally
discarding these here I think (the server ought to know whether it
can handle such).
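
Something like this perhaps (illustrative only):

    #define CF8_ADDR_LO(cf8) ((cf8) & 0x000000fc)
    #define CF8_ADDR_HI(cf8) (((cf8) & 0x0f000000) >> 16) /* AMD ext. config */

    addr = ((uint64_t)sbdf << 32) |
           CF8_ADDR_HI(cf8) | CF8_ADDR_LO(cf8) | (p->addr & 3);

leaving it to the individual server to decide whether it understands the
extended register bits.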

> +
> +    struct hvm_ioreq_server *s;
> +    uint8_t type;
> +    uint64_t addr;
> +
> +    if ( d->arch.hvm_domain.ioreq_server_count == 1 &&
> +         d->arch.hvm_domain.default_ioreq_server )
> +        goto done;
> +
> +    if ( p->type == IOREQ_TYPE_PIO &&
> +         (p->addr & ~3) == 0xcfc )
> +    {
> +        uint32_t cf8;
> +        uint32_t sbdf;
> +
> +        /* PCI config data cycle */
> +        type = IOREQ_TYPE_PCI_CONFIG;
> +
> +        spin_lock(&d->arch.hvm_domain.pci_lock);
> +        cf8 = d->arch.hvm_domain.pci_cf8;
> +        sbdf = HVMOP_PCI_SBDF(0,
> +                              PCI_BUS(CF8_BDF(cf8)),
> +                              PCI_SLOT(CF8_BDF(cf8)),
> +                              PCI_FUNC(CF8_BDF(cf8)));
> +        addr = ((uint64_t)sbdf << 32) | (CF8_ADDR(cf8) + (p->addr & 3));
> +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> +    }
> +    else
> +    {
> +        type = p->type;
> +        addr = p->addr;
> +    }
> +
> +    switch ( type )
> +    {
> +    case IOREQ_TYPE_COPY:
> +    case IOREQ_TYPE_PIO:
> +    case IOREQ_TYPE_PCI_CONFIG:
> +        break;
> +    default:
> +        goto done;
> +    }

This switch would better go into the "else" above. And with that the
question arises whether an input of IOREQ_TYPE_PCI_CONFIG is
actually valid here.

> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        switch ( type )
> +        {
> +        case IOREQ_TYPE_COPY:
> +        case IOREQ_TYPE_PIO:
> +            if ( rangeset_contains_singleton(s->range[type], addr) )
> +                goto found;
> +
> +            break;
> +        case IOREQ_TYPE_PCI_CONFIG:
> +            if ( rangeset_contains_singleton(s->range[type], addr >> 32) ) {
> +                p->type = type;
> +                p->addr = addr;
> +                goto found;
> +            }
> +
> +            break;
> +        }
> +    }
> +
> + done:
> +    s = d->arch.hvm_domain.default_ioreq_server;
> +
> + found:
> +    return s;

The actions at these labels being rather simple, I'm once again having
a hard time seeing why the various goto-s up there can't be return-s,
at once making it much more obvious to the reader what is happening
in the respective cases.
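
I.e. simply:

    case IOREQ_TYPE_COPY:
    case IOREQ_TYPE_PIO:
        if ( rangeset_contains_singleton(s->range[type], addr) )
            return s;
        break;

and at the end of the function, in place of the labels:

    return d->arch.hvm_domain.default_ioreq_server;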

> -bool_t hvm_send_assist_req(ioreq_t *proto_p)
> +bool_t hvm_send_assist_req_to_ioreq_server(struct hvm_ioreq_server *s,
> +                                           struct vcpu *v,

Both callers effectively pass "current" here - if you really want to
retain the parameter, please name it "curr" to document that fact.

> +void hvm_broadcast_assist_req(ioreq_t *p)
> +{
> +    struct vcpu *v = current;
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
> +}

Is there possibly any way to make sure only operations not having
any results and not affecting guest visible state changes can make
it here?

> +    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
> +    if ( rc != 0 )
> +        goto out;
> ...
> +    if ( (rc = hvm_get_ioreq_server_info(d, op.id,
> +                                         &op.ioreq_pfn,
> +                                         &op.bufioreq_pfn, 
> +                                         &op.bufioreq_port)) < 0 )
> +        goto out;

May I ask that you use consistent style for similar operations?

> +static int hvmop_destroy_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
> +{
> +    xen_hvm_destroy_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    hvm_destroy_ioreq_server(d, op.id);
> +    rc = 0;

Shouldn't this minimally return -ENOENT for an invalid or not-in-use ID?
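
E.g. (sketch) by letting the destroy function report whether the id was
actually found:

    rc = hvm_destroy_ioreq_server(d, op.id); /* -ENOENT if no such server */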

> @@ -4484,6 +5189,31 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      switch ( op &= HVMOP_op_mask )
>      {
> +    case HVMOP_create_ioreq_server:
> +        rc = hvmop_create_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_get_ioreq_server_info:
> +        rc = hvmop_get_ioreq_server_info(
> +            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
> +        break;
> +    
> +    case HVMOP_map_io_range_to_ioreq_server:
> +        rc = hvmop_map_io_range_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_io_range_t));
> +        break;
> +    
> +    case HVMOP_unmap_io_range_from_ioreq_server:
> +        rc = hvmop_unmap_io_range_from_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_io_range_t));
> +        break;
> +    
> +    case HVMOP_destroy_ioreq_server:
> +        rc = hvmop_destroy_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> +        break;

Neither here nor in the individual functions are any XSM checks - I
don't think you want to permit any domain to issue any of these on
an arbitrary other domain?
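
I'd expect something along these lines in each of them (hook name purely
illustrative):

    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_create_ioreq_server);
    if ( rc != 0 )
        goto out;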

> +int vasprintf(char **bufp, const char *fmt, va_list args)
> +{
> +    va_list args_copy;
> +    size_t size;
> +    char *buf, dummy[1];
> +
> +    va_copy(args_copy, args);
> +    size = vsnprintf(dummy, 0, fmt, args_copy);
> +    va_end(args_copy);
> +
> +    buf = _xmalloc(++size, 1);

xmalloc_array(char, ++size)

> +EXPORT_SYMBOL(vasprintf);

I realize that there are more of these in that file, but since they're
useless to us, let's at least not add any new ones.

> @@ -46,7 +48,10 @@ struct hvm_ioreq_vcpu {
>      evtchn_port_t    ioreq_evtchn;
>  };
>  
> +#define MAX_IO_RANGE_TYPE (HVMOP_IO_RANGE_PCI + 1)

A common misconception: You mean NR_ here, not MAX_ (or else
you'd have to drop the " + 1").

>  struct hvm_domain {
> -    struct hvm_ioreq_server *ioreq_server;
> +    /* Guest page range used for non-default ioreq servers */
> +    unsigned long           ioreq_gmfn_base;
> +    unsigned long           ioreq_gmfn_mask;
> +    unsigned int            ioreq_gmfn_count;
> +
> +    /* Lock protects all other values in the following block */
>      spinlock_t              ioreq_server_lock;
> +    ioservid_t              ioreq_server_id;
> +    unsigned int            ioreq_server_count;
> +    struct list_head        ioreq_server_list;

Rather than using all these redundant prefixes, could I talk you into
using sub-structures instead:

    struct {
        struct {
        } gmfn;
        struct {
        } server;
    } ioreq;

?
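
Filled in with the fields from the hunk above, that would be roughly:

    struct {
        struct {
            unsigned long    base;
            unsigned long    mask;
            unsigned int     count;
        } gmfn;
        struct {
            spinlock_t       lock;
            ioservid_t       id;
            unsigned int     count;
            struct list_head list;
        } server;
    } ioreq;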

> +typedef uint16_t ioservid_t;
> +DEFINE_XEN_GUEST_HANDLE(ioservid_t);

I don't think you really need this handle type.

> +struct xen_hvm_get_ioreq_server_info {
> +    domid_t domid;               /* IN - domain to be serviced */
> +    ioservid_t id;               /* IN - server id */
> +    xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
> +    xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
> +    evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
> +};

Afaict this structure's layout will differ between 32- and 64-bit
tool stack domains, i.e. you need to add some explicit padding/
alignment. Or even better yet, shuffle the fields - bufioreq_port
fits in the 32-bit hole after id.
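
I.e. roughly:

    struct xen_hvm_get_ioreq_server_info {
        domid_t domid;               /* IN - domain to be serviced */
        ioservid_t id;               /* IN - server id */
        evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
        xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
        xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
    };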

> +struct xen_hvm_destroy_ioreq_server {
> +    domid_t domid; /* IN - domain to be serviced */
> +    ioservid_t id; /* IN - server id */
> +};

That's the same structure as for create, just that the last field is an
output there. Please have just one structure for creation/destruction.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 11:13   ` Jan Beulich
@ 2014-05-07 12:06     ` Paul Durrant
  2014-05-07 12:23       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-07 12:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 May 2014 12:13
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> 
> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > NOTE: To prevent emulators running in non-privileged guests from
> >       potentially allocating very large amounts of xen heap, the core
> >       rangeset code has been modified to introduce a hard limit of 256
> >       ranges per set.
> 
> This is definitely not acceptable - there's no reason to impose any
> such limit on e.g. the hardware domain, and we're (or may in the
> future want to be) using range sets also for non-domain purposes.
> What I'd consider acceptable would be a new range set operation
> setting a limit (with the default being unlimited).

Ok, I'll make it optional when the rangeset is created.

> 
> >  bool_t hvm_io_pending(struct vcpu *v)
> >  {
> > -    struct hvm_ioreq_server *s = v->domain-
> >arch.hvm_domain.ioreq_server;
> > -    ioreq_t *p;
> > -
> > -    if ( !s )
> > -        return 0;
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        ioreq_t *p = get_ioreq(s, v);
> > +
> > +        p = get_ioreq(s, v);
> 
> Why twice?
> 

Rebasing problem by the looks of it. I'll fix.

> > +static int hvm_access_cf8(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> > +{
> > +    struct vcpu *curr = current;
> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> > +    int rc;
> > +
> > +    BUG_ON(port < 0xcf8);
> > +    port -= 0xcf8;
> > +
> > +    spin_lock(&hd->pci_lock);
> 
> Is there really any danger in not having this lock at all? On real
> hardware, if the OS doesn't properly serialize accesses, the
> result is going to be undefined too. All I think you need to make
> sure is that ->pci_cf8 never gets updated non-atomically.
> 

I could remove it, but is it doing any harm?

> > +    if ( dir == IOREQ_WRITE )
> > +    {
> > +        switch ( bytes )
> > +        {
> > +        case 4:
> > +            hd->pci_cf8 = *val;
> > +            break;
> > +
> > +        case 2:
> 
> I don't think handling other than 4-byte accesses at the precise
> address 0xCF8 is necessary or even correct here. Port 0xCF9,
> when accessed as a byte, commonly has another meaning for
> example.
> 

Ok.

> > +static int hvm_access_cfc(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> > +{
> > +    struct vcpu *curr = current;
> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> > +    int rc;
> > +
> > +    BUG_ON(port < 0xcfc);
> > +    port -= 0xcfc;
> > +
> > +    spin_lock(&hd->pci_lock);
> > +
> > +    if ( hd->pci_cf8 & (1 << 31) ) {
> > +        /* Fall through to an emulator */
> > +        rc = X86EMUL_UNHANDLEABLE;
> > +    } else {
> > +        /* Config access disabled */
> 
> Why does this not also get passed through to an emulator?
> 

I was trying to be consistent with QEMU here. It squashes any data accesses unless cf8 has the top bit set.

> > +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
> > +                                            bool_t is_default)
> > +{
> > +    int i;
> 
> Please try to avoid using variables of signed types for things that
> can never be negative.
> 

Ok.

> > +    int rc;
> > +
> > +    if ( is_default )
> > +        goto done;
> > +
> > +    for ( i = 0; i < MAX_IO_RANGE_TYPE; i++ ) {
> > +        char *name;
> > +
> > +        rc = asprintf(&name, "ioreq_server %d:%d %s", s->domid, s->id,
> 
> Is it really useful to include the domain ID in that name?
> 

Hmm, probably not.

> > @@ -764,52 +1019,270 @@ static int hvm_create_ioreq_server(struct
> domain *d, domid_t domid)
> >      domain_pause(d);
> >      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> >
> > -    rc = -EEXIST;
> > -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > -        goto fail2;
> > +    rc = -EEXIST;
> > +    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
> > +        goto fail2;
> > +
> > +    rc = hvm_ioreq_server_init(s, d, domid, is_default,
> > +                               d->arch.hvm_domain.ioreq_server_id++);
> 
> What about wraparound here?

It's pretty big but I guess it could cause weirdness so I'll add a check for uniqueness.

> > +static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        bool_t is_default = ( s == d-
> >arch.hvm_domain.default_ioreq_server);
> > +
> > +        if ( s->id != id )
> > +            continue;
> > +
> > +        domain_pause(d);
> > +
> > +        if ( is_default )
> > +            d->arch.hvm_domain.default_ioreq_server = NULL;
> > +
> > +        --d->arch.hvm_domain.ioreq_server_count;
> > +        list_del_init(&s->list_entry);
> 
> list_del() again, as s gets freed below.
> 

Ok.

> > @@ -1744,11 +2206,94 @@ void hvm_vcpu_down(struct vcpu *v)
> >      }
> >  }
> >
> > +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain
> *d,
> > +                                                        ioreq_t *p)
> > +{
> > +#define CF8_BDF(cf8)  (((cf8) & 0x00ffff00) >> 8)
> > +#define CF8_ADDR(cf8) ((cf8) & 0x000000ff)
> 
> Are you aware that AMD has this port-based extended config space
> access mechanism with the high 4 address bits being (iirc) at bits
> 24-27 of the port 0xCF8 value? You shouldn't be unconditionally
> discarding these here I think (the server ought to know whether it
> can handle such).

No, I wasn't aware of that. I'll look into that.

> 
> > +
> > +    struct hvm_ioreq_server *s;
> > +    uint8_t type;
> > +    uint64_t addr;
> > +
> > +    if ( d->arch.hvm_domain.ioreq_server_count == 1 &&
> > +         d->arch.hvm_domain.default_ioreq_server )
> > +        goto done;
> > +
> > +    if ( p->type == IOREQ_TYPE_PIO &&
> > +         (p->addr & ~3) == 0xcfc )
> > +    {
> > +        uint32_t cf8;
> > +        uint32_t sbdf;
> > +
> > +        /* PCI config data cycle */
> > +        type = IOREQ_TYPE_PCI_CONFIG;
> > +
> > +        spin_lock(&d->arch.hvm_domain.pci_lock);
> > +        cf8 = d->arch.hvm_domain.pci_cf8;
> > +        sbdf = HVMOP_PCI_SBDF(0,
> > +                              PCI_BUS(CF8_BDF(cf8)),
> > +                              PCI_SLOT(CF8_BDF(cf8)),
> > +                              PCI_FUNC(CF8_BDF(cf8)));
> > +        addr = ((uint64_t)sbdf << 32) | (CF8_ADDR(cf8) + (p->addr & 3));
> > +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> > +    }
> > +    else
> > +    {
> > +        type = p->type;
> > +        addr = p->addr;
> > +    }
> > +
> > +    switch ( type )
> > +    {
> > +    case IOREQ_TYPE_COPY:
> > +    case IOREQ_TYPE_PIO:
> > +    case IOREQ_TYPE_PCI_CONFIG:
> > +        break;
> > +    default:
> > +        goto done;
> > +    }
> 
> This switch would better go into the "else" above. And with that the
> question arises whether an input of IOREQ_TYPE_PCI_CONFIG is
> actually valid here.

It's not. I'll re-structure to try to make it clearer.

> 
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        switch ( type )
> > +        {
> > +        case IOREQ_TYPE_COPY:
> > +        case IOREQ_TYPE_PIO:
> > +            if ( rangeset_contains_singleton(s->range[type], addr) )
> > +                goto found;
> > +
> > +            break;
> > +        case IOREQ_TYPE_PCI_CONFIG:
> > +            if ( rangeset_contains_singleton(s->range[type], addr >> 32) ) {
> > +                p->type = type;
> > +                p->addr = addr;
> > +                goto found;
> > +            }
> > +
> > +            break;
> > +        }
> > +    }
> > +
> > + done:
> > +    s = d->arch.hvm_domain.default_ioreq_server;
> > +
> > + found:
> > +    return s;
> 
> The actions at these labels being rather simple, I'm once again having
> a hard time seeing why the various goto-s up there can't be return-s,
> at once making it much more obvious to the reader what is happening
> in the respective cases.
> 

Ok.

> > -bool_t hvm_send_assist_req(ioreq_t *proto_p)
> > +bool_t hvm_send_assist_req_to_ioreq_server(struct hvm_ioreq_server
> *s,
> > +                                           struct vcpu *v,
> 
> Both callers effectively pass "current" here - if you really want to
> retain the parameter, please name it "curr" to document that fact.
> 

Ok.

> > +void hvm_broadcast_assist_req(ioreq_t *p)
> > +{
> > +    struct vcpu *v = current;
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
> > +}
> 
> Is there possibly any way to make sure only operations not having
> any results and not affecting guest visible state changes can make
> it here?
> 

Well, I could whitelist the IOREQ type(s) here I guess.
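
E.g. (sketch - assuming the buffered mapcache invalidation is the only thing
that legitimately gets broadcast):

    if ( p->type != IOREQ_TYPE_INVALIDATE )
        return;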

> > +    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
> > +    if ( rc != 0 )
> > +        goto out;
> > ...
> > +    if ( (rc = hvm_get_ioreq_server_info(d, op.id,
> > +                                         &op.ioreq_pfn,
> > +                                         &op.bufioreq_pfn,
> > +                                         &op.bufioreq_port)) < 0 )
> > +        goto out;
> 
> May I ask that you use consistent style for similar operations?
> 

Yes - I'm not sure why that 'if' has the assignment in it.

> > +static int hvmop_destroy_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t)
> uop)
> > +{
> > +    xen_hvm_destroy_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    hvm_destroy_ioreq_server(d, op.id);
> > +    rc = 0;
> 
> Shouldn't this minimally return -ENOENT for an invalid or not-in-use ID?
> 

Yes, I guess it probably should.
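
I'll have hvm_destroy_ioreq_server() return a value and propagate it from
the wrapper. Roughly (assuming the server struct carries its id, and with
the existing teardown elided):

int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
{
    struct hvm_ioreq_server *s;
    int rc = -ENOENT;

    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);

    list_for_each_entry ( s,
                          &d->arch.hvm_domain.ioreq_server_list,
                          list_entry )
    {
        if ( s == d->arch.hvm_domain.default_ioreq_server ||
             s->id != id )
            continue;

        /* existing teardown (disable, unmap pages, free rangesets) */
        rc = 0;
        break;
    }

    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);

    return rc;
}

with hvmop_destroy_ioreq_server() then just doing
'rc = hvm_destroy_ioreq_server(d, op.id);'.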

> > @@ -4484,6 +5189,31 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >
> >      switch ( op &= HVMOP_op_mask )
> >      {
> > +    case HVMOP_create_ioreq_server:
> > +        rc = hvmop_create_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
> > +        break;
> > +
> > +    case HVMOP_get_ioreq_server_info:
> > +        rc = hvmop_get_ioreq_server_info(
> > +            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
> > +        break;
> > +
> > +    case HVMOP_map_io_range_to_ioreq_server:
> > +        rc = hvmop_map_io_range_to_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_io_range_t));
> > +        break;
> > +
> > +    case HVMOP_unmap_io_range_from_ioreq_server:
> > +        rc = hvmop_unmap_io_range_from_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_io_range_t));
> > +        break;
> > +
> > +    case HVMOP_destroy_ioreq_server:
> > +        rc = hvmop_destroy_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> > +        break;
> 
> Neither here nor in the individual functions are any XSM checks - I
> don't think you want to permit any domain to issue any of these on
> an arbitrary other domain?
> 

Ok.
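
I'll add a check to each of these ops. Something along these lines - the
hook name is purely illustrative, it doesn't exist yet:

    /* Hypothetical new hook, defaulting to device model privilege over d */
    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_create_ioreq_server);
    if ( rc != 0 )
        goto out;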

> > +int vasprintf(char **bufp, const char *fmt, va_list args)
> > +{
> > +    va_list args_copy;
> > +    size_t size;
> > +    char *buf, dummy[1];
> > +
> > +    va_copy(args_copy, args);
> > +    size = vsnprintf(dummy, 0, fmt, args_copy);
> > +    va_end(args_copy);
> > +
> > +    buf = _xmalloc(++size, 1);
> 
> xmalloc_array(char, ++size)
> 

Ok.

> > +EXPORT_SYMBOL(vasprintf);
> 
> I realize that there are more of these in that file, but since they're
> useless to us, let's at least not add any new ones.
> 

Ok.

> > @@ -46,7 +48,10 @@ struct hvm_ioreq_vcpu {
> >      evtchn_port_t    ioreq_evtchn;
> >  };
> >
> > +#define MAX_IO_RANGE_TYPE (HVMOP_IO_RANGE_PCI + 1)
> 
> A common misconception: You mean NR_ here, not MAX_ (or else
> you'd have to drop the " + 1").
> 

Yes, indeed I do mean NR_.
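
I'll rename it (and the array declaration) accordingly:

#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)

    struct rangeset        *range[NR_IO_RANGE_TYPES];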

> >  struct hvm_domain {
> > -    struct hvm_ioreq_server *ioreq_server;
> > +    /* Guest page range used for non-default ioreq servers */
> > +    unsigned long           ioreq_gmfn_base;
> > +    unsigned long           ioreq_gmfn_mask;
> > +    unsigned int            ioreq_gmfn_count;
> > +
> > +    /* Lock protects all other values in the following block */
> >      spinlock_t              ioreq_server_lock;
> > +    ioservid_t              ioreq_server_id;
> > +    unsigned int            ioreq_server_count;
> > +    struct list_head        ioreq_server_list;
> 
> Rather than using all these redundant prefixes, could I talk you into
> using sub-structures instead:
> 

Ok, if you prefer that style.

>     struct {
>         struct {
>         } gmfn;
>         struct {
>         } server;
>     } ioreq;
> 
> ?
> 
> > +typedef uint16_t ioservid_t;
> > +DEFINE_XEN_GUEST_HANDLE(ioservid_t);
> 
> I don't think you really need this handle type.
> 

Ok.

> > +struct xen_hvm_get_ioreq_server_info {
> > +    domid_t domid;               /* IN - domain to be serviced */
> > +    ioservid_t id;               /* IN - server id */
> > +    xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
> > +    xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
> > +    evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
> > +};
> 
> Afaict this structure's layout will differ between 32- and 64-bit
> tool stack domains, i.e. you need to add some explicit padding/
> alignment. Or even better yet, shuffle the fields - bufioreq_port
> fits in the 32-bit hole after id.
> 

Ok, I'll re-order.
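
i.e. something like:

struct xen_hvm_get_ioreq_server_info {
    domid_t domid;               /* IN - domain to be serviced */
    ioservid_t id;               /* IN - server id */
    evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
    xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
    xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
};

which has no holes and the same layout for 32- and 64-bit tool stacks.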

> > +struct xen_hvm_destroy_ioreq_server {
> > +    domid_t domid; /* IN - domain to be serviced */
> > +    ioservid_t id; /* IN - server id */
> > +};
> 
> Thats the same structure as for create, just that the last field is an
> output there. Please have just one structure for creation/destruction.
> 

Creation gains another parameter in a subsequent patch, so I think it's best to leave them separate.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled
  2014-05-01 12:08 ` [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
  2014-05-06 10:48   ` Ian Campbell
@ 2014-05-07 12:09   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 12:09 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> +static int hvm_add_ioreq_gmfn(
> +    struct domain *d, struct hvm_ioreq_page *iorp)
> +{
> +    return guest_physmap_add_page(d, iorp->gmfn,
> +                                  page_to_mfn(iorp->page), 0);

What if the guest meanwhile put something else at that address? Or
is the address range protected by some means (e.g. marked reserved
in the E820 map)?

> +    case HVMOP_set_ioreq_server_state:
> +        rc = hvmop_set_ioreq_server_state(
> +            guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
> +        break;

Again no XSM operation here or in the called function?

> @@ -68,6 +68,7 @@ struct hvm_ioreq_server {
>      spinlock_t             bufioreq_lock;
>      evtchn_port_t          bufioreq_evtchn;
>      struct rangeset        *range[MAX_IO_RANGE_TYPE];
> +    bool_t                 enabled;
>  };

Is there no 1-byte hole anywhere in the structure where this could
be put in a more efficient manner?

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -340,6 +340,22 @@ struct xen_hvm_destroy_ioreq_server {
>  typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
>  
> +/*
> + * HVMOP_set_ioreq_server_state: Enable or disable the IOREQ Server <id> servicing
> + *                               domain <domid>.
> + *
> + * The IOREQ Server will not be passed any emulation requests until it is in the
> + * enabled state.
> + */
> +#define HVMOP_set_ioreq_server_state 22
> +struct xen_hvm_set_ioreq_server_state {
> +    domid_t domid;   /* IN - domain to be serviced */
> +    ioservid_t id;   /* IN - server id */
> +    uint8_t enabled; /* IN - enabled? */    
> +};
> +typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */

No change to HVMOP_get_ioreq_server_info at all?

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional
  2014-05-01 12:08 ` [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
  2014-05-06 10:52   ` Ian Campbell
@ 2014-05-07 12:13   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 12:13 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>  struct xen_hvm_create_ioreq_server {
> -    domid_t domid; /* IN - domain to be serviced */
> -    ioservid_t id; /* OUT - server id */
> +    domid_t domid;           /* IN - domain to be serviced */
> +    uint8_t handle_bufioreq; /* IN - should server handle buffered ioreqs */
> +    ioservid_t id;           /* OUT - server id */
>  };

Ah, okay, here you change the structure, so please ignore my
request to fold it with the destroy one.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 12:06     ` Paul Durrant
@ 2014-05-07 12:23       ` Jan Beulich
  2014-05-07 12:25         ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 12:23 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> > +static int hvm_access_cf8(
>> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>> > +{
>> > +    struct vcpu *curr = current;
>> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
>> > +    int rc;
>> > +
>> > +    BUG_ON(port < 0xcf8);
>> > +    port -= 0xcf8;
>> > +
>> > +    spin_lock(&hd->pci_lock);
>> 
>> Is there really any danger in not having this lock at all? On real
>> hardware, if the OS doesn't properly serialize accesses, the
>> result is going to be undefined too. All I think you need to make
>> sure is that ->pci_cf8 never gets updated non-atomically.
>> 
> 
> I could remove it, but is it doing any harm?

Any spin lock is harmful to performance.

>> > +static int hvm_access_cfc(
>> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>> > +{
>> > +    struct vcpu *curr = current;
>> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
>> > +    int rc;
>> > +
>> > +    BUG_ON(port < 0xcfc);
>> > +    port -= 0xcfc;
>> > +
>> > +    spin_lock(&hd->pci_lock);
>> > +
>> > +    if ( hd->pci_cf8 & (1 << 31) ) {
>> > +        /* Fall through to an emulator */
>> > +        rc = X86EMUL_UNHANDLEABLE;
>> > +    } else {
>> > +        /* Config access disabled */
>> 
>> Why does this not also get passed through to an emulator?
>> 
> 
> I was trying to be consistent with QEMU here. It squashes any data accesses 
> unless cf8 has the top bit set.

But afaict with that dropped the entire function can go away.

>> > +void hvm_broadcast_assist_req(ioreq_t *p)
>> > +{
>> > +    struct vcpu *v = current;
>> > +    struct domain *d = v->domain;
>> > +    struct hvm_ioreq_server *s;
>> > +
>> > +    list_for_each_entry ( s,
>> > +                          &d->arch.hvm_domain.ioreq_server_list,
>> > +                          list_entry )
>> > +        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
>> > +}
>> 
>> Is there possibly any way to make sure only operations not having
>> any results and not affecting guest visible state changes can make
>> it here?
>> 
> 
> Well, I could whitelist the IOREQ type(s) here I guess.

By way of ASSERT() perhaps then...

>> >  struct hvm_domain {
>> > -    struct hvm_ioreq_server *ioreq_server;
>> > +    /* Guest page range used for non-default ioreq servers */
>> > +    unsigned long           ioreq_gmfn_base;
>> > +    unsigned long           ioreq_gmfn_mask;
>> > +    unsigned int            ioreq_gmfn_count;
>> > +
>> > +    /* Lock protects all other values in the following block */
>> >      spinlock_t              ioreq_server_lock;
>> > +    ioservid_t              ioreq_server_id;
>> > +    unsigned int            ioreq_server_count;
>> > +    struct list_head        ioreq_server_list;
>> 
>> Rather than using all these redundant prefixes, could I talk you into
>> using sub-structures instead:
> 
> Ok, if you prefer that style.

I'm not insisting here since I know some others are of different
opinion. But doing it that way may potentially allow passing the
address of just a sub-structure to functions, in turn making those
functions more legible. But as said - if you're not convinced, leave
it as is.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 12:23       ` Jan Beulich
@ 2014-05-07 12:25         ` Paul Durrant
  2014-05-07 12:34           ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-07 12:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 May 2014 13:23
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> 
> >>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> >> > +static int hvm_access_cf8(
> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> >> > +{
> >> > +    struct vcpu *curr = current;
> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> >> > +    int rc;
> >> > +
> >> > +    BUG_ON(port < 0xcf8);
> >> > +    port -= 0xcf8;
> >> > +
> >> > +    spin_lock(&hd->pci_lock);
> >>
> >> Is there really any danger in not having this lock at all? On real
> >> hardware, if the OS doesn't properly serialize accesses, the
> >> result is going to be undefined too. All I think you need to make
> >> sure is that ->pci_cf8 never gets updated non-atomically.
> >>
> >
> > I could remove it, but is it doing any harm?
> 
> Any spin lock is harmful to performance.
> 

Ok, I'll remove it.
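
Since a 4-byte write to cf8 is a single aligned 32-bit store it's naturally
atomic, so the whole thing should collapse to something like (sketch):

static int hvm_access_cf8(
    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
{
    struct domain *d = current->domain;

    if ( dir == IOREQ_WRITE && bytes == 4 )
        d->arch.hvm_domain.pci_cf8 = *val;

    /* Always fall through, so the emulators also see the access */
    return X86EMUL_UNHANDLEABLE;
}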

> >> > +static int hvm_access_cfc(
> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> >> > +{
> >> > +    struct vcpu *curr = current;
> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> >> > +    int rc;
> >> > +
> >> > +    BUG_ON(port < 0xcfc);
> >> > +    port -= 0xcfc;
> >> > +
> >> > +    spin_lock(&hd->pci_lock);
> >> > +
> >> > +    if ( hd->pci_cf8 & (1 << 31) ) {
> >> > +        /* Fall through to an emulator */
> >> > +        rc = X86EMUL_UNHANDLEABLE;
> >> > +    } else {
> >> > +        /* Config access disabled */
> >>
> >> Why does this not also get passed through to an emulator?
> >>
> >
> > I was trying to be consistent with QEMU here. It squashes any data
> accesses
> > if cf8 has the top bit set.
> 
> But afaict with that dropped the entire function can go away.
> 

Yes, it can. Do you not think it would be a good idea to be consistent with QEMU though?

> >> > +void hvm_broadcast_assist_req(ioreq_t *p)
> >> > +{
> >> > +    struct vcpu *v = current;
> >> > +    struct domain *d = v->domain;
> >> > +    struct hvm_ioreq_server *s;
> >> > +
> >> > +    list_for_each_entry ( s,
> >> > +                          &d->arch.hvm_domain.ioreq_server_list,
> >> > +                          list_entry )
> >> > +        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
> >> > +}
> >>
> >> Is there possibly any way to make sure only operations not having
> >> any results and not affecting guest visible state changes can make
> >> it here?
> >>
> >
> > Well, I could whitelist the IOREQ type(s) here I guess.
> 
> By way of ASSERT() perhaps then...

Yes.
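
e.g. (assuming mapcache invalidation ends up being the only thing we ever
need to broadcast):

void hvm_broadcast_assist_req(ioreq_t *p)
{
    struct vcpu *v = current;
    struct domain *d = v->domain;
    struct hvm_ioreq_server *s;

    ASSERT(p->type == IOREQ_TYPE_INVALIDATE);

    list_for_each_entry ( s,
                          &d->arch.hvm_domain.ioreq_server_list,
                          list_entry )
        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
}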

> 
> >> >  struct hvm_domain {
> >> > -    struct hvm_ioreq_server *ioreq_server;
> >> > +    /* Guest page range used for non-default ioreq servers */
> >> > +    unsigned long           ioreq_gmfn_base;
> >> > +    unsigned long           ioreq_gmfn_mask;
> >> > +    unsigned int            ioreq_gmfn_count;
> >> > +
> >> > +    /* Lock protects all other values in the following block */
> >> >      spinlock_t              ioreq_server_lock;
> >> > +    ioservid_t              ioreq_server_id;
> >> > +    unsigned int            ioreq_server_count;
> >> > +    struct list_head        ioreq_server_list;
> >>
> >> Rather than using all these redundant prefixes, could I talk you into
> >> using sub-structures instead:
> >
> > Ok, if you prefer that style.
> 
> I'm not insisting here since I know some others are of different
> opinion. But doing it that way may potentially allow passing the
> address of just a sub-structure to functions, in turn making those
> functions more legible. But as said - if you're not convinced, leave
> it as is.
> 

Ok, I'll see how it looks.
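
For reference, the sort of shape I'll try - the existing fields just
grouped, with the prefixes dropped:

    struct {
        /* Guest page range used for non-default ioreq servers */
        struct {
            unsigned long base;
            unsigned long mask;
            unsigned int  count;
        } gmfn;

        /* Lock protects everything in the server sub-structure */
        struct {
            spinlock_t       lock;
            ioservid_t       id;
            unsigned int     count;
            struct list_head list;
        } server;
    } ioreq;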

  Paul

> Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 12:25         ` Paul Durrant
@ 2014-05-07 12:34           ` Jan Beulich
  2014-05-07 12:37             ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 12:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 07.05.14 at 14:25, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> >> > +static int hvm_access_cfc(
>> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>> >> > +{
>> >> > +    struct vcpu *curr = current;
>> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
>> >> > +    int rc;
>> >> > +
>> >> > +    BUG_ON(port < 0xcfc);
>> >> > +    port -= 0xcfc;
>> >> > +
>> >> > +    spin_lock(&hd->pci_lock);
>> >> > +
>> >> > +    if ( hd->pci_cf8 & (1 << 31) ) {
>> >> > +        /* Fall through to an emulator */
>> >> > +        rc = X86EMUL_UNHANDLEABLE;
>> >> > +    } else {
>> >> > +        /* Config access disabled */
>> >>
>> >> Why does this not also get passed through to an emulator?
>> >>
>> >
>> > I was trying to be consistent with QEMU here. It squashes any data
>> accesses
>> > if cf8 has the top bit set.
>> 
>> But afaict with that dropped the entire function can go away.
>> 
> 
> Yes, it can. Do you not think it would be a good idea to be consistent with 
> QEMU though?

By removing the function you're going to be consistent with qemu,
because you're going to have qemu supply the data.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 12:34           ` Jan Beulich
@ 2014-05-07 12:37             ` Paul Durrant
  2014-05-07 14:07               ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-07 12:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 May 2014 13:34
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> 
> >>> On 07.05.14 at 14:25, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> >> >> > +static int hvm_access_cfc(
> >> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> >> >> > +{
> >> >> > +    struct vcpu *curr = current;
> >> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> >> >> > +    int rc;
> >> >> > +
> >> >> > +    BUG_ON(port < 0xcfc);
> >> >> > +    port -= 0xcfc;
> >> >> > +
> >> >> > +    spin_lock(&hd->pci_lock);
> >> >> > +
> >> >> > +    if ( hd->pci_cf8 & (1 << 31) ) {
> >> >> > +        /* Fall through to an emulator */
> >> >> > +        rc = X86EMUL_UNHANDLEABLE;
> >> >> > +    } else {
> >> >> > +        /* Config access disabled */
> >> >>
> >> >> Why does this not also get passed through to an emulator?
> >> >>
> >> >
> >> > I was trying to be consistent with QEMU here. It squashes any data
> >> accesses
> >> > if cf8 has the top bit set.
> >>
> >> But afaict with that dropped the entire function can go away.
> >>
> >
> > Yes, it can. Do you not think it would be a good idea to be consistent with
> > QEMU though?
> 
> By removing the function you're going to be consistent with qemu,
> because you're going to have qemu supply the data.
> 

Yes, true, in select_ioreq_server() I could just pass cfc accesses to the default emulator if cf8 has the top bit set.
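
i.e. in hvm_select_ioreq_server() roughly the following, where
cf8_to_pci_addr() is just a stand-in for whatever decode we settle on:

    if ( p->type == IOREQ_TYPE_PIO &&
         (p->addr & ~3) == 0xcfc &&
         (d->arch.hvm_domain.pci_cf8 & (1u << 31)) )
    {
        /* Config data cycle: route by the device cf8 selects */
        type = IOREQ_TYPE_PCI_CONFIG;
        addr = cf8_to_pci_addr(d->arch.hvm_domain.pci_cf8, p->addr);
    }
    else
    {
        /*
         * Anything else - including an access to cfc with the enable
         * bit clear - is treated as a plain port access and will end
         * up at the default server.
         */
        type = p->type;
        addr = p->addr;
    }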

  Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 12:37             ` Paul Durrant
@ 2014-05-07 14:07               ` Jan Beulich
  2014-05-07 14:12                 ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 14:07 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 07.05.14 at 14:37, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 07.05.14 at 14:25, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
>> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> >> >> > +static int hvm_access_cfc(
>> >> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>> >> >> > +{
>> >> >> > +    struct vcpu *curr = current;
>> >> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
>> >> >> > +    int rc;
>> >> >> > +
>> >> >> > +    BUG_ON(port < 0xcfc);
>> >> >> > +    port -= 0xcfc;
>> >> >> > +
>> >> >> > +    spin_lock(&hd->pci_lock);
>> >> >> > +
>> >> >> > +    if ( hd->pci_cf8 & (1 << 31) ) {
>> >> >> > +        /* Fall through to an emulator */
>> >> >> > +        rc = X86EMUL_UNHANDLEABLE;
>> >> >> > +    } else {
>> >> >> > +        /* Config access disabled */
>> >> >>
>> >> >> Why does this not also get passed through to an emulator?
>> >> >>
>> >> >
>> >> > I was trying to be consistent with QEMU here. It squashes any data
>> >> accesses
>> >> > if cf8 has the top bit set.
>> >>
>> >> But afaict with that dropped the entire function can go away.
>> >>
>> >
>> > Yes, it can. Do you not think it would be a good idea to be consistent with
>> > QEMU though?
>> 
>> By removing the function you're going to be consistent with qemu,
>> because you're going to have qemu supply the data.
>> 
> 
> Yes, true, in select_ioreq_server() I could just pass cfc accesses to the 
> default emulator if cf8 has the top bit set.

I continue to fail to see why you make this dependent on the value
of the top bit - just have any accesses to port 0xCFC be handled by
the emulator responsible for the respective device.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 14:07               ` Jan Beulich
@ 2014-05-07 14:12                 ` Paul Durrant
  2014-05-07 14:22                   ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2014-05-07 14:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 May 2014 15:08
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v5 6/9] ioreq-server: add support for multiple servers
> 
> >>> On 07.05.14 at 14:37, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 07.05.14 at 14:25, <Paul.Durrant@citrix.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> >>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
> >> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> >> >> >> > +static int hvm_access_cfc(
> >> >> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> >> >> >> > +{
> >> >> >> > +    struct vcpu *curr = current;
> >> >> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> >> >> >> > +    int rc;
> >> >> >> > +
> >> >> >> > +    BUG_ON(port < 0xcfc);
> >> >> >> > +    port -= 0xcfc;
> >> >> >> > +
> >> >> >> > +    spin_lock(&hd->pci_lock);
> >> >> >> > +
> >> >> >> > +    if ( hd->pci_cf8 & (1 << 31) ) {
> >> >> >> > +        /* Fall through to an emulator */
> >> >> >> > +        rc = X86EMUL_UNHANDLEABLE;
> >> >> >> > +    } else {
> >> >> >> > +        /* Config access disabled */
> >> >> >>
> >> >> >> Why does this not also get passed through to an emulator?
> >> >> >>
> >> >> >
> >> >> > I was trying to be consistent with QEMU here. It squashes any data
> >> >> accesses
> >> >> > if cf8 has the top bit set.
> >> >>
> >> >> But afaict with that dropped the entire function can go away.
> >> >>
> >> >
> >> > Yes, it can. Do you not think it would be a good idea to be consistent
> with
> >> > QEMU though?
> >>
> >> By removing the function you're going to be consistent with qemu,
> >> because you're going to have qemu supply the data.
> >>
> >
> > Yes, true, in select_ioreq_server() I could just pass cfc accesses to the
> > default emulator if cf8 has the top bit set.
> 
> I continue to fail to see why you make this dependent on the value
> of the top bit - just have any accesses to port 0xCFC be handled by
> the emulator responsible for the respective device.

Because, according to what I read, unless cf8 has the top bit set, a subsequent access to cfc should not be treated as a config cycle.

  Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] ioreq-server: add support for multiple servers
  2014-05-07 14:12                 ` Paul Durrant
@ 2014-05-07 14:22                   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 14:22 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 07.05.14 at 16:12, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 07 May 2014 15:08
>> To: Paul Durrant
>> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org 
>> Subject: RE: [PATCH v5 6/9] ioreq-server: add support for multiple servers
>> 
>> >>> On 07.05.14 at 14:37, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 07.05.14 at 14:25, <Paul.Durrant@citrix.com> wrote:
>> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >> >>> On 07.05.14 at 14:06, <Paul.Durrant@citrix.com> wrote:
>> >> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >> >> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
>> >> >> >> > +static int hvm_access_cfc(
>> >> >> >> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>> >> >> >> > +{
>> >> >> >> > +    struct vcpu *curr = current;
>> >> >> >> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
>> >> >> >> > +    int rc;
>> >> >> >> > +
>> >> >> >> > +    BUG_ON(port < 0xcfc);
>> >> >> >> > +    port -= 0xcfc;
>> >> >> >> > +
>> >> >> >> > +    spin_lock(&hd->pci_lock);
>> >> >> >> > +
>> >> >> >> > +    if ( hd->pci_cf8 & (1 << 31) ) {
>> >> >> >> > +        /* Fall through to an emulator */
>> >> >> >> > +        rc = X86EMUL_UNHANDLEABLE;
>> >> >> >> > +    } else {
>> >> >> >> > +        /* Config access disabled */
>> >> >> >>
>> >> >> >> Why does this not also get passed through to an emulator?
>> >> >> >>
>> >> >> >
>> >> >> > I was trying to be consistent with QEMU here. It squashes any data
>> >> >> accesses
>> >> >> > if cf8 has the top bit set.
>> >> >>
>> >> >> But afaict with that dropped the entire function can go away.
>> >> >>
>> >> >
>> >> > Yes, it can. Do you not think it would be a good idea to be consistent
>> with
>> >> > QEMU though?
>> >>
>> >> By removing the function you're going to be consistent with qemu,
>> >> because you're going to have qemu supply the data.
>> >>
>> >
>> > Yes, true, in select_ioreq_server() I could just pass cfc accesses to the
>> > default emulator if cf8 has the top bit set.
>> 
>> I continue to fail to see why you make this dependent on the value
>> of the top bit - just have any accesses to port 0xCFC be handled by
>> the emulator responsible for the respective device.
> 
> Because, according to what I read, unless cf8 has the top bit set, a 
> subsequent access to cfc should not be treated as a config cycle.

Oh, right, I mis-read your previous comment.

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 0/9] Support for running secondary emulators
  2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
                   ` (8 preceding siblings ...)
  2014-05-01 12:08 ` [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-05-07 14:41 ` Jan Beulich
  2014-05-07 14:45   ` Paul Durrant
  9 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2014-05-07 14:41 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> This patch series adds the ioreq server interface which I mentioned in
> my talk at the Xen developer summit in Edinburgh at the end of last year.
> The code is based on work originally done by Julien Grall but has been
> re-written to allow existing versions of QEMU to work unmodified.

And one more general question now that I got through the entire
series: How much of this new functionality is really x86-specific?
IOW wouldn't it make sense to have some (big?) parts of this live
in e.g. xen/common/ioreq-server.c?

Jan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 0/9] Support for running secondary emulators
  2014-05-07 14:41 ` [PATCH v5 0/9] Support for running secondary emulators Jan Beulich
@ 2014-05-07 14:45   ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2014-05-07 14:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 May 2014 15:41
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v5 0/9] Support for running secondary
> emulators
> 
> >>> On 01.05.14 at 14:08, <paul.durrant@citrix.com> wrote:
> > This patch series adds the ioreq server interface which I mentioned in
> > my talk at the Xen developer summit in Edinburgh at the end of last year.
> > The code is based on work originally done by Julien Grall but has been
> > re-written to allow existing versions of QEMU to work unmodified.
> 
> And one more general question now that I got through the entire
> series: How much of this new functionality is really x86-specific?
> IOW wouldn't it make sense to have some (big?) parts of this live
> in e.g. xen/common/ioreq-server.c?
> 

Well, I'd rather keep it x86 for now - since that's what I'm testing on, and I'm not aware of a need for secondary emulators for ARM. I don't see why it couldn't be moved into common code if a need arose though.

  Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread

Thread overview: 57+ messages
2014-05-01 12:08 [PATCH v5 0/9] Support for running secondary emulators Paul Durrant
2014-05-01 12:08 ` [PATCH v5 1/9] hvm_set_ioreq_page() releases wrong page in error path Paul Durrant
2014-05-01 12:48   ` Andrew Cooper
2014-05-01 12:08 ` [PATCH v5 2/9] ioreq-server: pre-series tidy up Paul Durrant
2014-05-06 12:25   ` Jan Beulich
2014-05-06 12:37     ` Paul Durrant
2014-05-01 12:08 ` [PATCH v5 3/9] ioreq-server: centralize access to ioreq structures Paul Durrant
2014-05-06 12:35   ` Jan Beulich
2014-05-06 12:41     ` Paul Durrant
2014-05-06 14:13       ` Paul Durrant
2014-05-06 14:21         ` Jan Beulich
2014-05-06 14:32           ` Paul Durrant
2014-05-06 14:39             ` Jan Beulich
2014-05-01 12:08 ` [PATCH v5 4/9] ioreq-server: create basic ioreq server abstraction Paul Durrant
2014-05-06 12:55   ` Jan Beulich
2014-05-06 13:12     ` Paul Durrant
2014-05-06 13:24       ` Jan Beulich
2014-05-06 13:40         ` Paul Durrant
2014-05-06 13:50           ` Jan Beulich
2014-05-06 13:44         ` Paul Durrant
2014-05-06 13:51           ` Jan Beulich
2014-05-06 13:53             ` Paul Durrant
2014-05-01 12:08 ` [PATCH v5 5/9] ioreq-server: on-demand creation of ioreq server Paul Durrant
2014-05-06 14:18   ` Jan Beulich
2014-05-06 14:24     ` Paul Durrant
2014-05-06 15:07       ` Jan Beulich
2014-05-01 12:08 ` [PATCH v5 6/9] ioreq-server: add support for multiple servers Paul Durrant
2014-05-06 10:46   ` Ian Campbell
2014-05-06 13:28     ` Paul Durrant
2014-05-07  9:44       ` Ian Campbell
2014-05-07  9:48         ` Paul Durrant
2014-05-07 11:13   ` Jan Beulich
2014-05-07 12:06     ` Paul Durrant
2014-05-07 12:23       ` Jan Beulich
2014-05-07 12:25         ` Paul Durrant
2014-05-07 12:34           ` Jan Beulich
2014-05-07 12:37             ` Paul Durrant
2014-05-07 14:07               ` Jan Beulich
2014-05-07 14:12                 ` Paul Durrant
2014-05-07 14:22                   ` Jan Beulich
2014-05-01 12:08 ` [PATCH v5 7/9] ioreq-server: remove p2m entries when server is enabled Paul Durrant
2014-05-06 10:48   ` Ian Campbell
2014-05-06 16:57     ` Paul Durrant
2014-05-07 12:09   ` Jan Beulich
2014-05-01 12:08 ` [PATCH v5 8/9] ioreq-server: make buffered ioreq handling optional Paul Durrant
2014-05-06 10:52   ` Ian Campbell
2014-05-06 13:17     ` Paul Durrant
2014-05-07 12:13   ` Jan Beulich
2014-05-01 12:08 ` [PATCH v5 9/9] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
2014-05-06 11:24   ` Ian Campbell
2014-05-06 13:02     ` Paul Durrant
2014-05-06 13:24       ` Ian Campbell
2014-05-06 13:35         ` Paul Durrant
2014-05-07  9:48           ` Ian Campbell
2014-05-07  9:51             ` Paul Durrant
2014-05-07 14:41 ` [PATCH v5 0/9] Support for running secondary emulators Jan Beulich
2014-05-07 14:45   ` Paul Durrant
