All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Support for running secondary emulators
@ 2014-03-05 14:47 Paul Durrant
  2014-03-05 14:47 ` [PATCH v3 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
                   ` (7 more replies)
  0 siblings, 8 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:47 UTC (permalink / raw)
  To: xen-devel

This patch series adds the ioreq server interface which I mentioned in
my talk at the Xen developer summit in Edinburgh at the end of last year.
The code is based on work originally done by Julien Grall but has been
re-written to allow existing versions of QEMU to work unmodified.

The code is available in my xen.git [1] repo on xenbits, under the 'savannah3'
branch, and I have also written a demo emulator to test the code, which can
be found in my demu.git [2] repo.


The modifications are broken down as follows:

Patch #1 basically just moves some code around to make subsequent patches
more obvious. 

Patch #2 tidies up some uses of ioreq_t as suggested by Andrew Cooper.

Patch #3 again is largely code movement, from various places into a new
hvm_ioreq_server structure. There should be no functional change at this
stage as the ioreq server is still created at domain initialisation time (as
were its contents prior to this patch).

Patch #4 is the first functional change. The ioreq server struct
initialisation is now deferred until something actually tries to play with
the HVM parameters which reference it. In practice this is QEMU, which
needs to read the ioreq pfns so it can map them.

Patch #5 is the big one. This moves from a single ioreq server per domain
to a list. The server that is created when the HVM parameters are reference
is given id 0 and is considered to be the 'catch all' server which is, after
all, how QEMU is used. Any secondary emulator, created using the new API
in xenctrl.h, will have id 1 or above and only gets ioreqs when I/O hits one
of its registered IO ranges or PCI devices.

Patch #6 pulls the PCI hotplug controller emulation into Xen. This is
necessary to allow a secondary emulator to hotplug a PCI device into the VM.
The code implements the controller in the same way as upstream QEMU and thus
the variant of the DSDT ASL used for upstream QEMU is retained.


There are no modifications to libxl to actually invoke a secondary emulator
at this stage. The only changes made are simply to increase the number of
special pages reserved for a VM to allow the use of more than one emulator
and call the new PCI hotplug API when attaching or detaching PCI devices.
The demo emulator can simply be invoked from a shell and will hotplug its
device onto the PCI bus (and remove it again when it's killed). The emulated
device is not an awful lot of use at this stage - it appears as a SCSI
controller with one IO BAR and one MEM BAR and has no intrinsic
functionality... but then it is only supposed to be demo :-)

  Paul

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
[2] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git

v2:
 - First non-RFC posting

v3:
 - Addressed comments from Jan Beulich

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/6] ioreq-server: centralize access to ioreq structures
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
@ 2014-03-05 14:47 ` Paul Durrant
  2014-03-05 14:47 ` [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

To simplify creation of the ioreq server abstraction in a
subsequent patch, this patch centralizes all use of the shared
ioreq structure and the buffered ioreq ring to the source module
xen/arch/x86/hvm/hvm.c.

This patch also adds some missing emacs boilerplate in the places where I
needed it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |   38 +++++--------
 xen/arch/x86/hvm/hvm.c            |  113 ++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/hvm/io.c             |   99 ++------------------------------
 xen/arch/x86/hvm/stdvga.c         |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c       |   13 ++++-
 xen/include/asm-x86/hvm/hvm.h     |   15 ++++-
 xen/include/asm-x86/hvm/io.h      |    2 +-
 xen/include/asm-x86/hvm/support.h |   19 ++++---
 8 files changed, 165 insertions(+), 136 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 868aa1d..0ba2020 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,24 +57,11 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t *p = get_ioreq(curr);
-    ioreq_t _ioreq;
+    ioreq_t p[1];
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
-    bool_t has_dm = 1;
-
-    /*
-     * Domains without a backing DM, don't have an ioreq page.  Just
-     * point to a struct on the stack, initialising the state as needed.
-     */
-    if ( !p )
-    {
-        has_dm = 0;
-        p = &_ioreq;
-        p->state = STATE_IOREQ_NONE;
-    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -173,15 +160,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
-                 p->state);
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
     vio->io_state =
         (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
@@ -233,7 +211,7 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
         /* If there is no backing DM, just ignore accesses */
-        if ( !has_dm )
+        if ( !hvm_has_dm(curr->domain) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
@@ -241,7 +219,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr) )
+            if ( !hvm_send_assist_req(curr, p) )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
@@ -1292,3 +1270,13 @@ struct segment_register *hvmemul_get_seg_reg(
         hvm_get_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
     return &hvmemul_ctxt->seg_reg[seg];
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9e85c13..0b2e57e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -345,9 +345,27 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
+static ioreq_t *get_ioreq(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+}
+
+bool_t hvm_io_pending(struct vcpu *v)
+{
+    ioreq_t *p = get_ioreq(v);
+
+    if ( !p )
+         return 0;
+
+    return ( p->state != STATE_IOREQ_NONE );
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p;
+    ioreq_t *p = get_ioreq(v);
 
     check_wakeup_from_wait();
 
@@ -355,7 +373,7 @@ void hvm_do_resume(struct vcpu *v)
         pt_restore_timer(v);
 
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !(p = get_ioreq(v)) )
+    if ( !p )
         goto check_inject_trap;
 
     while ( p->state != STATE_IOREQ_NONE )
@@ -1407,7 +1425,87 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v)
+int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
+{
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
+    buffered_iopage_t *pg = iorp->va;
+    buf_ioreq_t bp;
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    bp.type = p->type;
+    bp.dir  = p->dir;
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return 0;
+    }
+
+    bp.data = p->data;
+    bp.addr = p->addr;
+
+    spin_lock(&iorp->lock);
+
+    if ( (pg->write_pointer - pg->read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&iorp->lock);
+        return 0;
+    }
+
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    wmb();
+    pg->write_pointer += qw ? 2 : 1;
+
+    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    spin_unlock(&iorp->lock);
+
+    return 1;
+}
+
+bool_t hvm_has_dm(struct domain *d)
+{
+    return !!d->arch.hvm_domain.ioreq.va;
+}
+
+bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
 {
     ioreq_t *p;
 
@@ -1425,6 +1523,15 @@ bool_t hvm_send_assist_req(struct vcpu *v)
         return 0;
     }
 
+    p->dir = proto_p->dir;
+    p->data_is_ptr = proto_p->data_is_ptr;
+    p->type = proto_p->type;
+    p->size = proto_p->size;
+    p->addr = proto_p->addr;
+    p->count = proto_p->count;
+    p->df = proto_p->df;
+    p->data = proto_p->data;
+
     prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
 
     /*
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..ba50c53 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,87 +46,9 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
-{
-    struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp;
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    bp.type = p->type;
-    bp.dir  = p->dir;
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return 0;
-    }
-    
-    bp.data = p->data;
-    bp.addr = p->addr;
-    
-    spin_lock(&iorp->lock);
-
-    if ( (pg->write_pointer - pg->read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
-        return 0;
-    }
-    
-    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
-           &bp, sizeof(bp));
-    
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
-               &bp, sizeof(bp));
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    wmb();
-    pg->write_pointer += qw ? 2 : 1;
-
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
-    
-    return 1;
-}
-
 void send_timeoffset_req(unsigned long timeoff)
 {
+    struct vcpu *curr = current;
     ioreq_t p[1];
 
     if ( timeoff == 0 )
@@ -142,33 +64,22 @@ void send_timeoffset_req(unsigned long timeoff)
 
     p->state = STATE_IOREQ_READY;
 
-    if ( !hvm_buffered_io_send(p) )
+    if ( !hvm_buffered_io_send(curr->domain, p) )
         printk("Unsuccessful timeoffset update\n");
 }
 
 /* Ask ioemu mapcache to invalidate mappings. */
 void send_invalidate_req(void)
 {
-    struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
-
-    if ( !p )
-        return;
-
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
-                 "already pending (%d)?\n", p->state);
-        domain_crash(v->domain);
-        return;
-    }
+    struct vcpu *curr = current;
+    ioreq_t p[1];
 
     p->type = IOREQ_TYPE_INVALIDATE;
     p->size = 4;
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
 
-    (void)hvm_send_assist_req(v);
+    (void)hvm_send_assist_req(curr, p);
 }
 
 int handle_mmio(void)
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index 19e80ed..9e2d28e 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -580,7 +580,7 @@ static int stdvga_intercept_mmio(ioreq_t *p)
         buf = (p->dir == IOREQ_WRITE);
     }
 
-    rc = (buf && hvm_buffered_io_send(p));
+    rc = (buf && hvm_buffered_io_send(d, p));
 
     spin_unlock(&s->lock);
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 40167d6..0421623 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1394,7 +1394,6 @@ void nvmx_switch_guest(void)
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    const ioreq_t *ioreq = get_ioreq(v);
 
     /*
      * A pending IO emulation may still be not finished. In this case, no
@@ -1404,7 +1403,7 @@ void nvmx_switch_guest(void)
      * don't want to continue as this setup is not implemented nor supported
      * as of right now.
      */
-    if ( !ioreq || ioreq->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(v) )
         return;
     /*
      * a softirq may interrupt us between a virtual vmentry is
@@ -2522,3 +2521,13 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     /* nvcpu.guest_cr is what L2 write to cr actually. */
     __vmwrite(read_shadow_field, v->arch.hvm_vcpu.nvcpu.guest_cr[cr]);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dcc3483..08a62ea 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -26,6 +26,7 @@
 #include <asm/hvm/asid.h>
 #include <public/domctl.h>
 #include <public/hvm/save.h>
+#include <public/hvm/ioreq.h>
 #include <asm/mm.h>
 
 /* Interrupt acknowledgement sources. */
@@ -227,7 +228,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v);
+bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
@@ -339,6 +340,8 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx);
 void hvm_migrate_timers(struct vcpu *v);
+bool_t hvm_has_dm(struct domain *d);
+bool_t hvm_io_pending(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
 void hvm_migrate_pirqs(struct vcpu *v);
 
@@ -522,3 +525,13 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_HVM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 86db58d..bfd28c2 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -92,7 +92,7 @@ static inline int hvm_buffered_io_intercept(ioreq_t *p)
 }
 
 int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(ioreq_t *p);
+int hvm_buffered_io_send(struct domain *d, const ioreq_t *p);
 
 static inline void register_portio_handler(
     struct domain *d, unsigned long addr,
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 3529499..05ef5c5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -22,19 +22,10 @@
 #define __ASM_X86_HVM_SUPPORT_H__
 
 #include <xen/types.h>
-#include <public/hvm/ioreq.h>
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-static inline ioreq_t *get_ioreq(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
-}
-
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
@@ -142,3 +133,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
 int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
 
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
  2014-03-05 14:47 ` [PATCH v3 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-03-05 14:47 ` Paul Durrant
  2014-03-10 15:43   ` George Dunlap
  2014-03-05 14:47 ` [PATCH v3 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch tidies up various occurences of single element ioreq_t
arrays on the stack and improves coding style.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |   36 ++++++++++++++++++------------------
 xen/arch/x86/hvm/hvm.c     |    2 ++
 xen/arch/x86/hvm/io.c      |   37 +++++++++++++++++--------------------
 3 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 0ba2020..1c71902 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,7 +57,7 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t p[1];
+    ioreq_t p;
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
@@ -171,38 +171,38 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
-    p->dir = dir;
-    p->data_is_ptr = value_is_ptr;
-    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
-    p->size = size;
-    p->addr = addr;
-    p->count = *reps;
-    p->df = df;
-    p->data = value;
+    p.dir = dir;
+    p.data_is_ptr = value_is_ptr;
+    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
+    p.size = size;
+    p.addr = addr;
+    p.count = *reps;
+    p.df = df;
+    p.data = value;
 
     if ( dir == IOREQ_WRITE )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( is_mmio )
     {
-        rc = hvm_mmio_intercept(p);
+        rc = hvm_mmio_intercept(&p);
         if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(p);
+            rc = hvm_buffered_io_intercept(&p);
     }
     else
     {
-        rc = hvm_portio_intercept(p);
+        rc = hvm_portio_intercept(&p);
     }
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
     case X86EMUL_RETRY:
-        *reps = p->count;
-        p->state = STATE_IORESP_READY;
+        *reps = p.count;
+        p.state = STATE_IORESP_READY;
         if ( !vio->mmio_retry )
         {
-            hvm_io_assist(p);
+            hvm_io_assist(&p);
             vio->io_state = HVMIO_none;
         }
         else
@@ -219,7 +219,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr, p) )
+            if ( !hvm_send_assist_req(curr, &p) )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
@@ -238,7 +238,7 @@ static int hvmemul_do_io(
 
  finish_access:
     if ( dir == IOREQ_READ )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( p_data != NULL )
         memcpy(p_data, &vio->io_data, size);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0b2e57e..10b8e8c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -349,7 +349,9 @@ static ioreq_t *get_ioreq(struct vcpu *v)
 {
     struct domain *d = v->domain;
     shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
     ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
     return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
 }
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index ba50c53..7aac61d 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -49,22 +49,19 @@
 void send_timeoffset_req(unsigned long timeoff)
 {
     struct vcpu *curr = current;
-    ioreq_t p[1];
+    ioreq_t p = {
+        .type = IOREQ_TYPE_TIMEOFFSET,
+        .size = 8,
+        .count = 1,
+        .dir = IOREQ_WRITE,
+        .data = timeoff,
+        .state = STATE_IOREQ_READY,
+    };
 
     if ( timeoff == 0 )
         return;
 
-    memset(p, 0, sizeof(*p));
-
-    p->type = IOREQ_TYPE_TIMEOFFSET;
-    p->size = 8;
-    p->count = 1;
-    p->dir = IOREQ_WRITE;
-    p->data = timeoff;
-
-    p->state = STATE_IOREQ_READY;
-
-    if ( !hvm_buffered_io_send(curr->domain, p) )
+    if ( !hvm_buffered_io_send(curr->domain, &p) )
         printk("Unsuccessful timeoffset update\n");
 }
 
@@ -72,14 +69,14 @@ void send_timeoffset_req(unsigned long timeoff)
 void send_invalidate_req(void)
 {
     struct vcpu *curr = current;
-    ioreq_t p[1];
-
-    p->type = IOREQ_TYPE_INVALIDATE;
-    p->size = 4;
-    p->dir = IOREQ_WRITE;
-    p->data = ~0UL; /* flush all */
-
-    (void)hvm_send_assist_req(curr, p);
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
+
+    (void)hvm_send_assist_req(curr, &p);
 }
 
 int handle_mmio(void)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 3/6] ioreq-server: create basic ioreq server abstraction.
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
  2014-03-05 14:47 ` [PATCH v3 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
  2014-03-05 14:47 ` [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
@ 2014-03-05 14:47 ` Paul Durrant
  2014-03-05 14:47 ` [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

Collect together data structures concerning device emulation together into
a new struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions.

This patch also adds some more missing emacs boilerplate in the places
where I needed it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/hvm.c           |  303 +++++++++++++++++++++++++-------------
 xen/include/asm-x86/hvm/domain.h |   17 ++-
 xen/include/asm-x86/hvm/vcpu.h   |   12 +-
 3 files changed, 226 insertions(+), 106 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 10b8e8c..bbf9577 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -345,39 +345,43 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-static ioreq_t *get_ioreq(struct vcpu *v)
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    shared_iopage_t *p = s->ioreq.va;
 
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    ASSERT(p != NULL);
 
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+    return &p->vcpu_ioreq[id];
 }
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
-    if ( !p )
-         return 0;
+    if ( !s )
+        return 0;
 
+    p = get_ioreq(s, v->vcpu_id);
     return ( p->state != STATE_IOREQ_NONE );
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !p )
+    if ( !s )
         goto check_inject_trap;
 
+    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
+    p = get_ioreq(s, v->vcpu_id);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -387,7 +391,7 @@ void hvm_do_resume(struct vcpu *v)
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+            wait_on_xen_event_channel(p->vp_eport,
                                       (p->state != STATE_IOREQ_READY) &&
                                       (p->state != STATE_IOREQ_INPROCESS));
             break;
@@ -410,7 +414,6 @@ void hvm_do_resume(struct vcpu *v)
 static void hvm_init_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp)
 {
-    memset(iorp, 0, sizeof(*iorp));
     spin_lock_init(&iorp->lock);
     domain_pause(d);
 }
@@ -553,6 +556,149 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
+static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
+{
+    if ( s->ioreq.va != NULL )
+    {
+        struct domain *d = s->domain;
+        shared_iopage_t *p = s->ioreq.va;
+        struct vcpu *v;
+
+        for_each_vcpu ( d, v )
+            p->vcpu_ioreq[v->vcpu_id].vp_eport = v->arch.hvm_vcpu.ioreq_evtchn;
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    int rc;
+
+    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+    if ( rc < 0 )
+        goto done;
+
+    v->arch.hvm_vcpu.ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 )
+    {
+        struct domain *d = s->domain;
+
+        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+        if ( rc < 0 )
+            goto done;
+
+        s->buf_ioreq_evtchn = rc;
+        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;
+    }
+
+    hvm_update_ioreq_server_evtchn(s);
+    rc = 0;
+
+ done:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    if ( v->vcpu_id == 0 )
+        free_xen_event_channel(v, s->buf_ioreq_evtchn);
+
+    free_xen_event_channel(v, v->arch.hvm_vcpu.ioreq_evtchn);
+}
+
+static int hvm_create_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    s->domain = d;
+
+    hvm_init_ioreq_page(d, &s->ioreq);
+    hvm_init_ioreq_page(d, &s->buf_ioreq);
+
+    d->arch.hvm_domain.ioreq_server = s;
+    return 0;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_destroy_ioreq_page(d, &s->ioreq);
+    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
+
+    xfree(s);
+}
+
+static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
+                                     evtchn_port_t *p_port)
+{
+    evtchn_port_t old_port, new_port;
+
+    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
+    if ( new_port < 0 )
+        return new_port;
+
+    /* xchg() ensures that only we call free_xen_event_channel(). */
+    old_port = xchg(p_port, new_port);
+    free_xen_event_channel(v, old_port);
+    return 0;
+}
+
+static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
+{
+    struct domain *d = s->domain;
+    struct vcpu *v;
+    int rc;
+
+    domain_pause(d);
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_replace_event_channel(v, domid, &v->arch.hvm_vcpu.ioreq_evtchn);
+        if ( rc )
+            goto done;
+
+        if ( v->vcpu_id == 0 ) {
+            rc = hvm_replace_event_channel(v, domid, &s->buf_ioreq_evtchn);
+            if ( rc )
+                goto done;
+
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;
+        }
+    }
+
+    hvm_update_ioreq_server_evtchn(s);
+
+    s->domid = domid;
+    rc = 0;
+
+ done:
+    domain_unpause(d);
+
+    return rc;
+}
+
+static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
+{
+    int rc;
+
+    rc = hvm_set_ioreq_page(s->domain, &s->ioreq, pfn);
+    if ( rc )
+        return rc;
+
+    hvm_update_ioreq_server_evtchn(s);
+    return 0;
+}
+
+static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
+{
+    return hvm_set_ioreq_page(s->domain, &s->buf_ioreq, pfn);
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -620,17 +766,20 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    rc = hvm_create_ioreq_server(d);
+    if ( rc != 0 )
+        goto fail2;
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -654,8 +803,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_destroy_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1287,7 +1435,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1330,30 +1478,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-
-    /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-    if ( rc < 0 )
+    rc = hvm_ioreq_server_add_vcpu(s, v);
+    if ( rc != 0 )
         goto fail6;
 
-    /* Register ioreq event channel. */
-    v->arch.hvm_vcpu.xen_port = rc;
-
-    if ( v->vcpu_id == 0 )
-    {
-        /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-        if ( rc < 0 )
-            goto fail6;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
-    }
-
-    spin_lock(&d->arch.hvm_domain.ioreq.lock);
-    if ( d->arch.hvm_domain.ioreq.va != NULL )
-        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1387,6 +1515,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
 void hvm_vcpu_destroy(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_ioreq_server_remove_vcpu(s, v);
+
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -1398,9 +1531,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
         vlapic_destroy(v);
 
     hvm_funcs.vcpu_destroy(v);
-
-    /* Event channel is already freed by evtchn_destroy(). */
-    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -1429,8 +1559,9 @@ void hvm_vcpu_down(struct vcpu *v)
 
 int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
 {
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
@@ -1438,6 +1569,12 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    if ( !s )
+        return 0;
+
+    iorp = &s->buf_ioreq;
+    pg = iorp->va;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -1496,7 +1633,7 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    notify_via_xen_event_channel(d, s->buf_ioreq_evtchn);
     spin_unlock(&iorp->lock);
 
     return 1;
@@ -1504,24 +1641,28 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq.va;
+    return !!d->arch.hvm_domain.ioreq_server;
 }
 
 bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !(p = get_ioreq(v)) )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v->vcpu_id);
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model. Crash the domain. */
         gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
-        domain_crash(v->domain);
+        domain_crash(d);
         return 0;
     }
 
@@ -1534,14 +1675,14 @@ bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
     p->df = proto_p->df;
     p->data = proto_p->data;
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+    notify_via_xen_event_channel(d, p->vp_eport);
 
     return 1;
 }
@@ -4134,21 +4275,6 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
-static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
-                                     int *p_port)
-{
-    int old_port, new_port;
-
-    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
-    if ( new_port < 0 )
-        return new_port;
-
-    /* xchg() ensures that only we call free_xen_event_channel(). */
-    old_port = xchg(p_port, new_port);
-    free_xen_event_channel(v, old_port);
-    return 0;
-}
-
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4161,7 +4287,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_page *iorp;
         struct domain *d;
         struct vcpu *v;
 
@@ -4194,19 +4319,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             switch ( a.index )
             {
             case HVM_PARAM_IOREQ_PFN:
-                iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
-                    break;
-                spin_lock(&iorp->lock);
-                if ( iorp->va != NULL )
-                    /* Initialise evtchn port info if VCPUs already created. */
-                    for_each_vcpu ( d, v )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                spin_unlock(&iorp->lock);
+                rc = hvm_set_ioreq_server_pfn(d->arch.hvm_domain.ioreq_server,
+                                              a.value);
                 break;
             case HVM_PARAM_BUFIOREQ_PFN: 
-                iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_set_ioreq_page(d, iorp, a.value);
+                rc = hvm_set_ioreq_server_buf_pfn(d->arch.hvm_domain.ioreq_server,
+                                                  a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
@@ -4261,31 +4379,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = 0;
-                domain_pause(d); /* safe to change per-vcpu xen_port */
-                if ( d->vcpu[0] )
-                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
-                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
-                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
-                if ( rc )
-                {
-                    domain_unpause(d);
-                    break;
-                }
-                iorp = &d->arch.hvm_domain.ioreq;
-                for_each_vcpu ( d, v )
-                {
-                    rc = hvm_replace_event_channel(v, a.value,
-                                                   &v->arch.hvm_vcpu.xen_port);
-                    if ( rc )
-                        break;
-
-                    spin_lock(&iorp->lock);
-                    if ( iorp->va != NULL )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                    spin_unlock(&iorp->lock);
-                }
-                domain_unpause(d);
+                rc = hvm_set_ioreq_server_domid(d->arch.hvm_domain.ioreq_server,
+                                                a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b1e3187..d8dbfab 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -41,10 +41,16 @@ struct hvm_ioreq_page {
     void *va;
 };
 
-struct hvm_domain {
+struct hvm_ioreq_server {
+    struct domain          *domain;
+    domid_t                domid; /* domid of emulator */
     struct hvm_ioreq_page  ioreq;
     struct hvm_ioreq_page  buf_ioreq;
+    evtchn_port_t          buf_ioreq_evtchn;
+};
 
+struct hvm_domain {
+    struct hvm_ioreq_server *ioreq_server;
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
@@ -100,3 +106,12 @@ struct hvm_domain {
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
 
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 122ab0d..b282bab 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,7 +138,7 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    int                 xen_port;
+    evtchn_port_t       ioreq_evtchn;
 
     bool_t              flag_dr_dirty;
     bool_t              debug_state_latch;
@@ -186,3 +186,13 @@ struct hvm_vcpu {
 };
 
 #endif /* __ASM_X86_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
                   ` (2 preceding siblings ...)
  2014-03-05 14:47 ` [PATCH v3 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-03-05 14:47 ` Paul Durrant
  2014-03-10 17:46   ` George Dunlap
  2014-03-14 11:18   ` Ian Campbell
  2014-03-05 14:48 ` [PATCH v3 5/6] ioreq-server: add support for multiple servers Paul Durrant
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch only creates the ioreq server when the legacy HVM parameters
are touched by an emulator. It also lays some groundwork for supporting
multiple IOREQ servers. For instance, it introduces ioreq server reference
counting which is not strictly necessary at this stage but will become so
when ioreq servers can be destroyed prior the domain dying.

There is a significant change in the layout of the special pages reserved
in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards without
moving pages such as the xenstore page when building a domain that can
support more than one emulator.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 tools/libxc/xc_hvm_build_x86.c |   40 +++++--
 xen/arch/x86/hvm/hvm.c         |  240 +++++++++++++++++++++++++---------------
 2 files changed, 176 insertions(+), 104 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index dd3b522..b65e702 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -41,13 +41,12 @@
 #define SPECIALPAGE_PAGING   0
 #define SPECIALPAGE_ACCESS   1
 #define SPECIALPAGE_SHARING  2
-#define SPECIALPAGE_BUFIOREQ 3
-#define SPECIALPAGE_XENSTORE 4
-#define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
-#define NR_SPECIAL_PAGES     8
-#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
+#define SPECIALPAGE_XENSTORE 3
+#define SPECIALPAGE_IDENT_PT 4
+#define SPECIALPAGE_CONSOLE  5
+#define SPECIALPAGE_IOREQ    6
+#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
+#define special_pfn(x) (0xff000u - 1 - (x))
 
 #define VGA_HOLE_SIZE (0x20)
 
@@ -114,7 +113,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -473,6 +472,23 @@ static int setup_guest(xc_interface *xch,
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
+
+    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
+    DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
+    DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS));
+    DPRINTF("  SHARING:   %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING));
+    DPRINTF("  STORE:     %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE));
+    DPRINTF("  IDENT_PT:  %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
+    DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
+    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
+
     for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
     {
         xen_pfn_t pfn = special_pfn(i);
@@ -488,10 +504,6 @@ static int setup_guest(xc_interface *xch,
 
     xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_BUFIOREQ));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ));
     xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
                      special_pfn(SPECIALPAGE_CONSOLE));
     xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
@@ -500,6 +512,10 @@ static int setup_guest(xc_interface *xch,
                      special_pfn(SPECIALPAGE_ACCESS));
     xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
                      special_pfn(SPECIALPAGE_SHARING));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ) - 1);
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bbf9577..22b2a2c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -366,22 +366,9 @@ bool_t hvm_io_pending(struct vcpu *v)
     return ( p->state != STATE_IOREQ_NONE );
 }
 
-void hvm_do_resume(struct vcpu *v)
+static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
 {
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    check_wakeup_from_wait();
-
-    if ( is_hvm_vcpu(v) )
-        pt_restore_timer(v);
-
-    if ( !s )
-        goto check_inject_trap;
-
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(s, v->vcpu_id);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -397,12 +384,29 @@ void hvm_do_resume(struct vcpu *v)
             break;
         default:
             gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-            domain_crash(v->domain);
+            domain_crash(d);
             return; /* bail */
         }
     }
+}
+
+void hvm_do_resume(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    check_wakeup_from_wait();
+
+    if ( is_hvm_vcpu(v) )
+        pt_restore_timer(v);
+
+    if ( s )
+    {
+        ioreq_t *p = get_ioreq(s, v->vcpu_id);
+
+        hvm_wait_on_io(d, p);
+    }
 
- check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -411,11 +415,13 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
-static void hvm_init_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
 {
+    struct hvm_ioreq_page *iorp;
+
+    iorp = buf ? &s->buf_ioreq : &s->ioreq;
+
     spin_lock_init(&iorp->lock);
-    domain_pause(d);
 }
 
 void destroy_ring_for_helper(
@@ -431,16 +437,13 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_destroy_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
 {
-    spin_lock(&iorp->lock);
+    struct hvm_ioreq_page *iorp;
 
-    ASSERT(d->is_dying);
+    iorp = buf ? &s->buf_ioreq : &s->ioreq;
 
     destroy_ring_for_helper(&iorp->va, iorp->page);
-
-    spin_unlock(&iorp->lock);
 }
 
 int prepare_ring_for_helper(
@@ -487,9 +490,11 @@ int prepare_ring_for_helper(
     return 0;
 }
 
-static int hvm_set_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
+static int hvm_set_ioreq_page(struct hvm_ioreq_server *s, bool_t buf,
+                              unsigned long gmfn)
 {
+    struct domain *d = s->domain;
+    struct hvm_ioreq_page *iorp;
     struct page_info *page;
     void *va;
     int rc;
@@ -497,22 +502,17 @@ static int hvm_set_ioreq_page(
     if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
         return rc;
 
-    spin_lock(&iorp->lock);
+    iorp = buf ? &s->buf_ioreq : &s->ioreq;
 
     if ( (iorp->va != NULL) || d->is_dying )
     {
-        destroy_ring_for_helper(&iorp->va, iorp->page);
-        spin_unlock(&iorp->lock);
+        destroy_ring_for_helper(&va, page);
         return -EINVAL;
     }
 
     iorp->va = va;
     iorp->page = page;
 
-    spin_unlock(&iorp->lock);
-
-    domain_unpause(d);
-
     return 0;
 }
 
@@ -606,29 +606,88 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
     free_xen_event_channel(v, v->arch.hvm_vcpu.ioreq_evtchn);
 }
 
-static int hvm_create_ioreq_server(struct domain *d)
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
 {
     struct hvm_ioreq_server *s;
+    unsigned long pfn;
+    struct vcpu *v;
+    int rc;
+
+    if ( d->arch.hvm_domain.ioreq_server != NULL )
+        return -EEXIST;
 
+    gdprintk(XENLOG_DEBUG, "%s: %d\n", __func__, d->domain_id);
+
+    rc = -ENOMEM;
     s = xzalloc(struct hvm_ioreq_server);
     if ( !s )
-        return -ENOMEM;
+        goto fail_alloc;
 
     s->domain = d;
+    s->domid = domid;
+
+    /* Initialize shared pages */
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+    hvm_init_ioreq_page(s, 0);
+    if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
+        goto fail_set_ioreq;
 
-    hvm_init_ioreq_page(d, &s->ioreq);
-    hvm_init_ioreq_page(d, &s->buf_ioreq);
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+
+    hvm_init_ioreq_page(s, 1);
+    if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
+        goto fail_set_buf_ioreq;
+
+    domain_pause(d);
+
+    for_each_vcpu ( d, v )
+    {
+        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
+            goto fail_add_vcpu;
+    }
 
     d->arch.hvm_domain.ioreq_server = s;
+
+    domain_unpause(d);
+
     return 0;
+
+ fail_add_vcpu:
+    for_each_vcpu ( d, v )
+        hvm_ioreq_server_remove_vcpu(s, v);
+    domain_unpause(d);
+    hvm_destroy_ioreq_page(s, 1);
+ fail_set_buf_ioreq:
+    hvm_destroy_ioreq_page(s, 0);
+ fail_set_ioreq:
+    xfree(s);
+ fail_alloc:
+    return rc;
 }
 
 static void hvm_destroy_ioreq_server(struct domain *d)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
+    struct vcpu *v;
+
+    gdprintk(XENLOG_DEBUG, "%s: %d\n", __func__, d->domain_id);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        return;
+
+    domain_pause(d);
+
+    d->arch.hvm_domain.ioreq_server = NULL;
+
+    for_each_vcpu ( d, v )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    domain_unpause(d);
 
-    hvm_destroy_ioreq_page(d, &s->ioreq);
-    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
+    hvm_destroy_ioreq_page(s, 1);
+    hvm_destroy_ioreq_page(s, 0);
 
     xfree(s);
 }
@@ -648,14 +707,22 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
-static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
+static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
 {
-    struct domain *d = s->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     struct vcpu *v;
     int rc;
 
     domain_pause(d);
 
+    rc = -ENOENT;
+    if ( !s )
+        goto done;
+
+    rc = 0;
+    if ( s->domid == domid )
+        goto done;
+
     for_each_vcpu ( d, v )
     {
         rc = hvm_replace_event_channel(v, domid, &v->arch.hvm_vcpu.ioreq_evtchn);
@@ -682,23 +749,6 @@ static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
     return rc;
 }
 
-static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
-{
-    int rc;
-
-    rc = hvm_set_ioreq_page(s->domain, &s->ioreq, pfn);
-    if ( rc )
-        return rc;
-
-    hvm_update_ioreq_server_evtchn(s);
-    return 0;
-}
-
-static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
-{
-    return hvm_set_ioreq_page(s->domain, &s->buf_ioreq, pfn);
-}
-
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -766,20 +816,14 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    rc = hvm_create_ioreq_server(d);
-    if ( rc != 0 )
-        goto fail2;
-
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail3;
+        goto fail2;
 
     return 0;
 
- fail3:
-    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1478,9 +1522,12 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    rc = hvm_ioreq_server_add_vcpu(s, v);
-    if ( rc != 0 )
-        goto fail6;
+    if ( s )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc != 0 )
+            goto fail6;
+    }
 
     if ( v->vcpu_id == 0 )
     {
@@ -1518,7 +1565,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
-    hvm_ioreq_server_remove_vcpu(s, v);
+    if ( s )
+        hvm_ioreq_server_remove_vcpu(s, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1644,19 +1692,12 @@ bool_t hvm_has_dm(struct domain *d)
     return !!d->arch.hvm_domain.ioreq_server;
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
+static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
+                                            struct vcpu *v,
+                                            const ioreq_t *proto_p)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
-        return 0; /* implicitly bins the i/o operation */
-
-    if ( !s )
-        return 0;
-
-    p = get_ioreq(s, v->vcpu_id);
+    ioreq_t *p = get_ioreq(s, v->vcpu_id);
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
@@ -1687,6 +1728,20 @@ bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
     return 1;
 }
 
+bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
+        return 0;
+
+    if ( !s )
+        return 0;
+
+    return hvm_send_assist_req_to_server(s, v, p);
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4318,14 +4373,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
             switch ( a.index )
             {
-            case HVM_PARAM_IOREQ_PFN:
-                rc = hvm_set_ioreq_server_pfn(d->arch.hvm_domain.ioreq_server,
-                                              a.value);
-                break;
-            case HVM_PARAM_BUFIOREQ_PFN: 
-                rc = hvm_set_ioreq_server_buf_pfn(d->arch.hvm_domain.ioreq_server,
-                                                  a.value);
-                break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
                 hvm_latch_shinfo_size(d);
@@ -4379,8 +4426,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_set_ioreq_server_domid(d->arch.hvm_domain.ioreq_server,
-                                                a.value);
+                rc = hvm_create_ioreq_server(d, a.value);
+                if ( rc == -EEXIST )
+                    rc = hvm_set_ioreq_server_domid(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4478,6 +4526,14 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_EVTCHN:
+                /* May need to create server */
+                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
+                if ( rc != 0 && rc != -EEXIST )
+                    goto param_fail;
+                /*FALLTHRU*/
             default:
                 a.value = d->arch.hvm_domain.params[a.index];
                 break;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
                   ` (3 preceding siblings ...)
  2014-03-05 14:47 ` [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-03-05 14:48 ` Paul Durrant
  2014-03-14 11:52   ` Ian Campbell
  2014-03-05 14:48 ` [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

The legacy 'catch-all' server is always created with id 0. Secondary
servers will have an id ranging from 1 to a limit set by the toolstack
via the 'max_emulators' build info field. This defaults to 1 so ordinarily
no extra special pages are reserved for secondary emulators. It may be
increased using the secondary_device_emulators parameter in xl.cfg(5).
There's no clear limit to apply to the number of emulators so I've not
applied one.

Because of the re-arrangement of the special pages in a previous patch we
only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS to determine
the layout of the shared pages for multiple emulators. Guests migrated in
from hosts without this patch will be lacking the save record which stores
the new parameter and so the guest is assumed to only have had a single
emulator.

Added some more emacs boilerplate to xenctrl.h and xenguest.h

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 docs/man/xl.cfg.pod.5            |    7 +
 tools/libxc/xc_domain.c          |  175 +++++++
 tools/libxc/xc_domain_restore.c  |   20 +
 tools/libxc/xc_domain_save.c     |   12 +
 tools/libxc/xc_hvm_build_x86.c   |   24 +-
 tools/libxc/xenctrl.h            |   51 ++
 tools/libxc/xenguest.h           |   12 +
 tools/libxc/xg_save_restore.h    |    1 +
 tools/libxl/libxl.h              |    8 +
 tools/libxl/libxl_create.c       |    3 +
 tools/libxl/libxl_dom.c          |    1 +
 tools/libxl/libxl_types.idl      |    1 +
 tools/libxl/xl_cmdimpl.c         |    3 +
 xen/arch/x86/hvm/hvm.c           |  964 ++++++++++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/include/asm-x86/hvm/domain.h |   23 +-
 xen/include/asm-x86/hvm/hvm.h    |    3 +-
 xen/include/asm-x86/hvm/io.h     |    2 +-
 xen/include/public/hvm/hvm_op.h  |   70 +++
 xen/include/public/hvm/ioreq.h   |    1 +
 xen/include/public/hvm/params.h  |    4 +-
 21 files changed, 1324 insertions(+), 63 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index e15a49f..0226c55 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
 This parameter only takes effect when device_model_version=qemu-xen.
 See F<docs/misc/pci-device-reservations.txt> for more information.
 
+=item B<secondary_device_emulators=NUMBER>
+
+If a number of secondary device emulators (i.e. in addition to
+qemu-xen or qemu-xen-traditional) are to be invoked to support the
+guest then this parameter can be set with the count of how many are
+to be used. The default value is zero.
+
 =back
 
 =head2 Device-Model Options
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..dfa905b 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
     return rc;
 }
 
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_create_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    rc = do_xen_hypercall(xch, &hypercall);
+    *id = arg->id;
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *pfn,
+                                 xen_pfn_t *buf_pfn,
+                                 evtchn_port_t *buf_port)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( rc != 0 )
+        goto done;
+
+    if ( pfn )
+        *pfn = arg->pfn;
+
+    if ( buf_pfn )
+        *buf_pfn = arg->buf_pfn;
+
+    if ( buf_port )
+        *buf_port = arg->buf_port;
+
+done:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    arg->end = end;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                      ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                          ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 1f6ce50..3116653 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -746,6 +746,7 @@ typedef struct {
     uint64_t acpi_ioport_location;
     uint64_t viridian;
     uint64_t vm_generationid_addr;
+    uint64_t nr_ioreq_servers;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         DPRINTF("read generation id buffer address");
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
+        /* Skip padding 4 bytes then read the acpi ioport location. */
+        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
+        {
+            PERROR("error reading the number of IOREQ servers");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if (pagebuf.viridian != 0)
         xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
 
+    if ( hvm ) {
+        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
+
+        if ( nr_ioreq_servers == 0 )
+            nr_ioreq_servers = 1;
+
+        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS, nr_ioreq_servers);
+    }
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 42c4752..3293e29 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the viridian flag");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the number of IOREQ servers");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index b65e702..6d6328a 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -45,7 +45,7 @@
 #define SPECIALPAGE_IDENT_PT 4
 #define SPECIALPAGE_CONSOLE  5
 #define SPECIALPAGE_IOREQ    6
-#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
+#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */
 #define special_pfn(x) (0xff000u - 1 - (x))
 
 #define VGA_HOLE_SIZE (0x20)
@@ -85,7 +85,8 @@ static int modules_init(struct xc_hvm_build_args *args,
 }
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size)
+                           uint64_t mmio_start, uint64_t mmio_size,
+                           int max_emulators)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -113,7 +114,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
+    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES(max_emulators);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -256,6 +257,10 @@ static int setup_guest(xc_interface *xch,
         stat_1gb_pages = 0;
     int pod_mode = 0;
     int claim_enabled = args->claim_enabled;
+    int max_emulators = args->max_emulators;
+
+    if ( max_emulators < 1 )
+        goto error_out;
 
     if ( nr_pages > target_pages )
         pod_mode = XENMEMF_populate_on_demand;
@@ -468,12 +473,13 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
+    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
+                   max_emulators);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
 
-    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
+    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES(max_emulators));
     DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
             (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
     DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
@@ -486,10 +492,10 @@ static int setup_guest(xc_interface *xch,
             (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
     DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
             (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
-    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
+    DPRINTF("  IOREQ(%02d): %"PRI_xen_pfn"\n", max_emulators * 2,
             (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
 
-    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
     {
         xen_pfn_t pfn = special_pfn(i);
         rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
@@ -515,7 +521,9 @@ static int setup_guest(xc_interface *xch,
     xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
                      special_pfn(SPECIALPAGE_IOREQ));
     xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ) - 1);
+                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
+                     max_emulators);
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 13f816b..84cab13 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/*
+ * IOREQ server API
+ */
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id);
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *pfn,
+                                 xen_pfn_t *buf_pfn,
+                                 evtchn_port_t *buf_port);
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
+                                        domid_t domid,
+                                        ioservid_t id,
+                                        int is_mmio,
+                                        uint64_t start,
+                                        uint64_t end);
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            int is_mmio,
+                                            uint64_t start);
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
+                                      domid_t domid,
+                                      ioservid_t id,
+                                      uint16_t bdf);
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t bdf);
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
@@ -2428,3 +2469,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
 int xc_kexec_unload(xc_interface *xch, int type);
 
 #endif /* XENCTRL_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index a0e30e1..1300933 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -234,6 +234,8 @@ struct xc_hvm_build_args {
     struct xc_hvm_firmware_module smbios_module;
     /* Whether to use claim hypercall (1 - enable, 0 - disable). */
     int claim_enabled;
+    /* Maximum number of emulators for VM */
+    int max_emulators;
 };
 
 /**
@@ -306,3 +308,13 @@ xen_pfn_t *xc_map_m2p(xc_interface *xch,
                       int prot,
                       unsigned long *mfn0);
 #endif /* XENGUEST_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index f859621..5170b7f 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -259,6 +259,7 @@
 #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
 #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
 #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
+#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 06bbca6..5a70b76 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -95,6 +95,14 @@
 #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
 
 /*
+ * LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS indicates that the
+ * max_emulators field is present in the hvm sections of
+ * libxl_domain_build_info. This field can be used to reserve
+ * extra special pages for secondary device emulators.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a604cd8..cce93d9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -330,6 +330,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        if (b_info->u.hvm.max_emulators < 1)
+            b_info->u.hvm.max_emulators = 1;
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 55f74b2..9de06f9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -637,6 +637,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
     args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
     args.claim_enabled = libxl_defbool_val(info->claim_mode);
+    args.max_emulators = info->u.hvm.max_emulators;
     if (libxl__domain_firmware(gc, info, &args)) {
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 649ce50..b707159 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -372,6 +372,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("xen_platform_pci", libxl_defbool),
                                        ("usbdevice_list",   libxl_string_list),
                                        ("vendor_device",    libxl_vendor_device),
+                                       ("max_emulators",    integer),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 4fc46eb..cf9b67d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1750,6 +1750,9 @@ skip_vfb:
 
             b_info->u.hvm.vendor_device = d;
         }
+ 
+        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
+            b_info->u.hvm.max_emulators = l + 1;
     }
 
     xlu_cfg_destroy(config);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 22b2a2c..996c374 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -356,14 +356,22 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
+    struct domain *d = v->domain;
+    struct list_head *entry;
 
-    if ( !s )
-        return 0;
+    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        ioreq_t *p = get_ioreq(s, v->vcpu_id);
 
-    p = get_ioreq(s, v->vcpu_id);
-    return ( p->state != STATE_IOREQ_NONE );
+        p = get_ioreq(s, v->vcpu_id);
+        if ( p->state != STATE_IOREQ_NONE )
+            return 1;
+    }
+
+    return 0;
 }
 
 static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
@@ -393,18 +401,20 @@ static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct list_head *entry;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( s )
+    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
     {
-        ioreq_t *p = get_ioreq(s, v->vcpu_id);
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
 
-        hvm_wait_on_io(d, p);
+        hvm_wait_on_io(d, get_ioreq(s, v->vcpu_id));
     }
 
     /* Inject pending hw/sw trap */
@@ -542,6 +552,83 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int hvm_access_cf8(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *curr = current;
+    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
+    int rc;
+
+    BUG_ON(port < 0xcf8);
+    port -= 0xcf8;
+
+    spin_lock(&hd->pci_lock);
+
+    if ( dir == IOREQ_WRITE )
+    {
+        switch ( bytes )
+        {
+        case 4:
+            hd->pci_cf8 = *val;
+            break;
+
+        case 2:
+        {
+            uint32_t mask = 0xffff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+            
+        case 1:
+        {
+            uint32_t mask = 0xff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+
+        default:
+            break;
+        }
+
+        /* We always need to fall through to the catch all emulator */
+        rc = X86EMUL_UNHANDLEABLE;
+    }
+    else
+    {
+        switch ( bytes )
+        {
+        case 4:
+            *val = hd->pci_cf8;
+            rc = X86EMUL_OKAY;
+            break;
+
+        case 2:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
+            rc = X86EMUL_OKAY;
+            break;
+            
+        case 1:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
+            rc = X86EMUL_OKAY;
+            break;
+
+        default:
+            rc = X86EMUL_UNHANDLEABLE;
+            break;
+        }
+    }
+
+    spin_unlock(&hd->pci_lock);
+
+    return rc;
+}
+
 static int handle_pvh_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -588,7 +675,8 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
             goto done;
 
         s->buf_ioreq_evtchn = rc;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;
+        if ( s->id == 0 )
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;
     }
 
     hvm_update_ioreq_server_evtchn(s);
@@ -606,34 +694,49 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
     free_xen_event_channel(v, v->arch.hvm_vcpu.ioreq_evtchn);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
 {
     struct hvm_ioreq_server *s;
     unsigned long pfn;
     struct vcpu *v;
     int rc;
 
-    if ( d->arch.hvm_domain.ioreq_server != NULL )
-        return -EEXIST;
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    gdprintk(XENLOG_DEBUG, "%s: %d\n", __func__, d->domain_id);
+    rc = -EEXIST;
+    list_for_each_entry ( s, 
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto fail_exist;
+    }
+
+    gdprintk(XENLOG_DEBUG, "%s: %d:%d\n", __func__, d->domain_id, id);
 
     rc = -ENOMEM;
     s = xzalloc(struct hvm_ioreq_server);
     if ( !s )
         goto fail_alloc;
 
+    s->id = id;
     s->domain = d;
     s->domid = domid;
+    INIT_LIST_HEAD(&s->mmio_range_list);
+    INIT_LIST_HEAD(&s->portio_range_list);
+    INIT_LIST_HEAD(&s->pcidev_list);
 
     /* Initialize shared pages */
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
 
     hvm_init_ioreq_page(s, 0);
     if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
         goto fail_set_ioreq;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
 
     hvm_init_ioreq_page(s, 1);
     if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
@@ -647,10 +750,12 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
             goto fail_add_vcpu;
     }
 
-    d->arch.hvm_domain.ioreq_server = s;
+    list_add(&s->list_entry,
+             &d->arch.hvm_domain.ioreq_server_list);
 
     domain_unpause(d);
 
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return 0;
 
  fail_add_vcpu:
@@ -663,23 +768,34 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
  fail_set_ioreq:
     xfree(s);
  fail_alloc:
+ fail_exist:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
     struct hvm_ioreq_server *s;
     struct vcpu *v;
 
-    gdprintk(XENLOG_DEBUG, "%s: %d\n", __func__, d->domain_id);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( !s )
-        return;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry)
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+ found:
+    gdprintk(XENLOG_DEBUG, "%s: %d:%d\n", __func__, d->domain_id, id);
 
     domain_pause(d);
 
-    d->arch.hvm_domain.ioreq_server = NULL;
+    list_del_init(&s->list_entry);
 
     for_each_vcpu ( d, v )
         hvm_ioreq_server_remove_vcpu(s, v);
@@ -689,7 +805,375 @@ static void hvm_destroy_ioreq_server(struct domain *d)
     hvm_destroy_ioreq_page(s, 1);
     hvm_destroy_ioreq_page(s, 0);
 
-    xfree(s);
+    xfree(s);
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static int hvm_get_ioreq_server_buf_port(struct domain *d, ioservid_t id,
+                                         evtchn_port_t *port)
+{
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( s->id == id )
+        {
+            *port = s->buf_ioreq_evtchn;
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_get_ioreq_server_pfn(struct domain *d, ioservid_t id, bool_t buf,
+                                    xen_pfn_t *pfn)
+{
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( s->id == id )
+        {
+            int i = buf ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
+
+            *pfn = d->arch.hvm_domain.params[i] - s->id;
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                            int is_mmio, uint64_t start, uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_io_range *x;
+    struct list_head *list;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    x = xmalloc(struct hvm_io_range);
+    if ( x == NULL )
+        return -ENOMEM;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+ found:
+    INIT_RCU_HEAD(&x->rcu);
+    x->start = start;
+    x->end = end;
+
+    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
+    list_add_rcu(&x->list_entry, list);
+
+    gdprintk(XENLOG_DEBUG, "%d:%d: +%s %"PRIX64" - %"PRIX64"\n",
+             d->domain_id,
+             s->id,
+             ( is_mmio ) ? "MMIO" : "PORTIO",
+             x->start,
+             x->end);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+ fail:
+    xfree(x);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void free_io_range(struct rcu_head *rcu)
+{
+    struct hvm_io_range *x;
+
+    x = container_of (rcu, struct hvm_io_range, rcu);
+
+    xfree(x);
+}
+
+static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                                int is_mmio, uint64_t start)
+{
+    struct hvm_ioreq_server *s;
+    struct list_head *list, *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+ found:
+    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
+
+    list_for_each ( entry,
+                    list )
+    {
+        struct hvm_io_range *x = list_entry(entry,
+                                            struct hvm_io_range,
+                                            list_entry);
+
+        if ( start == x->start )
+        {
+            gdprintk(XENLOG_DEBUG, "%d:%d: -%s %"PRIX64" - %"PRIX64"\n",
+                     d->domain_id,
+                     s->id,
+                     ( is_mmio ) ? "MMIO" : "PORTIO",
+                     x->start,
+                     x->end);
+
+            list_del_rcu(&x->list_entry);
+            call_rcu(&x->rcu, free_io_range);
+
+            rc = 0;
+            break;
+        }
+    }
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t id,
+                                          uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_pcidev *x;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    x = xmalloc(struct hvm_pcidev);
+    if ( x == NULL )
+        return -ENOMEM;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+ found:
+    INIT_RCU_HEAD(&x->rcu);
+    x->bdf = bdf;
+
+    list_add_rcu(&x->list_entry, &s->pcidev_list);
+
+    gdprintk(XENLOG_DEBUG, "%d:%d: +PCIDEV %04X\n",
+             d->domain_id,
+             s->id,
+             x->bdf);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+ fail:
+    xfree(x);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void free_pcidev(struct rcu_head *rcu)
+{
+    struct hvm_pcidev *x;
+
+    x = container_of (rcu, struct hvm_pcidev, rcu);
+
+    xfree(x);
+}
+
+static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
+                                              uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+ found:
+    list_for_each ( entry,
+                    &s->pcidev_list )
+    {
+        struct hvm_pcidev *x = list_entry(entry,
+                                          struct hvm_pcidev,
+                                          list_entry);
+
+        if ( bdf == x->bdf )
+        {
+            gdprintk(XENLOG_DEBUG, "%d:%d: -PCIDEV %04X\n",
+                     d->domain_id,
+                     s->id,
+                     x->bdf);
+
+            list_del_rcu(&x->list_entry);
+            call_rcu(&x->rcu, free_pcidev);
+
+            rc = 0;
+            break;
+        }
+    }
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+ fail:
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    ioservid_t id;
+
+    for ( id = 0;
+          id < d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
+          id++ )
+        hvm_destroy_ioreq_server(d, id);
 }
 
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
@@ -707,18 +1191,31 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
-static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
+static int hvm_set_ioreq_server_domid(struct domain *d, ioservid_t id, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     struct vcpu *v;
     int rc;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
     domain_pause(d);
 
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
     rc = -ENOENT;
-    if ( !s )
-        goto done;
+    goto done;
 
+ found:
     rc = 0;
     if ( s->domid == domid )
         goto done;
@@ -734,7 +1231,8 @@ static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
             if ( rc )
                 goto done;
 
-            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;
+            if ( s->id == 0 )
+                d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;
         }
     }
 
@@ -746,6 +1244,8 @@ static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
  done:
     domain_unpause(d);
 
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
     return rc;
 }
 
@@ -776,6 +1276,9 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
+    spin_lock_init(&d->arch.hvm_domain.pci_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -817,6 +1320,7 @@ int hvm_domain_initialise(struct domain *d)
     rtc_init(d);
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
@@ -847,7 +1351,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_server(d);
+    hvm_destroy_all_ioreq_servers(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1479,7 +1983,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1522,12 +2025,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    if ( s )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc != 0 )
-            goto fail6;
-    }
+    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
+    if ( rc != 0 )
+        goto fail6;
 
     if ( v->vcpu_id == 0 )
     {
@@ -1563,10 +2063,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
-    if ( s )
-        hvm_ioreq_server_remove_vcpu(s, v);
+    hvm_all_ioreq_servers_remove_vcpu(d, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1605,9 +2103,110 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
+static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
+
+static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                        ioreq_t *p)
+{
+#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
+
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc )
+    { 
+        /* PCI config data cycle */
+        type = IOREQ_TYPE_PCI_CONFIG;
+
+        spin_lock(&d->arch.hvm_domain.pci_lock);
+        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
+        spin_unlock(&d->arch.hvm_domain.pci_lock);
+    }
+    else
+    {
+        type = p->type;
+        addr = p->addr;
+    }
+
+    rcu_read_lock(&ioreq_server_rcu_lock);
+
+    switch ( type )
+    {
+    case IOREQ_TYPE_COPY:
+    case IOREQ_TYPE_PIO:
+    case IOREQ_TYPE_PCI_CONFIG:
+        break;
+    default:
+        goto done;
+    }
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        switch ( type )
+        {
+            case IOREQ_TYPE_COPY:
+            case IOREQ_TYPE_PIO: {
+                struct list_head *list;
+                struct hvm_io_range *x;
+
+                list = ( type == IOREQ_TYPE_COPY ) ?
+                    &s->mmio_range_list :
+                    &s->portio_range_list;
+
+                list_for_each_entry ( x,
+                                      list,
+                                      list_entry )
+                {
+                    if ( (addr >= x->start) && (addr <= x->end) )
+                        goto found;
+                }
+                break;
+            }
+            case IOREQ_TYPE_PCI_CONFIG: {
+                struct hvm_pcidev *x;
+
+                list_for_each_entry ( x,
+                                      &s->pcidev_list,
+                                      list_entry )
+                {
+                    if ( BDF(addr) == x->bdf ) {
+                        p->type = type;
+                        p->addr = addr;
+                        goto found;
+                    }
+                }
+                break;
+            }
+        }
+    }
+
+ done:
+    /* The catch-all server has id 0 */
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == 0 )
+            goto found;
+    }
+
+    s = NULL;
+
+ found:
+    rcu_read_unlock(&ioreq_server_rcu_lock);
+
+    return s;
+
+#undef BDF
+}
+
+int hvm_buffered_io_send(struct domain *d, ioreq_t *p)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     struct hvm_ioreq_page *iorp;
     buffered_iopage_t *pg;
     buf_ioreq_t bp;
@@ -1617,6 +2216,7 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    s = hvm_select_ioreq_server(d, p);
     if ( !s )
         return 0;
 
@@ -1689,7 +2289,7 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq_server;
+    return !list_empty(&d->arch.hvm_domain.ioreq_server_list);
 }
 
 static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
@@ -1728,20 +2328,36 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
     return 1;
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p)
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
 {
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0;
 
+    s = hvm_select_ioreq_server(v->domain, p);
     if ( !s )
         return 0;
 
     return hvm_send_assist_req_to_server(s, v, p);
 }
 
+void hvm_broadcast_assist_req(struct vcpu *v, const ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct list_head *entry;
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        (void) hvm_send_assist_req_to_server(s, v, p);
+    }
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4330,6 +4946,215 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static int hvmop_create_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
+{
+    struct domain *curr_d = current->domain;
+    xen_hvm_create_ioreq_server_t op;
+    struct domain *d;
+    ioservid_t id;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = -ENOSPC;
+    for ( id = 1;
+          id <  d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
+          id++ )
+    {
+        rc = hvm_create_ioreq_server(d, id, curr_d->domain_id);
+        if ( rc == -EEXIST )
+            continue;
+
+        break;
+    }
+
+    if ( rc == -EEXIST )
+        rc = -ENOSPC;
+
+    if ( rc != 0 )
+        goto out;
+
+    op.id = id;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_get_ioreq_server_info(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
+{
+    xen_hvm_get_ioreq_server_info_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 0, &op.pfn)) < 0 )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 1, &op.buf_pfn)) < 0 )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_buf_port(d, op.id, &op.buf_port)) < 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_io_range_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_io_range_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
+                                          op.start, op.end);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_io_range_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_io_range_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
+                                              op.start);
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_pcidev_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_pcidev_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_pcidev_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_destroy_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
+{
+    xen_hvm_destroy_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    hvm_destroy_ioreq_server(d, op.id);
+    rc = 0;
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4338,6 +5163,41 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( op )
     {
+    case HVMOP_create_ioreq_server:
+        rc = hvmop_create_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
+        break;
+    
+    case HVMOP_get_ioreq_server_info:
+        rc = hvmop_get_ioreq_server_info(
+            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
+        break;
+    
+    case HVMOP_map_io_range_to_ioreq_server:
+        rc = hvmop_map_io_range_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
+        break;
+    
+    case HVMOP_unmap_io_range_from_ioreq_server:
+        rc = hvmop_unmap_io_range_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
+        break;
+    
+    case HVMOP_map_pcidev_to_ioreq_server:
+        rc = hvmop_map_pcidev_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
+        break;
+    
+    case HVMOP_unmap_pcidev_from_ioreq_server:
+        rc = hvmop_unmap_pcidev_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
+        break;
+    
+    case HVMOP_destroy_ioreq_server:
+        rc = hvmop_destroy_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
+        break;
+    
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -4426,9 +5286,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_create_ioreq_server(d, a.value);
+                rc = hvm_create_ioreq_server(d, 0, a.value);
                 if ( rc == -EEXIST )
-                    rc = hvm_set_ioreq_server_domid(d, a.value);
+                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4493,6 +5353,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value > SHUTDOWN_MAX )
                     rc = -EINVAL;
                 break;
+            case HVM_PARAM_NR_IOREQ_SERVERS:
+                if ( d == current->domain )
+                    rc = -EPERM;
+                break;
             }
 
             if ( rc == 0 ) 
@@ -4530,7 +5394,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN:
                 /* May need to create server */
-                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
+                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 7aac61d..cf87d49 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -76,7 +76,7 @@ void send_invalidate_req(void)
         .data = ~0UL, /* flush all */
     };
 
-    (void)hvm_send_assist_req(curr, &p);
+    hvm_broadcast_assist_req(curr, &p);
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d8dbfab..0ed2bb2 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -41,16 +41,37 @@ struct hvm_ioreq_page {
     void *va;
 };
 
+struct hvm_io_range {
+    struct list_head    list_entry;
+    uint64_t            start, end;
+    struct rcu_head     rcu;
+};	
+
+struct hvm_pcidev {
+    struct list_head    list_entry;
+    uint16_t            bdf;
+    struct rcu_head     rcu;
+};	
+
 struct hvm_ioreq_server {
+    struct list_head       list_entry;
+    ioservid_t             id;
     struct domain          *domain;
     domid_t                domid; /* domid of emulator */
     struct hvm_ioreq_page  ioreq;
     struct hvm_ioreq_page  buf_ioreq;
     evtchn_port_t          buf_ioreq_evtchn;
+    struct list_head       mmio_range_list;
+    struct list_head       portio_range_list;
+    struct list_head       pcidev_list;
 };
 
 struct hvm_domain {
-    struct hvm_ioreq_server *ioreq_server;
+    struct list_head        ioreq_server_list;
+    spinlock_t              ioreq_server_lock;
+    uint32_t                pci_cf8;
+    spinlock_t              pci_lock;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 08a62ea..6de334b 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -228,7 +228,8 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
+void hvm_broadcast_assist_req(struct vcpu *v, const ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index bfd28c2..be6546d 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -92,7 +92,7 @@ static inline int hvm_buffered_io_intercept(ioreq_t *p)
 }
 
 int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(struct domain *d, const ioreq_t *p);
+int hvm_buffered_io_send(struct domain *d, ioreq_t *p);
 
 static inline void register_portio_handler(
     struct domain *d, unsigned long addr,
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a9aab4b..6b31189 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,7 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param           0
@@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
 typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
 
+typedef uint32_t ioservid_t;
+
+DEFINE_XEN_GUEST_HANDLE(ioservid_t);
+
+#define HVMOP_create_ioreq_server 17
+struct xen_hvm_create_ioreq_server {
+    domid_t domid;  /* IN - domain to be serviced */
+    ioservid_t id;  /* OUT - server id */
+};
+typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
+
+#define HVMOP_get_ioreq_server_info 18
+struct xen_hvm_get_ioreq_server_info {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+    xen_pfn_t pfn;          /* OUT - ioreq pfn */
+    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
+    evtchn_port_t buf_port; /* OUT - buf ioreq port */
+};
+typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
+
+#define HVMOP_map_io_range_to_ioreq_server 19
+struct xen_hvm_map_io_range_to_ioreq_server {
+    domid_t domid;                  /* IN - domain to be serviced */
+    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
+    int is_mmio;                    /* IN - MMIO or port IO? */
+    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
+
+#define HVMOP_unmap_io_range_from_ioreq_server 20
+struct xen_hvm_unmap_io_range_from_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    uint8_t is_mmio;        /* IN - MMIO or port IO? */
+    uint64_aligned_t start; /* IN - start address of the range to remove */
+};
+typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
+
+#define HVMOP_map_pcidev_to_ioreq_server 21
+struct xen_hvm_map_pcidev_to_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
+
+#define HVMOP_unmap_pcidev_from_ioreq_server 22
+struct xen_hvm_unmap_pcidev_from_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
+
+#define HVMOP_destroy_ioreq_server 23
+struct xen_hvm_destroy_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+};
+typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index f05d130..e84fa75 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -34,6 +34,7 @@
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
+#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 517a184..4109b11 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -145,6 +145,8 @@
 /* SHUTDOWN_* action in case of a triple fault */
 #define HVM_PARAM_TRIPLE_FAULT_REASON 31
 
-#define HVM_NR_PARAMS          32
+#define HVM_PARAM_NR_IOREQ_SERVERS 32
+
+#define HVM_NR_PARAMS          33
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
                   ` (4 preceding siblings ...)
  2014-03-05 14:48 ` [PATCH v3 5/6] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-03-05 14:48 ` Paul Durrant
  2014-03-14 11:57   ` Ian Campbell
  2014-03-10 18:57 ` [PATCH v3 0/6] Support for running secondary emulators George Dunlap
  2014-03-14 11:02 ` Ian Campbell
  7 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-05 14:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

Because we may now have more than one emulator, the implementation of the
PCI hotplug controller needs to be done by Xen. Happily the code is very
short and simple and it also removes the need for a different ACPI DSDT
when using different variants of QEMU.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 tools/firmware/hvmloader/acpi/mk_dsdt.c |  147 ++++------------------
 tools/libxc/xc_domain.c                 |   46 +++++++
 tools/libxc/xenctrl.h                   |   11 ++
 tools/libxl/libxl_pci.c                 |   15 +++
 xen/arch/x86/hvm/Makefile               |    1 +
 xen/arch/x86/hvm/hotplug.c              |  207 +++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c                  |   40 +++++-
 xen/include/asm-x86/hvm/domain.h        |   12 ++
 xen/include/asm-x86/hvm/io.h            |    6 +
 xen/include/public/hvm/hvm_op.h         |    9 ++
 xen/include/public/hvm/ioreq.h          |    2 +
 11 files changed, 373 insertions(+), 123 deletions(-)
 create mode 100644 xen/arch/x86/hvm/hotplug.c

diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
index a4b693b..6408b44 100644
--- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
+++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
@@ -58,28 +58,6 @@ static void pop_block(void)
     printf("}\n");
 }
 
-static void pci_hotplug_notify(unsigned int slt)
-{
-    stmt("Notify", "\\_SB.PCI0.S%02X, EVT", slt);
-}
-
-static void decision_tree(
-    unsigned int s, unsigned int e, char *var, void (*leaf)(unsigned int))
-{
-    if ( s == (e-1) )
-    {
-        (*leaf)(s);
-        return;
-    }
-
-    push_block("If", "And(%s, 0x%02x)", var, (e-s)/2);
-    decision_tree((s+e)/2, e, var, leaf);
-    pop_block();
-    push_block("Else", NULL);
-    decision_tree(s, (s+e)/2, var, leaf);
-    pop_block();
-}
-
 static struct option options[] = {
     { "maxcpu", 1, 0, 'c' },
     { "dm-version", 1, 0, 'q' },
@@ -322,64 +300,21 @@ int main(int argc, char **argv)
                    dev, intx, ((dev*4+dev/8+intx)&31)+16);
     printf("})\n");
 
-    /*
-     * Each PCI hotplug slot needs at least two methods to handle
-     * the ACPI event:
-     *  _EJ0: eject a device
-     *  _STA: return a device's status, e.g. enabled or removed
-     * 
-     * Eject button would generate a general-purpose event, then the
-     * control method for this event uses Notify() to inform OSPM which
-     * action happened and on which device.
-     *
-     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
-     * in ACPI spec 3.0b for details.
-     *
-     * QEMU provides a simple hotplug controller with some I/O to handle
-     * the hotplug action and status, which is beyond the ACPI scope.
-     */
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        for ( slot = 0; slot < 0x100; slot++ )
-        {
-            push_block("Device", "S%02X", slot);
-            /* _ADR == dev:fn (16:16) */
-            stmt("Name", "_ADR, 0x%08x", ((slot & ~7) << 13) | (slot & 7));
-            /* _SUN == dev */
-            stmt("Name", "_SUN, 0x%08x", slot >> 3);
-            push_block("Method", "_EJ0, 1");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x88, \\_GPE.DPT2");
-            stmt("Store", "0x%02x, \\_GPE.PH%02X", /* eject */
-                 (slot & 1) ? 0x10 : 0x01, slot & ~1);
-            pop_block();
-            push_block("Method", "_STA, 0");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x89, \\_GPE.DPT2");
-            if ( slot & 1 )
-                stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
-            else
-                stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
-            stmt("Return", "Local1"); /* IN status as the _STA */
-            pop_block();
-            pop_block();
-        }
-    } else {
-        stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
-        push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("B0EJ, 32,\n");
-        pop_block();
+    stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
+    push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("B0EJ, 32,\n");
+    pop_block();
 
-        /* hotplug_slot */
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("Device", "S%i", slot); {
-                stmt("Name", "_ADR, %#06x0000", slot);
-                push_block("Method", "_EJ0,1"); {
-                    stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
-                    stmt("Return", "0x0");
-                } pop_block();
-                stmt("Name", "_SUN, %i", slot);
+    /* hotplug_slot */
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("Device", "S%i", slot); {
+            stmt("Name", "_ADR, %#06x0000", slot);
+            push_block("Method", "_EJ0,1"); {
+                stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
+                stmt("Return", "0x0");
             } pop_block();
-        }
+            stmt("Name", "_SUN, %i", slot);
+        } pop_block();
     }
 
     pop_block();
@@ -389,26 +324,11 @@ int main(int argc, char **argv)
     /**** GPE start ****/
     push_block("Scope", "\\_GPE");
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        stmt("OperationRegion", "PHP, SystemIO, 0x10c0, 0x82");
-
-        push_block("Field", "PHP, ByteAcc, NoLock, Preserve");
-        indent(); printf("PSTA, 8,\n"); /* hotplug controller event reg */
-        indent(); printf("PSTB, 8,\n"); /* hotplug controller slot reg */
-        for ( slot = 0; slot < 0x100; slot += 2 )
-        {
-            indent();
-            /* Each hotplug control register manages a pair of pci functions. */
-            printf("PH%02X, 8,\n", slot);
-        }
-        pop_block();
-    } else {
-        stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
-        push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("PCIU, 32,\n");
-        indent(); printf("PCID, 32,\n");
-        pop_block();
-    }
+    stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
+    push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("PCIU, 32,\n");
+    indent(); printf("PCID, 32,\n");
+    pop_block();
 
     stmt("OperationRegion", "DG1, SystemIO, 0xb044, 0x04");
 
@@ -416,33 +336,16 @@ int main(int argc, char **argv)
     indent(); printf("DPT1, 8, DPT2, 8\n");
     pop_block();
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L03, 0, Serialized");
-        /* Detect slot and event (remove/add). */
-        stmt("Name", "SLT, 0x0");
-        stmt("Name", "EVT, 0x0");
-        stmt("Store", "PSTA, Local1");
-        stmt("And", "Local1, 0xf, EVT");
-        stmt("Store", "PSTB, Local1"); /* XXX: Store (PSTB, SLT) ? */
-        stmt("And", "Local1, 0xff, SLT");
-        /* Debug */
-        stmt("Store", "SLT, DPT1");
-        stmt("Store", "EVT, DPT2");
-        /* Decision tree */
-        decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify);
+    push_block("Method", "_E01");
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
         pop_block();
-    } else {
-        push_block("Method", "_E01");
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
-            pop_block();
-            push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
-            pop_block();
-        }
+        push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
         pop_block();
     }
+    pop_block();
 
     pop_block();
     /**** GPE end ****/
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index dfa905b..5b49316 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1459,6 +1459,52 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_pci_hotplug_enable(xc_interface *xch,
+                              domid_t domid,
+                              uint32_t slot)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->enable = 1;
+    arg->slot = slot;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_pci_hotplug_disable(xc_interface *xch,
+                               domid_t domid,
+                               uint32_t slot)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->enable = 0;
+    arg->slot = slot;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 84cab13..b9c9849 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1842,6 +1842,17 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
                                 domid_t domid,
                                 ioservid_t id);
 
+/*
+ * PCI hotplug API
+ */
+int xc_hvm_pci_hotplug_enable(xc_interface *xch,
+                              domid_t domid,
+                              uint32_t slot);
+
+int xc_hvm_pci_hotplug_disable(xc_interface *xch,
+                               domid_t domid,
+                               uint32_t slot);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 2e52470..4176440 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
         }
         if ( rc )
             return ERROR_FAIL;
+
+        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug_enable failed");
+            return ERROR_FAIL;
+        }
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
@@ -1182,6 +1189,14 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
                                          NULL, NULL, NULL) < 0)
             goto out_fail;
 
+        rc = xc_hvm_pci_hotplug_disable(ctx->xch, domid, pcidev->dev);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
+                             "Error: xc_hvm_pci_hotplug_disable failed");
+            rc = ERROR_FAIL;
+            goto out_fail;
+        }
+
         switch (libxl__device_model_version_running(gc, domid)) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
             rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..48efddb 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -3,6 +3,7 @@ subdir-y += vmx
 
 obj-y += asid.o
 obj-y += emulate.o
+obj-y += hotplug.o
 obj-y += hpet.o
 obj-y += hvm.o
 obj-y += i8254.o
diff --git a/xen/arch/x86/hvm/hotplug.c b/xen/arch/x86/hvm/hotplug.c
new file mode 100644
index 0000000..397c50a
--- /dev/null
+++ b/xen/arch/x86/hvm/hotplug.c
@@ -0,0 +1,207 @@
+/*
+ * hvm/hotplug.c
+ *
+ * Copyright (c) 2014, Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/xmalloc.h>
+#include <asm/hvm/io.h>
+#include <asm/hvm/support.h>
+
+#define SCI_IRQ 9
+
+#define GPE_BASE            (ACPI_GPE0_BLK_ADDRESS_V1)
+#define GPE_LEN             (ACPI_GPE0_BLK_LEN_V1)
+
+#define GPE_PCI_HOTPLUG_STATUS  2
+
+#define PCI_HOTPLUG_BASE    (ACPI_PCI_HOTPLUG_ADDRESS_V1)
+#define PCI_HOTPLUG_LEN     (ACPI_PCI_HOTPLUG_LEN_V1)
+
+#define PCI_UP      0
+#define PCI_DOWN    4
+#define PCI_EJECT   8
+
+static void gpe_update_sci(struct hvm_hotplug *hp)
+{
+    if ( (hp->gpe_sts[0] & hp->gpe_en[0]) & GPE_PCI_HOTPLUG_STATUS )
+        hvm_isa_irq_assert(hp->domain, SCI_IRQ);
+    else
+        hvm_isa_irq_deassert(hp->domain, SCI_IRQ);
+}
+
+static int handle_gpe_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= GPE_BASE;
+
+    if ( dir == IOREQ_READ )
+    {
+        if ( port < GPE_LEN / 2 )
+        {
+            *val = hp->gpe_sts[port];
+        }
+        else
+        {
+            port -= GPE_LEN / 2;
+            *val = hp->gpe_en[port];
+        }
+    } else {
+        if ( port < GPE_LEN / 2 )
+        {
+            hp->gpe_sts[port] &= ~*val;
+        }
+        else
+        {
+            port -= GPE_LEN / 2;
+            hp->gpe_en[port] = *val;
+        }
+
+        gpe_update_sci(hp);
+    }
+
+ done:
+    return X86EMUL_OKAY;
+}
+
+static void pci_hotplug_eject(struct hvm_hotplug *hp, uint32_t mask)
+{
+    int slot = ffs(mask) - 1;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, slot);
+
+    hp->slot_down &= ~(1u  << slot);
+    hp->slot_up &= ~(1u  << slot);
+}
+
+static int handle_pci_hotplug_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 4 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= PCI_HOTPLUG_BASE;
+
+    if ( dir == IOREQ_READ )
+    {
+        switch ( port )
+        {
+        case PCI_UP:
+            *val = hp->slot_up;
+            break;
+        case PCI_DOWN:
+            *val = hp->slot_down;
+            break;
+        default:
+            break;
+        }
+    }
+    else
+    {   
+        switch ( port )
+        {
+        case PCI_EJECT:
+            pci_hotplug_eject(hp, *val);
+            break;
+        default:
+            break;
+        }
+    }
+
+ done:
+    return X86EMUL_OKAY;
+}
+
+void pci_hotplug(struct domain *d, int slot, bool_t enable)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    gdprintk(XENLOG_INFO, "%s: %s %d\n", __func__,
+             ( enable ) ? "enable" : "disable", slot);
+
+    if ( enable )
+        hp->slot_up |= (1u << slot);
+    else
+        hp->slot_down |= (1u << slot);
+
+    hp->gpe_sts[0] |= GPE_PCI_HOTPLUG_STATUS;
+    gpe_update_sci(hp);
+}
+
+int gpe_init(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    hp->domain = d;
+
+    hp->gpe_sts = xzalloc_array(uint8_t, GPE_LEN / 2);
+    if ( hp->gpe_sts == NULL )
+        goto fail1;
+
+    hp->gpe_en = xzalloc_array(uint8_t, GPE_LEN / 2);
+    if ( hp->gpe_en == NULL )
+        goto fail2;
+
+    register_portio_handler(d, GPE_BASE, GPE_LEN, handle_gpe_io);
+    register_portio_handler(d, PCI_HOTPLUG_BASE, PCI_HOTPLUG_LEN,
+                            handle_pci_hotplug_io);
+
+    return 0;
+
+ fail2:
+    xfree(hp->gpe_sts);
+
+ fail1:
+    return -ENOMEM;
+}
+
+void gpe_deinit(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    xfree(hp->gpe_en);
+    xfree(hp->gpe_sts);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * c-tab-always-indent: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 996c374..507f4ea 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1319,15 +1319,21 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
+    rc = gpe_init(d);
+    if ( rc != 0 )
+        goto fail2;
+
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    gpe_deinit(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1373,6 +1379,7 @@ void hvm_domain_destroy(struct domain *d)
         return;
 
     hvm_funcs.domain_destroy(d);
+    gpe_deinit(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
@@ -5155,6 +5162,32 @@ static int hvmop_destroy_ioreq_server(
     return rc;
 }
 
+static int hvmop_pci_hotplug(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_pci_hotplug_t) uop)
+{
+    xen_hvm_pci_hotplug_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    pci_hotplug(d, op.slot, op.enable);
+    rc = 0;
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -5198,6 +5231,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
         break;
     
+    case HVMOP_pci_hotplug:
+        rc = hvmop_pci_hotplug(
+            guest_handle_cast(arg, xen_hvm_pci_hotplug_t));
+        break;
+
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 0ed2bb2..76135a7 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -66,12 +66,24 @@ struct hvm_ioreq_server {
     struct list_head       pcidev_list;
 };
 
+struct hvm_hotplug {
+    struct domain   *domain;
+    uint8_t         *gpe_sts;
+    uint8_t         *gpe_en;
+
+    /* PCI hotplug */
+    uint32_t        slot_up;
+    uint32_t        slot_down;
+};
+
 struct hvm_domain {
     struct list_head        ioreq_server_list;
     spinlock_t              ioreq_server_lock;
     uint32_t                pci_cf8;
     spinlock_t              pci_lock;
 
+    struct hvm_hotplug      hotplug;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index be6546d..b30a50d 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+int gpe_init(struct domain *d);
+void gpe_deinit(struct domain *d);
+
+void pci_hotplug(struct domain *d, int slot, bool_t enable);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 6b31189..20a53ab 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -340,6 +340,15 @@ struct xen_hvm_destroy_ioreq_server {
 typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
 
+#define HVMOP_pci_hotplug 24
+struct xen_hvm_pci_hotplug {
+    domid_t domid;          /* IN - domain to be serviced */
+    uint8_t enable;         /* IN - enable or disable? */
+    uint32_t slot;          /* IN - slot to enable/disable */
+};
+typedef struct xen_hvm_pci_hotplug xen_hvm_pci_hotplug_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_pci_hotplug_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index e84fa75..40bfa61 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -101,6 +101,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
 #define ACPI_GPE0_BLK_LEN_V1         0x04
+#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
+#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
 
 /* Compatibility definitions for the default location (version 0). */
 #define ACPI_PM1A_EVT_BLK_ADDRESS    ACPI_PM1A_EVT_BLK_ADDRESS_V0
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-05 14:47 ` [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
@ 2014-03-10 15:43   ` George Dunlap
  2014-03-10 15:46     ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-03-10 15:43 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com> wrote:
> This patch tidies up various occurences of single element ioreq_t
> arrays on the stack and improves coding style.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Maybe I missed this in the earlier discussion, but why is most of this
not integrated into patch 1?

 -George

> ---
>  xen/arch/x86/hvm/emulate.c |   36 ++++++++++++++++++------------------
>  xen/arch/x86/hvm/hvm.c     |    2 ++
>  xen/arch/x86/hvm/io.c      |   37 +++++++++++++++++--------------------
>  3 files changed, 37 insertions(+), 38 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 0ba2020..1c71902 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -57,7 +57,7 @@ static int hvmemul_do_io(
>      int value_is_ptr = (p_data == NULL);
>      struct vcpu *curr = current;
>      struct hvm_vcpu_io *vio;
> -    ioreq_t p[1];
> +    ioreq_t p;
>      unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
>      p2m_type_t p2mt;
>      struct page_info *ram_page;
> @@ -171,38 +171,38 @@ static int hvmemul_do_io(
>      if ( vio->mmio_retrying )
>          *reps = 1;
>
> -    p->dir = dir;
> -    p->data_is_ptr = value_is_ptr;
> -    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> -    p->size = size;
> -    p->addr = addr;
> -    p->count = *reps;
> -    p->df = df;
> -    p->data = value;
> +    p.dir = dir;
> +    p.data_is_ptr = value_is_ptr;
> +    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> +    p.size = size;
> +    p.addr = addr;
> +    p.count = *reps;
> +    p.df = df;
> +    p.data = value;
>
>      if ( dir == IOREQ_WRITE )
> -        hvmtrace_io_assist(is_mmio, p);
> +        hvmtrace_io_assist(is_mmio, &p);
>
>      if ( is_mmio )
>      {
> -        rc = hvm_mmio_intercept(p);
> +        rc = hvm_mmio_intercept(&p);
>          if ( rc == X86EMUL_UNHANDLEABLE )
> -            rc = hvm_buffered_io_intercept(p);
> +            rc = hvm_buffered_io_intercept(&p);
>      }
>      else
>      {
> -        rc = hvm_portio_intercept(p);
> +        rc = hvm_portio_intercept(&p);
>      }
>
>      switch ( rc )
>      {
>      case X86EMUL_OKAY:
>      case X86EMUL_RETRY:
> -        *reps = p->count;
> -        p->state = STATE_IORESP_READY;
> +        *reps = p.count;
> +        p.state = STATE_IORESP_READY;
>          if ( !vio->mmio_retry )
>          {
> -            hvm_io_assist(p);
> +            hvm_io_assist(&p);
>              vio->io_state = HVMIO_none;
>          }
>          else
> @@ -219,7 +219,7 @@ static int hvmemul_do_io(
>          else
>          {
>              rc = X86EMUL_RETRY;
> -            if ( !hvm_send_assist_req(curr, p) )
> +            if ( !hvm_send_assist_req(curr, &p) )
>                  vio->io_state = HVMIO_none;
>              else if ( p_data == NULL )
>                  rc = X86EMUL_OKAY;
> @@ -238,7 +238,7 @@ static int hvmemul_do_io(
>
>   finish_access:
>      if ( dir == IOREQ_READ )
> -        hvmtrace_io_assist(is_mmio, p);
> +        hvmtrace_io_assist(is_mmio, &p);
>
>      if ( p_data != NULL )
>          memcpy(p_data, &vio->io_data, size);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0b2e57e..10b8e8c 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -349,7 +349,9 @@ static ioreq_t *get_ioreq(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
>      shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> +
>      ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> +
>      return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
>  }
>
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index ba50c53..7aac61d 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -49,22 +49,19 @@
>  void send_timeoffset_req(unsigned long timeoff)
>  {
>      struct vcpu *curr = current;
> -    ioreq_t p[1];
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_TIMEOFFSET,
> +        .size = 8,
> +        .count = 1,
> +        .dir = IOREQ_WRITE,
> +        .data = timeoff,
> +        .state = STATE_IOREQ_READY,
> +    };
>
>      if ( timeoff == 0 )
>          return;
>
> -    memset(p, 0, sizeof(*p));
> -
> -    p->type = IOREQ_TYPE_TIMEOFFSET;
> -    p->size = 8;
> -    p->count = 1;
> -    p->dir = IOREQ_WRITE;
> -    p->data = timeoff;
> -
> -    p->state = STATE_IOREQ_READY;
> -
> -    if ( !hvm_buffered_io_send(curr->domain, p) )
> +    if ( !hvm_buffered_io_send(curr->domain, &p) )
>          printk("Unsuccessful timeoffset update\n");
>  }
>
> @@ -72,14 +69,14 @@ void send_timeoffset_req(unsigned long timeoff)
>  void send_invalidate_req(void)
>  {
>      struct vcpu *curr = current;
> -    ioreq_t p[1];
> -
> -    p->type = IOREQ_TYPE_INVALIDATE;
> -    p->size = 4;
> -    p->dir = IOREQ_WRITE;
> -    p->data = ~0UL; /* flush all */
> -
> -    (void)hvm_send_assist_req(curr, p);
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_INVALIDATE,
> +        .size = 4,
> +        .dir = IOREQ_WRITE,
> +        .data = ~0UL, /* flush all */
> +    };
> +
> +    (void)hvm_send_assist_req(curr, &p);
>  }
>
>  int handle_mmio(void)
> --
> 1.7.10.4
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-10 15:43   ` George Dunlap
@ 2014-03-10 15:46     ` Paul Durrant
  2014-03-10 15:53       ` George Dunlap
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-10 15:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 10 March 2014 15:43
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
> 
> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > This patch tidies up various occurences of single element ioreq_t
> > arrays on the stack and improves coding style.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Maybe I missed this in the earlier discussion, but why is most of this
> not integrated into patch 1?
> 

It was a patch that was added after the v1 RFC patch series. I wanted to keep it separate to avoid making patch 1 massively different to what it was before.

  Paul

>  -George
> 
> > ---
> >  xen/arch/x86/hvm/emulate.c |   36 ++++++++++++++++++------------------
> >  xen/arch/x86/hvm/hvm.c     |    2 ++
> >  xen/arch/x86/hvm/io.c      |   37 +++++++++++++++++--------------------
> >  3 files changed, 37 insertions(+), 38 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> > index 0ba2020..1c71902 100644
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -57,7 +57,7 @@ static int hvmemul_do_io(
> >      int value_is_ptr = (p_data == NULL);
> >      struct vcpu *curr = current;
> >      struct hvm_vcpu_io *vio;
> > -    ioreq_t p[1];
> > +    ioreq_t p;
> >      unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
> >      p2m_type_t p2mt;
> >      struct page_info *ram_page;
> > @@ -171,38 +171,38 @@ static int hvmemul_do_io(
> >      if ( vio->mmio_retrying )
> >          *reps = 1;
> >
> > -    p->dir = dir;
> > -    p->data_is_ptr = value_is_ptr;
> > -    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> > -    p->size = size;
> > -    p->addr = addr;
> > -    p->count = *reps;
> > -    p->df = df;
> > -    p->data = value;
> > +    p.dir = dir;
> > +    p.data_is_ptr = value_is_ptr;
> > +    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> > +    p.size = size;
> > +    p.addr = addr;
> > +    p.count = *reps;
> > +    p.df = df;
> > +    p.data = value;
> >
> >      if ( dir == IOREQ_WRITE )
> > -        hvmtrace_io_assist(is_mmio, p);
> > +        hvmtrace_io_assist(is_mmio, &p);
> >
> >      if ( is_mmio )
> >      {
> > -        rc = hvm_mmio_intercept(p);
> > +        rc = hvm_mmio_intercept(&p);
> >          if ( rc == X86EMUL_UNHANDLEABLE )
> > -            rc = hvm_buffered_io_intercept(p);
> > +            rc = hvm_buffered_io_intercept(&p);
> >      }
> >      else
> >      {
> > -        rc = hvm_portio_intercept(p);
> > +        rc = hvm_portio_intercept(&p);
> >      }
> >
> >      switch ( rc )
> >      {
> >      case X86EMUL_OKAY:
> >      case X86EMUL_RETRY:
> > -        *reps = p->count;
> > -        p->state = STATE_IORESP_READY;
> > +        *reps = p.count;
> > +        p.state = STATE_IORESP_READY;
> >          if ( !vio->mmio_retry )
> >          {
> > -            hvm_io_assist(p);
> > +            hvm_io_assist(&p);
> >              vio->io_state = HVMIO_none;
> >          }
> >          else
> > @@ -219,7 +219,7 @@ static int hvmemul_do_io(
> >          else
> >          {
> >              rc = X86EMUL_RETRY;
> > -            if ( !hvm_send_assist_req(curr, p) )
> > +            if ( !hvm_send_assist_req(curr, &p) )
> >                  vio->io_state = HVMIO_none;
> >              else if ( p_data == NULL )
> >                  rc = X86EMUL_OKAY;
> > @@ -238,7 +238,7 @@ static int hvmemul_do_io(
> >
> >   finish_access:
> >      if ( dir == IOREQ_READ )
> > -        hvmtrace_io_assist(is_mmio, p);
> > +        hvmtrace_io_assist(is_mmio, &p);
> >
> >      if ( p_data != NULL )
> >          memcpy(p_data, &vio->io_data, size);
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 0b2e57e..10b8e8c 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -349,7 +349,9 @@ static ioreq_t *get_ioreq(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> >      shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > +
> >      ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > +
> >      return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> > index ba50c53..7aac61d 100644
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -49,22 +49,19 @@
> >  void send_timeoffset_req(unsigned long timeoff)
> >  {
> >      struct vcpu *curr = current;
> > -    ioreq_t p[1];
> > +    ioreq_t p = {
> > +        .type = IOREQ_TYPE_TIMEOFFSET,
> > +        .size = 8,
> > +        .count = 1,
> > +        .dir = IOREQ_WRITE,
> > +        .data = timeoff,
> > +        .state = STATE_IOREQ_READY,
> > +    };
> >
> >      if ( timeoff == 0 )
> >          return;
> >
> > -    memset(p, 0, sizeof(*p));
> > -
> > -    p->type = IOREQ_TYPE_TIMEOFFSET;
> > -    p->size = 8;
> > -    p->count = 1;
> > -    p->dir = IOREQ_WRITE;
> > -    p->data = timeoff;
> > -
> > -    p->state = STATE_IOREQ_READY;
> > -
> > -    if ( !hvm_buffered_io_send(curr->domain, p) )
> > +    if ( !hvm_buffered_io_send(curr->domain, &p) )
> >          printk("Unsuccessful timeoffset update\n");
> >  }
> >
> > @@ -72,14 +69,14 @@ void send_timeoffset_req(unsigned long timeoff)
> >  void send_invalidate_req(void)
> >  {
> >      struct vcpu *curr = current;
> > -    ioreq_t p[1];
> > -
> > -    p->type = IOREQ_TYPE_INVALIDATE;
> > -    p->size = 4;
> > -    p->dir = IOREQ_WRITE;
> > -    p->data = ~0UL; /* flush all */
> > -
> > -    (void)hvm_send_assist_req(curr, p);
> > +    ioreq_t p = {
> > +        .type = IOREQ_TYPE_INVALIDATE,
> > +        .size = 4,
> > +        .dir = IOREQ_WRITE,
> > +        .data = ~0UL, /* flush all */
> > +    };
> > +
> > +    (void)hvm_send_assist_req(curr, &p);
> >  }
> >
> >  int handle_mmio(void)
> > --
> > 1.7.10.4
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-10 15:46     ` Paul Durrant
@ 2014-03-10 15:53       ` George Dunlap
  2014-03-10 16:04         ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-03-10 15:53 UTC (permalink / raw)
  To: Paul Durrant; +Cc: George Dunlap, xen-devel

On Mon, Mar 10, 2014 at 3:46 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 10 March 2014 15:43
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
>>
>> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com>
>> wrote:
>> > This patch tidies up various occurences of single element ioreq_t
>> > arrays on the stack and improves coding style.
>> >
>> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>
>> Maybe I missed this in the earlier discussion, but why is most of this
>> not integrated into patch 1?
>>
>
> It was a patch that was added after the v1 RFC patch series. I wanted to keep it separate to avoid making patch 1 massively different to what it was before.

Isn't the point of the review process to change patches? :-)  In
general, both reviewers and code archaeologists (i.e., people going
through commits long after the fact) want as much as possible to know
what the end result is going to look like.  Having one patch which
does things one way, and another immediately following it that does
things a different way doesn't really help anybody, as far as I can
tell.  It only increases the amount of busy-work people have to do to
figure out what's going on.

 -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-10 15:53       ` George Dunlap
@ 2014-03-10 16:04         ` Paul Durrant
  2014-03-10 16:56           ` George Dunlap
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-10 16:04 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, xen-devel

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 10 March 2014 15:53
> To: Paul Durrant
> Cc: George Dunlap; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
> 
> On Mon, Mar 10, 2014 at 3:46 PM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >> -----Original Message-----
> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 10 March 2014 15:43
> >> To: Paul Durrant
> >> Cc: xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of
> ioreq_t
> >>
> >> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com>
> >> wrote:
> >> > This patch tidies up various occurences of single element ioreq_t
> >> > arrays on the stack and improves coding style.
> >> >
> >> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >>
> >> Maybe I missed this in the earlier discussion, but why is most of this
> >> not integrated into patch 1?
> >>
> >
> > It was a patch that was added after the v1 RFC patch series. I wanted to
> keep it separate to avoid making patch 1 massively different to what it was
> before.
> 
> Isn't the point of the review process to change patches? :-)  In
> general, both reviewers and code archaeologists (i.e., people going
> through commits long after the fact) want as much as possible to know
> what the end result is going to look like.  Having one patch which
> does things one way, and another immediately following it that does
> things a different way doesn't really help anybody, as far as I can
> tell.  It only increases the amount of busy-work people have to do to
> figure out what's going on.
> 

Patch 2 was added to code clean up that is not critical to this patch series. IMO folding it into patch 1 obscures the original purpose of that patch. If reviewers only cared about the end result then what's purpose of patch series in the first place? May as well just fold everything into one patch.

  Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-10 16:04         ` Paul Durrant
@ 2014-03-10 16:56           ` George Dunlap
  2014-03-11 10:06             ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-03-10 16:56 UTC (permalink / raw)
  To: Paul Durrant, George Dunlap; +Cc: George Dunlap, xen-devel

On 03/10/2014 04:04 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 10 March 2014 15:53
>> To: Paul Durrant
>> Cc: George Dunlap; xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
>>
>> On Mon, Mar 10, 2014 at 3:46 PM, Paul Durrant <Paul.Durrant@citrix.com>
>> wrote:
>>>> -----Original Message-----
>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>> George Dunlap
>>>> Sent: 10 March 2014 15:43
>>>> To: Paul Durrant
>>>> Cc: xen-devel@lists.xen.org
>>>> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of
>> ioreq_t
>>>> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com>
>>>> wrote:
>>>>> This patch tidies up various occurences of single element ioreq_t
>>>>> arrays on the stack and improves coding style.
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>> Maybe I missed this in the earlier discussion, but why is most of this
>>>> not integrated into patch 1?
>>>>
>>> It was a patch that was added after the v1 RFC patch series. I wanted to
>> keep it separate to avoid making patch 1 massively different to what it was
>> before.
>>
>> Isn't the point of the review process to change patches? :-)  In
>> general, both reviewers and code archaeologists (i.e., people going
>> through commits long after the fact) want as much as possible to know
>> what the end result is going to look like.  Having one patch which
>> does things one way, and another immediately following it that does
>> things a different way doesn't really help anybody, as far as I can
>> tell.  It only increases the amount of busy-work people have to do to
>> figure out what's going on.
>>
> Patch 2 was added to code clean up that is not critical to this patch series. IMO folding it into patch 1 obscures the original purpose of that patch. If reviewers only cared about the end result then what's purpose of patch series in the first place? May as well just fold everything into one patch.

So what you mean is, in the case of "ioreq_t p[1]" being replaced by 
"ioreq_t p", patch 1 puts it on the stack and patch 2 changes p from a 
pointer to a struct, and you think putting them in one patch makes it 
harder to discern one from the other.

I'd still rather they be in one patch, but that might be a taste thing.  
Another option might have been to have patch 1 do the code motion, and 
patch 2 use the ioreq on the stack instead -- then you're not 
introducing a weird construct that you're going to obliterate right 
away.  To do that, I guess you'd have to avoid making get_ioreq() static 
to hvm.c until patch 2.

But we're getting into bike-shed territory here.  I think it would be 
better to change the break-down, but I won't make that a condition for 
acceptance.

  -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-05 14:47 ` [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-03-10 17:46   ` George Dunlap
  2014-03-11 10:54     ` Paul Durrant
  2014-03-14 11:18   ` Ian Campbell
  1 sibling, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-03-10 17:46 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com> wrote:
> This patch only creates the ioreq server when the legacy HVM parameters
> are touched by an emulator. It also lays some groundwork for supporting
> multiple IOREQ servers. For instance, it introduces ioreq server reference
> counting which is not strictly necessary at this stage but will become so
> when ioreq servers can be destroyed prior the domain dying.
>
> There is a significant change in the layout of the special pages reserved
> in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards without
> moving pages such as the xenstore page when building a domain that can
> support more than one emulator.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  tools/libxc/xc_hvm_build_x86.c |   40 +++++--
>  xen/arch/x86/hvm/hvm.c         |  240 +++++++++++++++++++++++++---------------
>  2 files changed, 176 insertions(+), 104 deletions(-)
>
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index dd3b522..b65e702 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -41,13 +41,12 @@
>  #define SPECIALPAGE_PAGING   0
>  #define SPECIALPAGE_ACCESS   1
>  #define SPECIALPAGE_SHARING  2
> -#define SPECIALPAGE_BUFIOREQ 3
> -#define SPECIALPAGE_XENSTORE 4
> -#define SPECIALPAGE_IOREQ    5
> -#define SPECIALPAGE_IDENT_PT 6
> -#define SPECIALPAGE_CONSOLE  7
> -#define NR_SPECIAL_PAGES     8
> -#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> +#define SPECIALPAGE_XENSTORE 3
> +#define SPECIALPAGE_IDENT_PT 4
> +#define SPECIALPAGE_CONSOLE  5
> +#define SPECIALPAGE_IOREQ    6
> +#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
> +#define special_pfn(x) (0xff000u - 1 - (x))
>
>  #define VGA_HOLE_SIZE (0x20)
>
> @@ -114,7 +113,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      /* Memory parameters. */
>      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> -    hvm_info->reserved_mem_pgstart = special_pfn(0);
> +    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;

It might be better to #define this up above, just below special_pfn()?
 If we do that, and make the change below, then the "direction of
growth" will be encoded all in one place.

>
>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> @@ -473,6 +472,23 @@ static int setup_guest(xc_interface *xch,
>      munmap(hvm_info_page, PAGE_SIZE);
>
>      /* Allocate and clear special pages. */
> +
> +    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
> +    DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
> +    DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS));
> +    DPRINTF("  SHARING:   %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING));
> +    DPRINTF("  STORE:     %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE));
> +    DPRINTF("  IDENT_PT:  %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
> +    DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
> +    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> +
>      for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
>      {
>          xen_pfn_t pfn = special_pfn(i);
> @@ -488,10 +504,6 @@ static int setup_guest(xc_interface *xch,
>
>      xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
>                       special_pfn(SPECIALPAGE_XENSTORE));
> -    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_BUFIOREQ));
> -    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_IOREQ));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
>                       special_pfn(SPECIALPAGE_CONSOLE));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
> @@ -500,6 +512,10 @@ static int setup_guest(xc_interface *xch,
>                       special_pfn(SPECIALPAGE_ACCESS));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
>                       special_pfn(SPECIALPAGE_SHARING));
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> +                     special_pfn(SPECIALPAGE_IOREQ));
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> +                     special_pfn(SPECIALPAGE_IOREQ) - 1);

If we say "special_pfn(SPECIALPAGE_IOREQ+1)", it doesn't assume that
things grow a specific direction.

>
>      /*
>       * Identity-map page table is required for running with CR0.PG=0 when
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index bbf9577..22b2a2c 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -366,22 +366,9 @@ bool_t hvm_io_pending(struct vcpu *v)
>      return ( p->state != STATE_IOREQ_NONE );
>  }
>
> -void hvm_do_resume(struct vcpu *v)
> +static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
>  {
> -    struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -    ioreq_t *p;
> -
> -    check_wakeup_from_wait();
> -
> -    if ( is_hvm_vcpu(v) )
> -        pt_restore_timer(v);
> -
> -    if ( !s )
> -        goto check_inject_trap;
> -
>      /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
> -    p = get_ioreq(s, v->vcpu_id);
>      while ( p->state != STATE_IOREQ_NONE )
>      {
>          switch ( p->state )
> @@ -397,12 +384,29 @@ void hvm_do_resume(struct vcpu *v)
>              break;
>          default:
>              gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
> -            domain_crash(v->domain);
> +            domain_crash(d);
>              return; /* bail */
>          }
>      }
> +}
> +
> +void hvm_do_resume(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +
> +    check_wakeup_from_wait();
> +
> +    if ( is_hvm_vcpu(v) )
> +        pt_restore_timer(v);
> +
> +    if ( s )
> +    {
> +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> +
> +        hvm_wait_on_io(d, p);
> +    }
>
> - check_inject_trap:
>      /* Inject pending hw/sw trap */
>      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
>      {
> @@ -411,11 +415,13 @@ void hvm_do_resume(struct vcpu *v)
>      }
>  }
>
> -static void hvm_init_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp)
> +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
>  {
> +    struct hvm_ioreq_page *iorp;
> +
> +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
> +
>      spin_lock_init(&iorp->lock);
> -    domain_pause(d);
>  }
>
>  void destroy_ring_for_helper(
> @@ -431,16 +437,13 @@ void destroy_ring_for_helper(
>      }
>  }
>
> -static void hvm_destroy_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp)
> +static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
>  {
> -    spin_lock(&iorp->lock);
> +    struct hvm_ioreq_page *iorp;
>
> -    ASSERT(d->is_dying);
> +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
>
>      destroy_ring_for_helper(&iorp->va, iorp->page);
> -
> -    spin_unlock(&iorp->lock);

BTW, is there a reason you're getting rid of the locks here?

 -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Support for running secondary emulators
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
                   ` (5 preceding siblings ...)
  2014-03-05 14:48 ` [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-03-10 18:57 ` George Dunlap
  2014-03-11 10:48   ` Paul Durrant
  2014-03-14 11:02 ` Ian Campbell
  7 siblings, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-03-10 18:57 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com> wrote:
> This patch series adds the ioreq server interface which I mentioned in
> my talk at the Xen developer summit in Edinburgh at the end of last year.
> The code is based on work originally done by Julien Grall but has been
> re-written to allow existing versions of QEMU to work unmodified.
>
> The code is available in my xen.git [1] repo on xenbits, under the 'savannah3'
> branch, and I have also written a demo emulator to test the code, which can
> be found in my demu.git [2] repo.
>
>
> The modifications are broken down as follows:
>
> Patch #1 basically just moves some code around to make subsequent patches
> more obvious.
>
> Patch #2 tidies up some uses of ioreq_t as suggested by Andrew Cooper.
>
> Patch #3 again is largely code movement, from various places into a new
> hvm_ioreq_server structure. There should be no functional change at this
> stage as the ioreq server is still created at domain initialisation time (as
> were its contents prior to this patch).
>
> Patch #4 is the first functional change. The ioreq server struct
> initialisation is now deferred until something actually tries to play with
> the HVM parameters which reference it. In practice this is QEMU, which
> needs to read the ioreq pfns so it can map them.
>
> Patch #5 is the big one. This moves from a single ioreq server per domain
> to a list. The server that is created when the HVM parameters are reference
> is given id 0 and is considered to be the 'catch all' server which is, after
> all, how QEMU is used. Any secondary emulator, created using the new API
> in xenctrl.h, will have id 1 or above and only gets ioreqs when I/O hits one
> of its registered IO ranges or PCI devices.
>
> Patch #6 pulls the PCI hotplug controller emulation into Xen. This is
> necessary to allow a secondary emulator to hotplug a PCI device into the VM.
> The code implements the controller in the same way as upstream QEMU and thus
> the variant of the DSDT ASL used for upstream QEMU is retained.

Overall looks good -- looking forward to getting this one in.  It
should help unify the "HVM no-emulator" (aka PVH) path as well.

BTW, I see in your "savannah3" branch you have an extra patch not
submitted here, allowing vga=none.  Have you seen Fabio's patch on the
same subject?  It looks a bit more complete:

http://marc.info/?l=xen-devel&m=139306550111524

 -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-10 16:56           ` George Dunlap
@ 2014-03-11 10:06             ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-11 10:06 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, xen-devel

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> Sent: 10 March 2014 16:56
> To: Paul Durrant; George Dunlap
> Cc: George Dunlap; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t
> 
> On 03/10/2014 04:04 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 10 March 2014 15:53
> >> To: Paul Durrant
> >> Cc: George Dunlap; xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of
> ioreq_t
> >>
> >> On Mon, Mar 10, 2014 at 3:46 PM, Paul Durrant
> <Paul.Durrant@citrix.com>
> >> wrote:
> >>>> -----Original Message-----
> >>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >>>> George Dunlap
> >>>> Sent: 10 March 2014 15:43
> >>>> To: Paul Durrant
> >>>> Cc: xen-devel@lists.xen.org
> >>>> Subject: Re: [Xen-devel] [PATCH v3 2/6] ioreq-server: tidy up use of
> >> ioreq_t
> >>>> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant
> <paul.durrant@citrix.com>
> >>>> wrote:
> >>>>> This patch tidies up various occurences of single element ioreq_t
> >>>>> arrays on the stack and improves coding style.
> >>>>>
> >>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >>>> Maybe I missed this in the earlier discussion, but why is most of this
> >>>> not integrated into patch 1?
> >>>>
> >>> It was a patch that was added after the v1 RFC patch series. I wanted to
> >> keep it separate to avoid making patch 1 massively different to what it
> was
> >> before.
> >>
> >> Isn't the point of the review process to change patches? :-)  In
> >> general, both reviewers and code archaeologists (i.e., people going
> >> through commits long after the fact) want as much as possible to know
> >> what the end result is going to look like.  Having one patch which
> >> does things one way, and another immediately following it that does
> >> things a different way doesn't really help anybody, as far as I can
> >> tell.  It only increases the amount of busy-work people have to do to
> >> figure out what's going on.
> >>
> > Patch 2 was added to code clean up that is not critical to this patch series.
> IMO folding it into patch 1 obscures the original purpose of that patch. If
> reviewers only cared about the end result then what's purpose of patch
> series in the first place? May as well just fold everything into one patch.
> 
> So what you mean is, in the case of "ioreq_t p[1]" being replaced by
> "ioreq_t p", patch 1 puts it on the stack and patch 2 changes p from a
> pointer to a struct, and you think putting them in one patch makes it
> harder to discern one from the other.
> 

The issue is that there were other places in the code that used "ioreq_t p[1]", so in patch 1 I elected to follow suit and then in patch 2 change all uses of that construct, including the one I'd introduced. Does that not seem logical? It seems logical to me. It creates a clear separation between code motion and a change of construct.
 
> I'd still rather they be in one patch, but that might be a taste thing.

I think so.

> Another option might have been to have patch 1 do the code motion, and
> patch 2 use the ioreq on the stack instead -- then you're not
> introducing a weird construct that you're going to obliterate right
> away.  To do that, I guess you'd have to avoid making get_ioreq() static
> to hvm.c until patch 2.

I'm only introducing one occurrence of that construct. The majority were already present in the code before I touched it.

> 
> But we're getting into bike-shed territory here.  I think it would be
> better to change the break-down, but I won't make that a condition for
> acceptance.
> 

It's possibly slightly clearer, but a re-base and re-test for the sake of that doesn't seem like a good ROI. These patches are not exactly the 'meat' of the series anyway :-)

  Paul

>   -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Support for running secondary emulators
  2014-03-10 18:57 ` [PATCH v3 0/6] Support for running secondary emulators George Dunlap
@ 2014-03-11 10:48   ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-11 10:48 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 10 March 2014 18:58
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 0/6] Support for running secondary
> emulators
> 
> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > This patch series adds the ioreq server interface which I mentioned in
> > my talk at the Xen developer summit in Edinburgh at the end of last year.
> > The code is based on work originally done by Julien Grall but has been
> > re-written to allow existing versions of QEMU to work unmodified.
> >
> > The code is available in my xen.git [1] repo on xenbits, under the
> 'savannah3'
> > branch, and I have also written a demo emulator to test the code, which
> can
> > be found in my demu.git [2] repo.
> >
> >
> > The modifications are broken down as follows:
> >
> > Patch #1 basically just moves some code around to make subsequent
> patches
> > more obvious.
> >
> > Patch #2 tidies up some uses of ioreq_t as suggested by Andrew Cooper.
> >
> > Patch #3 again is largely code movement, from various places into a new
> > hvm_ioreq_server structure. There should be no functional change at this
> > stage as the ioreq server is still created at domain initialisation time (as
> > were its contents prior to this patch).
> >
> > Patch #4 is the first functional change. The ioreq server struct
> > initialisation is now deferred until something actually tries to play with
> > the HVM parameters which reference it. In practice this is QEMU, which
> > needs to read the ioreq pfns so it can map them.
> >
> > Patch #5 is the big one. This moves from a single ioreq server per domain
> > to a list. The server that is created when the HVM parameters are
> reference
> > is given id 0 and is considered to be the 'catch all' server which is, after
> > all, how QEMU is used. Any secondary emulator, created using the new API
> > in xenctrl.h, will have id 1 or above and only gets ioreqs when I/O hits one
> > of its registered IO ranges or PCI devices.
> >
> > Patch #6 pulls the PCI hotplug controller emulation into Xen. This is
> > necessary to allow a secondary emulator to hotplug a PCI device into the
> VM.
> > The code implements the controller in the same way as upstream QEMU
> and thus
> > the variant of the DSDT ASL used for upstream QEMU is retained.
> 
> Overall looks good -- looking forward to getting this one in.  It
> should help unify the "HVM no-emulator" (aka PVH) path as well.
> 
> BTW, I see in your "savannah3" branch you have an extra patch not
> submitted here, allowing vga=none.  Have you seen Fabio's patch on the
> same subject?  It looks a bit more complete:
> 
> http://marc.info/?l=xen-devel&m=139306550111524
> 

Yes. I think I notice that just after writing the code ;-) I'll swap over when I do savannah4.

  Paul

>  -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-10 17:46   ` George Dunlap
@ 2014-03-11 10:54     ` Paul Durrant
  2014-03-14 11:04       ` Ian Campbell
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-11 10:54 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 10 March 2014 17:46
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 4/6] ioreq-server: on-demand creation of
> ioreq server
> 
> On Wed, Mar 5, 2014 at 2:47 PM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > This patch only creates the ioreq server when the legacy HVM parameters
> > are touched by an emulator. It also lays some groundwork for supporting
> > multiple IOREQ servers. For instance, it introduces ioreq server reference
> > counting which is not strictly necessary at this stage but will become so
> > when ioreq servers can be destroyed prior the domain dying.
> >
> > There is a significant change in the layout of the special pages reserved
> > in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards
> without
> > moving pages such as the xenstore page when building a domain that can
> > support more than one emulator.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> >  tools/libxc/xc_hvm_build_x86.c |   40 +++++--
> >  xen/arch/x86/hvm/hvm.c         |  240 +++++++++++++++++++++++++------
> ---------
> >  2 files changed, 176 insertions(+), 104 deletions(-)
> >
> > diff --git a/tools/libxc/xc_hvm_build_x86.c
> b/tools/libxc/xc_hvm_build_x86.c
> > index dd3b522..b65e702 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -41,13 +41,12 @@
> >  #define SPECIALPAGE_PAGING   0
> >  #define SPECIALPAGE_ACCESS   1
> >  #define SPECIALPAGE_SHARING  2
> > -#define SPECIALPAGE_BUFIOREQ 3
> > -#define SPECIALPAGE_XENSTORE 4
> > -#define SPECIALPAGE_IOREQ    5
> > -#define SPECIALPAGE_IDENT_PT 6
> > -#define SPECIALPAGE_CONSOLE  7
> > -#define NR_SPECIAL_PAGES     8
> > -#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> > +#define SPECIALPAGE_XENSTORE 3
> > +#define SPECIALPAGE_IDENT_PT 4
> > +#define SPECIALPAGE_CONSOLE  5
> > +#define SPECIALPAGE_IOREQ    6
> > +#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server
> needs 2 pages */
> > +#define special_pfn(x) (0xff000u - 1 - (x))
> >
> >  #define VGA_HOLE_SIZE (0x20)
> >
> > @@ -114,7 +113,7 @@ static void build_hvm_info(void *hvm_info_page,
> uint64_t mem_size,
> >      /* Memory parameters. */
> >      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
> >      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> > -    hvm_info->reserved_mem_pgstart = special_pfn(0);
> > +    hvm_info->reserved_mem_pgstart = special_pfn(0) -
> NR_SPECIAL_PAGES;
> 
> It might be better to #define this up above, just below special_pfn()?
>  If we do that, and make the change below, then the "direction of
> growth" will be encoded all in one place.
> 

Sounds like a good idea. I'll do that.

> >
> >      /* Finish with the checksum. */
> >      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> > @@ -473,6 +472,23 @@ static int setup_guest(xc_interface *xch,
> >      munmap(hvm_info_page, PAGE_SIZE);
> >
> >      /* Allocate and clear special pages. */
> > +
> > +    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
> > +    DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
> > +    DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS));
> > +    DPRINTF("  SHARING:   %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING));
> > +    DPRINTF("  STORE:     %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE));
> > +    DPRINTF("  IDENT_PT:  %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
> > +    DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
> > +    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> > +
> >      for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> >      {
> >          xen_pfn_t pfn = special_pfn(i);
> > @@ -488,10 +504,6 @@ static int setup_guest(xc_interface *xch,
> >
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
> >                       special_pfn(SPECIALPAGE_XENSTORE));
> > -    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_BUFIOREQ));
> > -    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_IOREQ));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
> >                       special_pfn(SPECIALPAGE_CONSOLE));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
> > @@ -500,6 +512,10 @@ static int setup_guest(xc_interface *xch,
> >                       special_pfn(SPECIALPAGE_ACCESS));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
> >                       special_pfn(SPECIALPAGE_SHARING));
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> > +                     special_pfn(SPECIALPAGE_IOREQ));
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > +                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> 
> If we say "special_pfn(SPECIALPAGE_IOREQ+1)", it doesn't assume that
> things grow a specific direction.
> 

Ok. 

> >
> >      /*
> >       * Identity-map page table is required for running with CR0.PG=0 when
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index bbf9577..22b2a2c 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -366,22 +366,9 @@ bool_t hvm_io_pending(struct vcpu *v)
> >      return ( p->state != STATE_IOREQ_NONE );
> >  }
> >
> > -void hvm_do_resume(struct vcpu *v)
> > +static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
> >  {
> > -    struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -    ioreq_t *p;
> > -
> > -    check_wakeup_from_wait();
> > -
> > -    if ( is_hvm_vcpu(v) )
> > -        pt_restore_timer(v);
> > -
> > -    if ( !s )
> > -        goto check_inject_trap;
> > -
> >      /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> > -    p = get_ioreq(s, v->vcpu_id);
> >      while ( p->state != STATE_IOREQ_NONE )
> >      {
> >          switch ( p->state )
> > @@ -397,12 +384,29 @@ void hvm_do_resume(struct vcpu *v)
> >              break;
> >          default:
> >              gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
> >state);
> > -            domain_crash(v->domain);
> > +            domain_crash(d);
> >              return; /* bail */
> >          }
> >      }
> > +}
> > +
> > +void hvm_do_resume(struct vcpu *v)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    check_wakeup_from_wait();
> > +
> > +    if ( is_hvm_vcpu(v) )
> > +        pt_restore_timer(v);
> > +
> > +    if ( s )
> > +    {
> > +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> > +
> > +        hvm_wait_on_io(d, p);
> > +    }
> >
> > - check_inject_trap:
> >      /* Inject pending hw/sw trap */
> >      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> >      {
> > @@ -411,11 +415,13 @@ void hvm_do_resume(struct vcpu *v)
> >      }
> >  }
> >
> > -static void hvm_init_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
> >  {
> > +    struct hvm_ioreq_page *iorp;
> > +
> > +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
> > +
> >      spin_lock_init(&iorp->lock);
> > -    domain_pause(d);
> >  }
> >
> >  void destroy_ring_for_helper(
> > @@ -431,16 +437,13 @@ void destroy_ring_for_helper(
> >      }
> >  }
> >
> > -static void hvm_destroy_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > +static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, bool_t
> buf)
> >  {
> > -    spin_lock(&iorp->lock);
> > +    struct hvm_ioreq_page *iorp;
> >
> > -    ASSERT(d->is_dying);
> > +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
> >
> >      destroy_ring_for_helper(&iorp->va, iorp->page);
> > -
> > -    spin_unlock(&iorp->lock);
> 
> BTW, is there a reason you're getting rid of the locks here?
> 

Yes, I don't believe it is needed.

  Paul

>  -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Support for running secondary emulators
  2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
                   ` (6 preceding siblings ...)
  2014-03-10 18:57 ` [PATCH v3 0/6] Support for running secondary emulators George Dunlap
@ 2014-03-14 11:02 ` Ian Campbell
  2014-03-14 13:26   ` Paul Durrant
  7 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 11:02 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, 2014-03-05 at 14:47 +0000, Paul Durrant wrote:
> This patch series adds the ioreq server interface which I mentioned in
[...]

FYI due to the traffic levels on xen-devel it is conventional now to CC
the maintainers of the code being patches. ./scripts/get_maintainer.pl
can help with this.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-11 10:54     ` Paul Durrant
@ 2014-03-14 11:04       ` Ian Campbell
  2014-03-14 13:28         ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 11:04 UTC (permalink / raw)
  To: Paul Durrant; +Cc: George Dunlap, xen-devel

On Tue, 2014-03-11 at 10:54 +0000, Paul Durrant wrote:
> > > -static void hvm_destroy_ioreq_page(
> > > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > > +static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, bool_t
> > buf)
> > >  {
> > > -    spin_lock(&iorp->lock);
> > > +    struct hvm_ioreq_page *iorp;
> > >
> > > -    ASSERT(d->is_dying);
> > > +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
> > >
> > >      destroy_ring_for_helper(&iorp->va, iorp->page);
> > > -
> > > -    spin_unlock(&iorp->lock);
> > 
> > BTW, is there a reason you're getting rid of the locks here?
> > 
> 
> Yes, I don't believe it is needed.

This smells like a separate patch to me, but at the very least removing
a lock needs mention in the commit message.

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-05 14:47 ` [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
  2014-03-10 17:46   ` George Dunlap
@ 2014-03-14 11:18   ` Ian Campbell
  2014-03-14 13:30     ` Paul Durrant
  1 sibling, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 11:18 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, 2014-03-05 at 14:47 +0000, Paul Durrant wrote:
> This patch only creates the ioreq server when the legacy HVM parameters
> are touched by an emulator.

The "ioreq server" is a hypervisor side concept? I was confused by
expecting this to involve a server process in userspace (and couldn't
see any way that could work ;-)).

> It also lays some groundwork for supporting
> multiple IOREQ servers. For instance, it introduces ioreq server reference
> counting

And refactors hvm_do_resume I think? No semantic change?

And it makes changes wrt whether the domain is paused while things are
set up, which needs rationale (as does the associated lock removal which
was already commented upon by George).

TBH, I think all of these deserve to be split out, but at the very least
they should be discussed in the change log (i.e. the list following "FOr
instance" should be a complete list, not a single example).

>  which is not strictly necessary at this stage but will become so
> when ioreq servers can be destroyed prior the domain dying.
> 
> There is a significant change in the layout of the special pages reserved
> in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards without
> moving pages such as the xenstore page when building a domain that can
> support more than one emulator.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  tools/libxc/xc_hvm_build_x86.c |   40 +++++--
>  xen/arch/x86/hvm/hvm.c         |  240 +++++++++++++++++++++++++---------------
>  2 files changed, 176 insertions(+), 104 deletions(-)
> 
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index dd3b522..b65e702 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -41,13 +41,12 @@
>  #define SPECIALPAGE_PAGING   0
>  #define SPECIALPAGE_ACCESS   1
>  #define SPECIALPAGE_SHARING  2
> -#define SPECIALPAGE_BUFIOREQ 3

This guy has disappeared entirely?

> -#define SPECIALPAGE_XENSTORE 4
> -#define SPECIALPAGE_IOREQ    5
> -#define SPECIALPAGE_IDENT_PT 6
> -#define SPECIALPAGE_CONSOLE  7
> -#define NR_SPECIAL_PAGES     8
> -#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> +#define SPECIALPAGE_XENSTORE 3
> +#define SPECIALPAGE_IDENT_PT 4
> +#define SPECIALPAGE_CONSOLE  5
> +#define SPECIALPAGE_IOREQ    6
> +#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */

BY way of documentation I think
#define SPECIALPAGE_IOREQ2 7 should just be included in the list. Or
maybe just /* ioreq server needs 2 pages: 7 */ where that define would
go (with the 7 in the correct column).

> +#define special_pfn(x) (0xff000u - 1 - (x))
>  
>  #define VGA_HOLE_SIZE (0x20)
>  
> @@ -114,7 +113,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      /* Memory parameters. */
>      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> -    hvm_info->reserved_mem_pgstart = special_pfn(0);
> +    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;

#define SPECIAL_PAGES_START up with the list?

>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> @@ -473,6 +472,23 @@ static int setup_guest(xc_interface *xch,
>      munmap(hvm_info_page, PAGE_SIZE);
>  
>      /* Allocate and clear special pages. */
> +
> +    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
> +    DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
> +    DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS));
> +    DPRINTF("  SHARING:   %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING));
> +    DPRINTF("  STORE:     %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE));
> +    DPRINTF("  IDENT_PT:  %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
> +    DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
> +    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> +
>      for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
>      {
>          xen_pfn_t pfn = special_pfn(i);
> @@ -488,10 +504,6 @@ static int setup_guest(xc_interface *xch,
>  
>      xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
>                       special_pfn(SPECIALPAGE_XENSTORE));
> -    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_BUFIOREQ));
> -    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_IOREQ));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
>                       special_pfn(SPECIALPAGE_CONSOLE));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
> @@ -500,6 +512,10 @@ static int setup_guest(xc_interface *xch,
>                       special_pfn(SPECIALPAGE_ACCESS));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
>                       special_pfn(SPECIALPAGE_SHARING));
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> +                     special_pfn(SPECIALPAGE_IOREQ));
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> +                     special_pfn(SPECIALPAGE_IOREQ) - 1);

This seems to suggest that BUFIOREQ has actually become IOREQ2?

>  
>      /*
>       * Identity-map page table is required for running with CR0.PG=0 when
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index bbf9577..22b2a2c 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -366,22 +366,9 @@ bool_t hvm_io_pending(struct vcpu *v)
>      return ( p->state != STATE_IOREQ_NONE );
>  }
>  
> -void hvm_do_resume(struct vcpu *v)
> +static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
>  {
> -    struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -    ioreq_t *p;
> -
> -    check_wakeup_from_wait();
> -
> -    if ( is_hvm_vcpu(v) )
> -        pt_restore_timer(v);
> -
> -    if ( !s )
> -        goto check_inject_trap;
> -
>      /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
> -    p = get_ioreq(s, v->vcpu_id);
>      while ( p->state != STATE_IOREQ_NONE )
>      {
>          switch ( p->state )
> @@ -397,12 +384,29 @@ void hvm_do_resume(struct vcpu *v)
>              break;
>          default:
>              gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
> -            domain_crash(v->domain);
> +            domain_crash(d);
>              return; /* bail */
>          }
>      }
> +}
> +
> +void hvm_do_resume(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +
> +    check_wakeup_from_wait();
> +
> +    if ( is_hvm_vcpu(v) )
> +        pt_restore_timer(v);
> +
> +    if ( s )
> +    {
> +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> +
> +        hvm_wait_on_io(d, p);
> +    }
>  
> - check_inject_trap:
>      /* Inject pending hw/sw trap */
>      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
>      {
> @@ -411,11 +415,13 @@ void hvm_do_resume(struct vcpu *v)
>      }
>  }
>  
> -static void hvm_init_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp)
> +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
>  {
> +    struct hvm_ioreq_page *iorp;
> +
> +    iorp = buf ? &s->buf_ioreq : &s->ioreq;

This appears a lot, suggesting that either this logic deserves to be
further up the call chain or there should be a helper for it. I suspect
the former.

> @@ -606,29 +606,88 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
>      free_xen_event_channel(v, v->arch.hvm_vcpu.ioreq_evtchn);
>  }
>  
> -static int hvm_create_ioreq_server(struct domain *d)
> +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)

(I've just realised I've straying into hypervisor side territory. yet
another reason why this stuff should be split out since the tools
changes don't appear to be very related to most of this. Stopping here.
)

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-05 14:48 ` [PATCH v3 5/6] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-03-14 11:52   ` Ian Campbell
  2014-03-17 11:45     ` George Dunlap
  2014-03-17 12:25     ` Paul Durrant
  0 siblings, 2 replies; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 11:52 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> The legacy 'catch-all' server is always created with id 0. Secondary
> servers will have an id ranging from 1 to a limit set by the toolstack
> via the 'max_emulators' build info field. This defaults to 1 so ordinarily
> no extra special pages are reserved for secondary emulators. It may be
> increased using the secondary_device_emulators parameter in xl.cfg(5).
> There's no clear limit to apply to the number of emulators so I've not
> applied one.
> 
> Because of the re-arrangement of the special pages in a previous patch we
> only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS to determine
> the layout of the shared pages for multiple emulators. Guests migrated in
> from hosts without this patch will be lacking the save record which stores
> the new parameter and so the guest is assumed to only have had a single
> emulator.
> 
> Added some more emacs boilerplate to xenctrl.h and xenguest.h
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  docs/man/xl.cfg.pod.5            |    7 +
>  tools/libxc/xc_domain.c          |  175 +++++++
>  tools/libxc/xc_domain_restore.c  |   20 +
>  tools/libxc/xc_domain_save.c     |   12 +
>  tools/libxc/xc_hvm_build_x86.c   |   24 +-
>  tools/libxc/xenctrl.h            |   51 ++
>  tools/libxc/xenguest.h           |   12 +
>  tools/libxc/xg_save_restore.h    |    1 +
>  tools/libxl/libxl.h              |    8 +
>  tools/libxl/libxl_create.c       |    3 +
>  tools/libxl/libxl_dom.c          |    1 +
>  tools/libxl/libxl_types.idl      |    1 +
>  tools/libxl/xl_cmdimpl.c         |    3 +
>  xen/arch/x86/hvm/hvm.c           |  964 ++++++++++++++++++++++++++++++++++++--
>  xen/arch/x86/hvm/io.c            |    2 +-
>  xen/include/asm-x86/hvm/domain.h |   23 +-
>  xen/include/asm-x86/hvm/hvm.h    |    3 +-
>  xen/include/asm-x86/hvm/io.h     |    2 +-
>  xen/include/public/hvm/hvm_op.h  |   70 +++
>  xen/include/public/hvm/ioreq.h   |    1 +
>  xen/include/public/hvm/params.h  |    4 +-
>  21 files changed, 1324 insertions(+), 63 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index e15a49f..0226c55 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
>  This parameter only takes effect when device_model_version=qemu-xen.
>  See F<docs/misc/pci-device-reservations.txt> for more information.
>  
> +=item B<secondary_device_emulators=NUMBER>
> +
> +If a number of secondary device emulators (i.e. in addition to
> +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> +guest then this parameter can be set with the count of how many are
> +to be used. The default value is zero.

This is an odd thing to expose to the user. Surely (lib)xl should be
launching these things while building the domain and therefore know how
many there are the config options should be things like
"use_split_dm_for_foo=1" or device_emulators = ["/usr/bin/first-dm",
"/usr/local/bin/second-dm"] or something.

> +
>  =back
>  
>  =head2 Device-Model Options
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 369c3f3..dfa905b 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +                                 domid_t domid,
> +                                 ioservid_t id,
> +                                 xen_pfn_t *pfn,
> +                                 xen_pfn_t *buf_pfn,
> +                                 evtchn_port_t *buf_port)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    if ( rc != 0 )
> +        goto done;
> +
> +    if ( pfn )
> +        *pfn = arg->pfn;
> +
> +    if ( buf_pfn )
> +        *buf_pfn = arg->buf_pfn;
> +
> +    if ( buf_port )
> +        *buf_port = arg->buf_port;

This looks a bit like this function should take a
xen_hvm_get_ioreq_server_info_t* and use the bounce buffering stuff.
Unless there is some desire to hide that struct from the callers
perhaps?

> +
> +done:
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                        ioservid_t id, int is_mmio,
> +                                        uint64_t start, uint64_t end)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;
> +    arg->end = end;

Bounce a struct here instead?

> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                            ioservid_t id, int is_mmio,
> +                                            uint64_t start)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;

Again?

> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                      ioservid_t id, uint16_t bdf)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->bdf = bdf;

Guess what ;-)

> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                          ioservid_t id, uint16_t bdf)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->bdf = bdf;

I'm going to stop suggesting it now ...

> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> +                                domid_t domid,
> +                                ioservid_t id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;

OK, one more...

> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_domain_setdebugging(xc_interface *xch,
>                             uint32_t domid,
>                             unsigned int enable)
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index 1f6ce50..3116653 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -746,6 +746,7 @@ typedef struct {
>      uint64_t acpi_ioport_location;
>      uint64_t viridian;
>      uint64_t vm_generationid_addr;
> +    uint64_t nr_ioreq_servers;

This makes me wonder: what happens if the source and target hosts do
different amounts of disaggregation? Perhaps in Xen N+1 we split some
additional component out into its own process?

This is going to be complex with the allocation of space for special
pages, isn't it?

>  
>      struct toolstack_data_t tdata;
>  } pagebuf_t;
> @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
>          DPRINTF("read generation id buffer address");
>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>  
> +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> +        /* Skip padding 4 bytes then read the acpi ioport location. */
> +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> +        {
> +            PERROR("error reading the number of IOREQ servers");
> +            return -1;
> +        }
> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> +
>      default:
>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>              ERROR("Max batch size exceeded (%d). Giving up.", count);
> @@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>      if (pagebuf.viridian != 0)
>          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
>  
> +    if ( hvm ) {
> +        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
> +
> +        if ( nr_ioreq_servers == 0 )
> +            nr_ioreq_servers = 1;
> +
> +        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS, nr_ioreq_servers);
> +    }
> +
>      if (pagebuf.acpi_ioport_location == 1) {
>          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
>          xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 42c4752..3293e29 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>              PERROR("Error when writing the viridian flag");
>              goto out;
>          }
> +
> +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> +        chunk.data = 0;
> +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> +                         (unsigned long *)&chunk.data);
> +
> +        if ( (chunk.data != 0) &&

Can this ever be 0 for an HVM guest?

> +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> +        {
> +            PERROR("Error when writing the number of IOREQ servers");
> +            goto out;
> +        }
>      }
>  
>      if ( callbacks != NULL && callbacks->toolstack_save != NULL )

> diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> index f859621..5170b7f 100644
> --- a/tools/libxc/xg_save_restore.h
> +++ b/tools/libxc/xg_save_restore.h
> @@ -259,6 +259,7 @@
>  #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
>  #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
>  #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
> +#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19

Absence of this chunk => assume 1 iff hvm?

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 4fc46eb..cf9b67d 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1750,6 +1750,9 @@ skip_vfb:
>  
>              b_info->u.hvm.vendor_device = d;
>          }
> + 
> +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> +            b_info->u.hvm.max_emulators = l + 1;

+ 1 ? Oh secondary. Well I already objected to the concept of this
generally but even if going this way then max_/nr_device_emulators would
have been better.

>      }
>  
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 22b2a2c..996c374 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c

[...]

Skipping the xen side, if you want review from those folks I suggest you
CC them. People may find this more wieldy if you made the Xen changes
first?

> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index a9aab4b..6b31189 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -23,6 +23,7 @@
>  
>  #include "../xen.h"
>  #include "../trace.h"
> +#include "../event_channel.h"
>  
>  /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
>  #define HVMOP_set_param           0
> @@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
>  typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
>  
> +typedef uint32_t ioservid_t;
> +
> +DEFINE_XEN_GUEST_HANDLE(ioservid_t);

I don't think you have any pointers to these in your interface, do you?
(if not then you don't need this)

> +#define HVMOP_get_ioreq_server_info 18
> +struct xen_hvm_get_ioreq_server_info {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - server id */
> +    xen_pfn_t pfn;          /* OUT - ioreq pfn */
> +    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */

Are all servers required to have both buffered and unbuffered modes? I
could imagine a simple one requiring only one or the other.

> +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> +};
> +typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> +
> +#define HVMOP_map_io_range_to_ioreq_server 19
> +struct xen_hvm_map_io_range_to_ioreq_server {
> +    domid_t domid;                  /* IN - domain to be serviced */
> +    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
> +    int is_mmio;                    /* IN - MMIO or port IO? */

Do we normally make this distinction via two different hypercalls?

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-05 14:48 ` [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-03-14 11:57   ` Ian Campbell
  2014-03-14 13:25     ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 11:57 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 2e52470..4176440 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>          }
>          if ( rc )
>              return ERROR_FAIL;
> +
> +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
> +        if (rc < 0) {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug_enable failed");
> +            return ERROR_FAIL;
> +        }

Perhaps I'm misreading this but does this imply that you cannot hotplug
PCI devices into an HVM guest which wasn't started with a PCI device?
That doesn't sound right/desirable.

> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
> index e84fa75..40bfa61 100644
> --- a/xen/include/public/hvm/ioreq.h
> +++ b/xen/include/public/hvm/ioreq.h
> @@ -101,6 +101,8 @@ typedef struct buffered_iopage buffered_iopage_t;
>  #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
>  #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
>  #define ACPI_GPE0_BLK_LEN_V1         0x04
> +#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
> +#define ACPI_PCI_HOTPLUG_LEN_V1      0x10

This section is to do with qemu, perhaps having moved this to Xen these
should be in their own new section?

Is there no problem with the availability of the i/o space for the
different versions of qemu (i.e. they are both the same today?) The AML
looked like it poked a different thing in the trad case -- so is 0xae00
unused there?

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 11:57   ` Ian Campbell
@ 2014-03-14 13:25     ` Paul Durrant
  2014-03-14 14:08       ` Ian Campbell
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-14 13:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 11:58
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> controller implementation into Xen
> 
> On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 2e52470..4176440 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> domid, libxl_device_pci *pcidev, i
> >          }
> >          if ( rc )
> >              return ERROR_FAIL;
> > +
> > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
> > +        if (rc < 0) {
> > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> xc_hvm_pci_hotplug_enable failed");
> > +            return ERROR_FAIL;
> > +        }
> 
> Perhaps I'm misreading this but does this imply that you cannot hotplug
> PCI devices into an HVM guest which wasn't started with a PCI device?
> That doesn't sound right/desirable.
> 

I don't think that is the case. The extra code here is because we're intercepting the hotplug controller IO space in Xen so QEMU may well play with its hotplug controller device model, but the guest will never see it.

> > diff --git a/xen/include/public/hvm/ioreq.h
> b/xen/include/public/hvm/ioreq.h
> > index e84fa75..40bfa61 100644
> > --- a/xen/include/public/hvm/ioreq.h
> > +++ b/xen/include/public/hvm/ioreq.h
> > @@ -101,6 +101,8 @@ typedef struct buffered_iopage buffered_iopage_t;
> >  #define ACPI_PM_TMR_BLK_ADDRESS_V1
> (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
> >  #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
> >  #define ACPI_GPE0_BLK_LEN_V1         0x04
> > +#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
> > +#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
> 
> This section is to do with qemu, perhaps having moved this to Xen these
> should be in their own new section?
> 

That sounds reasonable.

> Is there no problem with the availability of the i/o space for the
> different versions of qemu (i.e. they are both the same today?) The AML
> looked like it poked a different thing in the trad case -- so is 0xae00
> unused there?
> 

QEMU will still emulate a PCI hotplug controller but the guest will no longer see it. In the case of upstream that io range is now handled by xen, so it really really can't get to it. If trad is used then the hotplug controller would still be visible if the guest talks to the old IO ranges, but since they are not specified in the ACPI table any more it shouldn’t have anything to do with them. If you think that's a problem then I could hook those IO ranges in Xen too and stop the IO getting through.

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Support for running secondary emulators
  2014-03-14 11:02 ` Ian Campbell
@ 2014-03-14 13:26   ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-14 13:26 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 11:03
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 0/6] Support for running secondary
> emulators
> 
> On Wed, 2014-03-05 at 14:47 +0000, Paul Durrant wrote:
> > This patch series adds the ioreq server interface which I mentioned in
> [...]
> 
> FYI due to the traffic levels on xen-devel it is conventional now to CC
> the maintainers of the code being patches. ./scripts/get_maintainer.pl
> can help with this.
> 

Ok. I'll do that for the next series.

  Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-14 11:04       ` Ian Campbell
@ 2014-03-14 13:28         ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-14 13:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 11:04
> To: Paul Durrant
> Cc: George Dunlap; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 4/6] ioreq-server: on-demand creation of
> ioreq server
> 
> On Tue, 2014-03-11 at 10:54 +0000, Paul Durrant wrote:
> > > > -static void hvm_destroy_ioreq_page(
> > > > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > > > +static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s,
> bool_t
> > > buf)
> > > >  {
> > > > -    spin_lock(&iorp->lock);
> > > > +    struct hvm_ioreq_page *iorp;
> > > >
> > > > -    ASSERT(d->is_dying);
> > > > +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
> > > >
> > > >      destroy_ring_for_helper(&iorp->va, iorp->page);
> > > > -
> > > > -    spin_unlock(&iorp->lock);
> > >
> > > BTW, is there a reason you're getting rid of the locks here?
> > >
> >
> > Yes, I don't believe it is needed.
> 
> This smells like a separate patch to me, but at the very least removing
> a lock needs mention in the commit message.
> 

Ok. I can prefix the series with a lock removal patch.

  Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-14 11:18   ` Ian Campbell
@ 2014-03-14 13:30     ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-14 13:30 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 11:19
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 4/6] ioreq-server: on-demand creation of
> ioreq server
> 
> On Wed, 2014-03-05 at 14:47 +0000, Paul Durrant wrote:
> > This patch only creates the ioreq server when the legacy HVM parameters
> > are touched by an emulator.
> 
> The "ioreq server" is a hypervisor side concept? I was confused by
> expecting this to involve a server process in userspace (and couldn't
> see any way that could work ;-)).
> 
> > It also lays some groundwork for supporting
> > multiple IOREQ servers. For instance, it introduces ioreq server reference
> > counting
> 
> And refactors hvm_do_resume I think? No semantic change?
> 
> And it makes changes wrt whether the domain is paused while things are
> set up, which needs rationale (as does the associated lock removal which
> was already commented upon by George).
> 
> TBH, I think all of these deserve to be split out, but at the very least
> they should be discussed in the change log (i.e. the list following "FOr
> instance" should be a complete list, not a single example).
> 
> >  which is not strictly necessary at this stage but will become so
> > when ioreq servers can be destroyed prior the domain dying.
> >

Actually I think that paragraph now doesn't match the code properly anyway. I'll make sure to change it in the next version.

  Paul

> > There is a significant change in the layout of the special pages reserved
> > in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards
> without
> > moving pages such as the xenstore page when building a domain that can
> > support more than one emulator.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> >  tools/libxc/xc_hvm_build_x86.c |   40 +++++--
> >  xen/arch/x86/hvm/hvm.c         |  240 +++++++++++++++++++++++++------
> ---------
> >  2 files changed, 176 insertions(+), 104 deletions(-)
> >
> > diff --git a/tools/libxc/xc_hvm_build_x86.c
> b/tools/libxc/xc_hvm_build_x86.c
> > index dd3b522..b65e702 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -41,13 +41,12 @@
> >  #define SPECIALPAGE_PAGING   0
> >  #define SPECIALPAGE_ACCESS   1
> >  #define SPECIALPAGE_SHARING  2
> > -#define SPECIALPAGE_BUFIOREQ 3
> 
> This guy has disappeared entirely?
> 
> > -#define SPECIALPAGE_XENSTORE 4
> > -#define SPECIALPAGE_IOREQ    5
> > -#define SPECIALPAGE_IDENT_PT 6
> > -#define SPECIALPAGE_CONSOLE  7
> > -#define NR_SPECIAL_PAGES     8
> > -#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> > +#define SPECIALPAGE_XENSTORE 3
> > +#define SPECIALPAGE_IDENT_PT 4
> > +#define SPECIALPAGE_CONSOLE  5
> > +#define SPECIALPAGE_IOREQ    6
> > +#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server
> needs 2 pages */
> 
> BY way of documentation I think
> #define SPECIALPAGE_IOREQ2 7 should just be included in the list. Or
> maybe just /* ioreq server needs 2 pages: 7 */ where that define would
> go (with the 7 in the correct column).
> 
> > +#define special_pfn(x) (0xff000u - 1 - (x))
> >
> >  #define VGA_HOLE_SIZE (0x20)
> >
> > @@ -114,7 +113,7 @@ static void build_hvm_info(void *hvm_info_page,
> uint64_t mem_size,
> >      /* Memory parameters. */
> >      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
> >      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> > -    hvm_info->reserved_mem_pgstart = special_pfn(0);
> > +    hvm_info->reserved_mem_pgstart = special_pfn(0) -
> NR_SPECIAL_PAGES;
> 
> #define SPECIAL_PAGES_START up with the list?
> 
> >      /* Finish with the checksum. */
> >      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> > @@ -473,6 +472,23 @@ static int setup_guest(xc_interface *xch,
> >      munmap(hvm_info_page, PAGE_SIZE);
> >
> >      /* Allocate and clear special pages. */
> > +
> > +    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
> > +    DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
> > +    DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS));
> > +    DPRINTF("  SHARING:   %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING));
> > +    DPRINTF("  STORE:     %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE));
> > +    DPRINTF("  IDENT_PT:  %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
> > +    DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
> > +    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> > +
> >      for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> >      {
> >          xen_pfn_t pfn = special_pfn(i);
> > @@ -488,10 +504,6 @@ static int setup_guest(xc_interface *xch,
> >
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
> >                       special_pfn(SPECIALPAGE_XENSTORE));
> > -    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_BUFIOREQ));
> > -    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_IOREQ));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
> >                       special_pfn(SPECIALPAGE_CONSOLE));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
> > @@ -500,6 +512,10 @@ static int setup_guest(xc_interface *xch,
> >                       special_pfn(SPECIALPAGE_ACCESS));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
> >                       special_pfn(SPECIALPAGE_SHARING));
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> > +                     special_pfn(SPECIALPAGE_IOREQ));
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > +                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> 
> This seems to suggest that BUFIOREQ has actually become IOREQ2?
> 
> >
> >      /*
> >       * Identity-map page table is required for running with CR0.PG=0 when
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index bbf9577..22b2a2c 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -366,22 +366,9 @@ bool_t hvm_io_pending(struct vcpu *v)
> >      return ( p->state != STATE_IOREQ_NONE );
> >  }
> >
> > -void hvm_do_resume(struct vcpu *v)
> > +static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
> >  {
> > -    struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -    ioreq_t *p;
> > -
> > -    check_wakeup_from_wait();
> > -
> > -    if ( is_hvm_vcpu(v) )
> > -        pt_restore_timer(v);
> > -
> > -    if ( !s )
> > -        goto check_inject_trap;
> > -
> >      /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> > -    p = get_ioreq(s, v->vcpu_id);
> >      while ( p->state != STATE_IOREQ_NONE )
> >      {
> >          switch ( p->state )
> > @@ -397,12 +384,29 @@ void hvm_do_resume(struct vcpu *v)
> >              break;
> >          default:
> >              gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
> >state);
> > -            domain_crash(v->domain);
> > +            domain_crash(d);
> >              return; /* bail */
> >          }
> >      }
> > +}
> > +
> > +void hvm_do_resume(struct vcpu *v)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    check_wakeup_from_wait();
> > +
> > +    if ( is_hvm_vcpu(v) )
> > +        pt_restore_timer(v);
> > +
> > +    if ( s )
> > +    {
> > +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> > +
> > +        hvm_wait_on_io(d, p);
> > +    }
> >
> > - check_inject_trap:
> >      /* Inject pending hw/sw trap */
> >      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> >      {
> > @@ -411,11 +415,13 @@ void hvm_do_resume(struct vcpu *v)
> >      }
> >  }
> >
> > -static void hvm_init_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
> >  {
> > +    struct hvm_ioreq_page *iorp;
> > +
> > +    iorp = buf ? &s->buf_ioreq : &s->ioreq;
> 
> This appears a lot, suggesting that either this logic deserves to be
> further up the call chain or there should be a helper for it. I suspect
> the former.
> 
> > @@ -606,29 +606,88 @@ static void
> hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
> >      free_xen_event_channel(v, v->arch.hvm_vcpu.ioreq_evtchn);
> >  }
> >
> > -static int hvm_create_ioreq_server(struct domain *d)
> > +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> 
> (I've just realised I've straying into hypervisor side territory. yet
> another reason why this stuff should be split out since the tools
> changes don't appear to be very related to most of this. Stopping here.
> )
> 
> Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 13:25     ` Paul Durrant
@ 2014-03-14 14:08       ` Ian Campbell
  2014-03-14 14:31         ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 14:08 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Fri, 2014-03-14 at 13:25 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 14 March 2014 11:58
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> > controller implementation into Xen
> > 
> > On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > index 2e52470..4176440 100644
> > > --- a/tools/libxl/libxl_pci.c
> > > +++ b/tools/libxl/libxl_pci.c
> > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > domid, libxl_device_pci *pcidev, i
> > >          }
> > >          if ( rc )
> > >              return ERROR_FAIL;
> > > +
> > > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
> > > +        if (rc < 0) {
> > > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> > xc_hvm_pci_hotplug_enable failed");
> > > +            return ERROR_FAIL;
> > > +        }
> > 
> > Perhaps I'm misreading this but does this imply that you cannot hotplug
> > PCI devices into an HVM guest which wasn't started with a PCI device?
> > That doesn't sound right/desirable.
> > 
> 
> I don't think that is the case. The extra code here is because we're
> intercepting the hotplug controller IO space in Xen so QEMU may well
> play with its hotplug controller device model, but the guest will
> never see it.

That wasn't what I meant.

Unless the guest has a PCI device enabled the above code will never be
called, so we will never setup the hotplug controller within Xen.

> > Is there no problem with the availability of the i/o space for the
> > different versions of qemu (i.e. they are both the same today?) The AML
> > looked like it poked a different thing in the trad case -- so is 0xae00
> > unused there?
> > 
> 
> QEMU will still emulate a PCI hotplug controller but the guest will no
> longer see it. In the case of upstream that io range is now handled by
> xen, so it really really can't get to it. If trad is used then the
> hotplug controller would still be visible if the guest talks to the
> old IO ranges, but since they are not specified in the ACPI table any
> more it shouldn’t have anything to do with them. If you think that's a
> problem then I could hook those IO ranges in Xen too and stop the IO
> getting through.

What I meant was what if there was something else at 0xae00 on trad?
(since trad seems to have its hotplug controller somewhere else this is
possible). That something will now be shadowed by the hotplug controller
in Xen. If that something was important for some other reason this is a
problem. IOW is there a hole in the io port address map at this location
on both qemus?

BTW, what happens on migrate from current Xen to something with this
patch in? The guest will be using the old AML and poke the old
addresses. Maybe that just works?

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 14:08       ` Ian Campbell
@ 2014-03-14 14:31         ` Paul Durrant
  2014-03-14 15:01           ` Ian Campbell
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-14 14:31 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 14:09
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> controller implementation into Xen
> 
> On Fri, 2014-03-14 at 13:25 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 14 March 2014 11:58
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> hotplug
> > > controller implementation into Xen
> > >
> > > On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > index 2e52470..4176440 100644
> > > > --- a/tools/libxl/libxl_pci.c
> > > > +++ b/tools/libxl/libxl_pci.c
> > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > domid, libxl_device_pci *pcidev, i
> > > >          }
> > > >          if ( rc )
> > > >              return ERROR_FAIL;
> > > > +
> > > > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
> > > > +        if (rc < 0) {
> > > > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> > > xc_hvm_pci_hotplug_enable failed");
> > > > +            return ERROR_FAIL;
> > > > +        }
> > >
> > > Perhaps I'm misreading this but does this imply that you cannot hotplug
> > > PCI devices into an HVM guest which wasn't started with a PCI device?
> > > That doesn't sound right/desirable.
> > >
> >
> > I don't think that is the case. The extra code here is because we're
> > intercepting the hotplug controller IO space in Xen so QEMU may well
> > play with its hotplug controller device model, but the guest will
> > never see it.
> 
> That wasn't what I meant.
> 
> Unless the guest has a PCI device enabled the above code will never be
> called, so we will never setup the hotplug controller within Xen.
> 

I don't follow. The hotplug controller is set up by the call to gpe_init() in hvm_domain_initialize(). The above code is there to tell the hotplug controller a new device has appeared. Am I missing something?

> > > Is there no problem with the availability of the i/o space for the
> > > different versions of qemu (i.e. they are both the same today?) The AML
> > > looked like it poked a different thing in the trad case -- so is 0xae00
> > > unused there?
> > >
> >
> > QEMU will still emulate a PCI hotplug controller but the guest will no
> > longer see it. In the case of upstream that io range is now handled by
> > xen, so it really really can't get to it. If trad is used then the
> > hotplug controller would still be visible if the guest talks to the
> > old IO ranges, but since they are not specified in the ACPI table any
> > more it shouldn’t have anything to do with them. If you think that's a
> > problem then I could hook those IO ranges in Xen too and stop the IO
> > getting through.
> 
> What I meant was what if there was something else at 0xae00 on trad?

I don't believe so.

> (since trad seems to have its hotplug controller somewhere else this is
> possible). That something will now be shadowed by the hotplug controller
> in Xen. If that something was important for some other reason this is a
> problem. IOW is there a hole in the io port address map at this location
> on both qemus?
> 

The new implementation in Xen directly overlays the upstream QEMU controller. I believe those IO ports are unimplemented by trad.

> BTW, what happens on migrate from current Xen to something with this
> patch in? The guest will be using the old AML and poke the old
> addresses. Maybe that just works?
> 

If you started with upstream, we now overlay them with IO ports having the same semantics, so that should be fine. If you started with trad then there will be a problem - the IO ports will still function, but the new API call will cause the SCI to be asserted and unless something talks to to the new IO port it won't be de-asserted.
So, I guess we need something to explicitly init the new hotplug controller in domain build rather than always creating it.

  Paul

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 14:31         ` Paul Durrant
@ 2014-03-14 15:01           ` Ian Campbell
  2014-03-14 15:18             ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-14 15:01 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Fri, 2014-03-14 at 14:31 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 14 March 2014 14:09
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> > controller implementation into Xen
> > 
> > On Fri, 2014-03-14 at 13:25 +0000, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Ian Campbell
> > > > Sent: 14 March 2014 11:58
> > > > To: Paul Durrant
> > > > Cc: xen-devel@lists.xen.org
> > > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> > hotplug
> > > > controller implementation into Xen
> > > >
> > > > On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > > index 2e52470..4176440 100644
> > > > > --- a/tools/libxl/libxl_pci.c
> > > > > +++ b/tools/libxl/libxl_pci.c
> > > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > > domid, libxl_device_pci *pcidev, i
> > > > >          }
> > > > >          if ( rc )
> > > > >              return ERROR_FAIL;
> > > > > +
> > > > > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
> > > > > +        if (rc < 0) {
> > > > > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> > > > xc_hvm_pci_hotplug_enable failed");
> > > > > +            return ERROR_FAIL;
> > > > > +        }
> > > >
> > > > Perhaps I'm misreading this but does this imply that you cannot hotplug
> > > > PCI devices into an HVM guest which wasn't started with a PCI device?
> > > > That doesn't sound right/desirable.
> > > >
> > >
> > > I don't think that is the case. The extra code here is because we're
> > > intercepting the hotplug controller IO space in Xen so QEMU may well
> > > play with its hotplug controller device model, but the guest will
> > > never see it.
> > 
> > That wasn't what I meant.
> > 
> > Unless the guest has a PCI device enabled the above code will never be
> > called, so we will never setup the hotplug controller within Xen.
> > 
> 
> I don't follow. The hotplug controller is set up by the call to
> gpe_init() in hvm_domain_initialize(). The above code is there to tell
> the hotplug controller a new device has appeared. Am I missing
> something?

No, I was, didn't realise this was per-device setup.

I assume this is ok to call for both cold- and hotplug

> > > > Is there no problem with the availability of the i/o space for the
> > > > different versions of qemu (i.e. they are both the same today?) The AML
> > > > looked like it poked a different thing in the trad case -- so is 0xae00
> > > > unused there?
> > > >
> > >
> > > QEMU will still emulate a PCI hotplug controller but the guest will no
> > > longer see it. In the case of upstream that io range is now handled by
> > > xen, so it really really can't get to it. If trad is used then the
> > > hotplug controller would still be visible if the guest talks to the
> > > old IO ranges, but since they are not specified in the ACPI table any
> > > more it shouldn’t have anything to do with them. If you think that's a
> > > problem then I could hook those IO ranges in Xen too and stop the IO
> > > getting through.
> > 
> > What I meant was what if there was something else at 0xae00 on trad?
> 
> I don't believe so.
> 
> > (since trad seems to have its hotplug controller somewhere else this is
> > possible). That something will now be shadowed by the hotplug controller
> > in Xen. If that something was important for some other reason this is a
> > problem. IOW is there a hole in the io port address map at this location
> > on both qemus?
> > 
> 
> The new implementation in Xen directly overlays the upstream QEMU
> controller.

I got this part.

>  I believe those IO ports are unimplemented by trad.

That's the important thing (although turning "believe" into "have
confirmed" would make me sleep easier).

> > BTW, what happens on migrate from current Xen to something with this
> > patch in? The guest will be using the old AML and poke the old
> > addresses. Maybe that just works?
> > 
> 
> If you started with upstream, we now overlay them with IO ports having
> the same semantics, so that should be fine.

Do we need to arrange to restore any state saved by qemu into Xen or is
it stateless across migration?

>  If you started with trad then there will be a problem - the IO ports
> will still function, but the new API call will cause the SCI to be
> asserted and unless something talks to to the new IO port it won't be
> de-asserted.
>
> So, I guess we need something to explicitly init the new hotplug
> controller in domain build rather than always creating it.

You also need to keep the distinction in the AML then, which is
unfortunate.

We could tie this feature (multireq) to qemu version and simply not
support it for qemu-trad, that might simplify some of this sort of thing
a bit.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 15:01           ` Ian Campbell
@ 2014-03-14 15:18             ` Paul Durrant
  2014-03-14 18:06               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-14 15:18 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 15:02
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> controller implementation into Xen
> 
> On Fri, 2014-03-14 at 14:31 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 14 March 2014 14:09
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> hotplug
> > > controller implementation into Xen
> > >
> > > On Fri, 2014-03-14 at 13:25 +0000, Paul Durrant wrote:
> > > > > -----Original Message-----
> > > > > From: Ian Campbell
> > > > > Sent: 14 March 2014 11:58
> > > > > To: Paul Durrant
> > > > > Cc: xen-devel@lists.xen.org
> > > > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> > > hotplug
> > > > > controller implementation into Xen
> > > > >
> > > > > On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > > > index 2e52470..4176440 100644
> > > > > > --- a/tools/libxl/libxl_pci.c
> > > > > > +++ b/tools/libxl/libxl_pci.c
> > > > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > > > domid, libxl_device_pci *pcidev, i
> > > > > >          }
> > > > > >          if ( rc )
> > > > > >              return ERROR_FAIL;
> > > > > > +
> > > > > > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev-
> >dev);
> > > > > > +        if (rc < 0) {
> > > > > > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> > > > > xc_hvm_pci_hotplug_enable failed");
> > > > > > +            return ERROR_FAIL;
> > > > > > +        }
> > > > >
> > > > > Perhaps I'm misreading this but does this imply that you cannot
> hotplug
> > > > > PCI devices into an HVM guest which wasn't started with a PCI
> device?
> > > > > That doesn't sound right/desirable.
> > > > >
> > > >
> > > > I don't think that is the case. The extra code here is because we're
> > > > intercepting the hotplug controller IO space in Xen so QEMU may well
> > > > play with its hotplug controller device model, but the guest will
> > > > never see it.
> > >
> > > That wasn't what I meant.
> > >
> > > Unless the guest has a PCI device enabled the above code will never be
> > > called, so we will never setup the hotplug controller within Xen.
> > >
> >
> > I don't follow. The hotplug controller is set up by the call to
> > gpe_init() in hvm_domain_initialize(). The above code is there to tell
> > the hotplug controller a new device has appeared. Am I missing
> > something?
> 
> No, I was, didn't realise this was per-device setup.
> 
> I assume this is ok to call for both cold- and hotplug
> 

I believe so. I've certainly seen no fallout in testing my new VGA device (which is cold plugged to a paused domain).

> > > > > Is there no problem with the availability of the i/o space for the
> > > > > different versions of qemu (i.e. they are both the same today?) The
> AML
> > > > > looked like it poked a different thing in the trad case -- so is 0xae00
> > > > > unused there?
> > > > >
> > > >
> > > > QEMU will still emulate a PCI hotplug controller but the guest will no
> > > > longer see it. In the case of upstream that io range is now handled by
> > > > xen, so it really really can't get to it. If trad is used then the
> > > > hotplug controller would still be visible if the guest talks to the
> > > > old IO ranges, but since they are not specified in the ACPI table any
> > > > more it shouldn’t have anything to do with them. If you think that's a
> > > > problem then I could hook those IO ranges in Xen too and stop the IO
> > > > getting through.
> > >
> > > What I meant was what if there was something else at 0xae00 on trad?
> >
> > I don't believe so.
> >
> > > (since trad seems to have its hotplug controller somewhere else this is
> > > possible). That something will now be shadowed by the hotplug
> controller
> > > in Xen. If that something was important for some other reason this is a
> > > problem. IOW is there a hole in the io port address map at this location
> > > on both qemus?
> > >
> >
> > The new implementation in Xen directly overlays the upstream QEMU
> > controller.
> 
> I got this part.
> 
> >  I believe those IO ports are unimplemented by trad.
> 
> That's the important thing (although turning "believe" into "have
> confirmed" would make me sleep easier).

Sure, I can check.

> 
> > > BTW, what happens on migrate from current Xen to something with this
> > > patch in? The guest will be using the old AML and poke the old
> > > addresses. Maybe that just works?
> > >
> >
> > If you started with upstream, we now overlay them with IO ports having
> > the same semantics, so that should be fine.
> 
> Do we need to arrange to restore any state saved by qemu into Xen or is
> it stateless across migration?
> 

It should be stateless as we cannot migrate with a pass-through device plugged in AFAIK.

> >  If you started with trad then there will be a problem - the IO ports
> > will still function, but the new API call will cause the SCI to be
> > asserted and unless something talks to to the new IO port it won't be
> > de-asserted.
> >
> > So, I guess we need something to explicitly init the new hotplug
> > controller in domain build rather than always creating it.
> 
> You also need to keep the distinction in the AML then, which is
> unfortunate.
> 

I don't think we do. The old AML should migrate across with the guest. For every new domain, it will get the new hotplug controller and hence need the new AML.

> We could tie this feature (multireq) to qemu version and simply not
> support it for qemu-trad, that might simplify some of this sort of thing
> a bit.
> 

If we did that then we *would* need to select the right AML, but - providing I can verify that trad doesn't have any IO ports covered up by the new implementation - I don't think we need to differentiate.

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 15:18             ` Paul Durrant
@ 2014-03-14 18:06               ` Konrad Rzeszutek Wilk
  2014-03-17 11:13                 ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-14 18:06 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Campbell, xen-devel

On Fri, Mar 14, 2014 at 03:18:12PM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 14 March 2014 15:02
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> > controller implementation into Xen
> > 
> > On Fri, 2014-03-14 at 14:31 +0000, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Ian Campbell
> > > > Sent: 14 March 2014 14:09
> > > > To: Paul Durrant
> > > > Cc: xen-devel@lists.xen.org
> > > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> > hotplug
> > > > controller implementation into Xen
> > > >
> > > > On Fri, 2014-03-14 at 13:25 +0000, Paul Durrant wrote:
> > > > > > -----Original Message-----
> > > > > > From: Ian Campbell
> > > > > > Sent: 14 March 2014 11:58
> > > > > > To: Paul Durrant
> > > > > > Cc: xen-devel@lists.xen.org
> > > > > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> > > > hotplug
> > > > > > controller implementation into Xen
> > > > > >
> > > > > > On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > > > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > > > > index 2e52470..4176440 100644
> > > > > > > --- a/tools/libxl/libxl_pci.c
> > > > > > > +++ b/tools/libxl/libxl_pci.c
> > > > > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> > > > > > domid, libxl_device_pci *pcidev, i
> > > > > > >          }
> > > > > > >          if ( rc )
> > > > > > >              return ERROR_FAIL;
> > > > > > > +
> > > > > > > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev-
> > >dev);
> > > > > > > +        if (rc < 0) {
> > > > > > > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> > > > > > xc_hvm_pci_hotplug_enable failed");
> > > > > > > +            return ERROR_FAIL;
> > > > > > > +        }
> > > > > >
> > > > > > Perhaps I'm misreading this but does this imply that you cannot
> > hotplug
> > > > > > PCI devices into an HVM guest which wasn't started with a PCI
> > device?
> > > > > > That doesn't sound right/desirable.
> > > > > >
> > > > >
> > > > > I don't think that is the case. The extra code here is because we're
> > > > > intercepting the hotplug controller IO space in Xen so QEMU may well
> > > > > play with its hotplug controller device model, but the guest will
> > > > > never see it.
> > > >
> > > > That wasn't what I meant.
> > > >
> > > > Unless the guest has a PCI device enabled the above code will never be
> > > > called, so we will never setup the hotplug controller within Xen.
> > > >
> > >
> > > I don't follow. The hotplug controller is set up by the call to
> > > gpe_init() in hvm_domain_initialize(). The above code is there to tell
> > > the hotplug controller a new device has appeared. Am I missing
> > > something?
> > 
> > No, I was, didn't realise this was per-device setup.
> > 
> > I assume this is ok to call for both cold- and hotplug
> > 
> 
> I believe so. I've certainly seen no fallout in testing my new VGA device (which is cold plugged to a paused domain).
> 
> > > > > > Is there no problem with the availability of the i/o space for the
> > > > > > different versions of qemu (i.e. they are both the same today?) The
> > AML
> > > > > > looked like it poked a different thing in the trad case -- so is 0xae00
> > > > > > unused there?
> > > > > >
> > > > >
> > > > > QEMU will still emulate a PCI hotplug controller but the guest will no
> > > > > longer see it. In the case of upstream that io range is now handled by
> > > > > xen, so it really really can't get to it. If trad is used then the
> > > > > hotplug controller would still be visible if the guest talks to the
> > > > > old IO ranges, but since they are not specified in the ACPI table any
> > > > > more it shouldn’t have anything to do with them. If you think that's a
> > > > > problem then I could hook those IO ranges in Xen too and stop the IO
> > > > > getting through.
> > > >
> > > > What I meant was what if there was something else at 0xae00 on trad?
> > >
> > > I don't believe so.
> > >
> > > > (since trad seems to have its hotplug controller somewhere else this is
> > > > possible). That something will now be shadowed by the hotplug
> > controller
> > > > in Xen. If that something was important for some other reason this is a
> > > > problem. IOW is there a hole in the io port address map at this location
> > > > on both qemus?
> > > >
> > >
> > > The new implementation in Xen directly overlays the upstream QEMU
> > > controller.
> > 
> > I got this part.
> > 
> > >  I believe those IO ports are unimplemented by trad.
> > 
> > That's the important thing (although turning "believe" into "have
> > confirmed" would make me sleep easier).
> 
> Sure, I can check.

What about future version of QEMU? If they moved the addresses in the future
(or decided to expand the existing ones to do some extra stuff) - what is our
path to deal with this?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-14 18:06               ` Konrad Rzeszutek Wilk
@ 2014-03-17 11:13                 ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-17 11:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Ian Campbell, xen-devel

> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: 14 March 2014 18:07
> To: Paul Durrant
> Cc: Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI hotplug
> controller implementation into Xen
> 
> On Fri, Mar 14, 2014 at 03:18:12PM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 14 March 2014 15:02
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> hotplug
> > > controller implementation into Xen
> > >
> > > On Fri, 2014-03-14 at 14:31 +0000, Paul Durrant wrote:
> > > > > -----Original Message-----
> > > > > From: Ian Campbell
> > > > > Sent: 14 March 2014 14:09
> > > > > To: Paul Durrant
> > > > > Cc: xen-devel@lists.xen.org
> > > > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the PCI
> > > hotplug
> > > > > controller implementation into Xen
> > > > >
> > > > > On Fri, 2014-03-14 at 13:25 +0000, Paul Durrant wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Ian Campbell
> > > > > > > Sent: 14 March 2014 11:58
> > > > > > > To: Paul Durrant
> > > > > > > Cc: xen-devel@lists.xen.org
> > > > > > > Subject: Re: [Xen-devel] [PATCH v3 6/6] ioreq-server: bring the
> PCI
> > > > > hotplug
> > > > > > > controller implementation into Xen
> > > > > > >
> > > > > > > On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > > > > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > > > > > > > index 2e52470..4176440 100644
> > > > > > > > --- a/tools/libxl/libxl_pci.c
> > > > > > > > +++ b/tools/libxl/libxl_pci.c
> > > > > > > > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc,
> uint32_t
> > > > > > > domid, libxl_device_pci *pcidev, i
> > > > > > > >          }
> > > > > > > >          if ( rc )
> > > > > > > >              return ERROR_FAIL;
> > > > > > > > +
> > > > > > > > +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev-
> > > >dev);
> > > > > > > > +        if (rc < 0) {
> > > > > > > > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> > > > > > > xc_hvm_pci_hotplug_enable failed");
> > > > > > > > +            return ERROR_FAIL;
> > > > > > > > +        }
> > > > > > >
> > > > > > > Perhaps I'm misreading this but does this imply that you cannot
> > > hotplug
> > > > > > > PCI devices into an HVM guest which wasn't started with a PCI
> > > device?
> > > > > > > That doesn't sound right/desirable.
> > > > > > >
> > > > > >
> > > > > > I don't think that is the case. The extra code here is because we're
> > > > > > intercepting the hotplug controller IO space in Xen so QEMU may
> well
> > > > > > play with its hotplug controller device model, but the guest will
> > > > > > never see it.
> > > > >
> > > > > That wasn't what I meant.
> > > > >
> > > > > Unless the guest has a PCI device enabled the above code will never
> be
> > > > > called, so we will never setup the hotplug controller within Xen.
> > > > >
> > > >
> > > > I don't follow. The hotplug controller is set up by the call to
> > > > gpe_init() in hvm_domain_initialize(). The above code is there to tell
> > > > the hotplug controller a new device has appeared. Am I missing
> > > > something?
> > >
> > > No, I was, didn't realise this was per-device setup.
> > >
> > > I assume this is ok to call for both cold- and hotplug
> > >
> >
> > I believe so. I've certainly seen no fallout in testing my new VGA device
> (which is cold plugged to a paused domain).
> >
> > > > > > > Is there no problem with the availability of the i/o space for the
> > > > > > > different versions of qemu (i.e. they are both the same today?)
> The
> > > AML
> > > > > > > looked like it poked a different thing in the trad case -- so is
> 0xae00
> > > > > > > unused there?
> > > > > > >
> > > > > >
> > > > > > QEMU will still emulate a PCI hotplug controller but the guest will no
> > > > > > longer see it. In the case of upstream that io range is now handled
> by
> > > > > > xen, so it really really can't get to it. If trad is used then the
> > > > > > hotplug controller would still be visible if the guest talks to the
> > > > > > old IO ranges, but since they are not specified in the ACPI table any
> > > > > > more it shouldn’t have anything to do with them. If you think that's
> a
> > > > > > problem then I could hook those IO ranges in Xen too and stop the
> IO
> > > > > > getting through.
> > > > >
> > > > > What I meant was what if there was something else at 0xae00 on
> trad?
> > > >
> > > > I don't believe so.
> > > >
> > > > > (since trad seems to have its hotplug controller somewhere else this
> is
> > > > > possible). That something will now be shadowed by the hotplug
> > > controller
> > > > > in Xen. If that something was important for some other reason this is
> a
> > > > > problem. IOW is there a hole in the io port address map at this
> location
> > > > > on both qemus?
> > > > >
> > > >
> > > > The new implementation in Xen directly overlays the upstream QEMU
> > > > controller.
> > >
> > > I got this part.
> > >
> > > >  I believe those IO ports are unimplemented by trad.
> > >
> > > That's the important thing (although turning "believe" into "have
> > > confirmed" would make me sleep easier).
> >
> > Sure, I can check.
> 
> What about future version of QEMU? If they moved the addresses in the
> future
> (or decided to expand the existing ones to do some extra stuff) - what is our
> path to deal with this?

I don't think we have to worry about that now. If our AML always points at Xen's hotplug controller implementation then, if QEMU decides to add some new IO ports for its hotplug controller implementation then it's no different from the machine type getting a new device model that we don't expose to the guest: we have to decide whether it’s a problem on a case-by-case basis.

  Paul
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-14 11:52   ` Ian Campbell
@ 2014-03-17 11:45     ` George Dunlap
  2014-03-17 12:25     ` Paul Durrant
  1 sibling, 0 replies; 48+ messages in thread
From: George Dunlap @ 2014-03-17 11:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Paul Durrant, xen-devel

On Fri, Mar 14, 2014 at 11:52 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
>> The legacy 'catch-all' server is always created with id 0. Secondary
>> servers will have an id ranging from 1 to a limit set by the toolstack
>> via the 'max_emulators' build info field. This defaults to 1 so ordinarily
>> no extra special pages are reserved for secondary emulators. It may be
>> increased using the secondary_device_emulators parameter in xl.cfg(5).
>> There's no clear limit to apply to the number of emulators so I've not
>> applied one.
>>
>> Because of the re-arrangement of the special pages in a previous patch we
>> only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS to determine
>> the layout of the shared pages for multiple emulators. Guests migrated in
>> from hosts without this patch will be lacking the save record which stores
>> the new parameter and so the guest is assumed to only have had a single
>> emulator.
>>
>> Added some more emacs boilerplate to xenctrl.h and xenguest.h
>>
>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>> ---
>>  docs/man/xl.cfg.pod.5            |    7 +
>>  tools/libxc/xc_domain.c          |  175 +++++++
>>  tools/libxc/xc_domain_restore.c  |   20 +
>>  tools/libxc/xc_domain_save.c     |   12 +
>>  tools/libxc/xc_hvm_build_x86.c   |   24 +-
>>  tools/libxc/xenctrl.h            |   51 ++
>>  tools/libxc/xenguest.h           |   12 +
>>  tools/libxc/xg_save_restore.h    |    1 +
>>  tools/libxl/libxl.h              |    8 +
>>  tools/libxl/libxl_create.c       |    3 +
>>  tools/libxl/libxl_dom.c          |    1 +
>>  tools/libxl/libxl_types.idl      |    1 +
>>  tools/libxl/xl_cmdimpl.c         |    3 +
>>  xen/arch/x86/hvm/hvm.c           |  964 ++++++++++++++++++++++++++++++++++++--
>>  xen/arch/x86/hvm/io.c            |    2 +-
>>  xen/include/asm-x86/hvm/domain.h |   23 +-
>>  xen/include/asm-x86/hvm/hvm.h    |    3 +-
>>  xen/include/asm-x86/hvm/io.h     |    2 +-
>>  xen/include/public/hvm/hvm_op.h  |   70 +++
>>  xen/include/public/hvm/ioreq.h   |    1 +
>>  xen/include/public/hvm/params.h  |    4 +-
>>  21 files changed, 1324 insertions(+), 63 deletions(-)
>>
>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>> index e15a49f..0226c55 100644
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
>>  This parameter only takes effect when device_model_version=qemu-xen.
>>  See F<docs/misc/pci-device-reservations.txt> for more information.
>>
>> +=item B<secondary_device_emulators=NUMBER>
>> +
>> +If a number of secondary device emulators (i.e. in addition to
>> +qemu-xen or qemu-xen-traditional) are to be invoked to support the
>> +guest then this parameter can be set with the count of how many are
>> +to be used. The default value is zero.
>
> This is an odd thing to expose to the user. Surely (lib)xl should be
> launching these things while building the domain and therefore know how
> many there are the config options should be things like
> "use_split_dm_for_foo=1" or device_emulators = ["/usr/bin/first-dm",
> "/usr/local/bin/second-dm"] or something.

I think the idea was that someone may want to hot-plug such an
emulated device later.

I agree that long-term, the standard should be to specify secondary
device models in the config file and have libxl / xl figure out how
many there are.  But I don't think that means that having a way to
specify it explicitly is a bad idea; and I think for development
purposes it might make sense to start with it manually specified and
add the support for multiple emulators in the config file later.

[snip]

>>          xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
>> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
>> index 42c4752..3293e29 100644
>> --- a/tools/libxc/xc_domain_save.c
>> +++ b/tools/libxc/xc_domain_save.c
>> @@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>>              PERROR("Error when writing the viridian flag");
>>              goto out;
>>          }
>> +
>> +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
>> +        chunk.data = 0;
>> +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
>> +                         (unsigned long *)&chunk.data);
>> +
>> +        if ( (chunk.data != 0) &&
>
> Can this ever be 0 for an HVM guest?

If we end up making PVH basically a degenerate case of HVM (without a
device model), then something like this may.

 -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-14 11:52   ` Ian Campbell
  2014-03-17 11:45     ` George Dunlap
@ 2014-03-17 12:25     ` Paul Durrant
  2014-03-17 12:35       ` Ian Campbell
  1 sibling, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-17 12:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 March 2014 11:52
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> On Wed, 2014-03-05 at 14:48 +0000, Paul Durrant wrote:
> > The legacy 'catch-all' server is always created with id 0. Secondary
> > servers will have an id ranging from 1 to a limit set by the toolstack
> > via the 'max_emulators' build info field. This defaults to 1 so ordinarily
> > no extra special pages are reserved for secondary emulators. It may be
> > increased using the secondary_device_emulators parameter in xl.cfg(5).
> > There's no clear limit to apply to the number of emulators so I've not
> > applied one.
> >
> > Because of the re-arrangement of the special pages in a previous patch we
> > only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS
> to determine
> > the layout of the shared pages for multiple emulators. Guests migrated in
> > from hosts without this patch will be lacking the save record which stores
> > the new parameter and so the guest is assumed to only have had a single
> > emulator.
> >
> > Added some more emacs boilerplate to xenctrl.h and xenguest.h
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> >  docs/man/xl.cfg.pod.5            |    7 +
> >  tools/libxc/xc_domain.c          |  175 +++++++
> >  tools/libxc/xc_domain_restore.c  |   20 +
> >  tools/libxc/xc_domain_save.c     |   12 +
> >  tools/libxc/xc_hvm_build_x86.c   |   24 +-
> >  tools/libxc/xenctrl.h            |   51 ++
> >  tools/libxc/xenguest.h           |   12 +
> >  tools/libxc/xg_save_restore.h    |    1 +
> >  tools/libxl/libxl.h              |    8 +
> >  tools/libxl/libxl_create.c       |    3 +
> >  tools/libxl/libxl_dom.c          |    1 +
> >  tools/libxl/libxl_types.idl      |    1 +
> >  tools/libxl/xl_cmdimpl.c         |    3 +
> >  xen/arch/x86/hvm/hvm.c           |  964
> ++++++++++++++++++++++++++++++++++++--
> >  xen/arch/x86/hvm/io.c            |    2 +-
> >  xen/include/asm-x86/hvm/domain.h |   23 +-
> >  xen/include/asm-x86/hvm/hvm.h    |    3 +-
> >  xen/include/asm-x86/hvm/io.h     |    2 +-
> >  xen/include/public/hvm/hvm_op.h  |   70 +++
> >  xen/include/public/hvm/ioreq.h   |    1 +
> >  xen/include/public/hvm/params.h  |    4 +-
> >  21 files changed, 1324 insertions(+), 63 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > index e15a49f..0226c55 100644
> > --- a/docs/man/xl.cfg.pod.5
> > +++ b/docs/man/xl.cfg.pod.5
> > @@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV
> drivers in the guest.
> >  This parameter only takes effect when device_model_version=qemu-xen.
> >  See F<docs/misc/pci-device-reservations.txt> for more information.
> >
> > +=item B<secondary_device_emulators=NUMBER>
> > +
> > +If a number of secondary device emulators (i.e. in addition to
> > +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> > +guest then this parameter can be set with the count of how many are
> > +to be used. The default value is zero.
> 
> This is an odd thing to expose to the user. Surely (lib)xl should be
> launching these things while building the domain and therefore know how
> many there are the config options should be things like
> "use_split_dm_for_foo=1" or device_emulators = ["/usr/bin/first-dm",
> "/usr/local/bin/second-dm"] or something.
> 

As George says, I don’t want to restrict the only way to kick off emulators to being libxl at this point.

> > +
> >  =back
> >
> >  =head2 Device-Model Options
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 369c3f3..dfa905b 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > +                                 domid_t domid,
> > +                                 ioservid_t id,
> > +                                 xen_pfn_t *pfn,
> > +                                 xen_pfn_t *buf_pfn,
> > +                                 evtchn_port_t *buf_port)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    if ( rc != 0 )
> > +        goto done;
> > +
> > +    if ( pfn )
> > +        *pfn = arg->pfn;
> > +
> > +    if ( buf_pfn )
> > +        *buf_pfn = arg->buf_pfn;
> > +
> > +    if ( buf_port )
> > +        *buf_port = arg->buf_port;
> 
> This looks a bit like this function should take a
> xen_hvm_get_ioreq_server_info_t* and use the bounce buffering stuff.
> Unless there is some desire to hide that struct from the callers
> perhaps?
> 

Well, I guess the caller could do the marshalling but isn't it neater to hide it all in libxenctrl internals? I don't know what the general philosophy behind libxenctrl apis is. Most of the other functions seem to take an xc_interface and a bunch of args rather than a single struct.

> > +
> > +done:
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                        ioservid_t id, int is_mmio,
> > +                                        uint64_t start, uint64_t end)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->is_mmio = is_mmio;
> > +    arg->start = start;
> > +    arg->end = end;
> 
> Bounce a struct here instead?
> 
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> domid_t domid,
> > +                                            ioservid_t id, int is_mmio,
> > +                                            uint64_t start)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_ser
> ver_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->is_mmio = is_mmio;
> > +    arg->start = start;
> 
> Again?
> 
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                      ioservid_t id, uint16_t bdf)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->bdf = bdf;
> 
> Guess what ;-)
> 
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> domid_t domid,
> > +                                          ioservid_t id, uint16_t bdf)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_serve
> r_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->bdf = bdf;
> 
> I'm going to stop suggesting it now ...
> 
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> > +                                domid_t domid,
> > +                                ioservid_t id)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> 
> OK, one more...
> 
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> >  int xc_domain_setdebugging(xc_interface *xch,
> >                             uint32_t domid,
> >                             unsigned int enable)
> > diff --git a/tools/libxc/xc_domain_restore.c
> b/tools/libxc/xc_domain_restore.c
> > index 1f6ce50..3116653 100644
> > --- a/tools/libxc/xc_domain_restore.c
> > +++ b/tools/libxc/xc_domain_restore.c
> > @@ -746,6 +746,7 @@ typedef struct {
> >      uint64_t acpi_ioport_location;
> >      uint64_t viridian;
> >      uint64_t vm_generationid_addr;
> > +    uint64_t nr_ioreq_servers;
> 
> This makes me wonder: what happens if the source and target hosts do
> different amounts of disaggregation? Perhaps in Xen N+1 we split some
> additional component out into its own process?
> 
> This is going to be complex with the allocation of space for special
> pages, isn't it?
>

As long as we have enough special pages then is it complex? All Xen needs to know is the base ioreq server pfn and how many the VM has. I'm overloading the existing HVM param as the base and then adding a new one for the count. George (as I understand it) suggested leaving the old params alone, grandfathering them in for the catch-all server, and then having a new area for secondary emulators. I'm happy with that and, as long as suitable save records were introduced for any other dissag. work, then I don't think there's any conflict.
 
> >
> >      struct toolstack_data_t tdata;
> >  } pagebuf_t;
> > @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch,
> struct restore_ctx *ctx,
> >          DPRINTF("read generation id buffer address");
> >          return pagebuf_get_one(xch, ctx, buf, fd, dom);
> >
> > +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> > +        /* Skip padding 4 bytes then read the acpi ioport location. */
> > +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> > +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> > +        {
> > +            PERROR("error reading the number of IOREQ servers");
> > +            return -1;
> > +        }
> > +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> > +
> >      default:
> >          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
> >              ERROR("Max batch size exceeded (%d). Giving up.", count);
> > @@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
> >      if (pagebuf.viridian != 0)
> >          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
> >
> > +    if ( hvm ) {
> > +        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
> > +
> > +        if ( nr_ioreq_servers == 0 )
> > +            nr_ioreq_servers = 1;
> > +
> > +        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> nr_ioreq_servers);
> > +    }
> > +
> >      if (pagebuf.acpi_ioport_location == 1) {
> >          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
> >          xc_set_hvm_param(xch, dom,
> HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> > diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> > index 42c4752..3293e29 100644
> > --- a/tools/libxc/xc_domain_save.c
> > +++ b/tools/libxc/xc_domain_save.c
> > @@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd,
> uint32_t dom, uint32_t max_iter
> >              PERROR("Error when writing the viridian flag");
> >              goto out;
> >          }
> > +
> > +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> > +        chunk.data = 0;
> > +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> > +                         (unsigned long *)&chunk.data);
> > +
> > +        if ( (chunk.data != 0) &&
> 
> Can this ever be 0 for an HVM guest?
> 

I was assuming it would 0 for a guest migrated in from a host that did not know about secondary emulators.

> > +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> > +        {
> > +            PERROR("Error when writing the number of IOREQ servers");
> > +            goto out;
> > +        }
> >      }
> >
> >      if ( callbacks != NULL && callbacks->toolstack_save != NULL )
> 
> > diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> > index f859621..5170b7f 100644
> > --- a/tools/libxc/xg_save_restore.h
> > +++ b/tools/libxc/xg_save_restore.h
> > @@ -259,6 +259,7 @@
> >  #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
> >  #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
> >  #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific
> info */
> > +#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
> 
> Absence of this chunk => assume 1 iff hvm?
> 

Yes. That's the intention.

> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 4fc46eb..cf9b67d 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -1750,6 +1750,9 @@ skip_vfb:
> >
> >              b_info->u.hvm.vendor_device = d;
> >          }
> > +
> > +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> > +            b_info->u.hvm.max_emulators = l + 1;
> 
> + 1 ? Oh secondary. Well I already objected to the concept of this
> generally but even if going this way then max_/nr_device_emulators would
> have been better.
> 
> >      }
> >
> >      xlu_cfg_destroy(config);
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 22b2a2c..996c374 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> 
> [...]
> 
> Skipping the xen side, if you want review from those folks I suggest you
> CC them. People may find this more wieldy if you made the Xen changes
> first?
> 
> > diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> > index a9aab4b..6b31189 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -23,6 +23,7 @@
> >
> >  #include "../xen.h"
> >  #include "../trace.h"
> > +#include "../event_channel.h"
> >
> >  /* Get/set subcommands: extra argument == pointer to xen_hvm_param
> struct. */
> >  #define HVMOP_set_param           0
> > @@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
> >  typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
> >
> > +typedef uint32_t ioservid_t;
> > +
> > +DEFINE_XEN_GUEST_HANDLE(ioservid_t);
> 
> I don't think you have any pointers to these in your interface, do you?
> (if not then you don't need this)
> 
> > +#define HVMOP_get_ioreq_server_info 18
> > +struct xen_hvm_get_ioreq_server_info {
> > +    domid_t domid;          /* IN - domain to be serviced */
> > +    ioservid_t id;          /* IN - server id */
> > +    xen_pfn_t pfn;          /* OUT - ioreq pfn */
> > +    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
> 
> Are all servers required to have both buffered and unbuffered modes? I
> could imagine a simple one requiring only one or the other.
> 

True. Perhaps it should be optional.

> > +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> > +};
> > +typedef struct xen_hvm_get_ioreq_server_info
> xen_hvm_get_ioreq_server_info_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> > +
> > +#define HVMOP_map_io_range_to_ioreq_server 19
> > +struct xen_hvm_map_io_range_to_ioreq_server {
> > +    domid_t domid;                  /* IN - domain to be serviced */
> > +    ioservid_t id;                  /* IN - handle from
> HVMOP_register_ioreq_server */
> > +    int is_mmio;                    /* IN - MMIO or port IO? */
> 
> Do we normally make this distinction via two different hypercalls?
> 

I don't really think there's much precedent at an API layer. There's code inside Xen which overloads portio and mmio to some degree.

  Paul

> Ian.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 12:25     ` Paul Durrant
@ 2014-03-17 12:35       ` Ian Campbell
  2014-03-17 12:51         ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-17 12:35 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Mon, 2014-03-17 at 12:25 +0000, Paul Durrant wrote:
> > drivers in the guest.
> > >  This parameter only takes effect when device_model_version=qemu-xen.
> > >  See F<docs/misc/pci-device-reservations.txt> for more information.
> > >
> > > +=item B<secondary_device_emulators=NUMBER>
> > > +
> > > +If a number of secondary device emulators (i.e. in addition to
> > > +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> > > +guest then this parameter can be set with the count of how many are
> > > +to be used. The default value is zero.
> > 
> > This is an odd thing to expose to the user. Surely (lib)xl should be
> > launching these things while building the domain and therefore know how
> > many there are the config options should be things like
> > "use_split_dm_for_foo=1" or device_emulators = ["/usr/bin/first-dm",
> > "/usr/local/bin/second-dm"] or something.
> > 
> 
> As George says, I don’t want to restrict the only way to kick off
> emulators to being libxl at this point.

OK. There's conversely no support here for libxl launching them either
though?

> 
> > > +
> > >  =back
> > >
> > >  =head2 Device-Model Options
> > > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > > index 369c3f3..dfa905b 100644
> > > --- a/tools/libxc/xc_domain.c
> > > +++ b/tools/libxc/xc_domain.c
> > > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > > +                                 domid_t domid,
> > > +                                 ioservid_t id,
> > > +                                 xen_pfn_t *pfn,
> > > +                                 xen_pfn_t *buf_pfn,
> > > +                                 evtchn_port_t *buf_port)
> > > +{
> > > +    DECLARE_HYPERCALL;
> > > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t,
> > arg);
> > > +    int rc;
> > > +
> > > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > > +    if ( arg == NULL )
> > > +        return -1;
> > > +
> > > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > > +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> > > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > > +    arg->domid = domid;
> > > +    arg->id = id;
> > > +    rc = do_xen_hypercall(xch, &hypercall);
> > > +    if ( rc != 0 )
> > > +        goto done;
> > > +
> > > +    if ( pfn )
> > > +        *pfn = arg->pfn;
> > > +
> > > +    if ( buf_pfn )
> > > +        *buf_pfn = arg->buf_pfn;
> > > +
> > > +    if ( buf_port )
> > > +        *buf_port = arg->buf_port;
> > 
> > This looks a bit like this function should take a
> > xen_hvm_get_ioreq_server_info_t* and use the bounce buffering stuff.
> > Unless there is some desire to hide that struct from the callers
> > perhaps?
> > 
> 
> Well, I guess the caller could do the marshalling but isn't it neater
> to hide it all in libxenctrl internals? I don't know what the general
> philosophy behind libxenctrl apis is. Most of the other functions seem
> to take an xc_interface and a bunch of args rather than a single
> struct.

I guess it depends what the caller is most likely to do -- if it uses
them immediately and throws them away then this way is fine, if it's
going to pass them around then the existing struct seems useful. I
presume it is the former or you'd have written it the other way, in
which case this is fine.

> > > diff --git a/tools/libxc/xc_domain_restore.c
> > b/tools/libxc/xc_domain_restore.c
> > > index 1f6ce50..3116653 100644
> > > --- a/tools/libxc/xc_domain_restore.c
> > > +++ b/tools/libxc/xc_domain_restore.c
> > > @@ -746,6 +746,7 @@ typedef struct {
> > >      uint64_t acpi_ioport_location;
> > >      uint64_t viridian;
> > >      uint64_t vm_generationid_addr;
> > > +    uint64_t nr_ioreq_servers;
> > 
> > This makes me wonder: what happens if the source and target hosts do
> > different amounts of disaggregation? Perhaps in Xen N+1 we split some
> > additional component out into its own process?
> > 
> > This is going to be complex with the allocation of space for special
> > pages, isn't it?
> >
> 
> As long as we have enough special pages then is it complex?

The "have enough" is where the complexity comes in though. If Xen
version X needed N special pages and Xen X+1 needs N+2 pages then we
have a tricky situation because people may well configure the guest with
N.

>  All Xen needs to know is the base ioreq server pfn and how many the
> VM has. I'm overloading the existing HVM param as the base and then
> adding a new one for the count. George (as I understand it) suggested
> leaving the old params alone, grandfathering them in for the catch-all
> server, and then having a new area for secondary emulators. I'm happy
> with that and, as long as suitable save records were introduced for
> any other dissag. work, then I don't think there's any conflict.

> > > +
> > > +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> > > +        chunk.data = 0;
> > > +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> > > +                         (unsigned long *)&chunk.data);
> > > +
> > > +        if ( (chunk.data != 0) &&
> > 
> > Can this ever be 0 for an HVM guest?
> > 
> 
> I was assuming it would 0 for a guest migrated in from a host that did
> not know about secondary emulators.

Would that host even include this chunk type at all?

> > > +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> > > +};
> > > +typedef struct xen_hvm_get_ioreq_server_info
> > xen_hvm_get_ioreq_server_info_t;
> > > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> > > +
> > > +#define HVMOP_map_io_range_to_ioreq_server 19
> > > +struct xen_hvm_map_io_range_to_ioreq_server {
> > > +    domid_t domid;                  /* IN - domain to be serviced */
> > > +    ioservid_t id;                  /* IN - handle from
> > HVMOP_register_ioreq_server */
> > > +    int is_mmio;                    /* IN - MMIO or port IO? */
> > 
> > Do we normally make this distinction via two different hypercalls?
> > 
> 
> I don't really think there's much precedent at an API layer. There's
> code inside Xen which overloads portio and mmio to some degree.

XEN_DOMCTL_ioport_mapping vs. XEN_DOMCTL_memory_mapping is the closest
analogue I think.

It's not really clear to me when DOMCTL vs HVMOP is appropriate, maybe
one of the h/v side folks will comment.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 12:35       ` Ian Campbell
@ 2014-03-17 12:51         ` Paul Durrant
  2014-03-17 12:53           ` Ian Campbell
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-17 12:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 17 March 2014 12:36
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> On Mon, 2014-03-17 at 12:25 +0000, Paul Durrant wrote:
> > > drivers in the guest.
> > > >  This parameter only takes effect when device_model_version=qemu-
> xen.
> > > >  See F<docs/misc/pci-device-reservations.txt> for more information.
> > > >
> > > > +=item B<secondary_device_emulators=NUMBER>
> > > > +
> > > > +If a number of secondary device emulators (i.e. in addition to
> > > > +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> > > > +guest then this parameter can be set with the count of how many are
> > > > +to be used. The default value is zero.
> > >
> > > This is an odd thing to expose to the user. Surely (lib)xl should be
> > > launching these things while building the domain and therefore know
> how
> > > many there are the config options should be things like
> > > "use_split_dm_for_foo=1" or device_emulators = ["/usr/bin/first-dm",
> > > "/usr/local/bin/second-dm"] or something.
> > >
> >
> > As George says, I don’t want to restrict the only way to kick off
> > emulators to being libxl at this point.
> 
> OK. There's conversely no support here for libxl launching them either
> though?
> 

Not in this series. It is something I plan to do but, with the hotplug support, it's quite easy just to kick off a secondary emulator from a shell and that serves my purposes for now.

> >
> > > > +
> > > >  =back
> > > >
> > > >  =head2 Device-Model Options
> > > > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > > > index 369c3f3..dfa905b 100644
> > > > --- a/tools/libxc/xc_domain.c
> > > > +++ b/tools/libxc/xc_domain.c
> > > > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > > > +                                 domid_t domid,
> > > > +                                 ioservid_t id,
> > > > +                                 xen_pfn_t *pfn,
> > > > +                                 xen_pfn_t *buf_pfn,
> > > > +                                 evtchn_port_t *buf_port)
> > > > +{
> > > > +    DECLARE_HYPERCALL;
> > > > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t,
> > > arg);
> > > > +    int rc;
> > > > +
> > > > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > > > +    if ( arg == NULL )
> > > > +        return -1;
> > > > +
> > > > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > > > +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> > > > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > > > +    arg->domid = domid;
> > > > +    arg->id = id;
> > > > +    rc = do_xen_hypercall(xch, &hypercall);
> > > > +    if ( rc != 0 )
> > > > +        goto done;
> > > > +
> > > > +    if ( pfn )
> > > > +        *pfn = arg->pfn;
> > > > +
> > > > +    if ( buf_pfn )
> > > > +        *buf_pfn = arg->buf_pfn;
> > > > +
> > > > +    if ( buf_port )
> > > > +        *buf_port = arg->buf_port;
> > >
> > > This looks a bit like this function should take a
> > > xen_hvm_get_ioreq_server_info_t* and use the bounce buffering stuff.
> > > Unless there is some desire to hide that struct from the callers
> > > perhaps?
> > >
> >
> > Well, I guess the caller could do the marshalling but isn't it neater
> > to hide it all in libxenctrl internals? I don't know what the general
> > philosophy behind libxenctrl apis is. Most of the other functions seem
> > to take an xc_interface and a bunch of args rather than a single
> > struct.
> 
> I guess it depends what the caller is most likely to do -- if it uses
> them immediately and throws them away then this way is fine, if it's
> going to pass them around then the existing struct seems useful. I
> presume it is the former or you'd have written it the other way, in
> which case this is fine.
> 

Yes, that's right.

> > > > diff --git a/tools/libxc/xc_domain_restore.c
> > > b/tools/libxc/xc_domain_restore.c
> > > > index 1f6ce50..3116653 100644
> > > > --- a/tools/libxc/xc_domain_restore.c
> > > > +++ b/tools/libxc/xc_domain_restore.c
> > > > @@ -746,6 +746,7 @@ typedef struct {
> > > >      uint64_t acpi_ioport_location;
> > > >      uint64_t viridian;
> > > >      uint64_t vm_generationid_addr;
> > > > +    uint64_t nr_ioreq_servers;
> > >
> > > This makes me wonder: what happens if the source and target hosts do
> > > different amounts of disaggregation? Perhaps in Xen N+1 we split some
> > > additional component out into its own process?
> > >
> > > This is going to be complex with the allocation of space for special
> > > pages, isn't it?
> > >
> >
> > As long as we have enough special pages then is it complex?
> 
> The "have enough" is where the complexity comes in though. If Xen
> version X needed N special pages and Xen X+1 needs N+2 pages then we
> have a tricky situation because people may well configure the guest with
> N.
> 

I don't quite follow. The specials are just part of the guest image and so they get migrated around with that guest, so providing we know how many special pages a guest had when it was created (so we know how many there are to play with for secondary emulation) there's no problem is there? The 'max emulators' save record essentially tells us that - base and range records would call it out more explicitly.

> >  All Xen needs to know is the base ioreq server pfn and how many the
> > VM has. I'm overloading the existing HVM param as the base and then
> > adding a new one for the count. George (as I understand it) suggested
> > leaving the old params alone, grandfathering them in for the catch-all
> > server, and then having a new area for secondary emulators. I'm happy
> > with that and, as long as suitable save records were introduced for
> > any other dissag. work, then I don't think there's any conflict.
> 
> > > > +
> > > > +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> > > > +        chunk.data = 0;
> > > > +        xc_get_hvm_param(xch, dom,
> HVM_PARAM_NR_IOREQ_SERVERS,
> > > > +                         (unsigned long *)&chunk.data);
> > > > +
> > > > +        if ( (chunk.data != 0) &&
> > >
> > > Can this ever be 0 for an HVM guest?
> > >
> >
> > I was assuming it would 0 for a guest migrated in from a host that did
> > not know about secondary emulators.
> 
> Would that host even include this chunk type at all?
> 

Sorry, this is the save end of things isn't it... The answer is no, the value can never legitimately be 0 for an HVM guest - it has to be at least 1.

> > > > +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> > > > +};
> > > > +typedef struct xen_hvm_get_ioreq_server_info
> > > xen_hvm_get_ioreq_server_info_t;
> > > > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> > > > +
> > > > +#define HVMOP_map_io_range_to_ioreq_server 19
> > > > +struct xen_hvm_map_io_range_to_ioreq_server {
> > > > +    domid_t domid;                  /* IN - domain to be serviced */
> > > > +    ioservid_t id;                  /* IN - handle from
> > > HVMOP_register_ioreq_server */
> > > > +    int is_mmio;                    /* IN - MMIO or port IO? */
> > >
> > > Do we normally make this distinction via two different hypercalls?
> > >
> >
> > I don't really think there's much precedent at an API layer. There's
> > code inside Xen which overloads portio and mmio to some degree.
> 
> XEN_DOMCTL_ioport_mapping vs. XEN_DOMCTL_memory_mapping is the
> closest
> analogue I think.
> 
> It's not really clear to me when DOMCTL vs HVMOP is appropriate, maybe
> one of the h/v side folks will comment.
> 

Ok. HVMOP seems most appropriate to me since we're dealing with something very HVM-specific.

  Paul
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 12:51         ` Paul Durrant
@ 2014-03-17 12:53           ` Ian Campbell
  2014-03-17 13:56             ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-17 12:53 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Mon, 2014-03-17 at 12:51 +0000, Paul Durrant wrote:

> > > > > diff --git a/tools/libxc/xc_domain_restore.c
> > > > b/tools/libxc/xc_domain_restore.c
> > > > > index 1f6ce50..3116653 100644
> > > > > --- a/tools/libxc/xc_domain_restore.c
> > > > > +++ b/tools/libxc/xc_domain_restore.c
> > > > > @@ -746,6 +746,7 @@ typedef struct {
> > > > >      uint64_t acpi_ioport_location;
> > > > >      uint64_t viridian;
> > > > >      uint64_t vm_generationid_addr;
> > > > > +    uint64_t nr_ioreq_servers;
> > > >
> > > > This makes me wonder: what happens if the source and target hosts do
> > > > different amounts of disaggregation? Perhaps in Xen N+1 we split some
> > > > additional component out into its own process?
> > > >
> > > > This is going to be complex with the allocation of space for special
> > > > pages, isn't it?
> > > >
> > >
> > > As long as we have enough special pages then is it complex?
> > 
> > The "have enough" is where the complexity comes in though. If Xen
> > version X needed N special pages and Xen X+1 needs N+2 pages then we
> > have a tricky situation because people may well configure the guest with
> > N.
> > 
> 
> I don't quite follow. The specials are just part of the guest image
> and so they get migrated around with that guest, so providing we know
> how many special pages a guest had when it was created (so we know how
> many there are to play with for secondary emulation) there's no
> problem is there?

What if the newer version of Xen requires more secondaries than the
older one? That's the case I'm thinking of.

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 12:53           ` Ian Campbell
@ 2014-03-17 13:56             ` Paul Durrant
  2014-03-17 14:44               ` Ian Campbell
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-17 13:56 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 17 March 2014 12:54
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> On Mon, 2014-03-17 at 12:51 +0000, Paul Durrant wrote:
> 
> > > > > > diff --git a/tools/libxc/xc_domain_restore.c
> > > > > b/tools/libxc/xc_domain_restore.c
> > > > > > index 1f6ce50..3116653 100644
> > > > > > --- a/tools/libxc/xc_domain_restore.c
> > > > > > +++ b/tools/libxc/xc_domain_restore.c
> > > > > > @@ -746,6 +746,7 @@ typedef struct {
> > > > > >      uint64_t acpi_ioport_location;
> > > > > >      uint64_t viridian;
> > > > > >      uint64_t vm_generationid_addr;
> > > > > > +    uint64_t nr_ioreq_servers;
> > > > >
> > > > > This makes me wonder: what happens if the source and target hosts
> do
> > > > > different amounts of disaggregation? Perhaps in Xen N+1 we split
> some
> > > > > additional component out into its own process?
> > > > >
> > > > > This is going to be complex with the allocation of space for special
> > > > > pages, isn't it?
> > > > >
> > > >
> > > > As long as we have enough special pages then is it complex?
> > >
> > > The "have enough" is where the complexity comes in though. If Xen
> > > version X needed N special pages and Xen X+1 needs N+2 pages then we
> > > have a tricky situation because people may well configure the guest with
> > > N.
> > >
> >
> > I don't quite follow. The specials are just part of the guest image
> > and so they get migrated around with that guest, so providing we know
> > how many special pages a guest had when it was created (so we know how
> > many there are to play with for secondary emulation) there's no
> > problem is there?
> 
> What if the newer version of Xen requires more secondaries than the
> older one? That's the case I'm thinking of.
> 

I see. I guess the only other option is to put the pfns somewhere that we can always grow (within reason). The guest itself never maps these pfns, only the emulator, but they should be part of the guest's allocation. Is there somewhere else in the p2m that they could live such that we can grow the space even for migrated-in guests? Somewhere just above the top of RAM perhaps?

  Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 13:56             ` Paul Durrant
@ 2014-03-17 14:44               ` Ian Campbell
  2014-03-17 14:52                 ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-17 14:44 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Mon, 2014-03-17 at 13:56 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 17 March 2014 12:54
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> > multiple servers
> > 
> > On Mon, 2014-03-17 at 12:51 +0000, Paul Durrant wrote:
> > 
> > > > > > > diff --git a/tools/libxc/xc_domain_restore.c
> > > > > > b/tools/libxc/xc_domain_restore.c
> > > > > > > index 1f6ce50..3116653 100644
> > > > > > > --- a/tools/libxc/xc_domain_restore.c
> > > > > > > +++ b/tools/libxc/xc_domain_restore.c
> > > > > > > @@ -746,6 +746,7 @@ typedef struct {
> > > > > > >      uint64_t acpi_ioport_location;
> > > > > > >      uint64_t viridian;
> > > > > > >      uint64_t vm_generationid_addr;
> > > > > > > +    uint64_t nr_ioreq_servers;
> > > > > >
> > > > > > This makes me wonder: what happens if the source and target hosts
> > do
> > > > > > different amounts of disaggregation? Perhaps in Xen N+1 we split
> > some
> > > > > > additional component out into its own process?
> > > > > >
> > > > > > This is going to be complex with the allocation of space for special
> > > > > > pages, isn't it?
> > > > > >
> > > > >
> > > > > As long as we have enough special pages then is it complex?
> > > >
> > > > The "have enough" is where the complexity comes in though. If Xen
> > > > version X needed N special pages and Xen X+1 needs N+2 pages then we
> > > > have a tricky situation because people may well configure the guest with
> > > > N.
> > > >
> > >
> > > I don't quite follow. The specials are just part of the guest image
> > > and so they get migrated around with that guest, so providing we know
> > > how many special pages a guest had when it was created (so we know how
> > > many there are to play with for secondary emulation) there's no
> > > problem is there?
> > 
> > What if the newer version of Xen requires more secondaries than the
> > older one? That's the case I'm thinking of.
> > 
> 
> I see. I guess the only other option is to put the pfns somewhere that
> we can always grow (within reason). The guest itself never maps these
> pfns, only the emulator, but they should be part of the guest's
> allocation. Is there somewhere else in the p2m that they could live
> such that we can grow the space even for migrated-in guests? Somewhere
> just above the top of RAM perhaps?

It's always struck me as odd to have a Xen<->DM communication channel
sitting there in guest pfn space (regardless of who the nominal owner
is).

I don't suppose there is any way to pull these pages out of the guest
pfn space while still accounting them to the guest. Or if there is it
would probably be a whole other kettle of fish than this series.

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 14:44               ` Ian Campbell
@ 2014-03-17 14:52                 ` Paul Durrant
  2014-03-17 14:55                   ` Ian Campbell
  2014-03-20 11:11                   ` Tim Deegan
  0 siblings, 2 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-17 14:52 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 17 March 2014 14:44
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> On Mon, 2014-03-17 at 13:56 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 17 March 2014 12:54
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> > > multiple servers
> > >
> > > On Mon, 2014-03-17 at 12:51 +0000, Paul Durrant wrote:
> > >
> > > > > > > > diff --git a/tools/libxc/xc_domain_restore.c
> > > > > > > b/tools/libxc/xc_domain_restore.c
> > > > > > > > index 1f6ce50..3116653 100644
> > > > > > > > --- a/tools/libxc/xc_domain_restore.c
> > > > > > > > +++ b/tools/libxc/xc_domain_restore.c
> > > > > > > > @@ -746,6 +746,7 @@ typedef struct {
> > > > > > > >      uint64_t acpi_ioport_location;
> > > > > > > >      uint64_t viridian;
> > > > > > > >      uint64_t vm_generationid_addr;
> > > > > > > > +    uint64_t nr_ioreq_servers;
> > > > > > >
> > > > > > > This makes me wonder: what happens if the source and target
> hosts
> > > do
> > > > > > > different amounts of disaggregation? Perhaps in Xen N+1 we split
> > > some
> > > > > > > additional component out into its own process?
> > > > > > >
> > > > > > > This is going to be complex with the allocation of space for special
> > > > > > > pages, isn't it?
> > > > > > >
> > > > > >
> > > > > > As long as we have enough special pages then is it complex?
> > > > >
> > > > > The "have enough" is where the complexity comes in though. If Xen
> > > > > version X needed N special pages and Xen X+1 needs N+2 pages then
> we
> > > > > have a tricky situation because people may well configure the guest
> with
> > > > > N.
> > > > >
> > > >
> > > > I don't quite follow. The specials are just part of the guest image
> > > > and so they get migrated around with that guest, so providing we know
> > > > how many special pages a guest had when it was created (so we know
> how
> > > > many there are to play with for secondary emulation) there's no
> > > > problem is there?
> > >
> > > What if the newer version of Xen requires more secondaries than the
> > > older one? That's the case I'm thinking of.
> > >
> >
> > I see. I guess the only other option is to put the pfns somewhere that
> > we can always grow (within reason). The guest itself never maps these
> > pfns, only the emulator, but they should be part of the guest's
> > allocation. Is there somewhere else in the p2m that they could live
> > such that we can grow the space even for migrated-in guests? Somewhere
> > just above the top of RAM perhaps?
> 
> It's always struck me as odd to have a Xen<->DM communication channel
> sitting there in guest pfn space (regardless of who the nominal owner
> is).
> 
> I don't suppose there is any way to pull these pages out of the guest
> pfn space while still accounting them to the guest. Or if there is it
> would probably be a whole other kettle of fish than this series.
> 

The closest analogy I can think of accounting-wise would be shadow pages. I'll have a look at how they are handled.

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 14:52                 ` Paul Durrant
@ 2014-03-17 14:55                   ` Ian Campbell
  2014-03-18 11:33                     ` Paul Durrant
  2014-03-20 11:11                   ` Tim Deegan
  1 sibling, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-03-17 14:55 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Mon, 2014-03-17 at 14:52 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell

> > It's always struck me as odd to have a Xen<->DM communication channel
> > sitting there in guest pfn space (regardless of who the nominal owner
> > is).
> > 
> > I don't suppose there is any way to pull these pages out of the guest
> > pfn space while still accounting them to the guest. Or if there is it
> > would probably be a whole other kettle of fish than this series.
> > 
> 
> The closest analogy I can think of accounting-wise would be shadow
> pages. I'll have a look at how they are handled.

I think the big difference is that noone outside Xen needs to be able to
refer to a shadow page, whereas the device models need some sort of
handle onto the ring to be able to map them etc. Not insurmountable I
suppose.

Ian.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 14:55                   ` Ian Campbell
@ 2014-03-18 11:33                     ` Paul Durrant
  2014-03-18 13:24                       ` George Dunlap
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-18 11:33 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

> -----Original Message-----
> From: Ian Campbell
> Sent: 17 March 2014 14:56
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> On Mon, 2014-03-17 at 14:52 +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> 
> > > It's always struck me as odd to have a Xen<->DM communication channel
> > > sitting there in guest pfn space (regardless of who the nominal owner
> > > is).
> > >
> > > I don't suppose there is any way to pull these pages out of the guest
> > > pfn space while still accounting them to the guest. Or if there is it
> > > would probably be a whole other kettle of fish than this series.
> > >
> >
> > The closest analogy I can think of accounting-wise would be shadow
> > pages. I'll have a look at how they are handled.
> 
> I think the big difference is that noone outside Xen needs to be able to
> refer to a shadow page, whereas the device models need some sort of
> handle onto the ring to be able to map them etc. Not insurmountable I
> suppose.
> 

Probably not, but it's looking like it will be a bit of a can of worms. Are you ok with sticking to base+range HVM params for secondary emulators that can potentially be moved on migration for now? I.e. the save image just contains a count. There's still some growth room in the existing area (all pages from FE800000 to FF000000 AFACT) so as long as - as George said - we don't bake the PFN layout in, I don’t think we preclude moving the emulator PFNs around in future.

  Paul
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-18 11:33                     ` Paul Durrant
@ 2014-03-18 13:24                       ` George Dunlap
  2014-03-18 13:38                         ` Paul Durrant
  2014-03-18 13:45                         ` Paul Durrant
  0 siblings, 2 replies; 48+ messages in thread
From: George Dunlap @ 2014-03-18 13:24 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Campbell, xen-devel

On Tue, Mar 18, 2014 at 11:33 AM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: Ian Campbell
>> Sent: 17 March 2014 14:56
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
>> multiple servers
>>
>> On Mon, 2014-03-17 at 14:52 +0000, Paul Durrant wrote:
>> > > -----Original Message-----
>> > > From: Ian Campbell
>>
>> > > It's always struck me as odd to have a Xen<->DM communication channel
>> > > sitting there in guest pfn space (regardless of who the nominal owner
>> > > is).
>> > >
>> > > I don't suppose there is any way to pull these pages out of the guest
>> > > pfn space while still accounting them to the guest. Or if there is it
>> > > would probably be a whole other kettle of fish than this series.
>> > >
>> >
>> > The closest analogy I can think of accounting-wise would be shadow
>> > pages. I'll have a look at how they are handled.
>>
>> I think the big difference is that noone outside Xen needs to be able to
>> refer to a shadow page, whereas the device models need some sort of
>> handle onto the ring to be able to map them etc. Not insurmountable I
>> suppose.
>>
>
> Probably not, but it's looking like it will be a bit of a can of worms. Are you ok with sticking to base+range HVM params for secondary emulators that can potentially be moved on migration for now? I.e. the save image just contains a count. There's still some growth room in the existing area (all pages from FE800000 to FF000000 AFACT) so as long as - as George said - we don't bake the PFN layout in, I don't think we preclude moving the emulator PFNs around in future.

xentrace has to share pages between Xen and dom0; it just exposes an
interface for dom0 to get the mfn and then maps those mfns.  Couldn't
you do something similar?  When you create an ioreq server, Xen could
allocate the pages internally; and then you could use
hvm_get_ioreq_server_info to get the MFNs, and xc_map_foreign_range()
to map them.  Am I missing something?

 -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-18 13:24                       ` George Dunlap
@ 2014-03-18 13:38                         ` Paul Durrant
  2014-03-18 13:45                         ` Paul Durrant
  1 sibling, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-18 13:38 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Campbell, xen-devel

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 18 March 2014 13:24
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Campbell
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> On Tue, Mar 18, 2014 at 11:33 AM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >> -----Original Message-----
> >> From: Ian Campbell
> >> Sent: 17 March 2014 14:56
> >> To: Paul Durrant
> >> Cc: xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> >> multiple servers
> >>
> >> On Mon, 2014-03-17 at 14:52 +0000, Paul Durrant wrote:
> >> > > -----Original Message-----
> >> > > From: Ian Campbell
> >>
> >> > > It's always struck me as odd to have a Xen<->DM communication
> channel
> >> > > sitting there in guest pfn space (regardless of who the nominal owner
> >> > > is).
> >> > >
> >> > > I don't suppose there is any way to pull these pages out of the guest
> >> > > pfn space while still accounting them to the guest. Or if there is it
> >> > > would probably be a whole other kettle of fish than this series.
> >> > >
> >> >
> >> > The closest analogy I can think of accounting-wise would be shadow
> >> > pages. I'll have a look at how they are handled.
> >>
> >> I think the big difference is that noone outside Xen needs to be able to
> >> refer to a shadow page, whereas the device models need some sort of
> >> handle onto the ring to be able to map them etc. Not insurmountable I
> >> suppose.
> >>
> >
> > Probably not, but it's looking like it will be a bit of a can of worms. Are you
> ok with sticking to base+range HVM params for secondary emulators that can
> potentially be moved on migration for now? I.e. the save image just contains
> a count. There's still some growth room in the existing area (all pages from
> FE800000 to FF000000 AFACT) so as long as - as George said - we don't bake
> the PFN layout in, I don't think we preclude moving the emulator PFNs
> around in future.
> 
> xentrace has to share pages between Xen and dom0; it just exposes an
> interface for dom0 to get the mfn and then maps those mfns.  Couldn't
> you do something similar?  When you create an ioreq server, Xen could
> allocate the pages internally; and then you could use
> hvm_get_ioreq_server_info to get the MFNs, and xc_map_foreign_range()
> to map them.  Am I missing something?
> 

If you use xc_map_foreign_range() then presumably the page is still in the p2m. AFAIK the value supplied to xc_map_foreign_range() is still a guest frame number isn't it?

  Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-18 13:24                       ` George Dunlap
  2014-03-18 13:38                         ` Paul Durrant
@ 2014-03-18 13:45                         ` Paul Durrant
  1 sibling, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-18 13:45 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Campbell, xen-devel

> -----Original Message-----
> From: Paul Durrant
> Sent: 18 March 2014 13:39
> To: 'George Dunlap'
> Cc: xen-devel@lists.xen.org; Ian Campbell
> Subject: RE: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> > -----Original Message-----
> > From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> > George Dunlap
> > Sent: 18 March 2014 13:24
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; Ian Campbell
> > Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> > multiple servers
> >
> > On Tue, Mar 18, 2014 at 11:33 AM, Paul Durrant <Paul.Durrant@citrix.com>
> > wrote:
> > >> -----Original Message-----
> > >> From: Ian Campbell
> > >> Sent: 17 March 2014 14:56
> > >> To: Paul Durrant
> > >> Cc: xen-devel@lists.xen.org
> > >> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> > >> multiple servers
> > >>
> > >> On Mon, 2014-03-17 at 14:52 +0000, Paul Durrant wrote:
> > >> > > -----Original Message-----
> > >> > > From: Ian Campbell
> > >>
> > >> > > It's always struck me as odd to have a Xen<->DM communication
> > channel
> > >> > > sitting there in guest pfn space (regardless of who the nominal
> owner
> > >> > > is).
> > >> > >
> > >> > > I don't suppose there is any way to pull these pages out of the guest
> > >> > > pfn space while still accounting them to the guest. Or if there is it
> > >> > > would probably be a whole other kettle of fish than this series.
> > >> > >
> > >> >
> > >> > The closest analogy I can think of accounting-wise would be shadow
> > >> > pages. I'll have a look at how they are handled.
> > >>
> > >> I think the big difference is that noone outside Xen needs to be able to
> > >> refer to a shadow page, whereas the device models need some sort of
> > >> handle onto the ring to be able to map them etc. Not insurmountable I
> > >> suppose.
> > >>
> > >
> > > Probably not, but it's looking like it will be a bit of a can of worms. Are you
> > ok with sticking to base+range HVM params for secondary emulators that
> can
> > potentially be moved on migration for now? I.e. the save image just
> contains
> > a count. There's still some growth room in the existing area (all pages from
> > FE800000 to FF000000 AFACT) so as long as - as George said - we don't bake
> > the PFN layout in, I don't think we preclude moving the emulator PFNs
> > around in future.
> >
> > xentrace has to share pages between Xen and dom0; it just exposes an
> > interface for dom0 to get the mfn and then maps those mfns.  Couldn't
> > you do something similar?  When you create an ioreq server, Xen could
> > allocate the pages internally; and then you could use
> > hvm_get_ioreq_server_info to get the MFNs, and
> xc_map_foreign_range()
> > to map them.  Am I missing something?
> >
> 
> If you use xc_map_foreign_range() then presumably the page is still in the
> p2m. AFAIK the value supplied to xc_map_foreign_range() is still a guest
> frame number isn't it?

Ah, I didn't know about mapping using DOMID_XEN. That looks ok then :-)

  Paul

> 
>   Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-17 14:52                 ` Paul Durrant
  2014-03-17 14:55                   ` Ian Campbell
@ 2014-03-20 11:11                   ` Tim Deegan
  2014-03-20 11:22                     ` Paul Durrant
  1 sibling, 1 reply; 48+ messages in thread
From: Tim Deegan @ 2014-03-20 11:11 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Campbell, xen-devel

At 14:52 +0000 on 17 Mar (1395064322), Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > I don't suppose there is any way to pull these pages out of the guest
> > pfn space while still accounting them to the guest. Or if there is it
> > would probably be a whole other kettle of fish than this series.
> > 
> 
> The closest analogy I can think of accounting-wise would be shadow
> pages. I'll have a look at how they are handled.

It should be much simpler than shadow pages _provided_ that nobody
adds a XENMAPSPACE namespace that uses real MFNs.  Shadow pages have
to be accounted weirdly becaus ethey can't be owned by the guest or PV
guests could just map them.

As long as we stick to the rule that all HVM-guest operations have to
go through the p2m, and we don't let the guest map pages by MFN, then
removing it from the p2m is enough to stop the guest accessing it.

Of course, for security purposes we ought to treat the device-model
process/stubdom as being under guest control anyway - hence the
stubdom qemu.

Tim.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-20 11:11                   ` Tim Deegan
@ 2014-03-20 11:22                     ` Paul Durrant
  2014-03-20 12:10                       ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2014-03-20 11:22 UTC (permalink / raw)
  To: Tim (Xen.org); +Cc: Ian Campbell, xen-devel

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: 20 March 2014 11:12
> To: Paul Durrant
> Cc: Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> At 14:52 +0000 on 17 Mar (1395064322), Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > I don't suppose there is any way to pull these pages out of the guest
> > > pfn space while still accounting them to the guest. Or if there is it
> > > would probably be a whole other kettle of fish than this series.
> > >
> >
> > The closest analogy I can think of accounting-wise would be shadow
> > pages. I'll have a look at how they are handled.
> 
> It should be much simpler than shadow pages _provided_ that nobody
> adds a XENMAPSPACE namespace that uses real MFNs.  Shadow pages have
> to be accounted weirdly becaus ethey can't be owned by the guest or PV
> guests could just map them.
> 
> As long as we stick to the rule that all HVM-guest operations have to
> go through the p2m, and we don't let the guest map pages by MFN, then
> removing it from the p2m is enough to stop the guest accessing it.
>

My plan is to use alloc_domheap_pages() for secondary emulators so that the pages are accounted to the guest, but never add those pages to the p2m. For the default emulator I'll use the existing specials, for compatibility.
My concern is that this will limit secondary emulators to running in dom0, since they'll need to use DOMID_XEN to map the pages. 

  Paul
 
> Of course, for security purposes we ought to treat the device-model
> process/stubdom as being under guest control anyway - hence the
> stubdom qemu.
> 
> Tim.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] ioreq-server: add support for multiple servers
  2014-03-20 11:22                     ` Paul Durrant
@ 2014-03-20 12:10                       ` Paul Durrant
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Durrant @ 2014-03-20 12:10 UTC (permalink / raw)
  To: Paul Durrant, Tim (Xen.org); +Cc: Ian Campbell, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 20 March 2014 11:22
> To: Tim (Xen.org)
> Cc: Ian Campbell; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> multiple servers
> 
> > -----Original Message-----
> > From: Tim Deegan [mailto:tim@xen.org]
> > Sent: 20 March 2014 11:12
> > To: Paul Durrant
> > Cc: Ian Campbell; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v3 5/6] ioreq-server: add support for
> > multiple servers
> >
> > At 14:52 +0000 on 17 Mar (1395064322), Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Ian Campbell
> > > > I don't suppose there is any way to pull these pages out of the guest
> > > > pfn space while still accounting them to the guest. Or if there is it
> > > > would probably be a whole other kettle of fish than this series.
> > > >
> > >
> > > The closest analogy I can think of accounting-wise would be shadow
> > > pages. I'll have a look at how they are handled.
> >
> > It should be much simpler than shadow pages _provided_ that nobody
> > adds a XENMAPSPACE namespace that uses real MFNs.  Shadow pages
> have
> > to be accounted weirdly becaus ethey can't be owned by the guest or PV
> > guests could just map them.
> >
> > As long as we stick to the rule that all HVM-guest operations have to
> > go through the p2m, and we don't let the guest map pages by MFN, then
> > removing it from the p2m is enough to stop the guest accessing it.
> >
> 
> My plan is to use alloc_domheap_pages() for secondary emulators so that
> the pages are accounted to the guest, but never add those pages to the
> p2m. For the default emulator I'll use the existing specials, for compatibility.
> My concern is that this will limit secondary emulators to running in dom0,
> since they'll need to use DOMID_XEN to map the pages.
> 

Having talked this through with Tim, the conclusion is that emulator pages have to live in the guest p2m otherwise there's no good way to find them from the emulating domain. However, whilst the emulator is active it should be possible to remove them from the guest p2m so that the guest has no direct access.
So, I will  use a range special pfns as before, but simply advertise them via base and range HVM params so that I can allocate from and free to that space as ioreq servers come and go.

  Paul

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2014-03-20 12:10 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-05 14:47 [PATCH v3 0/6] Support for running secondary emulators Paul Durrant
2014-03-05 14:47 ` [PATCH v3 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
2014-03-05 14:47 ` [PATCH v3 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
2014-03-10 15:43   ` George Dunlap
2014-03-10 15:46     ` Paul Durrant
2014-03-10 15:53       ` George Dunlap
2014-03-10 16:04         ` Paul Durrant
2014-03-10 16:56           ` George Dunlap
2014-03-11 10:06             ` Paul Durrant
2014-03-05 14:47 ` [PATCH v3 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
2014-03-05 14:47 ` [PATCH v3 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
2014-03-10 17:46   ` George Dunlap
2014-03-11 10:54     ` Paul Durrant
2014-03-14 11:04       ` Ian Campbell
2014-03-14 13:28         ` Paul Durrant
2014-03-14 11:18   ` Ian Campbell
2014-03-14 13:30     ` Paul Durrant
2014-03-05 14:48 ` [PATCH v3 5/6] ioreq-server: add support for multiple servers Paul Durrant
2014-03-14 11:52   ` Ian Campbell
2014-03-17 11:45     ` George Dunlap
2014-03-17 12:25     ` Paul Durrant
2014-03-17 12:35       ` Ian Campbell
2014-03-17 12:51         ` Paul Durrant
2014-03-17 12:53           ` Ian Campbell
2014-03-17 13:56             ` Paul Durrant
2014-03-17 14:44               ` Ian Campbell
2014-03-17 14:52                 ` Paul Durrant
2014-03-17 14:55                   ` Ian Campbell
2014-03-18 11:33                     ` Paul Durrant
2014-03-18 13:24                       ` George Dunlap
2014-03-18 13:38                         ` Paul Durrant
2014-03-18 13:45                         ` Paul Durrant
2014-03-20 11:11                   ` Tim Deegan
2014-03-20 11:22                     ` Paul Durrant
2014-03-20 12:10                       ` Paul Durrant
2014-03-05 14:48 ` [PATCH v3 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
2014-03-14 11:57   ` Ian Campbell
2014-03-14 13:25     ` Paul Durrant
2014-03-14 14:08       ` Ian Campbell
2014-03-14 14:31         ` Paul Durrant
2014-03-14 15:01           ` Ian Campbell
2014-03-14 15:18             ` Paul Durrant
2014-03-14 18:06               ` Konrad Rzeszutek Wilk
2014-03-17 11:13                 ` Paul Durrant
2014-03-10 18:57 ` [PATCH v3 0/6] Support for running secondary emulators George Dunlap
2014-03-11 10:48   ` Paul Durrant
2014-03-14 11:02 ` Ian Campbell
2014-03-14 13:26   ` Paul Durrant

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.